Tutorial: Connecting Dify to External Knowledge Bases

For clarity, this article refers to knowledge bases outside the Dify platform collectively as "external knowledge bases".

Feature overview

Dify's built-in knowledge base functionality and text retrieval mechanism may not meet the needs of some advanced developers who may require more precise control over text recall results.

Some teams choose to develop their own RAG algorithms and maintain a text recall system independently, or to use a knowledge base service provided by a cloud provider (e.g., AWS Bedrock).

Dify, as an open LLM application development platform, wants to give developers more options.

Dify can connect to external knowledge bases through the "Connect to External Knowledge Bases" feature. This gives AI applications access to more sources of information.

Specifically, there are the following advantages:

  • Dify can directly access text hosted in the cloud provider's knowledge base, eliminating the need for developers to copy content into Dify's knowledge base.
  • Dify can directly access the algorithmically processed text in the self-built knowledge base, and developers only need to focus on optimizing the information retrieval mechanism to improve recall accuracy.
  • Compared to using cloud vendors' knowledge base services directly, Dify provides more flexible application layer integration capabilities, making it easy for developers to build diverse AI applications.

The following figure illustrates the principle of connecting to an external knowledge base:

Figure: How Dify connects to an external knowledge base

 

Connection steps

1. Build a compliant external knowledge base API

Before setting up your API service, read Dify's External Knowledge Base API specification carefully.

2. Associate the external knowledge base API

Please note that Dify currently only supports retrieving external knowledge bases, not modifying them. Developers need to maintain external knowledge bases on their own.

Go to the "Knowledge Base" page, click "External Knowledge Base API" in the upper right corner, and then click "Add External Knowledge Base API".

Follow the page prompts to fill out the form:

  • Knowledge base name: Can be customized to differentiate between different external knowledge base APIs.
  • API endpoint: The address of the external knowledge base service, e.g. api-endpoint/retrieval. See the External Knowledge Base API specification for details.
  • API Key: The connection key for the external knowledge base. See the External Knowledge Base API specification for details.

3. Connect to the external knowledge base

On the "Knowledge Base" page, click "Connect to external knowledge base" under "Add knowledge base" to enter the parameter configuration page.

Fill in the following parameters:

  • Knowledge Base Name and Description
  • External Knowledge Base API: Select the external knowledge base API associated in step 2. Dify retrieves text from the external knowledge base through this API connection.
  • External Knowledge Base ID: Specify the ID of the external knowledge base to connect to. See the External Knowledge Base API specification for details.
  • Adjust recall settings:
    • Top K: The larger the value, the more text fragments are recalled. It is recommended to start experimenting with smaller values and gradually increase them until the optimal balance is found.
    • Score Threshold: The higher the value, the more relevant the recalled text segments are to the question, but the number decreases. It is recommended to start with a higher value and gradually decrease it to get a sufficient amount of relevant text.
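
To make the effect of the two recall settings concrete, here is a minimal sketch of how an external knowledge base service might apply them. The function name `apply_recall_settings` and the sample records are hypothetical and not part of Dify; the sketch assumes each candidate record carries a `score` between 0 and 1.

```python
def apply_recall_settings(records, top_k, score_threshold):
    """Keep records scoring at or above the threshold, best first, at most top_k."""
    kept = [r for r in records if r["score"] >= score_threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:top_k]


# Hypothetical candidates returned by a retriever.
candidates = [
    {"content": "chunk A", "score": 0.91},
    {"content": "chunk B", "score": 0.42},
    {"content": "chunk C", "score": 0.77},
    {"content": "chunk D", "score": 0.58},
]

# A larger Top K admits more fragments; a higher threshold admits fewer.
loose = apply_recall_settings(candidates, top_k=3, score_threshold=0.5)
strict = apply_recall_settings(candidates, top_k=3, score_threshold=0.8)
print([r["content"] for r in loose])   # ['chunk A', 'chunk C', 'chunk D']
print([r["content"] for r in strict])  # ['chunk A']
```

Raising the threshold from 0.5 to 0.8 drops two of the three fragments here, which is why starting high and lowering the value gradually is a reasonable way to tune it.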

4. Test the connection and recall

After the connection is established, you can enter likely question keywords in "Recall Test" and preview the text fragments recalled from the external knowledge base. If you are not satisfied with the results, try modifying the recall parameters or adjusting the retrieval settings of the external knowledge base.

5. Integrate within applications

  • Chatbot / Agent type application: In the Context section of the orchestration page, select the external knowledge base marked with the EXTERNAL tag.
  • Chatflow / Workflow type application: Add a Knowledge Retrieval node and select the external knowledge base marked with the EXTERNAL tag.

6. Manage the external knowledge base

On the Knowledge Base page, external knowledge base cards show an EXTERNAL tag in the upper right corner. Open the knowledge base you want to modify and click "Settings" to change the following:

  • Knowledge base name and description
  • Visible scope ("Only me", "All team members" and "Some team members"). Members without permissions cannot access the knowledge base.
  • Recall settings (Top K and Score thresholds)

Note: The associated External Knowledge Base API and External Knowledge ID cannot be modified. To change them, associate a new External Knowledge Base API and reconnect.

 

Connection example: How do I connect to the AWS Bedrock Knowledge Base?

This section outlines how to connect the Dify platform to an AWS Bedrock Knowledge Base through the external knowledge base API, so that AI applications in Dify can directly access content stored in the AWS Bedrock Knowledge Base.

Prerequisites

  • AWS Bedrock Knowledge Base
  • Dify SaaS Services / Dify Community Edition
  • Backend API Development Basics

1. Register and create an AWS Bedrock Knowledge Base

Visit AWS Bedrock to create a Knowledge Base service.

Figure: Creating an AWS Bedrock Knowledge Base

2. Build back-end API services

The Dify platform cannot yet connect to an AWS Bedrock Knowledge Base directly. The development team needs to follow Dify's external knowledge base API definition and build a back-end API service that connects to AWS Bedrock. See the architecture diagram for details:

Figure: Building the back-end API service

You can refer to the following two code files to build the back-end API service.

knowledge.py

from flask import request
from flask_restful import Resource, reqparse

from bedrock.knowledge_service import ExternalDatasetService

class BedrockRetrievalApi(Resource):
    # url : <your-endpoint>/retrieval
    def post(self):
        parser = reqparse.RequestParser()
        parser.add_argument("retrieval_setting", nullable=False, required=True, type=dict, location="json")
        parser.add_argument("query", nullable=False, required=True, type=str)
        parser.add_argument("knowledge_id", nullable=False, required=True, type=str)
        args = parser.parse_args()

        # Authorization check (the header may be missing entirely)
        auth_header = request.headers.get("Authorization")
        if not auth_header or " " not in auth_header:
            return {
                "error_code": 1001,
                "error_msg": "Invalid Authorization header format. Expected 'Bearer <api-key>' format."
            }, 403
        auth_scheme, auth_token = auth_header.split(None, 1)
        auth_scheme = auth_scheme.lower()
        if auth_scheme != "bearer":
            return {
                "error_code": 1001,
                "error_msg": "Invalid Authorization header format. Expected 'Bearer <api-key>' format."
            }, 403
        if auth_token:
            # process your authorization logic here
            pass

        # Call the knowledge retrieval service
        result = ExternalDatasetService.knowledge_retrieval(
            args["retrieval_setting"], args["query"], args["knowledge_id"]
        )
        return result, 200

knowledge_service.py

import boto3

class ExternalDatasetService:
    @staticmethod
    def knowledge_retrieval(retrieval_setting: dict, query: str, knowledge_id: str):
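        # get bedrock client; the quoted credential and region values below
        # are placeholders to replace with your own settings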
        client = boto3.client(
            "bedrock-agent-runtime",
            aws_secret_access_key="AWS_SECRET_ACCESS_KEY",
            aws_access_key_id="AWS_ACCESS_KEY_ID",
            # example: us-east-1
            region_name="AWS_REGION_NAME",
        )
        # fetch external knowledge retrieval
        response = client.retrieve(
            knowledgeBaseId=knowledge_id,
            retrievalConfiguration={
                "vectorSearchConfiguration": {"numberOfResults": retrieval_setting.get("top_k"), "overrideSearchType": "HYBRID"}
            },
            retrievalQuery={"text": query},
        )
        # parse response
        results = []
        status = (response.get("ResponseMetadata") or {}).get("HTTPStatusCode")
        if status == 200:
            threshold = retrieval_setting.get("score_threshold", 0.0)
            for retrieval_result in response.get("retrievalResults") or []:
                # treat a missing score as 0.0 so the comparison cannot fail
                score = retrieval_result.get("score") or 0.0
                # filter out results with score less than threshold
                if score < threshold:
                    continue
                # metadata and content may be absent; fall back to empty dicts
                metadata = retrieval_result.get("metadata") or {}
                content = retrieval_result.get("content") or {}
                results.append({
                    "metadata": metadata,
                    "score": score,
                    "title": metadata.get("x-amz-bedrock-kb-source-uri"),
                    "content": content.get("text"),
                })
        return {
            "records": results
        }

While building this service, you define the API endpoint and the API Key used for authentication. Both are needed in the later connection steps.
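
The sample service leaves the actual key check as a placeholder. One possible way to fill it in is sketched below using only the standard library. The function name `check_authorization`, the `EXPECTED_API_KEY` constant, and the choice of HTTP 403 for error code 1002 are assumptions made for illustration, not part of Dify's sample code.

```python
import hmac

# Hypothetical key; in practice load it from configuration, not source code.
EXPECTED_API_KEY = "your-api-key"


def check_authorization(auth_header, expected_key=EXPECTED_API_KEY):
    """Return None when the header is valid, else an (error body, status) pair."""
    parts = (auth_header or "").split(None, 1)
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return {
            "error_code": 1001,
            "error_msg": "Invalid Authorization header format. "
                         "Expected 'Bearer <api-key>' format.",
        }, 403
    # compare_digest avoids leaking key contents through timing differences.
    if not hmac.compare_digest(parts[1].encode(), expected_key.encode()):
        return {"error_code": 1002, "error_msg": "Authorization failed"}, 403
    return None
```

In a request handler, a non-None return value can be sent back to the caller as is, and retrieval proceeds only when the function returns None.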

3. Get the AWS Bedrock Knowledge Base ID

Log in to the AWS Bedrock Knowledge Base console and get the ID of the Knowledge Base you created. This ID is used in later steps to connect to the Dify platform.

Figure: Getting the AWS Bedrock Knowledge Base ID

4. Associate the external knowledge API

Go to the "Knowledge base" page on the Dify platform, click "External Knowledge Base API" in the upper right corner, and then click "Add external knowledge base API".

Follow the page prompts and fill in the following fields in order:

  • Knowledge base name: A custom name used to distinguish the external knowledge APIs connected within the Dify platform.
  • API endpoint: The connection address of the external knowledge base, as defined in step 2, e.g. api-endpoint/retrieval.
  • API Key: The connection key of the external knowledge base, as defined in step 2.

5. Connect to the external knowledge base

Go to the "Knowledge base" page and click "Connect to external knowledge base" below the Add Knowledge Base card to open the parameter configuration page.

Fill in the following parameters:

  • Knowledge Base Name and Description
  • External Knowledge Base API: Select the external knowledge base API associated in step 4.
  • External Knowledge Base ID: Fill in the AWS Bedrock Knowledge Base ID obtained in step 3.
  • Adjust recall settings:
    • Top K: When a user asks a question, Dify requests the external knowledge API for the most relevant content segments. This parameter limits how many of the text segments most similar to the question are returned. The default value is 3. The higher the value, the more text segments are recalled.
    • Score Threshold: The similarity threshold for filtering text segments. Only segments whose score exceeds the threshold are recalled. The default value is 0.5. The higher the value, the more similar the recalled text must be to the question, so fewer segments are recalled and the results tend to be more accurate.

Once the settings are complete, you can establish a connection to the external Knowledge Base API.

6. Test the external knowledge base connection and recall

After connecting to the external knowledge base, you can enter likely question keywords in "Recall Test" and preview the text segments recalled from the AWS Bedrock Knowledge Base.

Figure: Testing the connection and recall of the external knowledge base

If you are not satisfied with the recall results, try modifying the recall parameters or adjusting the retrieval settings of the AWS Bedrock Knowledge Base.

Figure: Adjusting AWS Bedrock Knowledge Base text processing parameters

 

Common problems

What if I get an error connecting to the external Knowledge Base API?

Below are the error codes and corresponding solutions:

| Error code | Error | Solution |
| --- | --- | --- |
| 1001 | Invalid Authorization header format | Check the format of the request's Authorization header |
| 1002 | Authorization failed | Check whether the API Key is correct |
| 2001 | Knowledge base does not exist | Check the external knowledge base |

 

External Knowledge Base API Specification

Endpoint

POST <your-endpoint>/retrieval

Request header

This API is used to connect to knowledge bases that a team maintains independently. For more guidance on how to do this, see Connecting to External Knowledge Bases.

Pass the API Key in the Authorization field of the HTTP request header to authenticate. The authentication logic is defined by you in the retrieval API, as follows:

Authorization: Bearer {API_KEY}

Request body

The request accepts data in the following JSON format:

| Property | Required | Type | Description | Example value |
| --- | --- | --- | --- | --- |
| knowledge_id | Yes | string | Unique ID of the knowledge base | AAA-BBB-CCC |
| query | Yes | string | User's query | What's Dify? |
| retrieval_setting | Yes | object | Knowledge retrieval parameters | See below |

The retrieval_setting property contains the following keys:

| Property | Required | Type | Description | Example value |
| --- | --- | --- | --- | --- |
| top_k | Yes | integer | Maximum number of retrieval results | 5 |
| score_threshold | Yes | float | Minimum relevance score between a result and the query, range: 0~1 | 0.5 |

Request example

POST <your-endpoint>/retrieval HTTP/1.1
Content-Type: application/json
Authorization: Bearer your-api-key

{
    "knowledge_id": "your-knowledge-id",
    "query": "your question",
    "retrieval_setting": {
        "top_k": 2,
        "score_threshold": 0.5
    }
}
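
For trying the endpoint outside Dify, a request in this shape can be built with the Python standard library. The sketch below is illustrative only: the helper name `build_retrieval_request` and the `https://example.com/api` address are assumptions, and the request is constructed but not sent.

```python
import json
import urllib.request


def build_retrieval_request(endpoint, api_key, knowledge_id, query,
                            top_k=2, score_threshold=0.5):
    """Build (but do not send) a POST request for <endpoint>/retrieval."""
    body = {
        "knowledge_id": knowledge_id,
        "query": query,
        "retrieval_setting": {"top_k": top_k, "score_threshold": score_threshold},
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/retrieval",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical endpoint; replace with the address of your own service.
req = build_retrieval_request(
    "https://example.com/api", "your-api-key", "your-knowledge-id", "your question"
)
# To send it: urllib.request.urlopen(req, timeout=10), then json.load() the response.
```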

Response body

If the operation is successful, the service returns an HTTP 200 response with the following data in JSON format:

| Property | Required | Type | Description | Example value |
| --- | --- | --- | --- | --- |
| records | Yes | list of objects | List of records retrieved from the knowledge base | See below |

The records property is a list of objects containing the following keys:

| Property | Required | Type | Description | Example value |
| --- | --- | --- | --- | --- |
| content | Yes | string | Text chunk from the knowledge base | Dify: GenAI Application Development Platform |
| score | Yes | float | Relevance score between the result and the query, range: 0~1 | 0.98 |
| title | Yes | string | Document title | About Dify |
| metadata | No | JSON | Metadata attributes and their values for the document in the data source | See example |

Response example

HTTP/1.1 200
Content-Type: application/json

{
    "records": [
        {
            "metadata": {
                "path": "s3://dify/knowledge.txt",
                "description": "dify knowledge document"
            },
            "score": 0.98,
            "title": "knowledge.txt",
            "content": "This is the document for external knowledge."
        },
        {
            "metadata": {
                "path": "s3://dify/introduce.txt",
                "description": "dify introduction"
            },
            "score": 0.66,
            "title": "introduce.txt",
            "content": "The Innovation Engine for GenAI Applications"
        }
    ]
}
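
When developing your own service, it can help to check responses against the response body fields described above before connecting Dify. The following is a rough sketch of such a check. The function name `validate_retrieval_response` is hypothetical, and treating `metadata` as optional is an assumption.

```python
def validate_retrieval_response(payload):
    """Return a list of problems found in a response body; empty means it conforms."""
    problems = []
    records = payload.get("records") if isinstance(payload, dict) else None
    if not isinstance(records, list):
        return ["'records' must be a list"]
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"records[{i}] must be an object")
            continue
        for key in ("content", "title"):
            if not isinstance(record.get(key), str):
                problems.append(f"records[{i}].{key} must be a string")
        score = record.get("score")
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            problems.append(f"records[{i}].score must be a number")
        elif not 0 <= score <= 1:
            problems.append(f"records[{i}].score must be within 0~1")
        # metadata is assumed optional, but must be an object when present.
        metadata = record.get("metadata")
        if metadata is not None and not isinstance(metadata, dict):
            problems.append(f"records[{i}].metadata must be an object")
    return problems
```

An empty list means the payload matches the documented shape. Each string in a non-empty list names the offending record and field.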

Errors

If the operation fails, the service returns the following error message (in JSON format):

| Property | Required | Type | Description | Example value |
| --- | --- | --- | --- | --- |
| error_code | Yes | integer | Error code | 1001 |
| error_msg | Yes | string | Description of the API exception | Invalid Authorization header format. |

error_code values:

| Code | Description |
| --- | --- |
| 1001 | Invalid Authorization header format |
| 1002 | Authorization failed |
| 2001 | Knowledge base does not exist |
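
A service can keep its error bodies consistent by building them from a single lookup of the codes listed above. This is a small sketch under that idea. The names `ERROR_MESSAGES` and `error_body` are hypothetical, and appending an optional detail to the message is an illustrative choice rather than something the specification requires.

```python
# Error codes and descriptions taken from the list of error_code values above.
ERROR_MESSAGES = {
    1001: "Invalid Authorization header format",
    1002: "Authorization failed",
    2001: "Knowledge base does not exist",
}


def error_body(error_code, detail=None):
    """Build the JSON error body; an optional detail is appended to the message."""
    if error_code not in ERROR_MESSAGES:
        raise ValueError(f"unknown error_code: {error_code}")
    message = ERROR_MESSAGES[error_code]
    if detail:
        message = f"{message}: {detail}"
    return {"error_code": error_code, "error_msg": message}
```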

HTTP Status Code

  • AccessDeniedException: Lack of access rights. (HTTP status code: 403)
  • InternalServerException: Internal server error. (HTTP status code: 500)