Azure Cognitive Search and LangChain: A seamless integration for advanced vector search capabilities (2023)

LangChain integration and content courtesy of: Fabrizio Ruocco, Senior Technology Lead, AI Global Black Belt, Microsoft

Introduction

In a fast-paced world, the ability to quickly access relevant and accurate information is critical to improving productivity and making informed decisions. As the volume of digital data grows, finding the right information has become an increasingly important task. Recent advances in LLMs (Large Language Models) have changed the information retrieval landscape, making retrieval more efficient and effective.

A significant advance in this area is the development of embedding models, which have revolutionized the way we search for information. Unlike traditional keyword-based search methods, embedding models harness the power of natural language to provide end users with more meaningful and contextual results. Embedding models work by converting words, sentences, or even entire documents into mathematical representations called vectors. These vectors, which exist in a high-dimensional space, capture the meaning of and relationships between different words and concepts.

What is vector search?

Vector search is a feature for indexing, storing, and retrieving vector embeddings from a search index. The vector search retrieval technique uses these vector representations to find and rank relevant results. By measuring the distance or similarity between the query's vector embedding and the indexed document vectors, vector search can find results that are contextually related to the query even if they don't contain exactly the same keywords.
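To make the idea of "measuring similarity between vectors" concrete, here is a minimal sketch in plain Python. It ranks a few toy three-dimensional vectors against a query vector by cosine similarity. The vectors, document ids, and function names are invented for illustration; real embeddings have hundreds or thousands of dimensions, and a search service would use an approximate nearest-neighbor index rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, indexed_vectors, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in indexed_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values, not produced by a real model).
indexed_vectors = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-kittens": [0.8, 0.3, 0.1],
    "doc-invoices": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

top_matches = rank_by_similarity(query_vector, indexed_vectors, k=2)
for doc_id, score in top_matches:
    print(doc_id, round(score, 3))
```

The two documents pointing in roughly the same direction as the query rank first, even though no keywords are compared at all.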

You can use vector search to support similarity search, multimodal search, recommendation engines, or applications that implement the Retrieval Augmented Generation (RAG) architecture.

Announcing vector search in Azure Cognitive Search public preview

Vector search support in Azure Cognitive Search is in public preview and available through the 2023-07-01-Preview REST API, the Azure portal, and the latest beta packages of the Azure SDKs for .NET, Python, and JavaScript.

Flow of vector search concept

To use vector search in Azure Cognitive Search, there are a few steps to perform during data ingestion and query.

Data ingestion steps

Below is a summary of the steps to prepare and load data into the Cognitive Search index.

  1. Retrieve source documents from the data source. This can be done with Azure Cognitive Search's built-in indexers or by creating custom indexers with Azure Functions or Azure Logic Apps.
  2. Chunk your data before vectorizing it, taking into account the token input limits of the embedding model and other model limitations.
  3. Because Cognitive Search doesn't currently generate embeddings, your solution must include calls to an Azure OpenAI embedding model (or another embedding model) to create a vector representation of various content types (e.g. image, audio, text).
  4. Add a vector field to your index definition in Cognitive Search.
  5. Load the index with a document payload containing the vector embeddings of the chunks. Your index should be ready for queries at this point.

You can index vector data as fields in documents alongside text and other types of content.
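The ingestion steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the service's API: chunk_text, fake_embed, and build_index_documents are hypothetical helpers, the field names id, content, and content_vector are invented, and fake_embed only stands in for a real embedding model call (it returns deterministic but meaningless numbers). Chunking here counts words for simplicity, whereas real model limits are expressed in tokens.

```python
import hashlib


def chunk_text(text, max_words=50):
    """Split text into chunks of at most max_words words (a stand-in for token limits)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def fake_embed(text, dimensions=8):
    """Placeholder for an embedding model call; NOT a meaningful embedding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dimensions]]


def build_index_documents(source_id, text, embed=fake_embed, max_words=50):
    """Build one document per chunk, storing the text and its vector side by side."""
    documents = []
    for position, chunk in enumerate(chunk_text(text, max_words)):
        documents.append(
            {
                "id": f"{source_id}-{position}",   # hypothetical key field
                "content": chunk,                  # hypothetical text field
                "content_vector": embed(chunk),    # hypothetical vector field
            }
        )
    return documents


sample_text = " ".join(f"word{n}" for n in range(120))
payload = build_index_documents("doc1", sample_text, max_words=50)
print(len(payload), "documents ready to upload")
```

Each chunk becomes its own document, so the text and its vector travel together in a single upload payload.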

Query time steps

Just as your solution must include calls to an embedding model to create the embeddings before storing them in an index, you must also call the same embedding model to vectorize your search query before submitting it to Cognitive Search.

Vector queries can be issued independently or in combination with other types of queries, including keyword queries (combining vectors and keywords is called a hybrid search) and filters in the same query.

This is the order in which you should run vector-only or hybrid search queries.

  1. After the user submits the query in the client application, call the Azure OpenAI embedding model (or another embedding model), the same one used to create the vector embeddings originally stored in the index.
  2. Submit the vector or hybrid query to your Cognitive Search index.
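As a rough sketch of the second step, the helper below builds the JSON body for a vector-only or hybrid query. The property names (vector, value, fields, k, search, top) are believed to match the 2023-07-01-Preview request shape, but they should be treated as an assumption and checked against the REST API reference, since they may differ in other API versions. The function name build_search_request and the field name content_vector are hypothetical.

```python
import json


def build_search_request(query_text, query_vector, vector_field="content_vector", k=3, hybrid=False):
    """Build a search request body (assumed preview shape; verify against the API reference)."""
    body = {
        "vector": {
            "value": query_vector,   # embedding of the user's query
            "fields": vector_field,  # hypothetical vector field name in the index
            "k": k,                  # number of nearest neighbors to return
        }
    }
    if hybrid:
        # A hybrid query sends the raw text alongside the vector.
        body["search"] = query_text
        body["top"] = k
    return body


# The query vector must come from the same embedding model used at indexing time.
query_vector = [0.12, 0.56, 0.33]  # toy values for illustration
vector_only = build_search_request("what is vector search", query_vector)
hybrid = build_search_request("what is vector search", query_vector, hybrid=True)
print(json.dumps(hybrid, indent=2))
```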

Search modalities

The existing search modalities include traditional full-text search (keyword search) and, of course, the topics of this article: vector search and hybrid search. You may be wondering when to use each approach. Here are some guidelines.

Because vector search retrieves results that are contextually similar to the query even when the exact keywords are not present in the index, it is ideal for complex and granular queries and for situations where synonyms or related terms are used.

In contrast, full-text search is based on matching the search terms against terms from the indexed documents. This approach is simple, fast, and effective for straightforward queries where the desired results contain exactly the terms used in the search, e.g. product and serial numbers, identifiers, and similar terms. However, traditional keyword search may not be enough to understand context or identify semantically similar results.

In many cases, a hybrid search approach that combines the strengths of vector and keyword search can provide the best results. By leveraging the contextual understanding of vector search and the precision of keyword search, a hybrid system can provide highly relevant and accurate results for a variety of query types. In addition, Cognitive Search offers a re-ranker, semantic search, which provides more relevant results in different scenarios by applying language understanding to the initial search results.
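To show how two result lists can be merged into one, here is a minimal sketch of Reciprocal Rank Fusion (RRF), a widely used fusion method that Azure's documentation describes for hybrid queries. The document ids and rankings are invented, and the constant k=60 is the conventional default assumed here, not a value taken from this article.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc ids; each appearance adds 1 / (k + rank) to a doc's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for the same query from the two retrieval modes.
keyword_ranking = ["doc-a", "doc-b", "doc-c"]
vector_ranking = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists (here doc-b and doc-a) rise above documents found by only one of the two modes.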

For a comparison table of search modes, see "Announcing vector search in Azure Cognitive Search public preview".

What is LangChain?

LangChain is a framework for developing applications powered by language models. You can use it to connect a language model to other data sources, let it interact with its environment, and create sequences of calls to perform specific tasks.

With LangChain you can build applications like chatbots, question-answer systems, natural language generation systems and much more.

LangChain offers modular components and ready-to-use chains for working with language models, as well as integrations with other tools and platforms.

The framework offers several high-level abstractions such as document loaders, text splitters, and vector stores.

Introduction to Azure Cognitive Search on LangChain

Where does LangChain fit into the Cognitive Search vector search story? The Azure Cognitive Search LangChain integration, built in Python, provides the ability to chunk documents, seamlessly connect an embedding model for document vectorization, store the vectorized content in a predefined index, and perform similarity search (pure vector), hybrid search, and hybrid search with semantic ranking. It also provides the ability to create your own index and apply scoring profiles to improve search accuracy. With LangChain, you can combine native workflows (indexing and querying) with non-native workflows (like chunking and embedding) to create an end-to-end similarity search solution.

Below is the minimal set of code examples and commands needed to integrate Cognitive Search vector functionality with LangChain. The following examples are from the Azure Cognitive Search integration page in the LangChain documentation.

Install the Azure Cognitive Search SDK

    pip install azure-search-documents==11.4.0b6
    pip install azure-identity

Import the required libraries

    import openai
    import os
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores.azuresearch import AzureSearch

Configure OpenAI settings

Configure the OpenAI settings to use Azure OpenAI or OpenAI:

    os.environ["OPENAI_API_TYPE"] = "azure"
    os.environ["OPENAI_API_BASE"] = "YOUR_OPENAI_ENDPOINT"
    os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
    os.environ["OPENAI_API_VERSION"] = "2023-05-15"
    model: str = "text-embedding-ada-002"

Configure vector store settings

Configure the vector store settings using the Azure Cognitive Search endpoint and admin key. You can get them from the Azure portal:

    vector_store_address: str = "YOUR_AZURE_SEARCH_ENDPOINT"
    vector_store_password: str = "YOUR_AZURE_SEARCH_ADMIN_KEY"
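Rather than pasting the endpoint and admin key into source code, you may prefer to read them from environment variables. The following is a minimal sketch of that approach; the variable names AZURE_SEARCH_ENDPOINT and AZURE_SEARCH_ADMIN_KEY and the helper load_search_settings are assumptions made for this example, not names required by the SDK or by LangChain.

```python
import os


def load_search_settings(environ=os.environ):
    """Read the search endpoint and admin key from a mapping, failing loudly if either is missing."""
    names = ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_ADMIN_KEY")  # hypothetical variable names
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return environ["AZURE_SEARCH_ENDPOINT"], environ["AZURE_SEARCH_ADMIN_KEY"]


# Demonstration with an explicit mapping so the sketch runs without real credentials.
demo_environment = {
    "AZURE_SEARCH_ENDPOINT": "https://example.search.windows.net",
    "AZURE_SEARCH_ADMIN_KEY": "not-a-real-key",
}
vector_store_address, vector_store_password = load_search_settings(demo_environment)
print(vector_store_address)
```

In a real application you would call load_search_settings() with no arguments so that it reads from the process environment.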

Create embeddings and vector store instances

Create instances of the OpenAIEmbeddings and AzureSearch classes:

    embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)
    index_name: str = "langchain-vector-demo"
    vector_store: AzureSearch = AzureSearch(
        azure_search_endpoint=vector_store_address,
        azure_search_key=vector_store_password,
        index_name=index_name,
        embedding_function=embeddings.embed_query,
    )

Insert text and embeddings into the vector store

Load the documents, split them into chunks, and add the content to the vector store, which embeds each chunk as it is added:

    from langchain.document_loaders import TextLoader
    from langchain.text_splitter import CharacterTextSplitter

    loader = TextLoader("path_to_your_file", encoding="utf-8")
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(documents)
    vector_store.add_documents(documents=docs)

Perform a vector similarity search

Perform a pure vector similarity search using the similarity_search() method:

    # Perform a similarity search
    docs = vector_store.similarity_search(
        query="What did the President say about Ketanji Brown Jackson?",
        k=3,
        search_type="similarity",
    )
    print(docs[0].page_content)

Perform a hybrid search

Run a hybrid search using the search_type parameter or the hybrid_search() method:

    # Perform a hybrid search
    docs = vector_store.similarity_search(
        query="What did the President say about Ketanji Brown Jackson?",
        k=3,
        search_type="hybrid",
    )
    print(docs[0].page_content)

For the full code and more examples of the LangChain and Cognitive Search vector search integration, visit the official Azure Cognitive Search LangChain integration documentation.

Article information

Author: Pres. Lawanda Wiegand

Last Updated: 08/07/2023
