A step-by-step tutorial on building semantic search with LangChain

⚠️
This tutorial is designed for Meilisearch versions earlier than v1.6. If you're using Meilisearch v1.6 or later, please refer to the updated guide available in our documentation.

Vector search allows you to find documents that share similar characteristics. In this tutorial, we’ll use OpenAI’s text embeddings to measure the similarity between document properties. Then, we’ll use the LangChain framework to seamlessly integrate Meilisearch and build semantic search.
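To make "similar" concrete: an embedding turns a piece of text into a list of numbers, and texts with related meanings tend to get vectors that point in similar directions. A common way to compare two vectors is cosine similarity. The sketch below is illustrative only. The 3-dimensional vectors are made-up values (real OpenAI embeddings have far more dimensions), and it doesn't describe how Meilisearch ranks results internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be nonzero")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration only
superhero_query = [0.9, 0.1, 0.3]
batman_movie = [0.8, 0.2, 0.4]
romcom_movie = [0.1, 0.9, 0.2]

print(f"query vs. Batman: {cosine_similarity(superhero_query, batman_movie):.3f}")
print(f"query vs. rom-com: {cosine_similarity(superhero_query, romcom_movie):.3f}")
```

With these numbers, the superhero query scores much closer to the Batman vector than to the rom-com one, which is the kind of signal semantic search relies on.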

💡
This guide uses Meilisearch’s Python SDK. But you can use Meilisearch with almost any language.

Requirements

This tutorial assumes a basic understanding of Python and LangChain. Fear not, for the code samples will be heavily documented. You will not feel lost even if you’re not a LangChain expert yet!

This tutorial requires:

  • Python (LangChain requires >= 3.8.1 and < 4.0) and the pip CLI
  • Meilisearch 1.3, 1.4, or 1.5 (v1.6 and later are not supported ⚠️)
  • An OpenAI API key
💡
We're working on making LangChain compatible with v1.6 and above.
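If you're unsure whether your setup meets these requirements, a small check script can help. This is a sketch: the Python bounds and the 1.3 to 1.5 range come from the list above, and we assume the Meilisearch Python SDK's `client.get_version()` returns a dict whose `"pkgVersion"` key holds the server version.

```python
import sys
from typing import Sequence

# Ranges taken from the requirements list above
MIN_PYTHON = (3, 8, 1)
MAX_PYTHON_EXCLUSIVE = (4, 0, 0)
SUPPORTED_MEILISEARCH = {(1, 3), (1, 4), (1, 5)}


def python_is_supported(version_info: Sequence[int] = tuple(sys.version_info[:3])) -> bool:
    """True if the Python version is >= 3.8.1 and < 4.0."""
    return MIN_PYTHON <= tuple(version_info[:3]) < MAX_PYTHON_EXCLUSIVE


def meilisearch_is_supported(pkg_version: str) -> bool:
    """True for server versions such as "1.5.0"; False for 1.6 and later."""
    major, minor = (int(part) for part in pkg_version.split(".")[:2])
    return (major, minor) in SUPPORTED_MEILISEARCH


print("Python supported:", python_is_supported())
```

Under that assumption, `meilisearch_is_supported(client.get_version()["pkgVersion"])` should be True for a 1.5.x server and False for 1.6 or later.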

Setting up a Meilisearch instance

First, make sure you have a Meilisearch instance running. You can run Meilisearch locally by following the local installation docs, or create a project in Meilisearch Cloud.

Either way, you’ll need to enable the vector store feature. For self-hosted Meilisearch, read the docs on enabling experimental features. On Meilisearch Cloud, enable Vector Store via your project’s Settings page.
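For a self-hosted instance, the experimental features docs describe enabling the vector store with a `PATCH` request to the `/experimental-features` route. The sketch below builds that request with the standard library without sending it. The helper name is hypothetical, and the host and key are placeholders for the values you'll put in `.env` later in this guide.

```python
import json
import urllib.request


def build_enable_vector_store_request(host: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a request that turns on the experimental vectorStore feature."""
    url = host.rstrip("/") + "/experimental-features/"
    body = json.dumps({"vectorStore": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Meilisearch expects the API key as a Bearer token
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_enable_vector_store_request("http://localhost:7700", "your Meilisearch API key")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Meilisearch Cloud users can skip this and enable Vector Store from the project's Settings page instead.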

After creating your Meilisearch instance, make sure to grab your Meilisearch host and API key. In this tutorial, we’ll use a single key with write permissions: the Master API Key.

⚠️
In production, we recommend using different API keys holding only the minimal permissions to operate.
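As an example of a narrower key, Meilisearch lets you create keys restricted to specific actions and indexes. The sketch below only builds the options. We assume the Python SDK's `client.create_key(options)` accepts them, and `langchain-demo` is what we believe to be LangChain's default index name for this vector store, so adjust it if you pass a custom `index_name`.

```python
from typing import Any, Dict, Optional


def search_only_key_options(index_uid: str, expires_at: Optional[str] = None) -> Dict[str, Any]:
    """Options for an API key that can only search a single index."""
    return {
        "description": f"Search-only key for {index_uid}",
        "actions": ["search"],
        "indexes": [index_uid],
        # None means the key never expires; an ISO 8601 date string sets an expiry
        "expiresAt": expires_at,
    }


print(search_only_key_options("langchain-demo"))
```

With a client authenticated by the master key, you would then call something like `client.create_key(search_only_key_options("langchain-demo"))`.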

Creating the project

Let’s create a folder for our project. Our code will live in two files inside it: setup.py, which imports our documents, and search.py, which queries them.

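Before writing the code, let’s install the necessary dependencies (LangChain’s JSONLoader relies on the jq package, and its OpenAI embeddings use tiktoken):

pip install langchain openai meilisearch python-dotenv jq tiktoken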
💡
This guide uses python-dotenv to load environment variables from a `.env` file. Feel free to load them in any other way that’s convenient for you.

If you’re using dotenv, first create a .env file to store your credentials:

# .env

MEILI_HTTP_ADDR="your Meilisearch host"
MEILI_MASTER_KEY="your Meilisearch API key"
OPENAI_API_KEY="your OpenAI API key"
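If you'd rather skip python-dotenv, a few lines of standard-library code can read a simple file like this one. This is a hypothetical minimal loader, not a replacement for python-dotenv: it only understands `KEY="value"` lines and comments, and, like `load_dotenv()` with its default settings, it won't overwrite variables that are already set.

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments and stripping surrounding quotes."""
    values = {}
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values from the file into os.environ without overriding existing variables."""
    with open(path, encoding="utf-8") as env_file:
        for key, value in parse_env_lines(env_file).items():
            os.environ.setdefault(key, value)
```

You would call `load_env_file()` at the top of `setup.py` in place of `load_dotenv()`.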

Now that we have our environment variables available, let’s create a setup.py file with some boilerplate code for our example:

# setup.py

import os
from dotenv import load_dotenv # remove if not using dotenv
from langchain.vectorstores import Meilisearch
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import JSONLoader

load_dotenv() # remove if not using dotenv

# exit if missing env vars
if "MEILI_HTTP_ADDR" not in os.environ:
    raise Exception("Missing MEILI_HTTP_ADDR env var")
if "MEILI_MASTER_KEY" not in os.environ:
    raise Exception("Missing MEILI_MASTER_KEY env var")
if "OPENAI_API_KEY" not in os.environ:
    raise Exception("Missing OPENAI_API_KEY env var")

# Setup code will go here 👇
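The three `if` checks above stop at the first missing variable. If you'd like to see every missing variable at once, a small helper can collect them first. This is an optional sketch with hypothetical names; unlike the checks above, it also treats empty values as missing.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_ENV_VARS = ("MEILI_HTTP_ADDR", "MEILI_MASTER_KEY", "OPENAI_API_KEY")


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in env, in the order given."""
    return [name for name in names if not env.get(name)]


def require_env_vars(names: Iterable[str] = REQUIRED_ENV_VARS) -> None:
    """Raise a single error listing every missing variable."""
    missing = missing_env_vars(names)
    if missing:
        raise RuntimeError("Missing env vars: " + ", ".join(missing))
```

Calling `require_env_vars()` right after `load_dotenv()` would replace the three checks.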

Importing documents and embeddings

Now that our project is ready, let’s import some documents into Meilisearch. First, download this small movies dataset:

🔗 movies-lite.json

Then, let’s update our setup.py file to load the JSON and store it in Meilisearch. We will also use OpenAI’s text embedding models to generate our vector embeddings.

# setup.py

# previous code

# Load documents
loader = JSONLoader(
    file_path="./movies-lite.json",
    jq_schema=".[] | {id: .id, overview: .overview, title: .title}",
    text_content=False,
)
documents = loader.load()
print("Loaded {} documents".format(len(documents)))

# Store documents in Meilisearch
embeddings = OpenAIEmbeddings()
vector_store = Meilisearch.from_documents(documents=documents, embedding=embeddings)

print("Started importing documents")
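In case the loader options are unclear: `jq_schema` picks out the `id`, `overview`, and `title` of each movie, and `text_content=False` lets each selected object be stored as a JSON string in the document's `page_content` instead of requiring plain text. Here's a rough plain-Python approximation of that step, for illustration only. It is not LangChain's implementation, and the sample movie is made up.

```python
import json
from typing import Any, Dict, List


def project_movies(movies: List[Dict[str, Any]]) -> List[str]:
    """Approximate jq '.[] | {id: .id, overview: .overview, title: .title}'.

    Each movie becomes one JSON string, similar to a Document's page_content.
    Missing keys become null, as they would with jq.
    """
    return [
        json.dumps({"id": m.get("id"), "overview": m.get("overview"), "title": m.get("title")})
        for m in movies
    ]


# Hypothetical sample with an extra field that the projection drops
sample = [{"id": 268, "title": "Batman", "overview": "Batman faces the Joker.", "genres": ["Action"]}]
print(project_movies(sample)[0])
```

This is why the search results later in this guide print as JSON objects.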

Et voilà! Run python setup.py, and your Meilisearch instance will contain your documents. Meilisearch runs tasks like document import asynchronously, so you might need to wait a bit before the documents are available.
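If you'd rather not guess how long to wait, you can poll until the documents show up. The helper below is a generic, hypothetical sketch; the check you pass in is up to you.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call check() every `interval` seconds until it returns True.

    Returns True as soon as check() succeeds, or False once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, assuming a `meilisearch.Client` named `client`, LangChain's default `langchain-demo` index, and that the SDK's `Index.get_stats()` exposes `number_of_documents`, you could write `wait_until(lambda: client.index("langchain-demo").get_stats().number_of_documents > 0)`. That only tells you some documents are indexed, not that the whole import has finished.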

🤔
Got stuck? Don’t hesitate to ask for help on our Discord community.

Our database should now contain our movies. Let’s create a new search.py file to run a semantic search query, that is, to find documents using similarity search.

# search.py

import os
from dotenv import load_dotenv
from langchain.vectorstores import Meilisearch
from langchain.embeddings.openai import OpenAIEmbeddings
import meilisearch

load_dotenv()

# You can use the same code as `setup.py` to check for missing env vars

# Create the vector store
client = meilisearch.Client(
    url=os.environ.get("MEILI_HTTP_ADDR"),
    api_key=os.environ.get("MEILI_MASTER_KEY"),
)
embeddings = OpenAIEmbeddings()
vector_store = Meilisearch(client=client, embedding=embeddings)

# Make similarity search
query = "superhero fighting evil in a city at night"
results = vector_store.similarity_search(
    query=query,
    k=3,
)

# Display results
for result in results:
    print(result.page_content)

Let’s try to run our file! If everything works, we should see an output like this:

{"id": 155, "title": "The Dark Knight", "overview": "Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets. The partnership proves to be effective, but they soon find themselves prey to a reign of chaos unleashed by a rising criminal mastermind known to the terrified citizens of Gotham as the Joker."}
{"id": 314, "title": "Catwoman", "overview": "Liquidated after discovering a corporate conspiracy, mild-mannered graphic artist Patience Phillips washes up on an island, where she's resurrected and endowed with the prowess of a cat -- and she's eager to use her new skills ... as a vigilante. Before you can say \"cat and mouse,\" handsome gumshoe Tom Lone is on her tail."}
{"id": 268, "title": "Batman", "overview": "Batman must face his most ruthless nemesis when a deformed madman calling himself \"The Joker\" seizes control of Gotham's criminal underworld."}
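Because each `page_content` is the JSON string produced by the loader, you can decode it to work with individual fields. Here's a small sketch, with a hypothetical helper name, that keeps only the titles:

```python
import json
from typing import Iterable, List


def titles(page_contents: Iterable[str]) -> List[str]:
    """Decode JSON page_content strings and return their title fields."""
    return [json.loads(content)["title"] for content in page_contents]


# Shortened copies of the page_content strings printed above
sample_contents = [
    '{"id": 155, "title": "The Dark Knight", "overview": "Batman raises the stakes in his war on crime."}',
    '{"id": 268, "title": "Batman", "overview": "Batman must face his most ruthless nemesis."}',
]
print(titles(sample_contents))
```

In `search.py`, `titles(result.page_content for result in results)` would return the titles of the three results.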

Congrats 🎉 We’ve run a similarity search using Meilisearch as a LangChain vector store.

Going further

Using Meilisearch as a LangChain vector store allows you to load documents and search for them in different ways.

For additional information, make sure to consult the LangChain and Meilisearch documentation.

Finally, should you want to use Meilisearch vector search capabilities without LangChain, choose your favorite language and check out the relevant SDK documentation.
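To give an idea of what that involves, here's a sketch under assumptions. For the experimental vector store in Meilisearch 1.3 to 1.5, our understanding is that each document carries its embedding in a reserved `_vectors` field and that search requests accept a `vector` parameter; please confirm the exact format in the documentation for your version. The helper names, the `embed` function, and the `movies` index in the usage note are hypothetical.

```python
from typing import Any, Dict, Sequence


def with_vector(document: Dict[str, Any], vector: Sequence[float]) -> Dict[str, Any]:
    """Return a copy of document with its embedding stored in the _vectors field."""
    return {**document, "_vectors": list(vector)}


def vector_search_params(vector: Sequence[float], limit: int = 3) -> Dict[str, Any]:
    """Search parameters asking for the `limit` documents nearest to vector."""
    return {"vector": list(vector), "limit": limit}


print(with_vector({"id": 268, "title": "Batman"}, [0.1, 0.2, 0.3]))
print(vector_search_params([0.1, 0.2, 0.3]))
```

With the Python SDK, usage might look like `client.index("movies").add_documents([with_vector(movie, embed(movie["overview"]))])` and later `client.index("movies").search("", vector_search_params(embed(query)))`, where `embed(text)` is any function returning an embedding, such as one calling OpenAI's embeddings API.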


Stay in the loop by subscribing to our newsletter. To learn more about Meilisearch's future and help shape it, take a look at our roadmap and come participate in our Product Discussions.

For anything else, join our developer community on Discord. I’ll see you there!