In today's interconnected world, many applications need to handle content in multiple languages. Meilisearch offers powerful capabilities for multilingual search, combining intelligent language handling with specialized models for optimal relevance. This guide will walk you through setting up multilingual search with Meilisearch, covering both full-text and semantic search approaches.
Why multilingual search matters
Before we dive into the technical details, let's consider why multilingual search is crucial:
- Global reach: Expand your audience by catering to users in their native languages.
- Improved user experience: Provide relevant results regardless of the language used in queries or content.
- Increased engagement: Users are more likely to interact with content they can easily find and understand.
Now, let's explore how Meilisearch makes multilingual search a breeze.
Full-text search: intelligent language handling
At its core, Meilisearch excels at handling diverse languages out of the box. Our advanced tokenizer is designed to work with:
- Languages that use whitespace to separate words (like English, Spanish, and French)
- Languages that do not separate words with whitespace (such as Chinese, Japanese, and Thai)
Automatic language detection
When you add documents to your Meilisearch index, the engine automatically detects the language of each field. This means you don't need to specify languages manually in most cases - Meilisearch will handle it for you.
For example, if your documents contain a mix of English and Spanish content:
[
  {
    "id": 1,
    "title": "The quick brown fox",
    "description": "El zorro marrón rápido salta sobre el perro perezoso"
  },
  {
    "id": 2,
    "title": "La luna y las estrellas",
    "description": "A beautiful night sky filled with twinkling stars"
  }
]
Meilisearch will automatically detect and properly handle the different languages in each field, ensuring accurate search results across your multilingual content.
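As a minimal sketch of how these documents could be sent to an index, the snippet below builds a request for the documents route using only the Python standard library. The instance URL, API key, and the index name "articles" are placeholder assumptions, not values from this guide, and the request is built but not sent so the snippet runs without a live instance.

```python
import json
import urllib.request

# Assumed values: adjust to your own instance, key, and index name.
MEILI_URL = "http://localhost:7700"
API_KEY = "MASTER_KEY"   # placeholder, not a real key
INDEX_UID = "articles"   # hypothetical index name

documents = [
    {
        "id": 1,
        "title": "The quick brown fox",
        "description": "El zorro marrón rápido salta sobre el perro perezoso",
    },
    {
        "id": 2,
        "title": "La luna y las estrellas",
        "description": "A beautiful night sky filled with twinkling stars",
    },
]


def build_add_documents_request(docs):
    """Build (but do not send) a POST request to the index's documents route."""
    body = json.dumps(docs, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{MEILI_URL}/indexes/{INDEX_UID}/documents",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )


request = build_add_documents_request(documents)

# To send it against a running instance, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))  # task summary returned by the server
```

No language is declared anywhere in the payload: each field is left to automatic detection.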
Specifying languages for specific fields
While automatic detection works well in most cases, you might want more control over language handling for certain fields. Meilisearch allows you to specify languages for particular attributes using the localizedAttributes setting.
Here's how you can set it up:
{
  "localizedAttributes": [
    {
      "locales": ["eng"],
      "attributePatterns": ["title_en", "*_english"]
    },
    {
      "locales": ["spa"],
      "attributePatterns": ["description_es", "*_spanish"]
    }
  ]
}
This configuration tells Meilisearch to treat the title_en field and any field ending with _english as English, and the description_es field and any field ending with _spanish as Spanish. This approach is particularly useful when you have a consistent naming convention for your multilingual fields or when you want to ensure specific treatment for certain attributes.
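To make the pattern rules concrete, here is a rough sketch that checks which field names the configuration above would cover and prepares the JSON bodies for updating the index settings and for a language-specific search. The helper locales_for is hypothetical, and Python's fnmatchcase is only a local stand-in for Meilisearch's own pattern matching. The search body assumes a Meilisearch version that accepts the locales search parameter.

```python
import json
from fnmatch import fnmatchcase

localized_attributes = [
    {"locales": ["eng"], "attributePatterns": ["title_en", "*_english"]},
    {"locales": ["spa"], "attributePatterns": ["description_es", "*_spanish"]},
]


def locales_for(field_name):
    """Return the locales of the first rule with a pattern matching the field.

    Illustration only: fnmatchcase approximates the wildcard matching locally.
    """
    for rule in localized_attributes:
        if any(fnmatchcase(field_name, p) for p in rule["attributePatterns"]):
            return rule["locales"]
    return None  # no rule applies, so the language is detected automatically


# JSON body for PATCH /indexes/{index_uid}/settings
settings_payload = json.dumps({"localizedAttributes": localized_attributes})

# JSON body for POST /indexes/{index_uid}/search, declaring the query language
search_payload = json.dumps({"q": "zorro", "locales": ["spa"]})

for name in ["title_en", "summary_english", "description_es",
             "body_spanish", "subtitle_en"]:
    print(name, "->", locales_for(name))
```

Note that subtitle_en matches no pattern here, because "title_en" is an exact name rather than a wildcard. Covering every field with an _en suffix would need a pattern such as "*_en".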
Semantic search: Enhancing multilingual relevance
While full-text search works well for many multilingual scenarios, semantic search can provide even better results, especially for complex queries or when dealing with nuanced meanings across languages.
Meilisearch supports various embedding models for semantic search, offering powerful auto-embedding capabilities. This means you can easily set up semantic search without having to manage the embedding process yourself. Meilisearch handles the generation and storage of embeddings automatically, seamlessly integrating them into your search experience.
For multilingual applications, we recommend using specialized multilingual models rather than relying on general-purpose models. These models are designed to understand and represent meaning across multiple languages, leading to more relevant search results in multilingual contexts.
One excellent option for multilingual semantic search is the Cohere Multilingual embedding model. It offers a great balance of speed and relevance across a wide range of languages. Cohere's model is particularly adept at understanding context and nuance in different languages, making it an ideal choice for diverse, multilingual content. If you want to set it up now, follow the Cohere embedder guide in the Meilisearch documentation.
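For orientation, an embedder configuration for a Cohere multilingual model might look roughly like the sketch below, written as the Python dictionaries you would serialize and send to the index settings and search routes. Everything specific in it is an assumption to verify against the current Meilisearch and Cohere documentation: the embedder name "cohere-multilingual", the endpoint URL, the model identifier, the dimension count, the document template, and the exact shape of the request and response templates.

```python
import json

COHERE_API_KEY = "<your Cohere API key>"  # placeholder
EMBEDDER_NAME = "cohere-multilingual"     # hypothetical embedder name

# Assumed body for PATCH /indexes/{index_uid}/settings
embedder_settings = {
    "embedders": {
        EMBEDDER_NAME: {
            "source": "rest",
            "url": "https://api.cohere.com/v1/embed",    # assumed endpoint
            "apiKey": COHERE_API_KEY,
            "dimensions": 1024,                          # assumed vector size
            # Assumed template: which document fields get embedded
            "documentTemplate": "{{doc.title}} {{doc.description}}",
            "request": {
                "model": "embed-multilingual-v3.0",      # assumed model id
                "texts": ["{{text}}", "{{..}}"],
                "input_type": "search_document",
            },
            "response": {"embeddings": ["{{embedding}}", "{{..}}"]},
        }
    }
}


def hybrid_search_body(query, semantic_ratio=0.5):
    """Build a search body mixing full-text and semantic results.

    semantic_ratio runs from 0.0 (full-text only) to 1.0 (semantic only).
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    return {
        "q": query,
        "hybrid": {"embedder": EMBEDDER_NAME, "semanticRatio": semantic_ratio},
    }


print(json.dumps(embedder_settings, indent=2))
print(json.dumps(hybrid_search_body("cielo nocturno", 0.8), ensure_ascii=False))
```

Starting with a ratio near the middle and adjusting it against real queries is one reasonable way to find the balance between keyword matching and semantic matching for your content.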
To explore different models and their performance, you can use the Meilisearch playground. This interactive tool allows you to test various embedding models and see how they perform with your specific multilingual queries and content.
Conclusion
We encourage you to experiment with different settings and models to find the optimal configuration for your specific use case. And remember, the Meilisearch community is always here to help - join us on Discord or check out our documentation for more detailed information and support.
Ready to get started with multilingual search? Try Meilisearch today with a 14-day trial!