How to speed up GoodReads book titles search with Meilisearch and JavaScript
This tutorial uses a sample dataset from GoodReads uploaded by Jealous Leopard on Kaggle.
The goal of this tutorial is to learn more about advanced Meilisearch concepts, such as:
- How Meilisearch handles nested objects
- How to use facets to calculate the distribution of documents
- How to use distinct attributes
- How to define searchable attributes using the settings object
So, what are the requirements to follow this tutorial?
Requirements
To follow this standalone tutorial, we expect you to have a basic understanding of Meilisearch. If you’re unsure, feel free to check out the previous tutorial about searching for Nobel prize winners, although it isn’t required.
Other requirements include:
- A Node.js installation
- A Meilisearch instance: this can be running locally, via Docker, or on a DigitalOcean droplet. Don’t want to set up your own Meilisearch instance? Try out Meilisearch Cloud, our remote-hosted option
- cURL or Postman for sending HTTP requests
- The Meilisearch-js wrapper and dependencies (see the installation guide).
All set? Let's dive right into it!
Project setup and Meilisearch-js installation
To follow along, we need to set up our JavaScript project and install Meilisearch-js. Create a new folder and run the following command in your terminal.
npm init -y
This will prepare your project setup. Next, we can add the Meilisearch-js dependency.
npm install meilisearch
Lastly, let’s create a file called index.js inside your project. We’ll use this file to add our JavaScript code.
touch index.js
Done? Let's move on!
Step 1: Creating an index
This step will prepare the index.js file so we can experiment with the meilisearch package.
First, we need to connect with our Meilisearch instance. If you've used the Meilisearch cloud, you've received a master key that protects all API endpoints for your Meilisearch instance. If you've used another installation method, we highly recommend setting a master key for security reasons. For instance, an unsecured DigitalOcean droplet allows anyone to access your instance via the publicly available IP address.
Below, you’ll find a boilerplate code snippet that you can reuse in all of your Meilisearch projects. We wrap our code in an asynchronous main function in order to access the async/await syntax. We also use the client object to connect with our Meilisearch instance.
Add the code snippet below to your index.js file.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const indexes = await client.getIndexes()
  console.log(indexes)
}
main()
Note that we call the getIndexes() method on the newly created client object.
Now, execute the file from your terminal with the node command.
node index.js
If you've received a response from your client, the connection object works. If you haven’t, double check your host address and API key.
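Hardcoding the master key is fine for a local experiment, but you may prefer to keep it out of your source files. The sketch below shows one possible way to read the connection settings from environment variables. The helper name readConnectionConfig and the variable names MEILI_HOST and MEILI_API_KEY are our own assumptions for illustration; the meilisearch package does not define them.

```javascript
// Hypothetical helper: builds the options object passed to `new MeiliSearch(...)`.
// MEILI_HOST and MEILI_API_KEY are assumed names; pick whatever fits your setup.
function readConnectionConfig(env = process.env) {
  // Fall back to the default local address used throughout this tutorial
  const host = env.MEILI_HOST || 'http://127.0.0.1:7700'
  const apiKey = env.MEILI_API_KEY
  if (!apiKey) {
    throw new Error('MEILI_API_KEY is not set; refusing to connect without a key')
  }
  return { host, apiKey }
}

// Usage inside main(), assuming the meilisearch package is installed:
// const client = new MeiliSearch(readConnectionConfig())
```

Failing early when the key is missing makes a misconfigured environment easier to spot than an authorization error returned by the first request.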
For the next step, let's create the books index to add our GoodReads data.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const indexes = await client.getIndexes()
  console.log(indexes)
  const indexCreationTask = await client.createIndex('books')
  await client.waitForTask(indexCreationTask.taskUid)
  const updatedIndexes = await client.getIndexes()
  console.log(updatedIndexes)
}
main()
Execute the file with the node command like before. You should see the following response containing your books index. Any createdAt and updatedAt values in your output will be different from ours.
{
  results: [
    Index {
      uid: 'books',
      primaryKey: null,
      httpRequest: [HttpRequests],
      tasks: [TaskClient]
    }
  ],
  offset: 0,
  limit: 20,
  total: 1
}
Our index is created, but we haven’t given it a primary key yet. When we add data in the next step, Meilisearch will infer our primary key because our data set contains an id field.
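To make the inference step less mysterious, here is a simplified model of it. As we understand the documented behavior, Meilisearch looks at the first document for an attribute whose name ends with "id" (ignoring case). The function inferPrimaryKey below is our own illustration of that rule, not Meilisearch's actual code, and the sample document is shortened.

```javascript
// Simplified model of primary key inference (an illustration, not Meilisearch's code):
// pick the top-level attribute whose name ends with "id", ignoring case.
function inferPrimaryKey(document) {
  const candidates = Object.keys(document).filter((key) =>
    key.toLowerCase().endsWith('id')
  )
  // With zero or several candidates this sketch gives up and returns null;
  // in that situation you would set the primary key explicitly.
  return candidates.length === 1 ? candidates[0] : null
}

const sampleBook = { id: '1', title: 'Harry Potter and the Half-Blood Prince', language: 'eng' }
console.log(inferPrimaryKey(sampleBook)) // id
```

If you would rather not rely on inference, createIndex accepts the primary key as a second argument, for example client.createIndex('books', { primaryKey: 'id' }).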
Index created? Good! Let’s explore the GoodReads books data set.
Step 2: Adding the GoodReads books dataset
This step explores the GoodReads books dataset. For clarity, we’ve used a modified, smaller version of this dataset, but you can find the original dataset on Kaggle, if you’re interested. First, let's download our dataset using the cURL command.
curl -L https://raw.githubusercontent.com/meilisearch/datasets/main/datasets/books/books.json -o books.json
So, what does a book object look like?
{
  id: "1",
  title: "Harry Potter and the Half-Blood Prince",
  author: "J.K. Rowling/Mary GrandPré",
  cover: "hard cover with dust jacket",
  language: "eng",
  publisher: "Scholastic Inc.",
  details: {
    isbn: "0439785960",
    rating: "4.57",
    pages: "652"
  }
}
Each book contains a unique id. The cover property has three possible values: hard cover, hard cover with dust jacket, and soft cover. This property will be useful later on when we take a look at distinct attributes.
To show how Meilisearch handles nested objects, we've created a details property that contains the book's isbn code, rating, and number of pages.
Note that each document in the dataset contains a nested JSON object. During indexing, the values inside nested objects are broken into tokens (roughly, separate words) just like top-level values. As a result, every nested value, such as the ISBN inside details, is indexed and searchable.
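Nested fields can be referenced with dot notation, such as details.isbn. The sketch below illustrates that naming scheme by flattening a shortened sample book; the flatten function is our own illustration and not part of the meilisearch package.

```javascript
// Illustrative only: shows how nested attributes map to dot-notation paths.
function flatten(document, prefix = '') {
  const flat = {}
  for (const [key, value] of Object.entries(document)) {
    const path = prefix ? `${prefix}.${key}` : key
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // Recurse into nested objects and merge their flattened paths
      Object.assign(flat, flatten(value, path))
    } else {
      flat[path] = value
    }
  }
  return flat
}

const nestedBook = {
  id: '1',
  title: 'Harry Potter and the Half-Blood Prince',
  details: { isbn: '0439785960', rating: '4.57', pages: '652' }
}
console.log(Object.keys(flatten(nestedBook)).join(', '))
// id, title, details.isbn, details.rating, details.pages
```

These dot-notation paths are the names you would use if you wanted to target a single nested field in a setting instead of the whole details object.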
Ok, let's add the data to our books index. For this, we'll be using the curl command again. Make sure you execute the command in the folder that contains the books.json file.
curl -i -X POST 'http://127.0.0.1:7700/indexes/books/documents' \
  --header 'content-type: application/json' \
  --header 'Authorization: Bearer your-master-key' \
  --data @books.json
After adding documents, you should receive a response like this:
{
  "taskUid": 1,
  "indexUid": "books",
  "status": "enqueued",
  "type": "documentAdditionOrUpdate",
  "enqueuedAt": "2023-04-19T14:10:22.962629Z"
}
Alternatively, you can use JavaScript code to upload documents to your index. Here's an example.
const { MeiliSearch } = require('meilisearch')
const books = require('./books.json')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const index = client.index('books')
  // addDocuments resolves with the enqueued task, not the final indexing result
  const task = await index.addDocuments(books)
  console.log(task)
}
main()
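Because adding documents only enqueues a task, a response with status "enqueued" does not yet tell you whether indexing worked. One way to find out is to wait for the task, as we did when creating the index, and then inspect its status. The helper assertTaskSucceeded below is a hypothetical name of our own; it only assumes that a finished task object carries uid, status, and (on failure) an error with a message.

```javascript
// Hypothetical helper: inspects a finished task object and fails loudly on errors.
function assertTaskSucceeded(task) {
  if (task.status === 'succeeded') return task
  // Failed tasks are expected to carry an error object with a message
  const reason = task.error ? task.error.message : `status is "${task.status}"`
  throw new Error(`Task ${task.uid} did not succeed: ${reason}`)
}

// Usage inside main(), assuming `client` and `books` are defined as in the snippet above:
// const enqueued = await client.index('books').addDocuments(books)
// const finished = await client.waitForTask(enqueued.taskUid)
// assertTaskSucceeded(finished)
```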
Now, if you open a browser and navigate to the host address of your Meilisearch instance (by default: http://localhost:7700), you can use our web interface to start searching with the index you just created.
That's it for adding documents!
Step 3: Using facets to calculate the distribution of documents
To improve our search, we can use filters. When you mark a property as filterable, Meilisearch indexes its values so it can quickly narrow results down to the matching documents. Filters also let you build faceted search interfaces, where users browse data by category and narrow their search selection, which helps them find results faster.
To retrieve the facet distribution, we first have to make an attribute filterable. For this example, we want to figure out the language distribution of all books written by Douglas Adams. Therefore, let's add the language property to the filterable attributes. Our dataset contains five different languages. That's perfect for faceting.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const index = client.index('books')
  await index.updateSettings({
    filterableAttributes: [
      "language"
    ]
  })
}
main()
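Once language is filterable, it can also be used in the filter search parameter to restrict results to certain languages. The sketch below builds such a filter expression; buildLanguageFilter is a hypothetical helper of our own, and it assumes the language codes contain no quote characters.

```javascript
// Hypothetical helper: builds a filter expression for the `filter` search parameter.
// Assumes the codes are plain strings without quote characters.
function buildLanguageFilter(languages) {
  return languages
    .map((code) => `language = "${code}"`)
    .join(' OR ')
}

console.log(buildLanguageFilter(['eng', 'fre']))
// language = "eng" OR language = "fre"

// Usage inside main(), once the settings update above has been processed:
// const results = await index.search('Douglas Adams', {
//   filter: buildLanguageFilter(['eng', 'fre'])
// })
```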
Recall that we want to figure out the language distribution of all books written by author Douglas Adams. In other words, we want to know how many books are written in eng (English), esp (Spanish), or fre (French).
Try to solve this problem yourself first. The documentation has information and examples on the facet distribution (scroll down to "The facets distribution"), and the repository README contains a list of Meilisearch-js functions.
We expect the following outcome for Douglas Adams.
{
  hits: [ ... ],
  query: 'Douglas Adams',
  processingTimeMs: 0,
  limit: 20,
  offset: 0,
  estimatedTotalHits: 11,
  facetDistribution: { language: { eng: 10, esp: 1 } },
  facetStats: {}
}
Code solution for facets distribution
Solution: To obtain the facet distribution, we can pass the `facets` property with our search query.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const index = client.index('books')
  const distribution = await index
    .search('Douglas Adams', {
      facets: ['language']
    })
  console.log(distribution)
}
main()
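In a faceted interface you would typically list the facet values next to their counts. As a small sketch of that step, the function below turns the facetDistribution object from a search response into a list sorted by count. The name rankFacetValues is our own, and the sample object mirrors the expected outcome shown earlier.

```javascript
// Turns { esp: 1, eng: 10 } into [['eng', 10], ['esp', 1]], highest count first.
function rankFacetValues(facetDistribution, attribute) {
  // Missing attributes yield an empty list instead of an error
  const counts = facetDistribution[attribute] || {}
  return Object.entries(counts).sort(([, a], [, b]) => b - a)
}

const distributionSample = { language: { esp: 1, eng: 10 } }
for (const [language, count] of rankFacetValues(distributionSample, 'language')) {
  console.log(`${language}: ${count}`)
}
// eng: 10
// esp: 1
```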
Cool, right? Let's move on!
Step 4: Using distinct attributes to avoid duplicates
Perform a search for The Lord of the Rings 2. Notice anything strange?
{
  "id": "35",
  "title": "The Lord of the Rings 2",
  "author": "J.R.R. Tolkien/Alan Lee",
  "cover": "hard cover",
  "language": "eng",
  "publisher": "Houghton Mifflin Harcourt",
  "details": {
    "isbn": "0618260587",
    "rating": "4.50",
    "pages": "1216"
  },
  "isbn13": "9780439785989"
},
{
  "id": "38",
  "title": "The Lord of the Rings 2",
  "author": "J.R.R. Tolkien/Alan Lee",
  "cover": "soft cover",
  "language": "eng",
  "publisher": "Houghton Mifflin Harcourt",
  "details": {
    "isbn": "0618260587",
    "rating": "4.50",
    "pages": "1216"
  },
  "isbn13": "9780439785989"
}
Currently, our dataset contains duplicate books that differ only in their cover type. When users search for a particular book, we don't want to show them the same book twice just because it has a different cover type. Luckily, in our dataset the duplicates of a book share the same isbn13 value; therefore, we can use it as a distinct attribute to prevent duplicate results.
A distinct attribute is a field whose value will always be unique in the returned documents. We want to set isbn13 as a distinct attribute, so that Meilisearch won't return more than one result with the same isbn13 value.
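To build some intuition for what a distinct attribute does to the results, here is a toy model in plain JavaScript. It keeps only the first hit for each value of the chosen field. The function keepDistinct is our own illustration and assumes every hit has the field; Meilisearch applies its own logic on the server, so treat this as a mental model only.

```javascript
// Toy model of a distinct attribute: keep the first hit for each distinct value.
// Assumes every hit contains the given attribute.
function keepDistinct(hits, attribute) {
  const seen = new Set()
  return hits.filter((hit) => {
    if (seen.has(hit[attribute])) return false // value already returned once
    seen.add(hit[attribute])
    return true
  })
}

// Shortened versions of the two duplicate documents shown above
const duplicateHits = [
  { id: '35', cover: 'hard cover', isbn13: '9780439785989' },
  { id: '38', cover: 'soft cover', isbn13: '9780439785989' }
]
console.log(keepDistinct(duplicateHits, 'isbn13').map((hit) => hit.id))
// [ '35' ]
```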
We encourage you to find the solution to this problem yourself, but if you get stuck, you can always take a look at the provided code solution below.
To verify that your solution works, try querying for The Lord of the Rings 2. The duplicate results should be gone.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const index = client.index('books')
  const search = await index.search('The Lord of the Rings 2')
  console.log(search)
}
main()
Code solution for distinct attribute
Here, the solution is to define isbn13 as a distinct attribute, since in our dataset a book keeps the same isbn13 value across its different cover versions.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const index = client.index('books')
  await index.updateDistinctAttribute('isbn13')
}
main()
All good? Let's move on!
Step 5: How to define searchable attributes?
What do you think happens when we query for 13? I'll give you a second.
Well, it returns all documents that contain the number 13. In other words, we receive results for isbn codes containing the number 13, but also documents with id = 13 or id = 131. For the user, it's not useful to search for document IDs.
Therefore, we can manually define some attributes as searchable attributes, and others as non-searchable. Try it for yourself using the documentation for searchable attributes. And don't forget the Meilisearch-js API reference!
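Before you try it, the toy model below may help show why restricting the searchable attributes removes the ID-based matches. It treats a document as a match when any word in a searchable field starts with the query. This is a deliberate simplification written for this tutorial, not how Meilisearch is implemented, and the two sample documents are made up.

```javascript
// Toy model: a document matches when any word in a searchable field starts with the query.
function matches(document, query, searchableAttributes) {
  const needle = query.toLowerCase()
  return searchableAttributes.some((attribute) =>
    String(document[attribute] ?? '')
      .toLowerCase()
      .split(/\W+/) // split the field value into words
      .some((word) => word.startsWith(needle))
  )
}

// Made-up sample documents, not taken from the books dataset
const sampleDocs = [
  { id: '131', title: 'Dune' },
  { id: '7', title: 'Apollo 13' }
]
const withId = sampleDocs.filter((doc) => matches(doc, '13', ['id', 'title']))
const withoutId = sampleDocs.filter((doc) => matches(doc, '13', ['title']))
console.log(withId.length, withoutId.length) // 2 1
```

With id searchable, the first document matches only because of its identifier; once id is left out, only the genuine title match remains.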
You can verify your solution by querying for 159. Without defining searchable attributes, we receive 13 results, which include one ID-based match. After making id non-searchable, we should receive only 12 results.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const index = client.index('books')
  const search = await index.search('159')
  console.log(search.estimatedTotalHits) // Output: 12
}
main()
Code solution for searchable attributes
The solution looks like this. Note that this array of searchable attributes is sorted by order of importance.
const { MeiliSearch } = require('meilisearch')
const main = async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'your-master-key'
  })
  const index = client.index('books')
  await index.updateSearchableAttributes([
    'author', 'title', 'details', 'publisher'
  ])
}
main()
Awesome, problem solved!
Conclusion: Harry Potter or The Lord of the Rings?
That's the end of this tutorial. We've covered how to retrieve a facet distribution, set searchable attributes, and set distinct attributes, as well as how Meilisearch handles nested objects.
Feel free to play around with the code to understand the examples fully. For every example, we've linked to the relevant documentation page where you can find more examples and information about the different API endpoints.
Have fun searching GoodReads book data! Did you enjoy using Meilisearch? Make sure to show us some love by giving Meilisearch a star on GitHub!
Photo by Susan Yin