How to speed up GoodReads book titles search with Meilisearch and JavaScript

👉
This post was originally published in February 2021 by guest author Michiel Mulders. At the time, Meilisearch was on v0.18. It has been updated by Carolina Ferreira to work with Meilisearch v1.

This tutorial uses a sample dataset from GoodReads uploaded by Jealous Leopard on Kaggle.

The goal of this tutorial is to learn more about advanced Meilisearch concepts, such as:

  • How Meilisearch handles nested objects
  • How to use facets to calculate the distribution of documents
  • How to use distinct attributes
  • How to define searchable attributes using the settings object

So, what are the requirements to follow this tutorial?

Requirements

To follow this standalone tutorial, we expect you to have a basic understanding of Meilisearch. If you’re unsure, feel free to check out the previous tutorial about searching for Nobel prize winners—however, it isn’t required.

Other requirements include a recent version of Node.js with npm, and a running Meilisearch instance.

All set? Let's dive right into it!

Project setup and Meilisearch-js installation

To follow along, we need to set up our JavaScript project and install Meilisearch-js. Create a new folder and run the following command in your terminal.

npm init -y

This will prepare your project setup. Next, we can add the Meilisearch-js dependency.

npm install meilisearch

Lastly, let’s create a file called index.js inside your project. We’ll use this file to add our JavaScript code.

touch index.js

Done? Let's move on!

Step 1: Creating an index

This step will prepare the index.js file so we can experiment with the meilisearch package.

First, we need to connect to our Meilisearch instance. If you're using Meilisearch Cloud, you've received a master key that protects all API endpoints of your instance. If you've used another installation method, we highly recommend setting a master key for security reasons. For instance, an unsecured DigitalOcean droplet allows anyone to access your instance via its publicly available IP address.

Below, you’ll find a boilerplate code snippet that you can reuse in all of your Meilisearch projects. We wrap our code in an asynchronous main function in order to access the async/await syntax. We also use the client object to connect with our Meilisearch instance.

Add the code snippet below to your index.js file.

const { MeiliSearch } = require('meilisearch')

const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key' 
    })
    const indexes = await client.getIndexes()
    console.log(indexes)
}

main()

Note that we call the getIndexes() method on the newly created client object.

Now, execute the file from your terminal with the node command.

node index.js

If you've received a response from your client, the connection object works. If you haven’t, double check your host address and API key.

For the next step, let's create the books index to add our GoodReads data.

const { MeiliSearch } = require('meilisearch')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key' 
    })
 
    const indexes = await client.getIndexes()
    console.log(indexes)
 
    const indexCreationTask = await client.createIndex('books')
    await client.waitForTask(indexCreationTask.taskUid)
 
    const updatedIndexes = await client.getIndexes()
    console.log(updatedIndexes)
}
 
main()

Execute the file with the node command like before. You should see the following response containing your books index. Note that your values for createdAt and updatedAt will be different from ours.

{
  results: [
    Index {
      uid: 'books',
      primaryKey: null,
      httpRequest: [HttpRequests],
      tasks: [TaskClient]
    }
  ],
  offset: 0,
  limit: 20,
  total: 1
}

Our index is created, but we haven’t given it a primary key yet. When we add data in the next step, Meilisearch will infer our primary key because our data set contains an id field.
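
If you'd rather not rely on inference, you can pass the primary key when creating the index. Below is a minimal sketch of that alternative to the plain createIndex('books') call. The createBooksIndex helper is a hypothetical name used for illustration, and client is assumed to be a MeiliSearch client configured like the one above.

```javascript
// Sketch: create the books index with an explicit primary key.
// `client` is assumed to be a MeiliSearch client configured as shown above.
async function createBooksIndex(client) {
    // The second argument sets the primary key up front instead of letting
    // Meilisearch infer it from the first batch of documents.
    const task = await client.createIndex('books', { primaryKey: 'id' })

    // Index creation is asynchronous, so wait until the task has been processed.
    return client.waitForTask(task.taskUid)
}

// Usage, inside the async main function:
//   const task = await createBooksIndex(client)
//   console.log(task.status)
```

Note that this call only succeeds if the books index doesn't exist yet.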

Index created? Good! Let’s explore the GoodReads books data set.

Step 2: Adding the GoodReads books dataset

This step explores the GoodReads books dataset. For clarity, we’ve used a modified, smaller version of this dataset, but you can find the original dataset on Kaggle, if you’re interested. First, let's download our dataset using the cURL command.

curl -L https://raw.githubusercontent.com/meilisearch/datasets/main/datasets/books/books.json -o books.json

So, what does a book object look like?

{
    id: "1",
    title: "Harry Potter and the Half-Blood Prince",
    author: "J.K. Rowling/Mary GrandPré",
    cover: "hard cover with dust jacket",
    language: "eng",
    publisher: "Scholastic Inc.",
    details: {
        isbn: "0439785960",
        rating: "4.57",
        pages: "652"
    }
}

Each book contains a unique id. The cover property has three possible values: hard cover, hard cover with dust jacket, and soft cover. This property will be useful later on when we take a look at distinct attributes.

To display how Meilisearch handles nested objects, we've created a details property that contains the book's isbn code, rating, and number of pages.

Note that each book contains a nested details object. Meilisearch flattens nested objects during indexing, so each nested field becomes an attribute written in dot notation, such as details.isbn. Each value is then tokenized and indexed, which makes it searchable.
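
To make this more concrete, here is a small plain-JavaScript sketch of how a nested document can be turned into flat, dot-separated attribute names. The flatten helper is hypothetical and only illustrates the idea. It is not Meilisearch's internal implementation, and it ignores arrays of objects for simplicity.

```javascript
// Hypothetical helper: flattens nested objects into dot-separated keys.
function flatten(doc, prefix = '') {
    const flat = {}
    for (const [key, value] of Object.entries(doc)) {
        const path = prefix ? `${prefix}.${key}` : key
        if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
            // Recurse into nested objects and merge their flattened keys.
            Object.assign(flat, flatten(value, path))
        } else {
            flat[path] = value
        }
    }
    return flat
}

const book = {
    id: '1',
    title: 'Harry Potter and the Half-Blood Prince',
    details: { isbn: '0439785960', rating: '4.57', pages: '652' }
}

console.log(flatten(book))
// The result has the keys id, title, details.isbn, details.rating, and details.pages.
```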

Ok, let's add the data to our books index. For this, we'll be using the curl command again. Make sure you execute the command in the folder that contains the books.json file.

curl -i -X POST 'http://127.0.0.1:7700/indexes/books/documents' \
  --header 'content-type: application/json' \
  --header 'Authorization: Bearer your-master-key' \
  --data @books.json

After adding documents, you should receive a response like this:

{
    "taskUid": 1,
    "indexUid": "books",
    "status": "enqueued",
    "type": "documentAdditionOrUpdate",
    "enqueuedAt": "2023-04-19T14:10:22.962629Z"
}

Alternatively, you can use JavaScript code to upload documents to your index. Here's an example.

const { MeiliSearch } = require('meilisearch')
const books = require('./books.json')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key'
    })
 
    const index = client.index('books')
    // addDocuments returns a task summary; the documents are indexed asynchronously
    const task = await index.addDocuments(books)
    console.log(task)
}
 
main()
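
Because documents are added asynchronously, the response only confirms that the task was enqueued. Here is a minimal sketch of waiting for the task and checking its final status. The addBooksAndWait helper is a hypothetical name, and client is assumed to be configured as in the snippets above.

```javascript
// Sketch: add documents and wait until Meilisearch has processed them.
// `client` is a MeiliSearch client; `documents` is the parsed content of books.json.
async function addBooksAndWait(client, documents) {
    const enqueued = await client.index('books').addDocuments(documents)

    // waitForTask polls the task until it is no longer enqueued or processing.
    const task = await client.waitForTask(enqueued.taskUid)

    if (task.status !== 'succeeded') {
        throw new Error(`Indexing did not succeed: ${JSON.stringify(task.error)}`)
    }
    return task
}

// Usage, inside the async main function:
//   const books = require('./books.json')
//   await addBooksAndWait(client, books)
```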

Now, if you open a browser and navigate to the host address of your Meilisearch instance (by default: http://localhost:7700), you can use our web interface to start searching with the index you just created.

That's it for adding documents!

Step 3: Using facets to calculate the distribution of documents

To improve our search, we can use filters. Meilisearch indexes the data of each filterable attribute so it can retrieve matching documents faster. Filters also let us build faceted search interfaces, which allow users to browse data by category and narrow down their search.

To retrieve the facet distribution, we first have to declare an attribute as filterable. For this example, we want to figure out the language distribution of all books written by Douglas Adams, so let's add the language property to filterableAttributes. Our dataset contains five different languages, which makes it a good fit for faceting.

const { MeiliSearch } = require('meilisearch')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key'
    })
 
    const index = client.index('books')
    
    await index.updateSettings({
        filterableAttributes:
        [
            "language"
        ]
    })
}

main()
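
Once language is filterable, it can also be used to narrow a search. The sketch below passes the filter search parameter to the search method. The searchByLanguage helper is a hypothetical name, and index is assumed to be the object returned by client.index('books').

```javascript
// Sketch: restrict a search to a single language.
// `index` is assumed to be client.index('books'), with language set as filterable.
async function searchByLanguage(index, query, language) {
    // Quoting the value keeps the filter expression valid if it contains spaces.
    return index.search(query, { filter: `language = "${language}"` })
}

// Usage, inside the async main function:
//   const results = await searchByLanguage(index, 'Douglas Adams', 'eng')
//   console.log(results.hits.length)
```

The settings update is processed asynchronously, so a filtered search issued immediately after updateSettings may fail until the task has finished.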

With the language attribute filterable, we can ask how many of Douglas Adams's books are written in eng (English), esp (Spanish), or fre (French).

Try to solve this problem yourself first. Information and examples about the facet distribution can be found in the documentation (scroll down to "The facets distribution"). You can also find a list of Meilisearch-js functions in the repository README.

We expect the following outcome for Douglas Adams.

{ 
  hits:[ ... ],
  query: 'Douglas Adams',
  processingTimeMs: 0,
  limit: 20,
  offset: 0,
  estimatedTotalHits: 11,
  facetDistribution: { language: { eng: 10, esp: 1 } },
  facetStats: {}
}


Code solution for facets distribution

Solution: To obtain the facet distribution, we can pass the `facets` property with our search query.

const { MeiliSearch } = require('meilisearch')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key'
    })
 
    const index = client.index('books')
    
    const distribution = await index
        .search('Douglas Adams', {
            facets: ['language']
        })
    console.log(distribution)
}

main()
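
The raw counts in facetDistribution are often turned into something more readable for a UI. As a rough sketch, the hypothetical facetShares helper below converts the counts of one facet into percentage shares, using the distribution from the expected outcome above as sample input.

```javascript
// Hypothetical helper: converts a facet's counts into sorted percentage shares.
function facetShares(facetDistribution, facet) {
    const counts = facetDistribution[facet] || {}
    const total = Object.values(counts).reduce((sum, n) => sum + n, 0)

    return Object.entries(counts)
        .sort(([, a], [, b]) => b - a) // most frequent value first
        .map(([value, count]) => ({
            value,
            count,
            share: total === 0 ? 0 : Math.round((count / total) * 100)
        }))
}

// Sample input copied from the expected search response above.
const response = { facetDistribution: { language: { eng: 10, esp: 1 } } }

console.log(facetShares(response.facetDistribution, 'language'))
// [ { value: 'eng', count: 10, share: 91 }, { value: 'esp', count: 1, share: 9 } ]
```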

Cool, right? Let's move on!

Step 4: Using distinct attributes to avoid duplicates

Perform a search for The Lord of the Rings 2. Notice anything strange?

{
    "id": "35",
    "title": "The Lord of the Rings 2",
    "author": "J.R.R. Tolkien/Alan  Lee",
    "cover": "hard cover",
    "language": "eng",
    "publisher": "Houghton Mifflin Harcourt",
    "details": {
        "isbn": "0618260587",
        "rating": "4.50",
        "pages": "1216"
    },
    "isbn13": "9780439785989"
},
{
    "id": "38",
    "title": "The Lord of the Rings 2",
    "author": "J.R.R. Tolkien/Alan  Lee",
    "cover": "soft cover",
    "language": "eng",
    "publisher": "Houghton Mifflin Harcourt",
    "details": {
        "isbn": "0618260587",
        "rating": "4.50",
        "pages": "1216"
    },
    "isbn13": "9780439785989"
}

Currently, our dataset contains duplicate books that only differ in cover type. When users search for a particular book, we don't want to show them the same book twice just because it comes with a different cover. Luckily, the isbn13 property identifies each book regardless of its cover type; therefore, we can use it as a distinct attribute to prevent duplicate results.

A distinct attribute is a field whose value will always be unique in the returned documents. We want to set isbn13 as a distinct attribute, so that Meilisearch won't return results that share the same isbn13 value.

We encourage you to find the solution to this problem yourself, but if you get stuck, you can always take a look at the provided code solution below.

To verify if your solution works, try querying for The Lord of the Rings 2. The doubled results should be gone.

const { MeiliSearch } = require('meilisearch')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key'
    })
 
    const index = client.index('books')
    const search = await index.search('The Lord of the Rings 2')
    console.log(search)
}

main()


Code solution for distinct attribute

Here, the solution is to define isbn13 as a distinct attribute, since each book has only one ISBN even across different cover versions.

const { MeiliSearch } = require('meilisearch')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key'
    })
 
    const index = client.index('books')
    
    await index.updateDistinctAttribute('isbn13')
}

main()
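
To picture what the distinct attribute does to a list of results, here is a simplified plain-JavaScript sketch. The distinctBy helper is hypothetical: it keeps the first document for each value of the chosen attribute and assumes every document has that attribute. It only illustrates the effect and is not how Meilisearch implements it internally.

```javascript
// Hypothetical helper: keeps the first document for each value of `attribute`.
function distinctBy(documents, attribute) {
    const seen = new Set()
    return documents.filter((doc) => {
        if (seen.has(doc[attribute])) return false // same value already returned
        seen.add(doc[attribute])
        return true
    })
}

// The two duplicate hits from the search above, reduced to the relevant fields.
const hits = [
    { id: '35', title: 'The Lord of the Rings 2', cover: 'hard cover', isbn13: '9780439785989' },
    { id: '38', title: 'The Lord of the Rings 2', cover: 'soft cover', isbn13: '9780439785989' }
]

console.log(distinctBy(hits, 'isbn13'))
// Only the document with id '35' remains.
```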

All good? Let's move on!

Step 5: How to define searchable attributes?

What do you think happens when we query for 13? I'll give you a second.

Well, it returns all objects that contain the number 13. In other words, we receive results for isbn codes containing the number 13 but also objects with id = 13 or id = 131. For the user, it's not useful to search for object IDs.

Therefore, we can manually define some attributes as searchable attributes, and others as non-searchable. Try it for yourself using the documentation for searchable attributes. And don't forget the Meilisearch-js API reference!

You can verify your solution by querying for 159. Without defining searchable attributes, we receive 13 results, including one ID-based match. After making id non-searchable, we should receive only 12 results.

const { MeiliSearch } = require('meilisearch')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key'
    })
 
    const index = client.index('books')
    const search = await index.search('159')
    console.log(search.estimatedTotalHits) // Output: 12
}

main()


Code solution for searchable attributes

The solution looks like this. Note that this array of searchable attributes is sorted by order of importance.

const { MeiliSearch } = require('meilisearch')
 
const main = async () => {
    const client = new MeiliSearch({
        host: 'http://127.0.0.1:7700',
        apiKey: 'your-master-key'
    })
 
    const index = client.index('books')
    await index.updateSearchableAttributes([
        'author', 'title', 'details', 'publisher'
    ])
}

main()
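
Settings updates are also processed asynchronously, so it can help to wait for the task and read the setting back before running the verification query. Here is a minimal sketch under that assumption. The setSearchableAttributes helper is a hypothetical name, and client is assumed to be configured as in the snippets above.

```javascript
// Sketch: update the searchable attributes, wait for the task, then read them back.
// `client` is assumed to be a MeiliSearch client configured as shown above.
async function setSearchableAttributes(client, attributes) {
    const index = client.index('books')
    const task = await index.updateSearchableAttributes(attributes)

    // Without waiting, a search issued right away may still use the old settings.
    await client.waitForTask(task.taskUid)

    return index.getSearchableAttributes()
}

// Usage, inside the async main function:
//   const attributes = await setSearchableAttributes(client, [
//       'author', 'title', 'details', 'publisher'
//   ])
//   console.log(attributes)
```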


Awesome, problem solved!

Conclusion: Harry Potter or The Lord of the Rings?

That's the end of this tutorial. We've covered how Meilisearch handles nested objects, how to retrieve a facet distribution, and how to set distinct and searchable attributes.

Feel free to play around with the code to understand the examples fully. For every example, we've linked to the relevant documentation page where you can find more examples and information about the different API endpoints.

Have fun searching GoodReads book data! Did you enjoy using Meilisearch? Make sure to show us some love by giving Meilisearch a star on GitHub!

Photo by Susan Yin