Elasticsearch Basics

It's no secret that in modern web applications the ability to find the required content fast is one of the most important features. If your application isn't responsive enough, clients will use something else. That's why every process should be optimized as much as possible; nevertheless, it can be a very challenging task…

… especially with search. What makes the situation even worse is that the set of search criteria can sometimes be simply large. Let's take a look:

Now imagine that you need to search through such entries with pattern matching (in PostgreSQL this can be done with the LIKE operator) while you have thousands of such entities. The query will be very complex and heavy in that case, and you might still need to join some data afterwards:
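To get a feeling for how quickly such a query grows, here is a minimal sketch of a helper that builds a pattern-match query over several columns. The table name, the column names and the `buildLikeQuery` helper are assumptions made for illustration; the `{ text, values }` shape is the one a parameterized query usually takes in Node.js PostgreSQL drivers.

```javascript
// Escape LIKE wildcards so the user's input is matched literally.
// Backslash is PostgreSQL's default escape character for LIKE/ILIKE.
function escapeLike(term) {
  return term.replace(/[\\%_]/g, (ch) => '\\' + ch);
}

// Build one parameterized query that checks every column for the substring.
// `table` and `columns` are hypothetical and must be trusted constants,
// because identifiers cannot be passed as query parameters.
function buildLikeQuery(table, columns, term) {
  const conditions = columns.map((column) => `${column} ILIKE $1`);
  return {
    text: `SELECT * FROM ${table} WHERE ${conditions.join(' OR ')}`,
    values: [`%${escapeLike(term)}%`],
  };
}

const likeQuery = buildLikeQuery('entries', ['title', 'description', 'author'], 'elastic');
console.log(likeQuery.text);
console.log(likeQuery.values);
```

Every extra searchable column adds one more condition, and a pattern that starts with `%` generally cannot use an ordinary B-tree index, so the database ends up scanning the rows.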

Solving such tasks is not a new kind of problem. Years ago, search indexing was invented. The main idea is to store only the searchable parts of a document and to optimize the process of going through those parts. Say we want to search through large entries and we know exactly which parts we are going to search in: we can add just those parts to the index and let the search engine do the substring search faster. (There are many other optimizations in a search engine; you can read about the specifics here.)
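The idea can be illustrated with a toy in-memory index. This is only a sketch of the principle, not how Elasticsearch works internally; the `buildIndex` and `searchIndex` helpers and the sample entries are made up for the example.

```javascript
// Toy index: only the chosen fields are tokenized and stored.
function buildIndex(entries, fields) {
  const index = new Map(); // token -> Set of entry ids
  for (const entry of entries) {
    for (const field of fields) {
      const tokens = String(entry[field] ?? '')
        .toLowerCase()
        .split(/\W+/)
        .filter(Boolean);
      for (const token of tokens) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(entry.id);
      }
    }
  }
  return index;
}

// Substring search scans the stored tokens instead of every full document.
function searchIndex(index, term) {
  const needle = term.toLowerCase();
  const ids = new Set();
  for (const [token, entryIds] of index) {
    if (token.includes(needle)) entryIds.forEach((id) => ids.add(id));
  }
  return [...ids].sort((a, b) => a - b);
}

const sampleEntries = [
  { id: 1, title: 'Elastic search basics', notes: 'not indexed' },
  { id: 2, title: 'PostgreSQL joins', notes: 'elastic appears only here' },
];
const titleIndex = buildIndex(sampleEntries, ['title']);
console.log(searchIndex(titleIndex, 'last')); // [ 1 ]
```

The second entry is not found because its `notes` field was never added to the index, which is exactly the trade-off: less data to scan, but only the indexed parts are searchable.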

One of the most popular search engines is Elasticsearch, and we're going to use it with PostgreSQL and Node.js to build a searchable grid with the entry schema above. But let's discuss the communication between the components first:

Workflow of an Elasticsearch integrated environment (Node + PostgreSQL)

The process goes as follows:

  1. The client requests data from the server API;
  2. Instead of taking the data immediately from the DB, the server requests ids from the ES index;
  3. The Elasticsearch engine goes through the indexed entries (and indexed parameters) and selects only the ones that meet the search criteria;
  4. PostgreSQL selects the entries based on the array of identifiers returned by ES.
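The steps above can be sketched as a single server-side function. The `searchIds` and `findByIds` adapters are hypothetical and injected by the caller; here they are replaced with in-memory stand-ins so the sketch runs without an Elasticsearch or PostgreSQL server.

```javascript
// Steps 2-4: ask the search index for ids, then load those rows from the DB.
async function searchEntries(query, { searchIds, findByIds }) {
  const ids = await searchIds(query);
  if (ids.length === 0) return []; // nothing matched, skip the DB round trip
  const rows = await findByIds(ids);
  // The DB returns rows in its own order; restore the order given by the search engine.
  const byId = new Map(rows.map((row) => [String(row.id), row]));
  return ids.map((id) => byId.get(String(id))).filter(Boolean);
}

// In-memory stand-ins for the search index and the database (assumptions).
const fakeRows = [
  { id: 1, name: 'alpha' },
  { id: 2, name: 'beta' },
  { id: 3, name: 'alphabet' },
];
const fakeDeps = {
  searchIds: async (q) =>
    fakeRows.filter((r) => r.name.includes(q)).map((r) => r.id).reverse(),
  findByIds: async (ids) => fakeRows.filter((r) => ids.includes(r.id)),
};

searchEntries('alpha', fakeDeps).then((rows) => {
  console.log(rows.map((r) => r.name)); // [ 'alphabet', 'alpha' ]
});
```

Keeping the two adapters separate also makes it easy to fall back to a plain DB query if the search index is unavailable.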

In our case (you can test the code base yourself, the link is below) we're going to search through 10,000 entities. For the search index we'll take just a few parameters, one of them customized (you can change the entity generation algorithm and test the performance yourself; the request will be really fast). The configuration for our ORM model, which the search uses later, can look like this (to keep the code simple we used Sequelize, nothing special, just a common ORM):
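As a rough sketch of what such a configuration might contain, assuming hypothetical field names and a plain object kept next to the model definition (this is not a Sequelize API, just an illustration of the shape):

```javascript
// Hypothetical search configuration for an "entry" model.
const entrySearchConfig = {
  indexName: 'entries',        // name of the search index collection (assumed)
  fields: ['title', 'author'], // properties copied into the index as-is (assumed)
  custom: {
    // A customized parameter: computed from the entity instead of stored in the DB.
    summary: (entity) => String(entity.description || '').slice(0, 20),
  },
};

// Build the document that would be sent to the search index for one entity.
function toSearchDocument(config, entity) {
  const doc = {};
  for (const field of config.fields) doc[field] = entity[field];
  for (const [name, compute] of Object.entries(config.custom)) {
    doc[name] = compute(entity);
  }
  return doc;
}

console.log(
  toSearchDocument(entrySearchConfig, {
    id: 7,
    title: 'Indexing',
    author: 'Ann',
    description: 'A long description that should be shortened',
    internalFlag: true, // not listed in the config, so it never reaches the index
  })
);
```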

Here we specify the name of the search index collection to use and which properties of the actual data should be indexed. There's one rule: know your data and queries! Never index everything that's saved in the DB if it can be avoided. Analyze what kind of data users expect to search by. For example, for a phone book I would index the phone number, last name and first name as the most commonly searched fields. But how are those properties connected with the actual search index? Good question. Here we need to discuss another pitfall of using ES, data duplication:

How an entry is saved in the search index

So each change in the database should be reflected in the search index: we save the same data twice. It sounds a bit complex, but there's an easy way: middleware!

The middleware layers between the API and the actual DB

We already have one: the Sequelize ORM. Nowadays it's common practice to use an ORM as the connection layer to the DB, so that all the models, relations, etc. are easily readable and well structured. We can use the ORM for one more purpose: it can provide the interface for indexing in the ES server. That way we kill two birds with one stone: the index and the DB are changed through one abstraction, with no duplicated code each time we need such an operation (the idea is not new, you can easily find such extensions for other ORMs, like mongoosastic for Mongoose, the MongoDB ODM). If there's no ready-to-use solution, you might need to decorate the existing create/update/delete methods. Let's take a look at one possible way:

As you can see, we provide the decorator in the form of a module. It saves the original create method and extends it with Elasticsearch indexing functionality. So whenever a user calls the original ORM create, it automatically indexes the created entity. You can easily extend update and delete in the same way.
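A minimal sketch of such a decorator, covering create and delete, might look like the following. The `model` and `searchClient` objects are hypothetical interfaces (not the actual Sequelize or Elasticsearch client signatures) and are replaced with in-memory stand-ins so the example runs on its own.

```javascript
// Wrap a model's create/destroy so every DB change is mirrored in the search index.
function withSearchIndex(model, searchClient, toDocument) {
  const originalCreate = model.create.bind(model);
  const originalDestroy = model.destroy.bind(model);

  model.create = async (values) => {
    const entity = await originalCreate(values);             // 1. write to the DB
    await searchClient.index(entity.id, toDocument(entity)); // 2. mirror in the index
    return entity;
  };

  model.destroy = async (id) => {
    const removed = await originalDestroy(id); // 1. delete from the DB
    await searchClient.remove(id);             // 2. delete from the index
    return removed;
  };

  return model;
}

// In-memory stand-ins for the ORM model and the search client (assumptions).
const dbRows = new Map();
let nextId = 1;
const plainModel = {
  async create(values) {
    const entity = { id: nextId++, ...values };
    dbRows.set(entity.id, entity);
    return entity;
  },
  async destroy(id) {
    return dbRows.delete(id);
  },
};
const indexedDocs = new Map();
const fakeSearchClient = {
  async index(id, doc) { indexedDocs.set(id, doc); },
  async remove(id) { indexedDocs.delete(id); },
};

const Entry = withSearchIndex(plainModel, fakeSearchClient, (e) => ({ title: e.title }));

async function demo() {
  const entry = await Entry.create({ title: 'Hello', body: 'long text' });
  console.log(indexedDocs.get(entry.id)); // { title: 'Hello' }
  await Entry.destroy(entry.id);
  console.log(indexedDocs.size); // 0
}
demo();
```

Note that in this sketch a failed indexing call after a successful DB write leaves the two stores out of sync, so a real implementation would need some retry or reindexing strategy.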

Now that our data is indexed, it's time to do a simple search. We're going to do the promised substring search:

So we run a search for the query entered by the user and extract the identifiers as an array, so that it can be used to query the entries from the actual DB.
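One way to express this is sketched below, using the Elasticsearch `wildcard` query inside a `bool` query. The field names and both helper functions are assumptions for illustration, and `extractIds` expects the search response body (where exactly the body sits in the client's return value depends on the client version).

```javascript
// Escape the wildcard query's special characters so user input is matched literally.
function escapeWildcard(term) {
  return term.replace(/[\\*?]/g, (ch) => '\\' + ch);
}

// Build a request body that matches the term as a substring in any of the fields.
function buildSubstringQuery(fields, term) {
  const pattern = `*${escapeWildcard(term)}*`;
  return {
    query: {
      bool: {
        should: fields.map((field) => ({
          wildcard: { [field]: { value: pattern } },
        })),
        minimum_should_match: 1, // at least one field has to match
      },
    },
  };
}

// Pull the document ids out of a search response body.
// Elasticsearch returns `_id` as a string, so convert if the DB key is numeric.
function extractIds(responseBody) {
  return responseBody.hits.hits.map((hit) => hit._id);
}

console.log(JSON.stringify(buildSubstringQuery(['title', 'author'], 'ela'), null, 2));
console.log(extractIds({ hits: { hits: [{ _id: '12' }, { _id: '7' }] } })); // [ '12', '7' ]
```

Patterns with a leading `*` can be slow on large indices, so for heavier workloads an n-gram based analyzer is a common alternative.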

I'm looking forward to your comments and questions below! All the code can be found here.