Search beyond just search

The choice of search engine for commerce touch points plays a critical role in improving product discovery and raising the add-to-cart rate from product list and product detail pages, which in turn boosts the revenue of the channel.

Search has been evolving from standard query-parser parameters to DSL or JSON-based parameters, keeping pace with the systems that connect to it. Beyond that, advances that make search algorithms more adaptable, together with plugged-in AI, have helped build strong search solutions with better ranking, relevancy and sorting.

Businesses have also started moving toward “build your own search” or SaaS search options such as Algolia and Bloomreach, having realized that a Lucene-based solution like Solr or Elasticsearch, used out of the box, delivers only about 20% of what a customer-ready product needs.

But is this enough? Can the search engines built for commerce touch points be compared with Google's? In the end, what stands out is speed of search.

Google search is often described as being as fast as the blink of an eye; a human blink lasts roughly 100 to 400 ms. On average, Google returns about 2,000,000,000 search results in 400 ms and suggestions/autocomplete in 50 ms. What matters is that this response time is measured at the end interface, not just at the underlying search engine. According to Google's Urs Hölzle, every 400 ms of delay leads to a 0.44% drop in search volume (https://www.thinkwithgoogle.com/marketing-resources/the-google-gospel-of-speed-urs-hoelzle/).

It's important to understand this problem and focus on resolving latency issues: a faster search engages more customers and leads them to buy more.

Infrastructure choices

Latency between the interface and the search engine is largely a consequence of infrastructure choices. Search indexes should sit as close to the interface as possible to minimize delay. On-premise search solutions can significantly hurt search latency because of the round trip required; this is one of the reasons an on-premise Endeca solution is no longer a top choice for businesses. Just as a CDN delivers content faster by serving each customer from the closest edge, search should be served from indices as close to the customer as possible. Solr and Elasticsearch have introduced replication across clusters, but it is still geared toward active-passive setups and is not recommended for running multiple clusters in different data centers and locations. DNS resolution and load balancing are also worth reviewing, since each adds a hop and a noticeable amount of time.
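To make the effect of hops concrete, here is a minimal sketch of an end-to-end latency budget. The hop names and millisecond values are illustrative assumptions, not measurements, and the 100 ms budget is a hypothetical target.

```python
# Hypothetical per-hop latencies in milliseconds (illustrative values only).
HOPS_REMOTE = {
    "dns_lookup": 20,
    "load_balancer": 5,
    "network_round_trip": 120,  # index hosted far from the interface
    "search_engine": 30,
}
HOPS_NEARBY = {
    "dns_lookup": 20,
    "load_balancer": 5,
    "network_round_trip": 15,   # index hosted close to the interface
    "search_engine": 30,
}


def end_to_end_ms(hops):
    """Total latency seen at the interface: the sum of every hop."""
    return sum(hops.values())


def within_budget(hops, budget_ms):
    """True when the summed hops fit inside the latency budget."""
    return end_to_end_ms(hops) <= budget_ms


if __name__ == "__main__":
    for name, hops in (("remote", HOPS_REMOTE), ("nearby", HOPS_NEARBY)):
        print(name, end_to_end_ms(hops), within_budget(hops, budget_ms=100))
```

In this sketch the engine time is identical in both setups; only the network round trip differs, and that alone decides whether the budget is met.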

Search processing

Most search engines on the market today apply relevancy, sorting and other rules after the results are fetched from the indexes, which costs time. More personalization in search means longer response times. Search engines need to be adapted so that results are pre-indexed with the required relevancy and sorting, avoiding post-processing.
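One way to picture pre-indexed ranking is a toy inverted index whose posting lists are sorted at index time, so a query only slices a list. This is a sketch under stated assumptions: the `business_score` formula, its weights and the product fields are hypothetical, and it covers only query-independent signals, not per-user personalization.

```python
from collections import defaultdict


def business_score(product):
    """Hypothetical query-independent score blending popularity and margin."""
    return 0.7 * product["popularity"] + 0.3 * product["margin"]


def build_index(products):
    """Index time: compute scores once and store each posting list pre-sorted."""
    index = defaultdict(list)
    for product in products:
        score = business_score(product)
        for term in set(product["title"].lower().split()):
            index[term].append((score, product["id"]))
    for postings in index.values():
        # Highest score first; the id breaks ties deterministically.
        postings.sort(key=lambda pair: (-pair[0], pair[1]))
    return dict(index)


def search(index, term, limit=10):
    """Query time: no scoring or sorting, just a slice of the stored list."""
    return [product_id for _, product_id in index.get(term.lower(), [])[:limit]]


PRODUCTS = [
    {"id": "p1", "title": "Red running shoes", "popularity": 0.9, "margin": 0.2},
    {"id": "p2", "title": "Blue running shoes", "popularity": 0.5, "margin": 0.9},
    {"id": "p3", "title": "Red jacket", "popularity": 0.95, "margin": 0.5},
]

if __name__ == "__main__":
    index = build_index(PRODUCTS)
    print(search(index, "red"))
    print(search(index, "shoes"))
```

The tradeoff is that any change to the scoring rules requires re-indexing, which moves cost from query time to index time.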

Indexing process

Indexing is a heavy process, and almost all search engines in their “out-of-the-box” mode run it on the same node that also serves searches. Indexing and search then compete for resources such as CPU and memory, which often reduces search performance.
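A common way around this contention is to build the index elsewhere and hand the finished result to the search node. The sketch below models that idea inside one process for brevity; `SearchNode` and `build_snapshot` are hypothetical names, and in a real deployment the build step would be assumed to run on a separate indexing machine.

```python
class SearchNode:
    """Serves queries from a finished snapshot; never builds an index itself."""

    def __init__(self):
        self._snapshot = {}

    def publish(self, snapshot):
        # A single attribute assignment: readers see the old or the new snapshot.
        self._snapshot = snapshot

    def search(self, term):
        snapshot = self._snapshot  # take one reference for the whole query
        return snapshot.get(term.lower(), [])


def build_snapshot(documents):
    """The heavy step, assumed to run away from the node that serves queries."""
    snapshot = {}
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            snapshot.setdefault(term, []).append(doc_id)
    for doc_ids in snapshot.values():
        doc_ids.sort()
    return snapshot


if __name__ == "__main__":
    node = SearchNode()
    print(node.search("shoes"))  # nothing published yet
    node.publish(build_snapshot({"p1": "Red shoes", "p2": "Blue shoes"}))
    print(node.search("shoes"))
```

The search node only pays for swapping a reference, so query handling does not compete with the indexing work for CPU.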

Garbage collection

Several search engine libraries available today, such as CLucene and Xapian, have a C++ core, which is fast because it is compiled to machine code and runs without background GC operations. Garbage collection pauses have long been a problem for Java-based search technology such as Lucene.

Wrapper layer

Most of the time, search engine APIs are wrapped in a business layer that plugs in business logic or reshapes the response for the interface to handle. It's important to rethink this approach, as every added layer adds latency overhead. Is the tradeoff with speed worth it? Consider restructuring the search engine's underlying response so that the customer interface can interpret it directly, avoiding any layer in between.
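As an illustration of restructuring the response, the sketch below stores a display-ready payload at index time, so that query time returns it verbatim with no reshaping layer. The record fields (`sku`, `name`, `price_cents`, `currency`) and the card layout are hypothetical, and the substring match stands in for a real engine query.

```python
import json


def to_display_card(raw):
    """Hypothetical mapping from a raw catalog record to what the UI renders."""
    return {
        "id": raw["sku"],
        "title": raw["name"].strip(),
        "price": f'{raw["price_cents"] / 100:.2f} {raw["currency"]}',
    }


def index_document(raw):
    """Index time: store the display-ready card next to the searchable text."""
    return {
        "search_text": raw["name"].lower(),
        "card_json": json.dumps(to_display_card(raw)),
    }


def handle_query(stored_docs, term):
    """Query time: concatenate stored payloads as-is into a JSON array."""
    hits = [doc["card_json"] for doc in stored_docs if term.lower() in doc["search_text"]]
    return "[" + ",".join(hits) + "]"


if __name__ == "__main__":
    raw = {"sku": "p1", "name": " Red Shoes ", "price_cents": 4999, "currency": "USD"}
    docs = [index_document(raw)]
    print(handle_query(docs, "red"))
```

The business logic still exists, but it runs once per document during indexing instead of once per request.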

In summary, it's important to help visitors find the right products faster!

Author: Prateek Srivastava
Lead Solutions Architect / Engineering Manager

Original post: https://www.linkedin.com/pulse/search-beyond-just-prateek-srivastava/
