When a visitor types "lightweight jacket for autumn" into your search bar, who decides which products appear - and in what order? Behind every answer lies an algorithm. You don't need to know how to code one to understand how it works. But grasping its logic fundamentally changes how you diagnose a search problem and choose the right tool.

There are four main families of search algorithms. Every modern e-commerce search engine relies on a combination of these approaches - with different balances depending on the tool.

Why this matters for your business: your search engine's algorithm directly determines your zero-result rate, the relevance of answers to natural language queries, and your engine's ability to improve over time. Choosing the right algorithm type - or diagnosing the limitations of the one you're using - is a business decision, not just a technical one.

The 4 main families of search algorithms

1. Keyword-based search - BM25 / TF-IDF

This has been the foundational mechanism of search since the 1970s. The engine compares the exact terms typed by the visitor against the terms present in your product listings. BM25 (Best Match 25) is a refinement of TF-IDF weighting: it accounts for how often a word appears in a listing (with diminishing returns), how rare it is across the catalogue, and the length of the description.

In practice: if a customer types "running shoes", the engine surfaces products whose titles and descriptions contain exactly "running" and "shoes" - favouring those where these words appear frequently and in key fields (title > description).
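To make those three factors concrete, here is a minimal Python sketch of the classic Okapi BM25 scoring formula. The values k1 = 1.2 and b = 0.75 are common defaults, not something any particular engine guarantees. Whitespace tokenisation is a simplifying assumption, since production engines also stem words and weight fields (title, description) separately. The three-product catalogue is invented for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase + whitespace split, no stemming.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many listings does each term appear?
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        doc_len = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue  # exact-term matching: no synonyms, no meaning
            # Rarity across the catalogue (IDF variant with +1 so it stays positive).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Frequency with saturation (k1) and length normalisation (b).
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
        scores.append(score)
    return scores


catalogue = [
    "trail running shoes for men",
    "leather office shoes",
    "waterproof softshell jacket",
]
print(bm25_scores("running shoes", catalogue))
```

The listing containing both terms ranks first, the one containing only "shoes" comes second, and the jacket scores zero: without a shared term, BM25 has nothing to match.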

Strengths: fast, transparent, precise on exact references. Limitations: synonyms are invisible to the engine, and natural-language queries often return zero results.
Concrete example: "sneakers for running" won't surface "Nike sport shoes" if the word "sneakers" isn't in the product listing, and "waterproof trousers" won't find "softshell pants". Without manually configured synonyms, every alternative phrasing pushes your zero-result rate higher.
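Manually configured synonyms are typically applied by expanding the query before matching. The sketch below assumes a hypothetical `SYNONYM_GROUPS` list maintained by a merchandiser, and uses a simple any-term match in place of full BM25 scoring.

```python
# Hypothetical synonym groups a merchandiser might maintain.
SYNONYM_GROUPS = [
    {"sneakers", "trainers", "shoes"},
    {"trousers", "pants"},
]


def expand_query(query: str) -> list[str]:
    """Return the query terms plus every synonym from the groups they belong to."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)  # always keep what the visitor typed
        for group in SYNONYM_GROUPS:
            if term in group:
                expanded.extend(sorted(group - {term}))
    return expanded


def matches(query_terms: list[str], listing: str) -> bool:
    """Simplified keyword match: True if any query term appears in the listing."""
    listing_terms = set(listing.lower().split())
    return any(term in listing_terms for term in query_terms)


print(matches("waterproof trousers".split(), "softshell pants"))        # False
print(matches(expand_query("waterproof trousers"), "softshell pants"))  # True
```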
2. Semantic search - embeddings and vector space

Instead of comparing words, semantic search compares meaning. Each product and each query is converted into a numerical vector - a sequence of numbers representing its meaning in a space with hundreds of dimensions. Search then consists of finding the vectors closest to the query in that space.
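The "closest vectors" idea can be illustrated with cosine similarity, one common closeness measure. In this sketch the three-dimensional vectors are invented by hand. A real system would obtain vectors with hundreds of dimensions from an embedding model, and would usually query an approximate nearest-neighbour index rather than scanning every product.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction (same meaning), near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy, hand-made vectors (hypothetical); real embeddings come from a language model.
PRODUCT_VECTORS = {
    "down jacket": [0.9, 0.1, 0.0],
    "technical fleece": [0.8, 0.3, 0.1],
    "linen shirt": [0.1, 0.2, 0.9],
}


def nearest(query_vec: list[float], k: int = 2) -> list[str]:
    """Brute-force search for the k products whose vectors are closest to the query."""
    ranked = sorted(
        PRODUCT_VECTORS,
        key=lambda name: cosine(query_vec, PRODUCT_VECTORS[name]),
        reverse=True,
    )
    return ranked[:k]


# Pretend this is the embedding of "something warm for hiking".
query_vec = [0.85, 0.25, 0.05]
print(nearest(query_vec))
```

Neither returned product shares a word with the query. They rank first purely because their vectors point in a similar direction.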

These vectors are produced by language models trained on billions of texts. They "know" that "parka" and "winter coat" are close, that "sneakers" and "trainers" mean the same thing, or that a query like "something warm for hiking" corresponds to down jackets and technical fleeces.

Strengths: understands implicit synonyms and natural language, tolerates typos and rephrasing. Limitations: less precise on exact references, slower indexing.
Concrete example: "something warm for winter" surfaces down jackets, blankets and beanies - even though none of those words appear in the query. On the other hand, results for "ref XYZ-4521" may be less reliable than with pure BM25, which handles exact matches better.
3. Hybrid search - the best of both

Hybrid search combines BM25 and semantic scores into a single final score. The query is processed in parallel by both methods, and their scores are merged - typically through an adjustable weighting (a parameter often called alpha) that sets the relative contribution of each approach.
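A minimal sketch of alpha-weighted fusion, assuming min-max normalisation so that BM25 scores (unbounded) and semantic similarities (roughly 0 to 1) become comparable before mixing. The convention that alpha = 1 means pure semantic and alpha = 0 means pure BM25 is an assumption: engines differ, and some use rank-based fusion such as Reciprocal Rank Fusion instead of a score-weighted mix. The product IDs and scores are invented.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two methods are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pid: 1.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def hybrid_scores(bm25: dict[str, float], semantic: dict[str, float],
                  alpha: float = 0.5) -> dict[str, float]:
    """Blend normalised scores: alpha weights the semantic side, (1 - alpha) the BM25 side."""
    b = min_max(bm25)
    s = min_max(semantic)
    # A product found by only one method gets 0 from the other.
    return {
        pid: alpha * s.get(pid, 0.0) + (1 - alpha) * b.get(pid, 0.0)
        for pid in set(b) | set(s)
    }


bm25 = {"nike-air-max-90": 12.0, "trail-shoe": 1.0, "fleece": 0.0}
semantic = {"nike-air-max-90": 0.70, "trail-shoe": 0.90, "fleece": 0.50}

for alpha in (0.0, 0.5, 1.0):
    scores = hybrid_scores(bm25, semantic, alpha)
    print(alpha, max(scores, key=scores.get))
```

Moving alpha changes which signal wins for the same query, which is exactly the tuning trade-off hybrid engines expose.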

This has become the standard approach in modern engines such as Elasticsearch (version 8.x), Google Vertex AI Search and Algolia NeuralSearch. Hybrid search corrects the blind spots of each approach in isolation, combining BM25's precision on product codes with semantic understanding of natural language.

Strengths: precision on references, understanding of meaning, fewer zero results. Limitations: more complex to tune, heavier infrastructure.
Concrete example: for "Nike Air Max 90 white", BM25 finds the exact reference with precision. For "shoes for running in the woods", semantic search surfaces trail shoes even though the word "trail" isn't typed. Hybrid handles both in the same engine.
4. Behavioural ranking - AI that learns from your customers

This fourth type doesn't change which products are returned, only the order in which they appear. The engine observes the real behaviour of your visitors - clicks, add-to-cart actions, conversions, time spent on a product page - and automatically adjusts the ranking to surface the products that perform best for each type of query.

Result: if your customers who search for "beginner running" consistently click and buy from a particular brand, those products rise in results for that query - without any manual configuration from you.
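One simple way to turn click data into a ranking signal is a smoothed click-through rate used as a multiplier on text relevance. The sketch below is a hypothetical illustration, not how Vertex AI or any specific engine works: the prior CTR of 5% and prior weight of 20 impressions are assumed values, and clicks stand in for richer signals such as add-to-cart actions and conversions.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    impressions: int  # times the product was shown for this query
    clicks: int       # times it was clicked for this query


def smoothed_ctr(stats: Stats, prior_ctr: float = 0.05, prior_weight: int = 20) -> float:
    """CTR pulled toward a prior: products with little traffic stay close to prior_ctr."""
    return (stats.clicks + prior_ctr * prior_weight) / (stats.impressions + prior_weight)


def rerank(candidates: list[str], relevance: dict[str, float],
           behaviour: dict[str, Stats], prior_ctr: float = 0.05) -> list[str]:
    """Reorder an existing result set; no product is added or removed."""
    def score(pid: str) -> float:
        stats = behaviour.get(pid, Stats(0, 0))
        lift = smoothed_ctr(stats, prior_ctr) / prior_ctr  # 1.0 = neutral (no data)
        return relevance[pid] * lift
    return sorted(candidates, key=score, reverse=True)


# Hypothetical per-query history for "beginner running".
candidates = ["brand-a-shoe", "brand-b-shoe", "new-arrival"]
relevance = {"brand-a-shoe": 0.8, "brand-b-shoe": 0.8, "new-arrival": 0.8}
behaviour = {
    "brand-a-shoe": Stats(impressions=200, clicks=4),
    "brand-b-shoe": Stats(impressions=200, clicks=40),
}
print(rerank(candidates, relevance, behaviour))
```

The new arrival has no history, so its lift stays neutral: it is not penalised, but it cannot climb on its own either, which is the cold-start effect of this approach.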

Strengths: adapts automatically, optimises conversion, requires zero manual configuration. Limitations: ineffective at launch (cold start), requires sufficient data volume.
Key consideration: behavioural ranking is powerful, but it amplifies existing trends. A product with little initial visibility struggles to emerge. Searchandising actions (manual boosting) remain necessary for new arrivals and promotions.
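Searchandising rules are usually applied on top of the automatic ranking. A minimal sketch, assuming a hypothetical rule format of `boosts` (score multipliers) and `pinned` (forced top positions), applied to invented scores for a best-seller, a new arrival and an older model:

```python
from typing import Optional


def apply_merchandising(scores: dict[str, float],
                        boosts: Optional[dict[str, float]] = None,
                        pinned: Optional[list[str]] = None) -> list[str]:
    """Apply manual rules on top of automatic ranking scores (hypothetical rule format)."""
    boosts = boosts or {}
    pinned = pinned or []
    # Multiplicative boosts: 1.0 = unchanged, >1 promotes, <1 demotes.
    adjusted = {pid: s * boosts.get(pid, 1.0) for pid, s in scores.items()}
    # Pinned products go first, in the order given; the rest follow by adjusted score.
    head = [pid for pid in pinned if pid in adjusted]
    tail = sorted((pid for pid in adjusted if pid not in head), key=adjusted.get, reverse=True)
    return head + tail


# Invented scores, as a behavioural ranker might produce them.
scores = {"best-seller": 2.98, "new-arrival": 0.80, "older-model": 0.36}

print(apply_merchandising(scores))                               # automatic order
print(apply_merchandising(scores, boosts={"new-arrival": 5.0}))  # promote a new arrival
print(apply_merchandising(scores, pinned=["older-model"]))       # pin a promotion
```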

Algorithm comparison at a glance

| Algorithm | Strengths | Main limitations | Data required |
|---|---|---|---|
| BM25 / TF-IDF | Precise on exact references, fast, transparent | Ignores synonyms and natural language | None |
| Semantic | Understands meaning, natural language, rephrasing | Less reliable on exact codes and SKUs | Embeddings model (pre-trained) |
| Hybrid | Precision + meaning, fewer zero results | Tuning the BM25/semantic weighting is delicate | Embeddings model |
| Behavioural (ML) ranking | Adapts to your audience, continuously optimises conversion | Requires click and conversion history | User behaviour data (sufficient volume) |

Which combination is right for your store?

The answer depends on the size of your catalogue, your traffic volume, and the richness of your product data.

  • Small catalogue (<1,000 products): a well-configured BM25 with carefully maintained synonyms can be enough. Investing in a semantic layer only pays off beyond a certain volume of missed queries.
  • Medium to large catalogue (1,000 to 50,000 products): hybrid search becomes essential. The sheer variety of variants, brands, and phrasings makes manual synonym configuration insufficient on its own.
  • High-traffic sites (>50,000 visitors/month): behavioural ranking delivers continuous improvement without manual work. The condition: enough data for the algorithm to converge on a reliable signal.
  • In all cases: manual synonyms remain the most cost-effective short-term configuration. They are cheap to set up, and their impact on the zero-result rate is immediate.
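Since the zero-result rate is the metric these recommendations hinge on, it helps to measure it and see which queries fail. A minimal sketch, assuming a hypothetical search log of (query, result count) pairs exported from your analytics:

```python
from collections import Counter


def zero_result_report(search_log: list[tuple[str, int]], top_n: int = 3):
    """Return the zero-result rate and the most frequent failing queries."""
    if not search_log:
        return 0.0, []
    misses = [query.lower().strip() for query, count in search_log if count == 0]
    rate = len(misses) / len(search_log)
    return rate, Counter(misses).most_common(top_n)


# Hypothetical log entries.
log = [
    ("running shoes", 42),
    ("waterproof trousers", 0),
    ("MTB helmet", 0),
    ("waterproof trousers", 0),
    ("parka", 17),
]
rate, worst = zero_result_report(log)
print(f"zero-result rate: {rate:.0%}")
print(worst)
```

The most frequent failing queries are natural candidates for the first synonym groups (here, "trousers" and "MTB").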

See how Vectail handles the algorithms for you

Hybrid search, behavioural ranking, and configurable synonyms - all from a single dashboard. 14-day free trial, no credit card required.


What Vectail does with these algorithms

Vectail is built on Google Vertex AI Search for Retail, which natively integrates a hybrid architecture - BM25 and semantic search combined in a single relevance score. No need to choose or calibrate the balance between the two: it's handled by Google's infrastructure, trained on billions of retail queries.

  • Behavioural ranking enabled by default: Vertex AI auto-learning starts observing your visitors' behaviour from day one and refines the ranking continuously; its influence grows progressively as traffic builds.
  • Automatic query expansion: setting the queryExpansionSpec condition to AUTO lets the engine automatically broaden a query to semantically close terms when exact results are insufficient.
  • Synonyms configurable from the dashboard: for cases where the algorithm doesn't cover a sector-specific synonym - "MTB" vs "mountain bike", "trainers" vs "sneakers" - you can define synonym groups that take priority over the semantic layer.
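The idea behind conditional query expansion can be sketched generically: run the exact query first and only broaden it when too few products come back. This is a conceptual illustration, not Vertex AI's implementation. `exact_search`, the `RELATED` map, the `min_results` threshold and the choice to keep exact matches at the top are all assumptions of the sketch.

```python
from typing import Callable

# Hypothetical index: query -> matching product IDs.
INDEX = {
    "winter parka": ["parka-01"],
    "down jacket": ["dj-01", "dj-02"],
    "winter coat": ["coat-01", "parka-01"],
}

# Hypothetical semantically close alternatives for a query.
RELATED = {"winter parka": ["down jacket", "winter coat"]}


def exact_search(query: str) -> list[str]:
    return list(INDEX.get(query, []))


def search_with_expansion(query: str,
                          search: Callable[[str], list[str]],
                          related: dict[str, list[str]],
                          min_results: int = 3) -> tuple[list[str], bool]:
    """Return (results, expanded?): broaden only if the exact query is too thin."""
    results = search(query)
    if len(results) >= min_results:
        return results, False
    seen = set(results)
    expanded = list(results)  # exact matches stay first
    for alternative in related.get(query, []):
        for pid in search(alternative):
            if pid not in seen:  # avoid duplicates across alternatives
                seen.add(pid)
                expanded.append(pid)
    return expanded, True


print(search_with_expansion("winter parka", exact_search, RELATED))
print(search_with_expansion("winter parka", exact_search, RELATED, min_results=1))
```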

The combination of these three layers - hybrid, behavioural, manual synonyms - covers the vast majority of e-commerce search scenarios without requiring any technical configuration.