When a visitor types "lightweight jacket for autumn" into your search bar, who decides which products appear - and in what order? Behind every answer lies an algorithm. You don't need to know how to code one to understand how it works. But grasping its logic fundamentally changes how you diagnose a search problem and choose the right tool.
There are four main families of search algorithms. Modern e-commerce search engines typically combine several of these approaches, with a different balance depending on the tool.
The 4 main families of search algorithms
Keyword-based search - BM25 / TF-IDF
Keyword matching has been the foundation of search since the 1970s. The engine compares the words typed by the visitor with the words in your product listings. BM25 (Best Match 25) is a refinement of TF-IDF: it scores each listing according to how often each query word appears in it (with diminishing returns for repetition), how rare that word is across the catalogue, and how long the listing is.
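For readers who want to see those three factors as arithmetic, this is the BM25 scoring function as it is commonly written, with the IDF variant used by Apache Lucene (and therefore Elasticsearch); other implementations use slightly different IDF formulas. Here f(q, D) is the number of times query term q appears in listing D, |D| is the listing's length in words, avgdl is the average listing length across the catalogue, N is the number of listings and n(q) is the number of listings containing q. The parameter k1 (commonly 1.2) controls how quickly repeated words stop adding score, and b (commonly 0.75) controls how strongly long listings are penalised.

$$
\text{score}(D, Q) = \sum_{q \in Q} \text{IDF}(q) \cdot \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \,\frac{|D|}{\text{avgdl}}\right)}
\qquad
\text{IDF}(q) = \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)
$$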
In practice: if a customer types "running shoes", the engine surfaces products whose titles and descriptions contain exactly "running" and "shoes" - favouring those where these words appear frequently and in key fields (title > description).
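A minimal Python sketch of the same idea, assuming a toy three-product catalogue and hypothetical field boosts (title counted twice as heavily as description). It computes a separate BM25 score per field and adds them with those weights, which is one simple way to favour title matches; production engines offer more refined variants (such as BM25F) and much richer text analysis than the whitespace tokeniser used here.

```python
import math
from collections import Counter

# Hypothetical boosts: a match in the title counts twice as much as one in the description
FIELD_BOOSTS = {"title": 2.0, "description": 1.0}


def tokenize(text: str) -> list[str]:
    # Naive analysis: lowercase and split on whitespace.
    # Real engines also strip punctuation, stem ("shoes" -> "shoe"), etc.
    return text.lower().split()


class BM25Field:
    """BM25 statistics for one field (e.g. every product title) across the catalogue."""

    def __init__(self, texts: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(t) for t in texts]
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs
        # Document frequency: how many listings contain each term at least once
        self.df = Counter(term for doc in self.docs for term in set(doc))

    def idf(self, term: str) -> float:
        # Rare terms get a high weight, terms found in most listings a low one
        n = self.df[term]
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query: str, i: int) -> float:
        doc = self.docs[i]
        tf = Counter(doc)
        # Length normalisation: longer-than-average listings are penalised (controlled by b)
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf[term]
            if f:
                # Saturating term frequency: each extra repetition adds less (controlled by k1)
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total


def search(products: list[dict], query: str) -> list[tuple[str, float]]:
    indexes = {field: BM25Field([p[field] for p in products]) for field in FIELD_BOOSTS}
    results = []
    for i, product in enumerate(products):
        score = sum(boost * indexes[field].score(query, i) for field, boost in FIELD_BOOSTS.items())
        if score > 0:  # no shared word means no result at all
            results.append((product["name"], round(score, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)


products = [
    {"name": "Road runner", "title": "running shoes", "description": "lightweight shoes for road running"},
    {"name": "Trail grip", "title": "trail shoes", "description": "grippy shoes for running on trails"},
    {"name": "Canvas sneaker", "title": "canvas sneakers", "description": "casual lace-ups for everyday wear"},
]

print(search(products, "running shoes"))
print(search(products, "trainers"))
```

Searching "trainers" on this catalogue returns nothing, even though the canvas sneaker is exactly what the visitor wants: this is the keyword blind spot that the next two families address.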
Semantic search - embeddings and vector space
Instead of comparing words, semantic search compares meaning. Each product and each query is converted into a numerical vector - a sequence of numbers representing its meaning in a space with hundreds of dimensions. Search then consists of finding the vectors closest to the query in that space.
These vectors are produced by language models trained on billions of texts. They "know" that "parka" and "winter coat" are close, that "sneakers" and "trainers" mean the same thing, or that a query like "something warm for hiking" corresponds to down jackets and technical fleeces.
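To make "finding the closest vectors" concrete, here is a sketch with hand-made 3-dimensional vectors standing in for real embeddings. The numbers and the meaning given to each axis are invented for illustration; in a real system, products and queries would both be encoded by the same embedding model. Closeness is measured with cosine similarity, a common choice for comparing embeddings.

```python
import math

# Toy 3-dimensional vectors, hand-picked for illustration only.
# Hypothetical axes: (warmth, outdoor use, footwear). A real embedding model
# produces hundreds of dimensions with no human-readable meaning.
PRODUCT_VECTORS = {
    "down parka": [0.9, 0.8, 0.0],
    "technical fleece": [0.7, 0.9, 0.1],
    "canvas trainers": [0.1, 0.2, 0.9],
}

QUERY_VECTORS = {
    "winter coat": [0.95, 0.6, 0.0],
    "something warm for hiking": [0.8, 0.9, 0.1],
    "sneakers": [0.0, 0.1, 1.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: 1.0 means same direction (same meaning), near 0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_search(query: str, k: int = 2) -> list[tuple[str, float]]:
    q = QUERY_VECTORS[query]
    scored = [(name, cosine(q, vec)) for name, vec in PRODUCT_VECTORS.items()]
    # Brute-force scan; large catalogues use approximate nearest-neighbour indexes instead
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


for query in QUERY_VECTORS:
    print(query, "->", [name for name, _ in semantic_search(query)])
```

Note that "sneakers" lands on "canvas trainers" without the two sharing a single word.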
Hybrid search - the best of both
Hybrid search combines BM25 and semantic scoring into a single final score. The query is processed in parallel by both methods, and their scores are merged - typically through an adjustable weighting (a parameter often called alpha) that determines the relative contribution of each approach.
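A sketch of the weighted merge, assuming min-max normalisation: BM25 scores are unbounded while cosine similarities sit between -1 and 1, so the two are put on a common 0 to 1 scale before blending, and alpha weights the semantic side. The product IDs and scores are invented. Engines differ in how they normalise and fuse; some merge the two ranked lists with reciprocal rank fusion instead of blending raw scores.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so that BM25 and cosine scores become comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {k: 1.0 for k in scores}
    return {k: (v - low) / (high - low) for k, v in scores.items()}


def hybrid_rank(bm25: dict[str, float], semantic: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """alpha = weight of the semantic side; 1 - alpha goes to BM25."""
    b, s = min_max(bm25), min_max(semantic)
    # A product found by only one method scores 0 on the other side
    fused = {p: alpha * s.get(p, 0.0) + (1 - alpha) * b.get(p, 0.0)
             for p in b.keys() | s.keys()}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Invented scores for the query "parka 4471", where 4471 is a product code
bm25_scores = {"parka-4471": 12.0, "parka-9902": 3.0}
semantic_scores = {"parka-9902": 0.82, "parka-4471": 0.80, "quilted-jacket": 0.78}

print(hybrid_rank(bm25_scores, semantic_scores, alpha=0.5))
print(hybrid_rank(bm25_scores, semantic_scores, alpha=1.0))
```

With alpha = 0.5 the product whose code matched exactly (parka-4471) comes first; pushing alpha to 1.0 hands the top spot to the item the semantic model merely finds closest (parka-9902), which is why the weighting is delicate to tune.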
This has become the standard in modern engines such as Elasticsearch (since version 8.x), Google Vertex AI Search and Algolia NeuralSearch. Hybrid search compensates for the blind spots each approach has in isolation: BM25 stays precise on product codes, while the semantic layer handles natural language.
Behavioural ranking - AI that learns from your customers
This fourth type doesn't change which products are returned, only the order in which they appear. The engine observes the real behaviour of your visitors - clicks, add-to-cart actions, conversions, time spent on a product page - and automatically adjusts the ranking to surface the products that perform best for each type of query.
For example, if customers who search for "beginner running" consistently click on and buy from a particular brand, that brand's products rise in the results for that query - without any manual configuration on your part.
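A deliberately simple sketch of the principle, not of how Vertex AI or any particular engine learns (production systems typically train learning-to-rank models on many signals). It blends each candidate's text relevance with a smoothed per-query conversion rate; the prior, the weights and the shoe data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    impressions: int  # times the product was shown for this query
    purchases: int    # purchases that followed this query


# Hypothetical tuning values, for illustration only
PRIOR_RATE = 0.02       # assumed baseline conversion rate
PRIOR_WEIGHT = 20       # acts like 20 "virtual" impressions at the baseline rate
BEHAVIOUR_WEIGHT = 0.5  # share of the final score driven by behaviour


def smoothed_rate(s: QueryStats) -> float:
    # Additive smoothing: with these values, 1 sale from 2 views (50%)
    # scores below 40 sales from 400 views (10%)
    return (s.purchases + PRIOR_RATE * PRIOR_WEIGHT) / (s.impressions + PRIOR_WEIGHT)


def rerank(candidates: list[tuple[str, float]],
           stats: dict[str, QueryStats]) -> list[tuple[str, float]]:
    """candidates: (product, text relevance in [0, 1]) from the retrieval step.
    The set of products is unchanged; only their order moves."""
    if not candidates:
        return []
    rates = {p: smoothed_rate(stats.get(p, QueryStats(0, 0))) for p, _ in candidates}
    best = max(rates.values())
    blended = [
        (p, (1 - BEHAVIOUR_WEIGHT) * rel + BEHAVIOUR_WEIGHT * rates[p] / best)
        for p, rel in candidates
    ]
    return sorted(blended, key=lambda c: c[1], reverse=True)


# Query "beginner running": candidates listed in text-relevance order
candidates = [("Brand Z shoe", 0.90), ("Brand Y shoe", 0.85), ("Brand X shoe", 0.80)]
stats = {
    "Brand X shoe": QueryStats(impressions=400, purchases=40),
    "Brand Y shoe": QueryStats(impressions=400, purchases=8),
    "Brand Z shoe": QueryStats(impressions=2, purchases=1),
}
print([p for p, _ in rerank(candidates, stats)])
```

Ranked on text relevance alone, Brand Z would come first; after blending, Brand X, which keeps converting for this query, moves to the top, while Brand Z's two views are not enough to carry it past Brand X. The smoothing is what lets the ranking firm up gradually as traffic builds instead of overreacting to a handful of sessions.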
Algorithm comparison at a glance
| Algorithm | Strengths | Main limitations | Data required |
|---|---|---|---|
| BM25 / TF-IDF | Precise on exact references, fast, transparent | Misses synonyms and natural-language phrasing unless synonyms are configured | None beyond catalogue text |
| Semantic | Understands meaning, natural language, rephrasing | Less reliable on exact codes and SKUs | Embeddings model (pre-trained) |
| Hybrid | Precision + meaning, fewer zero results | BM25/semantic weight tuning is delicate | Embeddings model |
| Behavioural (ML) ranking | Adapts to your audience, continuously optimises conversion | Requires click and conversion history | User behaviour data (sufficient volume) |
Which combination is right for your store?
The answer depends on the size of your catalogue, your traffic volume, and the richness of your product data.
- Small catalogue (<1,000 products): a well-configured BM25 with carefully maintained synonyms can be enough. Investing in a semantic layer only pays off beyond a certain volume of missed queries.
- Medium to large catalogue (1,000 to 50,000 products): hybrid search becomes essential. The sheer variety of variants, brands, and phrasings makes manual synonym configuration insufficient on its own.
- High-traffic sites (>50,000 visitors/month): behavioural ranking delivers continuous improvement without manual work. The condition: enough data for the algorithm to converge on a reliable signal.
- In all cases: manual synonyms remain the most cost-effective short-term configuration. Cheap to set up, and the impact on zero-result rate is immediate.
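Manual synonyms are often applied by expanding the query before matching. This sketch, with hypothetical synonym groups taken from the examples in this article, generates one variant per substitution using whole-word matching; an engine would then retrieve products that match any variant. Real engines may also apply synonyms at indexing time or support one-way synonyms, which this sketch does not.

```python
import re

# Hypothetical synonym groups, as a merchandiser might enter them by hand
SYNONYM_GROUPS = [
    {"trainers", "sneakers"},
    {"mtb", "mountain bike"},
    {"parka", "winter coat"},
]


def expand_query(query: str) -> list[str]:
    """Return the original query plus one variant per synonym substitution."""
    q = query.lower()
    variants = {q}
    for group in SYNONYM_GROUPS:
        for term in group:
            # Whole-word match, so "mtb" does not fire inside a longer word
            pattern = rf"\b{re.escape(term)}\b"
            if re.search(pattern, q):
                for alternative in group - {term}:
                    variants.add(re.sub(pattern, alternative, q))
    return sorted(variants)


print(expand_query("MTB helmet"))
print(expand_query("white sneakers"))
```

Because the expansion happens before matching, even a plain BM25 engine now finds "mountain bike helmet" listings for a visitor who typed "MTB helmet", which is why synonyms have such an immediate effect on zero-result queries.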
See how Vectail handles the algorithms for you
Hybrid search, behavioural ranking, and configurable synonyms - all from a single dashboard. 14-day free trial, no credit card required.
What Vectail does with these algorithms
Vectail is built on Google Vertex AI Search for Retail, which natively integrates a hybrid architecture - BM25 and semantic search combined in a single relevance score. No need to choose or calibrate the balance between the two: it's handled by Google's infrastructure, trained on billions of retail queries.
- Behavioural ranking enabled by default: Vertex AI auto-learning starts observing your visitors' behaviour from day one and refines the ranking continuously; its influence grows progressively as traffic builds.
- Automatic query expansion: the `queryExpansionSpec` parameter, with its `condition` set to `AUTO`, lets the engine automatically broaden a query to semantically close terms when exact results are insufficient.
- Synonyms configurable from the dashboard: for cases where the algorithm doesn't cover a sector-specific synonym - "MTB" vs "mountain bike", "trainers" vs "sneakers" - you can define synonym groups that take priority over the semantic layer.
The combination of these three layers - hybrid, behavioural, manual synonyms - covers the vast majority of e-commerce search scenarios without requiring any technical configuration.