Sitecore Search Analyzers

Jaya Jha
Jun 15, 2024


If you’ve worked with Sitecore Search, you’ve undoubtedly encountered the term ‘analyzers.’ In this post, I’ll cover the different types of analyzers available in Sitecore Search, how to configure them, and how to use them to improve search functionality and relevance.

Let’s begin with a fundamental question: what is the purpose of analyzers?

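Analyzers transform text into tokens so that a search can match content even when the wording isn’t identical, for example when it differs in casing, word form, or punctuation. This makes search results more relevant and comprehensive than exact matching alone.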

How do analyzers work?

Analyzers translate text input into a search-optimized format using a three-step process:

First, if the analyzer uses character filters, it applies them to the raw text, typically removing or replacing certain characters such as punctuation.

Second, it tokenizes the text, breaking it into smaller units called tokens. Tokens are often single words, but they can also be partial words or phrases.

Finally, if the analyzer uses token filters, it applies them to change the tokens in various ways, such as applying synonyms, reducing tokens to their root words (stemming), or removing stop words.

The following diagram illustrates how this works.
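To make the three stages concrete, here is a minimal Python sketch of an analyzer as a pipeline. It is a simplified model, not Sitecore’s implementation. The specific stages (`strip_punctuation`, `whitespace_tokenizer`, `lowercase_filter`, `stop_word_filter`) and the stop-word list are hypothetical choices made for illustration.

```python
import re
from typing import Callable, List

# Simplified model of an analyzer pipeline (illustrative, not Sitecore's code).
CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def analyze(text: str, char_filters: List[CharFilter], tokenizer: Tokenizer,
            token_filters: List[TokenFilter]) -> List[str]:
    # Step 1: character filters rewrite the raw text.
    for char_filter in char_filters:
        text = char_filter(text)
    # Step 2: the tokenizer breaks the text into tokens.
    tokens = tokenizer(text)
    # Step 3: token filters transform the list of tokens.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


# Hypothetical example stages.
def strip_punctuation(text: str) -> str:
    # Replace anything that is not a word character or whitespace with a space.
    return re.sub(r"[^\w\s]", " ", text)


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    return [token.lower() for token in tokens]


STOP_WORDS = {"a", "an", "the", "to", "of", "and"}


def stop_word_filter(tokens: List[str]) -> List[str]:
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    tokens = analyze("Welcome to the Sitecore Search!", [strip_punctuation],
                     whitespace_tokenizer, [lowercase_filter, stop_word_filter])
    print(tokens)  # ['welcome', 'sitecore', 'search']
```

In this model, each analyzer type is simply a different choice of character filters, tokenizer, and token filters.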

Sitecore Search provides two types of analyzers out of the box: basic and advanced.

Basic analyzers

Basic analyzers include the multi-locale standard, standard, alphanumeric only, keyword, lowercase, and prefix match analyzers. These cover most common scenarios.

Multi-locale standard

The multi-locale standard analyzer converts the input to lowercase, reduces each word to its root form (stemming), applies synonyms, and discards stop words and punctuation. Throughout this process, it takes the locale into account.

The following diagram illustrates the processing done by the multi-locale analyzer. The process first transforms the input statement to lowercase, then applies synonyms and removes stop words and punctuation while considering the locale, eventually generating output tokens.

Note: Sitecore recommends using this analyzer for textual relevance, even if your domain does not support multiple locales.
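The following Python sketch shows why locale matters: the same input yields different tokens depending on which locale’s resources are applied. The stop-word lists, the single synonym entry, and the suffix-stripping `naive_stem` function are hypothetical stand-ins. A real analyzer uses full locale-specific stop-word lists, proper stemmers, and the synonyms configured in your domain.

```python
import re
from typing import Dict, List, Set

# Hypothetical per-locale resources for illustration only. A real analyzer uses
# full stop-word lists, proper stemmers, and the synonyms configured for your domain.
STOP_WORDS: Dict[str, Set[str]] = {
    "en": {"a", "an", "the", "to", "for", "and", "of"},
    "de": {"der", "die", "das", "und", "zu", "für"},
}
SYNONYMS: Dict[str, Dict[str, List[str]]] = {
    "en": {"laptop": ["notebook"]},
    "de": {},
}


def naive_stem(token: str, locale: str) -> str:
    # Crude English suffix stripping standing in for a real, locale-aware stemmer.
    if locale != "en":
        return token
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def multi_locale_standard(text: str, locale: str) -> List[str]:
    # Lowercase the input, then turn punctuation into token separators.
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    tokens: List[str] = []
    for word in words:
        # Stop words depend on the locale.
        if word in STOP_WORDS.get(locale, set()):
            continue
        root = naive_stem(word, locale)
        tokens.append(root)
        # Synonyms are looked up by root form and added as extra tokens.
        tokens.extend(SYNONYMS.get(locale, {}).get(root, []))
    return tokens


if __name__ == "__main__":
    print(multi_locale_standard("Searching for the best Laptops!", "en"))
    # ['search', 'best', 'laptop', 'notebook']
    print(multi_locale_standard("Die Suche und der Index", "de"))
    # ['suche', 'index']
```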

Standard Analyzer

The standard analyzer is an earlier, English-only version of the multi-locale standard analyzer. It performs the same operations as the multi-locale standard analyzer but does not take locale into account.

The following diagram illustrates the processing done by the standard analyzer. It first transforms the input statement to lowercase, then applies synonyms and removes stop words and punctuation, treating all input as English, and finally generates the output tokens.

Note: Sitecore recommends this analyzer for textual relevance only if your data is exclusively in English.

Alphanumeric only

The alphanumeric only analyzer carries out the same transformations as the standard analyzer. However, it also removes all characters that are not alphanumeric, rather than using them to separate tokens.

The following diagram shows the processing performed by the alphanumeric only analyzer: once the hyphens are removed, the input string produces a single token.

This is useful when you want visitors to be able to search with or without hyphens.

Note: Sitecore recommends this analyzer for features such as sorting and filtering.
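Here is a minimal Python sketch of the key difference: non-alphanumeric characters are deleted rather than used as separators. It makes several simplifying assumptions. Whitespace still separates words, only ASCII letters and digits are kept, and the stemming and stop-word steps shared with the standard analyzer are omitted. The product code `XR-100-Pro` is a made-up example.

```python
import re
from typing import List


def alphanumeric_only(text: str) -> List[str]:
    # Whitespace still separates tokens, but every other non-alphanumeric
    # character is deleted instead of acting as a separator.
    tokens: List[str] = []
    for word in text.lower().split():
        cleaned = re.sub(r"[^0-9a-z]", "", word)  # ASCII-only simplification
        if cleaned:
            tokens.append(cleaned)
    return tokens


def splits_on_punctuation(text: str) -> List[str]:
    # For contrast: punctuation acts as a token separator.
    return re.findall(r"[0-9a-z]+", text.lower())


if __name__ == "__main__":
    print(alphanumeric_only("XR-100-Pro"))      # ['xr100pro']
    print(alphanumeric_only("XR100Pro"))        # ['xr100pro']
    print(splits_on_punctuation("XR-100-Pro"))  # ['xr', '100', 'pro']
```

Because both spellings reduce to the same token, a visitor typing the code with or without hyphens gets the same match.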

Keyword

The keyword analyzer outputs the entire input text as a single token.

The following diagram demonstrates the processing carried out by the keyword analyzer. Because it generates a single token, it supports only exact matches.

This means you can’t match ‘Sitecore’ or ‘search’ individually; a match requires the exact, complete text.

Note: Sitecore recommends this analyzer for specific cases such as filters, or wherever you need an exact match.
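The following Python sketch illustrates the exact-match behavior. It assumes the token is kept exactly as entered, including case. It also uses a hypothetical `is_match` helper, which treats a match as "query and value share at least one token." That is a simplification of how a search engine compares tokens.

```python
from typing import List


def keyword_analyzer(text: str) -> List[str]:
    # The entire input becomes a single, unmodified token.
    return [text]


def is_match(query: str, indexed_value: str) -> bool:
    # Simplified matching rule: query and value must share at least one token.
    return bool(set(keyword_analyzer(query)) & set(keyword_analyzer(indexed_value)))


if __name__ == "__main__":
    print(keyword_analyzer("Sitecore Search"))            # ['Sitecore Search']
    print(is_match("Sitecore", "Sitecore Search"))         # False
    print(is_match("Sitecore Search", "Sitecore Search"))  # True
```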

Lowercase

The lowercase analyzer produces a single output token containing the lowercase version of the entire input.

The subsequent diagram illustrates the process of the lowercase analyzer, which transforms the input string into lowercase.

Note: Sitecore recommends this analyzer for sorting and filtering.
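To see why a single lowercase token suits sorting and filtering, here is a small Python sketch. It uses the analyzer output as a case-insensitive sort key and as a case-insensitive equality check. The sample values are made up for illustration.

```python
from typing import List


def lowercase_analyzer(text: str) -> List[str]:
    # One token: the whole input, lowercased.
    return [text.lower()]


def sort_key(value: str) -> str:
    return lowercase_analyzer(value)[0]


if __name__ == "__main__":
    values = ["apple", "Banana", "cherry"]
    print(sorted(values))                # ['Banana', 'apple', 'cherry']
    print(sorted(values, key=sort_key))  # ['apple', 'Banana', 'cherry']
    # Case-insensitive filter: "Red" and "red" produce the same token.
    print(lowercase_analyzer("Red") == lowercase_analyzer("red"))  # True
```

Without the analyzer, plain string ordering places uppercase letters before lowercase ones, which is rarely what visitors expect.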

Prefix match

The prefix match analyzer removes all non-alphanumeric characters from the input and produces lowercase prefixes between 3 and 15 characters long.

The following diagram shows how the prefix match analyzer transforms an input string containing hyphens into a series of prefixes that searches can then match.

Note: Sitecore recommends this analyzer for textual relevance when matching unique IDs.
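Here is a minimal Python sketch of prefix generation using the 3 to 15 character range described above. It assumes that whitespace separates words and that prefixes are generated per word, and it keeps only ASCII letters and digits. The ID `AB-1234` is hypothetical.

```python
import re
from typing import List

MIN_PREFIX, MAX_PREFIX = 3, 15  # prefix lengths described above


def prefix_match(text: str) -> List[str]:
    tokens: List[str] = []
    for word in text.lower().split():
        # Remove every non-alphanumeric character (ASCII-only simplification).
        cleaned = re.sub(r"[^0-9a-z]", "", word)
        # Emit each prefix from MIN_PREFIX up to MAX_PREFIX characters.
        for length in range(MIN_PREFIX, min(len(cleaned), MAX_PREFIX) + 1):
            tokens.append(cleaned[:length])
    return tokens


if __name__ == "__main__":
    print(prefix_match("AB-1234"))  # ['ab1', 'ab12', 'ab123', 'ab1234']
    # A partial ID typed by a visitor matches one of the indexed prefixes.
    print("ab12" in prefix_match("AB-1234"))  # True
```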

Advanced analyzers

Ngram based matching

The Ngram-based matching analyzer splits text into individual words and then forms character n-grams of length n from each word.

The following diagram shows the tokens that the Ngram-based analyzer generates when n = 2.

This analyzer is useful for languages that don’t separate words with spaces, such as Japanese, and for languages with long compound words, such as German.

Note: Sitecore recommends this analyzer for suggestion blocks.
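The following Python sketch forms character n-grams per word with n = 2, matching the description above. This sketch does not lowercase, and a word shorter than n produces no n-grams. Both are assumptions, since the exact handling of short words is not covered here.

```python
from typing import List


def ngram_tokens(text: str, n: int = 2) -> List[str]:
    tokens: List[str] = []
    for word in text.split():
        # Slide a window of n characters across each word.
        # Words shorter than n yield no n-grams in this sketch.
        tokens.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return tokens


if __name__ == "__main__":
    print(ngram_tokens("Sitecore"))
    # ['Si', 'it', 'te', 'ec', 'co', 'or', 're']
    # Text without spaces still yields searchable fragments:
    print(ngram_tokens("サイトコア検索"))
    # ['サイ', 'イト', 'トコ', 'コア', 'ア検', '検索']
```

The second example shows why n-grams help with Japanese: even without spaces, a query for a fragment such as 検索 can match.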

Partial match

The partial match analyzer outputs lowercase tokens. It splits words at special characters, also joins the resulting parts back together, and discards stop words.

The following diagram shows how the partial match analyzer turns a statement into tokens after removing stop words and punctuation.

Note: Sitecore recommends this analyzer for suggestion blocks and textual relevance.
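Here is one possible Python reading of "splitting and joining at special characters." It is an interpretation for illustration, not Sitecore’s documented algorithm. The sketch lowercases the input and splits each word at non-alphanumeric characters. It keeps each part that is not a stop word and also emits the joined form, so "wi-fi" yields "wi", "fi", and "wifi". The stop-word list is hypothetical.

```python
import re
from typing import List

STOP_WORDS = {"a", "an", "the", "to", "of", "and"}  # hypothetical list


def partial_match(text: str) -> List[str]:
    tokens: List[str] = []
    for word in text.lower().split():
        # Split the word at special characters, dropping empty pieces.
        parts = [part for part in re.split(r"[^0-9a-z]+", word) if part]
        # Keep each part unless it is a stop word.
        tokens.extend(part for part in parts if part not in STOP_WORDS)
        # Also join the parts back together, e.g. "wi-fi" -> "wifi".
        if len(parts) > 1:
            tokens.append("".join(parts))
    return tokens


if __name__ == "__main__":
    print(partial_match("Set up the Wi-Fi router!"))
    # ['set', 'up', 'wi', 'fi', 'wifi', 'router']
```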

Shingle generator

The shingle generator analyzer produces word-level n-grams, called shingles.

The following diagram shows the shingle generator producing two-word shingles: each output token is a pair of adjacent words from the input.

This analyzer is useful for extracting partial phrases from the input and matching against them.

Note: Sitecore recommends this analyzer for suggestion blocks.
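Here is a minimal Python sketch of two-word shingles. It emits only the word pairs, as in the description above. Whether the real analyzer also emits single words is not covered here, so the sketch leaves them out. The input phrase is made up.

```python
from typing import List


def shingles(text: str, size: int = 2) -> List[str]:
    # Word-level n-grams: every run of `size` adjacent words becomes one token.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


if __name__ == "__main__":
    print(shingles("configure search analyzers today"))
    # ['configure search', 'search analyzers', 'analyzers today']
```

Because each token is a short phrase, a partially typed query such as "search analyzers" can be matched as a unit, which is why shingles suit suggestion blocks.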

Standard no stemmer

The standard no stemmer analyzer performs the same operations as the standard analyzer, except that it does not reduce tokens to their root form (stemming).

The following diagram shows how the standard no stemmer analyzer works. It splits the input statement into individual words, removes stop words (in this example, ‘to’), and generates the corresponding tokens without stemming them.

Note: Sitecore recommends this analyzer for suggestion blocks.
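The following Python sketch contrasts the standard analyzer with its no-stemmer variant using a single `stem` flag. The suffix-stripping `naive_stem` function and the stop-word list are hypothetical stand-ins for a real English stemmer and stop-word list. Synonym handling is omitted to keep the comparison focused on stemming.

```python
import re
from typing import List

STOP_WORDS = {"a", "an", "the", "to", "of", "and"}  # hypothetical list


def naive_stem(token: str) -> str:
    # Crude suffix stripping standing in for a real English stemmer.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def standard(text: str, stem: bool = True) -> List[str]:
    # Lowercase, treat punctuation as a separator, and drop stop words.
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    tokens = [word for word in words if word not in STOP_WORDS]
    return [naive_stem(token) for token in tokens] if stem else tokens


def standard_no_stemmer(text: str) -> List[str]:
    # Same pipeline as standard(), minus the stemming step.
    return standard(text, stem=False)


if __name__ == "__main__":
    print(standard("Learning to configure analyzers"))
    # ['learn', 'configure', 'analyzer']
    print(standard_no_stemmer("Learning to configure analyzers"))
    # ['learning', 'configure', 'analyzers']
```

Keeping the original word forms is what makes the no-stemmer variant suitable for suggestions, where visitors expect to see real words rather than truncated roots.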

Now that we’re familiar with the two categories of analyzers that Sitecore Search provides out of the box, the next step is to understand where to find them in Sitecore Search and how to use them in the website searches we configure on this platform.

To see this, log in to the Sitecore Customer Engagement Console (CEC) and follow the path below.

Where can we find analyzers in Sitecore Search?

Follow this navigation path:

Navigate to Sitecore Search -> Administration -> Domain Settings -> Feature Configuration

Analyzers are used across many Sitecore Search features, including filters, personalization, suggestion blocks, textual relevance, and sorting.

As the screenshot above shows, analyzers are available for several features. Let’s take ‘Textual Relevance’ as an example and click Edit to see how to add analyzers to it for our use case.

For each available attribute, you can add an analyzer by selecting one from the list. Since this is textual relevance, choose an analyzer based on your use case and the best practices described above.

After choosing the analyzer, click Save and then Publish to apply the change.

Now that we know where to add analyzers in Sitecore Search, you might think the task is complete. However, one important step remains: making sure these analyzers are actually applied on our Sitecore Search pages.

Navigate to Widgets -> Widget Name (for example, Preview Search)

Step 1: Click Edit -> Widget Variation (Default)

Step 2: Open Additional Settings

Step 3: Textual Relevance (in our case) -> Enable Configuration

Step 4: Add the attribute to which you assigned the analyzer under Domain Settings -> Feature Configuration.

Step 5: Save and publish the widget to apply the change.

Note: This step is crucial. Without it, the analyzer you configured will not take effect.

It’s common to add an analyzer under Domain Settings, assume the work is done, and later wonder why search isn’t behaving as expected.

Based on my experience with Sitecore Search, and the questions about search I’ve seen in the Slack channel, this step is worth emphasizing.

After reading this post, I hope you’ll find it easier to understand analyzers and their related configuration in the CEC, and to implement real-world business use cases quickly. If you’re an experienced developer, consider this a quick recap of analyzers; if you’re new to the platform, it provides a detailed overview.

References

Analyzers | Sitecore Documentation

Enjoy reading and happy learning!

I am a full-stack web application developer with extensive experience in the Sitecore ecosystem, passionate about exploring cutting-edge technologies.
