SearchStax Site Search solution is engineered to give marketers the agility they need to optimize site search outcomes. Get full visibility into search analytics and make real-time changes with one click.
SearchStax Managed Search service automates, manages and scales hosted Solr infrastructure in public or private clouds. Free up developers for value-added tasks and reduce costs with fewer incidents.
Feb. 09, 2018
Sameer Maggon
|
We recently assisted a client in debugging counter-intuitive search results and improving their Apache Solr relevance. The client’s search application focuses on retrieval of party-oriented listings such as services available for clowns, magicians, bounce houses etc. Documents in the party collection include listings for clowns, magicians, face painters, games etc. The document schema includes the following fields:
We leverage Solr’s Extended DisMax Query Parser to exercise fine control over the scoring of search results. Two features of the parser we make heavy use of are text boosts and boost queries. Text boosts weight the match score specified fields by a multiplicative factor between 0 and 10. For example, supplying a text boost of name^10 will disproportionately weight the name field in a document over other fields. Boost queries weight field+term combinations by multiplying field+term matches by a factor between 0 and 10. For example, supplying a Boost query of category:Clown^10 will significantly boost documents that contain the term “Clown” in the category field. We use the term “relevance model” to denote a particular set of text boost and boost query parameters. We leveraged these relevance models to optimize Solr relevance.
Our client received the following search results when querying “Krusty the Clown”1:
This result seems to indicate the query is not matching documents as expected. In particular, the #3 result, “Escape from Grandma’s House” is not even a clown – it’s a game.
The essential tool for debugging search results is the “explain” mechanism of Solr which is triggered by adding &debug=true to search queries. This produces detail scoring information for each document. A sample of the debug output is shown below, with explanations.
[1]
1.7509375 [2] = sum of [3]:
0.87546873 [4] = sum of [5] :
0.0 = max of:
0.0 = ConstantScore(description:the), product of:
0.0 = boost
1.0 = queryNorm
0.0 = ConstantScore(name:the), product of:
0.0 = boost
1.0 = queryNorm
0.87546873 [6] = max of [7]:
0.87546873 = weight(Synonym(category:clown category:jester) in 0) [8] [SchemaSimilarity], result of:
0.87546873 = score(doc=0,freq=1.0 = termFreq=1.0
), product of [9]:
0.87546873 = idf [10], computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
2.0 [11] = docFreq [12]
5.0 = docCount [13]
1.0 = tfNorm [14], computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
1.0 [15] = termFreq [16]=1.0
1.2 = parameter k1 [17]
0.75 = parameter b [18]
1.0 [19] = avgFieldLength [20]
1.0 [21] = fieldLength [22]
0.0 = ConstantScore(Synonym(description:clown description:jester)), product of:
0.0 = boost [23]
1.0 = queryNorm
0.0 = ConstantScore(Synonym(name:clown name:jester)), product of:
0.0 = boost
1.0 = queryNorm
0.87546873 [24] = weight(Synonym(category:clown [25] category:jester) in 0) [SchemaSimilarity], result of:
0.87546873 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
0.87546873 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
2.0 = docFreq
5.0 = docCount
1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
1.0 = avgFieldLength
1.0 = fieldLength
Explanations:
To investigate the Krusty-on-bottom issue, we set up a small test environment with only the five listings shown above. We then applied various relevance models to see the effect of the models on results and improve Solr relevance.
The first experiment applied no relevance model, allowing Solr’s native scoring algorithm to take effect – effectively the default Solr relevance. Querying for “Krusty the Clown” resulted in:
This result is encouraging since it demonstrates that it’s possible to surface Krusty to the top, as we would hope for. However, somewhat surprisingly, Escape from Grandma’s House is retrieved as the #2 result. It turns out that Escape from Grandma’s House is a strong match for Krusty the Clown since the supplier for Escape from Grandma’s House contains the word “Clown”. Furthermore, the term “Clown” is rare in the supplier field, so Escape from Grandma’s House gets an additional boost due to the idf (inverse document frequency) factor. This contrasts with presence of the term “Clown” in the category field of 2 other listings. Since “Clown” is not a particularly rare category, it does not significantly boost listings that have “Clown” as a category.
In the next experiment, we created a model that has zero-valued boosts for name and description, which is consistent with the client’s original Solr relevance model. Applying this model result in:
This experiments dramatically illustrates the effect of zero-valued boosts – Krusty goes from rank #1 to rank #5. What is happening is that zero-valued boosts cause the name and description to be completely ignored by the matching algorithm. In a nutshell, a score is generated by matching these documents fields against the query, then that score is multiplied by 0 which effectively removes those fields from consideration in the scoring algorithm.
Although Experiment 2 demonstrates how Krusty can be moved down the ranking list, the results do not match the original “bad” results. In our next experiment, we applied a boost query similar to the original Solr relevance model. Specifically we boosted listings that have category matching Clown. This resulted in the following result set:
This brings us closer the original results. It might be a bit confusing why 2 clowns are at the top of the list, then a game, then 2 more clowns, and finally poor Krusty at the bottom. It turns out that Rusty the Clown and Sir Widebottom have their category fields set to Clown, where the other clowns do not. In fact, there is only one component that The Sir Widebottom and Krusty match: the word “the” in supplier. In Krusty’s case, the supplier field is slightly longer, so Krusty is penalized relative to the Sir Widebottom due to the term frequency norm computation in the relevance algorithm.
In this article, we discussed how a seemingly confusing set of search results can be “debugged” to surface the core factors determining result orderings. In debugging Solr relevance issues, it is important to take an incremental approach. The first step should be to strip away all boosting-related parameters and observe the “natural” results. The next step should be to tweak a single boost parameter and observe the effects, then the next parameter, and so on. This method will not only produce better search results, but will produce greater insight into why search sometimes behaves in seemingly unexpected ways.
[1] Names of the Search Results have been changed from the originals to protect the innocent.
Copyrights © SearchStax Inc.2014-2024. All Rights Reserved.