Re: Performance issue in Wildcard Query in Solr 8.9.0

Shawn Heisey Tue, 24 Oct 2023 12:47:35 -0700

On 10/23/23 05:00, Vishal Patel wrote:

We want to search for data that contains "test", so our query is
doc_ref:*test*. I think the wildcard query is using a lot of memory and CPU. Sometimes
a collection goes into recovery mode because of the wildcard
query.
For better performance, we have implemented ReversedWildcardFilterFactory:
https://risdenk.github.io/2018/10/25/apache-solr-leading-wildcard-queries-reversedwildcardfilterfactory.html

As Mikhail indicated, ReversedWildcardFilterFactory is not designed to help with this. It is for leading wildcards, and your query has both leading and trailing wildcards.
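
To illustrate why reversal does not help here, below is a toy Python model. It is not Solr's actual implementation (the real filter also indexes a marker character with each reversed token); the function names are made up for illustration. Reversing a pattern that has only a leading wildcard yields a cheap prefix pattern, while reversing a pattern with wildcards on both ends still leaves a leading wildcard.

```python
def reversed_pattern(pattern: str) -> str:
    """Reverse a wildcard pattern, as would be matched against reversed terms."""
    return pattern[::-1]


def has_leading_wildcard(pattern: str) -> bool:
    """True if the pattern starts with a wildcard character."""
    return pattern[:1] in ("*", "?")


# Leading wildcard only: reversal turns it into a prefix pattern.
only_leading = reversed_pattern("*test")      # "tset*"

# Leading and trailing wildcards: the reversed pattern still leads with "*".
both_ends = reversed_pattern("*test*")        # "*tset*"

print(only_leading, has_leading_wildcard(only_leading))
print(both_ends, has_leading_wildcard(both_ends))
```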


Wildcard queries are particularly resource intensive.

Let's say that doc_ref:*test* matches one million different terms in the doc_ref field. I am not talking about documents, I am talking about terms.

Internally, Solr will do this in two steps: First it will expand the wildcard to retrieve all one million matching terms, and then it will execute the query, which will literally contain one million terms. This is going to consume a lot of CPU and memory.
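
The two steps can be sketched with a simplified Python model. This is only an illustration of the idea, not how Lucene implements it; the term dictionary, the postings, and the function names are all invented for the example. The point is that the expansion has to check every term in the field, and the resulting query grows with the number of matching terms.

```python
import fnmatch

# Hypothetical postings for the doc_ref field: term -> set of document ids.
postings = {
    "abctestxyz": {1},
    "test": {2, 3},
    "testing": {3},
    "other": {4},
}


def expand_wildcard(pattern, term_dictionary):
    """Step 1: scan the term dictionary and collect every matching term."""
    return [t for t in term_dictionary if fnmatch.fnmatchcase(t, pattern)]


def execute(terms, postings):
    """Step 2: run a query that contains one clause per expanded term."""
    docs = set()
    for term in terms:
        docs.update(postings[term])
    return docs


terms = expand_wildcard("*test*", postings)
matching_docs = execute(terms, postings)
print(len(terms), "terms expanded,", len(matching_docs), "documents matched")
```

With four terms this is trivial. With one million matching terms, both the scan and the expanded query become expensive.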

Will "test" be a distinct word in the doc_ref field, or would you also need it to match a value of abctestxyz? If it's a distinct word, you might be better off with a relatively standard analysis chain on a fieldType of TextField and no wildcards.
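
As a rough illustration of the difference, here is a toy Python model of a tokenized field. The analyzer below (split on non-alphanumeric characters, then lowercase) and the sample doc_ref values are assumptions made for the example, not your data and not an actual Solr analysis chain. When "test" is indexed as its own term, a plain term lookup finds it with no wildcard expansion, but a value like abctestxyz is a single term and will not match.

```python
import re


def analyze(text):
    """Toy analysis chain: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs):
    """Build a term -> document ids mapping from analyzed field values."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical doc_ref values.
docs = {1: "REF-test-001", 2: "abctestxyz", 3: "Test report"}
index = build_index(docs)

# A single term lookup, no wildcard involved.
hits = index.get("test", set())
print(sorted(hits))
```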


Thanks,
Shawn
