On 10/23/23 05:00, Vishal Patel wrote:
We want to search for data that contains "test", so our query is
doc_ref:*test*. I think the wildcard query is using a lot of memory and CPU.
Sometimes the collection goes into recovery mode because of the wildcard
query.
For better performance, we have implemented ReversedWildcardFilterFactory:
https://risdenk.github.io/2018/10/25/apache-solr-leading-wildcard-queries-reversedwildcardfilterfactory.html
As Mikhail indicated, ReversedWildcardFilterFactory is not designed to
help with this. It is for leading wildcards, and your query has both
leading and trailing wildcards.
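To illustrate the point, here is a rough Python sketch. It is not Solr code, and reverse_pattern and has_leading_wildcard are hypothetical helpers made up for this example. The idea is that reversing the text turns a leading wildcard into a trailing one, which is cheap to evaluate, but a pattern with wildcards on both ends still starts with a wildcard after reversal.

```python
def reverse_pattern(pattern: str) -> str:
    """Reverse a wildcard pattern character by character (hypothetical helper)."""
    return pattern[::-1]


def has_leading_wildcard(pattern: str) -> bool:
    """True if the pattern starts with a wildcard character."""
    return pattern.startswith(("*", "?"))


for pattern in ("*test", "*test*"):
    reversed_pattern = reverse_pattern(pattern)
    # "*test" becomes a prefix-style pattern; "*test*" still leads with "*".
    print(pattern, "->", reversed_pattern,
          "| still has leading wildcard:", has_leading_wildcard(reversed_pattern))
```

So for doc_ref:*test*, the reversed form gains nothing over the original.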
Wildcard queries are particularly resource intensive.
Let's say that doc_ref:*test* matches one million different terms in the
doc_ref field. I am not talking about documents, I am talking about terms.
Internally, Solr will do this in two steps: First it will expand the
wildcard to retrieve all one million matching terms, and then it will
execute the query, which will literally contain one million terms. This
is going to consume a lot of CPU and memory.
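As a simplified model of those two steps (a sketch only, not how Lucene is actually implemented), the following Python uses a plain dict as a stand-in for the field's term dictionary and postings. The names expand_wildcard and run_expanded_query and the sample data are assumptions made up for illustration.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, term_dictionary):
    """Step 1: check every term in the field against the pattern."""
    return [term for term in term_dictionary if fnmatchcase(term, pattern)]


def run_expanded_query(matching_terms, postings):
    """Step 2: run a query containing every matching term (union of postings)."""
    doc_ids = set()
    for term in matching_terms:
        doc_ids.update(postings[term])
    return doc_ids


# Toy index: term -> set of document ids (made-up data).
postings = {
    "abctestxyz": {1},
    "test": {2, 3},
    "testing": {3},
    "other": {4},
}

terms = expand_wildcard("*test*", postings)
print(len(terms), "matching terms, documents:",
      sorted(run_expanded_query(terms, postings)))
```

Both steps scale with the number of terms: the first touches every term in the field, and the second has to carry every match, which is where a million matching terms becomes expensive.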
Will "test" be a distinct word in the doc_ref field, or would you also
need it to match a value such as abctestxyz? If it's a distinct word, you
might be better off with a relatively standard analysis chain on a
fieldType of TextField and no wildcards.
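The difference between those two cases can be sketched in Python. The tokenize function below is a crude stand-in for a tokenizer plus lowercase filter, not Solr's actual analysis, and the sample value REF-test-0042 is made up for illustration.

```python
import re


def tokenize(value):
    """Split on non-alphanumeric characters and lowercase (crude stand-in)."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def matches_term(value, term):
    """True if the term is one of the indexed tokens, no wildcard needed."""
    return term.lower() in tokenize(value)


# "test" is its own token here, so a plain term query finds it.
print(matches_term("REF-test-0042", "test"))
# Here "test" is buried inside a single token, so a plain term query misses it.
print(matches_term("abctestxyz", "test"))
```

If your data looks like the first case, a simple doc_ref:test query is enough. If it looks like the second, tokenization alone will not help.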
Thanks,
Shawn