Hello Vishal,
I did some research on this earlier
(https://www.youtube.com/watch?v=FQPKAmh0s_I) but did not find an elegant
solution to this problem.

ReversedWildcardFilterFactory hardly helps here: *test* has a wildcard on
both ends, so the reversed form still starts with one. It just inflates the
index, so drop it first. An NGram tokenizer inflates the index much more.
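
To get a feel for the n-gram blow-up, here is a rough Python sketch that
counts the distinct character n-grams a single keyword-tokenized value would
produce. The gram sizes (3 to 10) and the sample value are assumptions for
illustration, not settings from this thread.

```python
def char_ngrams(term, min_gram, max_gram):
    """Return the set of distinct character n-grams of `term`."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # A term shorter than `size` yields an empty range, so no grams.
        for start in range(len(term) - size + 1):
            grams.add(term[start:start + size])
    return grams


# Hypothetical doc_ref value; gram sizes 3..10 are assumed, not from the thread.
value = "project-test-0001"
grams = char_ngrams(value, 3, 10)
print(len(value), "characters ->", len(grams), "distinct n-grams")
```

Every one of those grams becomes an indexed term for that single value,
which is where the extra index size comes from.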

Then, experiment with reducing the number of segments via <optimize>,
requesting as few segments as possible.
The reason is that almost the same terms are repeated in every segment, so
the wildcard expansion has to walk them once per segment.
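
A minimal Python sketch of that experiment, assuming Solr listens on
http://localhost:8983/solr (the host and port are my assumption; the
collection name "documents" is from your message). It uses the optimize and
maxSegments parameters of the update handler. The helper names are made up
for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url, collection, max_segments):
    """Build an update request asking Solr to merge down to max_segments."""
    params = urlencode({
        "optimize": "true",
        "maxSegments": max_segments,
        "waitSearcher": "true",
    })
    return f"{base_url}/{collection}/update?{params}"


def run_optimize(base_url, collection, max_segments):
    """Send the request; expect this to take a while on a 21GB shard."""
    with urlopen(optimize_url(base_url, collection, max_segments)) as resp:
        return resp.status


# Only the URL is built here; call run_optimize() against a live Solr.
url = optimize_url("http://localhost:8983/solr", "documents", 1)
print(url)
```

A forced merge rewrites the index, so try it on a test copy or in a quiet
period first.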

Second, try the range query [0 TO z] on the same field. It should hit many
terms and almost all docs.
That gives you an estimate for a heavy wildcard expansion query: I would
expect the wildcard query to take roughly as long as that range query.
If the range query is slow, you can only add hardware and split into more
shards (though it hardly scales linearly).
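
The comparison can be sketched in Python as below. The base URL is again an
assumption, and the helper names are made up for this example; rows=0 keeps
the cost of fetching stored fields out of the measurement.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, collection, query):
    """Build a /select request that matches docs but returns no rows."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def timed_query(base_url, collection, query):
    """Run the query; return (wall-clock seconds, numFound)."""
    start = time.perf_counter()
    with urlopen(select_url(base_url, collection, query)) as resp:
        body = json.load(resp)
    return time.perf_counter() - start, body["response"]["numFound"]


def compare(base_url, collection):
    """Time the range query against the wildcard query (needs a live Solr)."""
    for query in ("doc_ref:[0 TO z]", "doc_ref:*test*"):
        seconds, found = timed_query(base_url, collection, query)
        print(f"{query}: {seconds:.2f}s, {found} docs")


# Only the URL is built here; call compare() against a live Solr.
print(select_url("http://localhost:8983/solr", "documents", "doc_ref:[0 TO z]"))
```

Run each query more than once, since Solr's caches can make a repeated
query look much faster than the first run.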

Another measure worth taking is to limit the Solr heap, leaving enough RAM
for the OS to mmap the index files.
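
As a back-of-the-envelope check, this Python sketch shows how much RAM
remains for the page cache at a few heap sizes. The 32GB host, the 2GB
reserved for other processes and the candidate heap sizes are all assumed
numbers; only the 21GB index size comes from your message.

```python
def page_cache_headroom_gb(total_ram_gb, solr_heap_gb, other_gb=2.0):
    """RAM left for the OS page cache after the JVM heap and other processes."""
    return total_ram_gb - solr_heap_gb - other_gb


index_gb = 21  # index size of one shard, from the original message
for heap_gb in (8, 16, 31):  # candidate heap sizes (assumed)
    free_gb = page_cache_headroom_gb(32, heap_gb)  # 32GB host is assumed
    fits = "fits" if free_gb >= index_gb else "does not fit"
    print(f"heap {heap_gb}GB -> {free_gb:.0f}GB for page cache, index {fits}")
```

The heap itself is usually set with SOLR_HEAP in solr.in.sh.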

On Mon, Oct 23, 2023 at 2:00 PM Vishal Patel <vishalpatel199...@outlook.com>
wrote:

> We are using Solr 8.9.0. We have configured Solr cloud like 2 shards and
> each shard has one replica. We have used 5 zoo keepers for Solr cloud.
>
>  We have created collection name documents and index size of one shard is
> 21GB. Schema fields like here
> <field name="id" type="string" indexed="true" stored="true"
> required="true" multiValued="false" omitNorms="true" termVectors="false"
> termPositions="false" termOffsets="false" docValues="true"/>
> <field name="doc_ref" type="text_string" indexed="true" stored="true"
> multiValued="false" omitNorms="true" termVectors="false"
> termPositions="false" termOffsets="false" omitTermFreqAndPositions="true"/>
> <fieldtype name="text_string" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> </fieldtype>
>
>
>
> We want to search data which contains test. So, we are making our query
> doc_ref:*test*. I think wildcard query is taking high memory and CPU.
> Sometimes we faced issue that collection goes into recovery mode due to
> usage of wildcard query.
> For better performance, we have implemented ReversedWildcardFilterFactory:
> https://risdenk.github.io/2018/10/25/apache-solr-leading-wildcard-queries-reversedwildcardfilterfactory.html
>
> How can we search after the applying ReversedWildcardFilterFactory? We are
> not getting benefits in term of query execution time if we search in same
> manner doc_ref_rev:*test*
>
> Can you please suggest best approach when we want to search wildcard
> string(*test*) when index size is large?
>
> Regards,
>
> Vishal
>
>

-- 
Sincerely yours
Mikhail Khludnev
