Hello Vishal,

I've done some research on this earlier (https://www.youtube.com/watch?v=FQPKAmh0s_I) but haven't found an elegant solution to this problem.

ReversedWildcardFilterFactory hardly helps here. Since *test* has wildcards at both ends, the reversed pattern still starts with a wildcard. It just blows up the index size, so drop it first. The NGram tokenizer blows up the index even more.

Then, experiment with reducing the number of segments via <optimize>, requesting as few segments as possible. The reason is that with many segments almost the same terms are repeated in every segment, so the wildcard expansion has to enumerate them again and again.

Next, try the range query doc_ref:[0 TO z]. It should hit many terms and almost all docs, so it gives you an estimate for a heavy wildcard-expansion query. I suppose the wildcard query will take roughly as long as that range query. If the range query is slow, you can only add hardware and split into more shards (though it hardly scales linearly).

Another measure worth taking is to limit the Solr heap, leaving enough RAM to mmap the index files.

On Mon, Oct 23, 2023 at 2:00 PM Vishal Patel <vishalpatel199...@outlook.com> wrote:

> We are using Solr 8.9.0. We have configured SolrCloud with 2 shards, and
> each shard has one replica. We use 5 ZooKeeper nodes for SolrCloud.
>
> We have created a collection named documents, and the index size of one
> shard is 21GB. The schema fields are as follows:
>
> <field name="id" type="string" indexed="true" stored="true"
>        required="true" multiValued="false" omitNorms="true" termVectors="false"
>        termPositions="false" termOffsets="false" docValues="true"/>
> <field name="doc_ref" type="text_string" indexed="true" stored="true"
>        multiValued="false" omitNorms="true" termVectors="false"
>        termPositions="false" termOffsets="false" omitTermFreqAndPositions="true"/>
> <fieldtype name="text_string" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldtype>
>
> We want to search for data that contains "test", so our query is
> doc_ref:*test*. I think the wildcard query is using a lot of memory and CPU.
> Sometimes the collection goes into recovery mode because of the wildcard
> query. For better performance, we have implemented ReversedWildcardFilterFactory:
> https://risdenk.github.io/2018/10/25/apache-solr-leading-wildcard-queries-reversedwildcardfilterfactory.html
>
> How can we search after applying ReversedWildcardFilterFactory? We are not
> getting any benefit in terms of query execution time if we search the same
> way, doc_ref_rev:*test*
>
> Can you please suggest the best approach for searching a wildcard string
> (*test*) when the index size is large?
>
> Regards,
>
> Vishal

--
Sincerely yours
Mikhail Khludnev
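To illustrate why doc_ref_rev:*test* gains nothing, here is a toy Python model of a sorted term dictionary. It is a simplified sketch, not Lucene's or Solr's actual code. The term list is made up, the function names are hypothetical, and the reversed copy ignores the marker character the real filter prepends to reversed tokens.

The model shows three cases:

- A trailing-wildcard pattern (test*) seeks straight to its prefix.
- A leading-wildcard pattern (*test) can do the same on the reversed terms.
- A pattern with wildcards on both ends (*test*) has no fixed prefix in either direction, so every term must be checked.

```python
import bisect
import fnmatch

# Toy term dictionary for a keyword-tokenized, lowercased field such as doc_ref:
# each whole value is one term, and terms are kept in sorted order.
TERMS = sorted(["abc-test-1", "contest", "latest", "test-001", "test-002", "testing", "zeta"])

# Simplified stand-in for the extra reversed terms ReversedWildcardFilterFactory
# indexes (the real filter also prepends a marker character, omitted here).
REVERSED = sorted(t[::-1] for t in TERMS)


def prefix_scan(sorted_terms, prefix):
    """Seek to `prefix` and read only the contiguous run of terms starting with it."""
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end], end - start


def wildcard_terms_examined(pattern):
    """Return (matching terms, number of terms examined) for a pattern using only '*'."""
    core = pattern.strip("*")
    leading, trailing = pattern.startswith("*"), pattern.endswith("*")
    if trailing and not leading and "*" not in core:
        # test*  -> seek on the forward terms
        return prefix_scan(TERMS, core)
    if leading and not trailing and "*" not in core:
        # *test  -> reversed into tset*, seek on the reversed terms
        matches, examined = prefix_scan(REVERSED, core[::-1])
        return sorted(m[::-1] for m in matches), examined
    # *test* -> no fixed prefix in either direction: every term must be checked
    matches = [t for t in TERMS if fnmatch.fnmatchcase(t, pattern)]
    return matches, len(TERMS)


for p in ["test*", "*test", "*test*"]:
    matches, examined = wildcard_terms_examined(p)
    print(f"{p:8} examined {examined} of {len(TERMS)} terms, matched {matches}")
```

In a real index the last case means enumerating every doc_ref term in every segment. This is why its cost tracks the [0 TO z] range query rather than benefiting from the reversed field.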
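For the segment-count and range-query experiments, here is a minimal Python sketch that only builds the request URLs. It makes these assumptions:

- Solr listens on http://localhost:8983 (hypothetical).
- The collection is "documents", as in the question.
- The helper names are made up for illustration.

On the update handler, optimize=true with maxSegments requests a forced merge down to that many segments. On the select handler, rows=0 avoids fetching stored documents, so QTime mostly reflects term expansion and matching.

```python
from urllib.parse import urlencode

# Hypothetical base URL; "documents" is the collection name from the question.
SOLR = "http://localhost:8983/solr/documents"


def optimize_url(max_segments=1):
    """Forced merge down to at most `max_segments` segments (heavy on disk and I/O)."""
    return f"{SOLR}/update?" + urlencode({"optimize": "true", "maxSegments": max_segments})


def probe_url(query):
    """rows=0: only numFound and responseHeader.QTime matter for the comparison."""
    return f"{SOLR}/select?" + urlencode({"q": query, "rows": 0, "wt": "json"})


# Compare the heavy range query with the wildcard query it is meant to estimate.
for q in ["doc_ref:[0 TO z]", "doc_ref:*test*"]:
    print(probe_url(q))
print(optimize_url())
```

The printed URLs can be opened with curl or urllib.request.urlopen, comparing responseHeader.QTime before and after the merge. A repeated identical query may be answered from Solr's queryResultCache, so compare first runs.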