WDGF with both "generate*Parts"/"splitOn*" _and_ "catenate*"/"preserveOriginal" enabled generates a graph TokenStream, and the structure of that graph can only be represented accurately through PositionLengthAttribute. Because Lucene does not index PositionLengthAttribute, this information is lost when WDGF is used at index time, which results in the kind of strange search behavior you're observing. As a workaround, I would recommend indexing into (and searching against) two fields: one whose index-time WDGF applies only "split"-type manipulations, and one whose index-time WDGF applies only "catenate"-style operations. Another alternative (making different compromises) would be to increase the query-time `ps` (phrase slop) to a value large enough to accommodate the "graph edges" omitted from the Lucene index.
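For illustration, a rough sketch of the two-field approach might look like the schema fragment below. The field and type names (`body`, `body_split`, `body_catenate`, `text_en_split`, `text_en_catenate`) are hypothetical, the other filters from your `text_en` chain (ICU normalizer, synonyms, KStem, etc.) are left out for brevity, and this exact configuration is untested against your data, so treat it as a starting point only.

```xml
<!-- Hypothetical names throughout. "body" stands in for your existing
     source field (type text_en). Add the remaining text_en filters to
     each chain as needed. -->

<!-- "Split"-only: word/number parts, no catenation, no original token,
     so WDGF emits a linear (non-graph) token stream. -->
<fieldType name="text_en_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
            generateWordParts="1" generateNumberParts="1" splitOnCaseChange="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
            generateWordParts="1" generateNumberParts="1" splitOnCaseChange="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "Catenate"-only: concatenated forms plus the original token,
     with no separate word/number parts emitted. -->
<fieldType name="text_en_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Copy the source text into both derived fields at index time. -->
<field name="body_split" type="text_en_split" indexed="true" stored="false"/>
<field name="body_catenate" type="text_en_catenate" indexed="true" stored="false"/>
<copyField source="body" dest="body_split"/>
<copyField source="body" dest="body_catenate"/>
```

At query time you would then search both fields, for example with edismax and `qf=body_split body_catenate`, so that a query can match either the split or the catenated representation without relying on position-length information in the index.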
(Note that analogous issues apply if you have multi-term synonyms at index time.) Some further relevant issues/blog posts:

https://issues.apache.org/jira/browse/LUCENE-4312
https://issues.apache.org/jira/browse/LUCENE-7398
https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
https://lucidworks.com/post/multi-word-synonyms-solr-adds-query-time-support/
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
https://www.elastic.co/blog/multitoken-synonyms-and-graph-queries-in-elasticsearch
https://michaelgibney.net/lucene/graph/

On Wed, Apr 21, 2021 at 7:34 PM gnandre <arnoldbron...@gmail.com> wrote:
> Hi,
>
> I have a field value as bim.ClassUnderlying and a search query as
> classunderlying does not return any results. If I search for
> classUnderlying, it works. What can I change so that it works for
> classunderlying query too? If I change splitOnCaseChange value from 1 to 0
> in index time analyzer chain, then it works but I don't want to do it
> because I want to extract class and underlying tokens too from
> classUnderlying word.
>
> Following is my field type definition.
>
> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
>             preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_en.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>     <filter class="solr.KStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" protected="protect.txt"
>             preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_en_query.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.KStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>