Could this be related?
https://solr.apache.org/guide/6_6/filter-descriptions.html#FilterDescriptions-WordDelimiterGraphFilter

"If you use this filter during indexing, you must follow it with a Flatten Graph Filter to squash tokens on top of one another like the Word Delimiter Filter, because the indexer can’t directly consume a graph. To get fully correct positional queries when tokens are split, you should instead use this filter at query time."

-----Original Message-----
From: Michael Gibney <mich...@michaelgibney.net>
Sent: Wednesday, November 17, 2021 12:07 PM
To: users@solr.apache.org
Subject: Re: Solr limit in words search - take 2

This is not the most thorough answer, but hopefully gets you headed in the right direction:

Very strange things can happen when your index-time analysis chain generates "graph" token-streams (as yours does). A couple of things you could try:

1. experiment with setting `enableGraphQueries=false` on the fieldtype
2. upgrading to solr >=8.1 may address your issue partially, via LUCENE-8730 -- here I go out on a limb in guessing that you're not _already_ on 8.1+ :-)
3. increase the phrase slop param, to be more lenient in matching "phrases". (as I say this I'm not sure it would actually help your case, because you're dealing with explicit phrases, and iirc phrase slop may only configure _implicit_ ("pf") phrase searches?)

The _best_ approach would be to configure your index-time analysis chain(s) so that they don't have multi-term "expand" synonyms, and WDGF either only splits ("generate*Parts", etc.) or only catenates ("catenate*", "preserveOriginal"). One approach that can work is to index into two fields, each with a dedicated index-time analysis type (split or catenate).

Some relevant issues:
https://issues.apache.org/jira/browse/LUCENE-7398
https://issues.apache.org/jira/browse/LUCENE-4312

Michael

On Wed, Nov 17, 2021 at 11:18 AM Scott <qm...@top-consulting.net> wrote:

> My apologies for the previous e-mail…should have never sent that as html
>
> I am facing a weird issue, possibly caused by my config.
>
> I have indexed a document which has a field called subject, subject is defined as:
>
> <field name="subject" type="partial_text_general"/>
>
> <fieldType name="partial_text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="0" splitOnCaseChange="1"
>             catenateWords="1" catenateNumbers="1" preserveOriginal="1" splitOnNumerics="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="45" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="0" splitOnCaseChange="1"
>             catenateWords="1" catenateNumbers="1" splitOnNumerics="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I have a document with subject field:
> <str>cobrancas E-mail marketing em dezembro, 2020 - referente ao uso de novembro</str>
>
> If I search for <str name="q">subject:"cobrancas e-mail"</str> then it finds the document, but if I search for
> <str name="q">subject:"cobrancas e-mail marketing"</str> I have no match.
>
> Why would this happen ?
>
> Thank you!
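
To make the guide passage quoted at the top of this reply concrete, here is a minimal sketch of the index-time analyzer with a Flatten Graph Filter placed directly after the Word Delimiter Graph Filter. The field type name `partial_text_general_flat` is hypothetical, the other filters are copied unchanged from the config quoted above, and this has not been tested against this index, so treat it as a starting point rather than a confirmed fix. Documents already indexed would need to be reindexed for a change like this to take effect.

```xml
<!-- Sketch only: "partial_text_general_flat" is a hypothetical name, not part of the original config. -->
<fieldType name="partial_text_general_flat" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="0" splitOnCaseChange="1"
            catenateWords="1" catenateNumbers="1" preserveOriginal="1" splitOnNumerics="0"/>
    <!-- Added: flattens the token graph produced by the filter above, since the indexer cannot consume a graph. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="45"/>
  </analyzer>
  <!-- The query analyzer from the original config is assumed to stay as it is; it is omitted here for brevity. -->
</fieldType>
```

This only covers the Flatten Graph Filter point. Michael's suggestions (`enableGraphQueries=false`, or indexing into separate split and catenate fields) are separate changes and are not shown here.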