Hi Drini, I would recommend investigating the code a bit. That token filter is meant to flatten multiple terms at the same position into a simple linear token stream, so it seems suspicious that it ends up merging two adjacent tokens and generating incorrect positions. Have you checked the position and positionLength attributes of the generated tokens?
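To make that check concrete, here is a minimal sketch in plain Python (not Lucene code) of how position increments turn into absolute positions. It assumes the usual model where the position starts at -1 and each token advances it by its position increment. The function names are hypothetical, and the increments are only inferred from the analysis output you describe, so please verify them against what the analysis tool actually shows.

```python
# Simplified model of token positions; this is NOT Lucene's implementation.
# Each token is (term, position_increment). Removed stop words show up as
# increments greater than 1 on the next surviving token.

def absolute_positions(tokens):
    """Return [(term, absolute_position)] from (term, increment) pairs."""
    position = -1  # assumed starting point, so an increment of 1 gives position 0
    result = []
    for term, increment in tokens:
        position += increment
        result.append((term, position))
    return result


def gap_between(tokens, first, second):
    """Distance in positions between the first occurrences of two terms."""
    positions = {}
    for term, pos in absolute_positions(tokens):
        positions.setdefault(term, pos)
    return positions[second] - positions[first]


# Inferred from the reported index-side output [_, mark, _, crown]
index_tokens = [("mark", 2), ("crown", 2)]
# Inferred from the reported query-side output [_, mark, _, _, crown]
query_tokens = [("mark", 2), ("crown", 3)]

print(absolute_positions(index_tokens))  # [('mark', 1), ('crown', 3)]
print(absolute_positions(query_tokens))  # [('mark', 1), ('crown', 4)]
print(gap_between(index_tokens, "mark", "crown"))  # 2
print(gap_between(query_tokens, "mark", "crown"))  # 3
```

If the real attributes look like this, the index side places "crown" two positions after "mark" while the query side expects three, which would be consistent with the exact phrase query failing.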
Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io

On Thu, 6 May 2021 at 19:54, Drini Cami <cdr...@gmail.com> wrote:

> Hello! I have a question about the text_en_splitting fieldType (solr 8.8.2,
> very vanilla schema). I noticed that it was failing for queries like:
> `title:"The Mark of the Crown"`, but succeeding for queries like
> `title:The Mark of the Crown`. Using the solr analysis tool, I noticed that
> the index analyzer converts "The Mark of the Crown" to `[_, mark, _, crown]`,
> but the query analyzer converts it to `[_, mark, _, _, crown]`. I then
> noticed the index analyzer has as a final filter FlattenGraphFilterFactory,
> which seems to combine adjacent `_`. I tried also adding
> FlattenGraphFilterFactory to the query analyzer and that fixed the issue.
> Is this a reasonable solution? If so, should that be the default? Or am I
> using the wrong fieldType altogether?
>
> Thank you,
>
> Drini
>