Hi Alessandro,

Which code do you recommend I look into? Solr's FlattenGraphFilterFactory, or a setting in my Solr schema?
This is the final result of the default index analyzer and query analyzer for "the mark of the crown", with position data:

Index: `[{ text: "mark", start: 4, end: 8, positionLength: 1, *position: 2*, ... }, { text: "crown", start: 16, end: 21, positionLength: 1, *position: 4*, ... }]`

Query: `[{ text: "mark", start: 4, end: 8, positionLength: 1, *position: 2*, ... }, { text: "crown", start: 16, end: 21, positionLength: 1, *position: 5*, ... }]`

Here's a screenshot of the full verbose analyzer tool output: https://user-images.githubusercontent.com/6251786/118551202-a4b90e00-b72b-11eb-92f1-4d4b13828d83.png

The only difference is that crown is at `position: 5` in the Query Analyzer, whereas in the Index Analyzer it was set to `position: 4` after passing through the FlattenGraphFilter.

Do you think this might in fact be a bug in the FlattenGraphFilter? Or does this look like expected behaviour?

Thank you,
Drini

On 2021/05/11 11:38:57, Alessandro Benedetti <a...@sease.io> wrote:
> Hi Drini,
> I would recommend investigating the code a bit, that token filter is meant
> to flat multiple terms at the same position to make it super simple so It
> seems suspicious that merging two adjacent tokens putting generated
> incorrect positions is what happens.
> Have you checked the positionLength, position attributes of the tokens
> generated?
>
> Cheers
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
> On Thu, 6 May 2021 at 19:54, Drini Cami <cd...@gmail.com> wrote:
>
> > Hello! I have a question about the text_en_splitting fieldType (solr 8.8.2,
> > very vanilla schema). I noticed that it was failing for queries like:
> > `title:"The Mark of the Crown"`, but succeeding for queries like
> > `title:The Mark of the Crown`. Using the solr analysis tool, I noticed that
> > the index analyzer converts "The Mark of the Crown" to `[_, mark, _, crown]`,
> > but the query analyzer converts it to `[_, mark, _, _, crown]`. I then
> > noticed the index analyzer has as a final filter FlattenGraphFilterFactory,
> > which seems to combine adjacent `_`. I tried also adding
> > FlattenGraphFilterFactory to the query analyzer and that fixed the issue.
> > Is this a reasonable solution? If so, should that be the default? Or am I
> > using the wrong fieldType altogether?
> >
> > Thank you,
> >
> > Drini
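
P.S. To illustrate why the position mismatch reported above would make the quoted phrase query fail, here is a minimal Python sketch of exact phrase matching (slop 0) over token positions. It is a simplified model, not Lucene's actual implementation: `phrase_matches` is a hypothetical helper, and the "flattened" query positions are an assumption based on the observation that adding FlattenGraphFilterFactory to the query analyzer fixed the issue.

```python
def phrase_matches(indexed, query):
    """Simplified exact-phrase check: every query term must occur in the
    indexed tokens at the same relative offset as in the query."""
    # Map each indexed term to the set of positions where it occurs.
    index_positions = {}
    for term, pos in indexed:
        index_positions.setdefault(term, set()).add(pos)

    if not query:
        return False

    first_term, first_pos = query[0]
    # Try aligning the first query term with each of its indexed positions.
    for start in index_positions.get(first_term, ()):
        if all(
            start + (pos - first_pos) in index_positions.get(term, ())
            for term, pos in query
        ):
            return True
    return False


# Positions reported by the analysis tool for "the mark of the crown".
index_tokens = [("mark", 2), ("crown", 4)]
query_tokens = [("mark", 2), ("crown", 5)]

# Assumption: with FlattenGraphFilterFactory also in the query analyzer,
# the query positions line up with the indexed ones.
query_tokens_assumed_fix = [("mark", 2), ("crown", 4)]

print(phrase_matches(index_tokens, query_tokens))
print(phrase_matches(index_tokens, query_tokens_assumed_fix))
```

In this model the first check fails because the query expects "crown" three positions after "mark" while the index holds it two positions after, which is consistent with the quoted phrase query failing and the unquoted query succeeding.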