Re: text_en_splitting with quotes not matching when there are 2 adjacent stopwords

Alessandro Benedetti Tue, 11 May 2021 04:39:25 -0700

Hi Drini,
I would recommend investigating the code a bit. That token filter is meant
to flatten multiple terms at the same position to keep things simple, so it
seems suspicious that it would merge two adjacent tokens and generate
incorrect positions.
Have you checked the positionLength and position attributes of the tokens
generated by each analyzer?
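
To illustrate what to look for, here is a rough toy model in Python (not
Lucene code). It assumes each token carries a position increment, that a
removed stopword shows up as an increment greater than 1, and that an exact
phrase query only matches when the relative positions agree. The two streams
are my reading of the analysis output quoted below; the function names are
made up for this sketch, and it assumes each term occurs only once.

```python
# Toy model, not Lucene code: a token stream is a list of
# (term, position_increment) pairs. Assumes each term occurs once.

def absolute_positions(tokens):
    """Turn (term, position_increment) pairs into {term: absolute position}."""
    positions = {}
    pos = -1  # the first increment of 1 lands on position 0
    for term, inc in tokens:
        pos += inc
        positions[term] = pos
    return positions


def phrase_matches(index_tokens, query_tokens):
    """Exact phrase match (no slop): offsets between terms must agree."""
    idx = absolute_positions(index_tokens)
    qry = absolute_positions(query_tokens)
    if not all(term in idx for term in qry):
        return False
    terms = list(qry)
    first = terms[0]
    return all(idx[t] - idx[first] == qry[t] - qry[first] for t in terms)


# Holes ("_") are encoded as position increments greater than 1.
index_stream = [("mark", 2), ("crown", 2)]  # [_, mark, _, crown]
query_stream = [("mark", 2), ("crown", 3)]  # [_, mark, _, _, crown]

print(absolute_positions(index_stream))  # {'mark': 1, 'crown': 3}
print(absolute_positions(query_stream))  # {'mark': 1, 'crown': 4}
print(phrase_matches(index_stream, query_stream))  # False
```

If the real attributes look like this, the index side has lost one hole
between "mark" and "crown", which would explain why the quoted phrase fails
while the unquoted query still matches.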


Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Thu, 6 May 2021 at 19:54, Drini Cami <cdr...@gmail.com> wrote:

> Hello! I have a question about the text_en_splitting fieldType (solr 8.8.2,
> very vanilla schema). I noticed that it was failing for queries like
> `title:"The Mark of the Crown"`, but succeeding for queries like
> `title:The Mark of the Crown`. Using the solr analysis tool, I noticed that
> the index analyzer
> converts "The Mark of the Crown" to `[_, mark, _, crown]`, but the query
> analyzer converts it to `[_, mark, _, _, crown]`. I then noticed the index
> analyzer has as a final filter FlattenGraphFilterFactory, which seems to
> combine adjacent `_`. I tried also adding FlattenGraphFilterFactory to the
> query analyzer and that fixed the issue. Is this a reasonable solution? If
> so, should that be the default? Or am I using the wrong fieldType
> altogether?
>
> Thank you,
>
> Drini
>
