Re: text_en_splitting with quotes not matching when there are 2 adjacent stopwords

2021-05-18 Thread Alessandro Benedetti
Hi Drini, from the analysis admin page you shared, the output looks incorrect to me. I would investigate a bit further, reproduce it via tests, and check the FlattenGraph token filter code! Cheers -- Alessandro Benedetti Apache Lucene/Solr Committer Director, R&D Software Engineer, S

Re: text_en_splitting with quotes not matching when there are 2 adjacent stopwords

2021-05-17 Thread Drini Cami
Hi Alessandro, Which code do you recommend I look into? Solr's FlattenGraphFilterFactory, or a setting in my Solr schema? This is the final result of the default index analyzer and query analyzer for "the mark of the crown" with position data: Index: `[{ text: "mark", start: 4, end: 8, posi

Re: text_en_splitting with quotes not matching when there are 2 adjacent stopwords

2021-05-11 Thread Alessandro Benedetti
Hi Drini, I would recommend investigating the code a bit. That token filter is meant to flatten multiple terms at the same position to keep things simple, so it seems suspicious that merging two adjacent tokens would generate incorrect positions. Have you checked the positionLength

text_en_splitting with quotes not matching when there are 2 adjacent stopwords

2021-05-06 Thread Drini Cami
Hello! I have a question about the text_en_splitting fieldType (Solr 8.8.2, very vanilla schema). I noticed that it was failing for queries like: `title:"The Mark of the Crown"`, but succeeding for queries like `title:The Mark of the Crown`. Using the Solr analysis tool, I noticed that the index an
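
For readers following the thread, here is a toy sketch of why a quoted (phrase) query can fail while the unquoted query succeeds when the index-time and query-time analyzers assign different positions around stopwords. This is not Solr or Lucene code: the `analyze` function, the `keep_gaps` flag, and the stopword set are hypothetical stand-ins, and the positions it produces are illustrative rather than the actual values from the analysis screen.

```python
# Toy model of position-sensitive phrase matching. Everything here is an
# assumption made for illustration; it does not reproduce Solr's analyzers.
STOPWORDS = {"the", "of"}


def analyze(text, keep_gaps=True):
    """Drop stopwords and return (term, position) pairs.

    With keep_gaps=True a removed stopword still consumes a position,
    leaving a hole. With keep_gaps=False the holes are collapsed.
    """
    tokens = []
    position = 0
    for word in text.lower().split():
        if word in STOPWORDS:
            if keep_gaps:
                position += 1  # the removed word still occupies a slot
            continue
        position += 1
        tokens.append((word, position))
    return tokens


def phrase_matches(index_tokens, query_tokens):
    """True if the query terms occur with the same relative positions."""
    if not query_tokens:
        return False
    indexed = set(index_tokens)
    first_term, first_pos = query_tokens[0]
    for term, pos in index_tokens:
        if term != first_term:
            continue
        shift = pos - first_pos
        if all((t, p + shift) in indexed for t, p in query_tokens):
            return True
    return False


def terms_match(index_tokens, query_tokens):
    """True if every query term is indexed, ignoring positions."""
    indexed_terms = {term for term, _ in index_tokens}
    return all(term in indexed_terms for term, _ in query_tokens)


title = "the mark of the crown"
indexed = analyze(title, keep_gaps=True)    # gaps kept on one side
queried = analyze(title, keep_gaps=False)   # gaps collapsed on the other

print(indexed, queried)
print("phrase:", phrase_matches(indexed, queried))
print("terms: ", terms_match(indexed, queried))
```

In this model the term-only check succeeds because positions are ignored, while the phrase check fails because the distance between "mark" and "crown" differs on the two sides. Whether that is the actual cause in text_en_splitting is exactly what the analysis output and the filter code would need to confirm.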