After reindexing with SGF, the document matches, as expected.
Still, it looks like SGF was designed to work well when used only at query time, and this is just a bug revealed by an edge case. Shall I submit an issue to https://github.com/apache/lucene?
On 11.01.2023 at 13:09, Dave wrote:
Yes, then that is a problem, and I agree it should be intuitive that the quotes work without the modifier. I'm not familiar enough with the underlying code to know for sure what's going on in this instance, but I wonder whether reindexing the content with the filter would fix it? You can experiment with just that one document and see.

Otherwise, you should have a plan for reindexing your content from scratch, as upgrades or new filters make it necessary. It's definitely inconvenient, but sometimes you've got to do what you've got to do, so it's better to be ready for it. A search index should always be considered temporary and replaceable: it's not a database, it's a tool for searching a data set. With that in mind, you put the index on replaceable hardware and have a plan for the machines to simply die and be replaced.

On Jan 11, 2023, at 6:27 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:

> On 11.01.2023 at 12:04, Dave wrote:
> > Hmm. As an experiment, what happens when you use a range of three or four with the quotes, using the tilde in the query?
>
> You mean a query like "test polskie"~1 ? Yes, it does match. Unfortunately, it's not a workaround I can use because the query is provided by the users. It's quite intuitive for them to use quotes, but not necessarily tildes. And if I added it artificially, it would be a slightly different query, which may not always be what the user wants.
>
> > Also, generally I find it best to use the same filters for both indexing and query. Just a personal preference; I know it's not always possible, however.
>
> The problem here is that I'd need to reindex documents when synonym definitions change, which is quite inconvenient. It would solve the problem if SGF did not increase the positions. Am I correct to assume that this is not the correct behavior and should be fixed? It doesn't do that when there's only one token at the position it modifies, for example: test(1) polski(2) -> test(1) pol(2) polski(2). Then the document does match.
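
To make the position argument above concrete, here is a minimal Python sketch of how a phrase match depends on token positions. It is a toy model, not Lucene's actual implementation: the function names `index_positions` and `phrase_matches` are hypothetical, and the "shifted" query positions are an assumption chosen only to illustrate the reported symptom (exact phrase fails, the same phrase with ~1 matches).

```python
from itertools import product


def index_positions(tokens):
    """Map each term to the set of positions where it occurs.

    `tokens` is a list of (term, position) pairs; several terms may
    share one position, as with stacked synonyms or stems.
    """
    postings = {}
    for term, pos in tokens:
        postings.setdefault(term, set()).add(pos)
    return postings


def phrase_matches(postings, query, slop=0):
    """Simplified phrase check (an assumption, not Lucene's algorithm).

    `query` is a list of (term, query_position) pairs. The phrase matches
    if some choice of document positions keeps every term's offset
    (document position minus query position) within `slop` of the others.
    """
    position_lists = []
    for term, _ in query:
        if term not in postings:
            return False
        position_lists.append(sorted(postings[term]))

    for combo in product(*position_lists):
        offsets = [doc_pos - q_pos for doc_pos, (_, q_pos) in zip(combo, query)]
        if max(offsets) - min(offsets) <= slop:
            return True
    return False


# Indexed document from the example in the thread: test(1) pol(2) polski(2)
doc = index_positions([("test", 1), ("pol", 2), ("polski", 2)])

# Query with adjacent positions: matches as an exact phrase.
adjacent_query = [("test", 1), ("pol", 2)]

# Hypothetical query whose second position was increased by one,
# standing in for the position shift described above.
shifted_query = [("test", 1), ("pol", 3)]

print(phrase_matches(doc, adjacent_query))         # exact phrase
print(phrase_matches(doc, shifted_query))          # exact phrase, shifted
print(phrase_matches(doc, shifted_query, slop=1))  # same phrase with ~1
```

In this model, a one-position shift in the query is enough to break an exact phrase match while a slop of 1 still absorbs it, which is consistent with the behavior reported for "test polskie" versus "test polskie"~1.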