I found a small change in the code that seems to fix the problem. Thank you, Dave, for the feedback!
On 11.01.2023 at 15:17, Dave wrote:
On one hand that's great news; on the other, it probably deserves a ticket, although you need a very specific scenario where your index filters don't match your query filters. Also, maybe spend some time putting together a reindexing plan. Solr can use multiple cores, so you can index content simultaneously if it's split up rather than run as a single indexing process. In Perl you can use forking via the process manager CPAN module; most other languages can do it as well (but not as well, imo).

On Jan 11, 2023, at 8:47 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:

After reindexing with SGF the document matches, as expected. Still, it looks like SGF was designed to work well when used only at query time, and this is just a bug revealed by an edge case. Shall I submit an issue to https://github.com/apache/lucene ?

On 11.01.2023 at 13:09, Dave wrote:

Yes, then that is a problem, and I agree it should be intuitive that the quotes work without the modifier. I'm not familiar enough with the underlying code to know for sure what's going on in this instance, but I wonder whether reindexing the content with the filter would fix it? You can experiment with just that one document and see. Otherwise, you should have a plan for reindexing your content from scratch, since upgrades and new filters make it necessary. It's definitely inconvenient, but sometimes you got to do what you got to do, so better to be ready for it. A search index should always be considered temporary and replaceable: it's not a database, it's a search tool for searching a data set. If you build it with that in mind, you put the index on replaceable hardware and have a plan for the machines to simply die and be replaced.

On Jan 11, 2023, at 6:27 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:

On 11.01.2023 at 12:04, Dave wrote:

> Hmm. As an experiment, what happens when you use a range of three or four with the quotes, using the tilde in the query?

You mean a query like "test polskie"~1 ? Yes, it does match.
Unfortunately it's not a workaround I can use, because the query is provided by the users. It's quite intuitive for them to use quotes, but not necessarily tildes. And if I added the tilde artificially, it would be a slightly different query, which may not always be what the user wants.

> Also, generally I find it best to use the same filters for both indexing and query. Just a personal preference; I know it's not always possible, however.

The problem here is that I'd need to reindex documents whenever the synonym definitions change, which is quite inconvenient. The problem would be solved if SGF did not increase the positions. Am I correct to assume that this is not the intended behavior and should be fixed? It doesn't do that when there's only one token at the position it modifies, for example:

test(1) polski(2) -> test(1) pol(2) polski(2)

Then the document does match.
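
For readers following the position arithmetic, here is a minimal Python sketch of why an exact phrase can fail while the same phrase with ~1 matches. It is not Lucene code: phrase_matches is a hypothetical helper that only checks two terms in order, whereas Lucene's real sloppy matching is more general. The "shifted" token list is an assumption made for illustration, since the thread does not show the exact positions produced in the failing case.

```python
from collections import defaultdict


def build_positions(tokens):
    """Map each term to the list of positions where it occurs."""
    positions = defaultdict(list)
    for term, pos in tokens:
        positions[term].append(pos)
    return positions


def phrase_matches(tokens, first, second, slop=0):
    """Simplified two-term phrase check (hypothetical helper):
    `second` must follow `first` with at most `slop` extra
    positions in between."""
    positions = build_positions(tokens)
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions[first]
        for p2 in positions[second]
    )


# Case quoted in the thread: the extra token is stacked on position 2.
stacked = [("test", 1), ("pol", 2), ("polski", 2)]

# Assumed failing case: the filter pushed "polski" one position further.
shifted = [("test", 1), ("pol", 2), ("polski", 3)]

print(phrase_matches(stacked, "test", "polski"))          # True
print(phrase_matches(shifted, "test", "polski"))          # False
print(phrase_matches(shifted, "test", "polski", slop=1))  # True
```

Under these assumptions, the exact phrase needs the two terms on adjacent positions, so a one-position shift at index time is enough to break it, while a slop of 1 tolerates the gap.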