If anyone's interested, I've submitted https://github.com/apache/lucene/issues/12080
I found a small change in the code that seems to fix the problem.
Thank you Dave for the feedback!

On 11.01.2023 at 15:17, Dave wrote:
On one hand that’s great news, on the other it probably deserves a ticket but 
you need to have a very specific scenario where your index filters don’t match 
your query filters.

Also maybe spend some time putting together a reindexing plan.  Solr can use 
multiple cores so you can index content simultaneously if it’s split up rather 
than a single indexing process. In Perl you can use forking via the process 
manager cpan module, most other languages do it as well (but not as well imo)
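
A rough sketch of the split-and-index-in-parallel idea in Python, for anyone not using Perl. It uses a thread pool rather than forked processes, and `send_batch` is a hypothetical callable supplied by the caller (for example, one that posts a batch of documents to a Solr core's update handler); neither it nor the other names come from this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(docs, n_batches):
    """Distribute docs round-robin into n_batches lists."""
    return [docs[i::n_batches] for i in range(n_batches)]


def index_in_parallel(docs, send_batch, workers=4):
    """Call send_batch(batch) concurrently, one batch per worker.

    send_batch is a hypothetical, caller-supplied function; this sketch
    makes no assumptions about how it talks to Solr.
    """
    # Drop empty batches, which occur when there are fewer docs than workers.
    batches = [b for b in split_into_batches(docs, workers) if b]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order in the returned results.
        return list(pool.map(send_batch, batches))
```

Passing `len` as `send_batch` is a quick way to check that every document ends up in exactly one batch before wiring in a real indexing call.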



On Jan 11, 2023, at 8:47 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:

After reindexing with SGF the document matches, as expected.

Still, it looks like SGF was designed to work well when used only in query, and 
it's just a bug revealed by an edge case. Shall I submit an issue to 
https://github.com/apache/lucene ?

On 11.01.2023 at 13:09, Dave wrote:
Yes then that is a problem, and I agree it should be intuitive that the quotes 
work without the modifier.  I’m not familiar with the underlying code enough to 
know for sure what’s going on in this instance, but I wonder whether reindexing 
the content with the filter would fix it? You can experiment with just that one 
document and see.

Otherwise you should have a plan for reindexing your content from scratch, as 
upgrades or new filters become necessary.  It’s definitely inconvenient, but 
sometimes you got to do what you got to do, so better to be ready for it. A 
search index should always be considered temporary and replaceable: it’s not 
a database, it’s a search tool to search a data set. If you build it with that 
in mind, you put the index on replaceable hardware and have a plan for the 
machines to simply die and be replaced.

On Jan 11, 2023, at 6:27 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:
On 11.01.2023 at 12:04, Dave wrote:
Hmm. As an experiment what happens when you use a range of three or four with 
the quotes using the tilde in the query?
You mean a query like "test polskie"~1 ? Yes, it does match.

Unfortunately it's not a workaround I can use because the query is provided by 
the users. It's quite intuitive for them to use quotes, but not necessarily 
tildes. And if I added the tilde artificially, it would be a slightly different 
query that may not always be what the user wants.

Also generally I find it best to use the same filters for both indexing and 
query, just a personal preference, I know it’s not always possible however.
The problem here is that I'd need to reindex documents when synonyms 
definitions change, which is quite inconvenient.
It should solve the problem if SGF did not increase the positions. Am I correct 
to assume it's not the correct behavior and should be fixed? It doesn't do that 
when there's only one token on the position it modifies, for example:

test(1) polski(2) -> test(1) pol(2) polski(2)

Then the document does match.
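
To illustrate why the position numbers matter for a quoted phrase, here is a minimal Python sketch of positional phrase matching. It is a simplified model, not Lucene's actual phrase-query implementation. The function name `phrase_matches`, the dict-based index, and the second index with a shifted position are all made up for illustration; the thread does not show the real positions of the failing document.

```python
def phrase_matches(index, terms, slop=0):
    """Simplified phrase check over a {term: [positions]} index.

    With slop=0 each term must sit at the position right after the
    previous one; slop is the total number of skipped positions allowed.
    This is an illustrative model only, not Lucene's algorithm.
    """
    first, *rest = terms
    for start in index.get(first, []):
        pos, budget, ok = start, slop, True
        for term in rest:
            later = [p for p in index.get(term, []) if p > pos]
            if not later:
                ok = False
                break
            gap = min(later) - pos - 1  # positions skipped between the terms
            if gap > budget:
                ok = False
                break
            budget -= gap
            pos = min(later)
        if ok:
            return True
    return False


# Positions from the example above: test(1) pol(2) polski(2)
stacked = {"test": [1], "pol": [2], "polski": [2]}

# Hypothetical positions (assumed, not from the thread): second term pushed to 3
shifted = {"test": [1], "polskie": [3]}
```

In this model `phrase_matches(stacked, ["test", "polski"])` succeeds because the added token shares position 2, while `phrase_matches(shifted, ["test", "polskie"])` fails unless `slop=1` is passed, which mirrors the behavior of the quoted query with and without `~1`.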


Attachment: smime.p7s
Description: S/MIME cryptographic signature
