After reindexing with SGF (SynonymGraphFilter), the document matches, as expected.

Still, it looks like SGF was designed to work well when used only at query time, and this is just a bug revealed by an edge case. Should I open an issue at https://github.com/apache/lucene for it?

On 11.01.2023 at 13:09, Dave wrote:
Yes, then that is a problem, and I agree it should be intuitive for the quotes
to work without the modifier. I'm not familiar enough with the underlying code
to know for sure what's going on in this instance, but I wonder whether
reindexing the content with the filter would fix it? You can experiment with
just that one document and see.

Otherwise, you should have a plan for reindexing your content from scratch,
since upgrades and new filters will make it necessary from time to time. It's
definitely inconvenient, but sometimes you've got to do what you've got to do,
so it's better to be ready for it. A search index should always be considered
temporary and replaceable: it's not a database, it's a tool for searching a
data set. With that in mind, you put the index on replaceable hardware and
expect (and plan for) it to simply die and be replaced.

On Jan 11, 2023, at 6:27 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:

On 11.01.2023 at 12:04, Dave wrote:
Hmm. As an experiment, what happens when you use a range of three or four with
the quotes, using the tilde in the query?
You mean a query like "test polskie"~1? Yes, it does match.

Unfortunately, it's not a workaround I can use, because the queries are
provided by users. Quotes are quite intuitive for them, but tildes not
necessarily. And if I added the tilde artificially, it would be a slightly
different query, which may not always be what the user wants.
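A rough model of why the exact phrase fails while "test polskie"~1 matches: each query term's document position minus its position in the query gives an offset, and a phrase matches when the spread of those offsets is at most the slop. This is a simplified sketch loosely based on the idea behind Lucene's sloppy phrase matching, not Lucene's actual code. The class and method names are hypothetical, and the query-side positions (polskie at 2 instead of 1) are an assumption chosen to mimic a filter bumping a position by one; they are not the actual token stream from this thread.

```java
import java.util.List;
import java.util.Map;

public class SloppyPhraseSketch {

    /** A query term together with the position the query analyzer assigned to it. */
    record QueryTerm(String term, int position) {}

    /**
     * Simplified phrase matching (hypothetical model, not Lucene's implementation):
     * offset = document position - query position for each term; the phrase
     * matches if max(offset) - min(offset) <= slop. Only one occurrence per term
     * is considered, and every query term must be present in the document.
     */
    static boolean matches(Map<String, Integer> docPositions, List<QueryTerm> query, int slop) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (QueryTerm qt : query) {
            Integer docPos = docPositions.get(qt.term());
            if (docPos == null) {
                return false; // a required term is missing from the document
            }
            int offset = docPos - qt.position();
            min = Math.min(min, offset);
            max = Math.max(max, offset);
        }
        return max - min <= slop;
    }

    public static void main(String[] args) {
        // Document indexed WITHOUT the synonym filter: consecutive positions.
        Map<String, Integer> doc = Map.of("test", 0, "polskie", 1);
        // Hypothetical query-side positions: the second term was pushed one position further.
        List<QueryTerm> query = List.of(new QueryTerm("test", 0), new QueryTerm("polskie", 2));

        System.out.println(matches(doc, query, 0)); // false: exact phrase "test polskie"
        System.out.println(matches(doc, query, 1)); // true: "test polskie"~1
    }
}
```

In this model the extra position on the query side costs exactly one unit of slop, which is why the ~1 query matches while the plain quoted phrase does not.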

Also, I generally find it best to use the same filters for both indexing and
querying; it's just a personal preference, and I know it's not always possible.
The problem here is that I'd need to reindex documents whenever the synonym
definitions change, which is quite inconvenient.
The problem would be solved if SGF did not increase the positions. Am I correct
to assume this is not the intended behavior and should be fixed? SGF doesn't do
that when there's only one token at the position it modifies, for example:

test(1) polski(2) -> test(1) pol(2) polski(2)

Then the document does match.
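To illustrate the difference in positions: in Lucene, each token carries a position increment (exposed through PositionIncrementAttribute), and a token's absolute position is the running sum of those increments, so a synonym emitted with an increment of 0 is stacked on the previous position. The sketch below is a hypothetical, dependency-free model of that bookkeeping (class and method names are made up), using 1-based positions to match the notation above. The "shifted" stream is an assumed example of a filter bumping the position, not SGF's actual output for the failing case.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PositionIncrementSketch {

    /** A token as a filter might emit it: the term plus its position increment. */
    record Token(String term, int posInc) {}

    /**
     * Accumulates position increments into absolute positions (1-based here to
     * match the thread's notation). A posInc of 0 stacks the token on the
     * previous token's position. Keeps the first position seen for each term.
     */
    static Map<String, Integer> positions(List<Token> tokens) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int pos = 0;
        for (Token t : tokens) {
            pos += t.posInc();
            result.putIfAbsent(t.term(), pos);
        }
        return result;
    }

    /** True if term b sits exactly one position after term a (an exact two-term phrase). */
    static boolean adjacent(Map<String, Integer> pos, String a, String b) {
        return pos.containsKey(a) && pos.containsKey(b) && pos.get(b) - pos.get(a) == 1;
    }

    public static void main(String[] args) {
        // test(1) pol(2) polski(2): the synonym is stacked, so polski has posInc 0.
        List<Token> stacked = List.of(
                new Token("test", 1), new Token("pol", 1), new Token("polski", 0));
        // Hypothetical: same tokens, but polski gets posInc 1 and moves to position 3.
        List<Token> shifted = List.of(
                new Token("test", 1), new Token("pol", 1), new Token("polski", 1));

        System.out.println(positions(stacked)); // {test=1, pol=2, polski=2}
        System.out.println(positions(shifted)); // {test=1, pol=2, polski=3}
        System.out.println(adjacent(positions(stacked), "test", "polski")); // true
        System.out.println(adjacent(positions(shifted), "test", "polski")); // false
    }
}
```

Only the stacked stream keeps "test" and "polski" adjacent, which is what an exact phrase query needs.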

