That’s awesome you found it! And of course, anytime. But again, having a complete reindex plan ready would be wise in my opinion. It’s just something that makes you feel a tad safer when the s and the fan hit each other. I’ve had to rebuild well over a terabyte of a Solr index in less than a couple of weeks, and the stress the first time was enough to make sure I was ready for when I needed to do it again, which of course I did.

> On Jan 12, 2023, at 10:02 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:
>
> If anyone's interested, I've submitted
> https://github.com/apache/lucene/issues/12080
> I found a small change in the code that seems to fix the problem.
> Thank you Dave for the feedback!
>
> On 11.01.2023 at 15:17, Dave wrote:
>> On one hand that’s great news, on the other it probably deserves a ticket,
>> but you need to have a very specific scenario where your index filters
>> don’t match your query filters.
>>
>> Also maybe spend some time putting together a reindexing plan. Solr can use
>> multiple cores, so you can index content simultaneously if it’s split up
>> rather than run as a single indexing process. In Perl you can use forking
>> via the process manager CPAN module; most other languages do it as well
>> (but not as well imo)
>>
>>>> On Jan 11, 2023, at 8:47 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:
>>>
>>> After reindexing with SGF the document matches, as expected.
>>>
>>> Still, it looks like SGF was designed to work well when used only in the
>>> query, and it's just a bug revealed by an edge case. Shall I submit an
>>> issue to https://github.com/apache/lucene ?
>>>
>>> On 11.01.2023 at 13:09, Dave wrote:
>>>> Yes, then that is a problem, and I agree it should be intuitive that the
>>>> quotes work without the modifier. I’m not familiar enough with the
>>>> underlying code to know for sure what’s going on in this instance, but
>>>> I wonder whether reindexing the content with the filter would fix it.
>>>> You can experiment with just that one document and see.
>>>>
>>>> Otherwise, reindexing your content from scratch should have a plan, as
>>>> upgrades and new filters become necessary. It’s definitely inconvenient,
>>>> but sometimes you got to do what you got to do, so better to be ready
>>>> for it. A search index should always be considered temporary and
>>>> replaceable: it’s not a database, it’s a search tool to search a data
>>>> set. If you work with that in mind, you put the index on replaceable
>>>> hardware and have a plan for the machines to simply die and be replaced.
>>>>
>>>>>> On Jan 11, 2023, at 6:27 AM, Mateusz Matela <mmat...@man.poznan.pl>
>>>>>> wrote:
>>>>> On 11.01.2023 at 12:04, Dave wrote:
>>>>>> Hmm. As an experiment, what happens when you use a range of three or
>>>>>> four with the quotes, using the tilde in the query?
>>>>> You mean a query like "test polskie"~1 ? Yes, it does match.
>>>>>
>>>>> Unfortunately it's not a workaround I can use, because the query is
>>>>> provided by the users. It's quite intuitive for them to use quotes, but
>>>>> not necessarily tildes. And if I added it artificially, it would be a
>>>>> slightly different query that may not always be what the user wants.
>>>>>
>>>>>> Also, generally I find it best to use the same filters for both
>>>>>> indexing and query. Just a personal preference; I know it’s not always
>>>>>> possible, however.
>>>>> The problem here is that I'd need to reindex documents when synonym
>>>>> definitions change, which is quite inconvenient.
>>>>> It should solve the problem if SGF did not increase the positions. Am I
>>>>> correct to assume it's not the correct behavior and should be fixed? It
>>>>> doesn't do that when there's only one token at the position it modifies,
>>>>> for example:
>>>>>
>>>>> test(1) polski(2) -> test(1) pol(2) polski(2)
>>>>>
>>>>> Then the document does match.
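
For anyone following along, the position issue discussed in the thread can be illustrated with a toy model. This is only a sketch under stated assumptions: `phrase_matches` is a made-up helper, the positions in `shifted` are hypothetical (the thread does not give the exact positions of the failing document), and the check is a simplified in-order version of phrase matching with slop, not Lucene's actual algorithm.

```python
from typing import Dict, List

# Hypothetical positional index for one document: term -> token positions.
Positions = Dict[str, List[int]]


def phrase_matches(index: Positions, terms: List[str], slop: int = 0) -> bool:
    """Simplified in-order phrase check (NOT Lucene's real implementation).

    Each term must come after the previous one, and the total number of
    skipped positions between consecutive terms must not exceed `slop`.
    """
    if not terms:
        return False

    def search(i: int, prev: int, budget: int) -> bool:
        if i == len(terms):
            return True
        for pos in index.get(terms[i], []):
            gap = pos - prev - 1  # positions skipped between the two terms
            if 0 <= gap <= budget and search(i + 1, pos, budget - gap):
                return True
        return False

    return any(search(1, start, slop) for start in index.get(terms[0], []))


# Positions from the thread's working example: test(1) pol(2) polski(2).
aligned: Positions = {"test": [1], "pol": [2], "polski": [2]}

# Assumed positions for the failing case: the second token and its synonym
# end up one position further away than expected.
shifted: Positions = {"test": [1], "pol": [3], "polski": [3]}

print(phrase_matches(aligned, ["test", "pol"]))          # True
print(phrase_matches(shifted, ["test", "pol"]))          # False
print(phrase_matches(shifted, ["test", "pol"], slop=1))  # True
```

In this model an exact phrase (slop 0) needs adjacent positions, so a one-position shift breaks the quoted query, while a slop of 1 tolerates it. That is consistent with the observation above that "test polskie"~1 matches when the plain quoted phrase does not.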