That’s awesome you found it! And of course, anytime.  But again, having a 
complete reindex plan ready would be wise in my opinion. Just something that 
makes you feel a tad safer when the s and the fan hit each other.  I’ve had to 
rebuild well over a terabyte of a Solr index in less than a couple of weeks, 
and the stress the first time was enough to make sure I was ready for when I 
needed to do it again, which of course, I did.
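
For what it's worth, here is a minimal sketch of the parallel part of such a 
plan, in Python. It assumes the documents are already loaded as a list of 
dicts and that each batch goes to a Solr JSON update handler; the function 
names, the URL, the batch size and the worker count are all made up for 
illustration, and threads stand in for whatever worker model you prefer.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def chunked(docs, size):
    """Yield consecutive batches of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def post_batch(update_url, batch):
    """POST one batch as a JSON array to a Solr update handler.

    `update_url` is a placeholder such as
    http://localhost:8983/solr/mycore/update (hypothetical host and core).
    """
    request = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def reindex(docs, send, batch_size=500, workers=4):
    """Push every batch through `send` using a pool of worker threads.

    Returns the number of documents handed to `send`.
    """
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all batches and re-raises any worker exception.
        list(pool.map(send, batches))
    return sum(len(batch) for batch in batches)
```

Something like reindex(docs, lambda batch: post_batch(url, batch)) would 
drive it, followed by a single commit once every batch is in. Passing the 
sender in as a function also lets you rehearse the plan against a fake sender 
before pointing it at a real core.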

> On Jan 12, 2023, at 10:02 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:
> 
> If anyone's interested, I've submitted 
> https://github.com/apache/lucene/issues/12080
> I found a small change in the code that seems to fix the problem.
> Thank you Dave for the feedback!
> 
> On 11.01.2023 at 15:17, Dave wrote:
>> On one hand that’s great news; on the other, it probably deserves a ticket, 
>> but you need to have a very specific scenario where your index filters don’t 
>> match your query filters.
>> 
>> Also maybe spend some time putting together a reindexing plan.  Solr can use 
>> multiple cores, so you can index content simultaneously if it’s split up 
>> rather than run a single indexing process. In Perl you can fork workers with 
>> the process manager CPAN module; most other languages can do it as well (but 
>> not as well, imo).
>> 
>> 
>> 
>>>> On Jan 11, 2023, at 8:47 AM, Mateusz Matela <mmat...@man.poznan.pl> wrote:
>>> 
>>> After reindexing with SGF the document matches, as expected.
>>> 
>>> Still, it looks like SGF was designed to work well when used only in query, 
>>> and it's just a bug revealed by an edge case. Shall I submit an issue to 
>>> https://github.com/apache/lucene ?
>>> 
>>> On 11.01.2023 at 13:09, Dave wrote:
>>>> Yes, then that is a problem, and I agree it should be intuitive that the 
>>>> quotes work without the modifier.  I’m not familiar enough with the 
>>>> underlying code to know for sure what’s going on in this instance, but I 
>>>> wonder whether reindexing the content with the filter would fix it. You 
>>>> can experiment with just that one document and see.
>>>> 
>>>> Otherwise, you should have a plan for reindexing your content from 
>>>> scratch, as upgrades and new filters become necessary.  It’s definitely 
>>>> inconvenient, but sometimes you got to do what you got to do, so better to 
>>>> be ready for it. A search index should always be considered temporary and 
>>>> replaceable: it’s not a database, it’s a tool for searching a data set. 
>>>> With that in mind, you put the index on replaceable hardware and have a 
>>>> plan for the machines to simply die and be replaced.
>>>> 
>>>>>> On Jan 11, 2023, at 6:27 AM, Mateusz Matela <mmat...@man.poznan.pl> 
>>>>>> wrote:
>>>>> On 11.01.2023 at 12:04, Dave wrote:
>>>>>> Hmm. As an experiment, what happens when you use a range of three or 
>>>>>> four with the quotes, using the tilde in the query?
>>>>> You mean a query like "test polskie"~1 ? Yes, it does match.
>>>>> 
>>>>> Unfortunately it's not a workaround I can use, because the query is 
>>>>> provided by the users. It's quite intuitive for them to use quotes, but 
>>>>> not necessarily tildes. And if I added the tilde artificially, it would 
>>>>> be a slightly different query that may not always be what the user wants.
>>>>> 
>>>>>> Also, generally I find it best to use the same filters for both indexing 
>>>>>> and query, just a personal preference. I know it’s not always possible, 
>>>>>> however.
>>>>> The problem here is that I'd need to reindex documents whenever the 
>>>>> synonym definitions change, which is quite inconvenient.
>>>>> It would solve the problem if SGF did not increase the positions. Am I 
>>>>> right to assume this is not the correct behavior and should be fixed? It 
>>>>> doesn't do that when there's only one token at the position it modifies, 
>>>>> for example:
>>>>> 
>>>>> test(1) polski(2) -> test(1) pol(2) polski(2)
>>>>> 
>>>>> Then the document does match.
>>>>> 
> 
