Re: Correct usage of synonyms with Japanese

2021-05-19 Thread Geoffrey Lawson
Thanks for the background, Mike! I am using the kuromoji tokenizer. Using discardCompoundToken is a good point; I had not considered that. To get the issue fixed, I've created a Jira ticket here: https://issues.apache.org/jira/browse/LUCENE-9966. geoff On Tue, May 18, 2021 at 11:07 PM Michae

Re: Lucene/Solr and BERT

2021-05-19 Thread Michael Wechner
Hi Alex, just to make sure I understand better what the additions are about. On 21.04.21 at 17:21, Alex K wrote: There were a couple of additions recently merged into Lucene but not yet released: - A first-class vector codec. Do you mean the classes inside https://github.com/apache/lucene/tree/ma

RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
Hi again, I found the difference causing the slowdown. It's the NRTCachingDirectory#doCacheWrite method. With the 8.8 implementation it's slow. With the 8.3 version it's fast. Hope it helps, Markus -Original Message- From: Gietzen, Markus Sent: Wednesday, 19 May 2021 13:55 T

Re: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Adrien Grand
LUCENE-9115 certainly creates more files in the FSDirectory than in the ByteBuffersDirectory, e.g. stored fields are now always flushed to the FSDirectory since their size can't be known in advance, while they were always written to the ByteBuffersDirectory before (which was a big since these files

RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
Hi, thanks for getting back to me so fast! Your hint that there were changes to NRTCachingDirectory was the right pointer: I copied the 8.3 NRTCachingDirectory implementation into the project (with a different class name, you get the idea 😉) and used that one. And believe it or not: everything is fin

Re: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Michael McCandless
> The update showed no issues (e.g. compiled without changes) but I noticed that our test-suites take a lot longer to finish. Hmm, that sounds bad. We need our tests to stay fast but also do a good job testing things ;) Does your production usage also slow down? Tests do other interesting thing

Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
Hello, recently I updated the Lucene version in one of our products from 8.3 to 8.8.x (8.8.2 as of now). The update showed no issues (e.g. compiled without changes) but I noticed that our test-suites take a lot longer to finish. So I took a closer look at one test-case which showed a severe slo