Re: Lucene9.11 has longer query warm up time for vector queries compared to lucene9.7

2024-10-30 Thread Rui Wu
Sorry, the attached image didn't go through. Here is a public Google Doc containing the image: https://docs.google.com/document/d/1thLMJIiJhbOBrL59R8JDY6iA_rpu7cgWZnrR0Z1xI5I/edit?tab=t.0 On Wed, Oct 30, 2024 at 1:41 PM Rui Wu wrote: > Dear users, > > We recently migrated fro

Lucene9.11 has longer query warm up time for vector queries compared to lucene9.7

2024-10-30 Thread Rui Wu
Dear users, We recently migrated from Lucene 9.7 to Lucene 9.11. During the migration, we noticed that Lucene 9.11 has a longer warm-up time for vector queries. By warm-up time we mean that, right after the index finishes building, query latency is high for the first few minutes. The following figure s

Re: How to find RAM/disk usage of each vector field

2024-10-30 Thread Rui Wu
Hi Tanmay, Are you bothered by the .vec files hidden within the compound files? If yes, I have a snippet that can sum up the .vec files inside and outside compound files. https://gist.github.com/wurui90/28de20d46079108d7ae5ed181ba939d4 On Tue, Oct 29, 2024 at 12:08 PM Tanmay Goel wrote: > Hi al
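A minimal sketch of the simpler half of that idea, using only the Java standard library: it sums the on-disk size of standalone `.vec` files directly under an index directory. It is not the linked gist. The class name `VecFileSizes` and method `sumStandaloneVecBytes` are hypothetical, and `.vec` data packed inside compound files is deliberately not counted, since reading those entries requires Lucene's own compound-file reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class VecFileSizes {

    /**
     * Sums the size in bytes of regular files ending in ".vec" directly under indexDir.
     * Assumption: vector data stored inside compound files is NOT included here.
     */
    public static long sumStandaloneVecBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".vec"))
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        // Surface I/O failures instead of silently skipping a file.
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no index path is given.
        Path indexDir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println("standalone .vec bytes: " + sumStandaloneVecBytes(indexDir));
    }
}
```

Run it with the index directory as the only argument. The result is a lower bound on vector storage whenever some segments use compound files.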

Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-10-04 Thread Rui Wu
mes. In my mental model, the result of query 1 is identical to that of query 2, so 1 can be optimized to 2. I wonder why Lucene doesn't do this optimization internally?" On Lucene 9.7, both queries invoke the score 1001 times. Thanks! On Fri, Sep 20, 2024 at 11:53 AM Rui Wu wrote: > Hi Adrien

Re: KnnQueries and result discrepancy between indexes with the same data

2024-10-02 Thread Rui Wu
Hi all, We happen to be testing similar things. Based on our experience: 1) For an index that is no longer changing, issuing the same queries repeatedly generates the same results. This holds even with concurrent segment search on. But we are not so sure if this still holds after https://g

Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-09-20 Thread Rui Wu
: > This suggests that BlockMaxConjunctionBulkScorer has a similar issue, I'll > look into it too. > > On Thu, Sep 19, 2024 at 2:48 AM Rui Wu wrote: > >> Hi Adrien, >> >> Thanks for your help and putting up a fix! >> >> Another experiment I did without

Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-09-18 Thread Rui Wu
works? Thanks! On Wed, Sep 18, 2024 at 1:51 AM Adrien Grand wrote: > Thank you, this last comment was helpful and helped me understand the > problem. I opened a PR at https://github.com/apache/lucene/pull/13800. > > On Tue, Sep 17, 2024 at 7:45 PM Rui Wu wrote: > >> Anothe

Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-09-17 Thread Rui Wu
clauses, it collects 3.6M results. On Tue, Sep 17, 2024 at 9:09 AM Rui Wu wrote: > This query latency increased from 14.65 to 20.90ms. > > We use the `TopScoreDocCollector.createSharedManager(/*batchSize*/ 101, > /*searchAfterFieldDoc*/ null, /*hitsThreshold*/ 1000); ` > > Thanks

Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-09-17 Thread Rui Wu
to take, and how long it > takes now? > Also are you using IndexSearcher's default total hit count threshold of > 1,000, or are you passing a custom value to TopScoreDocCollectorManager? > > On Tue, Sep 17, 2024 at 10:14 AM Rui Wu wrote: > >> Hi Adrien, >> >

Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-09-17 Thread Rui Wu
lame > graph, it looks like it may be truncated at the top? > > On Mon, Sep 16, 2024 at 10:01 PM Rui Wu wrote: > >> Correction: The index has 3.6 million documents. >> >> On Mon, Sep 16, 2024 at 1:00 PM Rui Wu wrote: >> >>> Dear experts, >>> &g

Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-09-16 Thread Rui Wu
Correction: The index has 3.6 million documents. On Mon, Sep 16, 2024 at 1:00 PM Rui Wu wrote: > Dear experts, > > In our MongoDB Atlas Search performance regression test between Lucene 9.7 > and Lucene 9.11, we detected a 43% latency regression in this query shape: > 12 SHOULD c

MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

2024-09-16 Thread Rui Wu
ime in search() is spent in the MaxScoreBulkScorer class: [image: image.png] Is this extreme test case expected to be slow with MaxScoreBulkScorer? Thanks a lot! Rui Wu Lead Engineer, MongoDB

Re: Question about the performance of Lucene99PostingsFormat

2024-09-16 Thread Rui Wu
Dear Adrien, We found that the match-all regression is not caused by the postings list format but by the MaxScoreBulkScorer class. Let me create a new email thread about it, since the title of this thread no longer applies. On Wed, Sep 11, 2024 at 6:24 PM Rui Wu

Re: Get knowledge about apache lucene index migrate

2024-09-12 Thread Rui Wu
Maybe a silly question: is it feasible (on your scale) to rebuild your index from your source of truth data? Thanks! On Tue, Aug 6, 2024 at 2:11 PM Michael Sokolov wrote: > Yes, there is no support for upgrading a pre-8.x index to 9 or later. > At some point it was decided that supporting that

Re: Question about the performance of Lucene99PostingsFormat

2024-09-11 Thread Rui Wu
the move from PFOR to > FOR significantly increased disk usage (unlike indexes that use > IndexOptions.DOCS_AND_FREQS_AND_POSITIONS where space is typically > dominated by positions anyway). > Got it. Thanks! > > On Tue, Sep 10, 2024 at 9:31 PM Rui Wu wrote: > > > Dear exp

Question about the performance of Lucene99PostingsFormat

2024-09-10 Thread Rui Wu
Dear experts, I have a question about the following change: Lucene 9.11 changed the postings list format (Lucene GITHUB#12696: Change Postings back to using FOR in Lucene99PostingsFormat. Freqs, positions and offset keep using PFOR). However, in our