Trading high latency for better throughput

2024-06-12 Thread Gautam Worah
Hi folks, I was wondering if people had thought about this problem before. In a search system, if you have clients that can tolerate high latency, is there a way to increase their latency and improve overall system throughput, while other clients demand reasonable latency and high …
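One way to sketch the prioritization half of this idea (purely illustrative; the class, tier names, and single-queue setup are my own assumptions, not anything in Lucene): latency-tolerant requests enter the same queue at a lower priority, so under load they naturally wait while latency-sensitive traffic is served first. The throughput win would then come from batching the deferred requests once they are drained together.

```java
import java.util.*;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch: one priority queue where latency-sensitive
// (INTERACTIVE) requests always jump ahead of latency-tolerant (BATCH) ones.
public class TieredSearchExecutor {
    public enum Tier { INTERACTIVE, BATCH }  // BATCH clients tolerate high latency

    static final class Request implements Comparable<Request> {
        final Tier tier;
        final long seq;       // preserves FIFO order within a tier
        final Runnable work;
        Request(Tier tier, long seq, Runnable work) {
            this.tier = tier; this.seq = seq; this.work = work;
        }
        @Override public int compareTo(Request o) {
            int byTier = tier.compareTo(o.tier);   // INTERACTIVE sorts first
            return byTier != 0 ? byTier : Long.compare(seq, o.seq);
        }
    }

    private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();
    private long seq = 0;

    public synchronized void submit(Tier tier, Runnable work) {
        queue.add(new Request(tier, seq++, work));
    }

    // Drain everything currently queued, returning the tiers in execution order.
    public List<Tier> drain() {
        List<Tier> order = new ArrayList<>();
        Request r;
        while ((r = queue.poll()) != null) {
            r.work.run();
            order.add(r.tier);
        }
        return order;
    }

    public static void main(String[] args) {
        TieredSearchExecutor ex = new TieredSearchExecutor();
        ex.submit(Tier.BATCH, () -> {});
        ex.submit(Tier.INTERACTIVE, () -> {});
        System.out.println(ex.drain());  // INTERACTIVE runs before BATCH
    }
}
```

A real system would additionally cap how long a BATCH request can be deferred, to avoid starvation.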

scalar quantization heap usage during merge

2024-06-12 Thread Michael Sokolov
Hi folks. I've been experimenting with our new scalar quantization support - yay, thanks for adding it! I'm finding that when I index a large number of large vectors, enabling quantization (vs. simply indexing the full-width floats) requires more heap: I keep getting OOMs and have to increase heap …

Re: scalar quantization heap usage during merge

2024-06-12 Thread Benjamin Trent
Heya Michael,

> the first one I traced was referenced by vector writers involved in a merge
> (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?

Yes, that is holding the raw floats before flush. You should see nearly the exact same overhead there as you would indexing raw vector …
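For a rough sense of what those buffered raw floats cost before flush, a back-of-envelope estimate (my own numbers and assumed array-header size, not anything from the thread) is numVectors × (dims × 4 bytes + per-array object overhead):

```java
// Illustrative heap estimate for raw float[] vectors buffered before flush.
public class VectorHeapEstimate {
    static long rawFloatBytes(long numVectors, int dims) {
        long arrayHeader = 16;                  // typical HotSpot array header (assumed)
        return numVectors * (dims * 4L + arrayHeader);
    }

    public static void main(String[] args) {
        // e.g. 1M vectors of 768 dims ≈ 3 GB of heap just for the floats
        System.out.println(rawFloatBytes(1_000_000, 768));
    }
}
```

At that scale it is easy to see how a merge that buffers vectors could tip a modestly sized heap into OOM.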

Re: scalar quantization heap usage during merge

2024-06-12 Thread Michael Sokolov
Empirically I thought I saw the need to increase JVM heap with this, but let me do some more testing to narrow down what is going on. It's possible the same heap requirements exist for the non-quantized case and I am just seeing some random vagary of the merge process happening to tip over a limit …

Re: scalar quantization heap usage during merge

2024-06-12 Thread Benjamin Trent
Michael,

Empirically, I am not surprised there is an increase in heap usage. We do have extra overhead with scalar quantization on flush. There may also be some additional heap usage on merge; I just don't think it is via Lucene99FlatVectorsWriter …
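To illustrate where transient overhead on flush can come from, here is a minimal int8 scalar-quantization sketch (this is my own simplification, not Lucene's actual implementation): each vector is mapped linearly into [0, 255], and for a moment the raw floats and the quantized bytes both live on the heap.

```java
// Illustrative int8 scalar quantization: flush briefly holds both the
// raw float[] and the quantized byte[] copy, a plausible source of
// extra transient heap beyond the raw-float indexing path.
public class ScalarQuantizeSketch {
    // Map each float linearly into [0, 255] between min and max, clamping.
    static byte[] quantize(float[] v, float min, float max) {
        float scale = (max > min) ? 255f / (max - min) : 0f;
        byte[] out = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            int q = Math.round((v[i] - min) * scale);
            out[i] = (byte) Math.min(255, Math.max(0, q));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] q = quantize(new float[] {0f, 0.5f, 1f}, 0f, 1f);
        // prints the clamped values as signed bytes
        System.out.println(java.util.Arrays.toString(q));
    }
}
```

On top of the copy itself, computing the min/max (or quantiles) used as quantization bounds requires an extra pass over the vectors, which is more work at flush time even if it is not much extra memory.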