Thanks for the PR, Ben. I'll try to take a look in the next couple of days.
On leave for now.
I got the setup working yesterday and thought I'd share some learnings.
I changed LiveIndexWriterConfig#ramBufferSizeMB to 2048 (MB), and that made
things work.
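For a sense of scale: ramBufferSizeMB (set with IndexWriterConfig#setRAMBufferSizeMB, default 16) caps how much RAM buffered documents may use before IndexWriter flushes them. The rough sketch below assumes 768-dimensional float vectors (the thread doesn't give the dimension) and counts only the 4-byte float payload. `RamBufferEstimate` is a hypothetical helper, not part of Lucene.

```java
/**
 * Back-of-the-envelope estimate of how many float vectors fit in the
 * IndexWriter RAM buffer before a flush. Hypothetical helper, not a Lucene API.
 * Counts only the 4-byte float payload; array headers, doc IDs and other
 * per-document overhead are ignored, so the real number will be lower.
 */
public class RamBufferEstimate {

    static final long BYTES_PER_FLOAT = Float.BYTES; // 4 bytes per component

    /** Raw payload bytes for one float[] vector of the given dimension. */
    static long bytesPerVector(int dims) {
        return BYTES_PER_FLOAT * dims;
    }

    /** Approximate number of vectors buffered before ramBufferSizeMB is reached. */
    static long vectorsPerBuffer(double ramBufferSizeMB, int dims) {
        long bufferBytes = (long) (ramBufferSizeMB * 1024 * 1024);
        return bufferBytes / bytesPerVector(dims);
    }

    public static void main(String[] args) {
        int dims = 768; // assumption: the dimension is not stated in the thread
        System.out.println("16 MB buffer (default): " + vectorsPerBuffer(16, dims) + " vectors");
        System.out.println("2048 MB buffer:         " + vectorsPerBuffer(2048, dims) + " vectors");
    }
}
```

Treat the result as an upper bound on vectors per flush; other buffered data shares the same budget.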
I was even able to keep merging on, and was
Hey Gautam & Michael,
I opened a PR that will help slightly: it should reduce heap usage
by a small factor. However, I would still expect the cost to be
dominated by the `float[]` vectors held in memory before flush.
https://github.com/apache/lucene/pull/13538
The other main overhead is the c
Hi Ben,
I am working on something very close to what Michael Sokolov has done.
I see OOMs in the writer when it tries to index 130M 8-bit / 4-bit
quantized vectors on a single big box with a 40 GB heap and HNSW disabled.
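Some quick arithmetic on the raw payload of 130M vectors at each width, compared against the 40 GB heap mentioned above. The 768 dimensions are an assumption, as is packing two 4-bit values per byte, and `CorpusSize` is a hypothetical helper. This only shows how much data is involved; it does not pinpoint where the OOM comes from.

```java
/**
 * Rough payload sizes for a corpus of vectors at different component widths.
 * Hypothetical helper; ignores Java object headers, doc IDs, any per-vector
 * correction values and the HNSW graph, so real usage is higher.
 */
public class CorpusSize {

    static final double GIB = 1024.0 * 1024 * 1024;

    /** Payload bytes for numVectors vectors of dims components at bitsPerComponent bits each. */
    static long payloadBytes(long numVectors, int dims, int bitsPerComponent) {
        // Round each vector up to whole bytes (assumes 4-bit values are packed two per byte).
        long bytesPerVector = ((long) dims * bitsPerComponent + 7) / 8;
        return numVectors * bytesPerVector;
    }

    public static void main(String[] args) {
        long numVectors = 130_000_000L; // from the thread
        int dims = 768;                 // assumption: dimension not stated
        double heapGiB = 40;            // heap size from the thread

        for (int bits : new int[] {32, 8, 4}) {
            double gib = payloadBytes(numVectors, dims, bits) / GIB;
            System.out.printf("%2d-bit: %7.1f GiB (%.1fx a %.0f GiB heap)%n",
                    bits, gib, gib / heapGiB, heapGiB);
        }
    }
}
```

Under these assumptions the three widths come to roughly 372, 93 and 46.5 GiB, so even the 4-bit codes alone would exceed a 40 GiB heap and indexing has to rely on regular flushes.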
I've tried indexing all the vectors as plain vectors converted to floats
co
Michael,
Empirically, I am not surprised there is an increase in heap usage. We
do have extra overhead with the scalar quantization on flush. There
may also be some additional heap usage on merge.
I just don't think it comes from Lucene99FlatVectorsWriter.
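To illustrate one way quantization can add memory on flush, here is a generic min/max scalar quantizer. This is a swapped-in textbook technique, not Lucene's ScalarQuantizer, which differs in several details (for example, how it chooses the bounds). `MinMaxScalarQuantizer` is hypothetical, and it stores one code per byte even at 4 bits (no packing).

```java
import java.util.Arrays;

/**
 * Generic min/max scalar quantizer. Hypothetical sketch for illustration only;
 * it is not Lucene's ScalarQuantizer.
 */
public class MinMaxScalarQuantizer {

    final float min;
    final float max;
    final int bits;

    MinMaxScalarQuantizer(float min, float max, int bits) {
        if (bits < 1 || bits > 8) throw new IllegalArgumentException("bits must be in [1, 8]");
        if (!(max > min)) throw new IllegalArgumentException("max must be > min");
        this.min = min;
        this.max = max;
        this.bits = bits;
    }

    /** Derive bounds from the buffered vectors (a full pass over the float[] data). */
    static MinMaxScalarQuantizer fromVectors(float[][] vectors, int bits) {
        float lo = Float.POSITIVE_INFINITY, hi = Float.NEGATIVE_INFINITY;
        for (float[] v : vectors) {
            for (float x : v) {
                lo = Math.min(lo, x);
                hi = Math.max(hi, x);
            }
        }
        return new MinMaxScalarQuantizer(lo, hi, bits);
    }

    /** Map each component to an integer level in [0, 2^bits - 1], one level per byte. */
    byte[] quantize(float[] vector) {
        int maxLevel = (1 << bits) - 1;
        float scale = maxLevel / (max - min);
        byte[] codes = new byte[vector.length]; // extra allocation alongside the float[]
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.min(max, Math.max(min, vector[i]));
            codes[i] = (byte) Math.round((clamped - min) * scale);
        }
        return codes;
    }

    /** Approximate reconstruction; levels are read back as unsigned bytes. */
    float[] dequantize(byte[] codes) {
        int maxLevel = (1 << bits) - 1;
        float step = (max - min) / maxLevel;
        float[] out = new float[codes.length];
        for (int i = 0; i < codes.length; i++) {
            out[i] = min + (codes[i] & 0xFF) * step;
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] buffered = { {0.1f, -0.4f, 0.9f}, {-1.0f, 0.5f, 0.0f} };
        MinMaxScalarQuantizer q = fromVectors(buffered, 8);
        // Both the float[] inputs and the byte[] codes are reachable here,
        // so peak memory is roughly 4 bytes + 1 byte per component.
        byte[][] codes = new byte[buffered.length][];
        for (int i = 0; i < buffered.length; i++) {
            codes[i] = q.quantize(buffered[i]);
        }
        System.out.println(Arrays.toString(buffered[1]) + " ~ " + Arrays.toString(q.dequantize(codes[1])));
    }
}
```

In this sketch the byte[] codes are created while the float[] inputs are still referenced, so peak usage is the floats plus the codes. That is consistent with extra overhead on flush, but it is an illustration under the assumptions above, not a description of Lucene's writer.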
On Wed, Jun 12, 2024 at 11:55 AM Michael So
Empirically, I thought I saw the need to increase the JVM heap with this,
but let me do some more testing to narrow down what is going on. It's
possible the same heap requirements exist for the non-quantized case,
and I am just seeing some random vagary of the merge process happening
to tip over a limit
Heya Michael,
> the first one I traced was referenced by vector writers involved in a merge
> (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?
Yes, that is holding the raw floats before flush. You should see
nearly the exact same overhead there as you would indexing raw
vector
Hi folks. I've been experimenting with our new scalar quantization
support - yay, thanks for adding it! I'm finding that when I index a
large number of large vectors, enabling quantization (vs simply
indexing the full-width floats) requires more heap - I keep getting
OOMs and have to increase heap