Hey Gautam & Michael,

I opened a PR that will help slightly. It should reduce the heap usage
by a smallish factor. But, I would still expect the cost to be
dominated by the `float[]` vectors held in memory before flush.

https://github.com/apache/lucene/pull/13538

The other main overhead is the creation of the ScalarQuantizer. Since
it requires sorting floating point arrays, I am not 100% sure how we
can get around the cost. I think we will always need to copy the
arrays to get the quantiles via `FloatSelector`.
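
To make that copy concrete, here is a rough sketch of the idea in plain
java.util code. It is not Lucene's actual implementation (and it does a
full sort where Lucene uses a selection algorithm), just an illustration
of why a transient copy of the values is needed:

import java.util.Arrays;

final class QuantileSketch {
  /** Returns {lowerQuantile, upperQuantile} for the given confidence interval. */
  static float[] quantiles(float[] values, float confidenceInterval) {
    // The copy is the transient heap cost: selecting/sorting reorders the
    // array, and we must not disturb the buffered vectors themselves.
    float[] copy = Arrays.copyOf(values, values.length);
    Arrays.sort(copy); // Lucene selects instead of fully sorting
    int n = copy.length;
    double tail = (1d - confidenceInterval) / 2d;
    int lowerIndex = (int) Math.floor(tail * (n - 1));
    int upperIndex = (int) Math.ceil((1d - tail) * (n - 1));
    return new float[] {copy[lowerIndex], copy[upperIndex]};
  }
}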

Since the ScalarQuantizer needs to make copies, and we can determine how
large those copies are, we should be able to give a better estimate of
all the memory required for flush (right now the cost of building the
ScalarQuantizer isn't included in the estimate, because its memory usage
is short-lived).
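
As a back-of-the-envelope illustration of what such an estimate could
look like (the names, the sampling term, and the example numbers below
are my own hypothetical sketch, not Lucene's actual accounting):

final class FlushHeapEstimate {
  // Two dominant terms discussed in this thread: the raw float[] vectors
  // buffered until flush, plus the short-lived copy the ScalarQuantizer
  // makes to compute quantiles over a sample of the vectors.
  static long estimateBytes(long numVectors, int dims, long sampledVectors) {
    long bufferedFloats = numVectors * dims * Float.BYTES;
    long quantileCopy = sampledVectors * dims * Float.BYTES;
    return bufferedFloats + quantileCopy;
  }

  public static void main(String[] args) {
    // e.g. 1M buffered vectors of 768 dims, with 25k vectors sampled:
    long bytes = estimateBytes(1_000_000, 768, 25_000);
    System.out.printf("~%d MB%n", bytes / (1024 * 1024));
  }
}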

I can open a different PR for that bug fix.

On Wed, Jul 3, 2024 at 12:43 AM Gautam Worah <worah.gau...@gmail.com> wrote:
>
> Hi Ben,
>
> I am working on something very close to what Michael Sokolov has done.
> I see OOMs on the Writer when it tries to index 130M 8 bit / 4 bit quantized 
> vectors on a single big box with a 40 GB heap, with HNSW disabled.
> I've tried indexing all the vectors as plain float vectors encoded as
> BinaryDocValues, and that worked fine.
> I tried smaller heap sizes starting at 20 GB, but they all failed. A 40 GB
> heap is already quite a lot, hence the deep dive.
> The Writer process is not doing anything else RAM-heavy, so I am assuming
> the memory is dominated by vectors.
> The vectors originally had 768 dimensions.
>
> My process was initially failing when it reached an index size of ~40 GB.
> The OOM failures happened around the time merges were running.
> I tried reducing the number of concurrent merges allowed and the number
> of segments that can be merged at once; that helped, but only a little.
> I was still seeing OOMs.
> Then I adopted NoMergePolicy and was able to build a ~240 GB quantized
> index, but that too OOMed before indexing all the docs.
> The ramBufferSizeMB is 4096, so roughly speaking it produces ~2.5 GB
> segments, plus multiple smaller ones per flush.
>
> I am assuming the quantization on flush is causing the failures, but I
> don't know which operation during flush is taking up so much memory.
> I don't think the quantization factor (bits) affects the memory much.
>
> Do we do the quantile calculation and the eventual quantization in a
> streaming fashion?
> Are there any other things that jump out to you as memory bottlenecks,
> or methods you would expect to be memory hungry?
>
> I have the heap dump and am also analyzing it myself.
>
> >  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
> of it (does this no matter what)
>
> This may have caused the problems, but if the ramBufferSizeMB is small
> and merges are disabled, it's hard to imagine how 40 GB could have been
> consumed.
>
> Best,
> Gautam Worah.
>
>
> On Wed, Jun 12, 2024 at 9:42 AM Benjamin Trent <ben.w.tr...@gmail.com> wrote:
>>
>> Michael,
>>
>> Empirically, I am not surprised there is an increase in heap usage. We
>> do have extra overhead with the scalar quantization on flush. There
>> may also be some additional heap usage on merge.
>>
>> I just don't think it is coming from Lucene99FlatVectorsWriter.
>>
>> On Wed, Jun 12, 2024 at 11:55 AM Michael Sokolov <msoko...@gmail.com> wrote:
>> >
>> >  Empirically I thought I saw the need to increase JVM heap with this,
>> > but let me do some more testing to narrow down what is going on. It's
>> > possible the same heap requirements exist for the non-quantized case
>> > and I am just seeing some random vagary of the merge process happening
>> > to tip over a limit. It's also possible I messed something up in
>> > https://github.com/apache/lucene/pull/13469 which I am trying to use
>> > in order to index quantized vectors without building an HNSW graph.
>> >
>> > On Wed, Jun 12, 2024 at 10:24 AM Benjamin Trent <ben.w.tr...@gmail.com> 
>> > wrote:
>> > >
>> > > Heya Michael,
>> > >
>> > > > the first one I traced was referenced by vector writers involved in a 
>> > > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this 
>> > > > expected?
>> > >
>> > > Yes, that is holding the raw floats before flush. You should see
>> > > nearly the exact same overhead there as you would indexing raw
>> > > vectors. I would be surprised if there is a significant memory usage
>> > > difference due to Lucene99FlatVectorsWriter when using quantized vs.
>> > > not.
>> > >
>> > > The flow is this:
>> > >
>> > >  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
>> > > of it (does this no matter what) and passes on to the next part of the
>> > > chain
>> > >  - If quantizing, the next part of the chain is
>> > > Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps a
>> > > REFERENCE to the array, it doesn't copy it. The float vector array is
>> > > then passed to the HNSW indexer (if it's being used), which also does
>> > > NOT copy, but keeps a reference.
>> > >  - If not quantizing but indexing, Lucene99FlatVectorsWriter will pass
>> > > it directly to the hnsw indexer, which does not copy it, but does add
>> > > it to the HNSW graph
>> > >
>> > > > I wonder if there is an opportunity to move some of this off-heap?
>> > >
>> > > I think we could do some things off-heap in the ScalarQuantizer, maybe
>> > > even during "flush", but we would have to adjust the interfaces somewhat
>> > > so that the ScalarQuantizer can know where the vectors are being
>> > > stored after the initial flush. Right now there is no way to know the
>> > > file or the file handle.
>> > >
>> > > > I can imagine that when we requantize we need to scan all the vectors 
>> > > > to determine the new quantization settings?
>> > >
>> > > We shouldn't be scanning every vector. We do take a sample, though
>> > > that sample can be large. There is an opportunity for off-heap work
>> > > here, though I don't know how we could do it before flush. I could
>> > > see the off-heap idea helping on merge.
>> > >
>> > > > Maybe we could do two passes - merge the float vectors while 
>> > > > recalculating, and then re-scan to do the actual quantization?
>> > >
>> > > I am not sure what you mean here by "merge the float vectors". If you
>> > > mean simply reading the individual float vector files and combining
>> > > them into a single file, we already do that separately from
>> > > quantizing.
>> > >
>> > > Thank you for digging into this. Glad others are experimenting!
>> > >
>> > > Ben
>> > >
>> > > On Wed, Jun 12, 2024 at 8:57 AM Michael Sokolov <msoko...@gmail.com> 
>> > > wrote:
>> > > >
>> > > > Hi folks. I've been experimenting with our new scalar quantization
>> > > > support - yay, thanks for adding it! I'm finding that when I index a
>> > > > large number of large vectors, enabling quantization (vs simply
>> > > > indexing the full-width floats) requires more heap - I keep getting
>> > > > OOMs and have to increase heap size. I took a heap dump, and not
>> > > > surprisingly I found some big arrays of floats and bytes, and the
>> > > > first one I traced was referenced by vector writers involved in a
>> > > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
>> > > > expected? I wonder if there is an opportunity to move some of this
>> > > > off-heap?  I can imagine that when we requantize we need to scan all
>> > > > the vectors to determine the new quantization settings?  Maybe we
>> > > > could do two passes - merge the float vectors while recalculating, and
>> > > > then re-scan to do the actual quantization?
>> > > >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
