benwtrent opened a new issue, #14208:
URL: https://github.com/apache/lucene/issues/14208
### Description
I am not sure of other structures, but HNSW merges can allocate a pretty
large chunk of memory on heap.
For example:
Suppose max_conn is set to 16; the bottom layer then allows 32 connections per
node.
We eagerly create the neighbor arrays, which means that for 9 million vectors
the heap allocation balloons to over 2GB (and, depending on the number of layers
and other structures, can exceed 2.5GB of heap).
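The back-of-envelope math behind that number can be sketched as follows. This is a rough estimate, not Lucene's actual accounting: it assumes each bottom-layer neighbor slot costs a 4-byte node id plus a 4-byte score (as in a `NeighborArray`-style structure) and ignores object headers and upper layers:

```java
// Rough estimate of eager bottom-layer neighbor-array heap usage for an
// HNSW merge. Assumes 4 bytes per node id + 4 bytes per score; ignores
// per-object overhead and the (geometrically smaller) upper layers.
public class HnswHeapEstimate {
  static long bottomLayerBytes(long numVectors, int maxConn) {
    int bottomConns = 2 * maxConn; // bottom layer doubles max_conn
    long bytesPerNeighbor = Integer.BYTES + Float.BYTES; // id + score
    return numVectors * bottomConns * bytesPerNeighbor;
  }

  public static void main(String[] args) {
    // 9M vectors, max_conn = 16 -> ~2.3GB for the bottom layer alone
    long bytes = bottomLayerBytes(9_000_000L, 16);
    System.out.printf("~%.2f GB%n", bytes / 1e9);
  }
}
```

With max_conn = 16 this lands at roughly 2.3GB for the bottom layer alone, consistent with the observed allocation.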
From what I can tell, merges don't really expose a "here is how much heap I am
estimated to use" figure.
I wonder if we can do one of the following to help this scenario:
- Make HNSW merges cheaper when it comes to on-heap memory (e.g. merge off
heap?!? make the structures cheaper??)
- Don't eagerly allocate all the memory required (this complicates
multi-threaded merging... and might not actually address the issue)
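One way the second option could look, lazily allocating each node's neighbor array on first touch rather than eagerly for all nodes, is sketched below. The class and method names are hypothetical, not Lucene's actual `OnHeapHnswGraph` API; the compare-and-set is what keeps this safe when multiple merge threads race on the same node:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical lazy neighbor storage: per-node arrays are created on first
// access instead of eagerly for every node up front. Peak heap then tracks
// the number of nodes actually inserted so far, not the total node count.
public class LazyNeighbors {
  private final AtomicReferenceArray<int[]> neighbors;
  private final int maxConns;

  public LazyNeighbors(int numNodes, int maxConns) {
    // Allocates only the slot array here, no per-node int[] yet.
    this.neighbors = new AtomicReferenceArray<>(numNodes);
    this.maxConns = maxConns;
  }

  // Returns the neighbor array for node, allocating it at most once even
  // when multiple merge threads race to initialize the same node.
  public int[] get(int node) {
    int[] arr = neighbors.get(node);
    if (arr == null) {
      int[] fresh = new int[maxConns];
      arr = neighbors.compareAndSet(node, null, fresh) ? fresh : neighbors.get(node);
    }
    return arr;
  }
}
```

The trade-off noted above still applies: deferring allocation adds a null check and a CAS on the hot path, so it may trade heap for merge speed.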
Note, this is tangential to this other HNSW merging issue, and might actually
be at odds with it, as reducing memory allocations sometimes implies slower
merging: https://github.com/apache/lucene/issues/12440
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]