benwtrent opened a new issue, #14208:
URL: https://github.com/apache/lucene/issues/14208
### Description
I am not sure of other structures, but HNSW merges can allocate a pretty
large chunk of memory on heap.
For example:
Suppose max_conn is set to 16; the bottom layer then allows 32 connections per
node.
We eagerly create the neighbor arrays, which means that for 9 million vectors
the heap allocation balloons to over 2GB (and, depending on the number of layers
and other structures, can exceed 2.5GB of heap).
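The back-of-envelope math behind that number can be sketched as follows. This is a rough estimate, not Lucene's actual accounting: it assumes each bottom-layer neighbor slot costs a 4-byte node id plus a 4-byte score (as in a `NeighborArray`-style structure) and ignores object headers and upper layers:

```java
// Rough estimate of eager bottom-layer neighbor-array heap usage for an
// HNSW merge. Assumes 4 bytes per node id + 4 bytes per score; ignores
// per-object overhead and the (geometrically smaller) upper layers.
public class HnswHeapEstimate {
  static long bottomLayerBytes(long numVectors, int maxConn) {
    int bottomConns = 2 * maxConn; // bottom layer doubles max_conn
    long bytesPerNeighbor = Integer.BYTES + Float.BYTES; // id + score
    return numVectors * bottomConns * bytesPerNeighbor;
  }

  public static void main(String[] args) {
    // 9M vectors, max_conn = 16 -> ~2.3GB for the bottom layer alone
    long bytes = bottomLayerBytes(9_000_000L, 16);
    System.out.printf("~%.2f GB%n", bytes / 1e9);
  }
}
```

With max_conn = 16 this lands at roughly 2.3GB for the bottom layer alone, consistent with the observed allocation.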
From what I can tell, merges don't really expose a "here is how much heap I am
estimated to use" figure.
I wonder if we can do one of the following to help this scenario:
- Make HNSW merges cheaper when it comes to on-heap memory (e.g. merge off
heap?!? make the structures cheaper??)
- Don't eagerly allocate all the memory required (this complicates
multi-threaded merging... and might not actually address the issue)
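One way the second option could look, lazily allocating each node's neighbor array on first touch rather than eagerly for all nodes, is sketched below. The class and method names are hypothetical, not Lucene's actual `OnHeapHnswGraph` API; the compare-and-set is what keeps this safe when multiple merge threads race on the same node:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical lazy neighbor storage: per-node arrays are created on first
// access instead of eagerly for every node up front. Peak heap then tracks
// the number of nodes actually inserted so far, not the total node count.
public class LazyNeighbors {
  private final AtomicReferenceArray<int[]> neighbors;
  private final int maxConns;

  public LazyNeighbors(int numNodes, int maxConns) {
    // Allocates only the slot array here, no per-node int[] yet.
    this.neighbors = new AtomicReferenceArray<>(numNodes);
    this.maxConns = maxConns;
  }

  // Returns the neighbor array for node, allocating it at most once even
  // when multiple merge threads race to initialize the same node.
  public int[] get(int node) {
    int[] arr = neighbors.get(node);
    if (arr == null) {
      int[] fresh = new int[maxConns];
      arr = neighbors.compareAndSet(node, null, fresh) ? fresh : neighbors.get(node);
    }
    return arr;
  }
}
```

The trade-off noted above still applies: deferring allocation adds a null check and a CAS on the hot path, so it may trade heap for merge speed.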
Note, this is tangential to this other HNSW merging issue, and might actually
be at odds with it, as reducing memory allocations sometimes implies slower
merging: https://github.com/apache/lucene/issues/12440
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]