mikemccand commented on issue #13251: URL: https://github.com/apache/lucene/issues/13251#issuecomment-2030070182
This is a neat idea -- it would allow the user to accept some "lossy compression" when they know/expect the loss will be minor for their use case. Sort of like JPEG vs RAW image encoding.

One question (I don't know enough about how HNSW merging works): if we did this, and segments where "only the quantized vectors remain" are merged, we would have to use those quantized vectors to build the next HNSW graph, right? (Whereas today we always go back to the full-precision vectors to build the graph for the newly merged segment?) Or are we always using the quantized vectors to build the graph?

I suppose, if using the quantized vectors at search time doesn't hurt much -- because the "quantization noise" in the resulting distance computation between two vectors is negligible -- then building the graph off the quantized forms should also not hurt much, since graph building is really just doing a bunch of searching to find the top-K vectors that should be linked up in the graph.
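To make the "negligible quantization noise" intuition concrete, here is a minimal sketch -- plain Python, not Lucene's actual quantization code, and assuming a fixed value range of [-1, 1] for simplicity -- that scalar-quantizes two float vectors to 8 bits and compares the dot product of the dequantized forms against the full-precision dot product:

```python
import random

def scalar_quantize(vec, lo, hi, bits=8):
    """Map each float in [lo, hi] to an integer code in [0, 2^bits - 1]."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    codes = [round((x - lo) / scale) for x in vec]
    return codes, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from integer codes."""
    return [lo + c * scale for c in codes]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
dim = 128
a = [random.uniform(-1.0, 1.0) for _ in range(dim)]
b = [random.uniform(-1.0, 1.0) for _ in range(dim)]

qa, scale = scalar_quantize(a, -1.0, 1.0)
qb, _ = scalar_quantize(b, -1.0, 1.0)

exact = dot(a, b)
approx = dot(dequantize(qa, -1.0, scale), dequantize(qb, -1.0, scale))

# Per-component reconstruction error is bounded by scale / 2 (~0.004 here),
# so the dot-product error stays small relative to typical distances,
# whether the distance is computed during search or during graph building.
err = abs(exact - approx)
```

Lucene's real scalar quantization derives the value range from the indexed vectors' quantiles rather than assuming [-1, 1], but the point is the same: the per-dimension error is bounded by half a quantization step, so the noise it injects into distance comparisons is small either way.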
