kaivalnp commented on PR #14863: URL: https://github.com/apache/lucene/pull/14863#issuecomment-3289707525
> I'll try to run a test with 8-bit quantization too, I realized this PR will implicitly support it :)

Here are the benchmarks:

### `cosine`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.584        1.470   1.469        0.999  200000   100      50       32        200     8 bits     10.32      19372.34           14.67             1          747.53       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.583        1.316   1.314        0.998  200000   100      50       32        200     8 bits     10.54      18978.93           13.66             1          747.16       733.185      147.247       HNSW
```

### `dot_product`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.858        1.444   1.442        0.999  200000   100      50       32        200     8 bits     10.38      19267.82           14.96             1          747.12       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.858        1.315   1.314        0.999  200000   100      50       32        200     8 bits     11.21      17836.44           14.28             1          747.16       733.185      147.247       HNSW
```

### `euclidean`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.933        1.360   1.359        0.999  200000   100      50       32        200     8 bits     14.75      13562.08           34.43             1          740.41       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.935        1.288   1.286        0.998  200000   100      50       32        200     8 bits     10.04      19920.32           12.47             1          740.53       733.185      147.247       HNSW
```

### `mip`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.857        1.400   1.399        0.999  200000   100      50       32        200     8 bits     10.70      18686.35           14.36             1          747.11       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.858        1.358   1.357        0.999  200000   100      50       32        200     8 bits     10.47      19109.50           14.30             1          747.15       733.185      147.247       HNSW
```

Indexing and force merge are non-trivially faster for `euclidean` (roughly 30% and 60% less time, respectively); I'm not sure if that run is an outlier. Search is slightly faster (3-10%) across all vector similarities.
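In case it helps anyone reproduce these runs outside of luceneutil, here's a minimal sketch of setting up an 8-bit scalar-quantized HNSW field with the same `maxConn=32` / `beamWidth=200` parameters. The codec class (`Lucene101Codec` here) and the exact `Lucene99HnswScalarQuantizedVectorsFormat` constructor are assumptions on my part and may need adjusting to whatever `main` currently ships:

```java
import java.nio.file.Path;
import java.util.Arrays;

import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene101.Lucene101Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.store.FSDirectory;

public class QuantizedHnswSketch {
  public static void main(String[] args) throws Exception {
    // Same graph/quantization parameters as the runs above:
    // maxConn=32, beamWidth=200, 8-bit scalar quantization.
    // Args: maxConn, beamWidth, numMergeWorkers, bits, compress, confidenceInterval, mergeExec
    KnnVectorsFormat format =
        new Lucene99HnswScalarQuantizedVectorsFormat(32, 200, 1, 8, false, null, null);

    IndexWriterConfig iwc = new IndexWriterConfig();
    // Per-field override of the KNN vectors format; Lucene101Codec stands in for
    // whatever the default codec class is on the branch being tested.
    iwc.setCodec(
        new Lucene101Codec() {
          @Override
          public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            return format;
          }
        });

    try (IndexWriter writer = new IndexWriter(FSDirectory.open(Path.of("/tmp/quantized-hnsw")), iwc)) {
      float[] vector = new float[768]; // stand-in for a real embedding
      Arrays.fill(vector, 0.1f);
      Document doc = new Document();
      // One of the similarities benchmarked above:
      // COSINE, DOT_PRODUCT, EUCLIDEAN, or MAXIMUM_INNER_PRODUCT (mip)
      doc.add(new KnnFloatVectorField("vector", vector, VectorSimilarityFunction.COSINE));
      writer.addDocument(doc);
      writer.forceMerge(1);
    }
  }
}
```

The similarity function is selected per field via `VectorSimilarityFunction`, which is the only thing that varies across the four tables above.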