kaivalnp commented on PR #14863: URL: https://github.com/apache/lucene/pull/14863#issuecomment-3289707525
> I'll try to run a test with 8-bit quantization too, I realized this PR will implicitly support it :)

Here are the benchmarks:

### `cosine`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.584        1.470   1.469        0.999  200000   100      50       32        200     8 bits     10.32      19372.34           14.67             1          747.53       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.583        1.316   1.314        0.998  200000   100      50       32        200     8 bits     10.54      18978.93           13.66             1          747.16       733.185      147.247       HNSW
```

### `dot_product`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.858        1.444   1.442        0.999  200000   100      50       32        200     8 bits     10.38      19267.82           14.96             1          747.12       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.858        1.315   1.314        0.999  200000   100      50       32        200     8 bits     11.21      17836.44           14.28             1          747.16       733.185      147.247       HNSW
```

### `euclidean`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.933        1.360   1.359        0.999  200000   100      50       32        200     8 bits     14.75      13562.08           34.43             1          740.41       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.935        1.288   1.286        0.998  200000   100      50       32        200     8 bits     10.04      19920.32           12.47             1          740.53       733.185      147.247       HNSW
```

### `mip`

`main`
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.857        1.400   1.399        0.999  200000   100      50       32        200     8 bits     10.70      18686.35           14.36             1          747.11       733.185      147.247       HNSW
```

This PR
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.858        1.358   1.357        0.999  200000   100      50       32        200     8 bits     10.47      19109.50           14.30             1          747.15       733.185      147.247       HNSW
```

Indexing and force merge are non-trivially faster for `euclidean` (roughly 30% and 60% less time, respectively); I'm not sure if that run is an outlier. Search is slightly faster (3-10%) across all vector similarities.
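In case it helps anyone reproduce these runs outside of luceneutil, here's a minimal sketch of setting up an 8-bit scalar-quantized HNSW field with the same `maxConn=32` / `beamWidth=200` parameters. The codec class (`Lucene101Codec` here) and the exact `Lucene99HnswScalarQuantizedVectorsFormat` constructor are assumptions on my part and may need adjusting to whatever `main` currently ships:

```java
import java.nio.file.Path;
import java.util.Arrays;

import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene101.Lucene101Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.store.FSDirectory;

public class QuantizedHnswSketch {
  public static void main(String[] args) throws Exception {
    // Same graph/quantization parameters as the runs above:
    // maxConn=32, beamWidth=200, 8-bit scalar quantization.
    // Args: maxConn, beamWidth, numMergeWorkers, bits, compress, confidenceInterval, mergeExec
    KnnVectorsFormat format =
        new Lucene99HnswScalarQuantizedVectorsFormat(32, 200, 1, 8, false, null, null);

    IndexWriterConfig iwc = new IndexWriterConfig();
    // Per-field override of the KNN vectors format; Lucene101Codec stands in for
    // whatever the default codec class is on the branch being tested.
    iwc.setCodec(
        new Lucene101Codec() {
          @Override
          public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            return format;
          }
        });

    try (IndexWriter writer = new IndexWriter(FSDirectory.open(Path.of("/tmp/quantized-hnsw")), iwc)) {
      float[] vector = new float[768]; // stand-in for a real embedding
      Arrays.fill(vector, 0.1f);
      Document doc = new Document();
      // One of the similarities benchmarked above:
      // COSINE, DOT_PRODUCT, EUCLIDEAN, or MAXIMUM_INNER_PRODUCT (mip)
      doc.add(new KnnFloatVectorField("vector", vector, VectorSimilarityFunction.COSINE));
      writer.addDocument(doc);
      writer.forceMerge(1);
    }
  }
}
```

The similarity function is selected per field via `VectorSimilarityFunction`, which is the only thing that varies across the four tables above.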