kaivalnp commented on PR #14863: URL: https://github.com/apache/lucene/pull/14863#issuecomment-3273523849
Sorry for the delay here! I ran the following benchmarks on 768d Cohere vectors for all vector similarities, with 4-bit (compressed) and 7-bit quantization. I needed to run 10k queries for reliable results (saw some variance with the default case of 1k queries).

### `cosine`

`main`

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.544        3.103   3.102        0.999  200000   100      50       32        200     4 bits     14.45      13842.75             4          670.05       659.943       74.005  HNSW
 0.505        4.499   4.497        1.000  200000   100      50       32        200     7 bits     14.03      14257.20             4          745.36       733.185      147.247  HNSW
```

This PR

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.543        2.854   2.852        0.999  200000   100      50       32        200     4 bits     14.57      13724.95             4          670.06       659.943       74.005  HNSW
 0.506        3.978   3.976        0.999  200000   100      50       32        200     7 bits     13.41      14912.02             4          745.09       733.185      147.247  HNSW
```

### `dot_product`

`main`

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.528        3.522   3.520        1.000  200000   100      50       32        200     4 bits     14.03      14258.22             4          674.69       659.943       74.005  HNSW
 0.881        4.303   4.301        1.000  200000   100      50       32        200     7 bits     14.41      13880.21             4          746.41       733.185      147.247  HNSW
```

This PR

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.528        3.218   3.217        1.000  200000   100      50       32        200     4 bits     13.60      14706.96             4          674.64       659.943       74.005  HNSW
 0.882        3.915   3.913        1.000  200000   100      50       32        200     7 bits     15.15      13205.68             4          746.44       733.185      147.247  HNSW
```

### `euclidean`

`main`

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.550        7.581   7.579        1.000  200000   100      50       32        200     4 bits     13.09      15284.68             4          667.46       659.943       74.005  HNSW
 0.936        3.938   3.937        1.000  200000   100      50       32        200     7 bits     12.88      15532.77             4          739.76       733.185      147.247  HNSW
```

This PR

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.550        2.422   2.420        0.999  200000   100      50       32        200     4 bits     13.27      15070.45             4          667.45       659.943       74.005  HNSW
 0.936        3.666   3.664        0.999  200000   100      50       32        200     7 bits     12.66      15796.54             4          739.73       733.185      147.247  HNSW
```

### `mip`

`main`

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.529        3.537   3.536        1.000  200000   100      50       32        200     4 bits     14.30      13988.95             4          674.69       659.943       74.005  HNSW
 0.882        4.280   4.278        1.000  200000   100      50       32        200     7 bits     14.18      14109.35             4          746.41       733.185      147.247  HNSW
```

This PR

```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.529        3.332   3.330        0.999  200000   100      50       32        200     4 bits     13.89      14401.96             4          674.65       659.943       74.005  HNSW
 0.882        3.876   3.874        0.999  200000   100      50       32        200     7 bits     13.87      14423.77             4          746.43       733.185      147.247  HNSW
```
The speedup in vector search time for 4-bit `euclidean` (~68%) seems amazing, because we used to decompress the bits into a `byte[]` and use the same [`squareDistance`](https://github.com/apache/lucene/blob/50a4f1864ef98f48abcfdd5202bd96693ee8b098/lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java#L789-L792) function, which did not take into account that the inputs are limited to the [0, 15] range, and we can make some optimizations using this information. We see a ~10% speedup in search time for everything else, while indexing is largely unaffected.

Sharing JMH benchmark results as well (the benchmark also checks the new functions for correctness):

```
java --module-path lucene/benchmark-jmh/build/benchmarks --module org.apache.lucene.benchmark.jmh "VectorUtilBenchmark.binaryHalfByte*" -p size=1024
```

```
Benchmark                                                                                                              (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedScalar      1024  thrpt   15   2.378 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15   0.472 ± 0.002  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductScalar                1024  thrpt   15   2.378 ± 0.002  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedScalar    1024  thrpt   15   2.448 ± 0.005  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  16.180 ± 0.082  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  20.947 ± 0.045  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedScalar          1024  thrpt   15   1.642 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  14.142 ± 0.031  ops/us
VectorUtilBenchmark.binaryHalfByteSquareScalar                    1024  thrpt   15   2.463 ± 0.003  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedScalar        1024  thrpt   15   2.022 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  16.340 ± 0.039  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  18.749 ± 0.055  ops/us
```
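For intuition on why the [0, 15] bound matters, here is a minimal scalar sketch, not the PR's implementation (which is vectorized via the Panama Vector API in `PanamaVectorUtilSupport`); the class/method names, the "one side packed" layout, and the low-nibble-first packing order are all hypothetical, chosen only for illustration:

```java
// Hypothetical sketch: squared distance over 4-bit quantized values, where
// one vector is packed two values per byte (low nibble first, an assumed
// layout) and the other is stored one value per byte. All values are in
// [0, 15], so each squared per-element difference is at most 15 * 15 = 225;
// a vectorized version can exploit this bound to accumulate many terms in
// narrow (16-bit) lanes before widening, which the generic byte[]
// squareDistance cannot assume.
public class HalfByteDistanceSketch {

  static int halfByteSquareDistance(byte[] packed, byte[] unpacked) {
    int sum = 0;
    for (int i = 0; i < packed.length; i++) {
      int lo = packed[i] & 0x0F;        // first 4-bit value in this byte
      int hi = (packed[i] >> 4) & 0x0F; // second 4-bit value in this byte
      int d1 = lo - unpacked[2 * i];
      int d2 = hi - unpacked[2 * i + 1];
      sum += d1 * d1 + d2 * d2;         // each term <= 225, so int never overflows here
    }
    return sum;
  }

  public static void main(String[] args) {
    // 4 dimensions: query = {1, 2, 3, 4} packed into two bytes, doc = {4, 3, 2, 1}
    byte[] packedQuery = {(byte) ((2 << 4) | 1), (byte) ((4 << 4) | 3)};
    byte[] doc = {4, 3, 2, 1};
    System.out.println(halfByteSquareDistance(packedQuery, doc)); // 9 + 1 + 1 + 9 = 20
  }
}
```

Since each squared difference is bounded by 225, a 16-bit signed lane can safely accumulate roughly 145 terms before reducing into an int, which is the kind of headroom a generic `byte[]` distance function cannot rely on.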