Hello all,

I have been looking at this benchmark against Vespa recently: https://blog.vespa.ai/elasticsearch-vs-vespa-performance-comparison/. (The report is behind an annoying email wall, but I'm copying the relevant data below, so hopefully you don't need to download it.) Even though it uses Elasticsearch to run the benchmark, it really benchmarks Lucene functionality: I don't believe that Elasticsearch does anything that meaningfully alters the results you would get by running Lucene directly.

The benchmark seems designed to highlight the benefits of Vespa's realtime design, which is fair game I guess. But it also runs some queries in read-only scenarios where I was expecting Lucene to perform better. One thing that got me curious is that it reports about 2x worse latency and throughput for pure unfiltered vector search on a force-merged index (so a single segment/graph). Does anybody know why Lucene's HNSW may perform slower than Vespa's HNSW? This seems consistent with results from https://ann-benchmarks.com/index.html, though I don't know whether the cause of the performance difference is the same.

For reference, here are details that apply to both Lucene's and Vespa's vector search:
- HNSW,
- float32 vectors, no quantization,
- embeddings generated using Snowflake's Arctic-embed-xs model,
- 1M docs,
- 384 dimensions,
- dot product,
- m = 16,
- max connections = 200,
- search for top 10 hits,
- no filter,
- single client, so no search concurrency,
- purple column is force-merged, so single segment/graph like Vespa.

I have never seriously looked at Lucene's vector search performance, so I'm very happy to be educated if I'm making naive assumptions!

Somewhat related, is this the reason why I'm seeing many threads around bringing 3rd party implementations into Lucene, including ones that are very similar to Lucene on paper? To speed up vector search?

--
Adrien