What does your synthetic randomized benchmark look like? Did you try different values for hnswMaxConnections and hnswBeamWidth? Do your curves differ wildly from https://ann-benchmarks.com/luceneknn.html?
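
For comparison, here is a minimal sketch of how such an error rate could be measured against brute-force ground truth. It is only an illustration under assumptions: `approx_search` is a hypothetical callable standing in for whatever issues the knn query against the index (it is not a Solr or Lucene API), and the function names, vector sizes, and seed are made up for the example.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors most similar to query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def suboptimal_rate(queries, vectors, approx_search, k=1):
    """Share of queries whose approximate top-k differs from the exact top-k."""
    misses = 0
    for query in queries:
        exact_ids = exact_top_k(query, vectors, k)
        if recall_at_k(approx_search(query, k), exact_ids) < 1.0:
            misses += 1
    return misses / len(queries)


# Small synthetic data set (sizes are arbitrary, chosen to run quickly).
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]


def exact_search(query, k):
    """Stand-in for the hypothetical approx_search; replace with a real knn call."""
    return exact_top_k(query, vectors, k)


rate = suboptimal_rate(queries, vectors, exact_search, k=1)
```

With the exact stand-in the rate is zero by construction. Swapping in the real query path and repeating the run for a few parameter settings would show how the rate moves, at the cost of one reindex per setting, since the graph is built at indexing time.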
On Tue, Jan 30, 2024 at 3:49 PM Moll, Dr. Andreas <m...@juris.de.invalid> wrote:
> Hi,
>
> the hnsw documentation for the Lucene HnswGraph and the SolR vector search
> is not very verbose, especially with regard to the parameters hnswMaxConn
> and hnswBeamWidth.
> I find it hard to come up with sensible values for these parameters by
> reading the paper from 2018.
> Does anyone have experience with the influence of the parameters on the
> results? As far as I understand the code, the graph is created at indexing
> time, so it would be time-intensive to come up with the optimal values for a
> specific use case by trial and error?
>
> We have a SolR index with roughly 100 million embeddings, and in a
> synthetic randomized benchmark around 14% of requests result
> in a suboptimal answer (based on the cosine vector similarity).
> I expected this "error" rate to be much smaller. I would love to hear your
> experiences.
>
> Best regards
>
> Andreas Moll