HNSW (knn_vector_cosine) index accuracy over time

Derek C Fri, 01 Nov 2024 07:21:26 -0700

Hi all,

This is something I'm unsure about:


We have a Solr collection of documents with knn_vector_cosine embedding
fields, which we use to run nearest-neighbour searches. We have replicas but
no shards, so every node has the entire core/collection of documents.

We add and remove documents continually, in nightly add and remove
batches.

What I'm wondering is this:

Because HNSW links vertices to each other, I don't know if, or how, Solr
"re-indexes" data as new documents are added and removed. So:
1) Does the accuracy (of the nearest neighbours returned for a supplied
embedding) get worse over time?
2) If I deleted all documents in the collection and re-loaded them (so
re-indexed them), would I get different (better?) results from a
nearest-neighbour knn query?
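
To make "accuracy" concrete, here is a rough sketch of how I imagine it
could be measured: compare the ids returned by the knn query against an
exact brute-force cosine ranking over the same vectors, and report the
overlap (recall at k). Everything here is an assumption for illustration:
the function names, the tiny in-memory `docs` dict, and the `approx` list
standing in for ids returned by a real knn query are all hypothetical, and
nothing in it calls Solr.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, docs, k):
    """Brute-force ranking: ids of the k vectors most similar to query.

    docs is a dict mapping document id -> embedding (hypothetical layout).
    """
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate result found."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


# Toy data standing in for the collection's embeddings.
docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
query = [1.0, 0.05]

exact = exact_top_k(query, docs, 2)
approx = ["a", "c"]  # hypothetical ids returned by the knn query
print(recall_at_k(approx, exact))  # 0.5: one of the two true neighbours found
```

Running the same set of queries before and after a full re-load, and
comparing the average recall, would be one way to answer question 2
empirically.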

Thank you for any information on this.

Derek


--
Derek Conniffe
Skype: dconnrt
Email: de...@hssl.ie


