Hi all again,

Just following up on this myself, and to see whether anyone else out there
who is also using KNN (nearest-neighbour) vector searches in SOLR has been
thinking about this.

There is an interesting article on Pinecone's website about HNSW, specifically
about databases that "bolt on" vector indexes. Of course Pinecone will
be a little bit biased when talking about other databases with vector
indexes :-) but it does seem to address exactly what I'm trying to
understand.

They say:

"HNSW was designed for relatively static datasets, though, and production
applications are anything but static. As your data changes, so do the
vector embeddings that represent your data, and, consequently, so must the
vector index"
...
"Periodically rebuild the index, and either tolerate downtime during the
rebuild process or manage a blue-green deployment each time".

So when I have a SOLR collection primarily used for vector searches, and I'm
creating and deleting documents throughout the day, I'm wondering whether the
vector index is becoming out of date, maybe with poorer and poorer results,
because I'm never rebuilding the SOLR index from scratch (as in their 2nd
quote above). Maybe I'll have to rebuild the SOLR index periodically?
(The "blue-green" method is probably the way to go.)
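
As a rough illustration of the "blue-green" idea, one option (assuming a
SolrCloud setup where clients query through a collection alias) would be to
rebuild into a fresh collection and then re-point the alias at it with the
Collections API CREATEALIAS action. The base URL, the alias name "docs" and
the collection name "docs_green" below are made-up placeholders, and the
sketch only builds the request URL rather than sending it:

```python
from urllib.parse import urlencode


def alias_swap_url(solr_base, alias, new_collection):
    """Build a Collections API CREATEALIAS URL that points `alias` at `new_collection`.

    All three arguments are hypothetical example values; nothing here is
    taken from a real deployment.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": new_collection,
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical example: after re-indexing into "docs_green", point the
# "docs" alias at it so queries move over to the rebuilt index.
url = alias_swap_url("http://localhost:8983/solr", "docs", "docs_green")
print(url)
```

The old collection would then be deleted (or kept as a fallback) once the
new one has been checked.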

Unless SOLR does somehow rebuild the HNSW vector index in the background?
But I haven't seen any documentation suggesting this.

If anyone does have any knowledge on this, or results from their own
experiments, I'm very interested.
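
One way such an experiment could be scored (a minimal sketch, not something
I have run against SOLR) is to compare the ids a knn query returns with the
exact nearest neighbours found by brute force over the same embeddings, and
track recall@k over time or before and after a full re-index. The tiny
vectors and the "approximate" id list below are invented for illustration,
and the embeddings are assumed to be non-zero:

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force the k ids most similar to `query` (vectors: id -> embedding)."""
    ranked = sorted(vectors, key=lambda doc_id: cosine(query, vectors[doc_id]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


# Invented toy data standing in for the collection's embeddings.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
query = [1.0, 0.0]

exact = exact_top_k(query, vectors, 2)
approx = ["a", "c"]  # hypothetical ids returned by a knn query
print(exact, recall_at_k(approx, exact))
```

If recall against the brute-force answer drops as documents are added and
deleted, and recovers after a full re-index, that would suggest the graph
is degrading.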

Derek

p.s. I'm very interested in sticking with SOLR for the vector queries
because I want to combine vector searches WITH other, traditional
queries, and I think the later versions of SOLR are better at this
(and not just as a filter query on the knn results). I haven't had a chance
to try this out recently, though: I only experimented with mixing vector and
traditional options in an earlier version of SOLR.
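
For reference, the simplest form of mixing the two looks something like the
sketch below: the knn query parser as the main query plus an ordinary fq.
This is only a sketch that builds the request parameters; the field name
"embedding" and the filter "category:news" are hypothetical, and how the fq
interacts with the knn search depends on the SOLR version, so check the
Dense Vector Search section of the reference guide for your release:

```python
def knn_query_params(field, vector, top_k, filter_query=None):
    """Build request parameters for a knn query, optionally with a filter query.

    `field` must be a dense vector field in the schema; the names used in
    the example call are placeholders, not taken from a real schema.
    """
    vec = "[" + ", ".join(str(float(x)) for x in vector) + "]"
    params = {"q": f"{{!knn f={field} topK={top_k}}}{vec}"}
    if filter_query:
        params["fq"] = filter_query
    return params


# Hypothetical example: 3 nearest neighbours restricted by a normal filter.
params = knn_query_params("embedding", [1, 0.5], 3, "category:news")
print(params)
```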


On Fri, Nov 1, 2024 at 2:20 PM Derek C <de...@hssl.ie> wrote:

> Hi all,
>
> This is something I'm unsure about:
>
> We have a SOLR collection of documents with knn_vector_cosine embedding
> fields which we use to run nearest neighbor searches.  We have replicas but
> no shards so every node has the entire core/collection of documents.
>
> We are adding and removing documents all the time in nightly add and
> remove batches.
>
> What I'm wondering is: -
>
> Because HNSW relates vertices to each other I don't know if, or how, SOLR
> "re-indexes" data as new documents are added and removed. So :
> 1) Does the accuracy (of nearest neighbours to a supplied embedding) get
> worse over time?
> 2) If I deleted all documents in the collection and re-loaded them (so
> re-indexed them) would I get different (better?) results to a
> nearest-neighbour knn query ?
>
> thank you for any information on this
>
> Derek
>
>
> --
> Derek Conniffe
> Skype: dconnrt
> Email: de...@hssl.ie
>
>
> *Disclaimer:* This email and any files transmitted with it are
> confidential and intended solely for the use of the individual or entity to
> whom they are addressed. If you have received this email in error please
> delete it (if you are not the intended recipient you are notified that
> disclosing, copying, distributing or taking any action in reliance on the
> contents of this information is strictly prohibited).
> *Warning*: Although HSSL have taken reasonable precautions to ensure no
> viruses are present in this email, HSSL cannot accept responsibility for
> any loss or damage arising from the use of this email or attachments.
> For the Environment, please only print this email if necessary.
>
>


-- 
Derek Conniffe
Harvey Software Systems Ltd T/A HSSL
Telephone (IRL): 086 856 3823
Telephone (US): (650) 449 6044
Skype: dconnrt
Email: de...@hssl.ie


