Re: Sub-Graphs in Hnsw

2025-06-02 Thread Michael Sokolov
I wonder if you could influence the graph search by incorporating the partition key (customer id?) into the vectors somehow? If this were done well, it should lead to a natural clustering of the graph.

On Mon, Jun 2, 2025 at 11:32 AM Ravikumar Govindarajan wrote:
> Hi Michael,
>
> The docs range co
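
One way to read the suggestion of folding the partition key into the vectors is sketched below. This is only an illustration under stated assumptions, not something from the thread or from Lucene: it assumes Euclidean distance, and the helper names (`partition_embedding`, `augment`), the extra dimension count, and the `weight` value are all hypothetical. Each vector gets a few extra components derived deterministically from the partition key, so vectors in the same partition keep their original distances while vectors in different partitions are pushed apart.

```python
import hashlib
import math
import random


def partition_embedding(partition_key, dims=8):
    """Deterministic pseudo-random unit vector derived from the partition key (hypothetical helper)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    raw = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def augment(vector, partition_key, weight=2.0, dims=8):
    """Append the scaled partition embedding to the original vector.

    `weight` is an assumed tuning knob: larger values separate partitions more.
    """
    extra = [weight * x for x in partition_embedding(partition_key, dims)]
    return list(vector) + extra


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    a, b = [0.1, 0.2, 0.3], [0.1, 0.25, 0.3]
    print("original distance:      ", l2(a, b))
    print("same partition:         ", l2(augment(a, "user1"), augment(b, "user1")))
    print("different partitions:   ", l2(augment(a, "user1"), augment(b, "user2")))
```

Query vectors would need the same augmentation with the caller's partition key. With cosine or dot-product similarity the effect differs, so the weighting would have to be reconsidered for those metrics.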

Re: Sub-Graphs in Hnsw

2025-06-02 Thread Ravikumar Govindarajan
Hi Michael,

The doc range could vary widely, from a few tens to tens of thousands and, in very heavy usage cases, 100k and above, all in a single segment. Filtered Hnsw, as you said, uses a single graph, which could be better if designed as sub-graphs.

On Mon, 2 Jun 2025 at 5:42 PM, Michael Sokolo

Re: Sub-Graphs in Hnsw

2025-06-02 Thread Michael Sokolov
How many documents do you anticipate in a typical sub-range? If it's in the hundreds or even low thousands, you would be better off without hnsw. Instead, you can use a function score query based on the vector distance. For larger numbers where hnsw becomes useful, you could try using filtered hnsw,
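
The small-range case described above amounts to scoring every vector in the range exactly and keeping the best few. The sketch below shows that idea in plain Python; it is a stand-in for the concept, not Lucene's function score query, and `exact_top_k` and its parameters are hypothetical names. It assumes Euclidean distance and an inclusive ord range.

```python
import heapq
import math


def exact_top_k(query, vectors, ord_start, ord_end, k):
    """Brute-force nearest neighbours over ords in [ord_start, ord_end].

    Returns up to k (ord, distance) pairs, nearest first. No graph is used:
    every vector in the range is scored against the query.
    """
    scored = (
        (math.dist(query, vectors[ord_]), ord_)
        for ord_ in range(ord_start, ord_end + 1)
    )
    return [(ord_, dist) for dist, ord_ in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [0.1, 0.0]]
    # Only ords 1..3 belong to the sub-range being searched.
    print(exact_top_k([0.0, 0.0], vectors, 1, 3, 2))
```

The cost is linear in the size of the range, which is why this only makes sense for ranges in the hundreds or low thousands.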

Sub-Graphs in Hnsw

2025-06-02 Thread Ravikumar Govindarajan
We use index sorting to arrange segment data. The ord ranges for any given KnnVectorField are mutually exclusive.

Ex:
field: content
OrdRange -> 0-100 (User1)
OrdRange -> 101-300 (User2)
and so on.

Each OrdRange has to be a self-contained Hnsw graph with all neighbours strictly inside the given O
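
The ord-range layout in this message can be modelled in a few lines to make the constraint concrete. This is a sketch under assumptions, not Lucene code: `OrdRanges`, `range_of`, and `is_self_contained` are hypothetical names, ranges are taken as inclusive, and the graph is represented as a plain dict from ord to neighbour ords. It looks up the range that owns an ord and checks that every neighbour of a node stays inside that node's range.

```python
import bisect


class OrdRanges:
    """Mutually exclusive, inclusive ord ranges, e.g. (0, 100, "User1")."""

    def __init__(self, ranges):
        self._ranges = sorted(ranges)
        # Reject overlapping ranges, since the layout requires exclusivity.
        for (_, prev_end, _), (next_start, _, _) in zip(self._ranges, self._ranges[1:]):
            if next_start <= prev_end:
                raise ValueError("ord ranges must not overlap")
        self._starts = [start for start, _, _ in self._ranges]

    def range_of(self, ord_):
        """Return the (start, end, owner) range containing ord_, or None."""
        i = bisect.bisect_right(self._starts, ord_) - 1
        if i >= 0:
            start, end, owner = self._ranges[i]
            if start <= ord_ <= end:
                return start, end, owner
        return None


def is_self_contained(neighbours, ranges):
    """True if every node's neighbours lie inside the node's own ord range."""
    for node, nbrs in neighbours.items():
        home = ranges.range_of(node)
        if home is None:
            return False
        start, end, _ = home
        if any(not (start <= n <= end) for n in nbrs):
            return False
    return True


if __name__ == "__main__":
    ranges = OrdRanges([(0, 100, "User1"), (101, 300, "User2")])
    print(ranges.range_of(150))
    print(is_self_contained({5: [7, 99], 150: [101, 300]}, ranges))
    print(is_self_contained({5: [7, 101]}, ranges))
```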