I wonder if you could influence the graph search by incorporating the
partition key (customer id?) into the vectors somehow? If this were done
well, it should lead to a natural clustering of the graph.
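One rough way to picture this idea, as a sketch only: append a few extra dimensions derived from the partition key, so that vectors sharing a key keep their original distances while vectors with different keys are pushed apart. The helper names (key_direction, augment), the number of extra dimensions, and the weight are all assumptions made up for illustration. None of this is Lucene API, and whether it helps real HNSW recall is untested.

```python
import hashlib
import math
import random


def key_direction(partition_key, dims):
    """Deterministic unit vector derived from the partition key (hypothetical helper)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    raw = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def augment(vector, partition_key, extra_dims=8, weight=2.0):
    """Append weighted key-derived components to the original vector."""
    return list(vector) + [weight * x for x in key_direction(partition_key, extra_dims)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [0.1, 0.2], [0.3, 0.1]
same_key = sq_dist(augment(a, "User1"), augment(b, "User1"))
cross_key = sq_dist(augment(a, "User1"), augment(b, "User2"))
print(same_key, cross_key)
```

Within one key the appended components cancel, so the ranking inside a partition is unchanged. Across keys the distance grows with the chosen weight, which is what would nudge the graph toward per-partition clusters.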
On Mon, Jun 2, 2025 at 11:32 AM Ravikumar Govindarajan
wrote:
Hi Michael,
The docs range could vary widely, from a few tens to tens of thousands
and, in very heavy usage cases, 100k and above in a single segment.
Filtered Hnsw, as you said, uses a single graph, which could work better if
designed as sub-graphs.
On Mon, 2 Jun 2025 at 5:42 PM, Michael Sokolo
How many documents do you anticipate in a typical sub range? If it's in the
hundreds or even low thousands you would be better off without hnsw.
Instead you can use a function score query based on the vector distance.
For larger numbers where hnsw becomes useful, you could try using filtered
hnsw,
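For the small-range case mentioned above, exact scoring amounts to computing the distance from the query to every vector in the range and keeping the best k. A minimal sketch, assuming squared Euclidean distance and an in-memory list indexed by ordinal; exact_top_k is a hypothetical name and this is not how Lucene's function score query is implemented.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, ord_start, ord_end, k):
    """Return the k nearest ordinals in [ord_start, ord_end], closest first."""
    scored = ((sq_dist(query, vectors[o]), o) for o in range(ord_start, ord_end + 1))
    return [o for _, o in heapq.nsmallest(k, scored)]


vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.9, 0.1], [9.0, 9.0]]
print(exact_top_k([1.0, 0.0], vectors, 1, 3, k=2))
```

The cost is linear in the size of the range, which is why it only makes sense for ranges of hundreds or low thousands of documents.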
We use index-sorting to arrange segment data. The ord-ranges for any given
KnnVectorField are mutually exclusive
Ex:
field: content
OrdRange -> 0-100 (User1)
OrdRange -> 101-300 (User2)
and so on..
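To make the layout above concrete, here is a small sketch of looking up which range an ordinal falls into, assuming the ranges are kept sorted by their first ordinal and never overlap. ORD_RANGES and range_for_ord are hypothetical names for illustration, not anything in Lucene.

```python
from bisect import bisect_right

# (first_ord, last_ord, owner), sorted by first_ord and mutually exclusive
ORD_RANGES = [(0, 100, "User1"), (101, 300, "User2")]
STARTS = [r[0] for r in ORD_RANGES]


def range_for_ord(ord_):
    """Return the (first_ord, last_ord, owner) range containing ord_, or None."""
    i = bisect_right(STARTS, ord_) - 1
    if i >= 0 and ORD_RANGES[i][0] <= ord_ <= ORD_RANGES[i][1]:
        return ORD_RANGES[i]
    return None


print(range_for_ord(42))
print(range_for_ord(101))
```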
Each OrdRange has to be a self-contained Hnsw graph with all neighbours
strictly inside the given O
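The self-containment constraint can be stated as a simple check over an adjacency list: no node inside a range may have a neighbour outside it. A sketch, assuming the graph is available as a plain dict from ordinal to neighbour ordinals; stray_edges is a hypothetical helper and says nothing about how Lucene stores its graph.

```python
def stray_edges(neighbours, ord_start, ord_end):
    """Return (node, neighbour) pairs where a node in the range links outside it."""
    return [
        (node, nbr)
        for node, nbrs in neighbours.items()
        if ord_start <= node <= ord_end
        for nbr in nbrs
        if not (ord_start <= nbr <= ord_end)
    ]


graph = {0: [1, 2], 1: [0], 2: [0, 101], 101: [102]}
print(stray_edges(graph, 0, 100))
```

An empty result for every range would mean each range's graph is self-contained.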