On 5/18/22 10:56, Hasmik Sarkezians wrote:
Thanks for the reply.

It doesn't matter to me which shard the document ends up in, just matters
how many shards the document ends up with:

And seems like I wouldn't have control over that as the number of shards
grows.

I've been thinking about some documentation to provide more detail about how this works.

Here's some of what I would include in that documentation:

Lets say that the prefix is IBM, and we're using composite IDs like IBM!12345in the index.  This means that the prefix will define 16 bits of the hash and the rest of the ID will define the other 16 bits.  Each shard has a defined hash range, and the combined hash will determine which shard the document ends up on.

The effective result of this if there are a lot of documents means that each prefix will land on 1/65536 (65536 being 2 to the power of 16) of the shards.  So if you have 65536 or fewer shards, then every document with that prefix will end up on only one shard. But you can't directly control which shard gets a given prefix -- that will be decided by what hash value the prefix generates. Unless there a lot of prefixes and a lot of documents in each prefix, there could be an imbalance of documents across the different shards, and that imbalance could be very extreme.

If you specify the number of bits with something like "IBM/3!12345" then for that example each prefix will end up on 1/8 of your shards.  (8 being 2 to the power of 3)

Shard splitting can make things very complicated, unless you split ALL shards to the same number of target shards.  If you do a completely uniform split, then the same rules I just mentioned will apply to the new shards.

Thanks,
Shawn

Reply via email to