Re: [External Email] Re: Shard Split and composite id

Shawn Heisey Wed, 18 May 2022 11:28:13 -0700

On 5/18/22 10:56, Hasmik Sarkezians wrote:

Thanks for the reply.


It doesn't matter to me which shard the document ends up in, just matters
how many shards the document ends up with:

And seems like I wouldn't have control over that as the number of shards
grows.

I've been thinking about some documentation to provide more detail abouthow this works.


Here's some of what I would include in that documentation:

Lets say that the prefix is IBM, and we're using composite IDs likeIBM!12345in the index. This means that the prefix will define 16 bitsof the hash and the rest of the ID will define the other 16 bits. Eachshard has a defined hash range, and the combined hash will determinewhich shard the document ends up on.

The effective result of this if there are a lot of documents means thateach prefix will land on 1/65536 (65536 being 2 to the power of 16) ofthe shards. So if you have 65536 or fewer shards, then every documentwith that prefix will end up on only one shard. But you can't directlycontrol which shard gets a given prefix -- that will be decided by whathash value the prefix generates. Unless there a lot of prefixes and alot of documents in each prefix, there could be an imbalance ofdocuments across the different shards, and that imbalance could be veryextreme.

If you specify the number of bits with something like "IBM/3!12345" thenfor that example each prefix will end up on 1/8 of your shards. (8being 2 to the power of 3)

Shard splitting can make things very complicated, unless you split ALLshards to the same number of target shards. If you do a completelyuniform split, then the same rules I just mentioned will apply to thenew shards.


Thanks,
Shawn

Re: [External Email] Re: Shard Split and composite id

Reply via email to