On 5/18/22 08:42, Hasmik Sarkezians wrote:
Have a question about shard splitting and compositeId usage. We are
starting a solr collection with X number of shards for our multi-tenant
application. We are assuming that the number of shards will increase over
time as the number of customers grows as well as the customer data.

We are thinking of using the <customerId>/num!docId format to specify
multiple shards for my tenants depending on the number of records that we
will index. We will start with 4 shards and then my assumption is that we
use the shard split to add more shards to the collection.

customer size X = 1 shard and as such the compositeId would be
customer1!docId
customer size 5*X = 2 shards and as such the compositeId would be
customer2/1!docId

And now if I split the shards and the number of shards becomes 5, 6, 7, 8
what happens to the data? The point is I don't want customer2 to end up in
4 shards when we get to have 8 shards. If someone can shed some light here
I would appreciate it.

I wonder if you have a good understanding of how a compositeId works.

The prefix does not directly dictate what shard a document will end up in.  It determines how many bits of the full 32-bit ID hash will be computed from the prefix and how many from the rest of the ID.

https://solr.apache.org/guide/8_11/shards-and-indexing-data-in-solrcloud.html#document-routing

Something not stated there is how many bits are used if the number is not specified.  Looking at the code, the default appears to be 16 if the number of parts in the ID is 2, and 8 if the number of parts in the ID is 3.  I don't think it supports more than 3 parts.
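To make the bit mixing concrete, here is a rough Python sketch of how I understand a two-part ID being hashed. It is not Solr's code: Solr uses MurmurHash3, while this uses zlib.crc32 as a stand-in, so the actual hash values will differ. The function names are made up for illustration, and three-part IDs are not handled.

```python
import zlib


def stand_in_hash(text: str) -> int:
    """Stand-in 32-bit hash. ASSUMPTION: Solr uses MurmurHash3, not CRC32."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id: str, default_bits: int = 16) -> int:
    """Combine prefix and remainder hashes for IDs like 'cust!doc' or 'cust/1!doc'."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        # No '!' separator: the whole ID is hashed as-is.
        return stand_in_hash(doc_id)

    name, slash, bits_text = prefix.partition("/")
    bits = int(bits_text) if slash else default_bits
    if not 0 <= bits <= 32:
        raise ValueError("bit count must be between 0 and 32")

    # The top `bits` bits come from the prefix hash, the remaining bits
    # come from the hash of the rest of the ID.
    upper_mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    lower_mask = 0xFFFFFFFF >> bits
    return (stand_in_hash(name) & upper_mask) | (stand_in_hash(rest) & lower_mask)
```

With customer1!docId, every document for customer1 shares the same top 16 bits. With customer2/1!docId, only the top bit is shared, so those documents can land anywhere in one half of the hash space.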

When you split a shard, its hash range is divided between the new sub-shards, so each sub-shard covers a smaller range than the shards that were not split.  This means it may not be completely predictable which shards the documents for a given compositeId prefix will be stored in after a split.  If you split ALL shards in half, then a prefix that limited its documents to 2 shards could result in those documents being spread across 4 shards.  Depending on how many documents there are with that prefix and EXACTLY how their hashes are divided, it could be as low as 2 shards and as high as 4.  If there are a lot of documents with that prefix, chances are that it would be 4 shards.
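The range arithmetic can be sketched like this. It is only an illustration under some assumptions: Solr stores hash ranges as signed 32-bit values, while this uses unsigned numbers to keep the math simple, the collection is assumed to start with 4 equal shards, and the prefix hash value 0x9ABCDEF0 is made up. It reports the shards a prefix COULD touch, which is the upper bound; the real count depends on where the document hashes actually fall.

```python
def split_evenly(lo: int, hi: int, parts: int) -> list:
    """Divide the inclusive range [lo, hi] into `parts` contiguous sub-ranges."""
    step = (hi - lo + 1) // parts
    ranges = []
    start = lo
    for i in range(parts):
        # The last sub-range absorbs any remainder.
        end = hi if i == parts - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def prefix_range(prefix_hash: int, bits: int) -> tuple:
    """Hash range reachable by IDs whose top `bits` bits come from the prefix."""
    upper_mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    lo = prefix_hash & upper_mask
    hi = lo | (0xFFFFFFFF >> bits)
    return lo, hi


def overlapping(shards: list, lo: int, hi: int) -> list:
    """Indexes of shards whose range intersects [lo, hi]."""
    return [i for i, (s, e) in enumerate(shards) if s <= hi and e >= lo]


# Hypothetical prefix hash for "customer2", with 1 bit taken from the prefix.
lo, hi = prefix_range(0x9ABCDEF0, 1)

shards4 = split_evenly(0, 0xFFFFFFFF, 4)
# Split only the third shard: 5 shards total.
shards5 = shards4[:2] + split_evenly(*shards4[2], 2) + shards4[3:]
# Split every shard in half: 8 shards total.
shards8 = [sub for shard in shards4 for sub in split_evenly(*shard, 2)]

before = overlapping(shards4, lo, hi)
after_one_split = overlapping(shards5, lo, hi)
after_all_splits = overlapping(shards8, lo, hi)
```

In this sketch the prefix can reach 2 shards before any split, 3 after splitting one of its shards, and 4 after splitting every shard.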

If you want explicit control over which shard a document ends up in, you cannot use compositeId.  You'll have to use the implicit router and designate a field where the name of the shard will go.  I don't think splitting shards is possible with the implicit router.
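As a rough sketch of what that setup looks like, this builds a Collections API CREATE request that uses the implicit router with named shards and a routing field. The router.name, shards, and router.field parameters are from the Collections API; the base URL, collection name, shard names, and field name are hypothetical placeholders, and a real request would likely need more parameters (configset, replication factor, and so on).

```python
from urllib.parse import urlencode


def create_implicit_collection_url(base_url, collection, shard_names, router_field):
    """Build a Collections API CREATE URL for an implicit-router collection."""
    params = {
        "action": "CREATE",
        "name": collection,
        "router.name": "implicit",
        # With the implicit router, shards are named explicitly.
        "shards": ",".join(shard_names),
        # Each document carries its target shard name in this field.
        "router.field": router_field,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical values, for illustration only.
url = create_implicit_collection_url(
    "http://localhost:8983/solr", "tenants", ["tenants_a", "tenants_b"], "shard_name"
)
```

Each document would then set the shard_name field to tenants_a or tenants_b, and Solr would send it to that shard.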

Thanks,
Shawn
