This is all great advice.

There is no optimal number of shards. I’ve run clusters with 4 shards; we 
currently have one cluster with 96 shards and one with 320 shards. The next one 
we build out will probably not be sharded.

With long queries, I’ve usually seen a roughly linear speedup with sharding. 
Double the shards, halve the response time.
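
To put numbers on that rule of thumb, here is a minimal sketch of the idealized model. The function name, the 800 ms figure, and the fixed_overhead_ms parameter are assumptions made up for illustration, not measurements from any cluster; real speedups depend on the workload and have to be measured.

```python
def estimate_response_ms(measured_ms, current_shards, new_shards,
                         fixed_overhead_ms=0.0):
    """Idealized estimate: per-shard work scales inversely with shard count.

    fixed_overhead_ms is a hypothetical allowance for work that does not
    shrink with more shards (request fan-out, merging results). It defaults
    to 0, which gives the pure "double the shards, halve the time" model.
    """
    if current_shards < 1 or new_shards < 1:
        raise ValueError("shard counts must be at least 1")
    if not 0.0 <= fixed_overhead_ms <= measured_ms:
        raise ValueError("fixed_overhead_ms must be between 0 and measured_ms")
    # Only the part of the response time spent searching shards is scaled.
    scalable_ms = measured_ms - fixed_overhead_ms
    return fixed_overhead_ms + scalable_ms * current_shards / new_shards


if __name__ == "__main__":
    # Assumed example: a long query measured at 800 ms on 8 shards.
    for shards in (8, 16, 32):
        print(shards, "shards:", estimate_response_ms(800.0, 8, shards), "ms")
```

Setting fixed_overhead_ms above zero shows why the gain flattens out once the per-shard work is small compared with the cost of fanning out and merging.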

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 13, 2023, at 4:48 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> 
> Hi,
> 
> There are no hard rules wrt sharding; it often comes down to measuring and 
> experimenting for your workload.
> 
> There are other things to consider than shard size. Why are the queries slow? 
> How many rows do you ask for? Do you use faceting? Grouping?
> You have 25 GB of data on each of the 8 nodes/shards. Now, how much RAM does 
> each node have, and how much RAM did you allocate to Solr/Java?
> A common mistake is to allocate too much RAM/heap to Solr, so you don't get 
> any virtual memory caching in Linux.
> Say you have 32 GB of physical RAM on the nodes. Then do not give 30 of those 
> to Solr. Instead give 8 GB to Solr and let 24 GB be available for disk caching.
> 
> Another thing to consider is whether your queries can be optimized 
> by rewriting them to more efficient equivalents. Sometimes, Solr-level caches 
> can also help.
> 
> Wrt shard efficiency: If you already have 8 shards, it is not much more 
> expensive to go to 16, but you increase the risk of a single failure 
> affecting your requests...
> 
> Jan
> 
>> On 13 Sep 2023, at 10:32, Saksham Gupta 
>> <saksham.gu...@indiamart.com.INVALID> wrote:
>> 
>> Hi All,
>> 
>> I have been trying to reduce the response time of SolrCloud (v8.10, 8
>> nodes). To achieve this, I have tried increasing the number of shards,
>> which can help reduce the data size on each shard, thereby reducing
>> response time.
>> 
>> 
>> I have encountered a few questions regarding sharding strategy:
>> 
>> 1. How to decide the ideal number of shards? Is there a minimum or maximum
>> number of shards which should be used?
>> 
>> 2. What is the minimum size of a shard after which reducing the size
>> further won't have any effect on the response time (as time taken by other
>> factors like data aggregation will compensate for that) ?
>> 
>> 3. Is there some maximum limit to the size of data that should be kept in a
>> shard?
>> 
>> 
>> As of now we have 8 shards, each on a separate node, with ~25 GB of
>> data (15-16 million docs) present on each shard. Please advise me of the
>> standard approaches to define the number of shards and shard size. Thanks
>> in advance.
> 
