This is all great advice. There is no optimal number of shards. I’ve run clusters with 4 shards; we currently have one cluster with 96 shards and one with 320 shards. The next one we build out will probably not be sharded.
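The "no optimal number" point can be made concrete with a toy cost model: per-shard search work shrinks as you split the index, but fan-out and result merging add a cost per shard, so there is a workload-dependent sweet spot rather than a universal answer. All constants below are hypothetical, not measurements from any cluster in this thread.

```python
# Toy latency model for choosing a shard count -- the constants are
# illustrative assumptions, not measured numbers.

def query_latency_ms(num_shards, work_ms=800.0, fixed_ms=10.0, per_shard_merge_ms=1.0):
    """Model: total latency = single-shard search work split linearly
    across shards, plus a fixed request overhead, plus a per-shard cost
    for fanning out the query and merging/aggregating the responses.
    The merge term is why adding shards eventually stops paying off."""
    return work_ms / num_shards + fixed_ms + per_shard_merge_ms * num_shards

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(f"{n:>3} shards -> {query_latency_ms(n):7.1f} ms")
```

With these made-up constants the model shows the roughly linear speedup at low shard counts and a minimum around 28 shards, after which latency rises again; real workloads will put the sweet spot somewhere else entirely, which is why measuring beats rules of thumb.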
With long queries, I’ve usually seen a roughly linear speedup with sharding. Double the shards, halve the response time.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Sep 13, 2023, at 4:48 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>
> Hi,
>
> There are no hard rules wrt sharding; it often comes down to measuring and experimenting for your workload.
>
> There are other things to consider than shard size. Why are the queries slow? How many rows do you ask for? Do you use faceting? Grouping?
> You have 25 GB of data on each of the 8 nodes/shards. Now, how much RAM does each node have, and how much RAM did you allocate to Solr/Java?
> A common mistake is to allocate too much RAM/heap to Solr, so you don't get any virtual memory caching in Linux.
> Say you have 32 GB of physical RAM on the nodes. Then do not give 30 of those to Solr. Instead give 8 GB to Solr and let 24 GB be available for disk caching.
>
> Other things to consider are whether your queries can be optimized by rewriting them to more efficient equivalents. Sometimes, Solr-level caches can also help.
>
> Wrt shard efficiency: if you already have 8 shards, it is not much more expensive to go to 16, but you increase the risk of a single failure affecting your requests...
>
> Jan
>
>> 13. sep. 2023 kl. 10:32 skrev Saksham Gupta <saksham.gu...@indiamart.com.INVALID>:
>>
>> Hi All,
>>
>> I have been trying to reduce the response time of SolrCloud (v8.10, 8 nodes). To achieve this, I have tried increasing the number of shards, which can reduce the data size on each shard and thereby the response time.
>>
>> I have encountered a few questions regarding sharding strategy:
>>
>> 1. How to decide the ideal number of shards? Is there a minimum or maximum number of shards which should be used?
>>
>> 2. What is the minimum size of a shard after which reducing the size further won't have any effect on the response time (as time taken by other factors like data aggregation will compensate for that)?
>>
>> 3. Is there some maximum limit to the size of data that should be kept in a shard?
>>
>> As of now we have 8 shards, each on a separate node, with ~25 GB of data (15-16 million docs) on each shard. Please advise me on the standard approaches to define the number of shards and shard size. Thanks in advance.
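Jan's 32 GB example can be generalized into a quick back-of-envelope split of node RAM between the JVM heap and the OS page cache. The 25% heap fraction here is just his 8/32 example turned into a parameter, not a hard rule; tune it for your workload.

```python
# Rule-of-thumb split of a node's RAM between the Solr/JVM heap and the
# Linux page cache, generalizing the 8 GB heap / 24 GB cache split that
# Jan suggests for a 32 GB node. The 0.25 fraction is an assumption
# drawn from that example, not an official recommendation.

def ram_split_gb(physical_ram_gb, heap_fraction=0.25):
    heap = physical_ram_gb * heap_fraction
    page_cache = physical_ram_gb - heap
    return heap, page_cache

heap, cache = ram_split_gb(32)
print(f"SOLR_HEAP={heap:.0f}g  # leaves {cache:.0f} GB for disk caching")
```

In practice you would set the heap via `SOLR_HEAP` in `solr.in.sh`. Note how the numbers line up with the thread: with ~25 GB of index per node, a ~24 GB page cache can keep most of the index hot, which is usually worth far more than a bigger heap.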