@Walter,

how on earth are you monitoring all vital Solr Cloud Parameters for 320 shards?

Regards,
Bernd


Am 13.09.23 um 16:22 schrieb Walter Underwood:
This is all great advice.

There is no optimal number of shards. I’ve run clusters with 4 shards, we 
currently have one cluster with 96 shards and one with 320 shards. The next one 
we build out will probably not be sharded.

With long queries, I’ve usually seen a roughly linear speedup with sharding. 
Double the shards, halve the response time.
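That observation matches an idealized latency model: per-shard work shrinks linearly while a small, fixed merge cost at the aggregating node stays constant. A rough sketch with invented numbers:

```shell
# Idealized latency model: fixed total work split evenly across shards,
# plus a constant merge/aggregation overhead. All numbers are made up.
awk 'BEGIN {
  work_ms = 800; overhead_ms = 50
  for (n = 4; n <= 32; n *= 2)
    printf "%2d shards: ~%d ms\n", n, work_ms / n + overhead_ms
}'
```

The fixed overhead is also why the speedup eventually flattens: past some point the merge cost dominates and extra shards stop helping.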

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Sep 13, 2023, at 4:48 AM, Jan Høydahl <jan....@cominvent.com> wrote:

Hi,

There are no hard rules wrt sharding, it often comes down to measuring and 
experimenting for your workload.

There are other things to consider than shard size. Why are the queries slow? 
How many rows do you ask for? Do you use faceting? Grouping?
You have 25 GB of data on each of the 8 nodes/shards. Now, how much RAM does 
each node have, and how much RAM did you allocate to Solr/Java?
A common mistake is to allocate so much RAM/heap to Solr that nothing is left 
for the Linux page cache (the OS-level disk caching of index files).
Say you have 32 GB of physical RAM on a node. Then do not give 30 GB of it to 
Solr. Instead, give 8 GB to Solr and leave 24 GB available for disk caching.
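On a 32 GB node, that rule of thumb could look like the following `solr.in.sh` fragment (`SOLR_HEAP` is the variable read by the `bin/solr` start script; the exact split is a sketch to tune for your workload, not a rule):

```shell
# solr.in.sh -- sketch for a node with 32 GB of physical RAM.
# Give Solr a modest heap and leave the rest to the Linux page cache,
# which serves index reads from memory.
SOLR_HEAP="8g"
# The remaining ~24 GB is NOT configured anywhere: whatever the JVM
# does not claim is used automatically by the OS for disk caching.
```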

Another thing to consider is whether your queries can be optimized by 
rewriting them into more efficient equivalents. Sometimes Solr-level caches 
can also help.
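One common rewrite of this kind is moving a restriction out of the scored main query (`q`) into a filter query (`fq`), which Solr caches in the filterCache and reuses across requests. A sketch, where "mycollection" and the field names are placeholders:

```shell
# Sketch only -- "mycollection", "title", "inStock" are placeholders.
# Before: restriction mixed into the scored query (nothing cached):
#   q=title:solr AND inStock:true
# After: restriction moved to fq so the filterCache can reuse it:
path='/solr/mycollection/select?q=title%3Asolr&fq=inStock%3Atrue&rows=10'
echo "$path"
# Against a live node: curl "http://localhost:8983${path}"
```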

Wrt shard efficiency: if you already have 8 shards, it is not much more 
expensive to go to 16, but you do increase the risk of a single shard failure 
affecting your requests...
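Going from 8 to 16 shards can be done in place with the Collections API's SPLITSHARD action, one call per existing shard. A sketch, where the host, collection, and shard names are placeholders:

```shell
# Sketch: split shard1 of "mycollection" into two sub-shards.
# Repeat for each shard to go from 8 to 16.
# The async parameter returns a request id you can poll with REQUESTSTATUS.
url='http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split-shard1'
echo "$url"
# Against a live cluster: curl "$url"
```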

Jan

13. sep. 2023 kl. 10:32 skrev Saksham Gupta 
<saksham.gu...@indiamart.com.INVALID>:

Hi All,

I have been trying to reduce the response time of a SolrCloud cluster (v8.10,
8 nodes). To achieve this, I have tried increasing the number of shards,
which reduces the data size on each shard and should thereby reduce
response time.


I have encountered a few questions regarding sharding strategy:

1. How do I decide the ideal number of shards? Is there a minimum or maximum
number of shards that should be used?

2. What is the minimum size of a shard, below which reducing the size
further won't have any effect on response time (because time taken by other
factors, like data aggregation, will dominate)?

3. Is there some maximum limit to the size of data that should be kept in a
shard?


As of now we have 8 shards, each on a separate node, with ~25 GB of
data (15-16 million docs) on each shard. Please advise on standard approaches
to choosing the number of shards and the shard size. Thanks
in advance.
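There is no standard formula, but a common back-of-the-envelope starting point is to divide the total index size by a target per-shard size and then validate with load testing. The numbers below are placeholders taken from the current layout (8 x ~25 GB), not recommendations:

```shell
# Back-of-the-envelope shard count: total index size divided by a
# target per-shard size, rounded up. Both inputs are assumptions to
# be validated by measurement, not rules.
total_gb=200          # ~25 GB x 8 current shards
target_shard_gb=25
shards=$(( (total_gb + target_shard_gb - 1) / target_shard_gb ))
echo "$shards shards"
```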




--
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
          https://www.ub.uni-bielefeld.de/~befehl/

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
