@Walter,

Just curious: for 320 shards and 2.5 billion docs, how is the query
throughput?  The internal distributed traffic would be huge due to the
fan-out.

Regards,
Wei


On Thu, Sep 14, 2023 at 7:36 AM Walter Underwood <wun...@wunderwood.org>
wrote:

> I would have thought it would be a huge hassle, but it isn’t. We build a
> cluster with ArgoCD and Kubernetes deployments. The nodes all report to
> DataDog. The only real bother is that it is hard to go directly to the
> admin UI on a specific node; something doesn’t work right with the
> hostnames and permissions from outside.
>
> Oh, and we run blue/green clusters, so there are two of these beasts.
>
> The graphical display in the admin UI is pretty impressive. Here is a view
> of part of it.
>
>
> https://www.dropbox.com/scl/fi/99xfgek24qocowhft6q7b/Screenshot-2023-09-14-at-7.27.16-AM.png?rlkey=nmjyrl9z0n92lgidfei45vq4q&dl=0
>
> The collection currently has about 2.5 billion documents. When I worked at
> Infoseek, our index of the entire web was 12 million documents.
>
> This is at LexisNexis.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Sep 13, 2023, at 11:47 PM, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
> >
> > @Walter,
> >
> > how on earth are you monitoring all the vital Solr Cloud parameters for
> 320 shards?
> >
> > Regards,
> > Bernd
> >
> >
> > Am 13.09.23 um 16:22 schrieb Walter Underwood:
> >> This is all great advice.
> >> There is no optimal number of shards. I’ve run clusters with 4 shards;
> we currently have one cluster with 96 shards and one with 320 shards. The
> next one we build out will probably not be sharded.
> >> With long queries, I’ve usually seen a roughly linear speedup with
> sharding. Double the shards, halve the response time.
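The roughly linear speedup described above can be illustrated with a toy latency model (illustrative only; the function, the 0.1 ms per-shard merge cost, and the timings are assumptions for the sketch, not measurements of Solr):

```python
# Toy model of sharded query latency (not Solr internals): the query's work is
# split evenly across shards and run in parallel, then the coordinator pays a
# small per-shard cost to merge the partial results.
def latency(work_ms: float, shards: int, merge_ms_per_shard: float = 0.1) -> float:
    """Latency = parallel share of the work + per-shard merge overhead."""
    return work_ms / shards + merge_ms_per_shard * shards

# For a long query (4000 ms of single-shard work), doubling the shard count
# roughly halves the latency, because the merge term is still tiny:
print(latency(4000, 8))   # 500.8
print(latency(4000, 16))  # 251.6
```

For short queries the merge term dominates much sooner, which is one reason a wide fan-out can hurt throughput even where it helps latency.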
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>> On Sep 13, 2023, at 4:48 AM, Jan Høydahl <jan....@cominvent.com>
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> There are no hard rules wrt sharding; it often comes down to measuring
> and experimenting for your workload.
> >>>
> >>> There are other things to consider than shard size. Why are the
> queries slow? How many rows do you ask for? Do you use faceting? Grouping?
> >>> You have 25 GB of data on each of the 8 nodes/shards. Now, how much RAM
> does each node have, and how much RAM did you allocate to Solr/Java?
> >>> A common mistake is to allocate too much RAM/heap to Solr, so that you
> don't get any virtual memory caching in Linux.
> >>> Say you have 32 GB of physical RAM on the nodes. Then do not give 30 of
> those to Solr. Instead give 8 GB to Solr and let 24 GB be available for disk
> caching.
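As a concrete sketch of that split (values are illustrative, assuming the 32 GB example node above; `SOLR_HEAP` is the standard setting in Solr's startup config):

```shell
# Illustrative fragment of bin/solr.in.sh for a node with 32 GB physical RAM.
# Give Solr a modest Java heap; the remaining ~24 GB stays free for the Linux
# page cache, which caches the on-disk Lucene index files.
SOLR_HEAP="8g"
```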
> >>>
> >>> Another thing to consider is whether your queries can be
> optimized by rewriting them to more efficient equivalents. Sometimes,
> Solr-level caches can also help.
> >>>
> >>> Wrt shard efficiency: if you already have 8 shards, it is not much
> more expensive to go to 16, but you increase the risk of a single failure
> affecting your requests...
> >>>
> >>> Jan
> >>>
> >>>> 13. sep. 2023 kl. 10:32 skrev Saksham Gupta <
> saksham.gu...@indiamart.com.INVALID>:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> I have been trying to reduce the response time of Solr Cloud (v8.10, 8
> >>>> nodes). To achieve this, I have tried increasing the number of shards,
> >>>> which reduces the data size on each shard and thereby the response
> >>>> time.
> >>>>
> >>>>
> >>>> I have encountered a few questions regarding sharding strategy:
> >>>>
> >>>> 1. How do we decide the ideal number of shards? Is there a minimum or
> >>>> maximum number of shards which should be used?
> >>>>
> >>>> 2. What is the minimum size of a shard below which reducing the size
> >>>> further won't have any effect on the response time (as the time taken
> >>>> by other factors, like data aggregation, will dominate)?
> >>>>
> >>>> 3. Is there some maximum limit to the size of data that should be
> >>>> kept in a shard?
> >>>>
> >>>>
> >>>> As of now we have 8 shards, each on a separate node, with ~25 GB of
> >>>> data (15-16 million docs) on each shard. Please advise on the standard
> >>>> approaches to deciding the number of shards and shard size. Thanks
> >>>> in advance.
> >>>
> >
> > --
> > *************************************************************
> > Bernd Fehling                    Bielefeld University Library
> > Dipl.-Inform. (FH)                LibTec - Library Technology
> > Universitätsstr. 25                  and Knowledge Management
> > 33615 Bielefeld
> > Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
> >          https://www.ub.uni-bielefeld.de/~befehl/
> >
> > BASE - Bielefeld Academic Search Engine - www.base-search.net
> > *************************************************************
>
>
