Shawn - thanks for your response!

1M index was just an example. For instance, we are planning to host multiple 
customers on the same SolrCloud cluster (each customer/tenant = one collection). 
Some customers may have 1-2M docs (small ones), some will have 3-12M docs 
(medium ones), and some will have 20-80M docs (large ones). If we migrate 100 
such customers of different sizes, we will eventually end up with 1B+ docs in 
the same SolrCloud cluster (depending on the ratio of large vs. medium vs. 
small ones, of course).
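As a rough sketch of the per-tenant layout we have in mind, each tenant would get its own collection, sized up front by its tier (the collection names, shard counts, and host/port below are illustrative only, not our actual configuration):

```shell
# One collection per tenant, initial shard count chosen by tier.
# Small tenant (1-2M docs): a single shard is usually enough.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=tenant_small_01&numShards=1&replicationFactor=2"

# Large tenant (20-80M docs): start with more shards.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=tenant_large_01&numShards=4&replicationFactor=2"
```

The question below is exactly about avoiding having to pick these numbers statically.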

The thing is that we cannot project the growth of each customer (Solr 
collection) other than relying on the size/quota the customer had with the 
on-prem deployment before we migrate to the cloud.
We would also like to avoid static tuning (number of shards) followed by 
manual operations management (such as splits, rebalancing if supported, etc.) 
based on some set of rules.

That's why we are asking whether some automatic capabilities exist in Solr to 
ease the maintenance work and simplify the tuning (we understand that some 
exist in Solr 8.11 but are planned to be removed starting with Solr 9.x).

Alternatively, if there are some other best practices to meet our use case, 
we'll be happy to hear some direction.

Thanks in advance,
Adi

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
Sent: Monday, July 18, 2022 12:42 AM
To: users@solr.apache.org
Subject: Re: Autoscaling

On 7/17/22 11:25, Kaminski, Adi wrote:
> For example, if we have 10 shards each 100k (1M total) documents size for 
> best and optimized ingestion/query performance...adding more documents will 
> make sense to have 11th shard, and reaching 1.1M total will make sense to add 
> 12th one eventually.

One million total documents is actually a pretty small index, and as you were 
told in another reply, is not big enough in most situations to require 
sharding, unless your hardware has very little cpu/memory/storage.

> Is it reasonable to use some automation of collections API, splitting shards 
> accordingly to some strategy (largest, oldest, etc.) ?

In a typical scenario, every shard will be approximately equal in size, and 
will contain documents of any age.  If you have a 10 shard index and you split 
one of the shards, then you will have 9 shards of relatively equal size and two 
shards that are each half the size of the other 9. To correctly redistribute 
the load, you would need to split ALL the shards, so you would end up with 20 
shards, or some other multiple of 10, the starting point.

In my last reply, I mentioned the implicit router.  This is the router you 
would need to use if you want to organize your shards by something like date.  
But then every single document you index must indicate what shard it will end 
up on -- there is no automatic routing.
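A minimal sketch of a date-organized collection using the implicit router (collection name, shard names, and field values are made up for illustration):

```shell
# Create a collection with the implicit router and one shard per month.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=logs&router.name=implicit&shards=2022_06,2022_07&replicationFactor=1"

# Every document must name its target shard explicitly, e.g. via the
# _route_ parameter (or a router.field set at creation time) -- the
# implicit router does no automatic routing.
curl "http://localhost:8983/solr/logs/update?_route_=2022_07" \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1","timestamp":"2022-07-17T11:25:00Z"}]'
```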

> Aren't some out of the box capabilities in Solr Cloud search engine ? Or 
> maybe some libraries/operators on top to simplify k8s deployments, but not 
> only for queries and automatic PODs scaling but also automating data storage 
> optimization (per volume, date, any other custom logic..).

I have no idea what you are asking here.

Thanks,
Shawn



