So what would you recommend for keeping shards balanced automatically within 
a specific collection (where each collection serves as a separate 
abstraction/storage unit per customer/tenant, to comply with multi-tenancy 
and security isolation requirements)?

For example, suppose we have 10 shards of 100k documents each (1M total), a 
size we consider best for ingestion and query performance. Once more 
documents are added, it would make sense to have an 11th shard, and once the 
total grows past 1.1M, to add a 12th one.
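
As a rough sketch of that arithmetic (the 100k-per-shard target is the figure 
from the example above; the function name is hypothetical), the shard count 
for a given number of documents could be computed like this:

```python
DOCS_PER_SHARD = 100_000  # target from the example above, not a Solr default


def shards_needed(total_docs, docs_per_shard=DOCS_PER_SHARD):
    """Smallest shard count that keeps every shard at or under the target."""
    # Integer ceiling division; always report at least one shard.
    return max(1, (total_docs + docs_per_shard - 1) // docs_per_shard)
```

With these assumptions, 1M documents fit in 10 shards, and the first document 
past 1M calls for an 11th.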

Is it reasonable to automate this through the Collections API, splitting 
shards according to some strategy (largest, oldest, etc.)?
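
A minimal sketch of what such automation might look like, assuming the 
per-shard document counts have already been collected by some other means. 
The base URL, collection name, shard names and function names are all 
hypothetical; only the SPLITSHARD action with its collection and shard 
parameters comes from the Collections API. The sketch only builds the request 
URL and does not send it:

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical base URL


def pick_largest_shard(doc_counts):
    """Return the name of the shard with the most documents."""
    return max(doc_counts, key=doc_counts.get)


def split_shard_url(collection, shard, base=SOLR_BASE):
    """Build a Collections API SPLITSHARD request URL for one shard."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base}/admin/collections?{urlencode(params)}"


# Hypothetical per-shard document counts for one tenant collection.
counts = {"shard1": 98_000, "shard2": 131_000, "shard3": 102_000}
url = split_shard_url("tenant_a", pick_largest_shard(counts))
```

An "oldest" strategy would only change how the shard is picked; the request 
itself stays the same.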

Aren't there any out-of-the-box capabilities for this in SolrCloud? Or perhaps 
libraries/operators on top that simplify k8s deployments, not only for 
queries and automatic pod scaling but also for automating data storage 
optimization (per volume, date, or any other custom logic)?

Thanks in advance,
Adi


________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Sunday, July 17, 2022 5:44:24 PM
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: Autoscaling

On 7/17/22 07:40, Ronen Nussbaum wrote:
> We are planning to migrate our Solr Cloud clusters to the cloud.
> Currently it is installed on-prem for each customer.
> It is already deployed as Docker containers.
> Instead of estimating in advance what is the number of shards needed, or
> the number of pods, we'd like to rely on EKS cluster autoscaler, K8S HPA
> and Solr autoscaling.
> Our main concern is the deprecation of the autoscaling feature since
> version 9.0.

One thing I am not sure you're aware of:  You can't add shards to a
collection unless it is using the implicit router, which is poorly named
because what it means is that sharding is 100% user-managed (manual).

There is the shard splitting capability in the Collections API, but that
only works on a single shard, not the whole collection. If you wanted to
adjust from say 6 to 8 shards and still have the shards be approximately
equal in size, you would need to build a new collection and completely
reindex.
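
To illustrate the size imbalance, here is a small simulation. It assumes that 
a split divides a shard's documents roughly in half, which is a simplification, 
and the function name is hypothetical. Going from 6 to 8 shards takes two 
splits:

```python
def split_largest(sizes):
    """Replace the largest shard with two halves (idealized even split)."""
    i = sizes.index(max(sizes))
    half = sizes[i] // 2
    return sizes[:i] + [half, sizes[i] - half] + sizes[i + 1:]


shards = [100] * 6               # six equal shards, sizes in arbitrary units
shards = split_largest(shards)   # 7 shards
shards = split_largest(shards)   # 8 shards
```

Under these assumptions the result is four shards at the original size and 
four at half that size, not eight equal shards.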

There have been a number of issues filed for a rebalance feature, but it
has not been implemented because implementing it would involve a very
large amount of work, and making it stable would take even more work.
And I am not even sure the Lucene API has the capability to do it currently.

https://issues.apache.org/jira/browse/SOLR-9241

> What is your recommendation? Should we start with 8.11? Will it be a
> substitute soon?

I have no idea whether a substitute will be available.  You can manually
do everything that the autoscaler would do with the Collections API.

Thanks,
Shawn


