So what would be the recommendation, then, for keeping shards balanced automatically within a specific collection (when a collection is used as a separate abstraction/storage for each customer/tenant to comply with multi-tenancy and security isolation)?
For example, suppose we have 10 shards of 100k documents each (1M total), sized for the best ingestion and query performance. Adding more documents would then justify an 11th shard, and reaching 1.1M total would eventually justify a 12th. Is it reasonable to automate the Collections API to split shards according to some strategy (largest, oldest, etc.)? Are there any out-of-the-box capabilities for this in SolrCloud? Or are there libraries or operators on top that simplify k8s deployments, not only for queries and automatic pod scaling but also for automating data storage optimization (per volume, date, or any other custom logic)?

Thanks in advance,
Adi

________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Sunday, July 17, 2022 5:44:24 PM
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: Autoscaling

On 7/17/22 07:40, Ronen Nussbaum wrote:
> We are planning to migrate our Solr Cloud clusters to the cloud.
> Currently it is installed on-prem for each customer.
> It is already deployed as Docker containers.
> Instead of estimating in advance what is the number of shards needed, or
> the number of pods, we'd like to rely on EKS cluster autoscaler, K8S HPA
> and Solr autoscaling.
> Our main concern is the deprecation of the autoscaling feature since
> version 9.0.

One thing I am not sure you're aware of: You can't add shards to a collection unless it is using the implicit router, which is poorly named because what it means is that sharding is 100% user-managed (manual). There is the shard splitting capability in the Collections API, but that only works on a single shard, not the whole collection. If you wanted to adjust from, say, 6 to 8 shards and still have the shards be approximately equal in size, you would need to build a new collection and completely reindex.
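
As a rough illustration of the "split the largest shard" strategy asked about above, here is a minimal sketch. It assumes the per-shard document counts have already been collected by some other means, and the host, collection name, and 100k threshold are all hypothetical. It only builds the Collections API SPLITSHARD request URL; it does not send it.

```python
from urllib.parse import urlencode


def pick_shard_to_split(doc_counts, max_docs_per_shard):
    """Return the name of the largest shard if it exceeds the limit, else None.

    doc_counts is a hypothetical mapping of shard name -> document count,
    gathered however your monitoring provides it.
    """
    if not doc_counts:
        return None
    name, count = max(doc_counts.items(), key=lambda kv: kv[1])
    return name if count > max_docs_per_shard else None


def splitshard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD request URL for a single shard."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical counts for one tenant's collection.
counts = {"shard1": 98_000, "shard2": 131_000, "shard3": 102_000}
target = pick_shard_to_split(counts, max_docs_per_shard=100_000)
if target is not None:
    print(splitshard_url("http://localhost:8983/solr", "tenant_a", target))
```

Note that each split only divides one shard into sub-shards, so repeated splits leave shards of unequal size. This does not rebalance the whole collection, which is the limitation described above.
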
There have been a number of issues filed for a rebalance feature, but it has not been implemented because implementing it would involve a very large amount of work, and making it stable would take even more work. And I am not even sure the Lucene API has the capability to do it currently.

https://issues.apache.org/jira/browse/SOLR-9241

> What is your recommendation? Should we start with 8.11? Will it be a
> substitute soon?

I have no idea whether a substitute will be available. You can manually do everything that the autoscaler would do with the Collections API.

Thanks,
Shawn