To start, you should have just one shard. One million documents is nowhere near enough to justify sharding, and a single shard on a single server is easy to keep balanced.

> On Jul 17, 2022, at 1:26 PM, Kaminski, Adi <adi.kamin...@verint.com.invalid> wrote:
>
> So what would be the recommendation then to have balanced shards
> automatically in specific collection (if collection is used as separate
> abstraction/storage for each customer/ tenant to comply with multi
> tenancy/security isolation) ?
>
> For example, if we have 10 shards each 100k (1M total) documents size for
> best and optimized ingestion/query performance...adding more documents will
> make sense to have 11th shard, and reaching 1.1M total will make sense to add
> 12th one eventually.
>
> Is it reasonable to use some automation of collections API, splitting shards
> accordingly to some strategy (largest, oldest, etc.) ?
>
> Aren't some out of the box capabilities in Solr Cloud search engine ? Or
> maybe some libraries/operators on top to simplify k8s deployments, but not
> only for queries and automatic PODs scaling but also automating data storage
> optimization (per volume, date, any other custom logic..).
>
> Thanks in advance,
> Adi
>
> ________________________________
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Sunday, July 17, 2022 5:44:24 PM
> To: users@solr.apache.org <users@solr.apache.org>
> Subject: Re: Autoscaling
>
>> On 7/17/22 07:40, Ronen Nussbaum wrote:
>> We are planning to migrate our Solr Cloud clusters to the cloud.
>> Currently it is installed on-prem for each customer.
>> It is already deployed as Docker containers.
>> Instead of estimating in advance what is the number of shards needed, or
>> the number of pods, we'd like to rely on EKS cluster autoscaler, K8S HPA
>> and Solr autoscaling.
>> Our main concern is the deprecation of the autoscaling feature since
>> version 9.0.
>
> One thing I am not sure you're aware of: You can't add shards to a
> collection unless it is using the implicit router, which is poorly named
> because what it means is that sharding is 100% user-managed (manual).
>
> There is the shard splitting capability in the Collections API, but that
> only works on a single shard, not the whole collection. If you wanted to
> adjust from say 6 to 8 shards and still have the shards be approximately
> equal in size, you would need to build a new collection and completely
> reindex.
>
> There have been a number of issues filed for a rebalance feature, but it
> has not been implemented because implementing it would involve a very
> large amount of work, and making it stable would take even more work.
> And I am not even sure the Lucene API has the capability to do it currently.
>
> https://issues.apache.org/jira/browse/SOLR-9241
>
>> What is your recommendation? Should we start with 8.11? Will it be a
>> substitute soon?
>
> I have no idea whether a substitute will be available. You can manually
> do everything that the autoscaler would do with the Collections API.
>
> Thanks,
> Shawn
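
For anyone who does end up automating splits by hand, as discussed in the quoted thread, here is a rough sketch of the "split the largest shard" idea. It only builds the Collections API SPLITSHARD request URL and does not send it. The host, the collection name `customer_a`, and the per-shard document counts are made-up placeholders, and how you collect those counts is left open. As noted above, SPLITSHARD acts on one shard at a time, so this does not rebalance the whole collection.

```python
from urllib.parse import urlencode


def pick_largest_shard(doc_counts):
    """Return the name of the shard holding the most documents."""
    if not doc_counts:
        raise ValueError("no shards given")
    return max(doc_counts, key=doc_counts.get)


def splitshard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD request URL for a single shard."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical document counts per shard (assumed to be gathered elsewhere).
counts = {"shard1": 100_000, "shard2": 180_000, "shard3": 95_000}

target = pick_largest_shard(counts)
# Assumed Solr base URL and collection name; adjust for your cluster.
print(splitshard_url("http://localhost:8983/solr", "customer_a", target))
```

Whether "largest" is the right strategy depends on your data; swapping in "oldest" or any other rule only changes `pick_largest_shard`.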