Hi,

With multiple tenants, scaling on the #tenants axis is simply a matter of adding new collections to the cluster. That should be fairly simple with K8s and SolrOperator. First add N new nodes to your EKS cluster, then use --scale in your SolrOperator to add more pods, which will then pop up as "empty" Solr nodes in the cluster. Finally, create the new collection(s) with the desired number of shards/replicas, and let the new PlacementPlugins introduced in Solr 9 (https://solr.apache.org/guide/solr/latest/configuration-guide/replica-placement-plugins.html) take care of placing the new collection on the best pods (typically the new empty ones).
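As a rough illustration of the last step, here is a minimal Python sketch that only builds the Collections API CREATE request for a new tenant collection. The base URL, the collection name "tenant_acme" and the shard/replica counts are made-up values, and the helper name is hypothetical; sending the request (and any auth) is left to your control plane.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL for one tenant collection.

    solr_base is assumed to look like "http://host:8983/solr" (hypothetical).
    Node selection is left to the configured placement plugin, so no
    createNodeSet parameter is passed here.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Example values only; adjust to your own cluster and tenant naming.
    print(create_collection_url("http://localhost:8983/solr", "tenant_acme", 2, 2))
```

Issuing a GET against the printed URL once the new pods have joined the cluster should let the placement plugin decide where the replicas land.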
Should a tenant start to see slowness due to too many docs per shard, you could then either migrate that collection to a new one with more shards, or build into your app's control plane a feature which would perform SPLITSHARD <https://solr.apache.org/guide/solr/latest/deployment-guide/shard-management.html#splitshard> + MOVEREPLICA <https://solr.apache.org/guide/solr/latest/deployment-guide/replica-management.html#movereplica> on that collection. It looks like MOVEREPLICA does not support automatically picking targetNode using placement logic, which would have made the operation much simpler.

Jan

> On 18 Jul 2022, at 09:00, Kaminski, Adi <adi.kamin...@verint.com.INVALID> wrote:
>
> Shawn - thanks for your response !
>
> 1M index was just an example. For instance, we are planning to have multiple customers on same SolrCloud cluster (each customer/tenant=collection). Some customers may have 1-2M docs (small ones), some will have 3-12M docs (medium ones) and some will have 20-80M docs (large ones). If we migrate 100 such customers of different sizes, eventually we will end up with 1B+ docs in same SolrCloud cluster (depends on ratio of large vs medium vs small ones of course).
>
> The thing is that we cannot project the growth of each customer (Solr collection) other than relying on size/quota that the customer has with on-prem deployment before we migrate to cloud.
> And also, would like to prevent static tuning (#shards) and then manual operations management (such as splits, rebalancing if supported, etc.) based on some rules/etc.
>
> That's why we are asking whether some automatic capabilities exist in Solr to ease the maintenance work and simplify the tuning (we understand that some exist in Solr 8.11 but planned to be deprecated starting Solr 9.x)
>
> Alternatively, if there are some other best practices to meet our use case, we'll be happy to hear some direction.
>
> Thanks in advance,
> Adi
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Monday, July 18, 2022 12:42 AM
> To: users@solr.apache.org
> Subject: Re: Autoscaling
>
> On 7/17/22 11:25, Kaminski, Adi wrote:
>> For example, if we have 10 shards each 100k (1M total) documents size for best and optimized ingestion/query performance...adding more documents will make sense to have 11th shard, and reaching 1.1M total will make sense to add 12th one eventually.
>
> One million total documents is actually a pretty small index, and as you were told in another reply, is not big enough in most situations to require sharding, unless your hardware has very little cpu/memory/storage.
>
>> Is it reasonable to use some automation of collections API, splitting shards accordingly to some strategy (largest, oldest, etc.) ?
>
> In a typical scenario, every shard will be approximately equal in size, and will contain documents of any age. If you have a 10 shard index and you split one of the shards, then you will have 9 shards of relatively equal size and two shards that are each half the size of the other 9. To correctly redistribute the load, you would need to split ALL the shards, so you would end up with 20 shards, or some other multiple of 10, the starting point.
>
> In my last reply, I mentioned the implicit router. This is the router you would need to use if you want to organize your shards by something like date. But then every single document you index must indicate what shard it will end up on -- there is no automatic routing.
>
>> Aren't some out of the box capabilities in Solr Cloud search engine ? Or maybe some libraries/operators on top to simplify k8s deployments, but not only for queries and automatic PODs scaling but also automating data storage optimization (per volume, date, any other custom logic..).
>
> I have no idea what you are asking here.
>
> Thanks,
> Shawn
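For the SPLITSHARD + MOVEREPLICA route mentioned at the top of this message, a control-plane feature might build its Collections API calls roughly like the Python sketch below. It follows Shawn's point that every shard should be split to keep sizes even, and it passes targetNode explicitly since MOVEREPLICA does not appear to pick one for you. All concrete values (base URL, collection, shard, replica and node names) and the helper names are hypothetical; nothing is sent over the network here.

```python
from urllib.parse import urlencode


def collections_api_url(solr_base, **params):
    """Build a Collections API URL from keyword parameters."""
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def split_all_shards_urls(solr_base, collection, shard_names):
    """One SPLITSHARD request per existing shard, so that all shards
    end up roughly the same size after the split."""
    return [
        collections_api_url(
            solr_base, action="SPLITSHARD", collection=collection, shard=shard
        )
        for shard in shard_names
    ]


def move_replica_url(solr_base, collection, replica, target_node):
    """MOVEREPLICA request with an explicitly chosen target node."""
    return collections_api_url(
        solr_base,
        action="MOVEREPLICA",
        collection=collection,
        replica=replica,
        targetNode=target_node,
    )


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # example value only
    for url in split_all_shards_urls(base, "tenant_acme", ["shard1", "shard2"]):
        print(url)
    print(move_replica_url(base, "tenant_acme", "core_node5", "solr-3:8983_solr"))
```

In practice the shard and replica names would come from CLUSTERSTATUS, and the control plane would have to choose the target node itself, for example the node with the fewest cores.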