Re: Autoscaling

Jan Høydahl Fri, 05 Aug 2022 04:51:27 -0700

Hi,

With multiple tenants, scaling on the #tenants axis simply means adding new 
collections to the cluster. That should be fairly simple with K8S and 
SolrOperator. First add N new nodes to your EKS cluster, then use --scale in 
your SolrOperator to add more pods, which will then pop up as "empty" Solr 
nodes in the cluster. Finally, create the new collection(s) with the desired 
number of shards/replicas, and let the new PlacementPlugins introduced in Solr 9 
(https://solr.apache.org/guide/solr/latest/configuration-guide/replica-placement-plugins.html)
take care of placing the new collection on the best pods (typically the new 
empty ones).
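
As a rough illustration of the last step, here is a minimal Python sketch that only builds the Collections API CREATE request for a new tenant collection. The base URL, the tenant name and the helper name create_collection_url are assumptions made for this example; the action, name, numShards and replicationFactor parameters are the standard ones for CREATE. Sending the request (and naming a configset) is left out.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, tenant, num_shards, replication_factor):
    """Build a Collections API CREATE URL for one tenant collection.

    base_url is assumed to point at the Solr root, e.g.
    http://localhost:8983/solr (hypothetical host).
    """
    params = {
        "action": "CREATE",
        "name": tenant,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical tenant and sizing; placement is left to the configured plugin.
    print(create_collection_url("http://localhost:8983/solr", "tenant_a", 2, 2))
```

No target nodes are passed here on purpose, so that the configured placement plugin decides where the replicas go.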


Should a tenant start to see slowness due to too many docs per shard, you could 
then either migrate that collection to a new one with more shards, or build 
into your app's control plane a feature which would perform SPLITSHARD 
<https://solr.apache.org/guide/solr/latest/deployment-guide/shard-management.html#splitshard>
+ MOVEREPLICA 
<https://solr.apache.org/guide/solr/latest/deployment-guide/replica-management.html#movereplica>
on that collection. It looks like MOVEREPLICA does not support automatically 
picking targetNode using placement logic, so your control plane would have to 
choose the target node itself. Automatic selection would have made the 
operation much simpler.

Jan

> On 18 Jul 2022, at 09:00, Kaminski, Adi <adi.kamin...@verint.com.INVALID> wrote:
> 
> Shawn - thanks for your response !
> 
> The 1M index was just an example. For instance, we are planning to have 
> multiple customers on the same SolrCloud cluster (each 
> customer/tenant=collection). Some customers may have 1-2M docs (small ones), 
> some will have 3-12M docs (medium ones) and some will have 20-80M docs (large 
> ones). If we migrate 100 such customers of different sizes, eventually we 
> will end up with 1B+ docs in the same SolrCloud cluster (depending on the 
> ratio of large vs medium vs small ones, of course).
> 
> The thing is that we cannot project the growth of each customer (Solr 
> collection) other than relying on size/quota that the customer has with 
> on-prem deployment before we migrate to cloud.
> And also, would like to prevent static tuning (#shards) and then manual 
> operations management  (such as splits, rebalancing if supported, etc.) based 
> on some rules/etc.
> 
> That's why we are asking whether some automatic capabilities exist in Solr to 
> ease the maintenance work and simplify the tuning (we understand that some 
> exist in Solr 8.11 but planned to be deprecated starting Solr 9.x)
> 
> Alternatively, if there are some other best practices to meet our use case, 
> we'll be happy to hear some direction.
> 
> Thanks in advance,
> Adi
> 
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Monday, July 18, 2022 12:42 AM
> To: users@solr.apache.org
> Subject: Re: Autoscaling
> 
> On 7/17/22 11:25, Kaminski, Adi wrote:
>> For example, if we have 10 shards each 100k (1M total) documents size for 
>> best and optimized ingestion/query performance...adding more documents will 
>> make sense to have 11th shard, and reaching 1.1M total will make sense to 
>> add 12th one eventually.
> 
> One million total documents is actually a pretty small index, and as you were 
> told in another reply, is not big enough in most situations to require 
> sharding, unless your hardware has very little cpu/memory/storage.
> 
>> Is it reasonable to use some automation of collections API, splitting shards 
>> accordingly to some strategy (largest, oldest, etc.) ?
> 
> In a typical scenario, every shard will be approximately equal in size, and 
> will contain documents of any age.  If you have a 10 shard index and you 
> split one of the shards, then you will have 9 shards of relatively equal size 
> and two shards that are each half the size of the other 9. To correctly 
> redistribute the load, you would need to split ALL the shards, so you would 
> end up with 20 shards, or some other multiple of 10, the starting point.
> 
> In my last reply, I mentioned the implicit router.  This is the router you 
> would need to use if you want to organize your shards by something like date. 
>  But then every single document you index must indicate what shard it will 
> end up on -- there is no automatic routing.
> 
>> Aren't some out of the box capabilities in Solr Cloud search engine ? Or 
>> maybe some libraries/operators on top to simplify k8s deployments, but not 
>> only for queries and automatic PODs scaling but also automating data storage 
>> optimization (per volume, date, any other custom logic..).
> 
> I have no idea what you are asking here.
> 
> Thanks,
> Shawn
