To whom it may concern,

This is a broader question about designing the architecture for a SolrCloud
Time Routed Alias application. I'm using SolrCloud to ingest time-series data
on a regular basis, and SolrCloud runs in a Kubernetes cluster. A new Solr
node joins the cluster every time we add a Pod, and each Pod has a persistent
volume claim, which is how we scale our storage as well.

Since I'm using Time Routed Aliases, a new collection gets created
preemptively ahead of each time window, and its shards are currently placed
across the Solr pods based on how much free disk space each pod has, so newly
introduced pods keep getting selected for shard placement.

However, I would like to design a solution that avoids *hot-spotting* Solr
nodes by distributing shards across the older pods as well, while still
keeping a SolrCloud architecture that grows in size as data is ingested every
day.

I'm unsure what the best configuration would be at the collection/cluster
level, given the available policies and preferences described in
https://solr.apache.org/guide/8_6/solrcloud-autoscaling-policy-preferences.html
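
For instance, I've been considering something along these lines: cluster
preferences that weigh core count ahead of free disk, plus a policy capping
replicas per shard per node. This is only a sketch against the 8.6
autoscaling API (the URL is a placeholder), and I'm not sure it actually
solves the problem:

# Sketch only: push cluster preferences and a cluster policy to the 8.6
# autoscaling API. SOLR is a placeholder, and I'm not sure these rules are
# the right way to stop new pods from absorbing all of the new shards.
import requests

SOLR = "http://localhost:8983/solr"

commands = [
    {
        "set-cluster-preferences": [
            {"minimize": "cores", "precision": 1},  # spread cores evenly first
            {"maximize": "freedisk"},               # only then prefer free disk
        ]
    },
    {
        "set-cluster-policy": [
            # no more than one replica of the same shard on any one node
            {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
        ]
    },
]

for body in commands:
    resp = requests.post(f"{SOLR}/admin/autoscaling", json=body)
    resp.raise_for_status()
    print(resp.json())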

I'm currently creating collections at *weekly* intervals, and my use cases
involve searching across data that is at least *2 weeks old*. Because ingested
data is placed on the newer pods, my client-facing applications end up
bombarding the newer pods every time.

*Each collection has a replication factor of 2 and a numShards parameter of
2.*
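
Concretely, the alias creation looks roughly like this (the routing field,
config set and preemptive-create window are placeholders rather than my exact
values):

# Sketch of the CREATEALIAS call behind the Time Routed Alias. The routing
# field, config set and preemptive-create window are placeholders.
import requests

SOLR = "http://localhost:8983/solr"

params = {
    "action": "CREATEALIAS",
    "name": "timeseries",                      # alias my ingest writes to
    "router.name": "time",
    "router.field": "timestamp_dt",            # placeholder routing field
    "router.start": "NOW/DAY",
    "router.interval": "+7DAY",                # one collection per week
    "router.preemptiveCreateMath": "-1DAY",    # create next collection early
    "create-collection.collection.configName": "timeseries_config",
    "create-collection.numShards": 2,
    "create-collection.replicationFactor": 2,
}
resp = requests.get(f"{SOLR}/admin/collections", params=params)
resp.raise_for_status()
print(resp.json())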

What configuration at the collection/alias/cluster level should I use to
avoid hot-spotting?

Warm Regards,
Nahian
