On 7/17/22 11:25, Kaminski, Adi wrote:
> For example, suppose we have 10 shards of 100k documents each (1M total) for
> best ingestion/query performance. Adding more documents would then make
> sense with an 11th shard, and reaching 1.1M total would eventually make
> sense with a 12th.

One million total documents is actually a pretty small index. As you were told in another reply, it is not big enough in most situations to require sharding, unless your hardware has very little CPU/memory/storage.

> Is it reasonable to use some automation of the Collections API, splitting
> shards according to some strategy (largest, oldest, etc.)?

In a typical scenario, every shard will be approximately equal in size, and will contain documents of any age.  If you have a 10 shard index and you split one of the shards, then you will have 9 shards of relatively equal size and two shards that are each half the size of the other 9.  To correctly redistribute the load, you would need to split ALL the shards, so you would end up with 20 shards, or some other multiple of 10, the starting point.
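Splitting is done with the Collections API SPLITSHARD action. A minimal sketch of splitting all ten shards, assuming a hypothetical collection named `mycoll` with shards `shard1` through `shard10` on a local Solr at `localhost:8983` (names and host are placeholders, adjust for your cluster):

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical local Solr node

def splitshard_url(collection, shard):
    """Build a Collections API SPLITSHARD request URL for one shard."""
    params = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
        "async": f"split-{shard}",  # run asynchronously; poll with REQUESTSTATUS
    })
    return f"{SOLR}/admin/collections?{params}"

# To redistribute load correctly, every shard must be split: 10 in, 20 out.
urls = [splitshard_url("mycoll", f"shard{i}") for i in range(1, 11)]
for u in urls:
    print(u)
```

Each split leaves the parent shard inactive and replaces it with two child shards of roughly half the size, which is why splitting only one shard out of ten leaves the load uneven.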

In my last reply, I mentioned the implicit router.  This is the router you would need to use if you want to organize your shards by something like date.  But then every single document you index must indicate what shard it will end up on -- there is no automatic routing.
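To illustrate what that looks like, here is a sketch of creating an implicit-router collection with date-named shards, using a hypothetical collection name, shard names, and routing field (`events`, `2022_06`/`2022_07`, `month_s` are all made up for the example):

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical; adjust to your cluster

# Create a collection using the implicit router and explicitly named shards.
create = urlencode({
    "action": "CREATE",
    "name": "events",             # hypothetical collection name
    "router.name": "implicit",
    "shards": "2022_06,2022_07",  # one shard per month, named up front
    "router.field": "month_s",    # each document's month_s value picks its shard
})
create_url = f"{SOLR}/admin/collections?{create}"
print(create_url)
```

With `router.field` set, a document whose `month_s` field is `2022_07` lands on the shard named `2022_07`; without it, every update request must carry a `_route_` parameter naming the target shard. Either way, nothing is routed automatically, and adding a new month means explicitly creating that shard first.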

> Aren't there some out-of-the-box capabilities in the SolrCloud search
> engine? Or maybe some libraries/operators on top to simplify k8s
> deployments, not only for queries and automatic pod scaling but also for
> automating data storage optimization (per volume, date, or any other
> custom logic)?

I have no idea what you are asking here.

Thanks,
Shawn
