On 7/17/22 11:25, Kaminski, Adi wrote:
> For example, if we have 10 shards of 100k documents each (1M total) for best, optimized ingestion/query performance... adding more documents would make it sensible to add an 11th shard, and reaching 1.1M total would eventually justify a 12th.
One million total documents is actually a pretty small index, and as you were told in another reply, in most situations it is not big enough to require sharding unless your hardware has very little CPU, memory, or storage.
> Is it reasonable to use some automation of the Collections API, splitting shards according to some strategy (largest, oldest, etc.)?
In a typical scenario, every shard will be approximately equal in size and will contain documents of any age. If you have a 10-shard index and you split one of the shards, you will have 9 shards of relatively equal size and two shards that are each half the size of the other nine. To correctly redistribute the load, you would need to split ALL the shards, so you would end up with 20 shards, or some other multiple of 10, your starting point.
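As a rough illustration of what such automation would have to do, here is a minimal Python sketch that splits every shard of a collection through the Collections API SPLITSHARD action. The host, the collection name "mycollection", and the use of the requests library are assumptions for the example, not anything from this thread:

    # Sketch only: split every shard of a SolrCloud collection in two.
    # The host and collection name below are hypothetical.
    import requests

    SOLR = "http://localhost:8983/solr"
    COLLECTION = "mycollection"

    # Ask the cluster for the current shard names.
    status = requests.get(f"{SOLR}/admin/collections",
                          params={"action": "CLUSTERSTATUS",
                                  "collection": COLLECTION}).json()
    shards = status["cluster"]["collections"][COLLECTION]["shards"]

    # Split each shard; a 10-shard collection becomes a 20-shard one.
    for shard in shards:
        resp = requests.get(f"{SOLR}/admin/collections",
                            params={"action": "SPLITSHARD",
                                    "collection": COLLECTION,
                                    "shard": shard}).json()
        print(shard, "->", resp.get("success", resp))

SPLITSHARD can take a long time on a big shard (the async parameter exists for that), and the inactive parent shards still have to be deleted afterward, so this is not something to run casually.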
In my last reply, I mentioned the implicit router. This is the router you would need to use if you want to organize your shards by something like date. But then every single document you index must indicate what shard it will end up on -- there is no automatic routing.
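To make that concrete, here is a similar sketch of a date-organized collection using the implicit router. The collection name "logs" and the monthly shard names are made up for illustration:

    # Sketch only: with the implicit router, YOU name the target shard.
    import requests

    SOLR = "http://localhost:8983/solr"

    # Create the collection with explicitly named shards; with
    # router.name=implicit, Solr does no hash-based routing at all.
    requests.get(f"{SOLR}/admin/collections",
                 params={"action": "CREATE", "name": "logs",
                         "router.name": "implicit",
                         "shards": "shard-2022-06,shard-2022-07"})

    # Every update must say where the document goes, here via the
    # _route_ parameter (a router.field on the collection also works).
    doc = {"id": "1", "timestamp": "2022-07-17T11:25:00Z"}
    requests.post(f"{SOLR}/logs/update",
                  params={"_route_": "shard-2022-07", "commit": "true"},
                  json=[doc])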
> Aren't there some out-of-the-box capabilities in the SolrCloud search engine? Or maybe some libraries/operators on top to simplify k8s deployments, not only for queries and automatic pod scaling, but also for automating data storage optimization (per volume, date, or any other custom logic)?
I have no idea what you are asking here.

Thanks,
Shawn