Hi,

We have a requirement to calculate metrics over a huge number of keys - potentially hundreds of millions, perhaps billions - so caching individual keys would in many cases give a cache hit rate close to 0%. Is Kafka Streams with RocksDB and compacted topics the right tool for a task like that?
Also, after playing with Kafka Streams for a week, it seems to want to create a lot of separate stores by default: if I calculate aggregates over five, ten, and thirty days, I get three separate state stores for that data. Coming from a different distributed storage solution, my instinct is to put them together in one store, since I/O has always been my bottleneck (one big read and one big write beat three small reads and three small writes). Am I perhaps missing something here? I'd rather not give up the DSL that Kafka Streams provides if I don't have to. Will the Kafka Streams/RocksDB solution be so much faster than a distributed read that it won't be the bottleneck, even with huge amounts of data?

Any info/opinions would be greatly appreciated.
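For concreteness, here is a rough sketch of the kind of DSL topology I mean, assuming a recent Kafka Streams release (the topic name, serdes, and store names are just placeholders):

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;

public class ThreeWindowStoresSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KGroupedStream<String, Long> grouped = builder
            .stream("metric-events", Consumed.with(Serdes.String(), Serdes.Long()))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()));

        // Each windowedBy(...).reduce(...) below is materialized into its own
        // RocksDB store (and its own changelog topic) by default.
        grouped.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofDays(5)))
               .reduce(Long::sum, Materialized.as("agg-5d"));
        grouped.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofDays(10)))
               .reduce(Long::sum, Materialized.as("agg-10d"));
        grouped.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofDays(30)))
               .reduce(Long::sum, Materialized.as("agg-30d"));

        // builder.build() now describes a topology with three separate windowed stores.
    }
}

As far as I can tell, that gives me three separate windowed RocksDB stores to read and write for every incoming record, which is the I/O duplication I'm worried about.

thanks in advance,
Gareth Collins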