Hi,

We have a requirement to calculate metrics on a huge number of keys (could
be hundreds of millions, perhaps billions, of keys; with that many keys,
caching individual keys would give close to a 0% cache hit rate in many
cases). Is Kafka Streams with RocksDB and compacted topics the right tool
for a task like that?

Also, just from playing with Kafka Streams for a week, it feels like it
wants to create a lot of separate stores by default (if I want to calculate
five-, ten- and 30-day aggregates, I get three separate stores for that
state data by default). Coming from a different distributed storage
solution, I feel like I want to put them together in one store, as I/O has
always been my bottleneck (one big read and one big write beats three small
separate reads and three small separate writes).
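
To make the "one store instead of three" idea concrete, here is a rough
sketch of the kind of topology I have in mind (the topic name, value class
and serde are just placeholders, the actual 5/10/30-day expiry logic is
elided, and it assumes a recent 2.x-style Kafka Streams API):

import java.nio.ByteBuffer;

import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class CombinedRollupTopology {

    // One value per key holding all three rolling aggregates together,
    // so each update is one read and one write against a single store.
    public static class Rollup {
        double fiveDay;
        double tenDay;
        double thirtyDay;
    }

    // Fixed-width binary serde for Rollup (three doubles = 24 bytes).
    public static Serde<Rollup> rollupSerde() {
        Serializer<Rollup> ser = (topic, r) -> r == null ? null
                : ByteBuffer.allocate(24)
                        .putDouble(r.fiveDay)
                        .putDouble(r.tenDay)
                        .putDouble(r.thirtyDay)
                        .array();
        Deserializer<Rollup> de = (topic, bytes) -> {
            if (bytes == null) return null;
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            Rollup r = new Rollup();
            r.fiveDay = buf.getDouble();
            r.tenDay = buf.getDouble();
            r.thirtyDay = buf.getDouble();
            return r;
        };
        return Serdes.serdeFrom(ser, de);
    }

    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        KTable<String, Rollup> rollups = builder
                .stream("metric-events", Consumed.with(Serdes.String(), Serdes.Double()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                .aggregate(
                        Rollup::new,
                        (key, value, agg) -> {
                            // Update all three horizons in one pass; the real
                            // 5/10/30-day expiry logic would go here.
                            agg.fiveDay += value;
                            agg.tenDay += value;
                            agg.thirtyDay += value;
                            return agg;
                        },
                        // One RocksDB-backed store (and one compacted changelog
                        // topic) instead of three separate stores.
                        Materialized.<String, Rollup, KeyValueStore<Bytes, byte[]>>as("rollup-store")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(rollupSerde()));

        return builder;
    }
}

With that shape there is a single RocksDB store (and a single compacted
changelog topic) per partition, so each incoming record costs one state
read and one state write per key instead of three of each.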

But am I perhaps missing something here? I would rather not give up the DSL
that Kafka Streams provides if I don't have to. Will local RocksDB reads in
Kafka Streams be so much faster than a distributed read that the state
store I/O won't be the bottleneck even with huge amounts of data?

Any info/opinions would be greatly appreciated.

thanks in advance,
Gareth Collins
