Hi,
just want to pick this up again. You can always use more partitions to
reduce the number of keys handled by a single broker and parallelize the
compaction. So with a sufficient number of machines and the ability to
partition, I don’t see you running into problems.
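For illustration, a minimal sketch of creating such a compacted, heavily
partitioned topic with the Java AdminClient (assuming a recent client; the
topic name, partition count, replication factor, and broker address are only
placeholders):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder broker address.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Enable log compaction; start cleaning once half the log is "dirty".
                Map<String, String> configs = new HashMap<>();
                configs.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT);
                configs.put(TopicConfig.MIN_CLEANABLE_DIRTY_RATIO_CONFIG, "0.5");

                // 50 partitions (placeholder) spread the ~100m keys across the
                // cluster, so each broker's cleaner only handles its share of the keyspace.
                NewTopic topic = new NewTopic("primary-topic", 50, (short) 3).configs(configs);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }

On the broker side, settings like log.cleaner.threads and
log.cleaner.dedupe.buffer.size also bound how much key material the cleaner
can process per pass, so they are worth tuning at this scale.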
Jan
On 07.10.2015 05:34, Feroze Daud wrote:
hi!
We have a use case where we want to store ~100m keys in Kafka. Is there any
problem with this approach?
I have heard from some people using Kafka that it has problems doing
log compaction with that many keys.
Another topic might have around 10 different K/V pairs for each key in the
primary topic. The primary topic's keyspace is approximately 100m keys. We would
like to store this in Kafka because we are doing a lot of stream processing on
these messages, and want to avoid writing another process to recompute data
from snapshots.
So, in summary:
primary topic: ~100m keys
secondary topic: ~1B keys
Is it feasible to use log compaction at such a scale of data?
Thanks
feroze.