Hey Dotan,

The high-level (ZK-based) Kafka consumer (not Samza's) currently uses ZK to store offsets. The Kafka project is moving away from this as part of rewriting the consumer on top of NIO. The new consumer will adopt the strategy of storing offsets in a Kafka topic, just as Samza has done for years.
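
To make the "offsets in a topic" idea concrete, here is a minimal in-memory sketch. It is an illustration only: it does not use a Kafka client, and it is not Samza's actual checkpoint format. The class name `CheckpointLog`, the key layout (task, topic, partition), and the sample values are all assumptions made for the example. The point it shows is that a checkpoint is just an append, and that recovery replays the log and keeps the last offset seen for each key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key layout: (task name, topic, partition).
Key = Tuple[str, str, int]


@dataclass
class CheckpointLog:
    """In-memory stand-in for an append-only checkpoint topic (assumption)."""

    records: List[Tuple[Key, int]] = field(default_factory=list)

    def checkpoint(self, key: Key, offset: int) -> None:
        # Writing a checkpoint is a plain append; nothing is updated in place.
        self.records.append((key, offset))

    def restore(self) -> Dict[Key, int]:
        # Replay the whole log; the last record for each key wins.
        latest: Dict[Key, int] = {}
        for key, offset in self.records:
            latest[key] = offset
        return latest


log = CheckpointLog()

# Checkpoint after every message for one task, as in the extreme case above.
for offset in range(1000):
    log.checkpoint(("task-0", "events", 0), offset)

# A second task checkpoints once.
log.checkpoint(("task-1", "events", 1), 41)

restored = log.restore()
```

After the replay, `restored` holds one entry per key with its most recent offset, however many checkpoints were appended along the way.
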
The main motivation for not storing offsets in ZK is that it imposes artificial limits on how often you can checkpoint, due to ZK scalability. For example, if you wanted to checkpoint your offsets after every message, you would hammer away on ZK with thousands of writes per second, just for one consumer. Multiply this out by 100s or 1000s of consumers, and the ZK grid would never be able to keep up. Kafka is actually really good at exactly this kind of workload. In general, using ZK as a KV store is not a great idea.

The other benefit of storing offsets in Kafka is that it means Samza doesn't directly depend on ZK (just transitively, through Kafka). This should make operating Samza easier.

Cheers,
Chris

On Wed, Feb 25, 2015 at 10:09 PM, Dotan Patrich <dot...@fortscale.com> wrote:

> Hi,
>
> I was looking for a quick and easy way to monitor task offsets and
> stumbled upon this utility:
> https://github.com/quantifind/KafkaOffsetMonitor
>
> It didn't work for me, and what I discovered is that they apparently look
> for the consumer list and offsets in ZooKeeper, while Samza stores those
> in a Kafka topic.
> I tried to think what the downsides of using ZooKeeper to store offsets
> could be (performance?) but nothing solid came to mind.
>
> I guess you guys have had some discussions regarding this in the past. What
> would be the pros/cons of storing the offsets in a Kafka topic instead of
> ZooKeeper?
>
>
> Thanks,
> Dotan
>