Hi Chris,

Thanks for the info, very helpful!
That seems very reasonable. By the way, this all started when I was looking
for an open source monitoring tool for Samza/Kafka to see which tasks are
the performance bottleneck. Do you have any experience with such a tool
(other than the internal solution developed at LinkedIn)?
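
To check that I follow the ZK scalability argument, here is the arithmetic
as a rough Python sketch. The function name and every number in it are made
up for illustration; they are not measurements from Samza, Kafka or ZK.

```python
def checkpoint_writes_per_second(messages_per_second, messages_per_checkpoint, consumers):
    """Estimate aggregate offset-commit writes per second across all consumers.

    Hypothetical helper: assumes every consumer processes messages at the
    same rate and commits its offset once every `messages_per_checkpoint`
    messages.
    """
    writes_per_consumer = messages_per_second / messages_per_checkpoint
    return writes_per_consumer * consumers


# Assumed workload: 500 consumers, each processing 2000 messages per second.
per_message = checkpoint_writes_per_second(2000, 1, 500)   # checkpoint after every message
batched = checkpoint_writes_per_second(2000, 1000, 500)    # checkpoint every 1000 messages

print(f"checkpoint per message:   {per_message:,.0f} offset writes/s")
print(f"checkpoint every 1000:    {batched:,.0f} offset writes/s")
```

With these assumed numbers, checkpointing after every message means about a
million offset writes per second in total, versus about a thousand when
checkpointing every 1000 messages, so the offset store ends up dictating
how often you can afford to checkpoint.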
 On 26 Feb 2015 20:11, "Chris Riccomini" <criccom...@apache.org> wrote:

> Hey Dotan,
>
> The high-level (ZK-based) Kafka consumer (not Samza's) currently uses ZK to
> store offsets. They (Kafka) are moving away from this as they write their
> new NIO-based consumer. They will adopt the strategy of storing
> offsets in a Kafka topic, just like Samza has for years.
>
> The main motivation for not storing offsets in ZK is that it imposes
> artificial limits on how often you can checkpoint due to ZK scalability.
> For example, if you wanted to checkpoint your offsets after every message,
> you would hammer away on ZK with thousands of writes per second, just for
> one consumer. Multiply this out by 100s or 1000s of consumers, and the ZK
> grid would never be able to keep up. Kafka is actually really good at
> exactly this kind of workload. In general, using ZK as a KV store is not a
> great idea.
>
> The other benefit of storing offsets in Kafka is that it means Samza
> doesn't directly depend on ZK (just transitively, through Kafka). This
> should make operating Samza easier.
>
> Cheers,
> Chris
>
> On Wed, Feb 25, 2015 at 10:09 PM, Dotan Patrich <dot...@fortscale.com>
> wrote:
>
> > Hi,
> >
> > I was looking for a quick and easy way to monitor task offsets and
> > stumbled upon this utility:
> > https://github.com/quantifind/KafkaOffsetMonitor
> >
> > It didn't work for me, and what I discovered is that it apparently looks
> > for the consumer list and offsets in ZooKeeper, while Samza stores those
> > in a Kafka topic.
> > I tried to think of the downsides of using ZooKeeper to store offsets
> > (performance?), but nothing solid came to mind.
> >
> > I guess you guys have discussed this in the past. What would be the
> > pros/cons of storing the offsets in a Kafka topic instead of ZooKeeper?
> >
> >
> > Thanks,
> > Dotan
> >
>
