Hi Mathias,

We're currently using a custom solution that queries Kafka and ZooKeeper (two separate processes) for topic size and consumer offsets and submits the information to a collectd/statsd instance, which ships it on to Graphite so we can track it in Grafana. There's no alerting built in, but it gives us a good overview of what's going on without having to stand up another service like the Offset Monitor.
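Roughly, per partition it's two offset lookups and one UDP packet — a minimal sketch against the 0.8 Java API (broker/ZooKeeper/statsd hosts, the group and topic names, and the gauge name are all placeholders; error handling and the loop over all topics/partitions are left out):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

import org.apache.zookeeper.ZooKeeper;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class LagToStatsd {
    public static void main(String[] args) throws Exception {
        String group = "my-group", topic = "my-topic"; // placeholders
        int partition = 0;

        // Log-end ("latest") offset from the broker, via getOffsetsBefore
        // with the special LatestTime (-1) timestamp.
        SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 10000, 64 * 1024, "lag-checker");
        Map<TopicAndPartition, PartitionOffsetRequestInfo> req = new HashMap<>();
        req.put(new TopicAndPartition(topic, partition),
                new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
        OffsetResponse resp = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                req, kafka.api.OffsetRequest.CurrentVersion(), "lag-checker"));
        long logEnd = resp.offsets(topic, partition)[0];
        consumer.close();

        // Committed offset for the consumer group, straight out of ZooKeeper.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 10000, null);
        String path = "/consumers/" + group + "/offsets/" + topic + "/" + partition;
        long committed = Long.parseLong(new String(zk.getData(path, false, null), "UTF-8"));
        zk.close();

        // Ship the lag as a statsd gauge over UDP; collectd/statsd forwards
        // it to Graphite, and Grafana reads it from there.
        String metric = "kafka.lag." + group + "." + topic + "." + partition
                + ":" + (logEnd - committed) + "|g";
        byte[] payload = metric.getBytes("UTF-8");
        DatagramSocket socket = new DatagramSocket();
        socket.send(new DatagramPacket(payload, payload.length,
                new InetSocketAddress("statsd-host", 8125)));
        socket.close();
    }
}

A real version would batch all partitions into a single OffsetRequest per broker rather than doing one lookup each.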
On Tue, Mar 17, 2015 at 10:36 AM, Mathias Söderberg
<mathias.soederb...@gmail.com> wrote:

> Hi Lance,
>
> I tried Kafka Offset Monitor a while back, but it didn't play especially
> nicely with a lot of topics / partitions (we currently have around 1400
> topics and 4000 partitions in total). It might be possible to make it
> work a bit better, but I'm not sure it would be the best way to do
> alerting.
>
> Thanks for the tip though :).
>
> Best regards,
> Mathias
>
>
> On Mon, 16 Mar 2015 at 21:02 Lance Laursen <llaur...@rubiconproject.com>
> wrote:
>
> > Hey Mathias,
> >
> > Kafka Offset Monitor will give you a general idea of where your
> > consumer group(s) are at:
> >
> > http://quantifind.com/KafkaOffsetMonitor/
> >
> > However, I'm not sure how useful it will be with "a large number of
> > topics", or how easy it would be to turn its output into a script that
> > alerts on a threshold. Could take a look and see what they're doing,
> > though.
> >
> > On Mon, Mar 16, 2015 at 8:31 AM, Mathias Söderberg
> > <mathias.soederb...@gmail.com> wrote:
> >
> > > Good day,
> > >
> > > I'm looking into using SimpleConsumer#getOffsetsBefore and offsets
> > > committed to ZooKeeper for monitoring the lag of a consumer group.
> > >
> > > Our current use case is that we have a service that is continuously
> > > consuming messages from a large number of topics and persisting the
> > > messages to S3 at somewhat regular intervals (depending on time and
> > > the total size of consumed messages for each partition). Offsets are
> > > committed to ZooKeeper after the messages have been persisted to S3.
> > > The partitions carry varying load, so a simple threshold based on
> > > the number of messages we're lagging behind would be cumbersome to
> > > maintain due to the number of topics, and most likely prone to
> > > unnecessary alerts.
> > >
> > > Currently our broker configuration specifies log.roll.hours=1 and
> > > log.segment.bytes=1GB, and my proposed solution is to have a
> > > separate service that iterates through all topics/partitions and
> > > uses #getOffsetsBefore with a timestamp that is one (1) or two (2)
> > > hours ago, and compares the first offset returned (which from my
> > > testing looks to be the offset closest in time, i.e. from the log
> > > segment closest to the given timestamp) with the one that is saved
> > > in ZooKeeper.
> > > It feels like a pretty solid solution, given that we just want a
> > > rough estimate of how far we're lagging behind in time, so that we
> > > know (again, roughly) how much time we have to fix whatever is
> > > broken before the log segments are deleted by Kafka.
> > >
> > > Is anyone doing monitoring similar to this? Are there any obvious
> > > downsides to this approach that I'm not thinking about? Thoughts on
> > > alternatives?
> > >
> > > Best regards,
> > > Mathias
> > >

--
Kasper Mackenhauer Jacobsen
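For reference, the time-based check Mathias proposes boils down to something like the following sketch against the 0.8 Java API; the helper name and client id are made up, and the committed offset is assumed to have been read from ZooKeeper as in the earlier sketch.

import java.util.HashMap;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class TimeBasedLagCheck {
    // True if the group's committed offset falls before the first offset
    // that getOffsetsBefore returns for the cutoff timestamp, i.e. the
    // consumer is roughly more than `hoursBack` hours behind. Resolution
    // is segment-level (per the thread, log.roll.hours=1), which is all a
    // rough time-based alert needs.
    static boolean behindCutoff(SimpleConsumer consumer, String topic, int partition,
                                long committedOffset, int hoursBack) {
        long cutoff = System.currentTimeMillis() - hoursBack * 3600L * 1000L;
        Map<TopicAndPartition, PartitionOffsetRequestInfo> req = new HashMap<>();
        // Ask for the single offset closest before the cutoff timestamp.
        req.put(new TopicAndPartition(topic, partition),
                new PartitionOffsetRequestInfo(cutoff, 1));
        OffsetResponse resp = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                req, kafka.api.OffsetRequest.CurrentVersion(), "lag-check"));
        long offsetAtCutoff = resp.offsets(topic, partition)[0];
        return committedOffset < offsetAtCutoff;
    }
}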