Hi Mathias,

We're currently using a custom solution that queries Kafka and ZooKeeper (two separate processes) for topic size and consumer offsets and submits the information to a collectd/statsd instance, which ships it on to Graphite so we can track it in Grafana. There's no alerting built in, but it gives us a good overview of what's going on without having to stand up another service like the Offset Monitor.
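Roughly, per partition it's two offset lookups and one UDP packet — a minimal sketch against the 0.8 Java API (broker/ZooKeeper/statsd hosts, the group and topic names, and the gauge name are all placeholders; error handling and the loop over all topics/partitions are left out):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

import org.apache.zookeeper.ZooKeeper;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class LagToStatsd {
    public static void main(String[] args) throws Exception {
        String group = "my-group", topic = "my-topic"; // placeholders
        int partition = 0;

        // Log-end ("latest") offset from the broker, via getOffsetsBefore
        // with the special LatestTime (-1) timestamp.
        SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 10000, 64 * 1024, "lag-checker");
        Map<TopicAndPartition, PartitionOffsetRequestInfo> req = new HashMap<>();
        req.put(new TopicAndPartition(topic, partition),
                new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
        OffsetResponse resp = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                req, kafka.api.OffsetRequest.CurrentVersion(), "lag-checker"));
        long logEnd = resp.offsets(topic, partition)[0];
        consumer.close();

        // Committed offset for the consumer group, straight out of ZooKeeper.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 10000, null);
        String path = "/consumers/" + group + "/offsets/" + topic + "/" + partition;
        long committed = Long.parseLong(new String(zk.getData(path, false, null), "UTF-8"));
        zk.close();

        // Ship the lag as a statsd gauge over UDP; collectd/statsd forwards
        // it to Graphite, and Grafana reads it from there.
        String metric = "kafka.lag." + group + "." + topic + "." + partition
                + ":" + (logEnd - committed) + "|g";
        byte[] payload = metric.getBytes("UTF-8");
        DatagramSocket socket = new DatagramSocket();
        socket.send(new DatagramPacket(payload, payload.length,
                new InetSocketAddress("statsd-host", 8125)));
        socket.close();
    }
}

A real version would batch all partitions into a single OffsetRequest per broker rather than doing one lookup each.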
On Tue, Mar 17, 2015 at 10:36 AM, Mathias Söderberg
<mathias.soederb...@gmail.com> wrote:

> Hi Lance,
>
> I tried Kafka Offset Monitor a while back, but it didn't play especially
> nicely with a lot of topics / partitions (we currently have around 1400
> topics and 4000 partitions in total). It might be possible to make it
> work a bit better, but I'm not sure it would be the best way to do
> alerting.
>
> Thanks for the tip though :).
>
> Best regards,
> Mathias
>
>
> On Mon, 16 Mar 2015 at 21:02 Lance Laursen <llaur...@rubiconproject.com>
> wrote:
>
> > Hey Mathias,
> >
> > Kafka Offset Monitor will give you a general idea of where your
> > consumer group(s) are at:
> >
> > http://quantifind.com/KafkaOffsetMonitor/
> >
> > However, I'm not sure how useful it will be with "a large number of
> > topics", or how easy it would be to turn its output into a script that
> > alerts on a threshold. Could take a look and see what they're doing,
> > though.
> >
> > On Mon, Mar 16, 2015 at 8:31 AM, Mathias Söderberg
> > <mathias.soederb...@gmail.com> wrote:
> >
> > > Good day,
> > >
> > > I'm looking into using SimpleConsumer#getOffsetsBefore and offsets
> > > committed to ZooKeeper for monitoring the lag of a consumer group.
> > >
> > > Our current use case is that we have a service that is continuously
> > > consuming messages from a large number of topics and persisting the
> > > messages to S3 at somewhat regular intervals (depending on time and
> > > the total size of consumed messages for each partition). Offsets are
> > > committed to ZooKeeper after the messages have been persisted to S3.
> > > The partitions carry varying load, so a simple threshold based on
> > > the number of messages we're lagging behind would be cumbersome to
> > > maintain due to the number of topics, and most likely prone to
> > > unnecessary alerts.
> > >
> > > Currently our broker configuration specifies log.roll.hours=1 and
> > > log.segment.bytes=1GB, and my proposed solution is to have a
> > > separate service that iterates through all topics/partitions and
> > > uses #getOffsetsBefore with a timestamp that is one (1) or two (2)
> > > hours ago, and compares the first offset returned (which from my
> > > testing looks to be the offset closest in time, i.e. from the log
> > > segment closest to the given timestamp) with the one that is saved
> > > in ZooKeeper.
> > > It feels like a pretty solid solution, given that we just want a
> > > rough estimate of how far we're lagging behind in time, so that we
> > > know (again, roughly) how much time we have to fix whatever is
> > > broken before the log segments are deleted by Kafka.
> > >
> > > Is anyone doing monitoring similar to this? Are there any obvious
> > > downsides to this approach that I'm not thinking about? Thoughts on
> > > alternatives?
> > >
> > > Best regards,
> > > Mathias
> > >

--
Kasper Mackenhauer Jacobsen
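For reference, the time-based check Mathias proposes boils down to something like the following sketch against the 0.8 Java API; the helper name and client id are made up, and the committed offset is assumed to have been read from ZooKeeper as in the earlier sketch.

import java.util.HashMap;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class TimeBasedLagCheck {
    // True if the group's committed offset falls before the first offset
    // that getOffsetsBefore returns for the cutoff timestamp, i.e. the
    // consumer is roughly more than `hoursBack` hours behind. Resolution
    // is segment-level (per the thread, log.roll.hours=1), which is all a
    // rough time-based alert needs.
    static boolean behindCutoff(SimpleConsumer consumer, String topic, int partition,
                                long committedOffset, int hoursBack) {
        long cutoff = System.currentTimeMillis() - hoursBack * 3600L * 1000L;
        Map<TopicAndPartition, PartitionOffsetRequestInfo> req = new HashMap<>();
        // Ask for the single offset closest before the cutoff timestamp.
        req.put(new TopicAndPartition(topic, partition),
                new PartitionOffsetRequestInfo(cutoff, 1));
        OffsetResponse resp = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                req, kafka.api.OffsetRequest.CurrentVersion(), "lag-check"));
        long offsetAtCutoff = resp.offsets(topic, partition)[0];
        return committedOffset < offsetAtCutoff;
    }
}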