For mirror maker and our audit application, we've been using Kafka-committed offsets for some time now. We've got a few other consumers who are using it, but we haven't actively worked on moving the bulk of them over. It's been less critical since we put the ZK transaction logs on SSD.
And yeah, this is specific to kafka-committed offsets. I'm looking at some options for handling Zookeeper as well, but since our goal with this was to monitor our own infrastructure applications and move forward, it hasn't gotten a lot of my attention yet.

-Todd

On Tue, Jun 9, 2015 at 11:53 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Hi Todd,
>
> Thanks for open sourcing this, I'm excited to take a look.
>
> It looks like it's specific to offsets stored in kafka (and not zookeeper), correct? I assume by that that LinkedIn is using the kafka storage now in production?
>
> Jason
>
> On Thu, Jun 4, 2015 at 9:43 PM, Todd Palino <tpal...@gmail.com> wrote:
>
> > I am very happy to introduce Burrow, an application to provide Kafka consumer status as a service. Burrow is different than just a "lag checker":
> >
> > * Multiple Kafka cluster support - Burrow supports any number of Kafka clusters in a single instance. You can also run multiple copies of Burrow in parallel and only one of them will send out notifications.
> >
> > * All consumers, all partitions - If the consumer is committing offsets to Kafka (not Zookeeper), it will be available in Burrow automatically. Every partition it consumes will be monitored simultaneously, avoiding the trap of just watching the worst partition (MaxLag) or spot-checking individual topics.
> >
> > * Status can be checked via HTTP request - There's an internal HTTP server that provides topic and consumer lists, can give you the latest offsets for a topic either from the brokers or from the consumer, and lets you check consumer status.
> >
> > * Continuously monitor groups with output via email or a call to an external HTTP endpoint - Configure emails to send for bad groups, checked continuously. Or you can have Burrow call an HTTP endpoint into another system for handling alerts.
> >
> > * No thresholds - Status is determined over a sliding window and does not rely on a fixed limit. When a consumer is checked, it has a status indicator that tells whether it is OK, a warning, or an error, and the partitions that caused it to be bad are provided.
> >
> > Burrow was created to address specific problems that LinkedIn has with monitoring consumers, in particular wildcard consumers like mirror makers and our audit consumers. Instead of checking offsets for specific consumers periodically, it monitors the stream of all committed offsets (__consumer_offsets) and continually calculates lag over a sliding window.
> >
> > We welcome all feedback, comments, and contributors. This project is very much under active development for us (we're using it in some of our environments now, and working on getting it running everywhere to replace our previous monitoring system).
> >
> > Burrow is written in Go, published under the Apache License, and hosted on GitHub at:
> > https://github.com/linkedin/Burrow
> >
> > Documentation is on the GitHub wiki at:
> > https://github.com/linkedin/Burrow/wiki
> >
> > -Todd
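To make the "no thresholds" idea from the announcement concrete, here is a rough sketch in Go of evaluating a partition's status from a window of committed-offset observations. This is a simplified illustration under assumptions of mine, not Burrow's actual algorithm: the Observation type, the window rules, and the OK/Warning/Error logic below are all hypothetical stand-ins for what the wiki documents.

```go
package main

import "fmt"

// Observation is one committed-offset sample for a single partition
// (hypothetical type, not from Burrow's source).
type Observation struct {
	Offset int64 // consumer's committed offset
	Lag    int64 // head offset minus committed offset at commit time
}

// Status mirrors the OK/warning/error indicator the announcement describes.
type Status int

const (
	OK Status = iota
	Warning
	Error
)

// Evaluate applies simplified sliding-window rules (assumed, for
// illustration): if lag reached zero anywhere in the window, the
// partition is OK; if the committed offset never advanced while lag is
// outstanding, the consumer is stalled (Error); if lag rose on every
// sample, the consumer is falling behind (Warning).
func Evaluate(window []Observation) Status {
	if len(window) < 2 {
		return OK
	}
	offsetsAdvanced := false
	lagAlwaysRising := true
	for i, obs := range window {
		if obs.Lag == 0 {
			return OK // caught up at some point in the window
		}
		if i > 0 {
			if obs.Offset > window[i-1].Offset {
				offsetsAdvanced = true
			}
			if obs.Lag <= window[i-1].Lag {
				lagAlwaysRising = false
			}
		}
	}
	if !offsetsAdvanced {
		return Error // stalled: commits repeat the same offset with lag outstanding
	}
	if lagAlwaysRising {
		return Warning // consuming, but losing ground on every commit
	}
	return OK
}

func main() {
	stalled := []Observation{{100, 50}, {100, 80}, {100, 120}}
	falling := []Observation{{100, 10}, {150, 20}, {200, 35}}
	healthy := []Observation{{100, 10}, {200, 0}, {300, 5}}
	fmt.Println(Evaluate(stalled), Evaluate(falling), Evaluate(healthy)) // 2 1 0
}
```

The point of evaluating the whole window rather than a single lag reading is the one the announcement makes: a momentary lag spike does not trip an alert, and no per-topic threshold has to be tuned.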