I am referring to the "lag" that exists when a consumer offset is significantly less than the log size. This difference is the lag, and it is often symptomatic of a problem: processing has stopped, is being overwhelmed, etc.
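To make the arithmetic concrete, here is a minimal sketch of that definition, computed per partition. The function names and the sample numbers are hypothetical, and how the two offset maps are obtained is outside this sketch. The threshold of 10 mirrors the alert rule quoted further down in this thread.

```python
from typing import Dict, List


def compute_lag(log_end_offsets: Dict[int, int],
                consumer_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: log end offset minus consumer offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a partition with no recorded consumer offset is
        # treated as entirely unread (offset 0).
        consumed = consumer_offsets.get(partition, 0)
        # Clamp at zero in case the two readings were taken at slightly
        # different times and the consumer offset appears ahead.
        lag[partition] = max(end_offset - consumed, 0)
    return lag


def partitions_over_threshold(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return the sorted partitions whose lag exceeds the threshold."""
    return sorted(p for p, n in lag.items() if n > threshold)


if __name__ == "__main__":
    # Hypothetical offsets keyed by partition number.
    log_end = {0: 1500, 1: 1200, 2: 900}
    consumed = {0: 1495, 1: 1100, 2: 900}
    lag = compute_lag(log_end, consumed)
    print(lag)
    print(partitions_over_threshold(lag, threshold=10))
```

An alerting job would run something like this on a schedule and page someone when the second function returns a non-empty list.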
Our legacy Node.js system uses consumer groups (the same as, say, older Spark). To get the offset, we can use the kafka-consumer-groups.sh tool. For Ops-related work we use Kafkamon to give the Ops folks a UI.

Our newer stuff uses Samza, and I see zero consumer groups. Instead I see checkpoint topics (example: __samza_checkpoint_ver_1_for_generic-delivery_1). I can consume this topic and get the current offset by partition, but I don't have the log size, so I cannot compute the lag. All I can do is watch these numbers increment, with no clue how far behind my process is.

I just took LinkedIn's Burrow (https://github.com/linkedin/Burrow) for a test drive locally, hoping it would solve my problem because it looks at the internal consumers. However, I have the same problem: I can't get data on a consumer group that doesn't exist.

Jeremiah Adams
Software Engineer
www.helixeducation.com
Blog | Twitter | Facebook | LinkedIn

________________________________________
From: Tom Davis <t...@recursivedream.com>
Sent: Monday, November 26, 2018 6:59 PM
To: dev@samza.apache.org
Subject: Re: Alerting and Monitoring Samza Checkpointing?

Have you looked into KafkaSystemConsumerMetrics? Is the meaning of "lag" there different from what you mean?

Jeremiah Adams <jad...@helixeducation.com> writes:

> We are replacing a node.js app that consumed topics on a Kafka cluster with
> Samza jobs. We use kafka-offsets to trigger alerts based on message lag. e.g.,
> message lag is greater than 10, wake up support persons.
>
> Samza doesn't use the same mechanism for offset storage and the tools for
> examining a topic's checkpoint aren't readily useful for application
> consumption.
>
> Can some of you share your approaches to monitoring and alerting on consumer
> lag?
>
> Regards.