Re: Telling if a job has caught up with Kafka

2017-10-30 Thread aitozi
Hi, rmetzger0 Sorry to reply to this old question, i found that we use the kafka client 0.9 in class kafkaThread which lead to the lose of many other detail metrics add in kafka client 10 like per partition consumer lag mentioned by this issuse https://issues.apache.org/jira/browse/FLINK-7945. i

Re: Telling if a job has caught up with Kafka

2017-03-23 Thread Robert Metzger
Sorry for joining this discussion late, but there is already a metric for the offset lag in our 0.9+ consumers. Its called the "records-lag-max": https://kafka.apache.org/documentation/#new_consumer_fetch_monitoring and its exposed via Flink's metrics system. The only issue is that it only shows th

Re: Telling if a job has caught up with Kafka

2017-03-20 Thread Bruno Aranda
Hi, Thanks! The proposal sounds very good to us too. Bruno On Sun, 19 Mar 2017 at 10:57 Florian König wrote: > Thanks Gordon for the detailed explanation! That makes sense and explains > the expected behaviour. > > The JIRA for the new metric also sounds very good. Can’t wait to have this > in

Re: Telling if a job has caught up with Kafka

2017-03-19 Thread Florian König
Thanks Gordon for the detailed explanation! That makes sense and explains the expected behaviour. The JIRA for the new metric also sounds very good. Can’t wait to have this in the Flink GUI (KafkaOffsetMonitor has some problems and stops working after 1-2 days, don’t know the reason yet). All

Re: Telling if a job has caught up with Kafka

2017-03-18 Thread Gyula Fóra
Thanks Gordon! :) Gyula On Sat, Mar 18, 2017, 08:26 Tzu-Li (Gordon) Tai wrote: So we would have current lag per partition (for instance every 1 sec) and lag at the latest checkpoint per partition in an easily queryable way. I quite like this idea! We could perhaps call them “currentOffsetLag”

Re: Telling if a job has caught up with Kafka

2017-03-18 Thread Tzu-Li (Gordon) Tai
@Florian the 0.9 / 0.10 version and 0.8 version behave a bit differently right now for the offset committing. In 0.9 / 0.10, if checkpointing is enabled, the “auto.commit.enable” etc. settings will be completely ignored and overwritten before used to instantiate the interval Kafka clients, henc

Re: Telling if a job has caught up with Kafka

2017-03-18 Thread Tzu-Li (Gordon) Tai
So we would have current lag per partition (for instance every 1 sec) and lag at the latest checkpoint per partition in an easily queryable way. I quite like this idea! We could perhaps call them “currentOffsetLag” and “lastCheckpointedOffsetLag”. I’ve filed a JIRA to track this feature, and add

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Gyula Fóra
Hi Gordon, Thanks for the suggestions, I think in general it would be good to make this periodic (with a configurable interval), and also show the latest committed (checkpointed) offset lag. I think it's better to show both not only one of them as they both carry useful information. So we would h

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Tzu-Li (Gordon) Tai
One other possibility for reporting “consumer lag” is to update the metric only at a configurable interval, if use cases can tolerate a certain delay in realizing the consumer has caught up. Or we could also piggy pack the consumer lag update onto the checkpoint interval - I think in the case t

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Tzu-Li (Gordon) Tai
Hi, I was thinking somewhat similar to what Ufuk suggested, but if we want to report a “consumer lag” metric, we would essentially need to request the latest offset on every record fetch (because the latest offset advances as well), so I wasn’t so sure of the performance tradeoffs there (the parti

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Ufuk Celebi
@Gordon: What's your take on integrating this directly into the consumer? Can't we poll the latest offset wie the Offset API [1] and report a consumer lag metric for the consumer group of the application? This we could also display in the web frontend. In the first version, users would have to pol

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Bruno Aranda
Hi, We are interested on this too. So far we flag the records with timestamps in different points of the pipeline and use metrics gauges to measure latency between the different components, but would be good to know if there is something more specific to Kafka that we can do out of the box in Flin

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Florian König
Hi, thank you Gyula for posting that question. I’d also be interested in how this could be done. You mentioned the dependency on the commit frequency. I’m using https://github.com/quantifind/KafkaOffsetMonitor. With the 08 Kafka consumer a job's offsets as shown in the diagrams updated a lot m