Randall,

Thank you for the KIP.  This should improve visibility greatly.  I had a
few questions/ideas for more metrics.


   1. What's the relationship between the worker state and the connector
   status?  Does the 'paused' status at the Connector level include the time
   that the worker is 'rebalancing'?
   2. Are the "Source Connector" metrics like record rate an aggregation of
   the "Source Task" metrics?
      - How much value is there in monitoring at the "Source Connector"
      level (other than status) if the number of constituent tasks may change
      over time?
      - I'm imagining that it's most useful to collect metrics at the task
      level, as the task-level metrics should be stable regardless of tasks
      shifting to different workers.
      - If so, can we duplicate the Connector Status down at the task level
      so that all important metrics can be tracked by task?
   3. For the Sink Task metrics:
      - Can we add offset lag and timestamp lag on commit?
         - After records are flushed/committed:
            - What is the diff between the record timestamps and commit
            time (histogram)?  This is a measure of end-to-end pipeline
            latency.
            - What is the diff between record offsets and the latest offset
            of their partition at commit time (histogram)?  This is a measure
            of whether this particular task is keeping up.
      - How about flush error rate?  Assuming the sink connectors are
      using retries, it would be helpful to know how many errors they're
      seeing.
      - Can we tell at the framework level how many records were inserted
      vs updated vs deleted?
      - Batching stats
         - Histogram of flush batch size
         - Counts of flush trigger method (time vs max batch size)
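
To make the two lag measurements in item 3 concrete, here is a rough sketch
of how they might be computed at commit time. It is written in Python only
for brevity and is not Connect code: CommittedRecord, lags_at_commit and
histogram are hypothetical names, and I am assuming latest_offsets holds the
offset of the most recent record in each partition when the commit happens.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class CommittedRecord:
    """Hypothetical stand-in for a sink record that was just flushed/committed."""
    topic: str
    partition: int
    offset: int
    timestamp_ms: int


def lags_at_commit(
    records: Iterable[CommittedRecord],
    latest_offsets: Dict[Tuple[str, int], int],
    commit_time_ms: int,
) -> Tuple[List[int], List[int]]:
    """Return (timestamp lags in ms, offset lags) for the records in one commit.

    Assumption: latest_offsets maps (topic, partition) to the offset of the
    newest record in that partition at commit time.
    """
    timestamp_lags: List[int] = []
    offset_lags: List[int] = []
    for record in records:
        # End-to-end latency: commit time minus the record's own timestamp.
        timestamp_lags.append(commit_time_ms - record.timestamp_ms)
        # How far behind the head of the partition this record was.
        latest = latest_offsets[(record.topic, record.partition)]
        offset_lags.append(latest - record.offset)
    return timestamp_lags, offset_lags


def histogram(values: Iterable[int], upper_bounds: Sequence[int]) -> List[int]:
    """Count values per bucket; upper_bounds must be sorted ascending.

    Bucket i holds values <= upper_bounds[i] (and above the previous bound);
    the final extra bucket holds everything above the last bound.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for value in values:
        counts[bisect_left(upper_bounds, value)] += 1
    return counts
```

A task would call lags_at_commit once per successful flush and feed both
lists into histogram (or whatever histogram type the metrics library
provides), so a growing offset lag shows a task falling behind even when the
timestamp lag looks healthy.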

Cheers,

Roger

On Sun, Sep 10, 2017 at 8:45 AM, Randall Hauch <rha...@gmail.com> wrote:

> Thanks, Gwen.
>
> That's a great idea, so I've changed the KIP to add those metrics. I've
> also made a few other changes:
>
>
>    1. The context of all metrics is limited to the activity within the
>    worker. This wasn't clear before, so I changed the motivation and metric
>    descriptions to explicitly state this.
>    2. Added the worker ID to all MBean attributes, hopefully making this
>    same scope obvious from within JMX or another metric reporting system.
>    This is also similar to how the Kafka producer and consumer metrics
>    include the client ID in their MBean attributes. Hopefully this does not
>    negatively impact or complicate how external reporting systems aggregate
>    metrics from multiple workers.
>    3. Stated explicitly that aggregating metrics across workers was out of
>    scope of this KIP.
>    4. Added metrics to report the connector class and version for both sink
>    and source connectors.
>
> Check this KIP's history for details of these changes.
>
> Please let me know if you have any other suggestions. I hope to start the
> voting soon!
>
> Best regards,
>
> Randall
>
> On Thu, Sep 7, 2017 at 9:35 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > Thanks for the KIP, Randall. Those are badly needed!
> >
> > Can we have two metrics with record rate per task? One before SMT and one
> > after?
> > We can have cases where we read 5000 rows from JDBC but write 5 to Kafka,
> > or read 5000 records from Kafka and write 5 due to filtering. I think it's
> > important to know both numbers.
> >
> >
> > Gwen
> >
> > On Thu, Sep 7, 2017 at 7:50 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> > > Hi everyone.
> > >
> > > I've created a new KIP to add metrics to the Kafka Connect framework:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework
> > >
> > > The KIP approval deadline is looming, so if you're interested in Kafka
> > > Connect metrics please review and provide feedback as soon as possible.
> > > I'm interested not only in whether the metrics are sufficient and
> > > appropriate, but also in whether the MBean naming conventions are okay.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > <http://www.confluent.io/blog>
> >
>
