Hello all,

While implementing the last piece of this KIP for the upcoming 2.6 release, I
realized that it is important to cover the following monitoring metrics as
well, so I propose adding them to KIP-444:

Instance-level:

   - number of alive stream threads, INFO
   - number of alive cleanup threads, INFO
   - number of alive global threads, INFO
   - number of alive restore threads, INFO

Monitoring these numbers helps detect threads that died unexpectedly while
the instance itself is still running.
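
As a rough illustration of what such a gauge would report, here is a minimal
sketch in plain Java. It is not the Streams implementation; the class name
AliveThreadGauge is hypothetical, and it relies only on Thread.isAlive().

```java
import java.util.List;

// Hypothetical helper, not part of Kafka Streams: reports how many of the
// given threads are currently alive, as an "alive threads" gauge might.
final class AliveThreadGauge {
    private final List<Thread> threads;

    AliveThreadGauge(List<Thread> threads) {
        this.threads = threads;
    }

    // Recomputed on every read, so the value always reflects the current state.
    int value() {
        int alive = 0;
        for (Thread t : threads) {
            // isAlive() is true once a thread has started and has not yet died.
            if (t.isAlive()) {
                alive++;
            }
        }
        return alive;
    }
}
```

A value lower than the configured number of threads, while the instance is
still up, would be the signal to look for.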

Thread-level:

   - avg / max number of records polled from the consumer per thread
   iteration, INFO
   - avg / max number of records processed by the task manager (i.e. across
   all tasks) per thread iteration, INFO

Ideally, all polled records can be processed within the same iteration. If
one observes either that too few records are polled, so that the thread is
mostly idle, or that too many are polled, so that the thread cannot keep up,
one should go ahead and tune the consumer configs.
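
To make the avg / max semantics concrete, here is a minimal sketch in plain
Java. The class IterationStats is hypothetical, and it keeps cumulative
values since start rather than the windowed values a real metrics library
would typically report.

```java
// Hypothetical helper: tracks the average and maximum of a count that is
// recorded once per thread iteration (e.g. records polled or processed).
final class IterationStats {
    private long iterations;
    private long total;
    private long max;

    // Call once per iteration with the number of records seen in it.
    void record(long count) {
        iterations++;
        total += count;
        max = Math.max(max, count);
    }

    // Average count per iteration; 0.0 before any iteration is recorded.
    double avg() {
        return iterations == 0 ? 0.0 : (double) total / iterations;
    }

    long max() {
        return max;
    }
}
```

Keeping one instance for polled records and one for processed records lets
the two averages be compared: if the polled average stays well above the
processed average, the thread is falling behind, and a consumer config such
as max.poll.records is one candidate to tune.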

Task-level:

   - number of records currently buffered (i.e. it is just a dynamic
   gauge), DEBUG.

This is a finer-grained metric that indicates which tasks' processing cannot
keep up with the fetching throughput.
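
By "dynamic gauge" I mean a value that is read from the buffer at the moment
the metric is queried, rather than being recorded as events happen. A
minimal sketch in plain Java, with the hypothetical name
BufferedRecordsGauge and a plain Queue standing in for a task's record
buffer:

```java
import java.util.Queue;
import java.util.function.IntSupplier;

// Hypothetical helper: exposes the current size of a task's record buffer
// as a gauge that is evaluated lazily on each read.
final class BufferedRecordsGauge {
    private BufferedRecordsGauge() { }

    // The returned supplier holds a reference to the buffer, not a snapshot,
    // so each call to getAsInt() returns the size at that moment.
    static IntSupplier forBuffer(Queue<?> buffer) {
        return buffer::size;
    }
}
```

A task whose gauge keeps growing between reads is one whose processing lags
behind fetching.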


Please let me know if anyone has any concerns about the proposed metrics.


Guozhang



On Mon, Sep 9, 2019 at 5:17 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> +1 (binding)
>
>
> -Matthias
>
> On 9/5/19 11:47 AM, Guozhang Wang wrote:
> > +1 from myself.
> >
> > I'm now officially closing this voting thread with the following tally:
> >
> > binding +1: 3 (Guozhang, Bill, Matthias voted on the DISCUSS thread).
> > non-binding +1: 2 (Bruno, John).
> >
> >
> > Guozhang
> >
> >
> > On Thu, Aug 22, 2019 at 8:16 AM Bill Bejeck <bbej...@gmail.com> wrote:
> >
> >> +1 (binding)
> >>
> >> -Bill
> >>
> >> On Thu, Aug 22, 2019 at 10:55 AM John Roesler <j...@confluent.io>
> wrote:
> >>
> >>> Hi Guozhang, thanks for cleaning this up.
> >>>
> >>> I'm +1 (non-binding)
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> On Thu, Aug 22, 2019 at 2:26 AM Bruno Cadonna <br...@confluent.io>
> >> wrote:
> >>>
> >>>> Hi Guozhang,
> >>>>
> >>>> +1 (non-binding)
> >>>>
> >>>> Thank you for driving this!
> >>>> Bruno
> >>>>
> >>>> On Tue, Aug 20, 2019 at 8:29 PM Guozhang Wang <wangg...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>> Hello folks,
> >>>>>
> >>>>> I'd like to start a voting thread for the following KIP to improve the
> >>> Kafka
> >>>>> Streams metrics mechanism to users. This includes 1) renaming changes
> >>> in
> >>>>> the public StreamsMetrics utils API, and 2) a major cleanup on the
> >>>> Streams'
> >>>>> own built-in metrics hierarchy.
> >>>>>
> >>>>> Details can be found here:
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
> >>>>>
> >>>>> I'd love to hear your thoughts and feedback. Thanks!
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>
> >>>
> >>
> >
> >
>
>

-- 
-- Guozhang
