Hello all,

While implementing the last piece of this KIP for the upcoming 2.6 release, I realized that it is important to cover the following monitoring metrics as well, so I'd propose adding them as part of KIP-444 too:
Instance-level:
- number of alive stream threads, INFO
- number of alive cleanup threads, INFO
- number of alive global threads, INFO
- number of alive restore threads, INFO

Monitoring these numbers can help detect whether any threads have died unexpectedly while the instance itself is still running.

Thread-level:
- avg / max number of records polled from the consumer per thread iteration, INFO
- avg / max number of records processed by the task manager (i.e. across all tasks) per thread iteration, INFO

Ideally, all records polled in an iteration should also be processed within that same iteration. If one observes that either too few records are polled, so that the thread is mostly idle, or too many records are polled, so that the thread cannot keep up, she should go ahead and tune the consumer configs.

Task-level:
- number of records currently buffered (i.e. a dynamic gauge), DEBUG

This is a finer-grained metric indicating which task's processing cannot keep up with the fetch throughput.

Please let me know if anyone has any concerns about the proposed metrics.

Guozhang

On Mon, Sep 9, 2019 at 5:17 PM Matthias J. Sax <matth...@confluent.io> wrote:

> +1 (binding)
>
>
> -Matthias
>
> On 9/5/19 11:47 AM, Guozhang Wang wrote:
> > +1 from myself.
> >
> > I'm now officially closing this voting thread with the following tally:
> >
> > binding +1: 3 (Guozhang, Bill, Matthias voted on the DISCUSS thread).
> > non-binding +1: 2 (Bruno, John).
> >
> >
> > Guozhang
> >
> >
> > On Thu, Aug 22, 2019 at 8:16 AM Bill Bejeck <bbej...@gmail.com> wrote:
> >
> >> +1 (binding)
> >>
> >> -Bill
> >>
> >> On Thu, Aug 22, 2019 at 10:55 AM John Roesler <j...@confluent.io> wrote:
> >>
> >>> Hi Guozhang, thanks for cleaning this up.
> >>>
> >>> I'm +1 (non-binding)
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> On Thu, Aug 22, 2019 at 2:26 AM Bruno Cadonna <br...@confluent.io> wrote:
> >>>
> >>>> Hi Guozhang,
> >>>>
> >>>> +1 (non-binding)
> >>>>
> >>>> Thank you for driving this!
> >>>> Bruno
> >>>>
> >>>> On Tue, Aug 20, 2019 at 8:29 PM Guozhang Wang <wangg...@gmail.com> wrote:
> >>>>>
> >>>>> Hello folks,
> >>>>>
> >>>>> I'd like to start a voting thread for the following KIP to improve the Kafka
> >>>>> Streams metrics mechanism for users. This includes 1) renaming changes in
> >>>>> the public StreamsMetrics utils API, and 2) a major cleanup of the Streams'
> >>>>> own built-in metrics hierarchy.
> >>>>>
> >>>>> Details can be found here:
> >>>>>
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
> >>>>>
> >>>>> I'd love to hear your thoughts and feedback. Thanks!
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>
> >>>
> >>
> >
> >
> >

--
-- Guozhang
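To make the thread-level and task-level proposals above a bit more concrete, here is a minimal Java sketch of what those numbers measure. It is a hypothetical illustration, not the Kafka Streams implementation. The class `IterationRecordStats` and its methods are made up for this example. It keeps raw per-iteration samples rather than time-windowed metrics, and it sums the buffered count over the whole thread, whereas the proposed gauge is per task.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only; this is not how Kafka Streams implements
 * these metrics. It tracks, per stream-thread loop iteration, how many
 * records were polled and how many were processed, and derives avg/max
 * values plus the number of records still buffered (polled, not processed).
 */
public class IterationRecordStats {

    // Raw per-iteration samples; real metrics would use time-windowed samples instead.
    private final List<Integer> polledPerIteration = new ArrayList<>();
    private final List<Integer> processedPerIteration = new ArrayList<>();

    // Records polled but not yet processed, summed over all tasks of the thread.
    private long buffered = 0;

    /** Records one iteration: records returned by the poll and records processed. */
    public void recordIteration(int polled, int processed) {
        if (polled < 0 || processed < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // A thread can only process records it has already fetched.
        if (processed > buffered + polled) {
            throw new IllegalArgumentException("processed exceeds buffered + polled");
        }
        polledPerIteration.add(polled);
        processedPerIteration.add(processed);
        buffered += polled - processed;
    }

    public double avgPolled() { return average(polledPerIteration); }

    public int maxPolled() { return max(polledPerIteration); }

    public double avgProcessed() { return average(processedPerIteration); }

    public int maxProcessed() { return max(processedPerIteration); }

    public long buffered() { return buffered; }

    // Returns 0.0 before any iteration has been recorded.
    private static double average(List<Integer> samples) {
        return samples.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    // Returns 0 before any iteration has been recorded.
    private static int max(List<Integer> samples) {
        return samples.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        IterationRecordStats stats = new IterationRecordStats();
        stats.recordIteration(500, 300); // falling behind: 200 records left in the buffer
        stats.recordIteration(500, 400); // buffer grows to 300
        stats.recordIteration(0, 300);   // nothing new fetched; buffer drained to 0
        System.out.println("polled avg=" + stats.avgPolled() + " max=" + stats.maxPolled());
        System.out.println("processed avg=" + stats.avgProcessed() + " max=" + stats.maxProcessed());
        System.out.println("buffered=" + stats.buffered());
    }
}
```

A `buffered()` value that keeps growing while avg processed lags avg polled corresponds to the "cannot keep up" case described in the proposal. `max.poll.records` is one consumer config that bounds how many records a single poll can return.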