Hmm, that is a very good question. It seems we did not add a
corresponding metric when we changed the mechanism. Your observation is
likely correct: lag in messages will not be useful enough to predict or
explain why a follower has been kicked out of the ISR.

Could you file a JIRA for this? I think we can create a new metric
recording (time.milliseconds - r.lastCaughtUpTimeMs) and deprecate the old
one.
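
To make the proposal concrete, here is a minimal sketch of what such a
lag-time metric would measure. It is only an illustration, not Kafka code:
ReplicaState, lag_time_ms and out_of_sync are hypothetical names, and
max_lag_ms is assumed to play the role of replica.lag.time.max.ms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaState:
    """Hypothetical stand-in for the leader's view of one follower."""
    broker_id: int
    last_caught_up_time_ms: int  # last time the follower was fully caught up


def lag_time_ms(now_ms: int, replica: ReplicaState) -> int:
    """Proposed metric value: now - lastCaughtUpTimeMs, floored at zero."""
    return max(0, now_ms - replica.last_caught_up_time_ms)


def out_of_sync(now_ms: int, replicas: List[ReplicaState], max_lag_ms: int) -> List[int]:
    """Broker ids whose lag time exceeds the threshold (assumed ISR rule)."""
    return [r.broker_id for r in replicas if lag_time_ms(now_ms, r) > max_lag_ms]


if __name__ == "__main__":
    now = 100_000
    followers = [ReplicaState(1, 99_500), ReplicaState(2, 85_000)]
    for f in followers:
        print(f.broker_id, lag_time_ms(now, f))
    # 10_000 ms is an illustrative threshold only.
    print(out_of_sync(now, followers, 10_000))
```

A value like this can grow past the threshold even while a point-in-time
lag-in-messages sample reads 0, which is why it would explain ISR changes
better than the existing metric.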

Guozhang


On Tue, Feb 21, 2017 at 5:47 PM, Jun MA <mj.saber1...@gmail.com> wrote:

> Hi Guozhang,
>
> Thanks for pointing this out. I was actually looking at this before, and
> that’s why I’m asking the question. This metric is 'lag in messages', and
> since the ISR logic now relies on lag in seconds, not lag in messages, I’m
> not sure how useful this metric is. In fact, we saw the value of this
> metric stay at 0 all the time, even when there was an ISR shrink/expand.
> I’d expect to see an increase in lag when a shrink/expand happens. Is
> there a metric that correctly represents the lag between followers and
> the leader?
>
> Thanks,
> Jun
>
> > On Feb 21, 2017, at 10:19 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > You can find them in https://kafka.apache.org/documentation/#monitoring
> >
> > I think this is the one you are looking for:
> >
> > Lag in messages per follower replica
> > kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
> > lag should be proportional to the maximum batch size of a produce request.
> >
> > On Mon, Feb 20, 2017 at 5:43 PM, Jun Ma <mj.saber1...@gmail.com> wrote:
> >
> >> Hi Guozhang,
> >>
> >> Thanks for your reply. Could you tell me which one indicates the lag
> >> between follower and leader for a specific partition?
> >>
> >> Thanks,
> >> Jun
> >>
> >> On Mon, Feb 20, 2017 at 4:57 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >>
> >>> I don't think the metrics have been changed in 0.9.0.1, in fact even in
> >>> 0.10.x they are still the same as stated in:
> >>>
> >>> https://kafka.apache.org/documentation/#monitoring
> >>>
> >>> The mechanism for determining which followers have been dropped out of
> >>> the ISR has changed, but the metrics have not.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Sun, Feb 19, 2017 at 7:56 PM, Jun MA <mj.saber1...@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I’m looking for the JMX metric that represents replica lag time in
> >>>> 0.9.0.1. Based on the documentation, I can only find
> >>>> kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica,
> >>>> which is the max lag in messages between follower and leader replicas.
> >>>> But since in 0.9.0.1 lag in messages is deprecated and replaced with
> >>>> lag time, I’m wondering what the corresponding metric is.
> >>>>
> >>>> Thanks,
> >>>> Jun
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> -- Guozhang
> >>>
> >>
> >
> >
> >
> > --
> > -- Guozhang
>
>


-- 
-- Guozhang
