Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Luke Chen Thu, 02 Jun 2022 03:39:11 -0700

Hi Jun,

Thanks for the comment.


Yes, I tried to track the number of remaining segments, but doing so would
change the design of UnifiedLog, so I only track the number of logs.
Currently, we load all segments, and recover them if needed, while creating
the UnifiedLog instance. We also get the log offsets there:
<https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842>
That is, to know all the segments that need recovery before running log
recovery, we would have to break up the logic in UnifiedLog: create a
partial UnifiedLog instance and add more info to it later, which I think
would just make the code more complicated.
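
To illustrate the constraint, here is a minimal Java sketch. It is not Kafka
code: the class RecoveryProgressSketch, its fields, and the methods
loadAndRecover and recoverAll are hypothetical names, and having the recovery
threads pull logs from one shared pool is a simplifying assumption. It only
shows why the number of logs is known up front, while a segment count becomes
visible only once a thread has started loading a given log.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only; this is not Kafka's LogManager or UnifiedLog.
class RecoveryProgressSketch {
    // Known before any log is loaded: one unit per log (partition) directory.
    final AtomicInteger remainingLogs;
    // Known only while a thread works on a log: thread name -> segments left.
    final Map<String, Integer> remainingSegmentsByThread = new ConcurrentHashMap<>();

    RecoveryProgressSketch(int logCount) {
        this.remainingLogs = new AtomicInteger(logCount);
    }

    // Stand-in for constructing one log: the segment count is discovered
    // and the segments are recovered in the same step.
    void loadAndRecover(int segmentsInLog) {
        String thread = Thread.currentThread().getName();
        for (int left = segmentsInLog; left >= 0; left--) {
            remainingSegmentsByThread.put(thread, left);
        }
        remainingLogs.decrementAndGet();
    }

    // Submits every log to a fixed-size pool and waits for completion.
    static RecoveryProgressSketch recoverAll(List<Integer> segmentsPerLog, int threads) {
        RecoveryProgressSketch progress = new RecoveryProgressSketch(segmentsPerLog.size());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int segments : segmentsPerLog) {
            pool.submit(() -> progress.loadAndRecover(segments));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("recovery sketch timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return progress;
    }

    public static void main(String[] args) {
        // Five logs with made-up segment counts, two recovery threads.
        RecoveryProgressSketch p = recoverAll(List.of(10000, 3, 2, 1, 4), 2);
        System.out.println("remaining logs: " + p.remainingLogs.get());
        System.out.println("remaining segments per thread: " + p.remainingSegmentsByThread);
    }
}
```

In this sketch, a total of remaining segments across all logs could only be
published if every log were opened before any recovery started, which is the
restructuring described above.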


Thank you.
Luke



On Thu, May 26, 2022 at 2:57 AM Jun Rao <[email protected]> wrote:

> Hi, Luke,
>
> Thanks for the KIP. Just one comment.
>
> 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery, could we
> instead track the number of remaining segments? This monitors the progress
> at a finer granularity and is also consistent with the thread level metric.
>
> Thanks,
>
> Jun
>
> On Wed, May 25, 2022 at 7:47 AM Tom Bentley <[email protected]> wrote:
>
> > Thanks Luke! LGTM.
> >
> > On Sun, 22 May 2022 at 05:18, Luke Chen <[email protected]> wrote:
> >
> > > Hi Tom and Raman,
> > >
> > > Thanks for your comments.
> > >
> > > > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > > > 2. Similarly the link to this discussion thread needs updating.
> > > > Please update the links to JIRA and the discussion thread.
> > >
> > > Yes, thanks for the reminder. I've updated the KIP.
> > >
> > > > 3. I wonder whether we need to keep these metrics (with value 0) once the
> > > > broker enters the running state. Do you see it as valuable? A benefit of
> > > > removing the metrics would be a reduction on storage required for metric
> > > > stores which are recording these metrics.
> > >
> > > Yes, removing the metrics once log recovery has completed is a good idea.
> > > I've updated the KIP.
> > >
> > > > 4. I think the KIP's public interfaces section could be a bit clearer.
> > > > Previous KIPs which added metrics usually used a table, with the MBean
> > > > name, metric type and description. See KIP-551 for example (or KIP-748,
> > > > KIP-608). Similarly you could use a table in the proposed changes section
> > > > rather than describing the tree you'd see in an MBean console.
> > >
> > > Good point! I've updated the KIP to use a table listing the MBean name,
> > > metric type and description.
> > >
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Fri, May 20, 2022 at 9:13 AM Raman Verma <[email protected]> wrote:
> > >
> > > > Hi Luke,
> > > >
> > > > The change is useful and simple. Thanks.
> > > > Please update the links to JIRA and the discussion thread.
> > > >
> > > > Best Regards,
> > > > Raman Verma
> > > >
> > > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley <[email protected]>
> > wrote:
> > > > >
> > > > > Hi Luke,
> > > > >
> > > > > Thanks for the KIP. I think the idea makes sense and would provide useful
> > > > > observability of log recovery. I have a few comments.
> > > > >
> > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > 3. I wonder whether we need to keep these metrics (with value 0) once the
> > > > > broker enters the running state. Do you see it as valuable? A benefit of
> > > > > removing the metrics would be a reduction on storage required for metric
> > > > > stores which are recording these metrics.
> > > > > 4. I think the KIP's public interfaces section could be a bit clearer.
> > > > > Previous KIPs which added metrics usually used a table, with the MBean
> > > > > name, metric type and description. See KIP-551 for example (or KIP-748,
> > > > > KIP-608). Similarly you could use a table in the proposed changes section
> > > > > rather than describing the tree you'd see in an MBean console.
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Tom
> > > > >
> > > > > On Wed, 11 May 2022 at 09:08, Luke Chen <[email protected]> wrote:
> > > > >
> > > > > > > And if people start using RemainingLogs and RemainingSegments and then
> > > > > > > REALLY FEEL like they need RemainingBytes, then we can always add it in the
> > > > > > > future.
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks James!
> > > > > > Luke
> > > > > >
> > > > > > On Wed, May 11, 2022 at 3:57 PM James Cheng <
> [email protected]>
> > > > wrote:
> > > > > >
> > > > > > > Hi Luke,
> > > > > > >
> > > > > > > Thanks for the detailed explanation. I agree that the current proposal of
> > > > > > > RemainingLogs and RemainingSegments will greatly improve the situation, and
> > > > > > > that we can go ahead with the KIP as is.
> > > > > > >
> > > > > > > If RemainingBytes were straight-forward to implement, then I’d like to
> > > > > > > have it. But we can live without it for now. And if people start using
> > > > > > > RemainingLogs and RemainingSegments and then REALLY FEEL like they need
> > > > > > > RemainingBytes, then we can always add it in the future.
> > > > > > >
> > > > > > > Thanks Luke, for the detailed explanation, and for responding to my
> > > > > > > feedback!
> > > > > > >
> > > > > > > -James
> > > > > > >
> > > > > > > Sent from my iPhone
> > > > > > >
> > > > > > > > On May 10, 2022, at 6:48 AM, Luke Chen <[email protected]>
> > > wrote:
> > > > > > > >
> > > > > > > > Hi James and all,
> > > > > > > >
> > > > > > > > I checked again, and I can see that when creating a UnifiedLog, we
> > > > > > > > expect the logs/indexes/snapshots to be in a good state.
> > > > > > > > So, I don't think we should break the current design to expose the
> > > > > > > > `RemainingBytesToRecovery` metric.
> > > > > > > >
> > > > > > > > If there are no other comments, I'll start a vote within this week.
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <[email protected]
> >
> > > > wrote:
> > > > > > > >>
> > > > > > > >> Hi James,
> > > > > > > >>
> > > > > > > >> Thanks for your input.
> > > > > > > >>
> > > > > > > >> For the `RemainingBytesToRecovery` metric proposal, I think there's
> > > > > > > >> one thing I didn't make clear.
> > > > > > > >> Currently, when the log manager starts up, we try to load all logs
> > > > > > > >> (segments), and during log loading we recover logs if necessary.
> > > > > > > >> And log loading uses a thread pool, as you thought.
> > > > > > > >>
> > > > > > > >> So, here's the problem:
> > > > > > > >> All segments in each log folder (partition) are loaded in a log
> > > > > > > >> recovery thread, and only once they are loaded can we know how many
> > > > > > > >> segments (or how many bytes) need to be recovered.
> > > > > > > >> That means, if we have 10 partition logs in one broker and 2 log
> > > > > > > >> recovery threads (num.recovery.threads.per.data.dir=2), then before
> > > > > > > >> the threads load the segments in each log, we only know how many logs
> > > > > > > >> (partitions) we have in the broker (i.e. the RemainingLogsToRecover
> > > > > > > >> metric).
> > > > > > > >> We cannot know how many segments/bytes need to be recovered until each
> > > > > > > >> thread starts to load the segments under one log (partition).
> > > > > > > >>
> > > > > > > >> So, the example in the KIP shows:
> > > > > > > >> Currently, there are still 5 logs (partitions) to recover under
> > > > > > > >> /tmp/log1. There are 2 threads doing the job, where one thread has
> > > > > > > >> 10000 segments to recover and the other has 3 segments to recover.
> > > > > > > >>
> > > > > > > >>   - kafka.log
> > > > > > > >>      - LogManager
> > > > > > > >>         - RemainingLogsToRecover
> > > > > > > >>            - /tmp/log1 => 5     ← there are 5 logs under /tmp/log1 needed to be recovered
> > > > > > > >>            - /tmp/log2 => 0
> > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > >>            - /tmp/log1          ← 2 threads are doing log recovery for /tmp/log1
> > > > > > > >>               - 0 => 10000      ← there are 10000 segments needed to be recovered for thread 0
> > > > > > > >>               - 1 => 3
> > > > > > > >>            - /tmp/log2
> > > > > > > >>               - 0 => 0
> > > > > > > >>               - 1 => 0
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> So, after a while, the metrics might look like this:
> > > > > > > >> It says that now there are only 3 logs left to recover in /tmp/log1,
> > > > > > > >> thread 0 has 9000 segments left, and thread 1 has 5 segments left
> > > > > > > >> (which implies that thread has already completed the recovery of 2
> > > > > > > >> logs in the meantime).
> > > > > > > >>
> > > > > > > >>   - kafka.log
> > > > > > > >>      - LogManager
> > > > > > > >>         - RemainingLogsToRecover
> > > > > > > >>            - /tmp/log1 => 3     ← there are 3 logs under /tmp/log1 needed to be recovered
> > > > > > > >>            - /tmp/log2 => 0
> > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > >>            - /tmp/log1          ← 2 threads are doing log recovery for /tmp/log1
> > > > > > > >>               - 0 => 9000       ← there are 9000 segments needed to be recovered for thread 0
> > > > > > > >>               - 1 => 5
> > > > > > > >>            - /tmp/log2
> > > > > > > >>               - 0 => 0
> > > > > > > >>               - 1 => 0
> > > > > > > >>
> > > > > > > >> That said, the `RemainingBytesToRecovery` metric is difficult to
> > > > > > > >> achieve in the way you expected. I think the current proposal with
> > > > > > > >> `RemainingLogsToRecover` and `RemainingSegmentsToRecover` should
> > > > > > > >> already provide enough info about the log recovery progress.
> > > > > > > >>
> > > > > > > >> I've also updated the KIP example to make it clearer.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Thank you.
> > > > > > > >> Luke
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>> Hi Luke,
> > > > > > > >>>
> > > > > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > > > > >>>
> > > > > > > >>> Another thought: different topics can have different
> segment
> > > > sizes. I
> > > > > > > >>> don't know how common it is, but it is possible. Some
> topics
> > > > might
> > > > > > want
> > > > > > > >>> small segment sizes to get more granular expiration of data.
> > > > > > > >>>
> > > > > > > >>> The downside of RemainingLogsToRecovery and
> > > > > > RemainingSegmentsToRecovery
> > > > > > > >>> is that the rate that they will decrement depends on the
> > > > > > configuration
> > > > > > > and
> > > > > > > >>> patterns of the topics and partitions and segment sizes. If
> > > > someone
> > > > > > is
> > > > > > > >>> monitoring those metrics, they might see times where the
> > metric
> > > > > > > decrements
> > > > > > > >>> slowly, followed by a burst where it decrements quickly.
> > > > > > > >>>
> > > > > > > >>> What about RemainingBytesToRecovery? This would not depend
> on
> > > the
> > > > > > > >>> configuration of the topic or of the data. It would
> actually
> > > be a
> > > > > > > pretty
> > > > > > > >>> good metric, because I think that this metric would change
> > at a
> > > > > > > constant
> > > > > > > >>> rate (based on the disk I/O speed that the broker allocates
> > to
> > > > > > > recovery).
> > > > > > > >>> Because it changes at a constant rate, you would be able to
> > use
> > > > the
> > > > > > > >>> rate-of-change to predict when it hits zero, which will let
> > you
> > > > know
> > > > > > > when
> > > > > > > >>> the broker is going to start up. Like, I would imagine if
> we
> > > > graphed
> > > > > > > >>> RemainingBytesToRecovery that we'd see a fairly straight
> line
> > > > that is
> > > > > > > >>> decrementing at a steady rate towards zero.
> > > > > > > >>>
> > > > > > > >>> What do you think about adding RemainingBytesToRecovery?
> > > > > > > >>>
> > > > > > > >>> Or, what would you think about making the primary metric be
> > > > > > > >>> RemainingBytesToRecovery, and getting rid of the others?
> > > > > > > >>>
> > > > > > > >>> I don't know if I personally would rather have all 3
> metrics,
> > > or
> > > > > > would
> > > > > > > >>> just use RemainingBytesToRecovery. I too would like more
> > > > community
> > > > > > > input
> > > > > > > >>> on which of those metrics would be useful to people.
> > > > > > > >>>
> > > > > > > >>> About the JMX metrics, you said that if
> > > > > > > >>> num.recovery.threads.per.data.dir=2, that there might be a
> > > > separate
> > > > > > > >>> RemainingSegmentsToRecovery counter for each thread. Is
> that
> > > > actually
> > > > > > > how
> > > > > > > >>> the data is structured within the Kafka recovery threads?
> > Does
> > > > each
> > > > > > > thread
> > > > > > > >>> get a fixed set of partitions, or is there just one big
> pool
> > of
> > > > > > > partitions
> > > > > > > >>> that the threads all work on?
> > > > > > > >>>
> > > > > > > >>> As a more concrete example:
> > > > > > > >>> * If I have 9 small partitions and 1 big partition, and
> > > > > > > >>> num.recovery.threads.per.data.dir=2
> > > > > > > >>> Does each thread get 5 partitions, which means one thread
> > will
> > > > finish
> > > > > > > >>> much sooner than the other?
> > > > > > > >>> OR
> > > > > > > >>> Do both threads just work on the set of 10 partitions,
> which
> > > > means
> > > > > > > likely
> > > > > > > >>> 1 thread will be busy with the big partition, while the
> other
> > > one
> > > > > > ends
> > > > > > > up
> > > > > > > >>> plowing through the 9 small partitions?
> > > > > > > >>>
> > > > > > > >>> If each thread gets assigned 5 partitions, then it would
> make
> > > > sense
> > > > > > > that
> > > > > > > >>> each thread has its own counter.
> > > > > > > >>> If the threads work on a single pool of 10 partitions,
> then
> > it
> > > > would
> > > > > > > >>> probably mean that the counter is on the pool of partitions
> > > > itself,
> > > > > > > and not
> > > > > > > >>> on each thread.
> > > > > > > >>>
> > > > > > > >>> -James
> > > > > > > >>>
> > > > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <[email protected]>
> > > > wrote:
> > > > > > > >>>>
> > > > > > > >>>> Hi devs,
> > > > > > > >>>>
> > > > > > > >>>> If there are no other comments, I'll start a vote
> tomorrow.
> > > > > > > >>>>
> > > > > > > >>>> Thank you.
> > > > > > > >>>> Luke
> > > > > > > >>>>
> > > > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <
> [email protected]
> > >
> > > > wrote:
> > > > > > > >>>>
> > > > > > > >>>>> Hi James,
> > > > > > > >>>>>
> > > > > > > >>>>> Sorry for the late reply.
> > > > > > > >>>>>
> > > > > > > >>>>> Yes, this is a good point: it is useful to know how many segments
> > > > > > > >>>>> remain to be recovered when there are some large partitions.
> > > > > > > >>>>> I've updated the KIP to add a `*RemainingSegmentsToRecover*` metric
> > > > > > > >>>>> for each log recovery thread, to show that value.
> > > > > > > >>>>> The example in the Proposed section here
> > > > > > > >>>>> <
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > > > > >>>>
> > > > > > > >>>>> shows what it will look like.
> > > > > > > >>>>>
> > > > > > > >>>>> Thanks for the suggestion.
> > > > > > > >>>>>
> > > > > > > >>>>> Thank you.
> > > > > > > >>>>> Luke
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> > > > [email protected]>
> > > > > > > >>> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which seems
> to
> > be
> > > > the
> > > > > > > >>> number
> > > > > > > >>>>>> of partitions in each log.dir.
> > > > > > > >>>>>>
> > > > > > > >>>>>> We have some partitions which are much much larger than
> > > > others.
> > > > > > > Those
> > > > > > > >>>>>> large partitions have many many more segments than
> others.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Is there a way the metric can reflect partition size?
> > Could
> > > > it be
> > > > > > > >>>>>> RemainingSegmentsToRecover? Or even
> > RemainingBytesToRecover?
> > > > > > > >>>>>>
> > > > > > > >>>>>> -James
> > > > > > > >>>>>>
> > > > > > > >>>>>> Sent from my iPhone
> > > > > > > >>>>>>
> > > > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <
> > [email protected]>
> > > > > > wrote:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Hi all,
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> I'd like to propose a KIP to expose a metric for log
> > > recovery
> > > > > > > >>> progress.
> > > > > > > >>>>>>> This metric would let the admins have a way to monitor
> > the
> > > > log
> > > > > > > >>> recovery
> > > > > > > >>>>>>> progress.
> > > > > > > >>>>>>> Details can be found here:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Any feedback is appreciated.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Thank you.
> > > > > > > >>>>>>> Luke
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards,
> > > > Raman Verma
> > > >
> > >
> >
>
