> And if people start using RemainingLogs and RemainingSegments and then
REALLY FEEL like they need RemainingBytes, then we can always add it in the
future.

+1

Thanks James!
Luke

On Wed, May 11, 2022 at 3:57 PM James Cheng <wushuja...@gmail.com> wrote:

> Hi Luke,
>
> Thanks for the detailed explanation. I agree that the current proposal of
> RemainingLogs and RemainingSegments will greatly improve the situation, and
> that we can go ahead with the KIP as is.
>
> If RemainingBytes were straight-forward to implement, then I’d like to
> have it. But we can live without it for now. And if people start using
> RemainingLogs and RemainingSegments and then REALLY FEEL like they need
> RemainingBytes, then we can always add it in the future.
>
> Thanks Luke, for the detailed explanation, and for responding to my
> feedback!
>
> -James
>
> Sent from my iPhone
>
> > On May 10, 2022, at 6:48 AM, Luke Chen <show...@gmail.com> wrote:
> >
> > Hi James and all,
> >
> > I checked again, and I can see that when creating a UnifiedLog, we expect
> > the logs/indexes/snapshots to be in a good state.
> > So, I don't think we should break the current design just to expose the
> > `RemainingBytesToRecovery` metric.
> >
> > If there are no other comments, I'll start a vote within this week.
> >
> > Thank you.
> > Luke
> >
> >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <show...@gmail.com> wrote:
> >>
> >> Hi James,
> >>
> >> Thanks for your input.
> >>
> >> For the `RemainingBytesToRecovery` metric proposal, I think there's one
> >> thing I didn't make clear.
> >> Currently, when the log manager starts up, we try to load all logs
> >> (segments), and during the log loading, we recover logs if necessary.
> >> The log loading does use a "thread pool", as you thought.
> >>
> >> So, here's the problem:
> >> The segments in each log folder (partition) are loaded by a log recovery
> >> thread, and only after they are loaded can we know how many segments (or
> >> how many bytes) need to be recovered.
> >> That means, if we have 10 partition logs in one broker and 2 log recovery
> >> threads (num.recovery.threads.per.data.dir=2), then before the threads
> >> load the segments in each log, we only know how many logs (partitions)
> >> there are in the broker (i.e. the RemainingLogsToRecover metric).
> >> We cannot know how many segments/bytes need to be recovered until each
> >> thread starts to load the segments under one log (partition).
> >>
> >> So, the example in the KIP shows:
> >> Currently, there are still 5 logs (partitions) to be recovered under the
> >> /tmp/log1 dir, and there are 2 threads doing the job, where one thread has
> >> 10000 segments to recover, and the other one has 3 segments to recover.
> >>
> >>   - kafka.log
> >>      - LogManager
> >>         - RemainingLogsToRecover
> >>            - /tmp/log1 => 5        ← there are 5 logs under /tmp/log1 needed to be recovered
> >>            - /tmp/log2 => 0
> >>         - RemainingSegmentsToRecover
> >>            - /tmp/log1             ← 2 threads are doing log recovery for /tmp/log1
> >>               - 0 => 10000         ← there are 10000 segments needed to be recovered for thread 0
> >>               - 1 => 3
> >>            - /tmp/log2
> >>               - 0 => 0
> >>               - 1 => 0
> >>
> >>
> >> So, after a while, the metrics might look like this:
> >> They say that now there are only 3 logs left to recover in /tmp/log1,
> >> thread 0 has 9000 segments left, and thread 1 has 5 segments left
> >> (which implies the threads have already completed recovery of 2 logs in
> >> that period).
> >>
> >>   - kafka.log
> >>      - LogManager
> >>         - RemainingLogsToRecover
> >>            - /tmp/log1 => 3        ← there are 3 logs under /tmp/log1 needed to be recovered
> >>            - /tmp/log2 => 0
> >>         - RemainingSegmentsToRecover
> >>            - /tmp/log1             ← 2 threads are doing log recovery for /tmp/log1
> >>               - 0 => 9000          ← there are 9000 segments needed to be recovered for thread 0
> >>               - 1 => 5
> >>            - /tmp/log2
> >>               - 0 => 0
> >>               - 1 => 0
> >>
> >>
> >> So, the `RemainingBytesToRecovery` metric is difficult to achieve in the
> >> way you expected. I think the current proposal with `RemainingLogsToRecover`
> >> and `RemainingSegmentsToRecover` should already provide enough info about
> >> the log recovery progress.
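> >>
> >> Just to illustrate how these gauges could be consumed (this part is not in
> >> the KIP), here's a rough Java sketch that polls the per-thread gauges over
> >> JMX while the broker is recovering. The ObjectName pattern and the JMX port
> >> below are only assumptions for the example, not the final names:
> >>
> >>   import javax.management.MBeanServerConnection;
> >>   import javax.management.ObjectName;
> >>   import javax.management.remote.JMXConnector;
> >>   import javax.management.remote.JMXConnectorFactory;
> >>   import javax.management.remote.JMXServiceURL;
> >>   import java.util.Set;
> >>
> >>   public class RecoveryProgressWatcher {
> >>       public static void main(String[] args) throws Exception {
> >>           // Assumed JMX endpoint of the recovering broker.
> >>           JMXServiceURL url = new JMXServiceURL(
> >>               "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
> >>           try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
> >>               MBeanServerConnection conn = connector.getMBeanServerConnection();
> >>               // Assumed ObjectName pattern; the final name/tags come from the KIP.
> >>               Set<ObjectName> gauges = conn.queryNames(new ObjectName(
> >>                   "kafka.log:type=LogManager,name=RemainingSegmentsToRecover,*"), null);
> >>               for (ObjectName gauge : gauges) {
> >>                   // Kafka gauges expose their current value as the "Value" attribute.
> >>                   System.out.println(gauge + " => " + conn.getAttribute(gauge, "Value"));
> >>               }
> >>           }
> >>       }
> >>   }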
> >>
> >> I've also updated the KIP example to make it clear.
> >>
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wushuja...@gmail.com> wrote:
> >>>
> >>> Hi Luke,
> >>>
> >>> Thanks for adding RemainingSegmentsToRecovery.
> >>>
> >>> Another thought: different topics can have different segment sizes. I
> >>> don't know how common it is, but it is possible. Some topics might want
> >>> small segment sizes to allow more granular expiration of data.
> >>>
> >>> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery
> >>> is that the rate at which they decrement depends on the configuration and
> >>> patterns of the topics and partitions and segment sizes. If someone is
> >>> monitoring those metrics, they might see times where the metric decrements
> >>> slowly, followed by a burst where it decrements quickly.
> >>>
> >>> What about RemainingBytesToRecovery? This would not depend on the
> >>> configuration of the topic or of the data. It would actually be a pretty
> >>> good metric, because I think that this metric would change at a constant
> >>> rate (based on the disk I/O speed that the broker allocates to recovery).
> >>> Because it changes at a constant rate, you would be able to use the
> >>> rate-of-change to predict when it hits zero, which will let you know when
> >>> the broker is going to start up. Like, I would imagine if we graphed
> >>> RemainingBytesToRecovery that we'd see a fairly straight line that is
> >>> decrementing at a steady rate towards zero.
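> >>>
> >>> For example (just back-of-the-envelope, with made-up numbers), two samples
> >>> of such a gauge taken a minute apart would be enough to estimate when
> >>> recovery will finish:
> >>>
> >>>   public class RecoveryEta {
> >>>       public static void main(String[] args) {
> >>>           // Hypothetical RemainingBytesToRecovery samples, 60 seconds apart.
> >>>           long bytesAtT0 = 500_000_000_000L;   // 500 GB remaining at t0
> >>>           long bytesAtT1 = 470_000_000_000L;   // 470 GB remaining at t0 + 60s
> >>>           long intervalSec = 60;
> >>>
> >>>           long bytesPerSec = (bytesAtT0 - bytesAtT1) / intervalSec;  // ~500 MB/s
> >>>           long etaSec = bytesAtT1 / bytesPerSec;                     // time until it hits zero
> >>>           System.out.printf("recovery rate %d MB/s, ETA ~%d minutes%n",
> >>>               bytesPerSec / 1_000_000, etaSec / 60);
> >>>       }
> >>>   }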
> >>>
> >>> What do you think about adding RemainingBytesToRecovery?
> >>>
> >>> Or, what would you think about making the primary metric be
> >>> RemainingBytesToRecovery, and getting rid of the others?
> >>>
> >>> I don't know if I personally would rather have all 3 metrics, or would
> >>> just use RemainingBytesToRecovery. I too would like more community input
> >>> on which of those metrics would be useful to people.
> >>>
> >>> About the JMX metrics, you said that if
> >>> num.recovery.threads.per.data.dir=2, there might be a separate
> >>> RemainingSegmentsToRecovery counter for each thread. Is that actually how
> >>> the data is structured within the Kafka recovery threads? Does each thread
> >>> get a fixed set of partitions, or is there just one big pool of partitions
> >>> that the threads all work on?
> >>>
> >>> As a more concrete example:
> >>> * If I have 9 small partitions and 1 big partition, and
> >>> num.recovery.threads.per.data.dir=2:
> >>> Does each thread get 5 partitions, which means one thread will finish
> >>> much sooner than the other?
> >>> OR
> >>> Do both threads just work on the set of 10 partitions, which means likely
> >>> 1 thread will be busy with the big partition, while the other one ends up
> >>> plowing through the 9 small partitions?
> >>>
> >>> If each thread gets assigned 5 partitions, then it would make sense that
> >>> each thread has its own counter.
> >>> If the threads work on a single pool of 10 partitions, then it would
> >>> probably mean that the counter is on the pool of partitions itself, and
> >>> not on each thread.
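> >>>
> >>> Purely to illustrate the two models I have in mind (this is a sketch of
> >>> the question, not of how the Kafka recovery code actually works):
> >>>
> >>>   import java.util.List;
> >>>   import java.util.concurrent.ExecutorService;
> >>>   import java.util.concurrent.Executors;
> >>>
> >>>   public class RecoveryThreadModels {
> >>>       // Model A: partitions are split up front, so each thread could own a
> >>>       // "remaining" counter for its fixed slice.
> >>>       static void fixedAssignment(List<Runnable> logs, int threads) {
> >>>           for (int t = 0; t < threads; t++) {
> >>>               final int slice = t;
> >>>               new Thread(() -> {
> >>>                   for (int i = slice; i < logs.size(); i += threads) {
> >>>                       logs.get(i).run();   // this thread's own counter would decrement here
> >>>                   }
> >>>               }).start();
> >>>           }
> >>>       }
> >>>
> >>>       // Model B: all threads drain one shared queue of per-log jobs, so the
> >>>       // natural "remaining" counter belongs to the shared pool, not to a thread.
> >>>       static void sharedPool(List<Runnable> logs, int threads) {
> >>>           ExecutorService pool = Executors.newFixedThreadPool(threads);
> >>>           for (Runnable log : logs) {
> >>>               pool.submit(log);   // a pool-wide counter would decrement as each job finishes
> >>>           }
> >>>           pool.shutdown();
> >>>       }
> >>>   }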
> >>>
> >>> -James
> >>>
> >>>> On May 4, 2022, at 5:55 AM, Luke Chen <show...@gmail.com> wrote:
> >>>>
> >>>> Hi devs,
> >>>>
> >>>> If there are no other comments, I'll start a vote tomorrow.
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>>
> >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <show...@gmail.com> wrote:
> >>>>
> >>>>> Hi James,
> >>>>>
> >>>>> Sorry for the late reply.
> >>>>>
> >>>>> Yes, this is a good point: knowing how many segments are to be recovered
> >>>>> helps if there are some large partitions.
> >>>>> I've updated the KIP to add a `*RemainingSegmentsToRecover*` metric for
> >>>>> each log recovery thread, to show the value.
> >>>>> The example in the Proposed Changes section here
> >>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
> >>>>> shows what it will look like.
> >>>>>
> >>>>> Thanks for the suggestion.
> >>>>>
> >>>>> Thank you.
> >>>>> Luke
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wushuja...@gmail.com> wrote:
> >>>>>
> >>>>>> The KIP describes RemainingLogsToRecovery, which seems to be the number
> >>>>>> of partitions in each log.dir.
> >>>>>>
> >>>>>> We have some partitions which are much, much larger than others. Those
> >>>>>> large partitions have many, many more segments than others.
> >>>>>>
> >>>>>> Is there a way the metric can reflect partition size? Could it be
> >>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
> >>>>>>
> >>>>>> -James
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <show...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to propose a KIP to expose a metric for log recovery progress.
> >>>>>>> This metric would give admins a way to monitor the log recovery
> >>>>>>> progress.
> >>>>>>> Details can be found here:
> >>>>>>>
> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> >>>>>>>
> >>>>>>> Any feedback is appreciated.
> >>>>>>>
> >>>>>>> Thank you.
> >>>>>>> Luke
> >>>>>>
> >>>>>
> >>>
> >>>
>
