> And if people start using RemainingLogs and RemainingSegments and then REALLY FEEL like they need RemainingBytes, then we can always add it in the future.
+1
Thanks James!
Luke

On Wed, May 11, 2022 at 3:57 PM James Cheng <wushuja...@gmail.com> wrote:

> Hi Luke,
>
> Thanks for the detailed explanation. I agree that the current proposal of
> RemainingLogs and RemainingSegments will greatly improve the situation,
> and that we can go ahead with the KIP as is.
>
> If RemainingBytes were straightforward to implement, then I'd like to
> have it. But we can live without it for now. And if people start using
> RemainingLogs and RemainingSegments and then REALLY FEEL like they need
> RemainingBytes, then we can always add it in the future.
>
> Thanks, Luke, for the detailed explanation, and for responding to my
> feedback!
>
> -James
>
> Sent from my iPhone
>
> On May 10, 2022, at 6:48 AM, Luke Chen <show...@gmail.com> wrote:
>
>> Hi James and all,
>>
>> I checked again, and I can see that when creating a UnifiedLog, we expect
>> the logs/indexes/snapshots to be in a good state.
>> So, I don't think we should break the current design to expose the
>> `RemainingBytesToRecovery` metric.
>>
>> If there are no other comments, I'll start a vote within this week.
>>
>> Thank you.
>> Luke
>>
>> On Fri, May 6, 2022 at 6:00 PM Luke Chen <show...@gmail.com> wrote:
>>
>>> Hi James,
>>>
>>> Thanks for your input.
>>>
>>> For the `RemainingBytesToRecovery` metric proposal, I think there's one
>>> thing I didn't make clear.
>>> Currently, when the log manager starts up, we try to load all logs
>>> (segments), and during log loading, we recover logs if necessary.
>>> The log loading does use a thread pool, as you thought.
>>>
>>> So, here's the problem:
>>> the segments in each log folder (partition) are all loaded by a single
>>> log recovery thread, and not until a log is loaded do we know how many
>>> segments (or how many bytes) need to be recovered.
>>> That means that if we have 10 partition logs on one broker and 2 log
>>> recovery threads (num.recovery.threads.per.data.dir=2), then before the
>>> threads load the segments in each log, we only know how many logs
>>> (partitions) we have in the broker (i.e. the RemainingLogsToRecover
>>> metric). We cannot know how many segments/bytes need to be recovered
>>> until each thread starts to load the segments under one log (partition).
>>>
>>> So, the example in the KIP shows:
>>> currently, there are still 5 logs (partitions) to recover under the
>>> /tmp/log1 dir, and there are 2 threads doing the job, where one thread
>>> has 10000 segments to recover, and the other has 3.
>>>
>>>    - kafka.log
>>>       - LogManager
>>>          - RemainingLogsToRecover
>>>             - /tmp/log1 => 5      ← there are 5 logs under /tmp/log1
>>>                                     to be recovered
>>>             - /tmp/log2 => 0
>>>          - RemainingSegmentsToRecover
>>>             - /tmp/log1           ← 2 threads are doing log recovery
>>>                                     for /tmp/log1
>>>                - 0 => 10000       ← there are 10000 segments to be
>>>                                     recovered by thread 0
>>>                - 1 => 3
>>>             - /tmp/log2
>>>                - 0 => 0
>>>                - 1 => 0
>>>
>>> So, after a while, the metrics might look like this:
>>> it says that now there are only 3 logs to recover in /tmp/log1, that
>>> thread 0 has 9000 segments left, and that thread 1 has 5 segments left
>>> (which implies thread 1 already completed recovery of 2 logs in that
>>> period).
>>>
>>>    - kafka.log
>>>       - LogManager
>>>          - RemainingLogsToRecover
>>>             - /tmp/log1 => 3      ← there are 3 logs under /tmp/log1
>>>                                     to be recovered
>>>             - /tmp/log2 => 0
>>>          - RemainingSegmentsToRecover
>>>             - /tmp/log1           ← 2 threads are doing log recovery
>>>                                     for /tmp/log1
>>>                - 0 => 9000        ← there are 9000 segments to be
>>>                                     recovered by thread 0
>>>                - 1 => 5
>>>             - /tmp/log2
>>>                - 0 => 0
>>>                - 1 => 0
>>>
>>> That said, the `RemainingBytesToRecovery` metric is difficult to achieve
>>> as you expected. I think the current proposal with
>>> `RemainingLogsToRecover` and `RemainingSegmentsToRecover` should
>>> already provide enough info about the log recovery progress.
>>>
>>> I've also updated the KIP example to make it clear.
>>>
>>> Thank you.
>>> Luke
>>>
>>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wushuja...@gmail.com> wrote:
>>>
>>>> Hi Luke,
>>>>
>>>> Thanks for adding RemainingSegmentsToRecovery.
>>>>
>>>> Another thought: different topics can have different segment sizes. I
>>>> don't know how common it is, but it is possible. Some topics might want
>>>> small segment sizes for more granular expiration of data.
>>>>
>>>> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery
>>>> is that the rate at which they decrement depends on the configuration
>>>> and patterns of the topics, partitions, and segment sizes. Someone
>>>> monitoring those metrics might see times where a metric decrements
>>>> slowly, followed by a burst where it decrements quickly.
>>>>
>>>> What about RemainingBytesToRecovery? This would not depend on the
>>>> configuration of the topic or of the data. It would actually be a
>>>> pretty good metric, because I think this metric would change at a
>>>> constant rate (based on the disk I/O speed that the broker allocates
>>>> to recovery). Because it changes at a constant rate, you would be able
>>>> to use the rate of change to predict when it hits zero, which would
>>>> let you know when the broker is going to start up. I would imagine
>>>> that if we graphed RemainingBytesToRecovery, we'd see a fairly
>>>> straight line decrementing at a steady rate towards zero.
>>>>
>>>> What do you think about adding RemainingBytesToRecovery?
>>>>
>>>> Or, what would you think about making the primary metric
>>>> RemainingBytesToRecovery, and getting rid of the others?
>>>>
>>>> I don't know whether I personally would rather have all 3 metrics, or
>>>> would just use RemainingBytesToRecovery. I too would like more
>>>> community input on which of those metrics would be useful to people.
>>>>
>>>> About the JMX metrics, you said that if
>>>> num.recovery.threads.per.data.dir=2, there might be a separate
>>>> RemainingSegmentsToRecovery counter for each thread. Is that actually
>>>> how the data is structured within the Kafka recovery threads? Does
>>>> each thread get a fixed set of partitions, or is there just one big
>>>> pool of partitions that the threads all work on?
>>>>
>>>> As a more concrete example:
>>>> * If I have 9 small partitions and 1 big partition, and
>>>>   num.recovery.threads.per.data.dir=2:
>>>>   Does each thread get 5 partitions, which means one thread will
>>>>   finish much sooner than the other?
>>>>   OR
>>>>   Do both threads just work on the set of 10 partitions, which likely
>>>>   means 1 thread will be busy with the big partition, while the other
>>>>   one ends up plowing through the 9 small partitions?
>>>>
>>>> If each thread gets assigned 5 partitions, then it would make sense
>>>> for each thread to have its own counter.
>>>> If the threads work on a single pool of 10 partitions, then the
>>>> counter would probably be on the pool of partitions itself, and not on
>>>> each thread.
>>>>
>>>> -James
>>>>
>>>> On May 4, 2022, at 5:55 AM, Luke Chen <show...@gmail.com> wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> If there are no other comments, I'll start a vote tomorrow.
>>>>>
>>>>> Thank you.
>>>>> Luke
>>>>>
>>>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <show...@gmail.com> wrote:
>>>>>
>>>>>> Hi James,
>>>>>>
>>>>>> Sorry for the late reply.
>>>>>>
>>>>>> Yes, this is a good point: knowing how many segments need to be
>>>>>> recovered matters when there are some large partitions.
>>>>>> I've updated the KIP to add a `RemainingSegmentsToRecover` metric
>>>>>> for each log recovery thread, to show the value.
>>>>>> The example in the Proposed Changes section here
>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
>>>>>> shows what it will look like.
>>>>>>
>>>>>> Thanks for the suggestion.
>>>>>>
>>>>>> Thank you.
>>>>>> Luke
>>>>>>
>>>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wushuja...@gmail.com> wrote:
>>>>>>
>>>>>>> The KIP describes RemainingLogsToRecovery, which seems to be the
>>>>>>> number of partitions in each log.dir.
>>>>>>>
>>>>>>> We have some partitions which are much, much larger than others.
>>>>>>> Those large partitions have many, many more segments than others.
>>>>>>>
>>>>>>> Is there a way the metric can reflect partition size? Could it be
>>>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>>>>>>>
>>>>>>> -James
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <show...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'd like to propose a KIP to expose a metric for log recovery
>>>>>>>> progress. This metric would let admins have a way to monitor the
>>>>>>>> log recovery progress.
>>>>>>>> Details can be found here:
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>>>>>>>>
>>>>>>>> Any feedback is appreciated.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Luke
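The per-thread bookkeeping Luke describes in the thread — a recovery thread only learns a log's segment count once it starts loading that log, and then decrements a counter per segment — can be sketched roughly as below. This is a hypothetical, self-contained Java sketch for illustration only; the class and method names are invented, and Kafka's actual implementation registers these values as gauges inside LogManager rather than exposing a plain class like this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bookkeeping for RemainingSegmentsToRecover, keyed by
// log dir and recovery-thread id, mirroring the JMX layout in the KIP
// example (e.g. /tmp/log1 -> thread 0 -> 10000).
class RecoveryProgress {
    // dir -> (threadId -> remaining segment count for the log that
    // thread is currently recovering)
    private final Map<String, Map<Integer, AtomicInteger>> remainingSegments =
            new ConcurrentHashMap<>();

    // Called by a recovery thread once it has opened a log and knows how
    // many segments need recovery (unknown before loading starts, which
    // is why the metric cannot be computed up front).
    void startLog(String dir, int threadId, int segmentCount) {
        remainingSegments
                .computeIfAbsent(dir, d -> new ConcurrentHashMap<>())
                .computeIfAbsent(threadId, t -> new AtomicInteger())
                .set(segmentCount);
    }

    // Called after each segment of the current log is recovered.
    void segmentRecovered(String dir, int threadId) {
        remainingSegments.get(dir).get(threadId).decrementAndGet();
    }

    // What a gauge for (dir, threadId) would report; 0 before the thread
    // has opened its first log.
    int remaining(String dir, int threadId) {
        Map<Integer, AtomicInteger> byThread = remainingSegments.get(dir);
        if (byThread == null || !byThread.containsKey(threadId)) return 0;
        return byThread.get(threadId).get();
    }
}
```

Under this model, each (log dir, thread id) pair owns its own counter, matching the per-thread layout in the KIP example, and a counter jumps to a new value each time its thread moves on to the next log.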
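James's argument for RemainingBytesToRecovery — that a metric falling at a roughly constant rate lets an operator extrapolate when recovery will finish — can be illustrated with a small helper. This is a hypothetical sketch, not part of Kafka or the KIP: it simply extrapolates two samples of any monotonically decreasing recovery metric linearly to zero.

```java
// Hypothetical helper illustrating James's point: if a recovery metric
// (e.g. RemainingBytesToRecovery) falls at a roughly constant rate,
// two samples are enough to estimate when the broker will finish
// recovery and complete startup.
class RecoveryEta {
    // value1 sampled at time t1 (seconds), value2 at a later time t2.
    // Returns the estimated seconds after t2 until the metric reaches
    // zero, or -1 if no progress was observed between the samples.
    static double secondsToZero(double t1, double value1, double t2, double value2) {
        double rate = (value1 - value2) / (t2 - t1); // metric units per second
        if (rate <= 0) return -1;                    // stalled or increasing
        return value2 / rate;
    }
}
```

For example, a metric dropping from 1000 to 900 over 10 seconds extrapolates to zero in 90 more seconds. The same extrapolation applied to RemainingSegmentsToRecover is less reliable, since per-segment recovery time varies with segment size — which is exactly the unevenness James raises about the segment-based metrics.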