Thanks for the review, David. Here are the answers to your questions. I will update the KIP to make the info clearer.

> 1) Does "publisher-error-count" represent the number of errors
> encountered only when loading the most recent image? Or will this value be
> the cumulative number of publisher errors since the broker started?
> 2) Same question for "listener-batch-load-error-count"

The intent is to have a cumulative number for both of these. The rationale is that any fault in loading an image (even if a subsequent load was OK) is worthy of inspection. It would be good to have a way to bring the count down to zero through an operator-initiated signal, but that could be a follow-up.

> 3) Will ForceRenounceCount be zero for non-leader controllers? Or will this
> value remain between elections and only get reset to zero upon a restart

I think it makes sense to keep these metrics for all controllers in the system. A forced resignation is usually looked at after it has happened, and at that point, the controller might not be the leader anymore.

> On Jul 27, 2022, at 11:39 AM, David Arthur
> <david.art...@confluent.io.invalid> wrote:
>
> Thanks for the KIP, Niket! I definitely agree we need to surface metadata
> processing errors to the operator. I have some questions about the
> semantics of the new metrics:
>
> 1) Does "publisher-error-count" represent the number of errors
> encountered only when loading the most recent image? Or will this value be
> the cumulative number of publisher errors since the broker started?
> 2) Same question for "listener-batch-load-error-count"
> 3) Will ForceRenounceCount be zero for non-leader controllers? Or will this
> value remain between elections and only get reset to zero upon a restart
>
> Thanks!
> David
>
> On Wed, Jul 27, 2022 at 2:20 PM Niket Goel <ng...@confluent.io.invalid>
> wrote:
>
>>
>> Hi all,
>>
>> I would like to start a discussion on adding some new metrics to KRaft to
>> allow for better visibility into log processing errors.
>>
>> KIP URL:
>> https://www.google.com/url?q=https://cwiki.apache.org/confluence/display/KAFKA/KIP-859%253A%2BAdd%2BMetadata%2BLog%2BProcessing%2BError%2BRelated%2BMetrics&source=gmail-imap&ust=1659551965000000&usg=AOvVaw2Uzcu-JIs-OZSdfTavNjn7
>>
>> Thanks!
>> Niket
>>
>
> --
> -David
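
P.S. To make the cumulative semantics described in my answers above a bit more concrete, here is a minimal sketch in Java. It is only an illustration: the class and method names (MetadataErrorCounters, recordPublisherError, recordSuccessfulLoad, recordForcedRenounce, onLeadershipLost) are hypothetical and are not taken from the KIP or from the Kafka code base. The sketch assumes the counters live for the lifetime of the process, so a later successful image load does not reset the error count, and a controller keeps its forced-renounce count after it stops being the leader.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of cumulative counters; not the KIP-859 implementation.
final class MetadataErrorCounters {
    // Both counters start at zero when the process starts and only ever grow.
    private final AtomicLong publisherErrorCount = new AtomicLong();
    private final AtomicLong forceRenounceCount = new AtomicLong();

    // Called whenever publishing a metadata image fails.
    void recordPublisherError() {
        publisherErrorCount.incrementAndGet();
    }

    // A successful load deliberately leaves the error count untouched,
    // so an earlier fault remains visible to the operator.
    void recordSuccessfulLoad() {
        // intentionally no reset
    }

    // Called whenever the controller is forced to resign.
    void recordForcedRenounce() {
        forceRenounceCount.incrementAndGet();
    }

    // Losing leadership deliberately keeps the count, so the resignation
    // can still be inspected on a controller that is no longer the leader.
    void onLeadershipLost() {
        // intentionally no reset
    }

    long publisherErrorCount() {
        return publisherErrorCount.get();
    }

    long forceRenounceCount() {
        return forceRenounceCount.get();
    }
}
```

An operator-initiated reset, if we add one as a follow-up, would just be one more method that sets a counter back to zero; I have left it out here because it is not part of the current proposal.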