Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-24 Thread Kevin Wu
Hey Jose, Yeah, that was an initial discussion point that isn't going to be implemented. I'll move it to "rejected alternatives" and remove the "proposed changes" section. Thanks for the feedback. Best, Kevin On Mon, Apr 14, 2025 at 4:31 PM Kevin Wu wrote: > Hey Colin, > > > How about somethin

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-24 Thread José Armando García Sancio
Hi Kevin, The KIP says the following: "However, if we want to add another value to BrokerRegistrationState that maps to starting up brokers (i.e. never unfenced), this would require adding a boolean to the broker's registration record. Additionally, if we want to track if a broker has been unclean

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-21 Thread Colin McCabe
We had some offline discussion about this, so let me summarize.
- States as strings really don't play well with Grafana, Datadog, New Relic and whatever else people are using with Kafka.
- We might want to add more states in the future (like "never-unfenced", "post-controlled-shutdown", etc.)

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-15 Thread José Armando García Sancio
Hi Colin and Kevin,
On Mon, Apr 14, 2025 at 5:25 PM Colin McCabe wrote:
> How about something like this?
> 10 = fenced
> 20 = controlled shutdown
> 30 = active
Very few users are going to know what these values mean. For example, I see myself having to look up the code and KIP to remember

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-14 Thread Kevin Wu
Hey Colin,
> How about something like this?
> 10 = fenced
> 20 = controlled shutdown
> 30 = active
Yeah, that seems reasonable to me. Thanks for the suggestion.
Kevin
On Mon, Apr 14, 2025 at 12:42 PM Kevin Wu wrote:
> Thanks for the comments Federico.
>
> > If I understand correctly unfence

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-14 Thread Colin McCabe
Hi Kevin, The values for kafka.controller:type=KafkaController,name=BrokerRegistrationState seem a bit unintuitive. Using 0 for active might be confusing to systems that treat metrics that aren't present as 0. Or to people just scanning the graph visually. How about something like this? 10
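
For illustration only, the numeric mapping floated in this thread (10 = fenced, 20 = controlled shutdown, 30 = active) could be modeled roughly as below. This is a minimal sketch, not code from Kafka or the KIP: the enum name `BrokerStateGauge`, the `value()` accessor, and `fromValue` are hypothetical. The point it shows is that a reading of 0, which some systems report for an absent metric, no longer maps to any state.

```java
import java.util.Optional;

// Hypothetical enum for this sketch; the names are illustrative and are not
// taken from the Kafka code base or from the KIP text.
enum BrokerStateGauge {
    FENCED(10),
    CONTROLLED_SHUTDOWN(20),
    ACTIVE(30);

    private final int value;

    BrokerStateGauge(int value) {
        this.value = value;
    }

    // The number a gauge would report for this state.
    int value() {
        return value;
    }

    // Translate a raw gauge reading back into a state. Readings that match no
    // state (for example 0 from a missing metric) yield an empty Optional.
    static Optional<BrokerStateGauge> fromValue(int raw) {
        for (BrokerStateGauge state : values()) {
            if (state.value == raw) {
                return Optional.of(state);
            }
        }
        return Optional.empty();
    }
}
```

With this shape, a dashboard or alert can treat an unmapped reading as "no data" rather than confusing it with an active broker.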

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-14 Thread Kevin Wu
Thanks for the comments Federico.
> If I understand correctly unfenced == active. In the code we always
> use the term active, so I think it would be better to use that for the
> state 0 description.
I've updated the KIP description to refer to "active".
> You propose creating per-broker metrics

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-14 Thread Kevin Wu
Thanks for the comments Jose. For 1 and 2, I've changed the naming of the metrics to follow your suggestion of tags/attributes. For 3, I made a note as to why we need the maximum. Basically, it's because the map that contains broker contact times we're using as the source for these metrics removes

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-04-09 Thread José Armando García Sancio
Thanks for the improvement Kevin. I got a chance to look at the KIP. 1. kafka.controller:type=KafkaController,name=BrokerRegistrationState.kafka-X Can we use tags or attributes instead of different names? For example, how about kafka.controller:type=KafkaController,name=BrokerRegistrationState,
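
A rough sketch of why tags can be easier to work with than per-broker names, using the standard `javax.management.ObjectName` class. The tag that was actually proposed is cut off in this archive, so the key `broker` below is a placeholder assumed for the example, and the `BrokerMetricNames` helper is hypothetical.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for this sketch; not part of Kafka.
final class BrokerMetricNames {
    static final String PREFIX =
        "kafka.controller:type=KafkaController,name=BrokerRegistrationState";

    // Build a per-broker name that carries the broker id as a key property.
    // "broker" is a placeholder tag key assumed for this example.
    static ObjectName taggedName(int brokerId) {
        return parse(PREFIX + ",broker=" + brokerId);
    }

    // A property pattern that matches every per-broker instance of the metric.
    static ObjectName allBrokersPattern() {
        return parse(PREFIX + ",*");
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(name, e);
        }
    }
}
```

With a tag, one pattern selects all brokers and `getKeyProperty("broker")` recovers the id, whereas ids embedded in the `name` value would need string parsing.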

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-03-26 Thread Federico Valeri
Hi Kevin, thanks for the KIP. I have a couple of questions/considerations. If I understand correctly unfenced == active. In the code we always use the term active, so I think it would be better to use that for the state 0 description. You propose creating per-broker metrics indicating their state

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-03-15 Thread Kevin Wu
> > That's an interesting idea. However, I think that's going to be messy and > difficult for people to use. For example, how would you set up Grafana or > Datadog to use this? The string could also get extremely long (imagine 1000 > brokers all in startup.) Hmm... Yeah from what I've read so far

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-03-05 Thread Colin McCabe
On Thu, Feb 27, 2025, at 12:19, Kevin Wu wrote: >> >> I guess my concern is that the time-based metrics would reset to 0 on >> every failover (if I understand the proposed implementation correctly). >> That seems likely to create confusion. > > Yeah that makes sense to me. I'm fine with moving towa

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-02-27 Thread Kevin Wu
> > I guess my concern is that the time-based metrics would reset to 0 on > every failover (if I understand the proposed implementation correctly). > That seems likely to create confusion. Yeah that makes sense to me. I'm fine with moving towards the approach of either (since I don't think we need

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-02-25 Thread Colin McCabe
On Tue, Feb 25, 2025, at 14:40, Colin McCabe wrote: > On Tue, Feb 25, 2025, at 14:12, Kevin Wu wrote: >> Hey Colin, >> >> Thanks for the review. >> >> Regarding the metrics that reflect times: my initial thinking was to indeed >> have these be "soft state", which would be reset when a controller fa

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-02-25 Thread Colin McCabe
On Tue, Feb 25, 2025, at 14:12, Kevin Wu wrote: > Hey Colin, > > Thanks for the review. > > Regarding the metrics that reflect times: my initial thinking was to indeed > have these be "soft state", which would be reset when a controller failover > happens. I'm not sure if it's a big issue if these

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-02-25 Thread Kevin Wu
Hey Colin, Thanks for the review. Regarding the metrics that reflect times: my initial thinking was to indeed have these be "soft state", which would be reset when a controller failover happens. I'm not sure if it's a big issue if these values get reset though, since a controller failover means
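
To make the "soft state" idea concrete, here is a minimal sketch, assuming the controller keeps pending-broker timestamps only in memory. The class `PendingStartupTracker` and its methods are hypothetical and are not the KIP's implementation; the sketch only shows why a time-based value would start again from 0 after a failover.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory tracker; names are illustrative, not from Kafka.
final class PendingStartupTracker {
    // brokerId -> time (ms) at which this controller first saw the broker pending.
    private final Map<Integer, Long> pendingSinceMs = new HashMap<>();

    // Record a broker as pending; an existing start time is kept.
    void markPending(int brokerId, long nowMs) {
        pendingSinceMs.putIfAbsent(brokerId, nowMs);
    }

    // Drop a broker once it has finished starting up.
    void markDone(int brokerId) {
        pendingSinceMs.remove(brokerId);
    }

    // Longest time any tracked broker has been pending, or 0 if none are tracked.
    long longestPendingMs(long nowMs) {
        long longest = 0;
        for (long since : pendingSinceMs.values()) {
            longest = Math.max(longest, nowMs - since);
        }
        return longest;
    }
}
```

A newly elected controller would begin with an empty map, so the reported value drops to 0 even if a broker has been pending for a long time.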

Re: [DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-02-19 Thread Colin McCabe
Hi Kevin, Thanks for the KIP. I notice that you have some metrics that reflect times here, such as LongestPendingStartupTimeMs, LongestPendingControlledShudownTimeMs, etc. I think this may be difficult to do with complete accuracy because we don't include times in the metadata log events for r

[DISCUSS] KIP-1131: Controller-side monitoring for broker shutdown and startup

2025-01-27 Thread Kevin Wu
Hey all, I posted a KIP to monitor broker startup and controlled shutdown on the controller-side. Here's the link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1131%3A+Controller-side+monitoring+for+broker+shutdown+and+startup Best, Kevin Wu