Hey Jose,
Yeah, that was an initial discussion point that isn't going to be
implemented. I'll move it to "rejected alternatives" and remove the
"proposed changes" section. Thanks for the feedback.
Best,
Kevin
On Mon, Apr 14, 2025 at 4:31 PM Kevin Wu wrote:
> Hey Colin,
>
> > How about somethin
Hi Kevin,
The KIP says the following:
"However, if we want to add another value to BrokerRegistrationState
that maps to starting up brokers (i.e. never unfenced), this would
require adding a boolean to the broker's registration record.
Additionally, if we want to track if a broker has been unclean
We had some offline discussion about this, so let me summarize.
- States as strings really don't play well with Grafana, Datadog, New Relic
and whatever else people are using with Kafka.
- We might want to add more states in the future (like "never-unfenced",
"post-controlled-shutdown", etc.)
Hi Colin and Kevin,
On Mon, Apr 14, 2025 at 5:25 PM Colin McCabe wrote:
> How about something like this?
> 10 = fenced
> 20 = controlled shutdown
> 30 = active
Very few users are going to know what these values mean. For example,
I see myself having to look up the code and KIP to remember
Hey Colin,
> How about something like this?
> 10 = fenced
> 20 = controlled shutdown
> 30 = active
Yeah, that seems reasonable to me. Thanks for the suggestion.
Kevin
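For readers following the thread, here is a minimal sketch of the numeric encoding quoted above (10 = fenced, 20 = controlled shutdown, 30 = active). The class name `BrokerStateGauge`, the enum `State`, and the `describe` helper are hypothetical names chosen for illustration, not taken from the KIP or the Kafka code base. The sketch only shows why avoiding 0 helps: a gauge that is absent and read as 0 cannot be mistaken for a real state.

```java
public class BrokerStateGauge {
    // Hypothetical enum; the numeric values are the ones proposed in the thread.
    public enum State {
        FENCED(10),
        CONTROLLED_SHUTDOWN(20),
        ACTIVE(30);

        private final int value;

        State(int value) {
            this.value = value;
        }

        public int value() {
            return value;
        }
    }

    // Translate a raw gauge reading back into a label.
    // 0 is reserved for "metric not reported", since no state maps to it.
    public static String describe(int gaugeValue) {
        for (State s : State.values()) {
            if (s.value() == gaugeValue) {
                return s.name();
            }
        }
        return gaugeValue == 0 ? "NOT_REPORTED" : "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(describe(State.ACTIVE.value()));
        System.out.println(describe(0));
    }
}
```

Leaving gaps between the values (10, 20, 30) also leaves room to slot in further states later without renumbering the existing ones.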
On Mon, Apr 14, 2025 at 12:42 PM Kevin Wu wrote:
> Thanks for the comments Federico.
>
> > If I understand correctly unfence
Hi Kevin,
The values for
kafka.controller:type=KafkaController,name=BrokerRegistrationState seem a bit
unintuitive.
Using 0 for active might be confusing to systems that treat metrics that aren't
present as 0. Or to people just scanning the graph visually.
How about something like this?
10
Thanks for the comments Federico.
> If I understand correctly unfenced == active. In the code we always
> use the term active, so I think it would be better to use that for the
> state 0 description.
I've updated the KIP description to refer to "active".
> You propose creating per-broker metrics
Thanks for the comments Jose.
For 1 and 2, I've changed the naming of the metrics to follow your
suggestion of tags/attributes. For 3, I made a note as to why we need the
maximum. Basically, it's because the map that contains broker contact times
we're using as the source for these metrics removes
Thanks for the improvement Kevin. I got a chance to look at the KIP.
1. kafka.controller:type=KafkaController,name=BrokerRegistrationState.kafka-X
Can we use tags or attributes instead of different names? For example,
how about
kafka.controller:type=KafkaController,name=BrokerRegistrationState,
Hi Kevin, thanks for the KIP. I have a couple of questions/considerations.
If I understand correctly unfenced == active. In the code we always
use the term active, so I think it would be better to use that for the
state 0 description.
You propose creating per-broker metrics indicating their state
>
> That's an interesting idea. However, I think that's going to be messy and
> difficult for people to use. For example, how would you set up Grafana or
> Datadog to use this? The string could also get extremely long (imagine 1000
> brokers all in startup.)
Hmm... Yeah from what I've read so far
On Thu, Feb 27, 2025, at 12:19, Kevin Wu wrote:
>>
>> I guess my concern is that the time-based metrics would reset to 0 on
>> every failover (if I understand the proposed implementation correctly).
>> That seems likely to create confusion.
>
> Yeah that makes sense to me. I'm fine with moving towa
>
> I guess my concern is that the time-based metrics would reset to 0 on
> every failover (if I understand the proposed implementation correctly).
> That seems likely to create confusion.
Yeah that makes sense to me. I'm fine with moving towards the approach of
either (since I don't think we need
On Tue, Feb 25, 2025, at 14:40, Colin McCabe wrote:
> On Tue, Feb 25, 2025, at 14:12, Kevin Wu wrote:
>> Hey Colin,
>>
>> Thanks for the review.
>>
>> Regarding the metrics that reflect times: my initial thinking was to indeed
>> have these be "soft state", which would be reset when a controller fa
On Tue, Feb 25, 2025, at 14:12, Kevin Wu wrote:
> Hey Colin,
>
> Thanks for the review.
>
> Regarding the metrics that reflect times: my initial thinking was to indeed
> have these be "soft state", which would be reset when a controller failover
> happens. I'm not sure if it's a big issue if these
Hey Colin,
Thanks for the review.
Regarding the metrics that reflect times: my initial thinking was to indeed
have these be "soft state", which would be reset when a controller failover
happens. I'm not sure if it's a big issue if these values get reset
though, since a controller failover means
Hi Kevin,
Thanks for the KIP.
I notice that you have some metrics that reflect times here, such as
LongestPendingStartupTimeMs, LongestPendingControlledShutdownTimeMs, etc. I
think this may be difficult to do with complete accuracy because we don't
include times in the metadata log events for r
Hey all,
I posted a KIP to monitor broker startup and controlled shutdown on the
controller-side. Here's the link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1131%3A+Controller-side+monitoring+for+broker+shutdown+and+startup
Best,
Kevin Wu