Hi Ismael,
Yes, a follow up KIP after the controller code settles down sounds good.
Thanks,
Jiangjie (Becket) Qin
On Wed, May 10, 2017 at 6:11 PM, Ismael Juma wrote:
> Thanks Jun. Discussed this offline with Onur and Jun and I believe there's
> agreement so updated the KIP:
>
> https://cwiki.
Thanks Jun. Discussed this offline with Onur and Jun and I believe there's
agreement so updated the KIP:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?
pageId=69407758&selectedPageVersions=8&selectedPageVersions=7
Ismael
On Wed, May 10, 2017 at 4:46 PM, Jun Rao wrote:
> H
Hi, Onur,
We probably don't want to do the 1-to-1 mapping from the event type to the
controller state since some of the event types are implementation details.
How about the following mapping?
0 - idle
1 - controller change (Startup, ControllerChange, Reelect)
2 - broker change (BrokerChange)
3 -
@Ismael, Jun
After bringing up an earlier point twice now, it still doesn't feel like
it's been commented on/addressed, so I'm going to give it one more shot:
Assuming that ControllerState should reflect the current event being
processed, the KIP is missing states.
The controller currently has 14
Becket,
Are you OK with extending the metrics via a subsequent KIP (assuming that
what we're doing here doesn't prevent that)? The KIP freeze is tomorrow
(although I will give it an extra day or two as many in the community have
been attending the Kafka Summit this week), so we should avoid increa
Hi, Becket,
q10. The reason why there is not a timer metric for broker change event is
that the controller currently always has a LeaderElectionRateAndTimeMs
timer metric (in ControllerStats).
q11. I agree that that it's useful to know the queue time in the controller
event queue and suggested th
@Ismael,
About the stage and event type. Yes, I think each event handling should
have those stages covered. It is similar to what we are doing for the
requests on the broker side. We have benefited from such systematic metric
structure a lot so I think it would be worth following the same way in t
I had a similar comment to Becket but accidentally posted it on the vote
thread last Friday. From that thread:
"I noticed that both the ControllerState metric and the *RateAndTimeMs metrics
only cover a subset of the controller event types. Was this intentional?"
I think it makes most sense to jus
Hi Becket,
Thanks for the feedback. Comments inline.
On Tue, May 9, 2017 at 12:19 AM, Becket Qin wrote:
>
> q10. With event loop based controller design, it seems natrural to have the
> processing time for each controller event type. In that case, the current
> metrics seem not covering all the
Hi Ismael and Jun,
Thanks for the KIP. It is very useful. A couple of comments:
q10. With event loop based controller design, it seems natrural to have the
processing time for each controller event type. In that case, the current
metrics seem not covering all the event processing time? e.g. (brok
Hi Ismael,
> > What about a broker that is not the controller? Would you need a separate
> > idle-not-controller state?
>
>
> Do we need a separate state or can users just use the ActiveControllerCount
> metric to check if the broker is the controller?
>
Sure - the ACC metric should be sufficien
Hi Joel,
Thanks for the feedback. Comments inline.
On Wed, May 3, 2017 at 7:52 PM, Joel Koshy wrote:
> Yes - that is roughly what I was thinking (although deletes are no longer
> long running). Also, what is the "steady-state" controller state? Idle?
>
Yes.
> What about a broker that is not
Hi Onur,
Comments inline.
On Wed, May 3, 2017 at 6:54 PM, Onur Karaman
wrote:
> Regarding the ControllerState and the potential for overlap, I think it
> depends on our definition of controller state. While KAFKA-5028 allows only
> a single ControllerEvent to be processed at a time, it still al
On Wed, May 3, 2017 at 10:54 AM, Onur Karaman
wrote:
> Regarding the ControllerState and the potential for overlap, I think it
> depends on our definition of controller state. While KAFKA-5028 allows only
> a single ControllerEvent to be processed at a time, it still allows
> interleavings for lo
Regarding the ControllerState and the potential for overlap, I think it
depends on our definition of controller state. While KAFKA-5028 allows only
a single ControllerEvent to be processed at a time, it still allows
interleavings for long-lasting actions like partition reassignment and
topic deleti
Thanks for the feedback Tom, Joel and Jun.
I updated the KIP in the following way:
1. Removed ControlledShutdownRateAndTimeMs
2. Added QueueSize and QueueTimeMs
3. Renamed FailedIsrUpdateRate to FailedIsrUpdatesPerSec for consistency
with other metrics in the Partition class
4. Mentioned that Yam
Hi, Joel,
1. Yes, the long term goal is to migrate the metrics on the broker to
kafka-metrics. Since many people are using Yammer reporters, we probably
need to support a few popular ones in kafka-metrics before migrating. Until
that happens, we probably want to stick with the Yammer metrics on th
Thanks for the KIP - couple of comments:
- Do you intend to actually use yammer metrics? or use kafka-metrics and
split the timer into an explicit rate and time? I think long term we ought
to move off yammer and use kafka-metrics only. Actually either is fine, but
we should ideally use only one in
Ismael,
Great, that sounds lovely.
I'd like a `Timer` (using yammer metrics parlance) over how long it took to
process the event, so we can get at p99 and max times spent processing
things. Maybe we could even do a log at warning level if event processing
takes over some timeout?
Thanks
Tom
On
Hi Tom,
Yes, the plan is to merge KAFKA-5028 first and then use a lock-free
approach for the new metrics. I considered mentioning that in the KIP
given KAFKA-5120, but didn't in the end. I'll add it to make it clear.
Regarding locks, they are removed by KAFKA-5028, as you say. So, if I
understan
Hi,
We (Heroku) are very excited about this KIP, as we've struggled a bit with
controller stability recently. Having these additional metrics would be
wonderful.
I'd like to ensure polling these metrics *doesn't* hold any locks etc,
because, as noted in https://issues.apache.org/jira/browse/KAFKA
Hi all,
We've posted "KIP-143: Controller Health Metrics" for discussion:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-143%3A+Controller+Health+Metrics
Please take a look. Your feedback is appreciated.
Thanks,
Ismael
22 matches
Mail list logo