Hi,

We (Heroku) are very excited about this KIP, as we've struggled a bit with
controller stability recently. Having these additional metrics would be
wonderful.

I'd like to ensure that polling these metrics *doesn't* hold any locks,
because, as noted in https://issues.apache.org/jira/browse/KAFKA-5120, the
controller lock can be held for quite some time. This may no longer be an
issue as of KAFKA-5028, though.

Lastly, I'd love to see some metrics around how long the controller spends
inside its lock. We've been tracking an issue (
https://issues.apache.org/jira/browse/KAFKA-5116) where it can hold the
lock for many, many minutes in a zk client listener thread when responding
to a single request. I'm not sure how that plays into
https://issues.apache.org/jira/browse/KAFKA-5028 (which I assume will land
before this metrics patch), but it feels like there will be equivalent
problems ("how long does it spend processing any individual message from
the queue, broken down by message type").
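
To make the per-message-type idea concrete, here is a rough sketch of the
kind of timing I have in mind. It is not taken from the Kafka codebase:
the ControllerEventTimer class and the event type names are hypothetical,
and a real patch would presumably use the broker's existing metrics
library rather than raw counters. Reads go through LongAdder, so polling
the values does not take any lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of Kafka: accumulates the time spent
// handling each type of controller event.
class ControllerEventTimer {
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Runs the handler and records how long it took under eventType,
    // even if the handler throws.
    void time(String eventType, Runnable handler) {
        long start = System.nanoTime();
        try {
            handler.run();
        } finally {
            record(eventType, System.nanoTime() - start);
        }
    }

    // Adds one observation for eventType.
    void record(String eventType, long elapsedNanos) {
        totalNanos.computeIfAbsent(eventType, k -> new LongAdder()).add(elapsedNanos);
        counts.computeIfAbsent(eventType, k -> new LongAdder()).increment();
    }

    // Number of events of this type handled so far (0 if never seen).
    long count(String eventType) {
        LongAdder adder = counts.get(eventType);
        return adder == null ? 0L : adder.sum();
    }

    // Total nanoseconds spent handling this type so far (0 if never seen).
    long totalNanos(String eventType) {
        LongAdder adder = totalNanos.get(eventType);
        return adder == null ? 0L : adder.sum();
    }
}
```

The thread that drains the queue would wrap each handler call, e.g.
timer.time("TopicChange", () -> handle(event)), where handle and the
"TopicChange" label are placeholders for whatever the controller uses.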

These are minor improvements, though; the addition of more metrics to the
controller is already going to be very helpful.

Thanks

Tom Crayford
Heroku Kafka

On Thu, Apr 27, 2017 at 3:10 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi all,
>
> We've posted "KIP-143: Controller Health Metrics" for discussion:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-143%3A+Controller+Health+Metrics
>
> Please take a look. Your feedback is appreciated.
>
> Thanks,
> Ismael
>
