[ 
https://issues.apache.org/jira/browse/KAFKA-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983009#comment-15983009
 ] 

Ismael Juma commented on KAFKA-5120:
------------------------------------

[~halorgium], if everything goes well, that PR should be merged this week.

> Several controller metrics block if controller lock is held by another thread
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5120
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, metrics
>    Affects Versions: 0.10.2.0
>            Reporter: Tim Carey-Smith
>            Priority: Minor
>
> We have been tracking latency issues surrounding queries to Controller 
> MBeans. Upon digging into the root causes, we discovered that several metrics 
> acquire the controller lock within the gauge. 
> The affected metrics are: 
> * {{ActiveControllerCount}}
> * {{OfflinePartitionsCount}}
> * {{PreferredReplicaImbalanceCount}}
> If the controller is currently holding the lock and an MBean request is 
> received, the thread executing the request will block until the controller 
> releases the lock. 
> We discovered this in a cluster where the controller was holding the lock for 
> extended periods of time for normal operations. We have documented this issue 
> in KAFKA-5116. 
> Several possible solutions exist: 
> * Remove the lock from inside these {{Gauge}}s. 
> * Store and update the metric values in {{AtomicLong}}s. 
> Modifying the {{ActiveControllerCount}} metric seems to be straightforward, 
> while the other two metrics seem to be more involved. 
> We're happy to contribute a patch, but wanted to discuss potential solutions 
> and their tradeoffs before proceeding. 
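
For illustration, the second option in the description (keeping the metric value in an {{AtomicLong}}) could look roughly like the sketch below. It is not Kafka's controller code: the class, field, and method names are hypothetical, and a plain Supplier stands in for the metrics library's gauge type. The writer updates the atomic while it already holds the lock, so the gauge can read the value without ever waiting on that lock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for the controller; all names are assumptions.
class ControllerMetricsSketch {
    private final ReentrantLock controllerLock = new ReentrantLock();

    // State owned by the controller, guarded by controllerLock.
    private long offlinePartitions = 0L;

    // Published copy of the metric value, readable without the lock.
    private final AtomicLong offlinePartitionsCount = new AtomicLong(0L);

    // Gauge in the style the issue describes: it takes the controller
    // lock, so a reader blocks while another thread holds that lock.
    final Supplier<Long> lockingGauge = () -> {
        controllerLock.lock();
        try {
            return offlinePartitions;
        } finally {
            controllerLock.unlock();
        }
    };

    // Lock-free gauge: reads the last published value and never blocks.
    final Supplier<Long> lockFreeGauge = offlinePartitionsCount::get;

    // Controller-side update: change the guarded state and publish the
    // new value while the lock is already held.
    void setOfflinePartitions(long count) {
        controllerLock.lock();
        try {
            offlinePartitions = count;
            offlinePartitionsCount.set(count);
        } finally {
            controllerLock.unlock();
        }
    }
}
```

The tradeoff in this sketch is that every code path that changes the underlying state must also publish the new value, which may be part of why the two derived counts look more involved than {{ActiveControllerCount}}.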



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
