[ https://issues.apache.org/jira/browse/KAFKA-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Carey-Smith updated KAFKA-5120:
-----------------------------------
Description:
We have been tracking latency issues around queries to controller MBeans. While digging into the root cause, we discovered that several metrics acquire the controller lock inside the gauge.

The affected metrics are:
* {{ActiveControllerCount}}
* {{OfflinePartitionsCount}}
* {{PreferredReplicaImbalanceCount}}

If the controller is holding the lock when an MBean request arrives, the thread executing the request blocks until the controller releases the lock. We discovered this in a cluster where the controller held the lock for extended periods during normal operation; that issue is documented in KAFKA-5116.

Several possible solutions exist:
* Remove the lock from inside these {{Gauge}}s.
* Store and update the metric values in {{AtomicLong}}s (a rough sketch of this approach follows below).

Modifying the {{ActiveControllerCount}} metric seems straightforward, while the other two metrics appear to be more involved.

We're happy to contribute a patch, but wanted to discuss potential solutions and their trade-offs before proceeding.

was:
We have been tracking latency issues surrounding queries to Controller MBeans. Upon digging into the root causes, we discovered that several metrics acquire the controller lock within the gauge.

The affected metrics are:
* {{ActiveControllerCount}}
* {{OfflinePartitionsCount}}
* {{PreferredReplicaImbalanceCount}}

If the controller is currently holding the lock and a MBean request is received, the thread executing the request will block until the controller releases the lock.

We discovered this in a cluster where the controller was holding the lock for extended periods of time for normal operations. We have documented this issue in KAFKA-5116.

Several possible solutions exist:
* Remove the lock from inside these {{Gauge}}s.
* Store and update the metric values in {{AtomicLong}}s.

Modifying the {{ActiveControllerCount}} metric seems to be straight-forward while the other 2 metrics seem to be more involved.

We're happy to contribute a patch, but wanted to discuss potential solutions and their tradeoffs before proceeding.
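A minimal sketch of the {{AtomicLong}} idea, for discussion only: the names below are hypothetical, and the local {{Gauge}} trait merely stands in for the metrics library's gauge interface; this is not the actual controller code.

{code:scala}
import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.locks.ReentrantLock

// Stand-in for the metrics library's gauge interface (illustrative only).
trait Gauge[T] { def value: T }

object ControllerMetricsSketch {
  private val controllerLock = new ReentrantLock()
  @volatile private var isActive = false

  // Problematic pattern: the gauge takes the controller lock on every read,
  // so an MBean query blocks while the controller holds the lock.
  val lockedActiveControllerCount: Gauge[Int] = new Gauge[Int] {
    def value: Int = {
      controllerLock.lock()
      try { if (isActive) 1 else 0 } finally controllerLock.unlock()
    }
  }

  // Alternative pattern: controller code paths that already hold the lock
  // update a cached AtomicLong; the gauge only reads it and never blocks.
  private val offlinePartitionsCount = new AtomicLong(0L)

  def updateOfflinePartitionsCount(count: Long): Unit =
    offlinePartitionsCount.set(count)

  val lockFreeOfflinePartitionsCount: Gauge[Long] = new Gauge[Long] {
    def value: Long = offlinePartitionsCount.get()
  }
}
{code}

The trade-off is that the cached value can lag briefly behind the true cluster state between updates, but MBean reads no longer contend for the controller lock.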
> Several controller metrics block if controller lock is held by another thread
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5120
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, metrics
>    Affects Versions: 0.10.2.0
>            Reporter: Tim Carey-Smith
>            Priority: Minor
>
> We have been tracking latency issues around queries to controller MBeans.
> While digging into the root cause, we discovered that several metrics acquire
> the controller lock inside the gauge.
> The affected metrics are:
> * {{ActiveControllerCount}}
> * {{OfflinePartitionsCount}}
> * {{PreferredReplicaImbalanceCount}}
> If the controller is holding the lock when an MBean request arrives, the
> thread executing the request blocks until the controller releases the lock.
> We discovered this in a cluster where the controller held the lock for
> extended periods during normal operation; that issue is documented in
> KAFKA-5116.
> Several possible solutions exist:
> * Remove the lock from inside these {{Gauge}}s.
> * Store and update the metric values in {{AtomicLong}}s.
> Modifying the {{ActiveControllerCount}} metric seems straightforward, while
> the other two metrics appear to be more involved.
> We're happy to contribute a patch, but wanted to discuss potential solutions
> and their trade-offs before proceeding.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)