[ https://issues.apache.org/jira/browse/KAFKA-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982118#comment-15982118 ]
Onur Karaman commented on KAFKA-5120:
-------------------------------------
This will be fixed in KAFKA-5028, which takes the latter approach you
mentioned: the metrics simply read an atomic variable that holds the
precomputed metric value.
> Several controller metrics block if controller lock is held by another thread
> -----------------------------------------------------------------------------
>
> Key: KAFKA-5120
> URL: https://issues.apache.org/jira/browse/KAFKA-5120
> Project: Kafka
> Issue Type: Bug
> Components: controller, metrics
> Affects Versions: 0.10.2.0
> Reporter: Tim Carey-Smith
> Priority: Minor
>
> We have been tracking latency issues surrounding queries to Controller
> MBeans. Upon digging into the root causes, we discovered that several metrics
> acquire the controller lock within the gauge.
> The affected metrics are:
> * {{ActiveControllerCount}}
> * {{OfflinePartitionsCount}}
> * {{PreferredReplicaImbalanceCount}}
> If the controller is currently holding the lock and an MBean request is
> received, the thread executing the request will block until the controller
> releases the lock.
> We discovered this in a cluster where the controller was holding the lock for
> extended periods of time during normal operations. We have documented this
> issue in KAFKA-5116.
> Several possible solutions exist:
> * Remove the lock from inside these {{Gauge}}s.
> * Store and update the metric values in {{AtomicLong}}s.
> Modifying the {{ActiveControllerCount}} metric seems to be straightforward,
> while the other two metrics seem to be more involved.
> We're happy to contribute a patch, but wanted to discuss potential solutions
> and their tradeoffs before proceeding.
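
For illustration, the second option above (keeping the metric value in an
AtomicLong) might look roughly like the sketch below. This is not the actual
Kafka code or the KAFKA-5028 patch: the class, field, and method names are
hypothetical, and a plain LongSupplier stands in for the metrics library's
gauge type. The point is only that the writer publishes the value while it
holds the controller lock, and the gauge reads it without taking the lock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical sketch; names do not correspond to real Kafka classes.
final class ControllerMetricsSketch {
    // Stands in for the controller lock described in the issue.
    final ReentrantLock controllerLock = new ReentrantLock();

    // Controller state, only touched while controllerLock is held.
    private long offlinePartitions = 0;

    // Precomputed metric value, safe to read from any thread.
    private final AtomicLong offlinePartitionsCount = new AtomicLong(0);

    // The gauge only reads the atomic variable and never acquires the lock,
    // so an MBean query cannot block behind a long controller operation.
    final LongSupplier offlinePartitionsGauge = offlinePartitionsCount::get;

    // Controller-side update: change the state under the lock, then
    // publish the new value for the gauge to pick up.
    void addOfflinePartitions(long delta) {
        controllerLock.lock();
        try {
            offlinePartitions += delta;
            offlinePartitionsCount.set(offlinePartitions);
        } finally {
            controllerLock.unlock();
        }
    }
}
```

The tradeoff is that a reader may observe a value from just before the update
currently in progress, which seems acceptable for a monitoring metric.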