[ https://issues.apache.org/jira/browse/KAFKA-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982118#comment-15982118 ]
Onur Karaman commented on KAFKA-5120:
-------------------------------------

This will be fixed in KAFKA-5028. It chooses the latter approach you mentioned, where the metrics would just read an atomic variable holding the precomputed metric values.

> Several controller metrics block if controller lock is held by another thread
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5120
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, metrics
>    Affects Versions: 0.10.2.0
>            Reporter: Tim Carey-Smith
>            Priority: Minor
>
> We have been tracking latency issues surrounding queries to Controller
> MBeans. Upon digging into the root causes, we discovered that several metrics
> acquire the controller lock within the gauge.
> The affected metrics are:
> * {{ActiveControllerCount}}
> * {{OfflinePartitionsCount}}
> * {{PreferredReplicaImbalanceCount}}
> If the controller is currently holding the lock and an MBean request is
> received, the thread executing the request will block until the controller
> releases the lock.
> We discovered this in a cluster where the controller was holding the lock for
> extended periods of time for normal operations. We have documented this issue
> in KAFKA-5116.
> Several possible solutions exist:
> * Remove the lock from inside these {{Gauge}}s.
> * Store and update the metric values in {{AtomicLong}}s.
> Modifying the {{ActiveControllerCount}} metric seems to be straightforward
> while the other two metrics seem to be more involved.
> We're happy to contribute a patch, but wanted to discuss potential solutions
> and their tradeoffs before proceeding.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
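
For illustration only, the second option from the issue description (keep the metric value in an atomic variable so that the gauge never takes the controller lock) might look roughly like the sketch below. This is not the KAFKA-5028 patch. The class, field, and method names are hypothetical, `IntSupplier` stands in for the metrics library's gauge type, and an `AtomicInteger` is used where the description mentions `AtomicLong`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntSupplier;

// Hypothetical sketch: none of these names come from the Kafka code base.
class ControllerMetricsSketch {
    // Stand-in for the controller lock described in the issue.
    private final ReentrantLock controllerLock = new ReentrantLock();

    // Controller state, only touched while holding controllerLock.
    private int offlinePartitions = 0;

    // Precomputed copy of the metric value, readable without the lock.
    private final AtomicInteger offlinePartitionsCount = new AtomicInteger(0);

    // Pattern described in the issue: the gauge acquires the controller lock,
    // so an MBean query blocks for as long as another thread holds it.
    final IntSupplier blockingGauge = () -> {
        controllerLock.lock();
        try {
            return offlinePartitions;
        } finally {
            controllerLock.unlock();
        }
    };

    // Alternative: the gauge only reads the atomic and never blocks.
    final IntSupplier nonBlockingGauge = offlinePartitionsCount::get;

    // The controller updates both the state and the published value
    // while it already holds the lock for its normal work.
    void setOfflinePartitions(int count) {
        controllerLock.lock();
        try {
            offlinePartitions = count;
            offlinePartitionsCount.set(count);
        } finally {
            controllerLock.unlock();
        }
    }

    public static void main(String[] args) {
        ControllerMetricsSketch metrics = new ControllerMetricsSketch();
        metrics.setOfflinePartitions(2);
        System.out.println(metrics.nonBlockingGauge.getAsInt());
    }
}
```

The tradeoff in this sketch is that the published value can briefly lag the controller's internal state, in exchange for metric reads that never wait on the controller.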