[ https://issues.apache.org/jira/browse/FLINK-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-10761:
-----------------------------------
    Labels: pull-request-available  (was: )

> MetricGroup#getAllVariables can deadlock
> ----------------------------------------
>
>                 Key: FLINK-10761
>                 URL: https://issues.apache.org/jira/browse/FLINK-10761
>             Project: Flink
>          Issue Type: Bug
>          Components: Metrics
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>
> {{AbstractMetricGroup#getAllVariables}} acquires the locks of both the
> current and all parent groups when assembling the variables map. This can
> lead to a deadlock if metrics are registered concurrently on a child and
> parent, if the child registration is applied first and the reporter uses said
> method (which many do).
>
> Assume we have a MetricGroup Mc(hild) and Mp(arent).
> Two separate threads Tc and Tp each register a metric on their respective
> group, acquiring that group's lock.
> Let's assume that Tc has a slight head start.
> Tc will now call {{MetricRegistry#register}} first, acquiring the MR lock.
> Tp will block on this lock.
> Tc now iterates over all reporters, calling
> {{MetricReporter#notifyOfAddedMetric}}. Assume that in this method
> {{MetricGroup#getAllVariables}} is called on Mc by Tc.
> Tc still holds the lock to Mc, and attempts to acquire the lock to Mp.
> The lock to Mp is still held by Tp, however, which waits for the MR lock to be
> released by Tc.
> Thus a deadlock is created. This may deadlock anything, be it minor threads,
> tasks, or entire components.
>
> This has not surfaced so far since usually metrics are no longer added to a
> group once children have been created (since the component initialization at
> that point is complete).
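The lock cycle described in the issue can be reproduced in isolation with a small sketch. This is not Flink code: the class `LockOrderSketch`, the method `simulate`, and the three `ReentrantLock` instances are hypothetical stand-ins for the locks of Mc, Mp, and the MetricRegistry (MR). Latches force the interleaving from the description, and `tryLock` with a timeout replaces the blocking acquisition so that the program terminates and reports the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: these locks stand in for the locks of Mc, Mp and the
// MetricRegistry (MR). None of this is Flink code.
class LockOrderSketch {
    static final long TIMEOUT_MS = 200;

    // Returns {Tc got Mp's lock, Tp got MR's lock}. Both false means each thread
    // waits for a lock the other holds, i.e. a deadlock with blocking locks.
    static boolean[] simulate() throws InterruptedException {
        ReentrantLock mc = new ReentrantLock();
        ReentrantLock mp = new ReentrantLock();
        ReentrantLock mr = new ReentrantLock();
        CountDownLatch tpHoldsMp = new CountDownLatch(1);
        CountDownLatch tcHoldsMr = new CountDownLatch(1);
        CountDownLatch tcTried = new CountDownLatch(1);
        CountDownLatch tpTried = new CountDownLatch(1);
        boolean[] acquired = new boolean[2];

        Thread tc = new Thread(() -> {
            mc.lock();                       // Tc registers a metric on Mc
            try {
                tpHoldsMp.await();           // Tp already holds Mp
                mr.lock();                   // Tc reaches the registry first
                try {
                    tcHoldsMr.countDown();
                    // A reporter asks Mc for all variables, which needs Mp's lock.
                    acquired[0] = mp.tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS);
                    if (acquired[0]) mp.unlock();
                    tcTried.countDown();
                    tpTried.await();         // keep MR until Tp has given up
                } finally {
                    mr.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                mc.unlock();
            }
        });
        Thread tp = new Thread(() -> {
            mp.lock();                       // Tp registers a metric on Mp
            try {
                tpHoldsMp.countDown();
                tcHoldsMr.await();           // Tc has the head start on MR
                acquired[1] = mr.tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS);
                if (acquired[1]) mr.unlock();
                tpTried.countDown();
                tcTried.await();             // keep Mp until Tc has given up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                mp.unlock();
            }
        });
        tc.start();
        tp.start();
        tc.join();
        tp.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = simulate();
        System.out.println("Tc acquired Mp: " + r[0] + ", Tp acquired MR: " + r[1]);
    }
}
```

Both attempts time out: Tc holds Mc and MR and wants Mp, while Tp holds Mp and wants MR. With blocking acquisition, as in the scenario above, neither thread would ever proceed.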