ConfX created HADOOP-18822:
------------------------------

             Summary: Out of Memory when mistakenly set 
decay-scheduler.metrics.top.user.count to a large number
                 Key: HADOOP-18822
                 URL: https://issues.apache.org/jira/browse/HADOOP-18822
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened:

When {{decay-scheduler.metrics.top.user.count}} is set to a very large number, 
{{DecayRpcScheduler}} in Hadoop Common throws an {{OutOfMemoryError}} because 
the value is not validated adequately.
Hadoop Common only checks that the value is larger than 0; no upper bound is 
enforced.
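
To make the failure mode concrete: {{java.util.PriorityQueue}} allocates its backing array eagerly, at construction time, using the requested initial capacity. The following is a rough sketch of the arithmetic only, not Hadoop code. The class name {{TopNAllocationEstimate}} and the 4-byte reference size (compressed object pointers) are assumptions made for illustration; with uncompressed 8-byte references the figure doubles.

```java
public class TopNAllocationEstimate {

  /**
   * Bytes needed for the backing Object[] alone.
   * Ignores the array header and every other allocation in the JVM.
   */
  static long estimateArrayBytes(int initialCapacity, int bytesPerReference) {
    // Widen to long before multiplying so the product cannot overflow int.
    return (long) initialCapacity * bytesPerReference;
  }

  public static void main(String[] args) {
    int topUsersCount = 1419140791;   // value used in the reproduction steps
    int bytesPerReference = 4;        // assumption: compressed object pointers
    long bytes = estimateArrayBytes(topUsersCount, bytesPerReference);
    System.out.printf("backing array alone: about %.2f GiB%n",
        bytes / (1024.0 * 1024 * 1024));
  }
}
```

Under these assumptions the array alone needs more than 5 GiB, which is why a positive but unbounded value is enough to exhaust a typical heap before a single caller is recorded.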
h2. Buggy code:

In DecayRpcScheduler.java
{noformat}
public DecayRpcScheduler(int numLevels, String ns, Configuration conf) {
  ...
  // topUsersCount gets the config value
  topUsersCount =
    conf.getInt(DECAYSCHEDULER_METRICS_TOP_USER_COUNT,
      DECAYSCHEDULER_METRICS_TOP_USER_COUNT_DEFAULT);
  // Only checks for positivity; there is no upper bound
  Preconditions.checkArgument(topUsersCount > 0,
    "the number of top users for scheduler metrics must be at least 1");
  ...
}

private void addTopNCallerSummary(MetricsRecordBuilder rb) {
  // Calls getTopCallers with n equal to topUsersCount
  TopN topNCallers = getTopCallers(topUsersCount);
  ...
}

private TopN getTopCallers(int n) {
  // Creates a PriorityQueue with initial capacity n, causing the out-of-memory error
  TopN topNCallers = new TopN(n);
  ...
}
{noformat}
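
One possible direction for a fix, sketched here as standalone Java rather than as a patch against Hadoop: keep {{n}} as the logical limit but cap the eagerly allocated initial capacity, so the queue grows only as callers are actually added. The class {{BoundedTopN}} and the constant {{MAX_INITIAL_CAPACITY}} are hypothetical names introduced for this illustration; they are not part of {{Metrics2Util.TopN}}. Rejecting values above a documented maximum in the constructor's {{Preconditions.checkArgument}} would be an alternative.

```java
import java.util.PriorityQueue;

public class BoundedTopN {

  // Hypothetical cap on the eager allocation; not a Hadoop constant.
  static final int MAX_INITIAL_CAPACITY = 1024;

  private final int n;
  private final PriorityQueue<Long> queue;

  public BoundedTopN(int n) {
    if (n <= 0) {
      throw new IllegalArgumentException("n must be at least 1");
    }
    this.n = n;
    // Allocate at most MAX_INITIAL_CAPACITY slots up front.
    // PriorityQueue grows on demand if more entries are really needed.
    this.queue = new PriorityQueue<>(Math.min(n, MAX_INITIAL_CAPACITY));
  }

  /** Keeps the n largest counts seen so far (min-heap of size at most n). */
  public void offer(long count) {
    if (queue.size() < n) {
      queue.add(count);
    } else if (queue.peek() < count) {
      queue.poll();      // drop the smallest retained count
      queue.add(count);
    }
  }

  public int size() {
    return queue.size();
  }

  /** Smallest count currently retained; call only when size() > 0. */
  public long smallestRetained() {
    return queue.peek();
  }
}
```

With this shape, constructing the structure with the value from the reproduction steps allocates only a small array, and memory use is bounded by the number of distinct entries offered rather than by the configured value.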
h2. StackTrace:
{noformat}
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.PriorityQueue.<init>(PriorityQueue.java:172)
        at java.base/java.util.PriorityQueue.<init>(PriorityQueue.java:139)
        at org.apache.hadoop.metrics2.util.Metrics2Util$TopN.<init>(Metrics2Util.java:80)
        at org.apache.hadoop.ipc.DecayRpcScheduler.getTopCallers(DecayRpcScheduler.java:1002)
        at org.apache.hadoop.ipc.DecayRpcScheduler.addTopNCallerSummary(DecayRpcScheduler.java:982)
        at org.apache.hadoop.ipc.DecayRpcScheduler.getMetrics(DecayRpcScheduler.java:935)
        at org.apache.hadoop.ipc.DecayRpcScheduler$MetricsProxy.getMetrics(DecayRpcScheduler.java:893)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
        at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:329)
        at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:315)
        at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
        at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:100)
        at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:73)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:233)
        at org.apache.hadoop.ipc.DecayRpcScheduler$MetricsProxy.registerMetrics2Source(DecayRpcScheduler.java:819)
        at org.apache.hadoop.ipc.DecayRpcScheduler$MetricsProxy.<init>(DecayRpcScheduler.java:792)
        at org.apache.hadoop.ipc.DecayRpcScheduler$MetricsProxy.getInstance(DecayRpcScheduler.java:800)
        at org.apache.hadoop.ipc.DecayRpcScheduler.<init>(DecayRpcScheduler.java:260)
{noformat}
h2. Reproduce:

(1) Set {{decay-scheduler.metrics.top.user.count}} to a large value, e.g., 1419140791.
(2) Run a simple test that exercises this parameter, e.g., 
{{org.apache.hadoop.ipc.TestDecayRpcScheduler#testNPEatInitialization}}.

For an easy reproduction, run the attached reproduce.sh.

We are happy to provide a patch if this issue is confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
