ConfX created YARN-11530:
----------------------------

             Summary: Server$Listener stating too many open files when setting 
ipc.server.read.threadpool.size big enough
                 Key: YARN-11530
                 URL: https://issues.apache.org/jira/browse/YARN-11530
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened?
Running org.apache.hadoop.yarn.TestRPCFactories#test fails with an IOException 
stating "Too many open files" when ipc.server.read.threadpool.size is set to a 
large value.

h2. Where's the bug?
In the constructor of org.apache.hadoop.ipc.Server$Listener, the listener 
creates and starts readThreads readers:
{code:java}
      readers = new Reader[readThreads];
      for (int i = 0; i < readThreads; i++) {
        Reader reader = new Reader(
            "Socket Reader #" + (i + 1) + " for port " + port);
        readers[i] = reader;
        reader.start();
      }
{code}
without checking the value of readThreads. When the parameter 
ipc.server.read.threadpool.size is set large enough, the process hits its 
open-file limit before all readers are created, and the resulting exception is 
not handled. The listener should catch exceptions thrown while creating the 
readers.
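
The kind of guard suggested above could look roughly like the following. This is a minimal standalone sketch, not a patch against Hadoop: the class ReaderPoolSketch, its methods, and the maxReadThreads bound are hypothetical names introduced for illustration, and it assumes each reader holds one java.nio Selector, as the sun.nio.ch frames in the stack trace suggest. It rejects an out-of-range count up front and closes any selectors already opened if a later one fails.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

/** Hypothetical helper, not part of Hadoop. */
final class ReaderPoolSketch {

  /**
   * Opens one selector per reader, or fails without leaking any.
   * maxReadThreads is an assumed upper bound; the issue does not define one.
   */
  static Selector[] openSelectors(int readThreads, int maxReadThreads) {
    if (readThreads < 1 || readThreads > maxReadThreads) {
      throw new IllegalArgumentException(
          "readThreads must be in [1, " + maxReadThreads + "], got " + readThreads);
    }
    Selector[] selectors = new Selector[readThreads];
    boolean success = false;
    try {
      for (int i = 0; i < readThreads; i++) {
        // May fail once the process reaches its open-file limit.
        selectors[i] = Selector.open();
      }
      success = true;
      return selectors;
    } catch (IOException e) {
      throw new UncheckedIOException(
          "Could not open selectors for all " + readThreads + " readers", e);
    } finally {
      // Runs for IOException and for Errors such as ExceptionInInitializerError.
      if (!success) {
        closeAll(selectors);
      }
    }
  }

  /** Closes every non-null selector, ignoring failures (best effort). */
  static void closeAll(Selector[] selectors) {
    for (Selector s : selectors) {
      if (s == null) {
        continue;
      }
      try {
        s.close();
      } catch (IOException ignored) {
        // Nothing useful can be done here; keep closing the rest.
      }
    }
  }

  public static void main(String[] args) {
    Selector[] selectors = openSelectors(4, 64);
    System.out.println("opened " + selectors.length + " selectors");
    closeAll(selectors);
    try {
      openSelectors(50000, 64);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The cleanup sits in a finally block rather than a catch because the reported failure surfaces as an ExceptionInInitializerError, which a catch of IOException alone would miss.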

h3. Stacktrace
{code}
java.lang.ExceptionInInitializerError
        ...
Caused by: java.io.IOException: Too many open files
        at java.base/sun.nio.ch.FileDispatcherImpl.init(Native Method)
        at java.base/sun.nio.ch.FileDispatcherImpl.<clinit>(FileDispatcherImpl.java:38)
        ...
{code}

h2. How to reproduce?
(1) set ipc.server.read.threadpool.size to 50000
(2) run org.apache.hadoop.yarn.TestRPCFactories#test

You can use the attached reproduce.sh to reproduce the bug.

We have tested this bug on both Ubuntu and macOS. *The bug is volatile and 
appears in different forms on the two operating systems we have tested*. On 
macOS it prints the "Too many open files" error to stderr. On Ubuntu the JVM 
crashes directly: 
{code}
[WARNING] Corrupted STDOUT by directly writing to native stream in forked JVM 1.
...
ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
...
Error occurred in starting fork, check output in log
Process Exit Code: 1
Crashed tests:
org.apache.hadoop.yarn.TestRPCFactories
...
Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
{code}

We are happy to provide a patch after this issue is confirmed.


