Vladimir Sitnikov created PULSAR-4:
--------------------------------------

             Summary: Pulsar precommit Jenkins jobs consume too many resources, which leads to "unable to create native thread"
                 Key: PULSAR-4
                 URL: https://issues.apache.org/jira/browse/PULSAR-4
             Project: Pulsar
          Issue Type: Bug
            Reporter: Vladimir Sitnikov
         Attachments: pulsar_threaddump.txt.gz

See https://lists.apache.org/thread.html/r9cb0772531814fdf10c82b61fb4bb8d3a187852ddf98ac84754bf778%40%3Cbuilds.apache.org%3E

H23 node was unresponsive, and it turned out to have lots of Pulsar Java 
processes (~14 processes, 9000+ threads):

{noformat}
22058 jenkins   20   0 19.514g 2.156g  33960 S  36.8  2.3   2032:55 /usr/local/asfpackages/java/jdk1.8.0_191/jre/bin/java -Xmx1G -XX:+UseG1GC -Dpulsar.allocator.pooled=false -Dpulsar.allocator.leak_detection=Advanced -Dpulsar.allocator.exit_on_oom=false -Dlog4j.configurationFile=log4j2.xml -jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/surefire/surefirebooter5673414172185975509.jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/sur
{noformat}


The thread dump includes 1020 threads like
{noformat}
"pulsar-9510-20" #73509 prio=5 os_prio=0 tid=0x00007fba40010000 nid=0xa73 waiting on condition [0x00007fb8dd946000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000cd3bf4d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

733 threads like
{noformat}
"bookkeeper-ml-cache-eviction-6747-1" #51441 prio=5 os_prio=0 tid=0x00007fbb1c31c000 nid=0x58be sleeping[0x00007fb8ea6e6000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.cacheEvictionTask(ManagedLedgerFactoryImpl.java:221)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$$Lambda$70/551994588.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

and so on (see the attached threaddump)
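
For anyone who wants to tally thread families in a dump like the attached one, here is a minimal sketch. It is not necessarily how the counts above were obtained. It assumes a HotSpot-style dump in which every thread header line begins with the quoted thread name, and it replaces each run of digits in the name with '#' so that "pulsar-9510-20" and "pulsar-9510-21" fall into the same family. The class name ThreadDumpSummary, the method countByFamily, and the sample lines in main are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadDumpSummary {
    // Assumption: a thread header line starts with the thread name in double quotes.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"");

    /** Counts threads per name family; every run of digits in a name becomes '#'. */
    static Map<String, Integer> countByFamily(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                String family = m.group(1).replaceAll("\\d+", "#");
                counts.merge(family, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative input only; real input would be the lines of the dump file.
        List<String> sample = Arrays.asList(
            "\"pulsar-9510-20\" #73509 prio=5 os_prio=0 tid=0x00007fba40010000 nid=0xa73",
            "   java.lang.Thread.State: WAITING (parking)",
            "\"pulsar-9510-21\" #73510 prio=5 os_prio=0",   // hypothetical second thread
            "\"bookkeeper-ml-cache-eviction-6747-1\" #51441 prio=5 os_prio=0");
        countByFamily(sample).forEach((family, n) -> System.out.println(n + " " + family));
    }
}
```

Stack frames and state lines are indented, so they never match the header pattern. Families with hundreds of entries, such as the two shown above, are the likeliest candidates for executors that tests create but never shut down.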



--
This message was sent by Atlassian Jira
(v8.3.4#803005)