Vladimir Sitnikov created PULSAR-4:
--------------------------------------

Summary: Pulsar precommit Jenkins jobs consume too much resources and it leads to "unable to create native thread"
Key: PULSAR-4
URL: https://issues.apache.org/jira/browse/PULSAR-4
Project: Pulsar
Issue Type: Bug
Reporter: Vladimir Sitnikov
Attachments: pulsar_threaddump.txt.gz
See https://lists.apache.org/thread.html/r9cb0772531814fdf10c82b61fb4bb8d3a187852ddf98ac84754bf778%40%3Cbuilds.apache.org%3E

The H23 node was unresponsive, and it turned out to be running many Pulsar Java processes (~14 processes, 9000+ threads):

{noformat}
22058 jenkins 20 0 19.514g 2.156g 33960 S 36.8 2.3 2032:55 /usr/local/asfpackages/java/jdk1.8.0_191/jre/bin/java -Xmx1G -XX:+UseG1GC -Dpulsar.allocator.pooled=false -Dpulsar.allocator.leak_detection=Advanced -Dpulsar.allocator.exit_on_oom=false -Dlog4j.configurationFile=log4j2.xml -jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/surefire/surefirebooter5673414172185975509.jar /home/jenkins/jenkins-slave/workspace/pulsar_precommit_java8/pulsar-broker/target/sur
{noformat}

The thread dump includes 1020 threads like

{noformat}
"pulsar-9510-20" #73509 prio=5 os_prio=0 tid=0x00007fba40010000 nid=0xa73 waiting on condition [0x00007fb8dd946000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000cd3bf4d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

and 733 threads like

{noformat}
"bookkeeper-ml-cache-eviction-6747-1" #51441 prio=5 os_prio=0 tid=0x00007fbb1c31c000 nid=0x58be sleeping[0x00007fb8ea6e6000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.cacheEvictionTask(ManagedLedgerFactoryImpl.java:221)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$$Lambda$70/551994588.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

and so on (see the attached thread dump).

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
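The report tallies the dump's threads by name pattern (1020 `pulsar-...` threads, 733 `bookkeeper-ml-cache-eviction-...` threads). A minimal Java sketch of such a tally follows. It assumes a jstack-style text dump in which every thread header line starts with the quoted thread name. The class `ThreadCensus` and its methods are hypothetical helpers, not part of Pulsar or BookKeeper, and collapsing every digit run to `N` is only a heuristic for merging threads that belong to different pool instances.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not part of Pulsar): counts threads per name pattern,
// either in a jstack-style dump file or in the current JVM. Digit runs are
// collapsed to "N", so "pulsar-9510-20" and "pulsar-9511-3" both count as
// "pulsar-N-N" (an assumed heuristic).
final class ThreadCensus {

    // Collapses every run of digits in a thread name into "N".
    static String pattern(String threadName) {
        return threadName.replaceAll("\\d+", "N");
    }

    // Counts names per pattern; the TreeMap keeps entries sorted by pattern.
    static Map<String, Integer> countByPattern(List<String> threadNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : threadNames) {
            counts.merge(pattern(name), 1, Integer::sum);
        }
        return counts;
    }

    // Assumes thread header lines start with the quoted name, e.g.
    // "pulsar-9510-20" #73509 prio=5 ...; all other lines are skipped.
    static List<String> namesFromDump(List<String> dumpLines) {
        List<String> names = new ArrayList<>();
        for (String line : dumpLines) {
            int end = line.startsWith("\"") ? line.indexOf('"', 1) : -1;
            if (end > 0) {
                names.add(line.substring(1, end));
            }
        }
        return names;
    }

    // Names of all threads currently alive in this JVM.
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        return names;
    }

    // Reads a dump file line by line, gunzipping it when the name ends in ".gz".
    static List<String> readLines(String path) throws IOException {
        InputStream in = Files.newInputStream(Paths.get(path));
        if (path.endsWith(".gz")) {
            in = new GZIPInputStream(in);
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // With a file argument, tallies that dump; otherwise tallies live threads.
    // Prints one "count pattern" line per pattern, largest count first.
    public static void main(String[] args) throws IOException {
        List<String> names = args.length > 0
                ? namesFromDump(readLines(args[0]))
                : liveThreadNames();
        countByPattern(names).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.printf("%6d %s%n", e.getValue(), e.getKey()));
    }
}
```

Run as `java ThreadCensus pulsar_threaddump.txt.gz` to tally a saved dump, or with no arguments to tally the live threads of the JVM it runs in (for instance, if called from a test teardown hook to spot pools that keep growing between test classes).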