Four times this week now, my build job has landed on a node that has broken 
JVM processes left hanging about by surefire tests gone awry.  (Culprits: 
Accumulo, Reef, and Sling.) Because of the way Linux does process limits on 
systemd-based boxes, my tasks are getting killed even though there is plenty 
of CPU and memory: these leftover surefire tests have spawned so many threads 
that everything else fails.
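
For anyone who wants to check a node, here is a rough sketch of how to see how
close a user is to that limit. It assumes the limit is a count of kernel tasks
(every thread counts as one) and that /proc is readable; count_tasks is just a
name I made up for illustration, not something Jenkins or systemd provides.

```python
import os
from pathlib import Path


def count_tasks(uid, proc_root="/proc"):
    """Count kernel tasks (threads) owned by uid by walking <proc_root>/<pid>/task."""
    root = Path(proc_root)
    if not root.is_dir():
        # No procfs here (e.g. not Linux): nothing to count.
        return 0
    total = 0
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            if entry.stat().st_uid != uid:
                continue
            # Each subdirectory of task/ is one thread of this process.
            total += sum(1 for _ in (entry / "task").iterdir())
        except OSError:
            # The process exited or is unreadable; ignore it.
            continue
    return total


if __name__ == "__main__":
    print(count_tasks(os.getuid()))
```

Run it as the build user; a number in the thousands on an otherwise idle node
usually means something was left behind.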

Folks:  if you aren’t running in a Docker container (which makes it extremely 
easy to clean up as well as enforce a sub-5k process limit), please add a 
Post Action to your Jenkins job to blow away any of your tasks that are still 
hanging around. 
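
A minimal sketch of what such a cleanup step could look like, assuming the
node is Linux with /proc and the step runs as the same user as the build. The
KEEP_MARKERS strings and the "java" substring match are my own guesses at how
to spot the agent and monitoring processes; adjust them for your node. It
only lists what it would kill unless you pass --kill.

```python
import os
import signal
import sys
from pathlib import Path

# Assumption: command lines containing these substrings must survive.
KEEP_MARKERS = ("agent.jar", "slave.jar", "datadog")


def select_leftovers(procs, uid, self_pid, keep=KEEP_MARKERS):
    """Given (pid, owner_uid, cmdline) tuples, return the pids to kill."""
    victims = []
    for pid, owner, cmdline in procs:
        if owner != uid or pid == self_pid:
            continue  # never touch other users' processes or ourselves
        if "java" not in cmdline:
            continue  # crude filter: only consider JVM-looking processes
        if any(marker in cmdline for marker in keep):
            continue  # leave the Jenkins agent and monitoring alone
        victims.append(pid)
    return victims


def read_procs(proc_root="/proc"):
    """Yield (pid, owner_uid, cmdline) for every readable process."""
    root = Path(proc_root)
    if not root.is_dir():
        return
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            owner = entry.stat().st_uid
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process went away while we were looking
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        yield int(entry.name), owner, cmdline


def main(argv):
    really_kill = "--kill" in argv
    for pid in select_leftovers(read_procs(), os.getuid(), os.getpid()):
        print(("killing" if really_kill else "would kill"), pid)
        if really_kill:
            try:
                os.kill(pid, signal.SIGKILL)
            except OSError:
                pass  # already gone, or not ours to kill


if __name__ == "__main__":
    main(sys.argv[1:])
```

Note that this is only safe if nothing else of yours is legitimately running
on the node at the same time, since it cannot tell one job's JVMs from
another's.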

At this point, I feel like I have no choice but to just start nuking any 
long-running java processes (except agent/slave.jar and the Datadog stuff that 
infra runs) before startup just so I can get a build. :(

