You can attack this in one of two ways: either by playing with how Java does GC, or by reducing how much G you have to C. I'll let someone smarter than I am guide you on how to keep GC from stopping the world, but I think the basic problem is that you've got a 100 GB heap.
How many old job runs are you keeping? Jenkins keeps data (modulo build logs and artifacts) on every run it remembers, and it keeps it in memory. In our shop, we only keep build results on Jenkins for a week or so, and part of the build process is to persist the results to an RDBMS for long-term history. If your builds aren't set to "Discard Old Builds", that's your problem: with an installation that size, you can't afford to keep records going back forever in Jenkins itself.

If the problem is too many old builds, you'll need to set your jobs to discard old builds, and then probably restart Jenkins (it won't immediately "forget" old runs). Somebody around here should remember some Groovy magic that will let you "forget" old builds after you set the discard rule, without rebooting the server; I've put a rough sketch of that at the bottom of this message, below your quoted mail.

If your system is so complex that you need a 100 GB heap and you need better control over the JVM, remember (if you're not doing it already) that you can run Jenkins as its own server, so you're not beholden to Tomcat, JBoss, or whatever app server you're running. There's an example launch command at the bottom as well.

If it matters, my installation has 100+ nodes, 200+ jobs, and 1000-2000 build runs recorded at any given time, and we run in a 4 GB heap, running Jenkins as its own server, with no problem.

--Rob

From: jenkinsci-users@googlegroups.com [mailto:jenkinsci-users@googlegroups.com] On Behalf Of icarusnine
Sent: Tuesday, October 09, 2012 5:53 AM
To: jenkinsci-users@googlegroups.com
Subject: Performance problems on Jenkins master (very long minor GC with stop-the-world)

Hello. We have a very large Jenkins setup that includes one master node with 100+ slaves and 1000+ jobs. We have reasons for keeping just a single master node, so it isn't possible to split our Hudson master.

We are now experiencing performance problems: minor GC happens frequently, each one takes 1 to 2 minutes, and it stops the world. Full GC, by contrast, completes within 20 to 30 seconds. Our heap size is over 100 GB, so it is hard to generate and analyze a heap dump.

Does anyone have experience with very large Hudson installations like this? Is there any advice on tuning, or recommendations for this issue? Please also let me know if there is any other data I can provide that would help with analysis. Thanks for any help you can provide.
----------------------------------------------------------------
Jenkins info
----------------------------------------------------------------
Core ver : 1.424.6
WAS : WebLogic 10.3.2
Java : jdk1.6.0.34
JVM options : -Xms180g -Xmx180g -XX:NewSize=140g -XX:MaxNewSize=140g -XX:PermSize=1024m -XX:MaxPermSize=1024m -XX:-UseGCOverheadLimit -XX:+UseParallelGC -XX:SurvivorRatio=8 -verbosegc -Xloggc:app_gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Djava.awt.headless=true

----------------------------------------------------------------
Server spec
----------------------------------------------------------------
CPU : Intel(R) Xeon(R) CPU E5-2690 2.90GHz * 4 (32 cores)
RAM : 256 GB

----------------------------------------------------------------
GC log
----------------------------------------------------------------
60286.042: [GC [PSYoungGen: 8655292K->8714K(40587584K)] 25909185K->17262608K(208359744K), 0.0248360 secs] [Times: user=0.38 sys=0.01, real=0.03 secs]
60286.067: [Full GC (System) [PSYoungGen: 8714K->0K(40587584K)] [ParOldGen: 17253893K->17228395K(167772160K)] 17262608K->17228395K(208359744K) [PSPermGen: 194622K->194622K(2097152K)], 1.8638320 secs] [Times: user=33.26 sys=0.23, real=1.86 secs]
60748.860: [GC [PSYoungGen: 39173056K->532623K(40528512K)] 56401451K->17761019K(208300672K), 0.0837520 secs] [Times: user=1.19 sys=0.00, real=0.08 secs]
61243.483: [GC [PSYoungGen: 39705679K->29759K(40658432K)] 56934075K->17272524K(208430592K), 0.0558890 secs] [Times: user=0.49 sys=0.00, real=0.05 secs]
61805.663: [GC [PSYoungGen: 39346943K->28331K(40601792K)] 56589708K->17275705K(208373952K), 0.0544110 secs] [Times: user=0.49 sys=0.01, real=0.06 secs]
62383.664: [GC [PSYoungGen: 39345515K->33640K(40776640K)] 56592889K->17284373K(208548800K), 0.0592330 secs] [Times: user=0.49 sys=0.00, real=0.06 secs]
..........................
85842.953: [GC [PSYoungGen: 38973565K->1818337K(40038592K)] 80709421K->44276973K(207810752K), 22.0442750 secs] [Times: user=2.44 sys=503.41, real=22.04 secs]
85976.095: [GC [PSYoungGen: 40038561K->1904445K(37126592K)] 82497204K->46320890K(204898752K), 49.0663710 secs] [Times: user=2.88 sys=1117.05, real=49.06 secs]
86147.499: [GC [PSYoungGen: 37126456K->1721075K(38582592K)] 81542901K->48037517K(206354752K), 39.6267960 secs] [Times: user=2.81 sys=904.88, real=39.62 secs]
86265.898: [GC [PSYoungGen: 36943219K->1147657K(38796608K)] 83259661K->49166685K(206568768K), 43.2677960 secs] [Times: user=6.13 sys=985.33, real=43.26 secs]
86435.957: [GC [PSYoungGen: 36592456K->748179K(38591488K)] 84611484K->49915859K(206363648K), 34.1037910 secs] [Times: user=2.48 sys=780.02, real=34.10 secs]
86560.263: [GC [PSYoungGen: 36192756K->448475K(38982464K)] 85360436K->50343633K(206754624K), 27.3025220 secs] [Times: user=1.64 sys=623.52, real=27.29 secs]
86594.685: [GC [PSYoungGen: 36402298K->106372K(38914816K)] 86297455K->50438940K(206686976K), 15.7548480 secs] [Times: user=1.88 sys=359.23, real=15.76 secs]
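P.S. Here's the sort of "Groovy magic" I had in mind, as a rough Script Console sketch. I haven't run this against 1.424.6, the seven-day cutoff is only an example, and skipping in-progress builds is my own assumption, so please try it against a throwaway job before pointing it at a 1000-job master:

    import jenkins.model.Jenkins
    import hudson.model.Job

    // Retention window; seven days is only an example value.
    def keepDays = 7
    def cutoff = System.currentTimeMillis() - keepDays * 24L * 60 * 60 * 1000

    // Walk every job and delete build records older than the cutoff,
    // skipping anything that is still running.
    Jenkins.instance.getAllItems(Job.class).each { job ->
        def stale = job.builds.findAll { run ->
            !run.isBuilding() && run.getTimestamp().getTimeInMillis() < cutoff
        }
        stale.each { run ->
            println "Deleting ${run.fullDisplayName}"
            run.delete()
        }
    }

Once that has pruned the history (and the discard rule keeps it pruned), the heap should stop carrying years of old runs around.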
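And here's what I mean by running Jenkins as its own server. The war file ships with its own servlet container, so something along these lines is all it takes; the path, port, and memory settings below are just examples from my own setup, not a recommendation for yours:

    JENKINS_HOME=/var/lib/jenkins java -Xmx4g -XX:MaxPermSize=512m -jar /opt/jenkins/jenkins.war --httpPort=8080

That takes WebLogic out of the picture entirely, so whatever GC flags you end up with are applied to Jenkins and nothing else.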