This Jira issue seems to capture my problem: https://issues.jenkins-ci.org/browse/JENKINS-19544
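As a general illustration of the exception quoted below (this is not the Jenkins code, which the thread does not show), a ConcurrentModificationException commonly appears when a collection is structurally modified while something is still iterating over it, whether from the same thread or another. The sketch below reproduces the single-threaded case and shows one common way to avoid it. The class name, method names and sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Jenkins.
public class DiscardSketch {

    // Removes matching rows from the list while a for-each loop is iterating it.
    // Returns true if the fail-fast iterator noticed the change and threw.
    static boolean removeDuringIteration(List<String> rows) {
        try {
            for (String row : rows) {
                if (row.startsWith("unreadable")) {
                    rows.remove(row); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Works on a private copy and removes through the iterator itself,
    // so the iterator's bookkeeping stays consistent.
    static List<String> discardSafely(List<String> rows) {
        List<String> copy = new ArrayList<String>(rows);
        Iterator<String> it = copy.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith("unreadable")) {
                it.remove();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>(
                Arrays.asList("unreadable-1", "ok", "unreadable-2"));
        System.out.println("threw CME: " + removeDuringIteration(rows));
        System.out.println("after safe discard: " + discardSafely(
                Arrays.asList("unreadable-1", "ok", "unreadable-2")));
    }
}
```

Whether the "Discard Unreadable Data" failure has this exact shape is an assumption; the linked issue is the place to confirm the actual cause.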

On Monday, December 9, 2013 11:32:21 AM UTC-5, Tim Drury wrote:
>
> I'm doing a heap-dump analysis now and I think I might know what the issue
> was. The start of this whole problem was the disk-usage plugin hanging our
> attempts to view a job in Jenkins (see
> https://issues.jenkins-ci.org/browse/JENKINS-20876) so we disabled that
> plugin. After disabling, Jenkins complained about data in an
> older/unreadable format:
>
> You have data stored in an older format and/or unreadable data.
>
> If I click the "Manage" button to delete it, it takes a _long_ time for it
> to display all the disk-usage plugin data - there must be thousands of
> rows, but it does display it all eventually. The error shown in each row
> is:
>
> CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction
>
> If I click "Discard Unreadable Data" at the bottom of the page, I quickly
> get a stack trace:
>
> javax.servlet.ServletException: java.util.ConcurrentModificationException
>     at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:735)
>     at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>     at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:239)
>     at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
>     at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:685)
>     at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>     at org.kohsuke.stapler.Stapler.invoke(Stapler.java:587)
>     at org.kohsuke.stapler.Stapler.service(Stapler.java:218)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:45)
>     at winstone.ServletConfiguration.execute(ServletConfiguration.java:248)
>     at winstone.RequestDispatcher.forward(RequestDispatcher.java:333)
>     at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:376)
>     at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
>     at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:203)
>     at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:181)
>     at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:86)
>
> and it fails to discard the data. Older data isn't usually a problem so I
> brushed off this error. However, here is dominator_tree of the heap dump:
>
> Class Name | Shallow Heap | Retained Heap | Percentage
> ------------------------------------------------------------------------
> hudson.diagnosis.OldDataMonitor @ 0x6f9f2c4a0 | 24 | 3,278,466,984 | 88.69%
> com.thoughtworks.xstream.converters.SingleValueConverterWrapper @ 0x6f9da8780 | 16 | 13,825,616 | 0.37%
> hudson.model.Hudson @ 0x6f9b8b8e8 | 272 | 3,572,400 | 0.10%
> org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6f9a73598 | 88 | 2,308,760 | 0.06%
> org.apache.commons.jexl.util.introspection.Introspector @ 0x6fbb74710 | 32 | 1,842,392 | 0.05%
> org.kohsuke.stapler.WebApp @ 0x6f9c0ff10 | 64 | 1,127,480 | 0.03%
> java.lang.Thread @ 0x7d5c2d138 Handling GET /view/Alle/job/common-translation-main/ : RequestHandlerThread[#105] Thread | 112 | 971,336 | 0.03%
> ------------------------------------------------------------------------
>
> What is hudson.diagnosis.OldDataMonitor? Could the disk-usage plugin data
> be the cause of all my recent OOM errors? If so, how do I get rid of it?
>
> -tim
>
>
> On Monday, December 9, 2013 9:41:25 AM UTC-5, Tim Drury wrote:
>>
>> I intended to install 1.532 on Friday, but mistakenly installed 1.539.
>> It gave us the same OOM exceptions. I'm installing 1.532 now and will -
>> hopefully - know tomorrow whether it's stable or not. I'm not exactly sure
>> what's going to happen with our plugins though. Hopefully Jenkins will
>> tell me if they must be downgraded too.
>>
>> -tim
>>
>> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>>>
>>> How does the current LTS (1.532.1) hold up?
>>>
>>>
>>> On 6 December 2013 13:33, Tim Drury <tdr...@gmail.com> wrote:
>>>
>>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're getting
>>>> a lot of OOM errors. (info: Windows server 2008 R2, Jenkins JVM is jdk
>>>> -x64-1.6.0_26)
>>>>
>>>> At first I did the simplest thing and increased the heap from 3G to
>>>> 4.2G (and bumped up permgen). This didn't help so I started looking at
>>>> threads via the Jenkins monitoring tool. It indicated the disk-usage
>>>> plugin was hung. When you tried to view a page for a particularly large
>>>> job, the page would "hang" and the stack trace showed the disk-usage
>>>> plugin was to blame (or so I thought). Jira report with thread dump here:
>>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>>
>>>> We disabled the disk-usage plugin and restarted and now we can visit
>>>> that job page. However, we still get OOM and lots of GCs in the logs at
>>>> least once a day. The stack trace looks frighteningly similar to that
>>>> from the disk-usage plugin. Here is an edited stack trace showing the
>>>> methods common between the two OOM incidents: one during the disk-usage
>>>> plugin and one after it was disabled:
>>>>
>>>> [lots of xstream methods snipped]
>>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>>> hudson.model.Run.reload(Run.java:323)
>>>> hudson.model.Run.<init>(Run.java:312)
>>>> hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>>> ...
>>>> [JVM methods snipped]
>>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>>> hudson.model.Run.isKeepLog(Run.java:572)
>>>> ...
>>>>
>>>> It seems something in "core" Jenkins has changed and not for the
>>>> better. Anyone seeing these issues?
>>>>
>>>> -tim
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Jenkins Users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to jenkinsci-use...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.