I'm doing a heap-dump analysis now and I think I might know what the issue 
was.  The start of this whole problem was the disk-usage plugin hanging our 
attempts to view a job in Jenkins (see 
https://issues.jenkins-ci.org/browse/JENKINS-20876) so we disabled that 
plugin.  After disabling, Jenkins complained about data in an 
older/unreadable format:

You have data stored in an older format and/or unreadable data.

If I click the "Manage" button to delete it, it takes a _long_ time for it 
to display all the disk-usage plugin data - there must be thousands of 
rows, but it does display it all eventually.  The error shown in each row 
is:

CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction

If I click "Discard Unreadable Data" at the bottom of the page, I quickly 
get a stack trace:

javax.servlet.ServletException: java.util.ConcurrentModificationException
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:735)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:239)
at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:685)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:587)
at org.kohsuke.stapler.Stapler.service(Stapler.java:218)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:45)
at winstone.ServletConfiguration.execute(ServletConfiguration.java:248)
at winstone.RequestDispatcher.forward(RequestDispatcher.java:333)
at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:376)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:203)
at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:181)
at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:86)

and it fails to discard the data.  Older data isn't usually a problem, so I 
brushed off this error.  However, here is the dominator tree from the heap dump:

Class Name                                                                                                               | Shallow Heap | Retained Heap | Percentage
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
hudson.diagnosis.OldDataMonitor @ 0x6f9f2c4a0                                                                            |           24 | 3,278,466,984 |     88.69%
com.thoughtworks.xstream.converters.SingleValueConverterWrapper @ 0x6f9da8780                                            |           16 |    13,825,616 |      0.37%
hudson.model.Hudson @ 0x6f9b8b8e8                                                                                        |          272 |     3,572,400 |      0.10%
org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6f9a73598                                                                 |           88 |     2,308,760 |      0.06%
org.apache.commons.jexl.util.introspection.Introspector @ 0x6fbb74710                                                    |           32 |     1,842,392 |      0.05%
org.kohsuke.stapler.WebApp @ 0x6f9c0ff10                                                                                 |           64 |     1,127,480 |      0.03%
java.lang.Thread @ 0x7d5c2d138  Handling GET /view/Alle/job/common-translation-main/ : RequestHandlerThread[#105] Thread |          112 |       971,336 |      0.03%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

What is hudson.diagnosis.OldDataMonitor?  Could the disk-usage plugin data 
be the cause of all my recent OOM errors?  If so, how do I get rid of it?
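
To gauge how much of this data is actually on disk, a rough sketch like the 
one below could count the build records that still mention the plugin's class. 
It assumes build records are stored as build.xml files somewhere under the 
jobs directory of the Jenkins home, and that the class name 
BuildDiskUsageAction (taken from the error above) appears in them as plain 
text. The class name StaleActionScan and the default path "jobs" are made up 
for illustration. It needs a Java 8 or newer JDK, so it would have to run 
outside the 1.6 JVM that Jenkins uses here. It only reads files and does not 
delete or modify anything.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jenkins: read-only scan of build records.
class StaleActionScan {

    // Counts files named build.xml under root whose text contains marker.
    static long countMatches(Path root, String marker) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(p -> p.getFileName() != null
                        && p.getFileName().toString().equals("build.xml"))
                .filter(p -> contains(p, marker))
                .count();
        }
    }

    // Returns true if the file can be read and its text contains marker.
    static boolean contains(Path file, String marker) {
        try {
            byte[] bytes = Files.readAllBytes(file);
            return new String(bytes, StandardCharsets.UTF_8).contains(marker);
        } catch (IOException e) {
            return false; // unreadable file: skip it rather than abort the scan
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the jobs directory to scan.
        Path root = Paths.get(args.length > 0 ? args[0] : "jobs");
        long hits = countMatches(root, "BuildDiskUsageAction");
        System.out.println(hits + " build.xml files mention BuildDiskUsageAction");
    }
}
```

Run it as "java StaleActionScan <path to jobs directory>". A count in the 
thousands would be consistent with the number of rows on the "Manage" page, 
but it would not by itself prove that OldDataMonitor is the cause of the OOM 
errors.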

-tim


On Monday, December 9, 2013 9:41:25 AM UTC-5, Tim Drury wrote:
>
> I intended to install 1.532 on Friday, but mistakenly installed 1.539.  It 
> gave us the same OOM exceptions.  I'm installing 1.532 now and will - 
> hopefully - know tomorrow whether it's stable or not.  I'm not exactly sure 
> what's going to happen with our plugins though.  Hopefully Jenkins will 
> tell me if they must be downgraded too.
>
> -tim
>
> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>>
>> How does the current LTS (1.532.1) hold up?
>>
>>
>> On 6 December 2013 13:33, Tim Drury <tdr...@gmail.com> wrote:
>>
>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're getting 
>>> a lot of OOM errors. (Info: Windows Server 2008 R2; the Jenkins JVM is 
>>> jdk-x64-1.6.0_26.)
>>>
>>> At first I did the simplest thing and increased the heap from 3G to 4.2G 
>>> (and bumped up permgen).  This didn't help, so I started looking at threads 
>>> via the Jenkins monitoring tool.  It indicated the disk-usage plugin was 
>>> hung: when we tried to view the page for a particularly large job, the page 
>>> would "hang" and the stack trace showed the disk-usage plugin was to blame 
>>> (or so I thought).  Jira report with thread dump here: 
>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>
>>> We disabled the disk-usage plugin and restarted, and now we can visit 
>>> that job page.  However, we still get OOM errors and lots of GCs in the 
>>> logs at least once a day.  The stack trace looks frighteningly similar to 
>>> the one from the disk-usage plugin.  Here is an edited stack trace showing 
>>> the methods common to the two OOM incidents: one while the disk-usage 
>>> plugin was enabled and one after it was disabled:
>>>
>>> [lots of xstream methods snipped]
>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>> hudson.model.Run.reload(Run.java:323)
>>> hudson.model.Run.<init>(Run.java:312)
>>> hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>> ... [JVM methods snipped]
>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>> hudson.model.Run.isKeepLog(Run.java:572)
>>> ...
>>>
>>> It seems something in "core" Jenkins has changed, and not for the better. 
>>> Is anyone else seeing these issues?
>>>
>>> -tim
>>>  
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Jenkins Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to jenkinsci-use...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>

