On 11/12/2013 10:57 PM, Chris Picton wrote:
Hi all
I have a set of servers running Asterisk and some Java apps which have
(so far) unexplained spikes in load average.
A typical spike, which occurs at "random" times, sees the 1-minute load
average go from around 4 to upwards of 50, sometimes approaching 200,
within one second.
From the proc manpage, the 1-minute load average is the "number of jobs
in the run queue (state R) or waiting for disk I/O (state D) averaged
over 1 minute".
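For reference, the fourth field of /proc/loadavg is an instantaneous
"runnable/total" count of kernel scheduling entities (processes and
threads); note that it counts only runnable (R) tasks, not those in D
state. A rough, untested sketch of a one-second sampler:

    #!/usr/bin/env python
    # Sketch: sample /proc/loadavg once a second.  The fourth field is
    # "runnable/total" kernel scheduling entities (processes + threads).
    import time

    while True:
        with open('/proc/loadavg') as f:
            one, five, fifteen, run_total, last_pid = f.read().split()
        runnable, total = run_total.split('/')
        print('1min=%s runnable=%s total=%s' % (one, runnable, total))
        time.sleep(1)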
I am collecting many different stats from /proc every second, but
nothing I have found correlates with the spike in load average. The
process counts from /proc/stat and /proc/loadavg do not show a
corresponding sudden spike. I have looked at memory paging, IRQs, number
of threads, CPU states (intr/iowait/etc.), network traffic, disk I/O,
and so on, but no metric I have found so far changes behaviour at the
same time as the load average spikes.
As I am writing this, I have realized that I am not actually tracking
the numbers which directly feed the load average: looping through all
processes, extracting the process state from /proc/<pid>/stat, and
summing the counts per state. This would (hopefully) confirm that the
load average numbers are "correct", and may indicate a cause (many
processes waiting for I/O, or lots of Asterisk or Java tasks being
scheduled to run at the same time).
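Something along the lines of the following (untested) sketch should do
it. It walks /proc/<pid>/task/<tid>/stat rather than /proc/<pid>/stat,
since the load average counts individual tasks (threads), which matters
for Java:

    #!/usr/bin/env python
    # Rough sketch: count tasks per state by scanning /proc.
    import glob
    from collections import Counter

    def task_states():
        counts = Counter()
        for path in glob.glob('/proc/[0-9]*/task/[0-9]*/stat'):
            try:
                with open(path) as f:
                    data = f.read()
            except IOError:  # task exited between glob and open
                continue
            # The state field follows the comm field, which is wrapped
            # in parentheses and may itself contain spaces, so split on
            # the last ')'.
            state = data.rsplit(')', 1)[1].split()[0]
            counts[state] += 1
        return counts

    if __name__ == '__main__':
        counts = task_states()
        print(counts)
        print('R + D: %d' % (counts.get('R', 0) + counts.get('D', 0)))

Logging that output every second next to the load average samples should
show whether a burst of R or D tasks lines up with the spike.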
While I do that, does anyone have any other ideas on how to troubleshoot
the cause of very high load spikes?
Take a look at this (presented at LISA):
http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
Once you've read the main blog post, search for "java" in the comments.
Combine the two and you may get some seriously cool insights into your
problem.