Ok, then that shows that the high CPU usage was related to STAX.
The strace output you provided shows lots of "GET WAIT" requests submitted
to the QUEUE service on the local STAX machine. Each STAX job has a
STAFQueueMonitor class that in a loop continuously checks for messages on
the STAX job handle's queue (e.g. STAF command completion messages and
process completion messages) by submitting a QUEUE GET WAIT request. If
there are no messages, the QUEUE GET WAIT request waits indefinitely for a
message to appear on its queue (and this doesn't use up CPU). But, in
your case, one or more STAX jobs were constantly getting messages on its
queue and processing them.
In STAF V3.3.3, we added support for getting multiple messages off a
handle's queue by adding the ALL and FIRST options to the QUEUE service's
GET/PEEK requests as getting multiple messages off a queue can improve
performance (over getting a single message at a time off a queue multiple
times). So, I'll look into changing STAX to do a QUEUE GET ALL WAIT
request and when I get this done, I could send an updated STAX.jar file to
you and perhaps you could try it out to see if it improves performance and
decreases CPU utilization.
But, if you have some "abusive" STAX jobs that do CPU intensive things
like in a loop then those STAX jobs should be fixed. For example, if in a
STAX job someone uses a <loop> element to constantly submit a STAF service
request (like a PING request) via a <stafcmd>, there should also be a
delay in the loop between STAF service requests so that it isn't using up
all the CPU. Anyone can write a STAX job that eats up all the CPU, just
like anyone can write a program in any language that eats up all the CPU.
--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313
Nathan Parrish <nparr...@clustrix.com>
11/03/2009 07:39 PM
To
Sharon Lucas/Austin/i...@ibmus
cc
staf-users@lists.sourceforge.net
Subject
Re: [staf-users] high CPU utilization by STAFProc and STAX JVM
after removing/adding the STAX service (and taking the opportunity to
switch to a newer JVM), the CPU load has dropped.
before the service restart, strace did show STAFProc as being awfully
busy, compared to what I see now. now I see it stopping every second or
so, whereas before it spewed at a rate of 5-20 lines a second. here is a
sampling of output, not that I expect it will be particularly revealing:
http://pastebin.com/m769fe708
thanks again for your help in digging into this matter; perhaps the newer
JVM will have fixed the issue, though I suspect it's our tendency towards
abusive stax jobs, and we'll see this crop up again.
nathan
----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
> Right. STAF does not currently ever "reap" threads. If at one time 270
> threads were created, then that's how many threads will remain
> created. We have considered adding the ability to reap threads, but
> that has not been implemented.
>
> However, this should not effect CPU usage. But it does effect memory
> usage, as when a thread is created, a default thread stack of a size
> dependent on the operating system is created. On Linux machines, STAF
> overrides the default thread stack size to $M if its > 4M (as it can
> be 8M or 10M) in STAF V3.2.3 or later. Note that you can further
> override it by setting the STAF_THREAD_STACK_SIZE environment variable
> before starting STAFProc. For example, setting
> STAF_THREAD_STACK_SIZE=4194304 specifies 4M as the value specified is
> the thread stack size in kilobytes.
>
> --------------------------------------------------------------
> Sharon Lucas
> IBM Austin, luc...@us.ibm.com
> (512) 286-7313 or Tieline 363-7313
>
>
>
> Nathan Parrish <nparr...@clustrix.com>
>
> 11/03/2009 03:29 PM
> To Sharon Lucas/Austin/i...@ibmus
>
> cc staf-users@lists.sourceforge.net
>
> Subject Re: [staf-users] high CPU utilization by STAFProc
and STAX
> JVM
>
>
>
>
> right, I understand this. my question is how 'thread growth' works.
> presumably the delta controls some behavior regarding how many new
> threads we spin off when the system detects that we are running
> low/out of threads.
> also, are threads ever reaped? or if my system was incredibly busy at
> some time in the past, such that 270 threads were really all going at
> once, will I have all those threads sitting around even if they are no
> longer unneeded? is it possible that the overhead of this many idle
> threads could be causing my CPU churn?
>
> thanks,
> nathan
>
> ----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
>
> > No, the STAF MISC service simply shows the initial threads and the
> > thread growth delta. It doesn't show the number of threads that STAF
> > has created.
> >
> > --------------------------------------------------------------
> > Sharon Lucas
> > IBM Austin, luc...@us.ibm.com
> > (512) 286-7313 or Tieline 363-7313
> >
> >
> >
> > Nathan Parrish <nparr...@clustrix.com>
> >
> > 11/03/2009 02:43 PM
> > To Sharon Lucas/Austin/i...@ibmus
> >
> > cc staf-users@lists.sourceforge.net
> >
> > Subject Re: [staf-users] high CPU utilization by STAFProc and STAX
> > JVM
> >
> >
> >
> >
> > Our longevity test has finally finished, so I'm able to poke around
> a
> > bit deeper, and can look at doing things like restarting the STAX
> > service, or STAFProc itself later this afternoon.
> >
> > I enabled tracing, and I see traffic on the order of maybe 5-20
> > queries a second, mostly the VAR service (90+% gets), and some
> > semaphore requests as well. I'll get breaks in traffic as long as 3
> > seconds...
> >
> > short of doing gdb, I tried strace -fp on the process, and noticed
> > that it was quite busy, and was also dealing with a very large
> number
> > of child processes. I believe linux does some weird stuff mixing
> what
> > constitutes a process vs. a thread; I don't see all these process
> with
> > ps, but pstree -p does show them, something like ~270 (not counting
> > the java procs/threads which are also underneath the main STAFProc
> > PID). looking on another machine (which is not running STAX, or
> > servicing tons of variable requests, etc.), it has 82. my desktop,
> > which does run STAX for testing purposes, looks to have 90.
> >
> > from misc settings:
> > Initial Threads : 10
> > Thread Growth Delta : 1
> >
> > does this suggest that some threads have gotten spun off into
> > never-never land and other threads have been created as a result?
> >
> >
> >
> > ----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
> > > Unfortunately, I didn't really get many clues as to why STAFProc
> is
> > > using up so much CPU from the information you provided. Does the
> CPU
> > > usage for STAFProc constantly stay very high (e.g. above 100%)?
> > >
> > > What are your STAX jobs doing? Can you give a description of some
> of
> > > the STAF service requests that they are submitting.
> > >
> > > Since STAFProc was started 21 days ago on your STAX service
> machine,
> > > 1,832,518,328 STAF service requests have been submitted to this
> > > machine. That's a lot of STAF service requests. Do you have any
> > > "rogue" STAX jobs that are constantly submitting STAF service
> > requests
> > > in a loop (without a good reason)? What STAF service requests are
> > > being submitted the most?
> > >
> > > Yes, you can enable STAF tracing to see if that gives any clues
> > about
> > > what's driving the CPU load (though note that enabling STAF
> tracing
> > > may slow things down a little). To see what STAF service requests
> > are
> > > being submitted and when each STAF service request completes, you
> > > could enable trace points ServiceRequest, ServiceComplete, and
> > > RemoteRequests. Note that this will generate tons of trace output
> > > since lots of STAF service requests are being submitted to this
> > > machine, so you'll also want to redirect STAFProc's trace output
> to
> > a
> > > file in a location where there is lots of available disk space and
> > > monitor the size of this file. You may also want to enable the
> > Warning
> > > tracepoint so that any warning messages are also logged.For
> example:
> > >
> > > STAF staxMachine TRACE SET DESTINATION TO FILE
> > > /usr/local/staf/STAFProc.trc
> > > STAF staxMachine TRACE ENABLE TRACEPOINTS "ServiceRequest
> > > ServiceComplete RemoteRequests Warning"
> > >
> > > See section "8.18 Trace Service" in the STAF User's Guide for more
> > > information.
> > >
> > > I don't know if the trace output will help us in determining why
> > > STAFProc is using up so much CPU. Maybe there is some thread in
> > > STAFProc that is in a bad state and constantly looping for some
> > > unknown reason (like the STAFProcessMonitor thread that monitors
> for
> > > processes to complete). The only way I know of to check this would
> > be
> > > if STAFProc was started via a debugger like gdb. Then, once it got
> > in
> > > this "bad" state of high CPU usage, you could break in via gdb and
> > > list threads (info threads) and change to each thread (thread n)
> and
> > > check the backtraces (bt) for each thread to see what they are
> > doing.
> > > For example:
> > >
> > > Use gdb to debug STAF locally as follows:
> > > 1. gdb STAFProc
> > > 2. run
> > > 3. Recreate the problem.
> > >
> > > Various commands that you might need while using gdb are:
> > >
> > > ? help
> > > ? help tracepoints
> > > ? help stack
> > > ? info threads
> > > ? thread n
> > > ? bt
> > >
> > > Of course, this will require that STAF be shut down and then
> > restarted
> > > using gdb so you may not be able to do that now while your long
> runs
> > > are still running.
> > >
> > > There's no reason that I know of not to use a 1.6.0 JVM with the
> > STAX
> > > service so you can try that instead of upgrading to a more recent
> > > 1.5.0 JVM.
> > >
> > > You may want to increase the STAX service's MaxFileCacheSize from
> 20
> > > to something like 50. This won't "fix" the problem of STAFProc
> using
> > > 100%+ CPU, but whenever a STAX job is executed, it first needs to
> be
> > > XML parsed and this is a very CPU-intensive process (so if the CPU
> > > usage is already high, it will take longer for STAX to parse a
> STAX
> > > job before execution of the STAX job begins). So, if you are
> running
> > > the same STAX job file more than one, the first time STAX needs to
> > > parse it, but then it will cache it so that if the exact same STAX
> > job
> > > is submitted to be executed again, it doesn't have to be re-parsed
> > (if
> > > it's still in the STAX file cache). So, that's why I recommended
> > > increasing the STAX service's MaxFileCacheSize. It can be
> increased
> > > dynamically as follows:
> > >
> > > STAF staxMachine STAX SET MAXFILECACHESIZE 50
> > >
> > > Note that this setting only applies to this instance of the STAX
> > > service. If you shutdown and restarted STAFProc, it would no
> longer
> > > apply. You would want to add this setting when registering the
> STAX
> > > service in your STAF.cfg file to make it "permanent".
> > >
> > > You may also want to increase the maximum heap size for the STAX
> JVM
> > > from 384m to something like 512m by specifying OPTION J2=-Xmx512m
> > > (instead of OPTION J2=-Xmx384m).
> > >
> > > SERVICE STAX LIBRARY JSTAF EXECUTE
> > > /usr/local/staf/services/stax/STAX.jar \
> > > OPTION JVM=/usr/java/jdk1.6.0_16/bin/java OPTION JVMName=STAX \
> > > OPTION J2=-Xmx384m PARMS "MAXFILECACHESIZE 50"
> > >
> > > --------------------------------------------------------------
> > > Sharon Lucas
> > > IBM Austin, luc...@us.ibm.com
> > > (512) 286-7313 or Tieline 363-7313
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users