after removing/adding the STAX service (and taking the opportunity to switch to 
a newer JVM), the CPU load has dropped.

before the service restart, strace did show STAFProc as being awfully busy, 
compared to what I see now.  now I see it stopping every second or so, whereas 
before it spewed at a rate of 5-20 lines a second.  here is a sampling of 
output, not that I expect it will be particularly revealing: 
http://pastebin.com/m769fe708

thanks again for your help in digging into this matter; perhaps the newer JVM 
will have fixed the issue, though I suspect it's our tendency towards abusive 
stax jobs, and we'll see this crop up again.  

nathan

----- "Sharon Lucas" <luc...@us.ibm.com> wrote:

> Right. STAF does not currently ever "reap" threads. If at one time 270
> threads were created, then that's how many threads will remain
> created. We have considered adding the ability to reap threads, but
> that has not been implemented.
> 
> However, this should not effect CPU usage. But it does effect memory
> usage, as when a thread is created, a default thread stack of a size
> dependent on the operating system is created. On Linux machines, STAF
> overrides the default thread stack size to $M if its > 4M (as it can
> be 8M or 10M) in STAF V3.2.3 or later. Note that you can further
> override it by setting the STAF_THREAD_STACK_SIZE environment variable
> before starting STAFProc. For example, setting
> STAF_THREAD_STACK_SIZE=4194304 specifies 4M as the value specified is
> the thread stack size in kilobytes.
> 
> --------------------------------------------------------------
> Sharon Lucas
> IBM Austin, luc...@us.ibm.com
> (512) 286-7313 or Tieline 363-7313
> 
> 
> 
>       Nathan Parrish <nparr...@clustrix.com>
> 
> 11/03/2009 03:29 PM
> To    Sharon Lucas/Austin/i...@ibmus
> 
> cc    staf-users@lists.sourceforge.net
> 
> Subject       Re: [staf-users] high CPU utilization by STAFProc and STAX
> JVM
> 
> 
> 
> 
> right, I understand this. my question is how 'thread growth' works.
> presumably the delta controls some behavior regarding how many new
> threads we spin off when the system detects that we are running
> low/out of threads.
> also, are threads ever reaped? or if my system was incredibly busy at
> some time in the past, such that 270 threads were really all going at
> once, will I have all those threads sitting around even if they are no
> longer unneeded? is it possible that the overhead of this many idle
> threads could be causing my CPU churn?
> 
> thanks,
> nathan
> 
> ----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
> 
> > No, the STAF MISC service simply shows the initial threads and the
> > thread growth delta. It doesn't show the number of threads that STAF
> > has created.
> >
> > --------------------------------------------------------------
> > Sharon Lucas
> > IBM Austin, luc...@us.ibm.com
> > (512) 286-7313 or Tieline 363-7313
> >
> >
> >
> > Nathan Parrish <nparr...@clustrix.com>
> >
> > 11/03/2009 02:43 PM
> > To Sharon Lucas/Austin/i...@ibmus
> >
> > cc staf-users@lists.sourceforge.net
> >
> > Subject Re: [staf-users] high CPU utilization by STAFProc and STAX
> > JVM
> >
> >
> >
> >
> > Our longevity test has finally finished, so I'm able to poke around
> a
> > bit deeper, and can look at doing things like restarting the STAX
> > service, or STAFProc itself later this afternoon.
> >
> > I enabled tracing, and I see traffic on the order of maybe 5-20
> > queries a second, mostly the VAR service (90+% gets), and some
> > semaphore requests as well. I'll get breaks in traffic as long as 3
> > seconds...
> >
> > short of doing gdb, I tried strace -fp on the process, and noticed
> > that it was quite busy, and was also dealing with a very large
> number
> > of child processes. I believe linux does some weird stuff mixing
> what
> > constitutes a process vs. a thread; I don't see all these process
> with
> > ps, but pstree -p does show them, something like ~270 (not counting
> > the java procs/threads which are also underneath the main STAFProc
> > PID). looking on another machine (which is not running STAX, or
> > servicing tons of variable requests, etc.), it has 82. my desktop,
> > which does run STAX for testing purposes, looks to have 90.
> >
> > from misc settings:
> > Initial Threads : 10
> > Thread Growth Delta : 1
> >
> > does this suggest that some threads have gotten spun off into
> > never-never land and other threads have been created as a result?
> >
> >
> >
> > ----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
> > > Unfortunately, I didn't really get many clues as to why STAFProc
> is
> > > using up so much CPU from the information you provided. Does the
> CPU
> > > usage for STAFProc constantly stay very high (e.g. above 100%)?
> > >
> > > What are your STAX jobs doing? Can you give a description of some
> of
> > > the STAF service requests that they are submitting.
> > >
> > > Since STAFProc was started 21 days ago on your STAX service
> machine,
> > > 1,832,518,328 STAF service requests have been submitted to this
> > > machine. That's a lot of STAF service requests. Do you have any
> > > "rogue" STAX jobs that are constantly submitting STAF service
> > requests
> > > in a loop (without a good reason)? What STAF service requests are
> > > being submitted the most?
> > >
> > > Yes, you can enable STAF tracing to see if that gives any clues
> > about
> > > what's driving the CPU load (though note that enabling STAF
> tracing
> > > may slow things down a little). To see what STAF service requests
> > are
> > > being submitted and when each STAF service request completes, you
> > > could enable trace points ServiceRequest, ServiceComplete, and
> > > RemoteRequests. Note that this will generate tons of trace output
> > > since lots of STAF service requests are being submitted to this
> > > machine, so you'll also want to redirect STAFProc's trace output
> to
> > a
> > > file in a location where there is lots of available disk space and
> > > monitor the size of this file. You may also want to enable the
> > Warning
> > > tracepoint so that any warning messages are also logged.For
> example:
> > >
> > > STAF staxMachine TRACE SET DESTINATION TO FILE
> > > /usr/local/staf/STAFProc.trc
> > > STAF staxMachine TRACE ENABLE TRACEPOINTS "ServiceRequest
> > > ServiceComplete RemoteRequests Warning"
> > >
> > > See section "8.18 Trace Service" in the STAF User's Guide for more
> > > information.
> > >
> > > I don't know if the trace output will help us in determining why
> > > STAFProc is using up so much CPU. Maybe there is some thread in
> > > STAFProc that is in a bad state and constantly looping for some
> > > unknown reason (like the STAFProcessMonitor thread that monitors
> for
> > > processes to complete). The only way I know of to check this would
> > be
> > > if STAFProc was started via a debugger like gdb. Then, once it got
> > in
> > > this "bad" state of high CPU usage, you could break in via gdb and
> > > list threads (info threads) and change to each thread (thread n)
> and
> > > check the backtraces (bt) for each thread to see what they are
> > doing.
> > > For example:
> > >
> > > Use gdb to debug STAF locally as follows:
> > > 1. gdb STAFProc
> > > 2. run
> > > 3. Recreate the problem.
> > >
> > > Various commands that you might need while using gdb are:
> > >
> > > • help
> > > • help tracepoints
> > > • help stack
> > > • info threads
> > > • thread n
> > > • bt
> > >
> > > Of course, this will require that STAF be shut down and then
> > restarted
> > > using gdb so you may not be able to do that now while your long
> runs
> > > are still running.
> > >
> > > There's no reason that I know of not to use a 1.6.0 JVM with the
> > STAX
> > > service so you can try that instead of upgrading to a more recent
> > > 1.5.0 JVM.
> > >
> > > You may want to increase the STAX service's MaxFileCacheSize from
> 20
> > > to something like 50. This won't "fix" the problem of STAFProc
> using
> > > 100%+ CPU, but whenever a STAX job is executed, it first needs to
> be
> > > XML parsed and this is a very CPU-intensive process (so if the CPU
> > > usage is already high, it will take longer for STAX to parse a
> STAX
> > > job before execution of the STAX job begins). So, if you are
> running
> > > the same STAX job file more than one, the first time STAX needs to
> > > parse it, but then it will cache it so that if the exact same STAX
> > job
> > > is submitted to be executed again, it doesn't have to be re-parsed
> > (if
> > > it's still in the STAX file cache). So, that's why I recommended
> > > increasing the STAX service's MaxFileCacheSize. It can be
> increased
> > > dynamically as follows:
> > >
> > > STAF staxMachine STAX SET MAXFILECACHESIZE 50
> > >
> > > Note that this setting only applies to this instance of the STAX
> > > service. If you shutdown and restarted STAFProc, it would no
> longer
> > > apply. You would want to add this setting when registering the
> STAX
> > > service in your STAF.cfg file to make it "permanent".
> > >
> > > You may also want to increase the maximum heap size for the STAX
> JVM
> > > from 384m to something like 512m by specifying OPTION J2=-Xmx512m
> > > (instead of OPTION J2=-Xmx384m).
> > >
> > > SERVICE STAX LIBRARY JSTAF EXECUTE
> > > /usr/local/staf/services/stax/STAX.jar \
> > > OPTION JVM=/usr/java/jdk1.6.0_16/bin/java OPTION JVMName=STAX \
> > > OPTION J2=-Xmx384m PARMS "MAXFILECACHESIZE 50"
> > >
> > > --------------------------------------------------------------
> > > Sharon Lucas
> > > IBM Austin, luc...@us.ibm.com
> > > (512) 286-7313 or Tieline 363-7313

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users

Reply via email to