after removing/adding the STAX service (and taking the opportunity to switch to a newer JVM), the CPU load has dropped.
before the service restart, strace did show STAFProc as being awfully busy compared to what I see now. now I see it stopping every second or so, whereas before it spewed at a rate of 5-20 lines a second. here is a sampling of output, not that I expect it will be particularly revealing:

http://pastebin.com/m769fe708

thanks again for your help in digging into this matter; perhaps the newer JVM will have fixed the issue, though I suspect it's our tendency towards abusive stax jobs, and we'll see this crop up again.

nathan

----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
> Right. STAF does not currently ever "reap" threads. If at one time 270
> threads were created, then that's how many threads will remain
> created. We have considered adding the ability to reap threads, but
> that has not been implemented.
>
> However, this should not affect CPU usage. But it does affect memory
> usage, as when a thread is created, a default thread stack of a size
> dependent on the operating system is created. On Linux machines, STAF
> overrides the default thread stack size to 4M if it's > 4M (as it can
> be 8M or 10M) in STAF V3.2.3 or later. Note that you can further
> override it by setting the STAF_THREAD_STACK_SIZE environment variable
> before starting STAFProc. For example, setting
> STAF_THREAD_STACK_SIZE=4194304 specifies 4M as the value specified is
> the thread stack size in kilobytes.
>
> --------------------------------------------------------------
> Sharon Lucas
> IBM Austin, luc...@us.ibm.com
> (512) 286-7313 or Tieline 363-7313
>
>
> Nathan Parrish <nparr...@clustrix.com>
> 11/03/2009 03:29 PM
>
> To Sharon Lucas/Austin/i...@ibmus
> cc staf-users@lists.sourceforge.net
> Subject Re: [staf-users] high CPU utilization by STAFProc and STAX JVM
>
>
> right, I understand this. my question is how 'thread growth' works.
> presumably the delta controls some behavior regarding how many new
> threads we spin off when the system detects that we are running
> low/out of threads.
> also, are threads ever reaped? or if my system was incredibly busy at
> some time in the past, such that 270 threads were really all going at
> once, will I have all those threads sitting around even if they are no
> longer needed? is it possible that the overhead of this many idle
> threads could be causing my CPU churn?
>
> thanks,
> nathan
>
> ----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
>
> > No, the STAF MISC service simply shows the initial threads and the
> > thread growth delta. It doesn't show the number of threads that STAF
> > has created.
> >
> > --------------------------------------------------------------
> > Sharon Lucas
> > IBM Austin, luc...@us.ibm.com
> > (512) 286-7313 or Tieline 363-7313
> >
> >
> > Nathan Parrish <nparr...@clustrix.com>
> > 11/03/2009 02:43 PM
> >
> > To Sharon Lucas/Austin/i...@ibmus
> > cc staf-users@lists.sourceforge.net
> > Subject Re: [staf-users] high CPU utilization by STAFProc and STAX JVM
> >
> >
> > Our longevity test has finally finished, so I'm able to poke around a
> > bit deeper, and can look at doing things like restarting the STAX
> > service, or STAFProc itself later this afternoon.
> >
> > I enabled tracing, and I see traffic on the order of maybe 5-20
> > queries a second, mostly the VAR service (90+% gets), and some
> > semaphore requests as well. I'll get breaks in traffic as long as 3
> > seconds...
> >
> > short of doing gdb, I tried strace -fp on the process, and noticed
> > that it was quite busy, and was also dealing with a very large number
> > of child processes. I believe linux does some weird stuff mixing what
> > constitutes a process vs.
> > a thread; I don't see all these processes with ps, but pstree -p does
> > show them, something like ~270 (not counting the java procs/threads
> > which are also underneath the main STAFProc PID). looking on another
> > machine (which is not running STAX, or servicing tons of variable
> > requests, etc.), it has 82. my desktop, which does run STAX for
> > testing purposes, looks to have 90.
> >
> > from misc settings:
> > Initial Threads : 10
> > Thread Growth Delta : 1
> >
> > does this suggest that some threads have gotten spun off into
> > never-never land and other threads have been created as a result?
> >
> >
> > ----- "Sharon Lucas" <luc...@us.ibm.com> wrote:
> > > Unfortunately, I didn't really get many clues as to why STAFProc is
> > > using up so much CPU from the information you provided. Does the CPU
> > > usage for STAFProc constantly stay very high (e.g. above 100%)?
> > >
> > > What are your STAX jobs doing? Can you give a description of some of
> > > the STAF service requests that they are submitting?
> > >
> > > Since STAFProc was started 21 days ago on your STAX service machine,
> > > 1,832,518,328 STAF service requests have been submitted to this
> > > machine. That's a lot of STAF service requests. Do you have any
> > > "rogue" STAX jobs that are constantly submitting STAF service
> > > requests in a loop (without a good reason)? What STAF service
> > > requests are being submitted the most?
> > >
> > > Yes, you can enable STAF tracing to see if that gives any clues
> > > about what's driving the CPU load (though note that enabling STAF
> > > tracing may slow things down a little). To see what STAF service
> > > requests are being submitted and when each STAF service request
> > > completes, you could enable trace points ServiceRequest,
> > > ServiceComplete, and RemoteRequests.
> > > Note that this will generate tons of trace output since lots of
> > > STAF service requests are being submitted to this machine, so
> > > you'll also want to redirect STAFProc's trace output to a file in a
> > > location where there is lots of available disk space and monitor
> > > the size of this file. You may also want to enable the Warning
> > > tracepoint so that any warning messages are also logged. For
> > > example:
> > >
> > > STAF staxMachine TRACE SET DESTINATION TO FILE /usr/local/staf/STAFProc.trc
> > > STAF staxMachine TRACE ENABLE TRACEPOINTS "ServiceRequest ServiceComplete RemoteRequests Warning"
> > >
> > > See section "8.18 Trace Service" in the STAF User's Guide for more
> > > information.
> > >
> > > I don't know if the trace output will help us in determining why
> > > STAFProc is using up so much CPU. Maybe there is some thread in
> > > STAFProc that is in a bad state and constantly looping for some
> > > unknown reason (like the STAFProcessMonitor thread that monitors for
> > > processes to complete). The only way I know of to check this would
> > > be if STAFProc was started via a debugger like gdb. Then, once it
> > > got in this "bad" state of high CPU usage, you could break in via
> > > gdb and list threads (info threads) and change to each thread
> > > (thread n) and check the backtraces (bt) for each thread to see
> > > what they are doing. For example:
> > >
> > > Use gdb to debug STAF locally as follows:
> > > 1. gdb STAFProc
> > > 2. run
> > > 3. Recreate the problem.
> > >
> > > Various commands that you might need while using gdb are:
> > >
> > > • help
> > > • help tracepoints
> > > • help stack
> > > • info threads
> > > • thread n
> > > • bt
> > >
> > > Of course, this will require that STAF be shut down and then
> > > restarted using gdb so you may not be able to do that now while
> > > your long runs are still running.
> > >
> > > There's no reason that I know of not to use a 1.6.0 JVM with the
> > > STAX service so you can try that instead of upgrading to a more
> > > recent 1.5.0 JVM.
> > >
> > > You may want to increase the STAX service's MaxFileCacheSize from 20
> > > to something like 50. This won't "fix" the problem of STAFProc using
> > > 100%+ CPU, but whenever a STAX job is executed, it first needs to be
> > > XML parsed and this is a very CPU-intensive process (so if the CPU
> > > usage is already high, it will take longer for STAX to parse a STAX
> > > job before execution of the STAX job begins). So, if you are running
> > > the same STAX job file more than once, the first time STAX needs to
> > > parse it, but then it will cache it so that if the exact same STAX
> > > job is submitted to be executed again, it doesn't have to be
> > > re-parsed (if it's still in the STAX file cache). So, that's why I
> > > recommended increasing the STAX service's MaxFileCacheSize. It can
> > > be increased dynamically as follows:
> > >
> > > STAF staxMachine STAX SET MAXFILECACHESIZE 50
> > >
> > > Note that this setting only applies to this instance of the STAX
> > > service. If you shut down and restarted STAFProc, it would no longer
> > > apply. You would want to add this setting when registering the STAX
> > > service in your STAF.cfg file to make it "permanent".
> > >
> > > You may also want to increase the maximum heap size for the STAX JVM
> > > from 384m to something like 512m by specifying OPTION J2=-Xmx512m
> > > (instead of OPTION J2=-Xmx384m).
> > >
> > > SERVICE STAX LIBRARY JSTAF EXECUTE /usr/local/staf/services/stax/STAX.jar \
> > >   OPTION JVM=/usr/java/jdk1.6.0_16/bin/java OPTION JVMName=STAX \
> > >   OPTION J2=-Xmx384m PARMS "MAXFILECACHESIZE 50"
> > >
> > > --------------------------------------------------------------
> > > Sharon Lucas
> > > IBM Austin, luc...@us.ibm.com
> > > (512) 286-7313 or Tieline 363-7313

_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users
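
A rough sketch for anyone revisiting this thread: on Linux each thread of a process appears as an entry under /proc/<pid>/task, so the thread count read off pstree -p can be checked directly, and the stack reservation discussed above can be estimated by multiplying that count by the per-thread stack size. The helper names below are made up for illustration and are not part of STAF; the 4 MiB figure is just the Linux cap mentioned in the thread, not something measured. Note also that 4194304 is 4 x 1024 x 1024, i.e. 4M expressed in bytes (4M in kilobytes would be 4096), so it is probably worth confirming the unit in the STAF User's Guide before setting STAF_THREAD_STACK_SIZE.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not STAF commands.

# Count the threads of a process by listing /proc/<pid>/task (Linux only).
# Prints 0 if the process does not exist.
count_threads() {
    ls "/proc/$1/task" 2>/dev/null | wc -l | tr -d ' '
}

# Upper bound on stack address space reserved by N threads, in MiB.
# $1 = thread count, $2 = per-thread stack size in MiB.
# This is reserved virtual address space, not resident memory.
stack_reserve_mib() {
    echo $(( $1 * $2 ))
}

# Convert a byte count to KiB, to sanity-check a stack size value.
bytes_to_kib() {
    echo $(( $1 / 1024 ))
}

# Example (assumes exactly one STAFProc process is running):
#   pid=$(pgrep -x STAFProc)
#   n=$(count_threads "$pid")
#   echo "$n threads, up to $(stack_reserve_mib "$n" 4) MiB of stack reserved"
```

With the numbers from this thread, 270 threads at 4 MiB each comes to at most 1080 MiB of reserved stack address space, which fits the point that idle threads cost memory rather than CPU.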