Bodo,
What are the contents of the STAX Job Log and the STAX Job User Log when
this job hangs?
You said the job hangs once out of every 20 or 30 times it is run. When
the job hangs, did the job take a different path? For example:
- Did a test fail?
- Were different tests run?
- Did a different number of tests run?
- Was a block or the job held or terminated?
The STAX Job Log and the STAX Job User Log might provide more information
on what was different when the job hangs. If you could get a solid
recreation scenario, it would be easier to debug...
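
To collect both logs for a hung job, the STAF LOG service can be queried
from the command line. The sketch below only builds the command lines. It
assumes that the logs are named STAX_Job_<JobID> and STAX_Job_<JobID>_User
and are stored under the machine nickname of the STAX service machine;
please check these names against the STAX User's Guide for your version.
The function names and the example machine name are illustrative only.

```python
import subprocess


def stax_job_log_queries(job_id, machine, endpoint="local"):
    """Build STAF command lines that query the two logs of one STAX job.

    Assumption: the log names follow the pattern STAX_Job_<JobID> and
    STAX_Job_<JobID>_User; verify this for your STAX version.
    """
    lognames = {
        "job": f"STAX_Job_{job_id}",
        "user": f"STAX_Job_{job_id}_User",
    }
    return {
        label: ["staf", endpoint, "LOG", "QUERY",
                "MACHINE", machine, "LOGNAME", logname]
        for label, logname in lognames.items()
    }


def run_query(command):
    """Run one query and return its output (needs STAF on the PATH)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout
```

For example, stax_job_log_queries(7, "staxhost")["user"] gives the command
for the Job User Log of job 7. Saving the output for one good run and one
hung run makes it easier to see where the two runs diverge.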
On a side note, try adding the NUMTHREADS parameter when registering the
STAX service. NUMTHREADS specifies the number of physical threads that
the STAX service will use. The default is 5. Try specifying 10 and see
if that makes a difference in whether the job hangs. For example:
SERVICE STAX LIBRARY JSTAF \
        EXECUTE {STAF/Config/STAFRoot}/services/stax/STAX.jar \
        OPTION J2=-Xmx1024m \
        PARMS "PYTHONOUTPUT JobUserLogAndMsg NUMTHREADS 10"
--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313
From: Strösser, Bodo <bodo.stroes...@ts.fujitsu.com>
Sent: 08/03/2009 01:16 PM
To: Sharon Lucas/Austin/i...@ibmus
Cc: "'staf-users@lists.sourceforge.net'" <staf-users@lists.sourceforge.net>
Subject: RE: [staf-users] STAX Job hangs
Some info that I had forgotten:
The program EMACHRemoteHelper conditionally writes stdout and stderr
to STAXMon, depending on a call parameter of the script and/or test tool
definition. It uses the C-interface "submit2" for this.
Bodo
From: Strösser, Bodo
Sent: Monday, August 03, 2009 8:08 PM
To: 'Sharon Lucas'
Cc: 'staf-users@lists.sourceforge.net'
Subject: RE: [staf-users] STAX Job hangs
Sharon,
the job does not hang every time, but only about once in 20 or 30 runs.
STAF.cfg is:
# Turn on tracing of internal errors and deprecated options
trace enable tracepoints "error deprecated"
# Enable TCP/IP connections
interface ssl library STAFTCP option Secure=Yes option Port=6550
interface tcp library STAFTCP option Secure=No option Port=6500
# Set default local trust
trust machine local://local level 5
# Default Service Loader Service
serviceloader library STAFDSLS
# EVENT and STAX services
service STAX library JSTAF EXECUTE {STAF/Config/STAFRoot}/services/stax/STAX.jar option J2=-Xmx1024m PARMS "PYTHONOUTPUT JobUserLogAndMsg"
service EVENT library JSTAF EXECUTE {STAF/Config/STAFRoot}/services/stax/STAFEvent.jar
set MAXQUEUESIZE 10000
The name of the STAX script isn't report.xml, but EMACH-start.xml. I have
attached it.
A short explanation: we will be using STAX to execute test runs defined
by a test management system. Data from test management comes in an XML
file (here named report.xml), but the STAX script does not parse it
directly. Instead, it uses methods from a Java library from the test
management vendor (iTEP-Library). It does parse other XML files that
describe the configuration or the test tools, depending on the tests
requested by test management. This is done in Python and partly in Java,
and it creates Python data that contains all the information about the
test tools to run. I have never seen a job hang in this early phase of
the script.
The second part of a run executes the test tool calls one after the
other, checks the exit code of each test tool, and sends the result via
iTEP-Lib to a result XML file (to be imported by the test management
system). While doing this, the script sometimes hangs, consuming as much
CPU as it can get.
As the users of my script expect specific tests to be executed, I'm
trying to suppress information about the STAF calls used by my script.
On the other hand, STAXMon is perfectly suited to visualizing the test
run, with blocks and processes named after the test tools. So I wrote a
Java helper to have an "invisible" version of <stafcmd>.
Currently, our test tools are started using ssh (or rexec in rare cases).
The STAF process service might be used for that in future (I have to
convince the users). So I wrote a program called "EMACHRemoteHelper"
that starts a remote process, returns its exit code, and can kill the
running remote test tool using the signal suitable for that test tool.
On block termination, the helper itself is killed by SIGINT.
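
To make the helper's behaviour easier to picture, here is a rough Python
sketch of the general pattern: start a child, return its exit code, and
react to SIGINT. It is not the code of EMACHRemoteHelper; the names
ssh_argv and run_tool are invented for illustration. It also signals only
the local child process, whereas a real helper would have to deliver the
tool-specific signal to the process on the remote machine.

```python
import signal
import subprocess
import sys


def ssh_argv(host, command):
    """Command line that runs `command` on `host` through ssh."""
    return ["ssh", host, command]


def run_tool(argv, kill_signal=signal.SIGTERM):
    """Start a child process, wait for it, and return its exit code.

    If this helper receives SIGINT while waiting, Python raises
    KeyboardInterrupt; the child is then sent kill_signal and reaped.
    Note: this signals the local child only (e.g. the ssh client).
    """
    child = subprocess.Popen(argv)
    try:
        return child.wait()
    except KeyboardInterrupt:
        child.send_signal(kill_signal)
        return child.wait()


def demo_exit_code():
    """Local stand-in for a test tool that exits with code 3."""
    return run_tool([sys.executable, "-c", "import sys; sys.exit(3)"])
```

With ssh, run_tool(ssh_argv("testhost", "mytool --run")) would return the
exit status that ssh reports for the remote command; "testhost" and
"mytool --run" are placeholders.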
HTH
Bodo