Bodo, please see my comments below in green.
--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313
Strösser, Bodo <bodo.stroes...@ts.fujitsu.com>
07/30/2009 12:26 PM
To
Sharon Lucas/Austin/i...@ibmus
cc
"staf-users@lists.sourceforge.net" <staf-users@lists.sourceforge.net>
Subject
RE: [staf-users] STAX Job hangs
Sharon,
thank you for your help. Please see my comments below.
Bodo
From: Sharon Lucas [mailto:luc...@us.ibm.com]
Sent: Thursday, July 30, 2009 6:39 PM
To: Strösser, Bodo
Cc: staf-users@lists.sourceforge.net
Subject: Re: [staf-users] STAX Job hangs
Bodo,
Were there any errors in the STAX JVM log?
What version of STAX and what version of STAF are you running?
You're doing several things in your <finally> element that we don't
recommend and that can prevent you from stopping your STAX job.
1) First, in the STAX User's Guide (in the documentation for the <finally>
element) it says:
"Note that if you want to have a guaranteed way to stop a finally task,
you should have the first element contained in your finally task be a
block or timer element. For example, if you submit a request to terminate
the job, it will not terminate the job until the finally task(s) complete.
But if you submit a request to terminate a block that is currently running
which is contained within a finally task, then the block will be
terminated (it will not wait until that finally task completes)."
Yes, I know that. But this <finally> is written to wait for process
termination. It must not be aborted by the user,
as this could cause the next test to be started while the previous one
is still running.
The part inside <if> will run only if the process started in <try> was
terminated by the user (block termination).
If the process ends normally, the next thing in the script is to reset
MyProcessHandle.
Thus, I intentionally did not put a <block> inside the <finally>.
Then you don't have a guaranteed way to stop a finally task, in which case
you can't complain that you can't stop your STAX job if you choose not to
do this.
So, you should change your <finally> element to contain a <block> as its
first task (and possibly a <timer> element if you know that this process
should be able to be freed within a specified duration). For example:
<finally>
  <block name="'FinallyCleanupBlock'">
    <if expr="MyProcessHandle != 0">
      ....
    </if>
  </block>
</finally>
or, if you know this process should always complete within 30 minutes (or
whatever value), you can also specify a timer element.
<finally>
  <block name="'FinallyCleanupBlock'">
    <timer duration="'30m'">
      <if expr="MyProcessHandle != 0">
        ....
      </if>
    </timer>
  </block>
</finally>
2) You should be using a <loop> element instead of a <script> element
because STAX cannot stop an infinite loop in Python code, but it can stop
a <loop> element. Also, your loop is using up a lot of CPU as it only
waits 0.1 seconds before it repeats itself. You should have a longer wait
interval such as 10 seconds or more depending on how long it takes your
process to complete. If it takes a long time, then your wait interval
should be longer. If you have a long wait interval, then you could use a
<stafcmd> to submit a local DELAY request to the DELAY service instead of
using the Python time.sleep() because STAX cannot stop a Python
time.sleep() if you wanted to terminate the finally block.
Yes, I could use a <loop>. But that would not make sense without the
<block>, right?
So I wrote the code in Python entirely, as it would be unkillable anyway.
A <loop> makes sense without a <block>.
3) What does EMACHTools.STAFsubmit2() do? Is it using a <stafcmd> to
submit a STAF service request? A STAX job should only use a <stafcmd>
element to submit a STAF service request.
Yes, this is a STAF service request written in Java. I wrote this routine
to get a better way of handling <testcase>. Using the XML element, it is
quite difficult to make sure that each testcase opened will have its result,
even in case of block terminations. So my script uses STAF requests to the
STAX service to start, update and stop a testcase. But I don't like to have
STAF service symbols flickering on the screen, so I used Java instead of
<stafcmd>.
Is there any reason that this might not work correctly?
Yes, there are reasons why this might not work correctly as I just tried
to tell you. If you don't use a <stafcmd>, then STAX is not aware that
you submitted a STAF service request. So STAX can't tell you what went
wrong (such as in this case). Why are you using a STAX job if you're not
using the elements it provides? Perhaps you simply should be running a
Java or Python program and use the STAF Python or STAF Java APIs to submit
your STAF service requests.
So, here's what I think your <finally> block should look like (based on
what little I know that you're trying to do). You should also see if your
other <finally> blocks need to be updated for the reasons I talked about
above.
<finally>
  <block name="'FinallyCleanupBlock'">
    <if expr="MyProcessHandle != 0">
      <sequence>
        <log message="1">
          " Interaction '%s': Signal '%s' sent due to User Abort" % \
          (Interaction['Name'], Interaction['AbortSignal'])
        </log>
        <log message="1">
          " Waiting for signalled interaction to exit"
        </log>
        <script>done = 0</script>
        <loop while="not done">
          <sequence>
            <stafcmd name="'Free process handle %s' % (MyProcessHandle)">
              <location>'local'</location>
              <service>'PROCESS'</service>
              <request>'FREE HANDLE %s' % (MyProcessHandle)</request>
            </stafcmd>
            <if expr="RC == STAFRC.Ok or RC == STAFRC.HandleDoesNotExist">
              <script>done = 1</script>
            <elseif expr="RC == STAFRC.ProcessNotComplete">
              <stafcmd name="'Delay for 10 seconds while waiting for process to end'">
                <location>'local'</location>
                <service>'DELAY'</service>
                <request>'DELAY 10s'</request>
              </stafcmd>
            </elseif>
            <else>
              <script>
                FatalMsg = 'PROCESS FREE HANDLE %s failed with RC=%s' % (MyProcessHandle, RC)
                done = 1
              </script>
            </else>
            </if>
          </sequence>
        </loop>
        <call function="'EMACH_CheckError'"/>
        <log message="1">
          " Signalled interaction is gone"
        </log>
      </sequence>
    </if>
  </block>
</finally>
I don't know why the <finally> element is the last element shown in the
call stack. If the STAX job was stuck in the infinite loop in the
<script> element, then I would have expected the <script> element to be
the last element shown in the call stack. Perhaps there's another problem
in your STAX job. But, you should first update your finally element(s) as
I recommended and see if that resolves the problem.
Yes, this is the question: why is the <finally> the last element on the
stack? If this is true, my job does NOT run in the Python loop.
Also, since I saw that the process had exited, and I even freed the handle
"manually", the loop should have stopped.
BTW: the hung thread is the only one in this STAX job, and the job is the
only one in the STAX service.
Note that the <finally> element shows up in the call stack before the
<try> element it's associated with. So, I'm confused as to why there is
no <try> element following the last <finally> element in your STAX job.
Can you provide your complete STAX job to me? The problem could be in the
<try> element associated with this <finally> element.
Is there any way to get a Java call stack? The job is still looping; I
didn't kill it because I do not know how quickly I can recreate the problem.
No. Perhaps you should be running a Java program instead (alone or via
a <process> element) and use the STAF Python or STAF Java APIs to submit
your STAF service requests.
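As a rough illustration of that suggestion, the wait-for-process logic could be sketched as a plain Python function outside of STAX. This is only a sketch under stated assumptions: `wait_until_freed` and its `submit` parameter are hypothetical names, not part of STAF or STAX, and `submit` is assumed to be any callable that takes (location, service, request) and returns the numeric STAF return code. The return codes 5 and 12 are the ones named in the original script; 0 is the STAF "Ok" code.

```python
import time

RC_OK = 0                      # STAF "Ok"
RC_HANDLE_DOES_NOT_EXIST = 5   # named in the original script
RC_PROCESS_NOT_COMPLETE = 12   # named in the original script


def wait_until_freed(submit, process_handle, interval=10.0, sleep=time.sleep):
    """Poll PROCESS FREE HANDLE until the handle is freed or no longer exists.

    submit is a hypothetical callable (location, service, request) -> rc.
    Returns None on success, or an error message for any unexpected rc.
    """
    request = "FREE HANDLE %s" % process_handle
    while True:
        rc = submit("local", "PROCESS", request)
        if rc in (RC_OK, RC_HANDLE_DOES_NOT_EXIST):
            return None
        if rc != RC_PROCESS_NOT_COMPLETE:
            return "PROCESS FREE HANDLE %s failed with RC=%s" % (process_handle, rc)
        # The process is still running: wait before asking again.
        sleep(interval)
```

With the STAF Python API, `submit` would presumably be a thin wrapper around a registered handle's submit call that returns the result's return code; that wiring is an assumption and is not shown here. Passing `sleep` as a parameter keeps the polling interval easy to change and the loop easy to exercise without real delays.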
--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313
Strösser, Bodo <bodo.stroes...@ts.fujitsu.com>
07/30/2009 09:57 AM
To
"staf-users@lists.sourceforge.net" <staf-users@lists.sourceforge.net>
cc
Subject
[staf-users] STAX Job hangs
Hi,
today my STAX job hung. This is the result of a thread query:
# staf local stax query job 14 thread 1
Response
--------
{
Thread ID : 1
Parent TID : <None>
Start Date-Time: 20090730-16:04:03
Call Stack : [
  function: EMACH_main (Line: 804, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 12/12 (Line: 865, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  block: main.Test execution (Line: 1076, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 1/1 (Line: 1077, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  finally (Line: 1154, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  try (Line: 1079, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 10/10 (Line: 1080, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  iterate: 1/1 {'Name': '2.1.1 CSTA_base_all', 'Tes... (Line: 1124, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 1/2 (Line: 1125, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  block: main.Test execution.2:1:1 CSTA_base_all (Line: 1127, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 3/4 (Line: 1128, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  function: EMACH_ProcessTestCases (Line: 1331, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 1/1 (Line: 1338, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  finally (Line: 1493, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  try (Line: 1340, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  iterate: 1/1 {'Name': 'TC_1', 'SUT': 'PINGUIN', '... (Line: 1342, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 1/2 (Line: 1343, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  block: main.Test execution.2:1:1 CSTA_base_all.TC_1 / PINGUIN (Line: 1346, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 1/2 (Line: 1347, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  finally (Line: 1379, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  try (Line: 1348, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 5/5 (Line: 1349, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  function: EMACH_ProcessInteractions (Line: 1575, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  finally (Line: 1620, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  try (Line: 1582, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  iterate: 1/2 {'Type': 'C', 'ExitPass': '0', 'Outp... (Line: 1583, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 1/3 (Line: 1584, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  if: Interaction['Type'] == 'F' (Line: 1586, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  function: EMACH_ProcessCaller (Line: 1781, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  sequence: 1/2 (Line: 1787, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
  finally (Line: 1907, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
]
Condition Stack: []
}
The part of the job included in the <finally> starting on line 1907 is:
<finally>
  <if expr="MyProcessHandle != 0">
    <sequence>
      <log message="True">
        " Interaction '%s': Signal '%s' sent due to User Abort" % \
        (Interaction['Name'], Interaction['AbortSignal']) </log>
      <log message="True">
        " Waiting for signalled interaction to exit"</log>
      <script>
        while True :
          msg = EMACHTools.STAFsubmit2('LOCAL', 'PROCESS', \
                                       'FREE HANDLE %s' % MyProcessHandle)
          if msg == None :
            break
          RC = msg.split("RC=")[1]
          RC = int(RC.split(",")[0])
          if RC == 5 :   # RC 5 is 'Handle does not exist'
            break
          if RC != 12 :  # RC 12 is 'Process Not Complete'
            FatalMsg = msg
            break
          time.sleep(0.1)
      </script>
      <call function="'EMACH_CheckError'"/>
      <log message="True">
        " Signalled interaction is gone"</log>
    </sequence>
  </if>
</finally>
The code is here to wait for a process termination after the user has
terminated a block.
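For illustration only, the return-code handling in the <script> element above can be written as a small standalone Python function. The message shape "RC=<number>," is inferred from the two split calls in the script and is an assumption; `parse_rc` and `classify` are hypothetical helper names, not part of EMACHTools or STAF.

```python
import re

RC_HANDLE_DOES_NOT_EXIST = 5   # per the comment in the script
RC_PROCESS_NOT_COMPLETE = 12   # per the comment in the script

# Assumed message shape: "... RC=<number>, ..." (inferred from the split calls).
_RC_PATTERN = re.compile(r"RC=(-?\d+)")


def parse_rc(msg):
    """Return the integer following 'RC=' in msg, or None if there is none."""
    if msg is None:
        return None
    match = _RC_PATTERN.search(msg)
    return int(match.group(1)) if match else None


def classify(msg):
    """Map a reply to 'done', 'retry' or 'fatal', mirroring the script's loop."""
    if msg is None:
        return "done"      # the script breaks out of the loop when msg is None
    rc = parse_rc(msg)
    if rc == RC_HANDLE_DOES_NOT_EXIST:
        return "done"
    if rc == RC_PROCESS_NOT_COMPLETE:
        return "retry"     # the script sleeps and asks again
    return "fatal"         # the script stores msg in FatalMsg and stops
```

One difference from the script: a message without any "RC=" part is classified as 'fatal' here, whereas the split-based parsing in the script would raise an exception on such a message.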
I looked for the process the loop waits for and found that it had finished
and had an RC. So I released the handle via staf, but that didn't make the
job continue.
# staf local process list
Response
--------
H# Command                       Start Date-Time   End Date-Time     RC
-- ----------------------------- ----------------- ----------------- ----------
38 /home/EMACH/EMACHRemoteHelper 20090727-21:02:12 20090728-13:57:26 129
# staf local process free handle 38
Response
--------
# staf local process list
Response
--------
#
Thinking about the call trace: why do I see <finally> as the innermost
element? I would guess the job is looping in <script>, but shouldn't <if>
be the innermost then? When I look at the JVM (with the ps command), I see
its CPU time increasing as fast as real time.
Any help to find the problem is welcome.
Best Regards
Bodo
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users