Bodo,
Were there any errors in the STAX JVM log?
What version of STAX and what version of STAF are you running?
You're doing several things in your <finally> element that we don't
recommend and that can prevent you from stopping your STAX job.
1) First, in the STAX User's Guide (in the documentation for the <finally>
element) it says:
"Note that if you want to have a guaranteed way to stop a finally task,
you should have the first element contained in your finally task be a
block or timer element. For example, if you submit a request to terminate
the job, it will not terminate the job until the finally task(s) complete.
But if you submit a request to terminate a block that is currently running
which is contained within a finally task, then the block will be
terminated (it will not wait until that finally task completes)."
So, you should change your <finally> element to contain a <block> as its
first task (and possibly a <timer> element if you know that this process
should be able to be freed within a specified duration). For example:
<finally>
  <block name="'FinallyCleanupBlock'">
    <if expr="MyProcessHandle != 0">
      ....
    </if>
  </block>
</finally>
Or, if you know this process should always complete within 30 minutes (or
whatever value), you can also specify a <timer> element:
<finally>
  <block name="'FinallyCleanupBlock'">
    <timer duration="'30m'">
      <if expr="MyProcessHandle != 0">
        ....
      </if>
    </timer>
  </block>
</finally>
2) You should be using a <loop> element instead of a Python loop in a
<script> element, because STAX cannot stop an infinite loop in Python code,
but it can stop a <loop> element. Also, your loop uses a lot of CPU because
it waits only 0.1 seconds before it repeats itself. You should have a
longer wait interval, such as 10 seconds or more, depending on how long it
takes your process to complete: the longer the process takes, the longer
the wait interval should be. To wait, use a <stafcmd> to submit a DELAY
request to the local DELAY service instead of the Python time.sleep(),
because STAX cannot stop a Python time.sleep() if you want to terminate
the finally block.
3) What does EMACHTools.STAFsubmit2() do? Is it using a <stafcmd> to
submit a STAF service request? A STAX job should only use a <stafcmd>
element to submit a STAF service request.
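As a rough illustration of the polling cost mentioned in point 2, here is
a small plain-Python sketch (not STAX job code) that counts how many
PROCESS FREE HANDLE requests a given wait interval allows per hour. The
name requests_per_hour is made up for this example, and it ignores the
time each request itself takes, so the numbers are only upper bounds.

```python
# Upper bound on the number of polls per hour for a given wait between
# polls. Hypothetical helper for illustration only; not part of STAX/STAF.
MS_PER_HOUR = 60 * 60 * 1000


def requests_per_hour(wait_ms):
    """Return the maximum number of polls per hour for a wait given in ms."""
    return MS_PER_HOUR // wait_ms


# Compare the 0.1 second wait in the original script with a 10 second wait.
for wait_ms in (100, 10000):
    print("%6d ms wait: at most %d requests/hour"
          % (wait_ms, requests_per_hour(wait_ms)))
```

With a 0.1 second wait that is up to 36000 requests per hour, versus up to
360 with a 10 second wait.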
So, here's what I think your <finally> block should look like (based on
what little I know about what you're trying to do). You should also check
whether your other <finally> blocks need to be updated for the reasons I
talked about above.
<finally>
  <block name="'FinallyCleanupBlock'">
    <if expr="MyProcessHandle != 0">
      <sequence>
        <log message="1">
          " Interaction '%s': Signal '%s' sent due to User Abort" % \
          (Interaction['Name'], Interaction['AbortSignal'])
        </log>
        <log message="1">
          " Waiting for signalled interaction to exit"
        </log>
        <script>done = 0</script>
        <loop while="not done">
          <sequence>
            <stafcmd name="'Free process handle %s' % (MyProcessHandle)">
              <location>'local'</location>
              <service>'PROCESS'</service>
              <request>'FREE HANDLE %s' % (MyProcessHandle)</request>
            </stafcmd>
            <if expr="RC == STAFRC.Ok or RC == STAFRC.HandleDoesNotExist">
              <script>done = 1</script>
              <elseif expr="RC == STAFRC.ProcessNotComplete">
                <stafcmd name="'Delay for 10 seconds while waiting for process to end'">
                  <location>'local'</location>
                  <service>'DELAY'</service>
                  <request>'DELAY 10s'</request>
                </stafcmd>
              </elseif>
              <else>
                <script>
                  FatalMsg = 'PROCESS FREE HANDLE %s failed with RC=%s' % \
                             (MyProcessHandle, RC)
                  done = 1
                </script>
              </else>
            </if>
          </sequence>
        </loop>
        <call function="'EMACH_CheckError'"/>
        <log message="1">
          " Signalled interaction is gone"
        </log>
      </sequence>
    </if>
  </block>
</finally>
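If it helps to check the branching before putting it in the job, the
<if>/<elseif>/<else> decision in the loop above can be sketched as a plain
Python function. This is only an illustration: next_action and the RC_*
names are made up here, and the numeric values are the STAF return codes
0 (Ok), 5 (Handle does not exist) and 12 (Process Not Complete).

```python
# Sketch of the loop's decision logic for the RC of PROCESS FREE HANDLE.
# Hypothetical names for illustration; inside a STAX job use STAFRC.* instead.
RC_OK = 0
RC_HANDLE_DOES_NOT_EXIST = 5
RC_PROCESS_NOT_COMPLETE = 12


def next_action(rc):
    """Return 'done', 'wait' or 'fatal' for a given return code."""
    if rc in (RC_OK, RC_HANDLE_DOES_NOT_EXIST):
        return 'done'   # handle freed or already gone: leave the loop
    if rc == RC_PROCESS_NOT_COMPLETE:
        return 'wait'   # process still running: delay, then try again
    return 'fatal'      # any other RC: record the error and leave the loop


for rc in (0, 5, 12, 16):
    print("RC=%d -> %s" % (rc, next_action(rc)))
```

Only the 'wait' case keeps the loop going, so any unexpected RC ends the
loop instead of spinning forever.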
I don't know why the <finally> element is the last element shown in the
call stack. If the STAX job were stuck in the infinite loop in the
<script> element, then I would have expected the <script> element to be
the last element shown in the call stack. Perhaps there's another problem
in your STAX job. But you should first update your <finally> element(s) as
I recommended and see if that resolves the problem.
--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313
Strösser, Bodo <bodo.stroes...@ts.fujitsu.com>
07/30/2009 09:57 AM
To
"staf-users@lists.sourceforge.net" <staf-users@lists.sourceforge.net>
cc
Subject
[staf-users] STAX Job hangs
Hi,
Today my STAX job hung. This is the result of a thread query:
# staf local stax query job 14 thread 1
Response
--------
{
Thread ID : 1
Parent TID : <None>
Start Date-Time: 20090730-16:04:03
Call Stack : [
function: EMACH_main (Line: 804, File: /home/EMACH/EMACH-stax.xml,
Machine: local://local)
sequence: 12/12 (Line: 865, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
block: main.Test execution (Line: 1076, File:
/home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 1/1 (Line: 1077, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
finally (Line: 1154, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
try (Line: 1079, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
sequence: 10/10 (Line: 1080, File: /home/EMACH/EMACH-stax.xml,
Machine: local://local)
iterate: 1/1 {'Name': '2.1.1 CSTA_base_all', 'Tes... (Line: 1124,
File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 1/2 (Line: 1125, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
block: main.Test execution.2:1:1 CSTA_base_all (Line: 1127, File:
/home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 3/4 (Line: 1128, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
function: EMACH_ProcessTestCases (Line: 1331, File:
/home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 1/1 (Line: 1338, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
finally (Line: 1493, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
try (Line: 1340, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
iterate: 1/1 {'Name': 'TC_1', 'SUT': 'PINGUIN', '... (Line: 1342,
File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 1/2 (Line: 1343, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
block: main.Test execution.2:1:1 CSTA_base_all.TC_1 / PINGUIN (Line:
1346, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 1/2 (Line: 1347, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
finally (Line: 1379, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
try (Line: 1348, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
sequence: 5/5 (Line: 1349, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
function: EMACH_ProcessInteractions (Line: 1575, File:
/home/EMACH/EMACH-stax.xml, Machine: local://local)
finally (Line: 1620, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
try (Line: 1582, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
iterate: 1/2 {'Type': 'C', 'ExitPass': '0', 'Outp... (Line: 1583,
File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 1/3 (Line: 1584, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
if: Interaction['Type'] == 'F' (Line: 1586, File:
/home/EMACH/EMACH-stax.xml, Machine: local://local)
function: EMACH_ProcessCaller (Line: 1781, File:
/home/EMACH/EMACH-stax.xml, Machine: local://local)
sequence: 1/2 (Line: 1787, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
finally (Line: 1907, File: /home/EMACH/EMACH-stax.xml, Machine:
local://local)
]
Condition Stack: []
}
The part of the job included in the <finally> starting on line 1907 is:
<finally>
  <if expr="MyProcessHandle != 0">
    <sequence>
      <log message="True">
        " Interaction '%s': Signal '%s' sent due to User Abort" % \
        (Interaction['Name'], Interaction['AbortSignal']) </log>
      <log message="True">
        " Waiting for signalled interaction to exit"</log>
      <script>
        while True :
          msg = EMACHTools.STAFsubmit2('LOCAL', 'PROCESS', \
                                       'FREE HANDLE %s' % MyProcessHandle)
          if msg == None :
            break
          RC = msg.split("RC=")[1]
          RC = int(RC.split(",")[0])
          if RC == 5 : # RC 5 is 'Handle does not exist'
            break
          if RC != 12 : # RC 12 is 'Process Not Complete'
            FatalMsg = msg
            break
          time.sleep(0.1)
      </script>
      <call function="'EMACH_CheckError'"/>
      <log message="True">
        " Signalled interaction is gone"</log>
    </sequence>
  </if>
</finally>
The code is there to wait for a process to terminate after the user has
terminated a block.
I looked for the process the loop waits for and found that it had already
completed and had an RC. So I freed the handle via STAF, but that didn't
make the job continue.
# staf local process list
Response
--------
H# Command                       Start Date-Time   End Date-Time     RC
-- ----------------------------- ----------------- ----------------- ----------
38 /home/EMACH/EMACHRemoteHelper 20090727-21:02:12 20090728-13:57:26 129
# staf local process free handle 38
Response
--------
# staf local process list
Response
--------
#
Thinking about the call stack: why do I see <finally> as the innermost
element? I would guess the job is looping in <script>, but shouldn't <if>
be the innermost element then? When I look at the JVM (ps command), I see
its CPU time counting up as fast as real time.
Any help in finding the problem is welcome.
Best Regards
Bodo
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008
30-Day
trial. Simplify your report design, integration and deployment - and focus
on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users