Sharon,
last night my short script recreated the problem!
Unfortunately I didn't do any logging, so I don't know how long it looped
until the hang occurred. It was running monitored by STAXMon. STAXMon
was running on the same Linux machine as the STAX service, but the
DISPLAY was redirected to my workplace (Win-XP running Exceed).
You'll find the script attached.
Bodo
________________________________
From: Strösser, Bodo
Sent: Tuesday, August 04, 2009 10:19 PM
To: 'Sharon Lucas'
Cc: 'staf-users@lists.sourceforge.net'
Subject: RE: [staf-users] STAX Job hangs
Sharon,
I tried to create a smaller script that recreates the problem, but have had no
success yet.
Looking into the STAX code, one would think that this condition can never
occur:
The <finally>'s state is WAIT_THREAD, but the HardHoldCondition has been
removed from the thread.
fFinallyThread's fState field is STATE_COMPLETE and its fCompletionNotifiees
list is empty.
Would it make sense to build a new STAX.jar with DEBUG set in
STAXFinallyAction.java?
Or might this even be a bug in the JVM?
Bodo
________________________________
From: Strösser, Bodo
Sent: Tuesday, August 04, 2009 8:33 PM
To: 'Sharon Lucas'
Cc: 'staf-users@lists.sourceforge.net'
Subject: RE: [staf-users] STAX Job hangs
Sharon,
I have played with jdb a bit.
No matter when I stop all the threads inside the JVM, one of the ten "worker
threads" is at the "synchronized (fConditionSet)" on line 1359 in
STAXThread.java:
Thread-11[1] where
  [1] com.ibm.staf.service.stax.STAXThread.execute (STAXThread.java:1.359)
  [2] com.ibm.staf.service.stax.STAXThreadQueue$QueueThread.run (STAXThreadQueue.java:54)
Thread-11[1]
I also found out how to dump object data. For example here is the <finally>:
Thread-11[1] dump this.fActionStack.header.previous.element
this.fActionStack.header.previous.element = {
    DEBUG: false
    INIT: 0
    TRY_ACTION: 1
    WAIT_THREAD: 2
    THREAD_COMPLETE: 3
    COMPLETE: 4
    INIT_STRING: "INIT"
    TRY_ACTION_STRING: "TRY_ACTION"
    WAIT_THREAD_STRING: "WAIT_THREAD"
    THREAD_COMPLETE_STRING: "THREAD_COMPLETE"
    COMPLETE_STRING: "COMPLETE"
    STATE_UNKNOWN_STRING: "UNKNOWN"
    USE_SAME_PYINTERPRETER: true
    fHardHoldCondition: instance of com.ibm.staf.service.stax.STAXHardHoldThreadCondition(id=3162)
    fState: 2
    fTryAction: instance of com.ibm.staf.service.stax.STAXTryAction(id=3163)
    fFinallyAction: instance of com.ibm.staf.service.stax.STAXIfAction(id=3164)
    fFinallyThread: instance of com.ibm.staf.service.stax.STAXThread(id=3165)
    fSaveConditionList: instance of java.util.ArrayList(id=3166)
    fSavedConditions: false
    fHasInheritableConditions: false
    com.ibm.staf.service.stax.STAXActionDefaultImpl.fElement: "finally"
    com.ibm.staf.service.stax.STAXActionDefaultImpl.fXmlFile: "/home/STAF/emach/EMACH-stax.xml"
    com.ibm.staf.service.stax.STAXActionDefaultImpl.fXmlMachine: "local://local"
    com.ibm.staf.service.stax.STAXActionDefaultImpl.fElementInfo: instance of com.ibm.staf.service.stax.STAXElementInfo(id=3170)
    com.ibm.staf.service.stax.STAXActionDefaultImpl.fLineNumberMap: instance of java.util.HashMap(id=3171)
}
So maybe it really is possible to dump out what's wrong. I guess I could even
set breakpoints in the code (e.g. in the finally action). So, if you have any
questions, I'll try to find the answers with jdb.
Regarding the information you sent, please see my comments below in teal.
Bodo
________________________________
From: Sharon Lucas [mailto:luc...@us.ibm.com]
Sent: Tuesday, August 04, 2009 8:01 PM
To: Strösser, Bodo
Cc: 'staf-users@lists.sourceforge.net'
Subject: RE: [staf-users] STAX Job hangs
Bodo,
When a <try> element with a <finally> element is encountered, the <finally>
element is added to the call stack before the <try> element (as part of the
code that ensures that the finally element is always run), unlike any other
STAX element. So, just because the <finally> element is on the top of the call
stack doesn't mean that the hang necessarily occurred within the finally.
Something strange is happening though where the finally element is not being
removed from the call stack.
Yes, the problem might occur in the <try>, but what is definitely cycling is
the <finally>, as this.fActionStack.header.previous.element is a <finally> (see
above).
It would be helpful if you added some more <log> elements to debug this problem
(you don't have to send these to the STAX Monitor, just log them in the STAX
Job User Log). For example, to know if the <finally> element started
execution, add a <log> as the first task in the finally element so that even if
MyProcessHandle is 0, you'll know if the finally element started execution.
Also, to know if the <finally> element completed, add a <log> as the last task
in the finally element. For example:
    <finally>
      <sequence>
        <log>"Entering Finally block"</log>
        <if expr="MyProcessHandle != 0">
          <sequence>
            <log message="True">
              " Interaction '%s': Signal '%s' sent due to User Abort" % \
              (Interaction['Name'], Interaction['AbortSignal']) </log>
            ...
            <log message="True">
              " Signalled interaction is gone"</log>
          </sequence>
        </if>
        <log>"Exiting Finally block"</log>
      </sequence>
    </finally>
  </try>
Yes, I will insert logging as you suggest.
Even though the process is no longer running (as you verified via the ps
command), you need to verify if both STAF and STAX have been notified that the
process is no longer running. You can do this as follows:
1) To see if STAF knows that the process is no longer running:
STAF processMachine PROCESS LIST HANDLES LONG
Is the process handle still in the list? If it's not in the list, then STAF
knows the process has completed and its process completion information has been
freed. If the process handle is in the list and its "End Date-Time" and
"Return Code" fields contain a value other than <None>, then the process has
completed but its process completion information has not yet been freed.
The list is empty.
2) To see if STAX knows that the process is no longer running:
STAF staxMachine STAX LIST JOB 15 PROCESSES
Is the process handle in the list? If it's in the list, then STAX has not been
notified (or did not receive the notification) that the process has completed.
The list is empty. BTW: I did expect this, as STAXMon removed the gearwheel for
the process from the screen.
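The two checks above could also be run from a small separate STAX job and written to its Job User Log. This is only a sketch and has not been run against the hanging job: the function name CheckHandles and the variables processMachine, staxMachine and jobID are placeholders that do not appear in the thread. The request strings are the ones quoted above, and RC and STAFResult are the variables STAX sets after a <stafcmd>.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE stax SYSTEM "stax.dtd">
<stax>
  <defaultcall function="CheckHandles"/>

  <function name="CheckHandles">
    <sequence>
      <!-- Placeholder values (assumptions): adjust to the machines and job in question -->
      <script>processMachine = 'local'; staxMachine = 'local'; jobID = 15</script>

      <!-- 1) Does STAF still know about the process handle? -->
      <stafcmd name="'List process handles'">
        <location>processMachine</location>
        <service>'PROCESS'</service>
        <request>'LIST HANDLES LONG'</request>
      </stafcmd>
      <log>'PROCESS LIST HANDLES LONG: RC=%s, Result=%s' % (RC, STAFResult)</log>

      <!-- 2) Does STAX still list the process for the hanging job? -->
      <stafcmd name="'List STAX job processes'">
        <location>staxMachine</location>
        <service>'STAX'</service>
        <request>'LIST JOB %s PROCESSES' % jobID</request>
      </stafcmd>
      <log>'STAX LIST JOB %s PROCESSES: RC=%s, Result=%s' % (jobID, RC, STAFResult)</log>
    </sequence>
  </function>
</stax>
```

An empty result from both requests corresponds to the situation described here: neither STAF nor STAX still tracks the process.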
Also, as a side note, why do you have a <try>/<finally> where the <try> element
contains only <nop/>, as shown below? It doesn't really make sense to do this,
as the purpose of the finally element is to ensure that the finally element's
task is executed, no matter whether the try task completes normally or
abnormally. Since a <nop/> element does nothing (i.e. no operation), it
can't fail, so it doesn't make sense to do that. You should change this as
follows:
Thank you for the hint, but I did it intentionally. For me, the finally block
guarantees that the enclosed code is executed "atomically". That is, it is
a part of the run that may not be interrupted when the user terminates a block.
Change:
  <try>
    <nop/>
    <finally>
      <sequence>
        ...
      </sequence>
    </finally>
  </try>
to:
  <sequence>
    ...
  </sequence>
Let me know when you have an easier recreation scenario (e.g. one that I could
run on my STAX machine to recreate the problem and debug it). That's going to
be the most likely way that this problem will be resolved.
Yes, but from the last time (<hold> in <finally>) I know that this means a lot
of trial and error. By now it is a complex script, and it is not easy to derive
a small recreation from it.
--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313
Strösser, Bodo <bodo.stroes...@ts.fujitsu.com>
08/04/2009 10:38 AM
To
Sharon Lucas/Austin/i...@ibmus
cc
"'staf-users@lists.sourceforge.net'" <staf-users@lists.sourceforge.net>
Subject
RE: [staf-users] STAX Job hangs
Sharon,
for Job 7 it's exactly what I've sent, the part starting on 20090803 up to
20090804 10:53. At 10:53 the hang was implicitly released by the STAF shutdown.
Those logs of Job 7 show the last started <process> being completed when the
job stopped.
Currently a job is hanging again (10 threads). The logs are attached.
This time, the logs say that the <process> is still running. But that isn't
true.
The process is gone (ps command) and is also no longer displayed by
STAXMon (no gearwheel).
STAF local STAX QUERY JOB 15 THREAD 1 says that the job hangs inside of
the <finally> on line 1923, as it did when I mailed the first time. So the
script simply didn't reach the line where the completion message for the
<process> is logged.
This time I didn't terminate any block, so the <process> in <try> came to
its normal end and the following <script> must have reset MyProcessHandle
to 0. Thus, the <if> on line 1924 must be false and all the content of the
<finally> must be skipped. How can it hang in an empty <finally>?
The only thing that is common to all hangs I've looked into is a <finally> on
top of the stack.
The STAX job still hangs and I can connect jdb to the JVM. If you have more
experience using jdb, maybe you could tell me how to get more info from it.
Bodo
BTW: I'll try to strip down my script to have an easy way to recreate the
problem. But that might take a lot of time. If there is a chance to catch the
problem using the current script, that would be better for me.
________________________________
From: Sharon Lucas [mailto:luc...@us.ibm.com]
Sent: Tuesday, August 04, 2009 4:43 PM
To: Strösser, Bodo
Cc: 'staf-users@lists.sourceforge.net'
Subject: Re: [staf-users] STAX Job hangs
Bodo,
What are the contents of the STAX Job Log and the STAX Job User Log when this
job hangs?
--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313
[attachment "Job_15_User.log" deleted by Sharon Lucas/Austin/IBM]
[attachment "Job_15.log" deleted by Sharon Lucas/Austin/IBM]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE stax SYSTEM "stax.dtd">
<stax>
  <defaultcall function="EMACH_main"/>
  <signalhandler signal="'STAXProcessStartError'">
    <nop/>
  </signalhandler>
  <!-- ####################################################################### -->
  <function name="EMACH_main" scope="global">
    <sequence>
      <loop while="1">
        <sequence>
          <block name="'level 1'">
            <iterate var="n" in="[1, 2, 3, 4, 5]">
              <block name="'level 2 / %d' % n">
                <iterate var="k" in="[10, 20, 30, 40, 50]">
                  <try>
                    <stafcmd name="'DELAY %d' % (n + k)">
                      <location>"LOCAL"</location>
                      <service>"DELAY"</service>
                      <request>"DELAY 1s"</request>
                    </stafcmd>
                    <finally>
                      <if expr="0">
                        <nop/>
                      </if>
                    </finally>
                  </try>
                </iterate>
              </block>
            </iterate>
          </block>
          <stafcmd>
            <location>"LOCAL"</location>
            <service>"DELAY"</service>
            <request>"DELAY 1s"</request>
          </stafcmd>
        </sequence>
      </loop>
    </sequence>
  </function>
</stax>
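Since the recreation run did no logging, a variant of the attached script that records how far it got could look like the sketch below. It is untested and makes assumptions: the counter loopCount and the log texts are illustrative additions, and the extra <log> elements change the timing inside the <finally>, so the hang may become harder or easier to hit.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE stax SYSTEM "stax.dtd">
<stax>
  <defaultcall function="EMACH_main"/>
  <signalhandler signal="'STAXProcessStartError'">
    <nop/>
  </signalhandler>
  <function name="EMACH_main" scope="global">
    <sequence>
      <!-- loopCount is a hypothetical counter, not part of the original script -->
      <script>loopCount = 0</script>
      <loop while="1">
        <sequence>
          <script>loopCount = loopCount + 1</script>
          <log>'Outer loop iteration %d started' % loopCount</log>
          <block name="'level 1'">
            <iterate var="n" in="[1, 2, 3, 4, 5]">
              <block name="'level 2 / %d' % n">
                <iterate var="k" in="[10, 20, 30, 40, 50]">
                  <try>
                    <stafcmd name="'DELAY %d' % (n + k)">
                      <location>"LOCAL"</location>
                      <service>"DELAY"</service>
                      <request>"DELAY 1s"</request>
                    </stafcmd>
                    <finally>
                      <sequence>
                        <!-- First and last task of the finally, as suggested earlier in the thread -->
                        <log>'Entering finally (n=%d, k=%d)' % (n, k)</log>
                        <if expr="0">
                          <nop/>
                        </if>
                        <log>'Exiting finally (n=%d, k=%d)' % (n, k)</log>
                      </sequence>
                    </finally>
                  </try>
                </iterate>
              </block>
            </iterate>
          </block>
          <stafcmd>
            <location>"LOCAL"</location>
            <service>"DELAY"</service>
            <request>"DELAY 1s"</request>
          </stafcmd>
        </sequence>
      </loop>
    </sequence>
  </function>
</stax>
```

After a hang, the last "Entering finally" entry in the Job User Log without a matching "Exiting finally" entry would identify the iteration and the values of n and k at which the job stopped making progress.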
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users