Sharon,

Thank you for your help. Please see my comments below.

Bodo

________________________________
From: Sharon Lucas [mailto:luc...@us.ibm.com]
Sent: Thursday, July 30, 2009 6:39 PM
To: Strösser, Bodo
Cc: staf-users@lists.sourceforge.net
Subject: Re: [staf-users] STAX Job hangs


Bodo,

Were there any errors in the STAX JVM log?
What version of STAX and what version of STAF are you running?

You're doing several things in your <finally> element that we don't recommend 
and can cause you not to be able to stop your STAX job.

1) First, in the STAX User's Guide (in the documentation for the <finally> 
element) it says:

"Note that if you want to have a guaranteed way to stop a finally task, you 
should have the first element contained in your finally task be a block or 
timer element. For example, if you submit a request to terminate the job, it 
will not terminate the job until the finally task(s) complete. But if you 
submit a request to terminate a block that is currently running which is 
contained within a finally task, then the block will be terminated (it will not 
wait until that finally task completes)."

Yes, I know that. But this <finally> is written to wait for process 
termination. It must not be aborted by the user,
as this could cause the next test to be started while the previous one is still 
running.
The part inside <if> will run only if the process started in <try> was 
terminated by the user (block termination).
If the process ends normally, the next thing in the script is to reset 
MyProcessHandle.
Thus, I intentionally did not put a <block> inside the <finally>.

So, you should change your <finally> element to contain a <block> as its first 
task (and possibly a <timer> element if you know that this process should be 
able to be freed within a specified duration).  For example:

  <finally>
     <block name="'FinallyCleanupBlock'">

       <if expr="MyProcessHandle != 0">
           ....
       </if>

    </block>
  </finally>

or, if you know this process should always complete within 30 minutes (or 
whatever value), you can also specify a timer element.

  <finally>
     <block name="'FinallyCleanupBlock'">
       <timer duration="'30m'">

         <if expr="MyProcessHandle != 0">
           ....
         </if>

       </timer>
    </block>
  </finally>


2) You should be using a <loop> element instead of  a <script> element because 
STAX cannot stop an infinite loop in Python code, but it can stop a <loop> 
element.  Also, your loop is using up a lot of CPU as it only waits 0.1 seconds 
before it repeats itself.  You should have a longer wait interval such as 10 
seconds or more depending on how long it takes your process to complete.  If it 
takes a long time, then your wait interval should be longer.  If you have a 
long wait interval, then you could use a <stafcmd> to submit a local DELAY 
request to the DELAY service instead of using the Python time.sleep() because 
STAX cannot stop a Python time.sleep() if you wanted to terminate the finally 
block.

Yes, I could use a <loop>. But that would not make sense without the <block>, 
right?
So I wrote the code entirely in Python, as it would be unkillable anyway.

3) What does EMACHTools.STAFSubmit2() do?  Is it using a <stafcmd> to submit a 
STAF service request?  A STAX job should only use a <stafcmd> element to submit 
a STAF service request.

Yes, this is a STAF service request written in Java. I wrote this routine to 
get a better way of handling <testcase>. Using the XML element, it is quite 
difficult to make sure that each testcase opened will have its result even in 
case of block terminations. So my script uses STAF requests to the STAX 
service to start, update and stop a testcase. But I don't like to have STAF 
service symbols flickering on the screen, so I used Java instead of <stafcmd>.
Is there any reason that this might not work correctly?

So, here's what I think your <finally> block should look like (based on what 
little I know that you're trying to do).  You should also see if your other 
<finally> blocks need to be updated for the reasons I talked about above.

   <finally>
     <block name="'FinallyCleanupBlock'">
       <if expr="MyProcessHandle != 0">
         <sequence>

           <log message="1">
            "        Interaction '%s': Signal '%s' sent due to User Abort" % \
            (Interaction['Name'], Interaction['AbortSignal'])
           </log>
          <log message="1">
             "        Waiting for signalled interaction to exit"
           </log>

           <script>done = 0</script>

           <loop while="not done">

             <sequence>

               <stafcmd name="'Free process handle %s' % (MyProcessHandle)">
                 <location>'local'</location>
                 <service>'PROCESS'</service>
                 <request>'FREE HANDLE %s' % (MyProcessHandle)</request>
               </stafcmd>

               <if expr="RC == STAFRC.Ok or RC == STAFRC.HandleDoesNotExist">
                 <script>done = 1</script>
                 <elseif expr="RC == STAFRC.ProcessNotComplete">
                   <stafcmd name="'Delay for 10 seconds while waiting for process to end'">
                     <location>'local'</location>
                     <service>'DELAY'</service>
                     <request>'DELAY 10s'</request>
                   </stafcmd>
                 </elseif>
                 <else>
                   <script>
                     FatalMsg = 'PROCESS FREE HANDLE %s failed with RC=%s' % (MyProcessHandle, RC)
                     done = 1
                   </script>
                 </else>
               </if>

             </sequence>
           </loop>

           <call function="'EMACH_CheckError'"/>

           <log message="1">
             "        Signalled interaction is gone"
           </log>

         </sequence>
       </if>
    </block>
  </finally>
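
The decision logic of the <loop> above can be sketched in plain Python to make 
the three outcomes explicit. This is only an illustration under assumptions, 
not something to paste into a <script> element (a Python loop there is exactly 
what STAX cannot stop). The names wait_until_freed, free_handle and max_polls 
are hypothetical; free_handle stands for whatever submits PROCESS FREE HANDLE 
and returns the numeric RC. The RC values 5 and 12 are the ones quoted in the 
original script, and 0 is STAF's Ok.

```python
import time

# STAF return codes named in this thread (0 is STAF's Ok).
RC_OK = 0
RC_HANDLE_DOES_NOT_EXIST = 5
RC_PROCESS_NOT_COMPLETE = 12


def wait_until_freed(free_handle, delay=time.sleep, interval=10.0, max_polls=None):
    """Poll free_handle() until the handle is gone or an unexpected RC appears.

    free_handle is a hypothetical callable returning the numeric RC of a
    PROCESS FREE HANDLE request. Returns (fatal_msg, polls); fatal_msg is
    None when the handle was freed or no longer exists.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        rc = free_handle()
        polls += 1
        if rc in (RC_OK, RC_HANDLE_DOES_NOT_EXIST):
            return None, polls  # done: handle freed or already gone
        if rc != RC_PROCESS_NOT_COMPLETE:
            # Any other RC is treated as fatal, as in the <else> branch.
            return "PROCESS FREE HANDLE failed with RC=%s" % rc, polls
        delay(interval)  # process still running: wait before polling again
    return "gave up after %d polls" % polls, polls
```

The optional max_polls bound plays the role that the <timer> element plays in 
the STAX version: it gives the wait a guaranteed upper limit.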

I don't know why the <finally> element is the last element shown in the call 
stack.  If the STAX job was stuck in the infinite loop in the <script> 
element, then I would have expected the <script> element to be the last element 
shown in the call stack.  Perhaps there's another problem in your STAX job.  
But, you should first update your finally element(s) as I recommended and see 
if that resolves the problem.

Yes, this is the question: why is the <finally> the last element on the stack? 
If this is true, my job is NOT running in the Python loop.
Also, as I saw that the process had exited, and even freed the handle 
"manually", the loop must have stopped.
BTW: the hung thread is the only one in this STAX job, and the job is the only 
one in the STAX service.

Is there any way to get a Java call stack? The job is still looping; I didn't 
kill it as I do not know how quickly I can recreate the problem.
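
One common way to get Java stack traces on Unix without stopping the JVM is to 
send it SIGQUIT (the same as "kill -3 <pid>"); a HotSpot-style JVM then prints 
the stack of every thread to its standard output and keeps running. A minimal 
sketch follows. The function name request_thread_dump is hypothetical, the pid 
has to be taken from ps, and where the output ends up for the STAX JVM 
(presumably its JVM log) is an assumption that has not been verified here.

```python
import os
import signal


def request_thread_dump(pid, send_signal=os.kill):
    """Ask the JVM with the given pid for a thread dump (Unix only).

    SIGQUIT does not terminate a HotSpot-style JVM; it only triggers the dump.
    send_signal is injectable so the call can be exercised without a real JVM.
    """
    send_signal(pid, signal.SIGQUIT)
```

Calling request_thread_dump(jvm_pid) twice, a few seconds apart, makes it 
easier to see whether a thread is really stuck in the same place.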


--------------------------------------------------------------
Sharon Lucas
IBM Austin,   luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313



Strösser, Bodo <bodo.stroes...@ts.fujitsu.com>

07/30/2009 09:57 AM

To
"staf-users@lists.sourceforge.net" <staf-users@lists.sourceforge.net>
cc
Subject
[staf-users] STAX Job hangs





Hi,

Today my STAX job hung. This is the result of a thread query:

# staf local stax query job 14 thread 1
Response
--------
{
 Thread ID      : 1
 Parent TID     : <None>
 Start Date-Time: 20090730-16:04:03
 Call Stack     : [
   function: EMACH_main (Line: 804, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 12/12 (Line: 865, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   block: main.Test execution (Line: 1076, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/1 (Line: 1077, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1154, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1079, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 10/10 (Line: 1080, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   iterate: 1/1 {'Name': '2.1.1 CSTA_base_all', 'Tes... (Line: 1124, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1125, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   block: main.Test execution.2:1:1 CSTA_base_all (Line: 1127, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 3/4 (Line: 1128, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   function: EMACH_ProcessTestCases (Line: 1331, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/1 (Line: 1338, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1493, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1340, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   iterate: 1/1 {'Name': 'TC_1', 'SUT': 'PINGUIN', '... (Line: 1342, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1343, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   block: main.Test execution.2:1:1 CSTA_base_all.TC_1 / PINGUIN (Line: 1346, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1347, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1379, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1348, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 5/5 (Line: 1349, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   function: EMACH_ProcessInteractions (Line: 1575, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1620, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1582, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   iterate: 1/2 {'Type': 'C', 'ExitPass': '0', 'Outp... (Line: 1583, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/3 (Line: 1584, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   if: Interaction['Type'] == 'F' (Line: 1586, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   function: EMACH_ProcessCaller (Line: 1781, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1787, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1907, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
 ]
 Condition Stack: []
}


The part of the job included in the <finally> starting on line 1907 is:

      <finally>
       <if expr="MyProcessHandle != 0">
         <sequence>

            <log message="True">
             "        Interaction '%s': Signal '%s' sent due to User Abort" % \
             (Interaction['Name'], Interaction['AbortSignal']) </log>
           <log message="True">
             "        Waiting for signalled interaction to exit"</log>

            <script>
             while True :
               msg = EMACHTools.STAFsubmit2('LOCAL', 'PROCESS', \
                       'FREE HANDLE %s' % MyProcessHandle)
               if msg == None :
                 break
               RC = msg.split("RC=")[1]
               RC = int(RC.split(",")[0])
               if RC == 5 : # RC 5 is 'Handle does not exist'
                 break
               if RC != 12 : # RC 12 is 'Process Not Complete'
                 FatalMsg = msg
                 break
               time.sleep(0.1)
           </script>
           <call function="'EMACH_CheckError'"/>

            <log message="True">
             "        Signalled interaction is gone"</log>

          </sequence>
       </if>
     </finally>
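
A side note on the RC parsing in the <script> above: msg.split("RC=")[1] 
raises an IndexError if the returned message happens to contain no "RC=" part. 
A more defensive way to extract the code could look like the sketch below. The 
helper name parse_rc is hypothetical, and the message format is only what the 
script itself implies ("RC=" followed by a number); the real output of the 
STAFsubmit2 helper is not shown in this thread.

```python
import re

# Matches the number that follows "RC=" anywhere in the message.
_RC_PATTERN = re.compile(r"RC=(-?\d+)")


def parse_rc(msg):
    """Return the integer after 'RC=' in msg, or None if there is none.

    None is also returned for msg=None, which the script above treats as
    success.
    """
    if msg is None:
        return None
    match = _RC_PATTERN.search(msg)
    return int(match.group(1)) if match else None
```

With this, the loop can tell "no RC found" apart from RC 5 or RC 12 instead of 
failing with a Python exception inside the <finally>.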

This code waits for process termination after the user has terminated a 
block.
I looked for the process the loop waits for and found that it had finished 
with an RC. So I
released the handle via STAF, but that didn't make the job continue.


# staf local process list
Response
--------
H# Command                       Start Date-Time   End Date-Time     RC
-- ----------------------------- ----------------- ----------------- ----------
38 /home/EMACH/EMACHRemoteHelper 20090727-21:02:12 20090728-13:57:26 129

# staf local process free handle 38
Response
--------

# staf local process list
Response
--------

#


Thinking about the call trace: why do I see <finally> as the innermost element? 
I would guess the job is looping in <script>, but shouldn't <if> be the 
innermost then? When looking at the JVM (ps command), I see its CPU usage 
count increasing as fast as real time.

Any help to find the problem is welcome.

Best Regards
Bodo






 ------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  
http://p.sf.net/sfu/bobj-july
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users
