Bodo, please see my comments below in green.

--------------------------------------------------------------
Sharon Lucas
IBM Austin,   luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313




Strösser, Bodo <bodo.stroes...@ts.fujitsu.com> 
07/30/2009 12:26 PM

To
Sharon Lucas/Austin/i...@ibmus
cc
"staf-users@lists.sourceforge.net" <staf-users@lists.sourceforge.net>
Subject
RE: [staf-users] STAX Job hangs






Sharon,
 
thank you for your help. Please see my comments below.
 
Bodo

From: Sharon Lucas [mailto:luc...@us.ibm.com] 
Sent: Thursday, July 30, 2009 6:39 PM
To: Strösser, Bodo
Cc: staf-users@lists.sourceforge.net
Subject: Re: [staf-users] STAX Job hangs


Bodo, 

Were there any errors in the STAX JVM log? 
What version of STAX and what version of STAF are you running? 

You're doing several things in your <finally> element that we don't 
recommend and that can prevent you from stopping your STAX job. 

1) First, in the STAX User's Guide (in the documentation for the <finally> 
element) it says: 

"Note that if you want to have a guaranteed way to stop a finally task, 
you should have the first element contained in your finally task be a 
block or timer element. For example, if you submit a request to terminate 
the job, it will not terminate the job until the finally task(s) complete. 
But if you submit a request to terminate a block that is currently running 
which is contained within a finally task, then the block will be 
terminated (it will not wait until that finally task completes)." 
 
Yes, I know that. But this <finally> is written to wait for the process 
to terminate. It must not be aborted by the user, as this could cause the 
next test to start while the previous one is still running.
The part inside <if> runs only if the process started in <try> was 
terminated by the user (block termination).
If the process ends normally, the next thing in the script is to reset 
MyProcessHandle.
Thus, I intentionally did not put a <block> inside the <finally>.

Then you don't have a guaranteed way to stop a finally task, so if you 
choose not to do this, you can't complain that you can't stop your STAX 
job.
 
So, you should change your <finally> element to contain a <block> as its 
first task (and possibly a <timer> element, if you know that this process 
should be able to be freed within a specified duration). For example: 

  <finally> 
     <block name="'FinallyCleanupBlock'"> 

       <if expr="MyProcessHandle != 0"> 
           .... 
       </if> 

    </block> 
  </finally> 

or, if you know this process should always complete within 30 minutes (or 
whatever value), you can also specify a timer element. 

  <finally> 
     <block name="'FinallyCleanupBlock'"> 
       <timer duration="'30m'"> 

         <if expr="MyProcessHandle != 0"> 
           .... 
         </if> 

       </timer> 
    </block> 
  </finally> 
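For readers more at home in Python than in STAX XML: the point of wrapping the wait in a <timer> is to put a deadline on an otherwise open-ended wait. The standalone Python 3 sketch below is only an analogy, not how STAX implements <timer>; the name wait_with_deadline and its parameters are hypothetical. The clock and sleep functions are parameters so the behavior can be checked without actually waiting.

```python
import time

def wait_with_deadline(is_done, timeout_s, interval_s,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() until it returns True or timeout_s seconds elapse.

    Returns True if is_done() succeeded, False if the deadline passed first.
    (Hypothetical helper: an analogy for <timer>, not STAX code.)
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

With timeout_s=1800 and interval_s=10 this roughly corresponds to the 30-minute <timer> example combined with a 10-second poll.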


2) You should be using a <loop> element instead of a <script> element, 
because STAX cannot stop an infinite loop in Python code, but it can stop 
a <loop> element. Also, your loop uses a lot of CPU because it waits only 
0.1 seconds before repeating. You should use a longer wait interval, such 
as 10 seconds or more, depending on how long your process takes to 
complete: the longer it takes, the longer your wait interval should be. 
With a long wait interval, you could use a <stafcmd> to submit a local 
DELAY request to the DELAY service instead of using Python's 
time.sleep(), because STAX cannot interrupt a Python time.sleep() if you 
want to terminate the finally block.  
 
Yes, I could use a <loop>. But that would not make sense without the 
<block>, right?
So I wrote the code entirely in Python, as it would be unkillable anyway.

A <loop> makes sense without a <block>.
 
3) What does EMACHTools.STAFSubmit2() do?  Is it using a <stafcmd> to 
submit a STAF service request?  A STAX job should only use a <stafcmd> 
element to submit a STAF service request.   
 
Yes, it submits a STAF service request; the routine is written in Java. I 
wrote it to get a better way of handling <testcase>. With the XML element, 
it is quite difficult to make sure that each testcase that is opened gets 
its result, even when blocks are terminated. So my script uses STAF 
requests to the STAX service to start, update, and stop a testcase. But I 
don't like having STAF service symbols flickering on the screen, so I used 
Java instead of <stafcmd>.
Is there any reason that this might not work correctly? 

Yes, there are reasons why this might not work correctly, as I just tried 
to explain. If you don't use a <stafcmd>, then STAX is not aware that you 
submitted a STAF service request, so STAX can't tell you what went wrong 
(as in this case). Why are you using a STAX job if you're not using the 
elements it provides? Perhaps you should simply be running a Java or 
Python program that uses the STAF Python or STAF Java APIs to submit your 
STAF service requests.
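As a rough illustration of that alternative, a standalone Python program would submit the request through the STAF Python API rather than through a <stafcmd>. The sketch assumes PySTAF's STAFHandle, whose submit(location, service, request) method returns a result object with rc and result attributes; the function name free_process_handle is hypothetical, and the handle is passed in so the logic can be exercised without STAF installed.

```python
def free_process_handle(staf_handle, proc_handle, location="local"):
    """Submit PROCESS FREE HANDLE through an already-registered STAF handle.

    staf_handle is assumed to be a PySTAF STAFHandle (or any object with a
    compatible submit(location, service, request) method).
    Returns (rc, result_string).
    """
    request = "FREE HANDLE %s" % proc_handle
    result = staf_handle.submit(location, "PROCESS", request)
    return result.rc, result.result
```

In a real program the caller would import STAFHandle from PySTAF, create one (for example STAFHandle("EMACH-cleanup"), a made-up name), pass it in, and call its unregister() method when finished.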
 
So, here's what I think your <finally> block should look like (based on 
what little I know that you're trying to do).  You should also see if your 
other <finally> blocks need to be updated for the reasons I talked about 
above. 

   <finally>
     <block name="'FinallyCleanupBlock'">
       <if expr="MyProcessHandle != 0">
         <sequence>

           <log message="1">
             "        Interaction '%s': Signal '%s' sent due to User Abort" % \
             (Interaction['Name'], Interaction['AbortSignal'])
           </log>
           <log message="1">
             "        Waiting for signalled interaction to exit"
           </log>

           <script>done = 0</script>

           <loop while="not done">

             <sequence>

               <stafcmd name="'Free process handle %s' % (MyProcessHandle)">
                 <location>'local'</location>
                 <service>'PROCESS'</service>
                 <request>'FREE HANDLE %s' % (MyProcessHandle)</request>
               </stafcmd>

               <if expr="RC == STAFRC.Ok or RC == STAFRC.HandleDoesNotExist">
                 <script>done = 1</script>
                 <elseif expr="RC == STAFRC.ProcessNotComplete">
                   <stafcmd name="'Delay for 10 seconds while waiting for process to end'">
                     <location>'local'</location>
                     <service>'DELAY'</service>
                     <request>'DELAY 10s'</request>
                   </stafcmd>
                 </elseif>
                 <else>
                   <script>
                     FatalMsg = 'PROCESS FREE HANDLE %s failed with RC=%s' % (MyProcessHandle, RC)
                     done = 1
                   </script>
                 </else>
               </if>

             </sequence>
           </loop>

           <call function="'EMACH_CheckError'"/>

           <log message="1">
             "        Signalled interaction is gone"
           </log>

         </sequence>
       </if>
     </block>
   </finally>
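The control flow of that <loop> can be restated in plain Python, which makes the three outcomes easy to see: RC 0 or 5 ends the wait, RC 12 waits and retries, and anything else is treated as fatal. This is only a sketch of the logic, not STAX code; free_handle is a hypothetical callable standing in for the <stafcmd>, and the numeric values are the STAF return codes for STAFRC.Ok (0), STAFRC.HandleDoesNotExist (5) and STAFRC.ProcessNotComplete (12).

```python
import time

OK = 0                      # STAFRC.Ok
HANDLE_DOES_NOT_EXIST = 5   # STAFRC.HandleDoesNotExist
PROCESS_NOT_COMPLETE = 12   # STAFRC.ProcessNotComplete

def wait_for_process(free_handle, proc_handle, delay_s=10, sleep=time.sleep):
    """Retry free_handle() until the process handle is gone.

    free_handle: hypothetical zero-argument callable that submits
    PROCESS FREE HANDLE and returns the STAF RC.
    Returns (rc, fatal_msg), where fatal_msg is None on success.
    """
    while True:
        rc = free_handle()
        if rc in (OK, HANDLE_DOES_NOT_EXIST):
            return rc, None  # freed, or already gone
        if rc != PROCESS_NOT_COMPLETE:
            return rc, "PROCESS FREE HANDLE %s failed with RC=%s" % (proc_handle, rc)
        sleep(delay_s)  # process still running: wait, then retry
```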

I don't know why the <finally> element is the last element shown in the 
call stack. If the STAX job was stuck in the infinite loop in the 
<script> element, then I would have expected the <script> element to be 
the last element shown in the call stack. Perhaps there's another problem 
in your STAX job. But you should first update your finally element(s) as 
I recommended and see if that resolves the problem. 
   
Yes, this is the question: why is the <finally> the last element on the 
stack? If that is true, my job is NOT running in the Python loop.
Also, since I saw that the process had exited, and I even freed the handle 
"manually", the loop should have stopped.
BTW: the hung thread is the only one in this STAX job, and the job is the 
only one in the STAX service.

Note that the <finally> element shows up in the call stack before the 
<try> element it's associated with.  So, I'm confused as to why there is 
no <try> element following the last <finally> element in your STAX job. 
Can you provide your complete STAX job to me?  The problem could be in the 
<try> element associated with this <finally> element.
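That observation can be checked mechanically: in a thread's call stack each <finally> entry should be immediately followed by its <try>. The sketch below assumes the call stack is available as a list of entry strings in the format printed by the thread query in the original message (outermost first); element_name and orphan_finally_indices are hypothetical helpers, not part of STAX.

```python
def element_name(entry):
    """Return the element name of one call-stack entry, e.g.
    'finally (Line: 1907, ...)' -> 'finally',
    'function: EMACH_main (Line: 804, ...)' -> 'function'."""
    return entry.split()[0].rstrip(":")

def orphan_finally_indices(call_stack):
    """Indices of 'finally' entries that are not directly followed by 'try'."""
    names = [element_name(e) for e in call_stack]
    orphans = []
    for i, name in enumerate(names):
        if name == "finally":
            following = names[i + 1] if i + 1 < len(names) else None
            if following != "try":
                orphans.append(i)
    return orphans

# A selection of entries from the reported call stack.
sample = [
    "finally (Line: 1620, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)",
    "try (Line: 1582, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)",
    "function: EMACH_ProcessCaller (Line: 1781, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)",
    "sequence: 1/2 (Line: 1787, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)",
    "finally (Line: 1907, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)",
]
print(orphan_finally_indices(sample))
```

Only the last <finally> (line 1907) is flagged, which is exactly the entry with no <try> after it.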
 
Is there any way to get a Java call stack? The job is still looping; I 
didn't kill it, as I do not know how quickly I can recreate the problem.
 
No. Perhaps you should be running a Java program instead (standalone or 
via a <process> element) and use the STAF Python or STAF Java APIs to 
submit your STAF service requests.
 
--------------------------------------------------------------
Sharon Lucas
IBM Austin,   luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313



Strösser, Bodo <bodo.stroes...@ts.fujitsu.com> 
07/30/2009 09:57 AM 


To
"staf-users@lists.sourceforge.net" <staf-users@lists.sourceforge.net> 
cc

Subject
[staf-users] STAX Job hangs








Hi, 
  
today my STAX job hung. This is the result of a thread query: 
  
# staf local stax query job 14 thread 1
Response
--------
{
 Thread ID      : 1
 Parent TID     : <None>
 Start Date-Time: 20090730-16:04:03
 Call Stack     : [
   function: EMACH_main (Line: 804, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 12/12 (Line: 865, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   block: main.Test execution (Line: 1076, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/1 (Line: 1077, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1154, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1079, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 10/10 (Line: 1080, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   iterate: 1/1 {'Name': '2.1.1 CSTA_base_all', 'Tes... (Line: 1124, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1125, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   block: main.Test execution.2:1:1 CSTA_base_all (Line: 1127, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 3/4 (Line: 1128, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   function: EMACH_ProcessTestCases (Line: 1331, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/1 (Line: 1338, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1493, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1340, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   iterate: 1/1 {'Name': 'TC_1', 'SUT': 'PINGUIN', '... (Line: 1342, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1343, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   block: main.Test execution.2:1:1 CSTA_base_all.TC_1 / PINGUIN (Line: 1346, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1347, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1379, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1348, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 5/5 (Line: 1349, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   function: EMACH_ProcessInteractions (Line: 1575, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1620, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   try (Line: 1582, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   iterate: 1/2 {'Type': 'C', 'ExitPass': '0', 'Outp... (Line: 1583, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/3 (Line: 1584, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   if: Interaction['Type'] == 'F' (Line: 1586, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   function: EMACH_ProcessCaller (Line: 1781, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   sequence: 1/2 (Line: 1787, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
   finally (Line: 1907, File: /home/EMACH/EMACH-stax.xml, Machine: local://local)
 ]
 Condition Stack: []
} 
 
  
The part of the job included in the <finally> starting on line 1907 is: 
  
      <finally>
        <if expr="MyProcessHandle != 0">
          <sequence>

            <log message="True">
              "        Interaction '%s': Signal '%s' sent due to User Abort" % \
              (Interaction['Name'], Interaction['AbortSignal'])</log>
            <log message="True">
              "        Waiting for signalled interaction to exit"</log>

            <script>
              while True :
                msg = EMACHTools.STAFsubmit2('LOCAL', 'PROCESS', \
                        'FREE HANDLE %s' % MyProcessHandle)
                if msg == None :
                  break
                RC = msg.split("RC=")[1]
                RC = int(RC.split(",")[0])
                if RC == 5 : # RC 5 is 'Handle does not exist'
                  break
                if RC != 12 : # RC 12 is 'Process Not Complete'
                  FatalMsg = msg
                  break
                time.sleep(0.1)
            </script>
            <call function="'EMACH_CheckError'"/>

            <log message="True">
              "        Signalled interaction is gone"</log>

          </sequence>
        </if>
      </finally>
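One fragile spot in the script above, independent of the hang: msg.split("RC=")[1] raises IndexError when the message contains no "RC=", and int() raises ValueError if the text before the next comma is not a number, so an unexpected message from EMACHTools.STAFsubmit2 would raise an exception inside the <script> instead of being handled. A more defensive parse is sketched below; it assumes, as the script does, that the code appears as RC=<digits> somewhere in the message. parse_rc is a hypothetical name.

```python
import re

# Assumption: the return code appears as "RC=<digits>" in the message,
# which is what the original split-based parsing expects.
_RC_PATTERN = re.compile(r"RC=(\d+)")

def parse_rc(msg):
    """Return the integer after 'RC=' in msg, or None if there is none."""
    if msg is None:
        return None
    match = _RC_PATTERN.search(msg)
    if match is None:
        return None
    return int(match.group(1))
```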
  
This code waits for the process to terminate after the user has 
terminated a block. 
I looked up the process the loop waits for and found that it had finished 
and had an RC. So I released the handle via STAF, but that didn't make the 
job continue. 
 
  
# staf local process list
Response
--------
H# Command                       Start Date-Time   End Date-Time     RC
-- ----------------------------- ----------------- ----------------- ----------
38 /home/EMACH/EMACHRemoteHelper 20090727-21:02:12 20090728-13:57:26 129 
  
# staf local process free handle 38
Response
-------- 
  
# staf local process list
Response
-------- 
  
# 
 
  
Thinking about the call stack: why do I see <finally> as the innermost 
element? I would guess the job is looping in the <script>, but shouldn't 
the <if> be the innermost element then? When I look at the JVM with the 
ps command, its CPU time increases as fast as real time. 
  
Any help to find the problem is welcome. 
  
Best Regards 
Bodo 
 
 
 
 
 
 
 
_______________________________________________

staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users
