Hi Sharon,

Thanks, I will try the debugging.  The machine can definitely communicate
via STAF, as this particular problem tends to come up after the job has
already started and has run for a little while (e.g., by the time we
install anything, we've already run a variety of commands on the
remote/test machine).  I'll try to reproduce it with those tracepoints;
it sounds like that will help narrow it down.

Thanks,
Paul


                                                                                
                                                              
From:   Sharon Lucas/Austin/IBM
To:     Paul Ellsworth/San Jose/IBM@IBMUS
Cc:     staf-users@lists.sourceforge.net
Date:   02/28/2011 12:04 PM
Subject:        Re: [staf-users] hung processes and timers




Hi Paul,

As you read in the STAX UG, when a timer expires, it attempts to stop any
processes contained within the timer element that are still running (this
is documented in the <timer> section of the STAX User's Guide at
http://staf.sourceforge.net/current/STAX/staxug.html#Header_Timer).  STAX
submits a STOP request to the PROCESS service to stop a process.  Note that
if the STAF PROCESS STOP request is not able to stop the process using the
specified stop method (default is SIGKILLALL), then the underlying process
won't be terminated.  However, even if STAF cannot actually stop the
underlying process on the machine, the <process> element itself will not
"hang" if the PROCESS STOP request doesn't work.  It should continue on.

Note that a STAX <process> element "hanging" is usually caused by the STAX
job not receiving the STAF/Process/End message on its handle's queue.  The
STAF/STAX FAQ talks about this problem at
http://staf.sourceforge.net/current/STAFFAQ.htm#d0e2072 in sections "4.1.2.
Why is STAX still showing a process as running, even though it has
completed?" and "3.1.4 Why can't my STAF machines communicate?".  Follow
the instructions in section 3.1.4 to see if the machine where the process
is running can successfully submit STAF requests to the STAX service
machine using the host name of the STAX service machine.

Note that to send a process completion message to the STAX service machine,
the process machine submits a QUEUE request to the STAF QUEUE service to
send a message of type STAF/Process/End to the STAX job handle's queue.
This message is sent both when a process completes normally and when a
process is stopped by a PROCESS STOP request.  If this QUEUE request fails
(e.g. with RC 16), then the STAX service never receives the message that
the process is no longer running.  So, if the process machine cannot
communicate via STAF with the STAX service machine, then you need to fix
this problem.
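
As a rough sketch of that communication check, the STAX fragment below asks
the process machine to ping the STAX service machine and waits for the
answer synchronously, so the check itself does not depend on a
STAF/Process/End message reaching the job handle's queue.  It assumes that
machineIP is the test machine variable from the example later in this
thread, that the STAF executable is on that machine's PATH, and that
STAXServiceMachine holds a host name the test machine can resolve.  The
element names and log text are illustrative only.

```xml
<sequence>
  <!-- Run "STAF <STAX machine> PING PING" on the test machine.  WAIT makes
       the PROCESS START request synchronous, so the result is returned in
       the request itself rather than via the job handle's queue. -->
  <stafcmd name="'Ping STAX machine from test machine'">
    <location>machineIP</location>
    <service>'PROCESS'</service>
    <request>'START SHELL COMMAND "STAF %s PING PING" WAIT RETURNSTDOUT STDERRTOSTDOUT' % STAXServiceMachine</request>
  </stafcmd>
  <!-- RC is the STAF RC of the PROCESS START request; STAFResult['rc'] is
       the return code of the STAF command run on the test machine. -->
  <if expr="RC == 0 and int(STAFResult['rc']) == 0">
    <log level="'info'">'Test machine reached %s: %s' % (STAXServiceMachine, STAFResult['fileList'][0]['data'])</log>
    <else>
      <log level="'error'">'Ping back to %s failed, RC=%s, result=%s' % (STAXServiceMachine, RC, STAFResult)</log>
    </else>
  </if>
</sequence>
```

If the ping back fails (for example with RC 16), the same failure would
prevent the STAF/Process/End message from being delivered.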

To further debug this problem, you can turn on STAF tracing for tracepoints
"RemoteRequests ServiceRequest ServiceResult" and for the QUEUE service on
the machine where the process is running so that you can see the submission
of the STAF/Process/End message to the STAX job handle's queue and see if
this QUEUE request was successful or failed (e.g. with RC 16 if it cannot
communicate to the STAX service machine using the STAX service machine's
host name).
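
One possible way to turn on that tracing from a STAX job is sketched below.
It assumes the STAX service machine has enough trust on the process machine
to submit TRACE requests, that machineIP is the process machine, and that
/tmp/STAFTrace.txt is an acceptable (hypothetical) trace file path.  The
request strings follow the TRACE service syntax as I understand it from the
STAF User's Guide, so please verify them against your STAF version.  The
same requests can also be submitted from the command line on the process
machine itself.

```xml
<sequence>
  <!-- Send trace output to a file on the process machine (hypothetical path). -->
  <stafcmd name="'Set trace destination'">
    <location>machineIP</location>
    <service>'TRACE'</service>
    <request>'SET DESTINATION TO FILE /tmp/STAFTrace.txt'</request>
  </stafcmd>
  <!-- Enable the three tracepoints named above. -->
  <stafcmd name="'Enable tracepoints'">
    <location>machineIP</location>
    <service>'TRACE'</service>
    <request>'ENABLE TRACEPOINTS "ServiceRequest ServiceResult RemoteRequests"'</request>
  </stafcmd>
  <!-- Limit service tracing to QUEUE: turn off all services, then enable QUEUE. -->
  <stafcmd name="'Disable tracing for all services'">
    <location>machineIP</location>
    <service>'TRACE'</service>
    <request>'DISABLE ALL SERVICES'</request>
  </stafcmd>
  <stafcmd name="'Enable tracing for QUEUE'">
    <location>machineIP</location>
    <service>'TRACE'</service>
    <request>'ENABLE SERVICE QUEUE'</request>
  </stafcmd>
</sequence>
```

After reproducing the hang, look in the trace file for the QUEUE request
carrying the STAF/Process/End message and check its return code.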

--------------------------------------------------------------
Sharon Lucas
IBM Austin,   luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313





From:   Paul Ellsworth/San Jose/IBM@IBMUS
To:     staf-users@lists.sourceforge.net
Date:   02/28/2011 01:30 PM
Subject:        Re: [staf-users] hung processes and timers



I thought I had already read all that the STAX UG said about this, but
apparently not. It looks like when a timer pops it's supposed to kill any
processes it envelops ...

So perhaps the problem I am having is the nature of the "error." The
process actually does not exist on the machine anymore, but STAF/STAX is
still waiting for it.

                                                                       
                                                                       
From:   Paul Ellsworth/San Jose/IBM@IBMUS
To:     staf-users@lists.sourceforge.net
Date:   02/28/2011 11:24 AM
Subject:        [staf-users] hung processes and timers
                                                                       





We use STAF/STAX for testing, so this particular problem is in a STAX test
"server" -> STAF "client" situation.

Occasionally, I get a hung process... what seems to happen is that the
command returns, but for whatever reason, the STAX machine doesn't get the
return. Normally, this wouldn't be too difficult; just put a timer on the
process (for example, we install our product on AIX using installp and it
shouldn't take more than a minute or two at most).

However, it seems that timers do not ... interrupt, I guess, a process?
Quick code example (on-the-fly, not copy/paste from actual XML):

<timer duration="'1m'">
  <sequence>
    <process name="'cat /etc/filesystems'">
      <location>machineIP</location>
      <command mode="'shell'">"cat /etc/filesystems"</command>
      <stderr mode="'stdout'" />
      <returnstdout />
    </process>
    <log level="'trace'">"Got /etc/filesystems:\n%s" % STAXResult[0][1]</log>
  </sequence>
</timer>
<!-- After a timer, RC is 0 if the task completed and 1 if the timer expired -->
<if expr="RC != 0">
  <log level="'error'">"ERROR: trying to cat /etc/filesystems took longer than 1m!"</log>
</if>

I guess my question is: is it true that a timer cannot interrupt a "hung"
process and force STAX to move on, or am I doing something wrong? :)

And if it is true, is interrupting it even possible, or should I not bother
opening a feature request?

Thanks!
Paul E.
------------------------------------------------------------------------------

Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT
data
generated by your applications, servers and devices whether physical,
virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users
