Hi Sharon,

Thanks, I will try the debugging. The machine can definitely communicate via STAF, as this particular problem tends to come up after the job has already started and has run for a little while (e.g., by the time we install anything, we've already run a variety of commands on the remote/test machine). I'll try to reproduce it with those tracepoints enabled; it sounds like that will help narrow it down.

Thanks,
Paul

From: Sharon Lucas/Austin/IBM
To: Paul Ellsworth/San Jose/IBM@IBMUS
Cc: staf-users@lists.sourceforge.net
Date: 02/28/2011 12:04 PM
Subject: Re: [staf-users] hung processes and timers

Hi Paul,

As you read in the STAX UG, when a timer expires, it attempts to stop any processes contained within the timer element that are still running (this is documented in the <timer> section of the STAX User's Guide at http://staf.sourceforge.net/current/STAX/staxug.html#Header_Timer). STAX submits a STOP request to the PROCESS service to stop a process. Note that if the STAF PROCESS STOP request is not able to stop the process using the specified stop method (default is SIGKILLALL), then the underlying process won't be terminated. However, even if STAF cannot actually stop the underlying process on the machine, the <process> element itself will not "hang" if the PROCESS STOP request doesn't work. It should continue on.

Note that a STAX <process> element usually "hangs" because the STAX job did not receive the STAF/Process/End message in its handle's queue. The STAF/STAX FAQ talks about this problem at http://staf.sourceforge.net/current/STAFFAQ.htm#d0e2072 in sections "4.1.2. Why is STAX still showing a process as running, even though it has completed?" and "3.1.4 Why can't my STAF machines communicate?". Follow the instructions in section 3.1.4 to see if the machine where the process is running can successfully submit STAF requests to the STAX service machine using the host name of the STAX service machine.

Note that to send a process completion message to the STAX service machine, the process machine submits a QUEUE request to the STAF QUEUE service to send a message of type STAF/Process/End to the STAX job handle's queue. This message is sent both when a process completes normally and when a process is stopped by a PROCESS STOP request. If this QUEUE request fails (e.g.
with RC 16), then the STAX service never receives the message that the process is no longer running. So, if the process machine cannot communicate via STAF with the STAX service machine, then you need to fix this problem.

To further debug this problem, you can turn on STAF tracing for tracepoints "RemoteRequests ServiceRequest ServiceResult" and for the QUEUE service on the machine where the process is running. That lets you see the submission of the STAF/Process/End message to the STAX job handle's queue and whether this QUEUE request succeeded or failed (e.g. with RC 16 if it cannot communicate with the STAX service machine using the STAX service machine's host name).

--------------------------------------------------------------
Sharon Lucas
IBM Austin, luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313

From: Paul Ellsworth/San Jose/IBM@IBMUS
To: staf-users@lists.sourceforge.net
Date: 02/28/2011 01:30 PM
Subject: Re: [staf-users] hung processes and timers

I thought I had already read all that the STAX UG said about this, but apparently not. It looks like when a timer pops it's supposed to kill any processes it envelops ... So perhaps the problem I am having is the nature of the "error." The process actually does not exist on the machine anymore, but STAF/STAX is still waiting for it.

From: Paul Ellsworth/San Jose/IBM@IBMUS
To: staf-users@lists.sourceforge.net
Date: 02/28/2011 11:24 AM
Subject: [staf-users] hung processes and timers

We use STAF/STAX for testing, so this particular problem is in a STAX test "server" -> STAF "client" situation. Occasionally, I get a hung process...
what seems to happen is that the command returns, but for whatever reason, the STAX machine doesn't get the return. Normally, this wouldn't be too difficult; just put a timer on the process (for example, we install our product on AIX using installp and it shouldn't take more than a minute or two at most). However, it seems that timers do not ... interrupt, I guess, a process?

Quick code example (on-the-fly, not copy/paste from actual XML):

    <timer duration="'1m'">
      <sequence>
        <process name="'cat /etc/filesystems'">
          <location>machineIP</location>
          <command mode="'shell'">"cat /etc/filesystems"</command>
          <stderr mode="'stdout'" />
          <returnstdout />
        </process>
        <log level="'trace'">"Got /etc/filesystems:\n%s" % STAXResult[0][1]</log>
      </sequence>
    </timer>
    <if expr="RC != 0">
      <log level="'error'">"ERROR: trying to cat /etc/filesystems took longer than 1m!"</log>
    </if>

I guess my question is: is it true that a timer cannot interrupt a "hung" process and force STAX to move on, or am I doing something wrong :) And if it is true, is interrupting a hung process even possible, or should I not bother opening a feature request?

Thanks!
Paul E.

------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in Real-Time with Splunk. Collect, index and harness all the fast moving IT data generated by your applications, servers and devices whether physical, virtual or in the cloud. Deliver compliance at lower cost and gain new business insights. http://p.sf.net/sfu/splunk-dev2dev
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users
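
For reference, the tracing setup described in Sharon's reply could be sketched as STAX <stafcmd> elements, in the same style as the timer example in the original post. This is only a sketch under assumptions: the request strings follow the TRACE service syntax as described in the STAF User's Guide and should be checked against your STAF version, the trace file path /tmp/staf-trace.log is a made-up example, machineIP is the same variable used in the original post, and the STAX machine is assumed to have a high enough trust level on machineIP to submit TRACE requests. The same requests can also be submitted from the command line on the process machine itself.

```xml
<sequence>

  <!-- Assumption: hypothetical trace file path; pick any writable location -->
  <stafcmd name="'Send trace output to a file'">
    <location>machineIP</location>
    <service>'TRACE'</service>
    <request>'SET DESTINATION TO FILE /tmp/staf-trace.log'</request>
  </stafcmd>

  <!-- Tracepoints named in the reply -->
  <stafcmd name="'Enable tracepoints'">
    <location>machineIP</location>
    <service>'TRACE'</service>
    <request>'ENABLE TRACEPOINTS "ServiceRequest ServiceResult RemoteRequests"'</request>
  </stafcmd>

  <!-- Limit service tracing to the QUEUE service, which delivers STAF/Process/End -->
  <stafcmd name="'Trace the QUEUE service'">
    <location>machineIP</location>
    <service>'TRACE'</service>
    <request>'ENABLE SERVICES "QUEUE"'</request>
  </stafcmd>

  <!-- RC and STAFResult here reflect the last stafcmd only -->
  <if expr="RC != 0">
    <log level="'error'">'Enabling QUEUE tracing failed: RC=%s Result=%s' % (RC, STAFResult)</log>
  </if>

</sequence>
```

With tracing on, rerun the job until the hang reproduces, then look in the trace file for the QUEUE request carrying the STAF/Process/End message and check its RC (e.g. RC 16 would point to the communication problem described in the reply).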