Sorry about that, I forgot version numbers.  STAX 3.5.0 and STAF 3.4.6.

As for the JVM log: actually yes, there was a NullPointerException.  I also
get random data in there that seems to come from a log message generated
when a testcase has an error (the output of a couple of tail commands on
logs on the target system, etc.).  Here's the NullPointerException:

20110812-10:34:43 ERROR: Exception on STAX service request:

list jobs

java.lang.NullPointerException
        at com.ibm.staf.service.stax.STAX.handleList(STAX.java:3172)
        at com.ibm.staf.service.stax.STAX.acceptRequest(STAX.java:1843)
        at com.ibm.staf.service.STAFServiceHelper.callService(STAFServiceHelper.java:349)

That was back on the 12th though, a week ago...

The error message is:
20110802-09:55:51 Error: STAX Job ID 18. STAXJob$STAFQueueMonitor.run():
Exception unmarshalling queued messages.
Marshalled string:
@SDT/*:15769:@SDT/{:761::13:map-class-map@SDT/{:733::24:STAF/Service/Queue/Entry@SDT/{:694::4:keys@SDT/[8:633:@SDT/{:91::12:display-name@SDT/$S:8:Priority:18:
display-short-name@SDT/$S:1:P:3:key@SDT/$S:8:priority@SDT/{:60::12:display-name@SDT/$S:9:Date-Time:3:key@SDT/$S:9:timestamp@SDT/{:56::12:display-name@SDT/$S:7:Machine:3:key@SDT/
$S:7:machine@SDT/{:101::12:display-name@SDT/$S:11:Handle
Name:18:display-short-name@SDT/$S:4:Name:3:key@SDT/$S:10:handleName@SDT/{:88::12:display-name@SDT/$S:6:Handle:18:display
-short-name@SDT/$S:2:H#:3:key@SDT/$S:6:handle@SDT/{:50::12:display-name@SDT/$S:4:User:3:key@SDT/$S:4:user@SDT/{:50::12:display-name@SDT/$S:4:Type:3:key@SDT/$S:4:type@SDT/{:56::1
2:display-name@SDT/$S:7:Message:3:key@SDT/$S:7:message:4:name@SDT/$S:24:STAF/Service/Queue/Entry@SDT/[1:14983:@SDT/%:14970::24:STAF/Service/Queue/Entry@SDT/$S:1:5@SDT/$S:17:2011
0802-09:55:51@SDT/$S:36:tcp://sjx64galb.sanjose.ibm.com@6500@SDT/$S:12:STAF_Process@SDT/$S:1:1@SDT/$S:16:none://anonymous@SDT/$S:16:STAF/Process/End@SDT/$S:14754:@SDT/{:14741::1
2:endTimestamp@SDT/$S:17:20110802-04:47:33:8:fileList@SDT/[1:14614:@SDT/{:14601::4:data@SDT/$S:14564:OUTPUT
 OF MOUNT
[.... more stuff...]

I've seen those before.  That happened weeks ago, so it doesn't seem
related.
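
For sorting out which JVM log entries are recent enough to matter, here is a minimal sketch that pulls timestamped error entries out of log text. It assumes entries look like the two quoted above (a `YYYYMMDD-HH:MM:SS` timestamp followed by `ERROR:` or `Error:`); the function name `error_entries` and the `since` parameter are made up for illustration and are not part of STAF or STAX.

```python
import re
from datetime import datetime

# Matches entries such as
#   20110812-10:34:43 ERROR: Exception on STAX service request:
# The pattern is inferred from the excerpts in this thread, not from a
# documented log format.
ENTRY = re.compile(r"^(\d{8}-\d{2}:\d{2}:\d{2}) (ERROR|Error): (.*)$")


def error_entries(lines, since=None):
    """Yield (timestamp, message) for each error entry in `lines`.

    If `since` is given, only entries at or after that datetime are yielded.
    """
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if not match:
            continue  # stack-trace lines, blank lines, other output
        stamp = datetime.strptime(match.group(1), "%Y%m%d-%H:%M:%S")
        if since is None or stamp >= since:
            yield stamp, match.group(3)
```

Passing an open log file as `lines` and a `since` value near the time of the hang would show whether any error was logged around then.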

Next time I get the error, I'll run the QUERY JOB <JobID> THREAD <ThreadID>
command; I have not done that before.

Here's the STAX JVM log when it started:
******************************************************************************
*** 20110801-10:30:37 - Start of Log for JVMName: STAX
*** JVM Executable: /usr/bin/java
*** JVM Options   : -Xmx1024m -cp /usr/local/staf/tools/zxJDBC/lib/zxJDBC.jar:/usr/local/staf/lib/JSTAF.jar:/usr/local/staf/samples/demo/STAFDemo.jar:/usr/local/staf/lib/JSTAF.jar:/usr/local/staf/samples/demo/STAFDemo.jar:/usr/local/staf/lib/JSTAF.jar:/usr/local/staf/samples/demo/STAFDemo.jar -XX:MaxPermSize=512m -XX:PermSize=512m
*** JVM Version   : java version "1.6.0"
Java(TM) SE Runtime Environment (build pxi3260sr8fp1-20100624_01(SR8 FP1))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux x86-32
jvmxi3260sr8ifx-20100609_59383 (JIT enabled, AOT enabled)
J9VM - 20100609_059383
JIT  - r9_20100401_15339ifx2
GC   - 20100308_AA)
JCL  - 20100624_01
*** JVM PID       : 11917
******************************************************************************

I recently added the perm sizes to see if that would help at all, but it
didn't appear to.


                                                                                
                                                              
From:       Sharon Lucas/Austin/IBM
To:         Paul Ellsworth/San Jose/IBM@IBMUS
Cc:         staf-users@lists.sourceforge.net
Date:       08/19/2011 01:15 PM
Subject:    Re: [staf-users] Job hangs, cannot terminate

What version of STAX are you running on this machine (STAF local STAX
VERSION) and what version of STAF are you running on this machine (STAF
local MISC VERSION)?

If you are not running STAX V3.3.8 or later, you could be running into Bug
#2832883, a race condition where a STAX job could hang if a finally
element's task completes very quickly (see
https://sourceforge.net/tracker/?func=detail&aid=2832883&group_id=33142&atid=407381
for more info on this bug).

Were there any errors in the STAX JVM Log?

To debug, determine what the last element is on the call stack for the
"hung" STAX job by using the STAX LIST JOB <JobID> THREADS and STAX QUERY
JOB <JobID> THREAD <ThreadID> commands.  For more information, see section
"Debugging" in the STAX User's Guide at
http://staf.sourceforge.net/current/STAX/staxug.html#Header_Debugging.
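
As a rough illustration of the two debugging requests above, the sketch below builds the command lines and runs them through the STAF command-line executable. It assumes the executable is named `STAF` and is on the PATH, and that the STAX service is registered on the local machine; the helper names (`stax_command`, `list_threads_cmd`, `query_thread_cmd`, `run`) are hypothetical, and the job and thread IDs in the usage comment are placeholders.

```python
import subprocess


def stax_command(*words, machine="local"):
    """Build the argv for: STAF <machine> STAX <request words...>"""
    return ["STAF", machine, "STAX", *[str(w) for w in words]]


def list_threads_cmd(job_id):
    # STAX LIST JOB <JobID> THREADS
    return stax_command("LIST", "JOB", job_id, "THREADS")


def query_thread_cmd(job_id, thread_id):
    # STAX QUERY JOB <JobID> THREAD <ThreadID>
    return stax_command("QUERY", "JOB", job_id, "THREAD", thread_id)


def run(argv):
    """Run a STAF command line and return whatever it printed to stdout.

    Assumption: the STAF executable is installed and on the PATH.
    """
    return subprocess.run(argv, capture_output=True, text=True).stdout


# Example usage (placeholder IDs; needs a running STAF with STAX):
#   print(run(list_threads_cmd(18)))
#   print(run(query_thread_cmd(18, 1)))
```

The thread IDs to pass to the second request come from the output of the first.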

--------------------------------------------------------------
Sharon Lucas
IBM Austin,   luc...@us.ibm.com
(512) 286-7313 or Tieline 363-7313





From:   Paul Ellsworth/San Jose/IBM@IBMUS
To:     staf-users@lists.sourceforge.net
Date:   08/19/2011 02:36 PM
Subject:        [staf-users] Job hangs, cannot terminate



Hello,

Occasionally, our STAX server gets into a state where, even if I tell it
to terminate a job, the job keeps running:
|--+----+-----------------+---------------------------------|
|40|Info|20110819-12:01:11|Terminating block: main          |
|--+----+-----------------+---------------------------------|
|39|Info|20110819-12:01:11|Received TERMINATE BLOCK main    |
|  |    |                 |request                          |
|--+----+-----------------+---------------------------------|



Oddly enough, I can print something (running a Python command via a
thread). There is no process or stafcmd going on, and I'm unable to
terminate any of the blocks.

The only way I've found to get rid of the job is to restart STAF on the
server. This is somewhat inconvenient at times.

Usually, when it gets into this state, new jobs also will get hung.

I've checked for JVM memory errors in the STAF JVM logs but I've never seen
any, even though the machine gets quite low on free physical memory at
times (under 100 MB; the machine has 4 GB).

Any ideas or ways to debug jobs that are in this state?

Thanks!
Paul
------------------------------------------------------------------------------

Get a FREE DOWNLOAD! and learn more about uberSVN rich system,
user administration capabilities and model configuration. Take
the hassle out of deploying and managing Subversion and the
tools developers use with it. http://p.sf.net/sfu/wandisco-d2d-2
_______________________________________________
staf-users mailing list
staf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/staf-users
