Catching a Phantom

Chase, John Thu, 19 Jan 2012 08:26:07 -0800

NOTE:  X-posted to CICS-L and IBM-MAIN

Environment:  z/OS 1.11, MQ 7.0, CICS TS 4.2.


We have a batch program that runs once or twice a day and writes, on
average, between 200 and 600 messages to an MQ queue.  That MQ queue
in turn is drained by a CICS region, with each message spawning a new
instance of a specific transaction.  The MQ queue is defined with
"trigger on first message", so that implies that the message arrival
rate is slower than the CICS message processing rate.  The CICS program
writes a message to the CICS log when it reads and "validates" the first
message, and again after processing the last message.  So far, so
good....

Occasionally, that MQ-draining transaction will be spawned nearly 1,000
times per second, and apparently "do nothing":  No "I'm starting"
message, no "I'm finished" message, no ABEND, no nothing.  SMF records
for these "phantom" transactions, formatted by CA-Sysview, show that
each instance starts the program, which then issues two MQOPENs (not
back-to-back), two MQCLOSEs (again, not back-to-back), one MQINQ, one
MQGET and one SQL SELECT, and then it ends "without comment".  Average
task lifetime for these "phantoms" is around 8 milliseconds.
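
For a rough sense of scale, the two figures above (about 1,000 attaches
per second, about 8 ms per task) can be combined with Little's law to
estimate how many "phantoms" are in flight at any instant, and hence how
many a single dump might be expected to contain.  This is only
back-of-the-envelope arithmetic; the function name is made up for
illustration and the inputs are the approximate values quoted above.

```python
def average_tasks_in_flight(attach_rate_per_second, task_lifetime_seconds):
    """Little's law: average concurrency = arrival rate * time in system."""
    return attach_rate_per_second * task_lifetime_seconds


# Approximate figures from the description above (assumed steady state).
phantoms_in_flight = average_tasks_in_flight(1000, 0.008)
print(f"Roughly {phantoms_in_flight:.0f} phantom tasks in flight at any instant")
```

So a dump taken at an arbitrary moment during an event would likely
show only a handful of these tasks, not hundreds.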

This "phantom transaction" event has occurred only four or five times
during the past year or so, with durations between about 10 seconds and
several minutes, but we believe we've isolated its occurrence to the
processing of this batch job's MQ messages.  What we'd like to do is
capture an SDUMP of the CICS region while these "phantoms" are present,
but by the time we (humans) notice it, the "phantoms" are gone; no time
to issue the CEMT PERFORM SNAPDUMP command "manually".  We've configured
Sysview to send an email alert based on the total number of the
MQ-draining transaction exceeding a "relatively ridiculously high"
number during a monitor interval, but by the time we see the
notification it's "too late".  We also write a message to the console
(syslog) when the number of these transactions exceeds the threshold,
and are working on a console automation rule to capture that message and
issue the SNAPDUMP command to the CICS region, but that will likely
occur "too late" as well due to Sysview's monitor interval.

Is there a way, preferably without custom programming, to configure
Sysview to issue the console message based on transaction-attach rate
per second?  Likely this would require being able to dynamically change
Sysview's monitor interval for this CICS region.  Is that doable?
Alternatively, we have available the IBM Application Performance
Analyzer (APA), but I don't know whether that tool can be configured to
emit an "excess transaction-attach rate" message that console automation
could act upon.  And I can't see any way to set up a SLIP trap to detect
something like this.
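
To make the "attach rate per second" idea concrete, here is a minimal
sketch of the detection logic in Python.  It is not a Sysview, APA or
CICS interface; the class name, the threshold and the timestamps are
all hypothetical, and it only illustrates a sliding one-second window
that fires as soon as the count is exceeded, not at the end of a
monitor interval.

```python
from collections import deque


class AttachRateMonitor:
    """Hypothetical sliding-window counter for transaction attaches."""

    def __init__(self, threshold, window_seconds=1.0):
        self.threshold = threshold      # attaches allowed per window
        self.window = window_seconds    # window length in seconds
        self._times = deque()           # timestamps still inside the window

    def record_attach(self, timestamp):
        """Record one attach; return True if the window count exceeds the threshold."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window
        # Discard attaches that have aged out of the trailing window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.threshold


def first_alert_index(monitor, timestamps):
    """Return the index of the first attach that trips the monitor, or None."""
    for index, stamp in enumerate(timestamps):
        if monitor.record_attach(stamp):
            return index
    return None


# Simulated burst: 600 attaches spaced 1 ms apart, assumed threshold of 500.
burst = [i / 1000 for i in range(600)]
print(first_alert_index(AttachRateMonitor(threshold=500), burst))
```

Whatever issues the alert would need to sit close enough to the attach
events to react within the 10 seconds or so that the shortest events
last; where such logic could actually be hosted is the open question.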

Any other ideas would be appreciated, too.

TIA,

    -jc-

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
