On 2018-11-29 18:39, Seymour J Metz wrote:
> OCO was a gradual infestation. DF/DS, DF/EF (15 % discount on plasma),
> MVS/SE, MVS/SP and, as I recall, DFP, preceded it, and there was still
> microfiche for, e.g., TSO/E, long after OCO had started.

OCO's impact was a lot more gradual than the pace at which IBM pushed it.
I wonder if IBM realized that.

Once upon a time we had 7171s for dial-up 3270 sessions.  And TSO users
would get stuck.  Their sessions would be left in a state where they could
neither log back on nor reconnect.

The operators couldn't cancel the leftover session, even though the system
did say "cancel command accepted".

A dump of the hung TSO session showed the initiator task RB waiting on
the cancel ECB for the TSO session.  This RB wasn't running (even if the
cancel ECB was posted) because it wasn't the topmost RB on the initiator
task's TCB RB chain.

The topmost RB was an IRB from VTAM waiting for some VTAM event, so I
opened a problem with IBM VTAM support.

VTAM support looked at several dumps of this case, but they wanted traces
of a failing case -- yes, trace all 7171s on a production system for
something which happened possibly several times a week.  And the problem
wasn't known about until someone tried to log on, which could be any
amount of time after the problem hit.

No.

They also couldn't understand why the user wasn't cancelable.

Locally we eventually figured out that varying the LU for the TSO session
inactive (possibly requiring FORCE) would cause the waiting VTAM request
to complete, followed by the initiator canceling the user (if a cancel
command had been issued).  This wasn't really a good solution, as it
required operator and/or systems programming staff intervention, and our
local policy was to avoid FORCE unless the alternative was an IPL.

The only useful information from VTAM was that the hang was the result of
an incomplete reconnect, where the old TSO address space had committed
to connect to a specific inbound session which had likely dropped.

After months of nothing useful from VTAM, I looked deeper.  I didn't need
to "fix" it, just stop the user from being locked out.

The VTAM module was OCO.  Well, not so great.

However, downstairs we had old microfiche.  How old?  Two versions back.

Ah, a bit of looking at the fiche and disassembly of the current module
and things become clear.  I can see that this isn't the first time VTAM
has had a hang in this module.  There are multiple VTAM calls with later
added STIMER timeouts around them.  But no timeout on the one I'm stuck on
(of course).

A quick zap to the module to add a STIMER around the VTAM call with the
STIMER exit just issuing ABEND "fixed" the problem.  The non-reconnected
address space terminates and the user can re-logon without any operator
intervention.

A disassembly alone wouldn't have helped nearly as much as the old fiche,
which showed what the module was doing.  So OCO's effect fell more on new
sites, and on IBM, as it got less help from the field.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN