Re: Why are sophisticated system-level coding examples not available?

Michael Stein Sat, 17 Nov 2018 21:51:46 -0800

> However, and this is important, anything and everything you do that uses
> authorized services entails exposure of system integrity.  It behooves
> any organization to ensure that its personnel writing such code are
> well-trained and thoroughly knowledgeable about how the system works,
> is designed, and what those exposures are.  It's also perfectly clear
> many organizations, including many ISVs, do not.  This kind of knowledge
> and experience doesn't come from blindly following two-sentence replies
> from who knows who on IBM-MAIN (I know who's who on IBM-MAIN, as many
> of us do, but how would a newbie know?).
>
> You could easily read a paper on the latest techniques in brain surgery.
> I'd be skeptical about your ability to do it, unless you had the prior
> training and experience it requires.
>
> The point is, you need that training and experience, and you also need
> to be able to train and study on your own, as there's very little in the
> way of formal education in our field.  Neither IBM-MAIN nor StackOverflow
> are a substitute for the fundamentals.


There's a large problem that this ignores.  An authorized program needs to:

a. be functional
b. not break system integrity
c. be deterministic (in its functionality)
d. be unlikely to break with future system changes (by IBM or others)

In MVS, as difficult as using the authorized interfaces seems, it's not
that hard to create something mostly functional.

This is not production code.

Writing real, as opposed to mostly functional, MVS code that accounts
for multiple CPUs, reasonable recovery, and asynchronous execution is
non-trivial and very hard to test.  Actually you really can't test code
like this into working -- it has to be designed to work.  Otherwise you
will find out that in some environment the timing will cause it to fail
out of the blue.  Good luck debugging that...

For many activities there is a choice of MVS interfaces which could be
used -- choosing the wrong one or using one the wrong way may seem to
work but lead to future pain.

Then there is the system integrity problem.  There's no way to test
this -- again it goes back to the design and understanding of MVS and
the hardware.

Good examples are a problem.  IBM's code isn't perfect so some "examples"
aren't either.

Here's an example of imperfect code from MVS (MVS/XA or earlier?):

For an operator modify command, MGCR (SVC 34 command processing)
does something like:

  - ENQ on CSCB chain resource
  - scan CSCB chain for target job
  - queue CIB (modify command) to target CSCB
  - cross memory post CSCB communication ECB
  - DEQ CSCB chain resource

The system wait service allows a problem program to wait on its
communication ECB (even though it's not in user key).  QEDIT (again SVC
34) is the interface for a problem program to remove processed CIBs.
It does something like this:

  - ENQ on CSCB chain resource
  - find CIB on chain 
    - if found, remove it, unpost ECB
  - DEQ CSCB chain resource

There's a bug here (probably fixed, this was a long time ago) as it's
possible to wind up with a posted communications ECB but no CIBs chained
to the CSCB.

A problem program couldn't escape from this condition as it couldn't
clear the ECB and QEDIT won't clear it either.

The problem was likely a result of copying the MVT MGCR/QEDIT code
into MVS.  MVT had neither address spaces nor SRBs.  Very few changes
were likely made, the major one being changing the post to a cross
memory post.  Doesn't seem like a big change...

This is a huge change.  Two successive modify commands can both be
processed by the problem program before the 2nd cross memory post
completes, since the cross memory post can happen after MGCR DEQs the
CSCB resource.  So the problem program has used QEDIT to remove both
CIBs (clearing the ECB each time), and then the 2nd cross memory post
occurs, leaving the ECB posted with no CIBs chained.  Now QEDIT won't
clear the ECB...
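
To make the interleaving concrete, here is a toy, single-threaded model
of the sequence described above.  It is only a sketch under my own
assumptions: the names (Cscb, mgcr_modify, complete_post, qedit_free)
are made up for illustration, the ENQ/DEQ is implicit, and the
asynchronous cross memory post is modelled as a queue of pending posts
that complete later.  It is not MVS code and says nothing about how the
real MGCR/QEDIT are implemented.

```python
from collections import deque


class Cscb:
    """Toy stand-in for a CSCB: a CIB chain plus a communication ECB."""

    def __init__(self):
        self.cibs = deque()
        self.ecb_posted = False


def mgcr_modify(cscb, text, pending_posts):
    """Model of the MGCR steps: the CIB is queued while the chain
    resource is held, but the cross memory post is only scheduled."""
    # ENQ on CSCB chain resource (implicit in this single-threaded model)
    cscb.cibs.append(text)
    pending_posts.append(cscb)  # post completes at some later time
    # DEQ CSCB chain resource


def complete_post(pending_posts):
    """The scheduled cross memory post finally sets the ECB."""
    cscb = pending_posts.popleft()
    cscb.ecb_posted = True


def qedit_free(cscb):
    """Model of QEDIT: unpost the ECB only if a CIB was found."""
    if cscb.cibs:
        cscb.cibs.popleft()
        cscb.ecb_posted = False
        return True
    return False


def run_race():
    cscb = Cscb()
    pending = deque()
    mgcr_modify(cscb, "CMD1", pending)
    mgcr_modify(cscb, "CMD2", pending)
    complete_post(pending)   # 1st post lands, program wakes up
    qedit_free(cscb)         # program processes and frees CMD1
    qedit_free(cscb)         # ... and CMD2, ECB is now clear
    complete_post(pending)   # 2nd post lands late
    could_clear = qedit_free(cscb)  # nothing chained, ECB left alone
    return cscb.ecb_posted, len(cscb.cibs), could_clear
```

In this model run_race() ends with the ECB posted, zero CIBs chained,
and a QEDIT call that cannot clear the ECB, which is the stuck state
described above.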

I had this happen.  The UCLA/Mail MSERV STC used a modify command
to tell a WYLBUR user they had mail waiting.  WYLBUR waited on its
communication ECB and woke when it was posted to process the command.
This case resulted in WYLBUR looping at 100% CPU, since it kept waiting
on the communication ECB, which was still posted without any commands
queued.

As far as I remember, I reported this to IBM and got a "working as
designed".  As a minimal local fix I changed WYLBUR to detect the ECB
being posted without any CIBs queued and had it switch to key zero to
clear the ECB.  I could do this since WYLBUR was already authorized but
normally ran in problem key.
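
The shape of that local fix, in the same kind of toy model, might look
like the sketch below.  Again the names are hypothetical:
clear_ecb_in_key_zero just stands in for "switch to key zero, clear the
ECB, switch back", and service_wakeup is my guess at the general shape
of a wake-up handler, not the actual WYLBUR code.

```python
from collections import deque


class Cscb:
    """Toy stand-in for a CSCB: a CIB chain plus a communication ECB."""

    def __init__(self):
        self.cibs = deque()
        self.ecb_posted = False


def clear_ecb_in_key_zero(cscb):
    """Hypothetical stand-in for the authorized path that clears the
    ECB directly; a problem-key program could not do this."""
    cscb.ecb_posted = False


def service_wakeup(cscb, handle):
    """Called when the wait on the communication ECB is satisfied."""
    handled = 0
    while cscb.cibs:
        handle(cscb.cibs.popleft())
        handled += 1
        cscb.ecb_posted = False  # QEDIT unposts when it removes a CIB
    if handled == 0 and cscb.ecb_posted:
        # Posted with nothing chained: the stuck state.  Without this
        # the caller would wait, wake immediately, and loop forever.
        clear_ecb_in_key_zero(cscb)
    return handled
```

A wake-up with CIBs chained behaves as before; only the "posted but
empty" case takes the key zero path.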

I later heard that IBM HSM had a similar problem so IBM has likely
fixed it.
