On Wed, 5 Mar 2025 18:01:01 -0500, Phil Smith III <li...@akphs.com> wrote:
>Rupert Reynolds wrote about taking down a system by compressing a PDS. What 
>stories can y'all share about times you or someone you worked with took down a 
>system in a way that made you SMH afterward?

Stupid outage I *fixed*, twice:

In the early days of VMLINK, it had a bug that would send it into a tight loop 
if nicknames contained a circular reference in their :list tags.  Our 
OfficeVision support team managed to put out an untested update with something 
like:

  :nick.OFFICE
    :list.OV

  :nick.OV
    :userid.SYSADMIN
    :addr.399
    :list.OFFICE

As soon as people started logging on, every single system froze up with 100% 
CPU usage.  I owned the server that managed the disk they put it on, and I was 
the one to think of a way to corrupt the file by overwriting it in place, so I 
got to spend the whole day waiting for logons to process and wrecking the file 
to free up each system, then restoring the old version.
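
For anyone wanting to catch this kind of thing before it ships, here is a minimal sketch of a circular-reference check, in Python rather than anything that ran on VM. It assumes a much-simplified NAMES syntax (tags of the form :tag.value, entries starting at :nick, values containing no colons) and it is not how VMLINK itself resolves :list tags. The names parse_names and find_cycle are made up for illustration.

```python
import re

# Simplified assumption: a tag is ":name." followed by a value with no colons.
TAG = re.compile(r":(\w+)\.([^:]*)")

def parse_names(text):
    """Map each nickname to the names in its :list tag (simplified parser)."""
    entries = {}
    current = None
    for tag, value in TAG.findall(text):
        tag = tag.lower()
        if tag == "nick":
            current = value.strip().upper()
            entries[current] = []
        elif tag == "list" and current is not None:
            entries[current].extend(v.upper() for v in value.split())
    return entries

def find_cycle(entries):
    """Return one circular chain of nicknames as a list, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {}

    def visit(nick, path):
        state[nick] = GREY
        path.append(nick)
        for target in entries.get(nick, []):
            seen = state.get(target, WHITE)
            if seen == GREY:
                # target is already on the current path: that is a loop
                return path[path.index(target):] + [target]
            if seen == WHITE and target in entries:
                found = visit(target, path)
                if found:
                    return found
        path.pop()
        state[nick] = BLACK
        return None

    for nick in entries:
        if state.get(nick, WHITE) == WHITE:
            found = visit(nick, [])
            if found:
                return found
    return None

SAMPLE = """
  :nick.OFFICE
    :list.OV

  :nick.OV
    :userid.SYSADMIN
    :addr.399
    :list.OFFICE
"""

if __name__ == "__main__":
    print(find_cycle(parse_names(SAMPLE)))
```

Run against the two entries above, the check reports the chain OFFICE, OV, OFFICE. List members that are not nicknames themselves are treated as plain userids and ignored.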

I added the NAMES files, on a special fixed weekly schedule, to our automated 
test process, which staged updates on a test disk for a week before putting 
them into production.  A few months later, they *removed* a nickname from a file, 
exposing the same problem that it had masked in *another* file.  Sadly, the 
test didn't cover that case, because:
  * VMLINK uses *all* matching NAMES files when filemode * is specified
    in the CONTROL file, not just the first in the search, and
  * the test disk was accessed ahead of the production disk, not in place of it.
I think there was some unrelated update to the file that became urgent and I 
promoted it for them manually in the afternoon.  IIRC the bad file was actually 
on the Y-disk.  Luckily, somebody recognized the symptom as soon as one system 
froze up *and* could get me onto an ID with write access to the Y-disk, and I 
was able to jump in and corrupt them all within a few minutes and not spend all 
night at work.
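
The gap in that staged test can be sketched roughly as follows, again in Python and under loud assumptions: that the first definition of a nickname found across all accessed files wins, and that the removed nickname was shadowing a circular one in another file. That is only one reading of what happened, not a description of VMLINK internals, and every name and file content below is hypothetical.

```python
def effective_entries(files_in_search_order):
    """Merge nickname tables; the earliest definition in the search wins (assumed)."""
    merged = {}
    for entries in files_in_search_order:
        for nick, targets in entries.items():
            merged.setdefault(nick, targets)
    return merged

def has_cycle(entries):
    """True if following :list members ever leads back to a nickname in progress."""
    done = set()

    def visit(nick, active):
        if nick in active:
            return True              # back on a nickname we are still expanding
        if nick in done or nick not in entries:
            return False             # already cleared, or a plain userid
        active.add(nick)
        looped = any(visit(t, active) for t in entries[nick])
        active.discard(nick)
        done.add(nick)
        return looped

    return any(visit(nick, set()) for nick in entries)

# Hypothetical contents: the old file defines OV harmlessly, masking a
# circular OV in another file; the update removes that definition.
old_file = {"OV": ["SYSADMIN"]}
new_file = {}
other_file = {"OFFICE": ["OV"], "OV": ["OFFICE"]}

# Test disk accessed AHEAD of production: the old file is still visible.
staged_view = effective_entries([new_file, old_file, other_file])
# After promotion the new file REPLACES the old one.
production_view = effective_entries([new_file, other_file])

if __name__ == "__main__":
    print("staged test sees a loop:", has_cycle(staged_view))
    print("production sees a loop: ", has_cycle(production_view))
```

Under those assumptions the staged view still contains the old masking definition, so the test passes, while the promoted view does not, which is why the test disk would have had to stand in place of the production disk to catch it.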

After that, I set up a convoluted process using an altered VMLINK CONTROL and 
renamed NAMES files in test, which has been the bane of my successors for the 
past two decades.  I keep offering to help them undo it, since there are hardly 
any updates to worry about anymore and that bug is long gone, but they're 
scared to touch it.

¬R

