On Wed, 5 Mar 2025 18:01:01 -0500, Phil Smith III <li...@akphs.com> wrote:

>Rupert Reynolds wrote about taking down a system by compressing a PDS. What
>stories can y'all share about times you or someone you worked with took down a
>system in a way that made you SMH afterward?

Stupid outage I *fixed*, twice:

In the early days of VMLINK, it had a bug that would send it into a tight loop if nicknames contained a circular reference in their :list tags. Our OfficeVision support team managed to put out an untested update with something like:

   :nick.OFFICE :list.OV
   :nick.OV :userid.SYSADMIN :addr.399 :list.OFFICE

As soon as people started logging on, every single system froze up with 100% CPU usage. I owned the server that managed the disk they put it on, and I was the one to think of a way to corrupt the file by overwriting it in place, so I got to spend the whole day waiting for logons to process and wrecking the file to free up each system, then restoring the old version.

I added the NAMES files with a special fixed weekly schedule to our automated test process that staged updates on a test disk for a week before putting them in production.

A few months later, they *removed* a nickname from a file, exposing the same problem that it had masked in *another* file. Sadly, the test didn't cover that case, because:

* VMLINK uses *all* matching NAMES files when filemode * is specified in the CONTROL file, not just the first in the search, and
* the test disk was accessed ahead of the production disk, not in place of it.

I think there was some unrelated update to the file that became urgent and I promoted it for them manually in the afternoon. IIRC the bad file was actually on the Y-disk. Luckily, somebody recognized the symptom as soon as one system froze up *and* could get me onto an ID with write access to the Y-disk, and I was able to jump in and corrupt them all within a few minutes and not spend all night at work.

After that, I set up a convoluted process using an altered VMLINK CONTROL and renamed NAMES files in test, which has been the bane of my successors for the past two decades. I keep offering to help them undo it, since there are hardly any updates to worry about anymore and that bug is long gone, but they're scared to touch it.

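For anyone who wants to picture the two failure cases, here is a rough Python sketch of a pre-promotion check for circular :list references. It is not how VMLINK or our test process actually worked. The parser handles only a simplified `:tag.value` layout, the "first definition of a nickname wins across files" rule is my assumption about how the masking behaved, and the names `parse_names`, `merge_first_wins` and `find_cycle` are made up for illustration.

```python
import re

# Simplified tag syntax: ":tag.value", value runs to the next colon.
# Real NAMES files have more rules; this is an illustrative assumption.
TAG = re.compile(r":(\w+)\.([^:]*)")


def parse_names(text):
    """Return {nickname: [list members]} from simplified NAMES-style text."""
    entries = {}
    current = None
    for tag, value in TAG.findall(text):
        tag, value = tag.lower(), value.strip()
        if tag == "nick":
            current = value.upper()
            entries.setdefault(current, [])
        elif tag == "list" and current is not None:
            members = [m.upper() for m in re.split(r"[,\s]+", value) if m]
            entries[current].extend(members)
    return entries


def merge_first_wins(*files):
    """Combine files in search order; an earlier definition masks later ones.

    The masking rule is an assumption made for this sketch.
    """
    merged = {}
    for entries in files:
        for nick, members in entries.items():
            merged.setdefault(nick, list(members))
    return merged


def find_cycle(entries):
    """Return one circular chain of nicknames, or None if there is none."""
    IN_PROGRESS, DONE = 1, 2
    state = {}

    def visit(nick, path):
        state[nick] = IN_PROGRESS
        path.append(nick)
        for member in entries.get(nick, []):
            if member not in entries:
                continue  # not a nickname (e.g. a plain userid)
            if state.get(member) == IN_PROGRESS:
                return path[path.index(member):] + [member]
            if member not in state:
                found = visit(member, path)
                if found:
                    return found
        path.pop()
        state[nick] = DONE
        return None

    for nick in entries:
        if nick not in state:
            found = visit(nick, [])
            if found:
                return found
    return None
```

Under those assumptions, the first outage is just `find_cycle` on the one bad file. The second is the case where the cycle only shows up after merging the promoted file with the other file in the search order, and stays hidden if the old production copy is still in the merge behind the test copy.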
¬R

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN