A very good point. In all of the storage products we produce (DASD and VTL) we use RAID 6, which can tolerate two drive failures, and we always have at least one hot spare that is inserted into the array automatically. Additionally, our online diagnostics send out an alert email reporting the drive failure and identifying which drive it is. So the probability of a single drive failure escalating to a complete system failure is extremely small.
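
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration only: the 12-drive array, the 24-hour rebuild window, and the 1,000,000-hour MTBF are hypothetical figures, not taken from any product, and the model treats drive failures as independent with a constant failure rate. It also ignores the hot spare. Same-age drives from one batch, as raised in the quoted message below, would make the real odds worse than this.

```python
import math

def p_fail_within(hours, mtbf_hours):
    """Probability that one drive fails within `hours` (exponential lifetime model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)

def p_at_least_k_of_n(n, k, p):
    """Binomial tail: probability that at least k of n independent drives fail."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical figures, chosen only to illustrate the arithmetic.
surviving = 11            # drives left in a 12-drive array after the first failure
rebuild_hours = 24.0      # assumed time to rebuild onto the hot spare
mtbf_hours = 1_000_000.0  # assumed per-drive MTBF

p = p_fail_within(rebuild_hours, mtbf_hours)

# RAID 5 loses the array if 1 more drive fails during the rebuild;
# RAID 6 loses it only if 2 more drives fail in the same window.
raid5_loss = p_at_least_k_of_n(surviving, 1, p)
raid6_loss = p_at_least_k_of_n(surviving, 2, p)

print(f"per-drive failure probability during rebuild: {p:.3e}")
print(f"RAID 5 array loss during rebuild: {raid5_loss:.3e}")
print(f"RAID 6 array loss during rebuild: {raid6_loss:.3e}")
```

Under these assumptions the RAID 6 figure comes out several orders of magnitude below the RAID 5 figure, which is the gap the second parity drive buys you. Correlated failures narrow that gap.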

Ken

Kenneth A. Bloom
CEO
Avenir Technologies Inc /d/b/a Visara International
203-984-2235
bl...@visara.com
www.visara.com

> On Jul 7, 2020, at 9:58 AM, John McKown <john.archie.mck...@gmail.com> wrote:
>
> On Tue, Jul 7, 2020 at 8:19 AM Jackson, Rob <rwjack...@firsthorizon.com>
> wrote:
>
>> Fun little note on RAID: it is fallible. The last Sunday of October 2016
>> I got a call bright and early because our VTS (TS7740) had shut down.
>> Turns out we had a "cache" HDD failure at around 4 AM, and then a second
>> one failed at around 7 AM, before the first one had been rebuilt on a
>> spare. RAID-5 could not accommodate it. Because of IBM politics, we had
>> no tape until Monday at 16:00. I am ashamed to say that I sort of took
>> tape for granted. It was astonishing how much of our processing depended
>> on it.
>>
>
> We had a similar problem occurs, long ago, with an actual SAN dasd array
> (for Windows, not MVS). Weekend backup to physical tape aborted on a
> Sunday. The Windows admin said "No problem, it's a RAID-5 array, I can fix
> it Monday morning." A few hours later, a disk in the array failed. No
> problem, right? Unfortunately, while the CE was on his way in to replace
> it, a second disk failed. The array was destroyed. Management said to
> repair it and reload from the Sunday backup and we'd be good. When the
> admin admitted that the backup failed and he didn't go in, he was
> immediately terminated. Now, what are the chances that 2 drives in an array
> will fail within hours? I don't know, but one thing many don't think about
> with a "new array" is that all the drives are likely the same age and will
> start to fail (if they are) about the same time.
>
> IMO, given my paranoia, I firmly believe that the disks in an array should
> be replaced on a scheduled basis. I also believe in dual tape copies of
> important tapes. And also, that tapes in "long term" retention (we have
> tapes which have been at Iron Mountain for over 10 years!) should be
> brought in and the data copied to a new (not reused) tape annually. Of
> course, the bean counters will have an apoplectic fit and scream about how
> much it costs to do this. They only understand cost, not value. I consider
> them the bane of existence. Likely auditors, they take on too much
> authority. Or as I have heard: Fire is a good servant but a terrible
> master.
>
>>
>> R.S. is spot on: make backups. Because of the trauma from this one
>> event, we now have a three-way VTS grid, synchronous-mirrored SANs, and two
>> mainframes on the floor.
>>
>> First Horizon Bank
>> Mainframe Technical Support
>>
>
> --
> People in sleeping bags are the soft tacos of the bear world.
> Maranatha! <><
> John McKown
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN