On Saturday, May 18, 2019 11:01:30 P.M. AEST Wols Lists wrote:
> On 17/05/19 06:19, Andrew Udvare wrote:
> >> On May 17, 2019, at 01:14, Adam Carter <adamcart...@gmail.com> wrote:
> >>
> >> The classic one is where OPS haven't noticed that disks in a RAID array
> >> have died years ago...
> >
> > This really happened?
>
> It's probably more common than you think.
>
> Can't tell (don't really know) the details, but I was told a story first
> hand about someone who went in to the computer room and asked "what are
> those flashing red lights?"
>
> Cue massive panic as ops suddenly realised that (a) it was the main
> billing server with terabytes of critical information and (b) the two
> flashing lights meant their terribly expensive raid-6 disk array was now
> running in raid-0!
And the even bigger worry is that the drive replacement and rebuild, which is
the whole point of using RAID, may itself fail. The degraded RAID keeps
working (so far), but a rebuild (unless it is *very* filesystem aware) needs
to read EVERY BLOCK on the surviving disks to reconstruct the failed
drive(s), and if it hits any unreadable blocks, even in unused areas of the
array, it may be unable to complete. I have seen this happen in previous
positions. It is not an easy thing to report to management, and the
unexpected downtime while you rebuild everything from backups onto new
drives can be extensive (and expensive).

This is why good RAID systems run a background task that regularly reads and
checks every block of every disk, so latent errors don't go undetected. Hot
spares are also a good safety measure, along with monitoring software that
alerts you when a spare has gone live.

--
Reverend Paul Colquhoun, ULC.  http://andor.dropbear.id.au/
Asking for technical help in newsgroups?  Read this first:
    http://catb.org/~esr/faqs/smart-questions.html#intro
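P.S. On Linux md you get the background check with
`echo check > /sys/block/mdX/md/sync_action` (most distros cron this), and
`mdadm --monitor` does the alerting. For anyone rolling their own check, here
is a minimal sketch (my own illustration, not from any particular tool) that
parses /proc/mdstat-style output and flags arrays whose member-status string
contains a `_` (a failed or missing device):

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a failed member.

    /proc/mdstat reports member health as e.g. [UU] (all up) or [U_]
    (one member down).  Minimal sketch only - not a replacement for
    `mdadm --monitor`.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r'^(md\d+)\s*:', line)
        if m:
            current = m.group(1)           # remember which array we're in
        status = re.search(r'\[[U_]+\]', line)
        if current and status and '_' in status.group(0):
            degraded.append(current)       # at least one member is down
            current = None                 # report each array once
    return degraded

# Hypothetical sample: md0 is a raid6 running with one member failed.
sample = """\
Personalities : [raid6] [raid1]
md0 : active raid6 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1953260544 blocks level 6, 64k chunk, algorithm 2 [4/3] [UU_U]

md1 : active raid1 sdf1[1] sde1[0]
      976630336 blocks [2/2] [UU]
"""
print(degraded_arrays(sample))  # -> ['md0']
```

Wire something like this into cron/mail and you find out about the flashing
red lights *before* someone wanders into the computer room.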