On Saturday, May 18, 2019 11:01:30 P.M. AEST Wols Lists wrote:
> On 17/05/19 06:19, Andrew Udvare wrote:
> >> On May 17, 2019, at 01:14, Adam Carter <adamcart...@gmail.com> wrote:
> >> 
> >> The classic one is where OPS haven't noticed that disks in a RAID array
> >> have died years ago...
> > This really happened?
> 
> It's probably more common than you think.
> 
> Can't tell (don't really know) the details, but I was told a story first
> hand about someone who went in to the computer room and asked "what are
> those flashing red lights?"
> 
> Cue massive panic as ops suddenly realised that (a) it was the main
> billing server with terabytes of critical information and (b) the two
> flashing lights meant their terribly expensive raid-6 disk array was now
> running in raid-0!


And the even bigger worry is that a drive replacement and rebuild, the whole 
point of using RAID, may fail. The degraded RAID is working (so far), but a 
rebuild (unless it is *very* file system aware) needs to read EVERY BLOCK on 
the surviving disks to reconstruct the failed drive(s), and if it encounters 
any unreadable blocks, even in unused areas of the array, it may be unable to 
complete the rebuild.
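The numbers are worse than intuition suggests. A minimal back-of-the-envelope sketch, assuming an illustrative unrecoverable read error (URE) rate of 1 per 1e14 bits (a figure commonly quoted for consumer drives, not a measurement) and a rebuild that must read 10 TB:

```python
import math

# Illustrative assumptions only: vendors often quote ~1 URE per 1e14 bits
# read for consumer drives; the 10 TB figure is an arbitrary example.
URE_RATE_PER_BIT = 1e-14       # assumed unrecoverable read error rate
BYTES_TO_READ = 10 * 10**12    # assume the rebuild must read 10 TB

bits = BYTES_TO_READ * 8
expected_errors = bits * URE_RATE_PER_BIT
# Poisson approximation: probability the rebuild hits at least one URE
p_failure = 1 - math.exp(-expected_errors)

print(f"Expected UREs during rebuild: {expected_errors:.2f}")   # 0.80
print(f"P(rebuild hits at least one URE): {p_failure:.1%}")     # 55.1%
```

Under those assumptions, better-than-even odds that a full-array read trips over at least one bad sector, which is exactly why latent errors in "unused" areas matter.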

I have seen this happen in previous positions. Not an easy thing to report to 
management, and the unexpected downtime to rebuild everything from backups 
onto new drives can be extensive (and expensive).

This is why good RAID systems have a background task (often called scrubbing or 
a patrol read) that regularly reads and checks every block of every disk, so 
that latent errors are found and repaired while the redundancy needed to fix 
them still exists.
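In spirit, a scrub is just "read every stripe, recompute the parity, compare". A toy in-memory sketch of that idea (the stripe list is a hypothetical stand-in for reading blocks off real disks; an actual scrub works at the device level):

```python
from functools import reduce

def scrub(stripes):
    """Toy RAID-5-style scrub: XOR each stripe's data blocks together and
    compare the result against the stored parity block. Returns the indices
    of stripes whose parity no longer matches (i.e. latent errors found).
    `stripes` is a list of (data_blocks, parity_block) tuples -- an
    illustrative stand-in, not how a real md/hardware scrub is driven."""
    bad = []
    for i, (data_blocks, parity) in enumerate(stripes):
        computed = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                          data_blocks)
        if computed != parity:
            bad.append(i)
    return bad
```

The point of running this regularly is that a mismatch found now, while the array is healthy, can be repaired from redundancy; the same bad block discovered mid-rebuild cannot.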

Hot spares are also a good safety measure, along with monitoring software that 
alerts you when a spare has been pulled into service.
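On Linux md, the monitoring half can be as simple as watching /proc/mdstat for a status string like [UU_], where '_' marks a failed or missing member. A rough sketch of that check (the sample format matches Linux md, but treat this as an illustration, not a robust parser; mdadm --monitor is the real tool for the job):

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose /proc/mdstat status string
    (e.g. [UU_U_]) contains '_', meaning a member has failed or is
    missing. Illustrative sketch only -- in production, use
    mdadm --monitor or your monitoring system's RAID check instead."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r'^(md\d+)\s*:', line)
        if m:
            current = m.group(1)
        status = re.search(r'\[([U_]+)\]', line)
        if status and current and '_' in status.group(1):
            degraded.append(current)
    return degraded
```

Wire something like this (or, better, mdadm's own monitor mode) to email or a pager, and those flashing red lights stop being a surprise.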


-- 
Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
     http://catb.org/~esr/faqs/smart-questions.html#intro