Hi Peter,

indeed, I was not trying to say that RAID1 is insurance against disks going bad. It is only the first line of defense against sudden and unpredictable failure (and has saved us a couple of times). Beyond that, we regularly inspect /var/log/messages since (on RHELx) this has mdadm-related messages and also shows non-RAID disk and other errors. We also regularly run SMART tests, run mdadm RAID checks, look at webserver error logs, read security logs ... I didn't want to expand on this since it doesn't solve Vaheh's problem ...
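As a rough illustration of the kind of routine check meant here, the sketch below lists md arrays that are running degraded. It assumes the usual /proc/mdstat layout, in which the member-status field (for example [UU] or [U_]) shows an underscore for each missing or failed member; the function name degraded_arrays is made up for this example and is not part of mdadm.

```shell
#!/bin/sh
# Print the name of every md array whose member-status field, e.g. [U_],
# contains an underscore (a missing or failed member).
# $1: file in /proc/mdstat format (defaults to /proc/mdstat itself).
# NOTE: degraded_arrays is a hypothetical helper, not an mdadm command.
degraded_arrays() {
    awk '
        /^md[^ ]* :/      { name = $1 }
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    ' "${1:-/proc/mdstat}"
}

# Only report when the kernel exposes md status on this machine.
if [ -r /proc/mdstat ]; then
    for array in $(degraded_arrays); do
        echo "WARNING: $array is degraded" >&2
    done
fi
```

Something like this could be run from cron, but it only supplements reading the logs. A consistency check of an array is a separate step, normally started by writing "check" to /sys/block/mdX/md/sync_action as root.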
Best,
Kay

On 27 November 2019 17:53:37 CET, Peter Keller <pkel...@globalphasing.com> wrote:

Dear all,

On 27/11/2019 14:03, Kay Diederichs wrote:
> As an example, by default in my lab we have the operating system on mdadm
> RAID1 which consists of two disks that mirror each other. If one of the disks
> fails, typically we only notice this when inspecting the system log files.

This won't help Vaheh, but I highly recommend configuring notifications of changes in the state of the RAID in this kind of setup:

(1) configure /etc/mdadm.conf with (as a minimum) the MAILADDR keyword (see 'man mdadm.conf' for a complete list of keywords). Alternatively, use the PROGRAM keyword and write a script that uses something like 'wall' to notify users.

(2) test that notifications get through to their intended destination with 'mdadm --monitor --test'. If the local MTA (postfix, sendmail,...) isn't set up correctly, you may need to do that too.

(3) make sure that 'mdadm --monitor --scan' is running. Depending on distro, this will be done with the usual service enable and startup commands, something like

    systemctl enable --now mdmonitor

or

    chkconfig --add mdmonitor
    service mdmonitor start

And yes, you've guessed it, we got bitten once by a software RAID with multiple disk failures, and we only noticed it when an application complained that it couldn't write to a file ;-)

Regards,
Peter.

--
Peter Keller                         Tel.: +44 (0)1223 353033
Global Phasing Ltd.,                 Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom
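The script suggested for the PROGRAM keyword in step (1) could look roughly like the sketch below. According to the mdadm man page, the monitor runs the program with the event name, the array device and, for some events, a component device as arguments. The install path /usr/local/sbin/md-notify, the function name format_md_event and the mail address are placeholders chosen for this example.

```shell
#!/bin/sh
# Hypothetical handler for the PROGRAM keyword in /etc/mdadm.conf, e.g.
#   MAILADDR root@localhost               (placeholder address)
#   PROGRAM /usr/local/sbin/md-notify     (placeholder path to this script)
#
# mdadm --monitor calls the program as:
#   <program> EVENT ARRAY_DEVICE [COMPONENT_DEVICE]

# Build a one-line, human-readable message from the event arguments.
format_md_event() {
    event="$1"
    array="$2"
    component="${3:-}"
    msg="mdadm event '$event' on $array"
    if [ -n "$component" ]; then
        msg="$msg (component: $component)"
    fi
    printf '%s\n' "$msg"
}

# Broadcast only when called with at least an event and an array,
# which is how mdadm invokes the handler.
if [ "$#" -ge 2 ]; then
    format_md_event "$@" | wall
fi
```

For step (2), running 'mdadm --monitor --scan --test --oneshot' as root should generate one TestMessage event per array and then exit, which exercises both the MAILADDR and the PROGRAM route without leaving a second monitor running.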