On 1/27/06, David Gaudine <[EMAIL PROTECTED]> wrote:
> I have to set up a system that is totally reliable w.r.t. data
> integrity. That is, if a disk (or anything else) fails, it's OK if the
> system is down for a few hours, but when it comes back up it has to be
> exactly as it was, i.e. I can't restore from the previous day's backup.
> The obvious solution is to use RAID level 1.

Hardly. RAID levels do not guarantee filesystem integrity. In fact, RAID has _nothing_ to do with it.

RAID is just one aspect of a high-availability subsystem, and the most common one. If you lose a hard drive, the system keeps on going. That is all RAID gets you with respect to reliability.

Now, if the machine were to crash from some form of hardware failure (CPU, RAM, cosmic radiation, etc.) or software bug (e.g. in the filesystem), RAID does nothing to maintain the validity of the data written to and stored on the drives. If a CPU overheats and starts writing crap to the filesystems, the RAID subsystem is just going to write the same crap to all drives, parity drives included.

RAID is also NOT a substitute for regular, proper backups. Don't even think about implementing a mission-critical system without some form of backup system. A thorough and tested disaster recovery plan is much preferable.

What you are looking for is, at the minimum, some form of fail-over clustering system: two or more machines essentially working in parallel. If one dies, corrupts itself, etc., it can be removed from the cluster without loss of data or service.

--
Noah Dain
"Single failures can occur for a variety of reasons that have nothing to do with a hardware defect, such as cosmic radiation ..." - IBM Thinkpad R40 maintenance manual, page 25
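
To make the point about mirroring concrete, here is a toy Python sketch. It is not how any real RAID implementation works; the `Mirror` class and the `flaky_cpu` fault are hypothetical names invented purely for illustration. It only shows that a mirror faithfully duplicates whatever it is handed, so surviving a disk failure says nothing about whether the surviving data is correct.

```python
import hashlib


class Mirror:
    """Toy stand-in for RAID 1: every write goes to both 'disks' unchanged."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, block, data):
        # The mirror has no idea whether the data is sane; it just copies it.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail_disk(self, index):
        # Losing one disk is the only failure a mirror is built to survive.
        self.disks[index] = None

    def read(self, block):
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("all disks failed")


def flaky_cpu(data):
    """Hypothetical fault: flips one bit before the data reaches the array."""
    return bytes([data[0] ^ 0x01]) + data[1:]


good = b"balance=1000"
# Checksum of what the application meant to write (assumed to be taken
# before the fault, e.g. by the sender of the data).
expected = hashlib.sha256(good).hexdigest()

mirror = Mirror()
mirror.write(0, flaky_cpu(good))

# Both copies agree with each other, so the mirror looks perfectly healthy.
copies_identical = mirror.disks[0][0] == mirror.disks[1][0]

# A disk dies; the system keeps going, as promised...
mirror.fail_disk(0)
survivor = mirror.read(0)

# ...but what survived is the corrupted data, not the intended data.
corruption_detected = hashlib.sha256(survivor).hexdigest() != expected

print(copies_identical, corruption_detected)
```

The mirror does its job (one disk lost, data still readable), yet only an independent check such as the checksum notices that the data was wrong from the start. That is why backups and a second machine are still needed on top of RAID.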