One of the key points here is that people seem focused on two types of
errors:

        1. Total drive failure
        2. Bit rot

Traditional RAID solves #1.  Reed-Solomon ECC found in all modern drives
solves #2 for all but the most extreme cases.

The real problem is the rising complexity of firmware in modern drives
and the reality of software bugs.  Misdirected reads, misdirected
writes, and phantom writes are all real phenomena, and while they are
more prevalent in SATA and commodity drives, they are by no means
restricted to the low end.  This type of corruption happens everywhere
and is undetectable by drive firmware.  We've seen these failures in
SCSI, FC,
and SATA drives.  At a large storage company, a common story related to
us was that they would see approximately one silently corrupted block
per 9 TB of storage (on high-end FC drives).  As mentioned previously,
traditional RAID can detect these failures, but cannot repair the
damaged data.
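
To make that concrete, here is a toy sketch of the idea -- this is not
ZFS code; the struct layout, the fletcher-like checksum, and the two
in-memory "copies" standing in for mirror/RAID-Z redundancy are all
invented for illustration.  Because the checksum is stored in the
parent block pointer rather than alongside the data it describes, a
phantom write leaves stale data that can no longer validate itself,
and the surviving good copy can be used to repair it:

/*
 * Toy sketch -- NOT ZFS code.  The struct layout, the fletcher-like
 * checksum, and the two in-memory "copies" standing in for mirror or
 * RAID-Z redundancy are all invented for illustration.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLKSZ   512

/* Simplified "parent block pointer": copy locations plus expected checksum. */
struct blkptr {
        uint8_t  copy[2][BLKSZ];        /* stand-ins for two disk locations */
        uint64_t cksum;                 /* checksum recorded by the parent  */
};

static uint64_t
checksum(const uint8_t *buf, size_t len)
{
        /* Fletcher-style running sums; real ZFS uses fletcher2/4 or SHA-256. */
        uint64_t a = 0, b = 0;
        for (size_t i = 0; i < len; i++) {
                a += buf[i];
                b += a;
        }
        return ((b << 32) ^ a);
}

/* Write both copies and record the checksum in the parent pointer. */
static void
write_block(struct blkptr *bp, const uint8_t *data)
{
        memcpy(bp->copy[0], data, BLKSZ);
        memcpy(bp->copy[1], data, BLKSZ);
        bp->cksum = checksum(data, BLKSZ);
}

/* Read a copy, verify it against the parent's checksum, self-heal on error. */
static int
read_block(struct blkptr *bp, uint8_t *out)
{
        for (int c = 0; c < 2; c++) {
                if (checksum(bp->copy[c], BLKSZ) == bp->cksum) {
                        memcpy(out, bp->copy[c], BLKSZ);
                        /* Rewrite the other copy from the verified data. */
                        memcpy(bp->copy[1 - c], out, BLKSZ);
                        return (0);
                }
        }
        return (-1);    /* both copies fail verification: detected, not fixable */
}

int
main(void)
{
        struct blkptr bp;
        uint8_t data[BLKSZ], buf[BLKSZ];

        memset(data, 'A', BLKSZ);
        write_block(&bp, data);

        /*
         * Simulate a phantom write: copy 0 keeps stale contents even though
         * the drive reported the write as successful.
         */
        memset(bp.copy[0], 'Z', BLKSZ);

        if (read_block(&bp, buf) == 0 && memcmp(buf, data, BLKSZ) == 0)
                printf("corruption detected, repaired from good copy\n");
        else
                printf("unrecoverable corruption\n");
        return (0);
}

The point is simply that the checksum travels with the pointer, not
with the data, so a drive returning stale or misplaced data
"successfully" doesn't help that data pass verification.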

Also, as pointed out previously, ZFS can detect failures anywhere in
the data path, up to the point where the data reaches main memory (at
which point FMA takes over).  Once again, bad switches, cables, and
drivers are a fact of life.

There will always be a tradeoff between hardware RAID and RAID-Z.  But
saying that RAID-Z provides no discernible benefit over hardware RAID
is simply false, and has been disproven time and again by its ability to detect
and correct otherwise silent data corruption, even on top of hardware
RAID.

You are welcome to argue that people will make a judgement call and
choose performance/familiarity over RAID-Z in the datacenter, but that is
a matter of opinion that can only be settled by watching the evolution
of ZFS deployment over the next five years.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock