One of the key points here is that people seem focused on two types of errors:

1. Total drive failure
2. Bit rot

Traditional RAID solves #1. The Reed-Solomon ECC found in all modern drives solves #2 for all but the most extreme cases.

The real problem is the rising complexity of firmware in modern drives and the reality of software bugs. Misdirected reads, misdirected writes, and phantom writes are all real phenomena, and while they are more prevalent in SATA and commodity drives, they are by no means restricted to the low end. This type of corruption happens everywhere, and it is undetectable by drive firmware. We've seen these failures in SCSI, FC, and SATA drives. At a large storage company, a common story related to us was that they would see approximately one silently corrupted block per 9 TB of storage (on high-end FC drives).

As mentioned previously, traditional RAID can detect these failures, but it cannot repair the damaged data. Also, as pointed out previously, ZFS can detect failures in the entire data path, up to the point where the data reaches main memory (at which point FMA takes over). Once again, bad switches, cables, and drivers are a reality of life.
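To make the detect-and-repair distinction concrete, here is a rough sketch in C (emphatically not ZFS source) of the idea that makes this possible: the checksum for a block is stored in its parent block pointer, not alongside the data, so a misdirected or phantom write cannot produce a block that still verifies. All of the names below (blkptr_t, read_copy, rewrite_copy, the toy fletcher4) are simplified stand-ins for illustration:

/*
 * Rough sketch, NOT ZFS source: the checksum for a block lives in its
 * parent block pointer, not next to the data, so a misdirected or
 * phantom write cannot forge a block that still verifies.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    uint64_t cksum;    /* checksum held by the PARENT block */
    int      ncopies;  /* redundant copies (mirror or RAID-Z reconstruction) */
} blkptr_t;

/* Toy stand-in for the real Fletcher/SHA checksums ZFS can use. */
static uint64_t fletcher4(const uint8_t *buf, size_t len)
{
    uint64_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) { a += buf[i]; b += a; }
    return (b << 32) | (a & 0xffffffff);
}

/* Hypothetical device I/O: fetch or rewrite copy 'c' of a block. */
extern int read_copy(const blkptr_t *bp, int c, uint8_t *buf, size_t len);
extern int rewrite_copy(const blkptr_t *bp, int c, const uint8_t *buf, size_t len);

/*
 * Verified read: find a copy that matches the parent's checksum, then
 * repair any copy that doesn't.  Parity-only RAID can't do this pass,
 * because it has no independent record of what the data should be.
 */
int read_verified(const blkptr_t *bp, uint8_t *buf, size_t len)
{
    int good = -1;

    for (int c = 0; c < bp->ncopies; c++) {
        if (read_copy(bp, c, buf, len) == 0 &&
            fletcher4(buf, len) == bp->cksum) {
            good = c;
            break;
        }
        /* A mismatch here is exactly the corruption the drive missed. */
    }
    if (good < 0)
        return (-1);    /* every copy is bad: report it, never return garbage */

    /* Self-heal: re-check the other copies, rewriting any bad ones. */
    uint8_t *tmp = malloc(len);
    if (tmp != NULL) {
        for (int c = 0; c < bp->ncopies; c++) {
            if (c != good &&
                (read_copy(bp, c, tmp, len) != 0 ||
                 fletcher4(tmp, len) != bp->cksum))
                (void) rewrite_copy(bp, c, buf, len);
        }
        free(tmp);
    }
    return (0);
}

The key property is that the verification data takes a different path to disk than the data it describes, which is what lets ZFS catch misdirected and phantom writes that any amount of in-drive ECC will wave through.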
There will always be a tradeoff between hardware RAID and RAID-Z. But saying that RAID-Z provides no discernible benefit over hardware RAID is a lie, and it has been disproven time and again by ZFS's ability to detect and correct otherwise silent data corruption, even on top of hardware RAID. You are welcome to argue that people will make a judgment call and choose performance/familiarity over RAID-Z in the datacenter, but that is a matter of opinion that can only be settled by watching the evolution of ZFS deployment over the next five years.

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock