On Jan 30, 2010, at 8:58 AM, matthew patton wrote:

> please forgive the 'stupid' question.

This is not a stupid question. It is actually a good question that is 
frequently asked.

> Aside from having a convenient hash table of checksums to consult and upon 
> detection of a collision knowing we are dealing with a duplicate, why 
> checksum data when the memory bus, PCI-e/x bus, sata/sas bus, and the hard 
> disk itself use Reed-Solomon (or similar) encoding to store/transmit ECC 
> along with the data?
> 
> Where is this "silent data corruption" supposed to occur? And is the 
> probability of preventing/catching an occurrence a realistically relevant 
> value?

I find that when people make this argument, they are assuming that each
component has a perfect implementation and 100% fault coverage.  The real
world isn't so lucky.
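
To make the point concrete, here is a minimal sketch in Python, not ZFS code.
All of the names (link_transfer, buggy_controller, write_block, block_is_intact)
are hypothetical, and SHA-256 merely stands in for whatever end-to-end checksum
is used. It assumes each link checks its own transfer correctly, while a bug in
a component between two links flips one bit in a buffer. Every per-link check
passes, yet only the checksum computed at the top of the stack notices the
damage.

```python
import hashlib
import zlib


def link_transfer(payload: bytes) -> bytes:
    """Hypothetical hop (bus, cable) that protects data only while in transit."""
    crc = zlib.crc32(payload)            # sender attaches a link-level check
    received = payload                   # the link itself delivers the bytes intact
    if zlib.crc32(received) != crc:      # receiver verifies the link-level check
        raise IOError("link error detected")
    return received


def buggy_controller(payload: bytes) -> bytes:
    """Hypothetical firmware bug: one bit flips in a buffer between two links."""
    damaged = bytearray(payload)
    damaged[0] ^= 0x01
    return bytes(damaged)


def write_block(data: bytes):
    """Compute the end-to-end checksum before the data leaves the host."""
    return data, hashlib.sha256(data).digest()


def block_is_intact(stored: bytes, checksum: bytes) -> bool:
    """Verify the end-to-end checksum after the data comes back."""
    return hashlib.sha256(stored).digest() == checksum


data = b"important block"
block, checksum = write_block(data)

# Path with no faults: both links pass and the end-to-end check passes.
clean = link_transfer(link_transfer(block))
ok_clean = block_is_intact(clean, checksum)

# Path with a fault between the links: both links still pass their own checks,
# because each one faithfully carries the bytes it was handed.
damaged = link_transfer(buggy_controller(link_transfer(block)))
ok_damaged = block_is_intact(damaged, checksum)
```

In this sketch neither call to link_transfer raises, so the per-hop protection
reports nothing wrong on either path. Only ok_damaged, computed end to end,
reveals the corruption.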

The seminal paper advocating end-to-end data protection is:
"End-to-End Arguments in System Design," by Saltzer, Reed, and 
Clark, MIT.
http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf

Like the best seminal papers, it is clear and concise.
 -- richard


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
