folks, please, chat on - don't make me stop you; we are all open folks.
[but darn] ok, thank you very much for the anticipation of something actually useful. Here is another thing I shared with MS Storage but not with you folks yet -- we win with real advantages, not lies, not scales, but only real know-how.

cheers, z

----- Original Message -----
From: "JZ" <j...@excelsioritsolutions.com>
To: "A Darren Dunham" <ddun...@taos.com>; <zfs-discuss@opensolaris.org>
Sent: Wednesday, January 14, 2009 7:38 PM
Subject: Re: [zfs-discuss] What are the usual suspects in data errors?

> darn, Darren, learning fast!
>
> best,
> z
>
> ----- Original Message -----
> From: "A Darren Dunham" <ddun...@taos.com>
> To: <zfs-discuss@opensolaris.org>
> Sent: Wednesday, January 14, 2009 6:15 PM
> Subject: Re: [zfs-discuss] What are the usual suspects in data errors?
>
>> On Wed, Jan 14, 2009 at 04:39:03PM -0600, Gary Mills wrote:
>>> I realize that any error can occur in a storage subsystem, but most
>>> of these have an extremely low probability. I'm interested in this
>>> discussion in only those that do occur occasionally, and that are
>>> not catastrophic.
>>
>> What level is "extremely low" here?
>>
>>> Many of those components have their own error checking. Some have
>>> error correction. For example, parity checking is done on a SCSI bus
>>> unless it's specifically disabled. Do SATA and PATA connections also
>>> do error checking? Disk sector I/O uses CRC error checking and
>>> correction. Memory buffers would often be protected by parity memory.
>>> Is there anything more that I've missed?
>>
>> Reports suggest that bugs in drive firmware could account for errors at
>> a level that is not insignificant.
>>
>>> What can go wrong with the disk controller? A simple seek to the
>>> wrong track is not a problem, because the track number is encoded on
>>> the platter. The controller will simply recalibrate the mechanism and
>>> retry the seek. If it computes the wrong sector, that would be a
>>> problem. Does this happen with any frequency?
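[Editor's aside: the per-sector CRC checking Gary describes can be sketched in miniature. This is an illustration of the general store-a-checksum-and-verify-on-read idea only; the helper names are hypothetical, and real drives use hardware ECC far stronger than CRC32.]

```python
# Sketch of per-sector CRC error detection: store a checksum alongside
# each block when writing, verify it when reading. Illustrative only --
# not any drive's actual firmware or on-platter ECC format.
import zlib

def write_sector(data: bytes):
    """Return the sector payload plus its CRC32, as recorded on write."""
    return data, zlib.crc32(data)

def read_sector(data: bytes, stored_crc: int) -> bytes:
    """Recompute the CRC on read; raise if the payload was corrupted."""
    if zlib.crc32(data) != stored_crc:
        raise IOError("CRC mismatch: sector is corrupt")
    return data

payload, crc = write_sector(b"important block")
assert read_sector(payload, crc) == b"important block"

# A single changed byte in transit or at rest is detected:
try:
    read_sector(b"importent block", crc)
except IOError as e:
    print(e)  # CRC mismatch: sector is corrupt
```

Note that a plain CRC like this only *detects* corruption; the correction Gary mentions requires a redundant code (ECC) or a second copy of the data, which is where ZFS comes in below.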
>>
>> NetApp documents certain rewrite bugs that they've specifically seen. I
>> would imagine they have good data on the frequency with which they see
>> it in the field.
>>
>>> In this case, ZFS
>>> would detect a checksum error and obtain the data from its redundant
>>> copy.
>>
>> Correct.
>>
>>> A logic error in ZFS might result in incorrect metadata being written
>>> with a valid checksum. In this case, ZFS might panic on import or might
>>> correct the error. How is this sort of error prevented?
>>
>> It's very difficult to protect yourself from software bugs with the same
>> piece of software. You can create assertions that are hopefully simpler
>> and less prone to errors, but they will not catch all bugs.
>>
>>> Some errors might result from a loss of power if some ZFS data was
>>> written to a disk cache but never written to the disk platter.
>>> Again, ZFS might panic on import or might correct the error. How is
>>> this sort of error prevented?
>>
>> ZFS uses a multi-stage commit. It relies on the "disk" responding to a
>> request to flush caches to the disk. If that assumption is correct,
>> then there is no problem in general with power issues. The disk is
>> consistent both before and after the cache is flushed.
>>
>> --
>> Darren
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
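[Editor's aside: the "detect a checksum error and obtain the data from its redundant copy" behavior discussed above can be sketched as follows. This is a toy model of ZFS-style self-healing reads under the assumption of a two-way mirror; the class and function names are invented for illustration and are not ZFS code. Real ZFS keeps each block's checksum in the parent block pointer and supports several checksum algorithms (fletcher variants, SHA-256).]

```python
# Toy model of self-healing reads: each stored block has a checksum kept
# "above" it; on mismatch, the other mirror copy is verified, returned,
# and used to repair the bad copy. Illustrative names, not ZFS internals.
import hashlib

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

class Mirror:
    """Two copies of one block, each independently corruptible."""
    def __init__(self, block: bytes):
        self.copies = [bytearray(block), bytearray(block)]
        self.expected = checksum(block)  # stored with the parent, not the block

    def read(self) -> bytes:
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("unrecoverable: both copies fail checksum")
        # Self-heal: rewrite any copy that did not verify.
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) != self.expected:
                self.copies[i] = bytearray(good)
        return good

m = Mirror(b"metadata block")
m.copies[0][0] ^= 0xFF                 # silently corrupt one copy
assert m.read() == b"metadata block"   # the read still succeeds
assert m.copies[0] == m.copies[1]      # and the bad copy was repaired
```

This also shows why Darren's caveat about software bugs matters: if buggy code writes wrong metadata and then checksums it, both copies verify cleanly, and the checksum machinery sketched here cannot notice anything wrong.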