> On December 13, 2007 12:51:55 PM -0800 "can you guess?"
> <[EMAIL PROTECTED]> wrote:
>
> > ...
> >
> >> when the difference between an unrecoverable single bit error is not
> >> just 1 bit but the entire file, or corruption of an entire database
> >> row (etc), those small and infrequent errors are an "extremely big"
> >> deal.
> >
> > You are confusing unrecoverable disk errors (which are rare but orders
> > of magnitude more common) with otherwise *undetectable* errors (the
> > occurrence of which is at most once in petabytes by the studies I've
> > seen, rather than once in terabytes), despite my attempt to delineate
> > the difference clearly.
>
> No I'm not.  I know exactly what you are talking about.
Then you misspoke in your previous post by referring to "an unrecoverable
single bit error" rather than to "an undetected single-bit error", which I
interpreted as a misunderstanding.

> > Conventional approaches using scrubbing provide as complete protection
> > against unrecoverable disk errors as ZFS does: it's only the far rarer
> > otherwise *undetectable* errors that ZFS catches and they don't.
>
> yes.  far rarer and yet home users still see them.

I'd need to see evidence of that for current hardware.

> that the home user ever sees these extremely rare (undetectable) errors
> may have more to do with poor connection (cables, etc) to the disk,

Unlikely, since transfers over those connections have been protected by
32-bit CRCs since ATA busses went to 33 or 66 MB/sec (SATA has even
stronger protection), and SMART tracks the incidence of these errors
(which result in retries when detected), such that very high error rates
should be noticed before an error is likely to make it through the 2^-32
probability sieve (for that matter, you might well notice the performance
degradation due to the frequent retries).  I can certainly believe that
undetected transfer errors occurred in noticeable numbers in older
hardware, though: that's why the CRCs were introduced.

> and less to do with disk media errors.  enterprise users probably have
> better connectivity and see errors due to high i/o.

As I said, at most once in petabytes transferred.  It takes about 5 years
for a contemporary ATA/SATA disk to transfer 10 PB if it's streaming data
at top speed, 24/7; doing 8 KB random database accesses (the example that
you used) flat out, 24/7, it would take about 500 years (though most such
drives aren't spec'd for 24/7 operation, especially with such a
seek-intensive workload), and for a more realistic random-access database
workload it would take many millennia.  So it would take an extremely
large (on the order of 1,000 disks) and very active database before you'd
be likely to see one of these errors within the lifetime of the disks
involved.

> just thinking out loud.
>
> regardless, zfs on non-raid provides better protection than zfs on raid
> (well, depending on raid configuration), so just from the data integrity
> POV non-raid would generally be preferred.

That was the point I made in my original post here - but *if* the hardware
RAID is scrubbing its disks, the difference in data-integrity protection
is unlikely to be of any real significance, and one might reasonably elect
to use the hardware RAID if it offered any noticeable performance
advantage (e.g., by providing NVRAM that could expedite synchronous
writes).

> the fact that the type of error being prevented is rare doesn't change
> that, and i was further arguing that even though it's rare the impact
> can be high, so you don't want to write it off.

All reliability involves trade-offs, and very seldom are "all other things
equal".  Extremely low-probability risks are often worth taking if it
costs *anything* to avoid them (but of course are never worth taking if it
costs *nothing* to avoid them).

- bill
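
P.S.  Since the 2^-32 figure keeps coming up, here's a minimal Python
sketch of that arithmetic.  It assumes (simplistically) that a corrupted
transfer escapes a 32-bit CRC with probability 2^-32, and the one
detected error per day is a purely hypothetical rate picked for
illustration, not a measurement:

    # How many *detected* (and retried) transfer errors would you expect
    # before one corrupted transfer slips past a 32-bit CRC?  Assumes a
    # corrupted frame is equally likely to produce any CRC value, i.e.
    # escapes detection with probability 2**-32.

    P_UNDETECTED = 2.0 ** -32   # chance a corrupted transfer passes the CRC

    expected_detected_per_miss = 1.0 / P_UNDETECTED
    print("~%.1e detected errors per undetected one"
          % expected_detected_per_miss)

    # If SMART showed, say, one CRC-detected transfer error per day
    # (already a suspiciously flaky cable), the expected wait for a
    # single undetected error would be:
    errors_per_day = 1.0        # hypothetical rate, for illustration only
    days = expected_detected_per_miss / errors_per_day
    print("~%.0f million years at 1 detected error/day"
          % (days / 365.25 / 1e6))

In other words, a link would have to be throwing visibly huge numbers of
detected errors long before an undetected one became at all likely.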
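P.P.S.  And the transfer-time figures above, recomputed under assumed
circa-2007 drive parameters (~64 MB/s sustained streaming, ~80 random
IOPS at 8 KB per access - assumptions for a typical ATA/SATA disk of the
day, not spec-sheet numbers):

    # Rough check of the "years to move 10 PB" figures.

    PB_10 = 10 * 10**15             # 10 petabytes, in bytes
    SECONDS_PER_YEAR = 365.25 * 86400

    stream_rate = 64 * 10**6        # bytes/sec, assumed streaming rate
    print("streaming:   %.0f years"
          % (PB_10 / stream_rate / SECONDS_PER_YEAR))
    # -> roughly 5 years, running 24/7

    random_rate = 80 * 8192         # bytes/sec: ~80 IOPS * 8 KB
    print("8 KB random: %.0f years"
          % (PB_10 / random_rate / SECONDS_PER_YEAR))
    # -> roughly 500 years, running 24/7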