> can you guess? wrote: ...
> > Most of the balance of your post isn't addressed in any detail
> > because it carefully avoids the fundamental issues that I raised:
>
> Not true; and by selective quoting you have removed my specific
> responses to most of these issues.

While I'm naturally reluctant to call you an outright liar, David, you have hardly so far in this discussion impressed me as someone whose presentation is so well-organized and responsive to specific points that I can easily assume that I simply missed those responses. If you happen to have a copy of that earlier post, I'd like to see it resubmitted (without modification).

> > 1. How much visible damage does a single-bit error actually do to the
> > kind of large photographic (e.g., RAW) file you are describing? If it
> > trashes the rest of the file, as you state is the case with jpeg, then
> > you might have a point (though you'd still have to address my second
> > issue below), but if it results in a virtually invisible blemish then
> > you most certainly don't.
>
> I addressed this quite specifically, for two cases (compressed raw vs.
> uncompressed raw) with different results.

Then please do so where we all can see it.

> > 2. If you actually care about your data, you'd have to be a fool to
> > entrust it to *any* single copy, regardless of medium. And once you've
> > got more than one copy, then you're protected (at the cost of very
> > minor redundancy restoration effort in the unlikely event that any
> > problem occurs) against the loss of any one copy due to a minor error -
> > the only loss of non-negligible likelihood that ZFS protects against
> > better than other file systems.
>
> You have to detect the problem first. ZFS is in a much better position
> to detect the problem due to block checksums.

Bulls***, to quote another poster here who has since been strangely quiet.
The vast majority of what ZFS can detect (save for *extremely* rare undetectable bit-rot and for real hardware (path-related) errors that studies like CERN's have found to be very rare - and you have yet to provide even anecdotal evidence to the contrary) can also be detected by scrubbing, and it's arguably a lot easier to apply brute-force scrubbing (e.g., by scheduling a job that periodically copies your data to the null device, if your system does not otherwise support the mechanism) than to switch your file system.

> > If you're relying upon RAID to provide the multiple copies - though
> > this would also arguably be foolish, if only due to the potential for
> > trashing all the copies simultaneously - you'd probably want to
> > schedule occasional scrubs, just in case you lost a disk. But using
> > RAID as a substitute for off-line redundancy is hardly suitable in the
> > kind of archiving situations that you describe - and therefore ZFS has
> > absolutely nothing of value to offer there: you should be using
> > off-line copies, and occasionally checking all copies for readability
> > (e.g., by copying them to the null device - again, something you could
> > do for your on-line copy with a cron job and which you should do for
> > your off-line copy/copies once in a while as well).
>
> You have to detect the problem first.

And I just described how to above - in a manner that also handles the off-line storage that you *should* be using for archival purposes (where ZFS scrubbing is useless).

> ZFS block checksums will detect problems that a simple read-only pass
> through most other filesystems will not detect.

The only problems that ZFS will detect that a simple read-through pass will not are those that I just enumerated above: *extremely* rare undetectable bit-rot and real hardware (path-related) errors that studies like CERN's have found to be very rare (like, none in the TB-sized installation under discussion here).
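The brute-force "copy to the null device" scrub described above can be sketched as a small script suitable for a cron job. This is only a minimal sketch: the cron schedule in the comment is an example, the script defaults to a throwaway demo directory when no path is given, and (to be fair to the other side of the argument) it only verifies that every sector is still *readable* - it cannot detect silently flipped bits the way ZFS block checksums can.

```shell
#!/bin/sh
# Brute-force scrub: read every file under DATA_DIR and discard the bytes,
# forcing the disk to deliver (and internally ECC-check) every sector.
# Real use: pass your archive path and schedule it, e.g. weekly in crontab:
#   0 3 * * 0  /usr/local/bin/readthrough-scrub.sh /archive
# With no argument it demonstrates itself on a fresh temp directory.
DATA_DIR="${1:-$(mktemp -d)}"
[ -n "${1:-}" ] || printf 'sample data\n' > "$DATA_DIR/sample.txt"

ERRLOG=$(mktemp)

# Copy each regular file to the null device; record any that fail to read.
find "$DATA_DIR" -type f -exec sh -c '
    for f in "$@"; do
        cp -- "$f" /dev/null || printf "%s\n" "$f"
    done
' scrub {} + > "$ERRLOG"

if [ -s "$ERRLOG" ]; then
    cat "$ERRLOG" >&2
    echo "read-through pass over $DATA_DIR: unreadable files listed above"
else
    echo "read-through pass over $DATA_DIR: all files readable"
fi

rm -f "$ERRLOG"
[ -n "${1:-}" ] || rm -rf "$DATA_DIR"   # clean up the demo directory
```

Run from cron, any "unreadable" lines land on stderr and are mailed to you by the cron daemon, which covers the "you have to detect the problem first" objection for whole-sector read failures on both on-line and off-line copies.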
> > In sum, your support of ZFS in this specific area seems very much
> > knee-jerk in nature rather than carefully thought out - exactly the
> > kind of 'over-hyping' that I pointed out in my first post in this
> > thread.
>
> And your opposition to ZFS appears knee-jerk and irrational, from this
> end. But telling you that will have no beneficial effect, any more than
> what you just told me about how my opinions appear to you. Couldn't we
> leave personalities out of this, in future?

When someone appears to be arguing irrationally, it's at least worth trying to straighten him out. But I'll stop - *if* you start addressing the very specific and quantitative issues that you've been so assiduously skirting until now.

...

> > > > And yet I know many people who have lost data in ways that ZFS
> > > > would have prevented.
> > >
> > > Specifics would be helpful here. How many? Can they reasonably be
> > > characterized as consumers (I'll remind you once more: *that's* the
> > > subject to which your comments purport to be responding)? Can the
> > > data loss reasonably be characterized as significant (to
> > > 'consumers')? Were the causes hardware problems that could
> > > reasonably have been avoided ('bad cables' might translate to
> > > 'improperly inserted, overly long, or severely kinked cables', for
> > > example - and such a poorly-constructed system will tend to have
> > > other problems that ZFS cannot address)?
> >
> > "Reasonably avoided" is irrelevant; they *weren't* avoided.
>
> > While that observation has at least some merit, I'll observe that you
> > jumped directly to the last of my questions above while carefully
> > ignoring the three questions that preceded it.
>
> You'll notice (since you responded) that I got to at least one of them
> by the end of the message.
The only specific example you gave later was loss of an entire disk - something which ZFS does not handle significantly better than conventional RAID (for volatile data, and including scrubbing) and is a poorer choice for handling than off-line copies (for archival data).

The statement of yours to which I was responding was "And yet I know many people who have lost data in ways that ZFS would have prevented." I suggest it was not unreasonable for me to have interpreted that sentence as including by implication "... and that conventional storage arrangements would *not* have prevented", since otherwise it's kind of pointless. If interpreted that way, your later example of loss of an entire disk does not qualify as any kind of answer to any of the questions that I posed (though it might have had at least tangential relevance if you had stated that the data loss occurred not because the only copy disappeared but because it occurred on a RAID which could not be successfully rebuilt due to read errors on the surviving disks - even here, though, the question then becomes whether it was still possible to copy most of the data off the degraded array, which it usually should be).

> And you cut out of your quotes what I specifically said about cables;
> that's cheating.

Not at all: there was no need to repeat it (unless you think people generally don't bother reading your own posts and would like others to try to help remedy that), because that was the one area in which you actually responded to what I had asked.

...

> > > Nearly everybody I can think of who's used a computer for more than
> > > a couple of years has stories of stuff they've lost.
> >
> > Of course they have - and usually in ways that ZFS would have been no
> > help whatsoever in mitigating.
>
> ZFS will help detect problems with marginal drives, cables, power
> supplies, controllers, memory, and motherboards.
None of which the CERN study found in significant numbers (save for their RAID controller's possible failure to report disk timeouts, but consumer systems - once again, the subject under discussion here - don't use RAID controllers).

> The block checksumming will show corruptions earlier on, and less
> ambiguously, giving you a warning that there's a problem to find and
> fix that you mostly didn't get before.

And mostly won't get afterward either, because the incidence of such errors (especially after you eliminate those which conventional scrubbing will expose) is so low. You just can't seem to grasp the fact that while this kind of error does occur, it occurs in such insignificant numbers that consumers *just won't care*. So while ZFS is indeed 'better' in this area, it's just not *sufficiently* better to make any difference to consumers (sure, once in a while one may get hit with something that ZFS could have prevented, but for every such occurrence there'll be hundreds or thousands of comparable problems that ZFS couldn't help at all with).

This is getting sufficiently tedious that I'm done with this part of the discussion unless you manage to respond with some actual substance.

- bill

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss