On 2017-10-19 07:01, Peter Grandi wrote:
> [ ... ]

> Oh please, please a bit less silliness would be welcome here.
> In a previous comment on this tedious thread I had written:

>> If the block device abstraction layer and lower layers work
>> correctly, Btrfs does not have problems of that sort when
>> adding new devices; conversely if the block device layer and
>> lower layers do not work correctly, no mainline Linux
>> filesystem I know can cope with that.
>>
>> Note: "work correctly" does not mean "work error-free".

> The last line is very important and I added it advisedly.

>> Even looking at things that way though, Zoltan's assessment
>> that reliability is essentially a measure of error rate is
>> correct.

> It is instead based on a grave confusion between two very
> different kinds of "error rate", confusion also partially based
> on the ridiculous misunderstanding, which I have already pointed
> out, that UNIX filesystems run on top of SATA or USB devices:

>> Internal SATA devices absolutely can randomly drop off the bus
>> just like many USB storage devices do,

> Filesystems run on top of *block devices* with a definite
> interface and a definite state machine, and filesystems in
> general assume that the block device works *correctly*.
They do run on top of USB or SATA devices; otherwise a significant
majority of the systems running Linux and/or BSD would not be
operating right now. Yes, they don't access those devices directly,
but the block layer isn't much more than command translation,
scheduling, and accounting, so that distinction is largely
meaningless here. It's also standard practice among most sane
sysadmins who aren't trying to be jerks, as well as most kernel
developers I've met, to refer to a block device connected via
interface 'X' as an 'X device' or an 'X storage device'.

>> but it almost never happens (it's a statistical impossibility
>> if there are no hardware or firmware issues), so they are more
>> reliable in that respect.

> What the OP was doing was using "unreliable" both for the case
> where the device "lies" and the case where the device does not
> "lie" but reports a failure. Both of these are malfunctions in a
> wide sense:
>
>    * The [block] device "lies" as to its status or what it has done.
>    * The [block] device reports truthfully that an action has failed.

> But they are of very different nature and need completely
> different handling. Hint: one is an extensional property and the
> other is a modal one; there is a huge difference between "this
> data is wrong" and "I know that this data is wrong".

> The really important "detail" is that filesystems are, as a rule
> with very few exceptions, designed to work only if the block
> device layer (and those below it) does not "lie" (see "Byzantine
> failures" below), that is "works correctly": reports the failure
> of every operation that fails and the success of every operation
> that succeeds and never gets into an unexpected state.

> In particular filesystem designs are nearly always based on the
> assumption that there are no undetected errors at the block
> device level or below. Then the expected *frequency* of detected
> errors influences how much redundancy and what kind of recovery
> are desirable, but the frequency of "lies" is assumed to be
> zero.
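
Some illustrative arithmetic (numbers invented): if detected,
independent per-copy read failures happen with probability p, then
n mirrored copies lose data only when every copy fails, i.e. with
probability about p**n -- and that only holds when every bad read
is actually reported:

    # p: per-copy probability that a read fails *and is reported*
    # (an honest EIO). Invented figure, for illustration only.
    p = 1e-6
    for n in (1, 2, 3):
        print(n, "copies -> all copies bad with probability ~", p ** n)
    # An unreported ("lying") failure breaks this model: the
    # filesystem returns the first copy it reads, wrong or not.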

> The one case where Btrfs does not assume that the storage layer
> works *correctly* is checksumming: it is quite expensive and
> makes sense only if the block device is expected to (sometimes)
> "lie" about having written the data correctly or having read it
> correctly. The role of the checksum is to spot when a block
> device "lies" and turn an undetected read error into a detected
> one (they could also be used to detect writes that succeeded but
> were misreported as having failed).
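
The mechanism is easy to sketch; here is a rough Python stand-in
using zlib's CRC-32 (Btrfs itself defaults to CRC-32C and keeps
checksums in a dedicated tree; details omitted):

    import errno
    import zlib

    def write_block(store, lba, data):
        # Compute and store a checksum at write time.
        store[lba] = (data, zlib.crc32(data))

    def read_block(store, lba):
        data, csum = store[lba]
        if zlib.crc32(data) != csum:
            # The device "lied": success status, wrong data. The
            # checksum turns that undetected error into a detected
            # one that the caller can act on.
            raise OSError(errno.EIO, "checksum mismatch")
        return data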

> The crucial difference that exists between SATA and USB is not
> that USB chips have higher rates of detected failures (even if
> they often do), but that in my experience SATA interfaces from
> reputable suppliers don't "lie" (more realistically, they have
> negligible "lie" rates), while USB interfaces (both host bus
> adapters and IO bus bridges) "lie" both systematically and
> statistically at non-negligible rates, and anyhow the USB mass
> storage protocol is not very good at error reporting and
> handling.
You do realize you just said exactly what I was saying, only in a
more general and much more verbose manner, explaining things that
are either well known and documented or not entirely relevant to
the thread in question?

For an end user, it generally doesn't matter which layer reported
an error or passed it on (or generated it); it matters whether the
error was corrected. If everything below the layer being discussed
(in this case the filesystem) produces uncorrected errors at a rate
deemed unacceptable for the given application, then it is
unreliable, regardless of whether those errors get corrected at
this layer or a higher one. Even with BTRFS on top of it, a SATA
connected hard drive that returns bogus data on 1% of reads is,
from a user perspective, just as unreliable as one that returns
read errors 1% of the time, even though BTRFS can handle both
(provided the user configures it correctly).
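
A toy illustration of that equivalence (invented 1% rates, two
mirrored copies, checksummed reads; nothing Btrfs-specific):

    import random

    def read_with_repair(copies, bad_rate):
        # bad_rate: per-copy chance a read is unusable, whether it
        # surfaced as an honest EIO or as bogus data caught by a
        # checksum -- once detected, recovery treats both alike.
        for data in copies:
            if random.random() >= bad_rate:
                return data  # a good copy was found
        raise OSError("no good copy left")

With two mirrors and bad_rate = 0.01, both copies are bad on only
about 0.01 * 0.01 = 1 in 10,000 reads; the user can't tell whether
that 1% was lies or honest EIOs, as long as something detected them.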