On 2017-10-16 12:57, Zoltan wrote:
Hi,
On Mon, Oct 16, 2017 at 1:53 PM, Austin S. Hemmelgarn wrote:
you will need to scrub regularly to avoid data corruption
Is there any indication that a scrub is needed? Before actually doing
a scrub, is btrfs already aware that one of the devices did not
receive all data due to being unavailable for a brief time? If so,
which command shows this info in its output?
In an ideal situation, scrubbing should not be an 'only if needed'
thing, even for a regular array that isn't dealing with USB issues.
From a practical perspective, there's no way to know for certain whether a
scrub is needed short of reading every single file in the filesystem in
its entirety, at which point you're better off just running a scrub
(because if you _do_ need one, you'll otherwise end up reading everything twice).
If you insist on spot-checking things, you can check the output of
`btrfs device stats` for the filesystem. If any numbers there are
non-zero, then some file that you've accessed _since the last time you
reset the counters_ has corruption. If you go this route, make sure to
reset the counters with `btrfs device stats -z` _immediately_ after you
run a scrub, or in some way track their values externally to compare
against.
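To make that concrete, here's a minimal sketch of the spot-check cycle
(the mount point /mnt/data is just a placeholder for your filesystem):

    # Check the per-device error counters; non-zero values mean some
    # data accessed since the last reset hit corruption or I/O errors.
    btrfs device stats /mnt/data

    # Run a scrub in the foreground (-B waits for it to finish)...
    btrfs scrub start -B /mnt/data

    # ...and reset the counters immediately afterwards, so that any
    # future non-zero value is meaningful.
    btrfs device stats -z /mnt/data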
Additionally, how does btrfs scrub compare to btrfs balance
-dconvert=raid1,soft -mconvert=raid1,soft in this scenario? I would
suppose that if btrfs is aware that some data does not have a
replication count of 2, then a convert could fix that without a scrub
reading through the whole disk. On the other hand, while I would
expect btrfs scrub to find data with bad checksums, I would not expect
it to also balance in order to achieve the desired replication
count of 2 for all data. So do I need to run both a scrub and a
convert, or is a scrub enough?
It kind of depends. There are three things to deal with here:
1. Latent data corruption caused either by bit rot, or by a half-write
(that is, one copy got written successfully, then the other device
disappeared _before_ the other copy got written).
2. Single chunks generated when the array is degraded.
3. Half-raid1 chunks generated by newer kernels when the array is degraded.
Scrub will fix problem 1 because that's what it's designed to fix. It
will also fix problem 3, since that behaves just like problem 1 from a
higher-level perspective. It won't fix problem 2 though, because scrub
doesn't look at chunk types, only at whether the data in each chunk has
the correct number of valid copies.
In contrast, the balance command you quoted won't fix problem 1 (because
it doesn't validate checksums or check that data has the right number of
copies) or problem 3 (because it's been told to only operate on non-raid1
chunks), but it will fix problem 2.
In comparison to both of the above, a full balance without filters will
fix all three problems, although it will do so less efficiently (in terms
of both time and disk usage) than running a soft-conversion balance
followed by a scrub.
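In other words, the efficient recovery sequence once the array is healthy
again is the soft-conversion balance first, then the scrub. A sketch
(the /mnt/data mount point is again a placeholder):

    # Fix problem 2: rewrite any single chunks as raid1. The 'soft'
    # filter skips chunks that are already raid1, so when there's
    # nothing to convert this completes almost instantly.
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/data

    # Fix problems 1 and 3: verify all checksums and rewrite bad or
    # missing copies from the good ones.
    btrfs scrub start -B /mnt/data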
In the case of normal usage, device disconnects are rare, so latent data
corruption is the bigger worry. As a result, for most normal users, I
would suggest running the balance command you gave daily (it will usually
finish almost instantly, so there's no real cost to running it
frequently) and a scrub daily or weekly (this is the one that matters
more, since it's what catches latent corruption).
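As one way to set that up, entries in root's crontab along these lines
would do (the times and the /mnt/data mount point are placeholder
choices, and you may need the full path to the btrfs binary depending on
cron's PATH on your system):

    # Daily soft-conversion balance at 01:00; normally a near-instant no-op.
    0 1 * * * btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/data

    # Weekly scrub (Sunday 03:00, runs in the background); this is what
    # catches latent corruption.
    0 3 * * 0 btrfs scrub start /mnt/data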
For your use case though, I would instead suggest setting something up
to monitor the kernel log to watch for device disconnects, remount the
filesystem when the device reconnects, and then run the balance command
followed by a scrub. With most hardware I've seen, USB disconnects tend
to be relatively frequent unless you're using very high quality cabling
and peripheral devices. If, however, they happen less than once a day
most of the time, just set up the log monitor to remount, and run the
balance and scrub commands on the schedule I suggested above for normal
usage.
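I don't have a canned tool for this, but the general shape of such a
monitor might look like the sketch below. The journal match pattern, the
reconnect wait, and the /mnt/data mount point are all assumptions you'd
need to adapt to your own hardware and logs:

    #!/bin/sh
    # Watch kernel messages for USB disconnects, then remount and repair.
    journalctl -kf | while read -r line; do
        case "$line" in
        *"USB disconnect"*)
            # Crude placeholder for 'wait until the device reconnects';
            # a real script should poll for the device node instead.
            sleep 30
            mount -o remount /mnt/data &&
            btrfs balance start -dconvert=raid1,soft \
                -mconvert=raid1,soft /mnt/data &&
            btrfs scrub start -B /mnt/data
            ;;
        esac
    done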