On Tue, Dec 8, 2020 at 20:21 Chris Murphy (<li...@colorremedies.com>) wrote:
> On Tue, Dec 8, 2020 at 12:22 PM Sergio Belkin <seb...@gmail.com> wrote:
> >
> > Hi!
> > I've read the explanation about how much space is available using disks
> > with different sizes[1]. I understand the rules, but I see a contradiction
> > with the definition of RAID-1 in btrfs:
> >
> > «A form of RAID which stores two complete copies of each piece of data.
> > Each copy is stored on a different device. btrfs requires a minimum of two
> > devices to use RAID-1. This is the default for btrfs's metadata on more
> > than one device.»
> >
> > So, let's say we have 3 small disks: 4GB, 3GB, and 2GB.
>
> From the btrfs perspective, this is a 9G file system, with raid1
> metadata and data block groups. The "raidness" happens at the block
> group level; it is not at the device level like mdadm raid.
>
> Deep dive: Block groups are a logical range of bytes (variable size,
> typically 1G). Where, and on what drive, a file extent actually exists
> is a function of the block group to chunk mapping. I.e. a 1G data
> block group using the raid1 profile physically exists as two 1G chunks,
> one on each of two devices. What this means is that, internally, Btrfs
> sees everything as just one copy in a virtual address space, and it's
> a function of the chunk tree and allocator to handle the details of
> exactly where it's located physically and how it's replicated. It's
> normal to not totally grok this, it's pretty esoteric, but if there's
> one complicated thing to try to get about Btrfs, it's this. Because
> once you get it, all the other unique/unusual/confusing things start
> to make sense.
>
> Because the "pool" is 9G, and each 1G of data results in two 1G
> "mirror" chunks, one written to each of two drives, writes consume
> double the space. Two copies for raid1. The 'btrfs filesystem usage'
> command reveals this reality, whereas 'df' kinda lies to try and make
> it behave more like what we've come to expect from more conventional
> raid1 implementations. This lie works OK for an even number of
> same-size devices. It starts to fall apart [1] with an odd number of
> drives and odd-sized devices. So you're likely to run up against some
> remaining issues in 'df' reporting in this example.
>
> https://carfax.org.uk/btrfs-usage/
>
> Set three disks. On the right side, use the raid1 preset. Go down to
> device sizes and enter 4000,3000,2000, and it'll show you what
> happens.
>
> > If I create one file of 3GB I think that:
> > 3 GB is written on the 4GB disk; it leaves 1 GB free.
> > The 3 GB copy is written on the 3GB disk; it leaves 0 GB free.
>
> It's more complicated than that, because first it'll be broken up into
> three 1GB block groups (possibly more and smaller block groups), and
> then the allocator tries to maintain equal free space. That means
> it'll tend to initially write to the biggest and 2nd biggest drives,
> but it won't fill either of them up. It'll start writing to the
> smaller device once it has more free space than the middle device.
> And yep, it can split up chunks like this, sorta like Tetris.
>
> The example size 9G is perhaps not a great example of real-world
> allocation for btrfs raid1; I'd bump that to TB :) 9G is even below
> the threshold of USB sticks you can buy off the shelf these days.
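To check my own understanding of the "tries to maintain equal free
space" part, I wrote a toy Python simulation. The rule "always put the
two copies on the two devices with the most unallocated space" and the
fixed 1 GB chunk size are my simplifications, not the actual allocator
code:

# Toy simulation of btrfs raid1 chunk allocation across unevenly sized
# devices. Assumption (mine): every block group needs two chunks on two
# different devices, and the allocator always picks the two devices
# with the most unallocated space.
def simulate_raid1(sizes_gb, chunk_gb=1):
    free = list(sizes_gb)
    usable = 0
    while True:
        # Indices of the two devices with the most unallocated space.
        a, b = sorted(range(len(free)), key=lambda i: free[i], reverse=True)[:2]
        if free[b] < chunk_gb:   # no room left for the second copy
            break
        free[a] -= chunk_gb      # first copy of the chunk
        free[b] -= chunk_gb      # mirrored copy on a different device
        usable += chunk_gb       # only one chunk's worth is usable data
    return usable, free

print(simulate_raid1([4, 3, 2]))  # 4 GB usable, 1 GB stranded on one device

With 1 GB chunks it stops at 4 GB usable and strands 1 GB on one
device; with smaller chunks it converges on the 4.5 GB that the
calculator above reports.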
> > So, I create one file of 1GB that is written on the 4GB disk; it
> > leaves 0 GB free.
> > The 1 GB copy is written on the 2GB disk, so it leaves 1 GB free.
> >
> > So I've used 4GB; OK, it leaves 1 GB free on only one disk, but it
> > cannot be mirrored.
> >
> > However, as per [1] I could use 4.5 GB ((4GB+3GB+2GB)/2) instead of
> > 4GB. Surely I'm missing or mistaking something.
>
> Block groups and chunks. There's lots of reused jargon in btrfs that
> sounds familiar, but it's not the same as mdadm or lvm; they're just
> reused terms. Another example: raid1 and raid10 on btrfs don't work
> like you're used to with mdadm and LVM. I.e. raid10 on btrfs is not a
> "stripe of mirrored drives", it is "striped and mirrored block
> groups". man mkfs.btrfs has quite concise and important information
> about such things, and of course questions are welcome.
>
> So it's worth knowing a bit about how it works differently, so you
> can properly assess (a) whether it fits your use case and meets your
> expectations and (b) how to maintain and manage it, in particular
> disaster recovery. Because that too is different.
>
> [1]
> https://github.com/kdave/btrfs-progs/issues/277
>
> --
> Chris Murphy

Nice, I'm ruminating on the btrfs documentation :) The disk sizes in
the examples were just meant to keep things small, with only a few
files. man mkfs.btrfs has a nice table of examples, but AFAIK it only
covers disks of equal size; for example, under "Space Utilization" it
says 50% for raid1.
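Out of curiosity, I tried to sketch what that table might say for
unequal disks. Assuming the only hard limit is that no single device
can hold more data than all the others combined (my reading of the
calculator's results, not anything stated in the man page), and
ignoring chunk granularity and metadata/system reservations:

# Back-of-the-envelope usable space for btrfs raid1 (two copies).
# The min(total/2, total - largest) rule is my own approximation,
# not a formula from the btrfs documentation.
def raid1_usable(sizes_gb):
    total = sum(sizes_gb)
    return min(total / 2, total - max(sizes_gb))

print(raid1_usable([4, 4]))      # 4.0 -> the 50% case from the man page table
print(raid1_usable([4, 3, 2]))   # 4.5 -> the (4+3+2)/2 figure above
print(raid1_usable([10, 1, 1]))  # 2.0 -> an oversized device can't be fully mirrored

For equal-size disks this reduces to the 50% in the table; for the
mixed sizes above it matches what the carfax calculator shows.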
--
Sergio Belkin
LPIC-2 Certified - http://www.lpi.org