On Tue, Dec 8, 2020 at 20:21 Chris Murphy (<li...@colorremedies.com>) wrote:
> On Tue, Dec 8, 2020 at 12:22 PM Sergio Belkin <seb...@gmail.com> wrote:
> >
> > Hi!
> > I've read the explanation about how much space is available using disks
> > with different sizes[1]. I understand the rules, but I see a contradiction
> > with the definition of RAID-1 in btrfs:
> >
> > «A form of RAID which stores two complete copies of each piece of data.
> > Each copy is stored on a different device. btrfs requires a minimum of two
> > devices to use RAID-1. This is the default for btrfs's metadata on more
> > than one device.»
> >
> > So, let's say we have 3 small disks: 4GB, 3GB, and 2GB.
>
> From the btrfs perspective, this is a 9G file system, with raid1
> metadata and data block groups. The "raidness" happens at the block
> group level; it is not at the device level like mdadm raid.
>
> Deep dive: Block groups are a logical range of bytes (variable size,
> typically 1G). Where, and on what drive, a file extent actually exists
> is a function of the block group to chunk mapping. I.e. a 1G data
> block group using the raid1 profile physically exists as two 1G chunks,
> one on each of two devices. What this means is that, internally, Btrfs
> sees everything as just one copy in a virtual address space, and it's
> a function of the chunk tree and allocator to handle the details of
> exactly where it's located physically and how it's replicated. It's
> normal to not totally grok this, it's pretty esoteric, but if there's
> one complicated thing to try to get about Btrfs, it's this. Because
> once you get it, all the other unique/unusual/confusing things start
> to make sense.
>
> Because the "pool" is 9G, and each 1G of data results in two 1G
> "mirror" chunks, one written to each of two drives, writes consume
> double the space. Two copies for raid1. The 'btrfs filesystem usage'
> command reveals this reality, whereas 'df' kinda lies to try and make
> it behave more like what we've come to expect from more conventional
> raid1 implementations. This lie works OK for an even number of
> same-size devices. It starts to fall apart [1] with an odd number of
> drives and odd-sized devices. So you're likely to run up against some
> remaining issues in 'df' reporting in this example.
>
> https://carfax.org.uk/btrfs-usage/
>
> Set three disks. On the right side, use the raid1 preset. Go down to
> device sizes and enter 4000,3000,2000, and it'll show you what
> happens.
>
> > If I create one file of 3GB I think that:
> > 3 GB is written on the 4GB disk; it leaves 1 GB free.
> > The 3 GB copy is written on the 3GB disk; it leaves 0 GB free.
>
> It's more complicated than that, because first it'll be broken up into
> three 1GB block groups (possibly more and smaller block groups), and
> then the allocator tries to maintain equal free space. That means
> it'll tend to initially write to the biggest and 2nd biggest drives,
> but it won't fill either of them up. It'll start writing to the
> smaller device once it has more free space than the middle device.
> And yep, it can split up chunks like this, sorta like Tetris.
>
> The example size 9G is perhaps not a great example of real-world
> allocation for btrfs raid1; I'd bump that to TB :) 9G is even below
> the threshold of USB sticks you can buy off the shelf these days.
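To check my own understanding of the "tries to maintain equal free
space" part, I wrote a toy Python simulation. The rule "always put the
two copies on the two devices with the most unallocated space" and the
fixed 1 GB chunk size are my simplifications, not the actual allocator
code:

# Toy simulation of btrfs raid1 chunk allocation across unevenly sized
# devices. Assumption (mine): every block group needs two chunks on two
# different devices, and the allocator always picks the two devices
# with the most unallocated space.
def simulate_raid1(sizes_gb, chunk_gb=1):
    free = list(sizes_gb)
    usable = 0
    while True:
        # Indices of the two devices with the most unallocated space.
        a, b = sorted(range(len(free)), key=lambda i: free[i], reverse=True)[:2]
        if free[b] < chunk_gb:   # no room left for the second copy
            break
        free[a] -= chunk_gb      # first copy of the chunk
        free[b] -= chunk_gb      # mirrored copy on a different device
        usable += chunk_gb       # only one chunk's worth is usable data
    return usable, free

print(simulate_raid1([4, 3, 2]))  # 4 GB usable, 1 GB stranded on one device

With 1 GB chunks it stops at 4 GB usable and strands 1 GB on one
device; with smaller chunks it converges on the 4.5 GB that the
calculator above reports.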
> > So, I create one file of 1GB that is written on the 4GB disk; it
> > leaves 0 GB free.
> > The 1 GB copy is written on the 2GB disk, so it leaves 1 GB free.
> >
> > So I've used 4GB; OK, it leaves 1 GB free on only one disk, but it
> > cannot be mirrored.
> >
> > However, as per [1] I could use 4.5 GB ((4GB+3GB+2GB)/2) instead of
> > 4GB. Surely I'm missing or mistaking something.
>
> Block groups and chunks. There's lots of reused jargon in btrfs that
> sounds familiar, but it's not the same as mdadm or lvm; they're just
> reused terms. Another example: raid1 and raid10 on btrfs don't work
> like you're used to with mdadm and LVM. I.e. raid10 on btrfs is not a
> "stripe of mirrored drives", it is "striped and mirrored block
> groups". man mkfs.btrfs has quite concise and important information
> about such things, and of course questions are welcome.
>
> So it's worth knowing a bit about how it works differently, so you
> can properly assess (a) whether it fits your use case and meets your
> expectations and (b) how to maintain and manage it, in particular
> disaster recovery. Because that too is different.
>
> [1]
> https://github.com/kdave/btrfs-progs/issues/277
>
> --
> Chris Murphy

Nice, I'm ruminating on the btrfs documentation :) The disk sizes in
the examples were just meant to keep things small, with only a few
files. man mkfs.btrfs has a nice table of examples, but AFAIK it only
covers disks of equal size; for example, under "Space Utilization" it
says 50% for raid1.
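Out of curiosity, I tried to sketch what that table might say for
unequal disks. Assuming the only hard limit is that no single device
can hold more data than all the others combined (my reading of the
calculator's results, not anything stated in the man page), and
ignoring chunk granularity and metadata/system reservations:

# Back-of-the-envelope usable space for btrfs raid1 (two copies).
# The min(total/2, total - largest) rule is my own approximation,
# not a formula from the btrfs documentation.
def raid1_usable(sizes_gb):
    total = sum(sizes_gb)
    return min(total / 2, total - max(sizes_gb))

print(raid1_usable([4, 4]))      # 4.0 -> the 50% case from the man page table
print(raid1_usable([4, 3, 2]))   # 4.5 -> the (4+3+2)/2 figure above
print(raid1_usable([10, 1, 1]))  # 2.0 -> an oversized device can't be fully mirrored

For equal-size disks this reduces to the 50% in the table; for the
mixed sizes above it matches what the carfax calculator shows.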
--
Sergio Belkin
LPIC-2 Certified - http://www.lpi.org