On Jul 27, 2013, at 10:56 AM, Lennart Poettering <mzerq...@0pointer.de> wrote:
> 
> Well, I am pretty sure the burden must be on the file systems to report
> a useful estimate free blocks value in statfs()/statvfs().

tl;dr

Four VMs, each using one thinp LV. Each LV has a virtualsize of 1TB. The VG
backing those LVs is 1TB. If each LV is actually using only 150GB, 600GB is
allocated and the real free space in the VG is 400GB.

But how do you propose informing each VM of the real free space? Are they all
informed there's 400GB of free space? Or do you just do a simple scaling and
tell each of them 400GB/4 (100GB) is free?

OK well what if 2 of those VMs actively make use of snapshotting? The scaling 
approach quickly isn't going to work out for any of the VMs.
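
To put numbers on the example above, here is a rough sketch. It is not
anything LVM reports; the function names and the snapshot figures are made up
for illustration, and 1TB is treated as 1000GB.

```python
# Hypothetical helpers for illustration only; all sizes are in GB,
# with 1TB taken as 1000GB.

def pool_free(pool_size, used_per_lv):
    """Real free space in the pool: pool size minus what every LV has allocated."""
    return pool_size - sum(used_per_lv)

def naive_share(pool_size, used_per_lv):
    """The 'simple scaling' answer: split the real free space evenly across LVs."""
    return pool_free(pool_size, used_per_lv) / len(used_per_lv)

def overcommit_ratio(pool_size, virtual_sizes):
    """How much virtual size has been promised relative to the backing pool."""
    return sum(virtual_sizes) / pool_size

pool = 1000                   # the 1TB VG
virtual = [1000] * 4          # four LVs, 1TB virtualsize each
used = [150] * 4              # each LV really uses 150GB

print(pool_free(pool, used))             # 400
print(naive_share(pool, used))           # 100.0
print(overcommit_ratio(pool, virtual))   # 4.0

# Assumed figures: snapshots in two of the VMs pin another 100GB each.
used_with_snapshots = [250, 250, 150, 150]
print(naive_share(pool, used_with_snapshots))  # 50.0
```

The last line is the point: the two VMs that never took a snapshot see their
"share" halved by activity they know nothing about.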

I think the burden is on the virtual storage layer designer/implementer. They
shouldn't create 1TB virtualsize LVs when only 150GB is needed. The idea isn't
to use thinp to totally eliminate the need to ever grow an LV and the
underlying fs, but to reduce the need (perhaps significantly).



> Note that btrfs RAID is broken in a similar way: it will return the
> amount of actual free blocks to the user. Since if RAID is enabled each
> file however requires twice (or some other factor) the number of blocks
> the value is completely bogus. The btrfs RAID userspace API is simply
> broken.

It's a problem. I'm unconvinced it's broken.

As I mentioned earlier, a btrfs volume as a whole doesn't have a raid profile
set. The profile belongs to the subvolume (or possibly a file). Because the
work to enable per-subvolume or per-file raid profiles isn't done yet, the
profile is set at mkfs time. But this actually only sets the profile for the
default subvolume, not the whole file system. It just looks file-system-wide
for now. So you could argue that in the meantime, btrfs devs should punt, and
report free space similar to md and lvm raid.
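
As a rough sketch of what "punting" could mean for a single, volume-wide
profile: divide the raw free space by the number of copies each block needs.
The function name is hypothetical and this is not btrfs code, just the
arithmetic.

```python
def usable_free(raw_free_bytes, copies):
    """Estimate writable space when every data block is stored `copies` times.

    Hypothetical helper: `copies` would be 2 for a raid1-style data profile
    and 1 for single.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_free_bytes // copies

# 400GB of raw free blocks under a two-copy profile.
print(usable_free(400 * 10**9, 2))  # 200000000000
```

This only works while one factor applies to the whole file system, which is
exactly what per-subvolume or per-file profiles would take away.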

The long-term fix seems to require applications to make a more qualified
inquiry. Asking for the free space of a whole volume that the application may
not even have write permission for seems unreasonable. It should instead ask
for free space for a particular path. The actual write location might be a
directory with a quota that must be honored, or a subvolume with a raid1 data
profile set.
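
A minimal sketch of a path-qualified inquiry, using Python's os.statvfs()
wrapper around statvfs(). The helper name is made up. Whether the numbers
returned reflect a quota or a raid profile at that path depends entirely on
the file system; the sketch only shows pointing the question at the place the
program intends to write.

```python
import os

def free_bytes_for_path(path):
    """Bytes available to an unprivileged writer, as reported for `path`.

    Hypothetical helper: f_bavail counts free blocks available to
    non-root users, and f_frsize is the unit those blocks are counted in.
    """
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

# Ask about the directory we actually intend to write into,
# not the mount point of the whole volume.
target_dir = "."
print(free_bytes_for_path(target_dir))
```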

A program asking for free space on a whole volume is making a totally
ambiguous inquiry.


> The accepted way to get an estimate how much disk space is still
> available is statfs()/statvfs(), applications and admins rely on the
> values it returns. You cannot just break that and think you can get away
> with it.

Sorry, this is a half-empty vs. half-full problem. A solution won't be found by
disregarding the other perspective. By calling it broken, you're in effect
saying that to avoid breaking it we can't have per-subvolume or per-file raid.
And that's less acceptable than the original problem, which really is that some
programs are making unacceptably vague and grandiose inquiries about free space
availability.


> 
> The thin provisioning folks need to find a way to make this work, not
> userspace programmers. 

99.9% of userspace programs are writing out pretty small files, at a rate
that's fairly knowable. They are thus well behaved. A handful of applications
want to know how much free space there is, as if the answer entitles them to
use all or most of it. What about some other program that asks at the same
time?

I think the expectation that programs can get ballpark free space information
for a volume was probably always unreasonable; thin provisioning is just
making this clearer.

Most of the burden is on the user or implementer who creates virtualsize LVs
not to make them too big. And then I think there is some burden on programs to
make more qualified inquiries about available free space.


Chris Murphy


-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
