Date: Wed, 30 Apr 2025 09:42:31 +0200
From: Edgar Fuß <e...@math.uni-bonn.de>
Message-ID: <abhuz8qrvdw8t...@trav.math.uni-bonn.de>
| > Note that the numbers in parentheses [of the fsck output] are what is free
| Oooooops. I never understood it that way.

Take Greg's filesystem:

    603427 files, 48107560 used, 11047434 free
        (184530 frags, 1357863 blocks, 0.3% fragmentation)

There are 11047434 free fragments, which are composed of 184530 frags
that come from blocks where at least one frag is allocated, and 1357863
full sized blocks not used at all:

    8 * 1357863 + 184530 == 11047434

That the calculation works out says there are 8 frags/block (which one
would normally work out by doing (free - frags) / blocks) ... except
that using dumpfs is a much easier method to discover that value, and
dumpfs also tells you what the block and frag sizes actually are, not
just the number of frags/block, which is all you can deduce from the
line above alone.

The 0.3% is the percentage of all filesystem blocks that are currently
part allocated, and so not available to be used as a full sized data
block:

    184530 / (1357863 * 8 + 48107560) * 100  (== 0.31)

If that value starts getting too high, the filesystem's block size is
probably too big for the data stored in it.

| > Perhaps surprisingly, the filesystem doesn't really bother keeping
| > track of how much of anything is allocated
| Yes, but fsck could.

It could, when it does a full scan, but that's not its job.  fsck's
purpose is to validate (and, when possible and required, fix) the
filesystem structure.

A little extra work in dumpfs (which would make it slower, perhaps a
lot; no idea, I haven't tried it) might allow it to provide that info
though; that would be a better place to put code for something like
that.  It is actually almost there already: with some care you can work
out the fragments for each file from dumpfs -i output (its "blocks"
value is in 512 byte blocks, DEV_BSIZE; fragments are at least 2 of
those, as the smallest frags allowed are 1K - but how many depends upon
the filesystem parameters).
Given the block size, frag size, file size, and blocks allocated, the
number of frags assigned to each file can be calculated (I think).
Only consider files smaller than 12 blocks (filesystem bsize blocks),
then ignore all multiples of that block size in the allocated block
count (those are full blocks); the remainder is probably how many
actual frags that file contains.  (Keep track of the units being used
when doing this: 512 byte blocks, frag sized blocks, fs_bsize sized
blocks, ...)

| > If read speed is more important than write speed, then bigger stripes
| > make more sense. [[I corrected my typo...]]
| Why is that so?

Just because, in general, bigger reads equate to faster overall read
speed (fewer rotational delays waiting for the first block of a
sequence to appear).  Of course this only applies to rotating media;
SSDs are entirely different, but even there, there's less overhead in
doing one large read than several smaller ones.

The hope is that when raidframe reads a stripe, it might contain data
for 2 or more blocks, so the next time a read is done by the
application, no actual I/O is needed.  (Of course, this can make
writing slower, as a block write needs to change just part of a
stripe.)

But as Mouse said, all this depends upon all kinds of factors, and the
only real way to know is to run (perhaps slightly cut down) versions
of your real workload, and measure yourself, using the hardware you
want to optimise and the data that you want to read (or write)
quickly.  Don't use benchmark applications - all they can ever achieve
is for the system to be optimised so the benchmark runs quickly, which
is almost never very much related to any real workload.  Only what you
will actually be using the system for is meaningful.

kre