Steve <steve.jack...@norman.com> writes:

> I would like to ask a question regarding ZFS performance overhead when
> having hundreds of millions of files.
>
> We have a storage solution, where one of the datasets has a folder
> containing about 400 million files and folders (very small 1K files).
>
> What kind of overhead do we get from this kind of thing?

at least 50%. I don't think this is obvious, so I'll state it: RAID-Z
will not gain you any additional capacity over mirroring in this
scenario.

remember that each individual file gets its own stripe. if the file is
512 bytes or less, you'll need another 512 byte block for the parity
(with a single data block the XOR parity is identical to the data, so
it is effectively a copy). what's more, if the file is larger than 512
bytes but no more than 1024 bytes, ZFS will allocate an additional
padding block to reduce the chance of unusable single disk blocks, so
it consumes 2048 bytes of physical disk. a 1536 byte file will also
consume 2048 bytes of physical disk, however, since it needs no
padding block. the reasoning for RAID-Z2 is similar, except it will
add a padding block even for the 1536 byte file. to summarise (sizes
in bytes, with the ratio of allocated to net size):

   net     raid-z1        raid-z2
  --------------------------------
   512     1024  2x       1536  3x
  1024     2048  2x       3072  3x
  1536     2048  1⅓x      3072  2x
  2048     3072  1½x      3072  1½x
  2560     3072  1⅕x      3584  1⅖x

the above assumes at least 8 (9) disks in the vdev for RAID-Z1
(RAID-Z2), otherwise you'll get a little more overhead for the
"larger" filesizes.

> Our storage performance has degraded over time, and we have been
> looking in different places for cause of problems, but now I am
> wondering if it's simply a file pointer issue?

adding new files will fragment directories, and that might cause
performance degradation depending on access patterns. I don't think
having many files will cause problems in itself, but since you get a
lot more ZFS records in your dataset (128x, compared with data stored
in full 128 KiB records), more of the disk space is "wasted" on block
pointers, and you may get more block pointer writes since more levels
are needed.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
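
The per-file arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration, not ZFS code. It assumes 512 byte sectors, one set of parity sectors per row of data sectors, and that the total is rounded up to a multiple of (parity + 1) sectors, which is how the "padding block" is modelled here. The function name raidz_allocated and its parameters are made up for this example. Under those assumptions it matches every figure in the table except the 2560 byte RAID-Z2 case, where it gives 4608 bytes (9 sectors) rather than 3584, so that entry may be worth double-checking.

```python
SECTOR = 512  # assumed disk sector size in bytes


def raidz_allocated(size, nparity, ndisks, sector=SECTOR):
    """Estimate bytes allocated on RAID-Z for one block of `size` bytes.

    Hypothetical helper: models data sectors + parity sectors, rounded
    up to a multiple of (nparity + 1) sectors (the padding rule assumed
    in the lead-in).
    """
    data = max(1, -(-size // sector))        # data sectors, rounded up
    data_disks = ndisks - nparity            # disks holding data per row
    rows = -(-data // data_disks)            # rows needed for the data
    total = data + nparity * rows            # add parity sectors per row
    step = nparity + 1
    total = -(-total // step) * step         # pad to a multiple of step
    return total * sector


if __name__ == "__main__":
    for size in (512, 1024, 1536, 2048, 2560):
        z1 = raidz_allocated(size, nparity=1, ndisks=8)
        z2 = raidz_allocated(size, nparity=2, ndisks=9)
        print(f"{size:5d}  {z1:5d} {z1 / size:.2f}x  {z2:5d} {z2 / size:.2f}x")
```

Running the script prints the allocated size and the allocated-to-net ratio for each file size, for an 8 disk RAID-Z1 and a 9 disk RAID-Z2 vdev. Passing a smaller ndisks shows the extra overhead that narrower vdevs add for the larger sizes.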