The default blocksize is 128K. If you are using mirrors, then each block on disk will be 128K whenever possible. But if you're using raidzN with M data disks (M disks of useful capacity + N disks of redundancy), then the piece of each block stored on each individual disk will be 128K / M. Right? This is one of the reasons the raidzN resilver code is inefficient: for every block you end up waiting for the slowest seek of any one disk in the vdev, and when that's done, the amount of data you were able to process was at most 128K. Rinse and repeat.
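To put numbers on the 128K / M arithmetic above, here is a minimal Python sketch. It assumes the simplified model described in this post: a full-size block split evenly across the M data disks, ignoring parity, sector-size rounding and padding. The function name `per_disk_chunk_kib` is made up for illustration and is not part of any ZFS tool.

```python
def per_disk_chunk_kib(block_kib, data_disks):
    """Size in KiB of the piece of one block that lands on each data disk.

    Simplified model (an assumption, not the real allocator): the block is
    split evenly across the data disks; parity and padding are ignored.
    """
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return block_kib / data_disks


if __name__ == "__main__":
    # Default 128K block on raidz vdevs with different data-disk counts.
    for m in (2, 4, 8):
        print(f"M={m}: {per_disk_chunk_kib(128, m)} KiB per disk")
    # Scaling the block to 128K * M brings each disk back up to 128K.
    m = 4
    print(f"M={m}, block=128K*M: {per_disk_chunk_kib(128 * m, m)} KiB per disk")
```

Under this model, a mirror corresponds to M = 1, so each disk holds the whole 128K block.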
Would it not be wise, when creating raidzN vdevs, to increase the blocksize to 128K * M? Then the on-disk blocksize for each disk could be the same as the mirror on-disk blocksize of 128K. It still won't resilver as fast as a mirror, but the raidzN resilver would be accelerated by as much as M times. Right?

The only disadvantage that I know of would be wasted space. Every 4K file in a mirror can waste up to 124K of disk space, right? And in the above described scenario, every 4K file in the raidzN can waste nearly 128K * M of disk space, right? Also, if you have a lot of these sparse 4K blocks, then the resilver time doesn't actually improve either. You perform one seek, and regardless of whether you fetch 128K or 128K * M, you still paid one maximum seek time to fetch 4K of useful data.

Point is: if the goal is to reduce the number of on-disk slabs, and therefore reduce the number of seeks necessary to resilver, one thing you could do is increase the pool blocksize, right? YMMV, and YM will depend on how you use your pool. Hopefully you're able to bias your usage in favor of large block writes.
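The "as much as M times" estimate can be sanity-checked with a rough back-of-the-envelope model, sketched below in Python. Everything here is an assumption for illustration: one seek per on-disk chunk, a fixed 8 ms seek, 100 MiB/s sequential throughput, and every block completely full. The function name `resilver_seconds` and its parameters are hypothetical; this is not how ZFS actually schedules resilver I/O.

```python
def resilver_seconds(data_kib_per_disk, chunk_kib, seek_ms=8.0, mib_per_s=100.0):
    """Rough time to read one disk's share of the data during a resilver.

    Assumed model: each on-disk chunk costs one seek plus its transfer
    time. The seek time and throughput defaults are placeholders, not
    measurements.
    """
    chunks = data_kib_per_disk / chunk_kib
    seek_time = chunks * seek_ms / 1000.0
    transfer_time = (data_kib_per_disk / 1024.0) / mib_per_s
    return seek_time + transfer_time


if __name__ == "__main__":
    m = 4                          # data disks in the raidz vdev
    per_disk_kib = 1024 * 1024     # 1 GiB of data on each disk
    small = resilver_seconds(per_disk_kib, 128 / m)  # 128K block: 32K per disk
    large = resilver_seconds(per_disk_kib, 128)      # 128K*M block: 128K per disk
    print(f"128K blocks:   {small:.1f} s")
    print(f"128K*M blocks: {large:.1f} s")
    print(f"speedup:       {small / large:.2f}x (upper bound is M = {m})")
```

In this model the speedup approaches M only while seeks dominate; the transfer time is the same in both cases, so the ratio always stays below M. It also does nothing for the sparse 4K case, since the number of seeks there is set by the number of blocks, not their maximum size.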
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss