The default blocksize is 128K.  If you are using mirrors, then each block on
disk will be 128K whenever possible.  But if you're using raidzN with a
capacity of M disks (M disks of useful capacity + N disks of redundancy),
then the piece of each block written to each individual disk will be
128K / M.  Right?  This is one of the reasons raidzN resilver is
inefficient:  for every block, you end up waiting for the slowest seek of
any one disk in the vdev, and when that's done, the amount of data you were
able to process is at most 128K.  Rinse and repeat.
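
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python.  It only models the arithmetic described above (one block split
evenly across M data disks); per_disk_chunk is a made-up helper for
illustration, not anything from ZFS itself, and it ignores parity, padding
and sector rounding.

```python
# Rough model of the claim above, NOT the real ZFS allocator.
KIB = 1024

def per_disk_chunk(blocksize, data_disks):
    """Bytes of one block that land on each data disk (hypothetical helper).

    Assumes the block is split evenly across the data disks; uses ceiling
    division so an uneven split rounds up.
    """
    return -(-blocksize // data_disks)

# data_disks=1 stands in for a mirror, where each disk holds the whole block.
for m in (1, 2, 4, 8):
    chunk = per_disk_chunk(128 * KIB, m)
    print(f"M={m}: {chunk // KIB}K per disk per block")
```

So with M=8, each seek on a raidz disk would fetch only 16K of that disk's
share of the block, versus 128K per seek on a mirror.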

 

Would it not be wise, when creating raidzN vdevs, to increase the blocksize
to 128K * M?  Then the on-disk chunk for each disk could be the same size as
the mirror's on-disk block of 128K.  It still won't resilver as fast as a
mirror, but the raidzN resilver would be accelerated by as much as M times.
Right?
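
Here is the same idea as a quick Python sketch, under the (big)
assumption that resilver time is dominated by one round of seeks per block
and that all blocks are full-sized.  seek_rounds and speedup are
hypothetical names, and the 1 TiB figure is just an example amount of data.

```python
# Seek-count model only; assumes one seek round per block, all blocks full.
KIB = 1024
TIB = 1024 ** 4

def seek_rounds(data_bytes, blocksize):
    """Number of blocks (and so seek rounds) needed to cover data_bytes."""
    return -(-data_bytes // blocksize)  # ceiling division

def speedup(data_disks, base_blocksize=128 * KIB, data_bytes=TIB):
    """Ratio of seek rounds at 128K versus at 128K * M."""
    before = seek_rounds(data_bytes, base_blocksize)
    after = seek_rounds(data_bytes, base_blocksize * data_disks)
    return before / after

print(seek_rounds(TIB, 128 * KIB))      # seek rounds with 128K blocks
print(seek_rounds(TIB, 128 * KIB * 4))  # seek rounds with 128K * 4 blocks
print(speedup(4))                       # best-case ratio for M=4
```

In this model the ratio comes out to M, which is where the "as much as M
times" figure comes from.  It is an upper bound; anything that isn't a
full-sized block pulls it down.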

 

The only disadvantage that I know of would be wasted space.  Assuming a
small file still occupies a full block, every 4K file in a mirror can waste
up to 124K of disk space, right?  And in the scenario described above, every
4K file in the raidzN can waste up to 128K * M - 4K of disk space, right?
Also, if you have a lot of these sparse 4K blocks, then the resilver time
doesn't actually improve either, because you perform one seek, and
regardless of whether you fetch 128K or 128K * M, you still paid one maximum
seek time to fetch 4K of useful data.
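
The waste arithmetic, sketched in Python.  This takes the premise above at
face value (a 4K file consumes one whole block), which may not match how
ZFS actually sizes blocks for small files; wasted_bytes and
useful_fraction are hypothetical helpers for illustration only.

```python
# Worst-case waste under the assumption that a file occupies whole blocks.
KIB = 1024

def wasted_bytes(file_size, blocksize):
    """Allocated space minus file size, if files consume whole blocks."""
    blocks = -(-file_size // blocksize)  # ceiling division
    return blocks * blocksize - file_size

def useful_fraction(file_size, blocksize):
    """Share of one fetched block that is actual file data."""
    return min(file_size, blocksize) / blocksize

m = 4  # example number of data disks
print(wasted_bytes(4 * KIB, 128 * KIB) // KIB)      # mirror, 128K blocks
print(wasted_bytes(4 * KIB, 128 * KIB * m) // KIB)  # raidz, 128K * M blocks
print(useful_fraction(4 * KIB, 128 * KIB))
print(useful_fraction(4 * KIB, 128 * KIB * m))
```

Either way it is one seek for 4K of useful data; the bigger block only
makes the useful fraction smaller.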

 

Point is:  If the goal is to reduce the number of on-disk slabs, and
therefore reduce the number of seeks necessary to resilver, one thing you
could do is increase the pool blocksize, right?  YMMV, and YM will depend on
how you use your pool.  Hopefully you're able to bias your usage in favor of
large block writes.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
