On Mon, May 10, 2010 at 3:53 PM, Geoff Nordli <geo...@gnaa.net> wrote:
> Doesn't this alignment have more to do with aligning writes to the
> stripe/segment size of a traditional storage array? The articles I am

It is a lot like a stripe / segment size. If you want to think of it in those terms, you've got a segment of 512 bytes (the iSCSI block size) and a width of 16, giving you an 8k stripe size. Any write that is less than 8k will require an RMW cycle, and any write in multiples of 8k will do "full stripe" writes. If the write doesn't start on an 8k boundary, you risk having writes span multiple underlying zvol blocks.

There's an explanation of WD's "Advanced Format" at Anandtech that describes the problem with 4k physical sectors, here: http://www.anandtech.com/show/2888. Instead of sector, think zvol block though.

When using a zvol, you've essentially got $volblocksize sized physical sectors, but the initiator sees the 512-byte block size that the LUN is reporting. If you don't block align, you risk having a write straddle two zfs blocks.

There may be some benefit to using a 4k volblocksize, but you'll use more time and space on block checksums, etc. in your zpool. I think 8k is a reasonable trade-off.

> reading suggests creating a small unused partition to take up the space up
> to 127bytes (assuming 128byte segment), then create the real partition from
> the 128th sector going forward. I am not sure how this would happen with
> zfs.

If you're using the whole disk with zfs, you don't need to worry about it. If you're using fdisk partitions or slices, you need to be a little more careful.

I made an attempt to 4k block align the SSD that I'm using for a slog / L2ARC, which in theory should line up better with the device's erase boundary. While not really pertinent to this discussion, it gives some idea of how to do it.

You want the filesystem to start at a point where ( $offset * $sector_size * $sectors_per_cylinder ) % 4096 = 0, with $offset counted in cylinders. For most LBA drives, you've got 16065 sectors/cylinder and 512-byte sectors, giving 8 as the smallest offset that will align.
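
If you'd rather check that arithmetic than trust it, here is a rough Python sketch. The function names are made up for illustration and don't come from any ZFS or Solaris tool. It assumes the offset is counted in whole cylinders and that the geometry numbers are the ones fdisk reports.

```python
from math import gcd


def is_aligned(offset_cylinders, sector_size, sectors_per_cylinder, boundary):
    """True if a start offset (in cylinders) lands on a multiple of boundary bytes."""
    return (offset_cylinders * sector_size * sectors_per_cylinder) % boundary == 0


def smallest_aligned_offset(sector_size, sectors_per_cylinder, boundary):
    """Smallest positive cylinder count whose byte offset is a multiple of boundary."""
    bytes_per_cylinder = sector_size * sectors_per_cylinder
    # The byte offset n * bytes_per_cylinder is divisible by boundary exactly
    # when n is a multiple of boundary / gcd(bytes_per_cylinder, boundary).
    return boundary // gcd(bytes_per_cylinder, boundary)


def blocks_touched(start_byte, length, block_size):
    """Number of block_size-sized blocks a write of length bytes at start_byte covers."""
    first = start_byte // block_size
    last = (start_byte + length - 1) // block_size
    return last - first + 1


if __name__ == "__main__":
    # 512-byte sectors, 16065 sectors/cylinder, 4k boundary
    print(smallest_aligned_offset(512, 16065, 4096))
    # A 4k write starting 6k into an 8k zvol block straddles two blocks
    print(blocks_touched(6144, 4096, 8192))
```

The same helper can be pointed at an 8k boundary (the volblocksize case) or at a different geometry, such as 240 heads and 63 sectors/track.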
    ( 8 * 512 * 16065 ) % 4096 = 0

First you have to look at fdisk (on an SMI labeled disk) and realize that you're going to lose the first cylinder to the MBR. When you then create slices in format, it'll report one cylinder less than fdisk did, so remember to account for that in your offset.

For an iscsi LUN used by a VM, you should align its filesystem on a zvol block boundary. Windows Vista and Server 2008 use 240 heads & 63 sectors/track, so they are already 8k block aligned. Linux, Solaris, and BSD also let you specify the geometry used by fdisk, but I wasn't comfortable doing it with Solaris since you have to create a geometry file first.

For my 30GB OCZ Vertex:

bh...@basestar:~$ pfexec fdisk -W - /dev/rdsk/c1t0d0p0
* /dev/rdsk/c1t0d0p0 default fdisk table
* Dimensions:
*    512 bytes/sector
*     63 sectors/track
*    255 tracks/cylinder
*   3892 cylinders
[..]
* Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect      Numsect
  191   128  0      1      1       254    63     1023    16065      62508915

bh...@basestar:~$ pfexec prtvtoc /dev/rdsk/c1t0d0p0
* /dev/rdsk/c1t0d0p0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*    3891 cylinders
*    3889 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*           0    112455    112454
*    62428590     48195  62476784
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00     112455   2056320   2168774
       1      4    01    2168775  60243750  62412524
       2      5    01          0  62508915  62508914
       8      1    01          0     16065     16064

-B

--
Brandon High : bh...@freaks.com
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss