> We will be using Cyrus to store mail on 2540 arrays.
>
> We have chosen to build 5-disk RAID-5 LUNs in 2
> arrays which are both connected to same host, and
> mirror and stripe the LUNs.  So a ZFS RAID-10 set
> composed of 4 LUNs.  Multi-pathing also in use for
> redundancy.
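Just to make sure I'm picturing the layout right, on the ZFS side that would be something along these lines (the pool and device names below are placeholders - substitute the actual cXtYdZ names your multipathed LUNs show up under):

    # 4 RAID-5 LUNs total: each ZFS mirror pairs one LUN from each array,
    # and ZFS stripes across the two mirrors ("RAID-10")
    zpool create mailpool \
        mirror array1_lun0 array2_lun0 \
        mirror array1_lun1 array2_lun1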
Sounds good so far: lots of small files in a largish system with presumably significant access parallelism makes RAID-Z a non-starter, but RAID-5 should be OK, especially if the workload is read-dominated. ZFS might aggregate small writes well enough that their performance would be good too, as long as Cyrus doesn't force them to be performed synchronously (and ZFS doesn't force them to disk synchronously on file close).

Even synchronous small writes could perform well if you mirror the ZFS small-update log (the ZIL). Flash - at least the kind with decent write performance - might be ideal for this, but if you want to steer clear of a specialized configuration, just carving one small LUN for the mirror out of each array should still offer a noticeable improvement over leaving the ZIL on the RAID-5 LUNs (you could use a RAID-0 stripe on each array if you were compulsive about keeping usage balanced, and it would be nice to be able to 'center' it on the disks, but that's probably not worth the management overhead unless the array makes it easy to do). See the P.S. below for a rough sketch.

> My question is any guidance on best choice in CAM for
> stripe size in the LUNs?
>
> Default is 128K right now, can go up to 512K, should
> we go higher?

By 'stripe size' do you mean the size of the entire stripe (i.e., your default above reflects 32 KB on each data disk, plus a 32 KB parity segment) or the amount of contiguous data on each disk (i.e., your default above reflects 128 KB on each data disk for a total of 512 KB in the entire stripe, exclusive of the 128 KB parity segment)?

If the former, by all means increase it to 512 KB: that keeps the largest ZFS block on a single disk (assuming that ZFS aligns blocks on 'natural' boundaries) and helps read-access parallelism significantly in large-block cases (I'm guessing that ZFS would use small blocks for small files but still quite possibly use large blocks for its metadata). Given ZFS's attitude toward multi-block on-disk contiguity there might not be much benefit in going to even larger stripe sizes, though it probably wouldn't hurt noticeably either, as long as the entire stripe (ignoring parity) didn't exceed 4 - 16 MB. (All the numbers above assume the 4 + 1 stripe configuration that you described.)

In general, per-disk stripe segments smaller than 1 MB don't make sense for *any* workload: it only takes 10 - 20 milliseconds to transfer 1 MB from a contemporary SATA drive (the analysis for high-performance SCSI/FC/SAS drives is similar, since both bandwidth and latency improve), which is comparable to the 12 - 13 ms it takes on average just to position to it. And you can still stream data at high bandwidth in parallel from the disks in an array as long as you have a client buffer as large in MB as the number of disks you need to stream from to reach the required bandwidth (you want 1 GB/sec? no problem: just use a 10 - 20 MB buffer and stream from 10 - 20 disks in parallel). Of course, this assumes that higher software layers organize data storage to provide that level of contiguity to leverage...

- bill
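P.S. A rough sketch of the mirrored-ZIL suggestion above, again with placeholder names (one small LUN carved out of each array, so the log mirror spans both arrays just like the data does):

    # hypothetical device names; adds the pair as a separate intent log
    # so synchronous small writes stay off the RAID-5 data LUNs
    zpool add mailpool log mirror array1_smalllun array2_smalllun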