> We will be using Cyrus to store mail on 2540 arrays.
> 
> We have chosen to build 5-disk RAID-5 LUNs in 2
> arrays which are both connected to same host, and
> mirror and stripe the LUNs.  So a ZFS RAID-10 set
> composed of 4 LUNs.  Multi-pathing also in use for
> redundancy.

Sounds good so far:  lots of small files in a largish system with presumably 
significant access parallelism makes RAID-Z a non-starter, but RAID-5 should be 
OK, especially if the workload is read-dominated.  ZFS may aggregate small 
writes well enough that their performance is good too, provided Cyrus doesn't 
force them to be performed synchronously (and ZFS doesn't force them to disk 
synchronously on file close).  Even synchronous small writes could perform well 
if you mirror the ZFS small-update log (the ZIL):  flash - at least the kind 
with decent write performance - might be ideal for this, but if you want to 
steer clear of a specialized configuration, simply carving one small LUN out of 
each array and mirroring the pair should still offer a noticeable improvement 
over just placing the ZIL on the RAID-5 LUNs.  (You could use a RAID-0 stripe 
on each array if you were compulsive about keeping usage balanced; it would 
also be nice to be able to 'center' it on the disks, but that's probably not 
worth the management overhead unless the array makes it easy to do.)
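
For concreteness, a minimal sketch of that layout in zpool terms - the pool 
name and the cXtYdZ device names are placeholders for whatever CAM/MPxIO 
actually presents, so treat this as an illustration rather than a recipe:

    # Two RAID-5 LUNs per array, mirrored across the arrays and striped
    # (the ZFS "RAID-10" set described above); device names are hypothetical.
    zpool create mailpool \
        mirror c1t0d0 c2t0d0 \
        mirror c1t1d0 c2t1d0

    # One small LUN carved out of each array, mirrored as a separate intent
    # log, so synchronous small writes land there rather than on RAID-5.
    zpool add mailpool log mirror c1t2d0 c2t2d0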

> 
> My question is any guidance on best choice in CAM for
> stripe size in the LUNs?
> 
> Default is 128K right now, can go up to 512K, should
> we go higher?

By 'stripe size' do you mean the size of the entire stripe (i.e., your default 
above reflects 32 KB on each data disk, plus a 32 KB parity segment) or the 
amount of contiguous data on each disk (i.e., your default above reflects 128 
KB on each data disk for a total of 512 KB in the entire stripe, exclusive of 
the 128 KB parity segment)?
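
To make the two readings concrete for the 4 + 1 layout (trivial arithmetic, 
nothing array-specific assumed):

    # '128K' as the full stripe width:
    echo "$(( 128 / 4 )) KB per data disk, plus a 32 KB parity segment"   # 32
    # '128K' as the per-disk segment:
    echo "$(( 128 * 4 )) KB of data per full stripe, plus 128 KB parity"  # 512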

If the former, by all means increase it to 512 KB:  this will keep the largest 
ZFS block on a single disk (assuming that ZFS aligns them on 'natural' 
boundaries) and help read-access parallelism significantly in large-block cases 
(I'm guessing that ZFS would use small blocks for small files but still quite 
possibly use large blocks for its metadata).  Given ZFS's attitude toward 
multi-block on-disk contiguity, there might not be much benefit in going to even 
larger stripe sizes, though it probably wouldn't hurt noticeably either as long 
as the entire stripe (ignoring parity) didn't exceed 4 - 16 MB in size (all the 
above numbers assume the 4 + 1 stripe configuration that you described).
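
If you want to sanity-check the ZFS side of that alignment, the relevant knob 
is the dataset recordsize - 128 KB is both the default and the largest block 
ZFS will write, so a 512 KB data stripe puts exactly one maximal record on each 
of the 4 data disks.  The dataset name below is hypothetical:

    # 512 KB of data over 4 disks = 128 KB per disk = one full ZFS record.
    zfs get recordsize mailpool/cyrus
    zfs set recordsize=128K mailpool/cyrus   # the default, shown explicitly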

In general, per-disk stripe segments smaller than 1 MB don't make sense for 
*any* workload:  it takes only 10 - 20 milliseconds to transfer 1 MB from a 
contemporary SATA drive (the analysis for high-performance SCSI/FC/SAS drives 
is similar, since both their bandwidth and their latency improve), which is 
comparable to the 12 - 13 ms it takes on average just to position to it.  And 
you can still stream data at high bandwidth in parallel from the disks in an 
array as long as the client buffer has as many MB as the number of disks you 
need to stream from to reach the required bandwidth (you want 1 GB/sec?  no 
problem:  just use a 10 - 20 MB buffer and stream from 10 - 20 disks in 
parallel).  Of course, this assumes that higher software layers organize data 
storage to provide that level of contiguity to leverage...
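
The arithmetic behind those figures, as a back-of-envelope sketch (the drive 
numbers are assumptions about a typical contemporary SATA drive, not 
measurements from any particular array):

    MBps=70       # assumed sustained transfer rate
    seek_ms=13    # assumed average seek plus rotational latency
    echo "1 MB transfer: ~$(( 1000 / MBps )) ms vs ~${seek_ms} ms to position"
    # streaming 1 GB/sec from such drives:
    echo "need ~$(( 1000 / MBps + 1 )) drives, and about that many MB of buffer"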

- bill
 
 