On 11/28/2010 1:51 PM, Paul Piscuc wrote:
Hi,
We are a company that wants to replace our current storage layout with
one that uses ZFS. We have been testing it for a month now, and
everything looks promising. One element that we cannot determine is
the optimum number of disks in a raid-z pool. In the ZFS Best Practices
Guide, 7, 9, and 11 disks are recommended to be used in a single
raid-z2. On the other hand, another user specifies that the most
important part is the distribution of the default 128k record size to
all the disks. So, the recommended layout would be:
4-disk RAID-Z2 = 128KiB / 2 = 64KiB = good
5-disk RAID-Z2 = 128KiB / 3 = ~43KiB = not good
6-disk RAID-Z2 = 128KiB / 4 = 32KiB = good
10-disk RAID-Z2 = 128KiB / 8 = 16KiB = good
What are your recommendations regarding the number of disks? We are
planning to use 2 raid-z2 pools with 8+2 disks, 2 spare, 2 SSDs for
L2ARC, 2 SSDs for ZIL, 2 for syspool, and a similar machine for
replication.
Thanks in advance,
You've hit on one of the hardest parts of using ZFS - optimization.
Truth of the matter is that there is NO one-size-fits-all "best"
solution. It heavily depends on your workload type - access patterns,
write patterns, type of I/O, and size of average I/O request.
A couple of things here:
(1) Unless you are using Zvols for "raw" disk partitions (for use with
something like a database), the recordsize value is a MAXIMUM value, NOT
an absolute value. Thus, if you have a ZFS filesystem with a record
size of 128k, it will break up I/O into 128k chunks for writing, but it
will also write smaller chunks. I forget what the minimum size is (512
bytes or 1k, IIRC), but what ZFS does is use a variable block size, up to
the maximum size specified in the "recordsize" property. So, if
recordsize=128k and you have a 190k write I/O op, it will write a 128k
chunk and a 64k chunk (64k being the smallest power of two greater than
the remaining 62k of data). It WON'T write two 128k chunks.
(2) #1 comes up a bit when you have a mix of file sizes - for instance,
home directories, where you have lots of small files (initialization
files, source code, etc.) combined with some much larger files (images,
mp3s, executable binaries, etc.). Thus, such a filesystem will have a
wide variety of chunk sizes, which makes optimization difficult, to say
the least.
(3) For *random* I/O, a raidZ of any number of disks performs roughly
like a *single* disk in terms of IOPS and a little better than a single
disk in terms of throughput. So, if you have considerable amounts of
random I/O, you should really either use small raidz configs (no more
than 4 data disks), or switch to mirrors instead.
(4) For *sequential* or large-size I/O, a raidZ performs roughly the
same as a stripe of the same number of data disks. That is, an
N-disk raidz2 will perform about the same as an (N-2) disk stripe in
terms of throughput and IOPS.
(5) As I mentioned in #1, *all* ZFS I/O is broken up into
powers-of-two-sized chunks, even if the last chunk must have some
padding in it to get to a power-of-two. This has implications as to
the best number of disks in a raidZ(n).
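
To make the chunking described in #1 and #5 concrete, here is a minimal
Python sketch. It only models the description in this message (full
recordsize chunks first, then the remainder padded up to a power of
two); it is not how the actual ZFS allocator is written. The function
name split_write and the choice to work in whole KiB are my own
assumptions for illustration.

```python
def split_write(size_kib, recordsize_kib=128):
    """Split a write into chunk sizes (in KiB) using the model above.

    Hypothetical helper: full recordsize chunks first, then the
    remainder rounded up to the next power of two. Illustration only,
    not the real ZFS allocator.
    """
    if size_kib <= 0:
        return []
    full, remainder = divmod(size_kib, recordsize_kib)
    chunks = [recordsize_kib] * full
    if remainder:
        # Smallest power of two that is >= the remainder.
        chunks.append(1 << (remainder - 1).bit_length())
    return chunks


print(split_write(190))  # [128, 64]
```

The 190k example from #1 comes out as one 128k chunk plus one 64k
chunk, with 2k of padding in the last chunk.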
I'd have to re-look at the ZFS Best Practices Guide, but I'm pretty sure
the recommendation of 7, 9, or 11 disks was for a raidz1, NOT a raidz2.
Due to #5 above, best performance comes with an EVEN number of data
disks in any raidZ, so that a write to any disk is always a full portion
of the chunk, rather than a partial one (that sounds funny, but trust me).
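
If you want to check a particular width yourself, the arithmetic is
easy to script. The Python sketch below divides one chunk across the
data disks of a raidZ and reports whether each disk gets a whole number
of sectors. The 512-byte sector size and the helper name per_disk_share
are assumptions of mine, and "even split" is only this simple model, not
a measurement.

```python
SECTOR_BYTES = 512  # assumed sector size; not stated in this thread


def per_disk_share(total_disks, parity, chunk_kib=128):
    """Return (data_disks, bytes_per_disk, splits_evenly) for one chunk.

    Hypothetical helper for illustration only.
    """
    data_disks = total_disks - parity
    if data_disks <= 0:
        raise ValueError("need at least one data disk")
    chunk_bytes = chunk_kib * 1024
    bytes_per_disk = chunk_bytes / data_disks
    # "Evenly" means every data disk receives a whole number of sectors.
    splits_evenly = chunk_bytes % (data_disks * SECTOR_BYTES) == 0
    return data_disks, bytes_per_disk, splits_evenly


for width in (4, 5, 6, 8, 10):
    data, share, even = per_disk_share(width, parity=2)
    print(f"{width}-disk raidz2: {data} data disks, "
          f"{share / 1024:.1f} KiB each, even split: {even}")
```

Note that this model is stricter than the even-number rule of thumb: a
128k chunk only divides into whole sectors when the number of data disks
is itself a power of two, so 6 data disks leaves a partial portion.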
The best balance of size, IOPS, and throughput is found in the mid-size
raidZ(n) configs, where there are 4, 6 or 8 data disks.
Honestly, even if you describe your workload, it will be hard for us to
give you an exact answer. My best suggestion is to do some testing
with raidZ(n) of different sizes, to see the tradeoffs between size and
performance.
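
Before testing, you can get a back-of-the-envelope comparison from the
rules of thumb in #3 and #4. The Python sketch below is only that: it
assumes each raidZ vdev gives roughly one disk's worth of random IOPS
and (data disks) x (one disk's throughput) for sequential I/O, and that
a pool adds up its vdevs. The per-disk figures of 100 IOPS and 100 MB/s
are made-up placeholders, and estimate_pool is a hypothetical helper.

```python
def estimate_pool(vdevs, disks_per_vdev, parity,
                  disk_iops=100, disk_mbps=100):
    """Rough estimate for a pool of identical raidZ vdevs.

    Hypothetical helper; the per-disk defaults are placeholders,
    not measurements.
    """
    data_disks = disks_per_vdev - parity
    if data_disks <= 0:
        raise ValueError("need at least one data disk per vdev")
    return {
        "data_disks": vdevs * data_disks,
        # Rule of thumb (3): random IOPS of a raidZ ~ one disk.
        "random_iops": vdevs * disk_iops,
        # Rule of thumb (4): sequential throughput ~ stripe of data disks.
        "sequential_mbps": vdevs * data_disks * disk_mbps,
    }


# Same 20 disks arranged two ways.
print(estimate_pool(vdevs=2, disks_per_vdev=10, parity=2))
print(estimate_pool(vdevs=4, disks_per_vdev=5, parity=1))
```

Treat the numbers as a way to rank candidate layouts for testing, not as
a prediction of what you will measure.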
Also, in your sample config, unless you plan to use the spare disks for
redundancy on the boot mirror, it would be better to configure 2 x
11-disk raidZ3 than 2 x 10-disk raidZ2 + 2 spares. You get better
reliability from the same number of disks.
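
As a quick sanity check on that suggestion, the disk counts can be
tallied in a few lines of Python. The helper name summarize is
hypothetical, and the sketch only counts disks and parity; it does not
model resilver time or failure probabilities.

```python
def summarize(vdevs, disks_per_vdev, parity, spares=0):
    """Count disks for a pool of identical raidZ vdevs plus hot spares.

    Hypothetical helper for illustration only.
    """
    return {
        "total_disks": vdevs * disks_per_vdev + spares,
        "data_disks": vdevs * (disks_per_vdev - parity),
        "failures_tolerated_per_vdev": parity,
    }


raidz2_plus_spares = summarize(vdevs=2, disks_per_vdev=10, parity=2, spares=2)
raidz3 = summarize(vdevs=2, disks_per_vdev=11, parity=3)

print(raidz2_plus_spares)
print(raidz3)
```

Both layouts use 22 disks and give 16 data disks. The difference is that
the raidZ3 layout tolerates three failures per vdev right away, while a
spare has to resilver before it restores any redundancy.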
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss