On 11/28/2010 1:51 PM, Paul Piscuc wrote:
Hi,

We are a company that wants to replace our current storage layout with one that uses ZFS. We have been testing it for a month now, and everything looks promising. One element that we cannot determine is the optimum number of disks in a raid-z pool. In the ZFS best practice guide, 7, 9, and 11 disks are recommended for a single raid-z2. On the other hand, another user says the most important factor is how the default 128k record size divides across the data disks. So, the recommended layout would be:

4-disk RAID-Z2 = 128KiB / 2 = 64KiB = good
5-disk RAID-Z2 = 128KiB / 3 = ~43KiB = not good
6-disk RAID-Z2 = 128KiB / 4 = 32KiB = good
10-disk RAID-Z2 = 128KiB / 8 = 16KiB = good
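The arithmetic in that table can be sketched as a few lines of Python. This is just my own toy helper (the function name `data_share_kib` is mine, not from any ZFS tool): it divides the 128 KiB record across the data disks of a raidz2 and checks whether each disk's share lands on a power-of-two boundary.

```python
# Toy helper (my own naming, not ZFS code): per-disk share of a full
# 128 KiB record on a raidz2 vdev of a given total width.
RECORD_KIB = 128

def data_share_kib(total_disks, parity=2):
    """KiB of a full record that lands on each data disk."""
    data_disks = total_disks - parity
    return RECORD_KIB / data_disks

def is_pow2_share(share):
    """True if the share is a whole power-of-two number of KiB."""
    return share == int(share) and int(share) & (int(share) - 1) == 0

for n in (4, 5, 6, 10):
    share = data_share_kib(n)
    print(f"{n}-disk RAID-Z2: {share:.1f} KiB/disk, "
          f"power-of-two: {is_pow2_share(share)}")
```

Running it reproduces the table above: 4, 6, and 10 disks give clean 64/32/16 KiB shares, while 5 disks gives ~42.7 KiB.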

What are your recommendations regarding the number of disks? We are planning to use 2 raid-z2 pools with 8+2 disks, 2 spares, 2 SSDs for L2ARC, 2 SSDs for ZIL, 2 for syspool, and a similar machine for replication.

Thanks in advance,


You've hit on one of the hardest parts of using ZFS - optimization. Truth of the matter is that there is NO one-size-fits-all "best" solution. It heavily depends on your workload type - access patterns, write patterns, type of I/O, and size of average I/O request.

A couple of things here:

(1) Unless you are using Zvols for "raw" disk partitions (for use with something like a database), the recordsize value is a MAXIMUM value, NOT an absolute value. Thus, if you have a ZFS filesystem with a record size of 128k, it will break up I/O into 128k chunks for writing, but it will also write smaller chunks. I forget what the minimum size is (512b or 1k, IIRC), but what ZFS does is use a variable block size, up to the maximum size specified in the "recordsize" property. So, if recordsize=128k and you have a 190k write I/O op, it will write a 128k chunk and a 64k chunk (64k being the smallest power of two large enough to hold the remaining 62k of data). It WON'T write two 128k chunks.

(2) #1 comes up a bit when you have a mix of file sizes - for instance, home directories, where you have lots of small files (initialization files, source code, etc.) combined with some much larger files (images, mp3s, executable binaries, etc.). Thus, such a filesystem will have a wide variety of chunk sizes, which makes optimization difficult, to say the least.

(3) For *random* I/O, a raidZ of any number of disks performs roughly like a *single* disk in terms of IOPs and a little better than a single disk in terms of throughput. So, if you have considerable amounts of random I/O, you should really either use small raidz configs (no more than 4 data disks), or switch to mirrors instead.

(4) For *sequential* or large-size I/O, a raidZ performs roughly equivalent to a stripe of the same number of data disks. That is, a N-disk raidz2 will perform about the same as a (N-2) disk stripe in terms of throughput and IOPS.

(5) As I mentioned in #1, *all* ZFS I/O is broken up into powers-of-two-sized chunks, even if the last chunk must have some padding in it to get to a power-of-two. This has implications as to the best number of disks in a raidZ(n).


I'd have to re-look at the ZFS Best Practices Guide, but I'm pretty sure the recommendation of 7, 9, or 11 disks was for a raidz1, NOT a raidz2. Due to #5 above, best performance comes with an EVEN number of data disks in any raidZ, so each data disk always gets a full, evenly-divided portion of the chunk rather than a partial one (that sounds funny, but trust me). The best balance of size, IOPs, and throughput is found in the mid-size raidZ(n) configs, where there are 4, 6, or 8 data disks.
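To make the even-vs-odd point concrete, here is a deliberately simplified model (the sector size, function name, and allocation scheme are my assumptions, not how ZFS actually lays out raidz columns): stripe a power-of-two chunk across the data disks in fixed-size sectors and measure how much of the allocation ends up as padding.

```python
# Simplified illustration (assumptions mine) of point #5: striping a
# power-of-two chunk across N data disks, where a non-dividing N leaves
# a partially-filled, padded last stripe.
def padded_fraction(chunk_kib, data_disks, sector_kib=4):
    """Fraction of allocated sectors that are padding for one chunk."""
    sectors = chunk_kib // sector_kib          # 128 KiB / 4 KiB = 32
    per_disk = -(-sectors // data_disks)       # ceil division
    allocated = per_disk * data_disks
    return (allocated - sectors) / allocated

print(padded_fraction(128, 4))  # 32 sectors / 4 disks = 8 rows, no padding
print(padded_fraction(128, 5))  # ceil(32/5)=7 rows -> 35 allocated, 3 padded
```

With 4 or 8 data disks the 32 sectors divide evenly and nothing is wasted; with 5 data disks roughly 8-9% of the allocation is padding, which is the effect behind preferring even (really, power-of-two) data-disk counts.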


Honestly, even with you describing a workload, it will be hard for us to give you an exact answer. My best suggestion is to do some testing with raidZ(n) configs of different sizes, to see the tradeoffs between size and performance.


Also, in your sample config, unless you plan to use the spare disks for redundancy on the boot mirror, it would be better to configure 2 x 11-disk raidZ3 than 2 x 10-disk raidZ2 + 2 spares. Better reliability.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss