I wrote:
> Just thinking out loud here.  Now I'm off to see what kind of performance
> cost there is, comparing (with 400GB disks):
>
>   Simple ZFS stripe on one 2198GB LUN from a 6+1 HW RAID5 volume
>   8+1 RAID-Z on 9 244.2GB LUN's from a 6+1 HW RAID5 volume
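(In zpool terms, the two layouts I was comparing come down to roughly the
following create commands.  The device names below are placeholders, not
the real LUN paths, which appear in the zpool status output further down,
and I'm assuming the /sp1 and /zp2 mountpoints were set with -m.)

    Single 2198GB LUN from one RAID-5 group, as a simple stripe:

        # zpool create -m /sp1 bulk_sp1 c6t<LUN-A>d0

    Nine ~244GB LUNs from the other RAID-5 group, as an 8+1 raidz1:

        # zpool create -m /zp2 bulk_zp2 raidz1 c6t<LUN-1>d0 c6t<LUN-2>d0 \
              c6t<LUN-3>d0 c6t<LUN-4>d0 c6t<LUN-5>d0 c6t<LUN-6>d0 \
              c6t<LUN-7>d0 c6t<LUN-8>d0 c6t<LUN-9>d0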
[EMAIL PROTECTED] said:
> Interesting idea.  Please post back to let us know how the performance
> looks.

The short story is: performance is not bad with the raidz arrangement until
you get to doing reads, at which point it looks much worse than the 1-LUN
setup.

Please bear in mind that I'm neither a storage nor a benchmarking expert,
though I'd say I'm not a neophyte either.

Some specifics:  The array is a low-end Hitachi, a 9520V.  My two test
subjects are a pair of RAID-5 groups in the same shelf, each consisting of
6D+1P 400GB SATA drives.  The test host is a Sun T2000 with 16GB RAM,
connected via 2Gb FC links through a pair of switches (the array/mpxio
combination does not support load-balancing, so only one 2Gb channel is in
use at a time).  It is running Solaris 10 U3, with patches current as of
12-Jan-2007.  The array was mostly idle except for my tests, although some
light I/O to other shelves may have come from another host on occasion.
The test host wasn't doing anything else during these tests.

One RAID-5 group was configured as a single 2048GB LUN (about 150GB was
left unallocated because the array has a maximum LUN size); the second
RAID-5 group was set up as nine 244.3GB LUNs.  Here are the zpool
configurations I used for these tests:

# zpool status -v
  pool: bulk_sp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                              STATE     READ WRITE CKSUM
        bulk_sp1                                          ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0   ONLINE       0     0     0

errors: No known data errors

  pool: bulk_zp2
 state: ONLINE
 scrub: none requested
config:

        NAME                                              STATE     READ WRITE CKSUM
        bulk_zp2                                          ONLINE       0     0     0
          raidz1                                          ONLINE       0     0     0
            c6t4849544143484920443630303133323230303330d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303331d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303332d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303333d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303334d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303335d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303336d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303337d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303338d0  ONLINE     0     0     0

errors: No known data errors

# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
bulk_sp1    83K  1.95T  24.5K  /sp1
bulk_zp2  73.8K  1.87T  2.67K  /zp2

I used two benchmarks.  The first was a "bunzip2 | tar" extract of the Sun
Studio 11 SPARC distribution tarball, extracting from the T2000's internal
drives onto the test zpools.  For this benchmark, both zpools gave similar
results:

pool sp1 (single-LUN stripe):
    du -s -k:  1155141
    time -p:   real 713.67   user 614.42   sys 7.56
    1.6MB/sec overall

pool zp2 (8+1-LUN raidz1):
    du -s -k:  1169020
    time -p:   real 714.96   user 614.78   sys 7.56
    1.6MB/sec overall

The second benchmark was bonnie++ v1.03, run single-threaded with default
arguments, which means a 32GB dataset made up of 1GB files.  Observations
of "vmstat" and "mpstat" during the tests showed that bonnie++ is
CPU-limited on the T2000, especially for the getc()/putc() tests, so I
later ran three bonnie++ instances simultaneously (13GB dataset each) and
got the same total throughput for the block read/write tests on the
single-LUN zpool (I was not patient enough to sit through the getc/putc
tests again :-).
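For reference, the invocations went roughly like this (a reconstruction
rather than a transcript; the tarball path and the scratch directory names
are stand-ins):

    Benchmark 1, run once per pool (shown here for /sp1):

        cd /sp1
        time -p sh -c 'bunzip2 -c /path/to/studio11-sparc.tar.bz2 | tar xf -'
        du -s -k .

    Benchmark 2, bonnie++ 1.03 with default arguments (it sizes the dataset
    at twice RAM, hence 32GB of 1GB files on this 16GB host; -u is only
    needed when running as root):

        bonnie++ -d /sp1 -u nobody

    and the later three-way run, with a 13GB dataset each:

        bonnie++ -d /sp1/b1 -s 13g -u nobody &
        bonnie++ -d /sp1/b2 -s 13g -u nobody &
        bonnie++ -d /sp1/b3 -s 13g -u nobody &
        wait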
pool sp1 (single-LUN stripe):

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
filer1          32G 15497  99 66245  84 16652  30 15210  90 106600  59 322.3   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5204 100 +++++ +++  8076 100  4551 100 +++++ +++  7509 100
filer1,32G,15497,99,66245,84,16652,30,15210,90,106600,59,322.3,3,16,5204,100,+++++,+++,8076,100,4551,100,+++++,+++,7509,100

pool zp2 (8+1-LUN raidz1):

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
filer1          32G 16118 100 29702  40  7416  13 15828  94 30204  20  25.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5215 100 +++++ +++  8527 100  4453 100 +++++ +++  8918 100
filer1,32G,16118,100,29702,40,7416,13,15828,94,30204,20,25.0,0,16,5215,100,+++++,+++,8527,100,4453,100,+++++,+++,8918,100

I'm not sure what to add in the way of comments.  It seems clear from the
results, and from watching "iostat -xn", "vmstat", "mpstat", etc. during
the tests, that the raidz pool suffers from not being able to make as good
use of the array's 1GB cache (the sequential block read pattern seems to
match Hitachi's read-prefetch algorithms well, I'd guess).  There's also
the potential for too much seeking in the raidz pool, since there are 9
LUNs sitting on top of 7 physical disk drives (though how Hitachi
divides/stripes those LUNs is not clear to me).

One thing I noticed which puzzles me is that in both configurations,
though more so with the divided-up raidz pool, there were long periods
where the LUNs showed as 100% busy in "iostat -xn" output but with no
I/Os happening at all: no paging, CPU 100% idle, no less than 2GB of free
RAM, for as long as 20-30 seconds.  That sure puts a dent in the
throughput.

I'm doing some more testing of NFS throughput over these two zpools, since
the test machine will eventually become an NFS and Samba server.  I've got
some questions about performance in the NFS scenario, but I'll address
those in a separate message.

Questions, observations, and/or suggestions are welcome.

Regards,
Marion

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss