I wrote:
> Just thinking out loud here.  Now I'm off to see what kind of performance
> cost there is, comparing (with 400GB disks):
>
>   Simple ZFS stripe on one 2198GB LUN from a 6+1 HW RAID5 volume
>   8+1 RAID-Z on 9 244.2GB LUN's from a 6+1 HW RAID5 volume
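(In zpool terms, the two layouts I was comparing come down to roughly the
following create commands.  The device names below are placeholders, not
the real LUN paths, which appear in the zpool status output further down,
and I'm assuming the /sp1 and /zp2 mountpoints were set with -m.)

    Single 2198GB LUN from one RAID-5 group, as a simple stripe:

        # zpool create -m /sp1 bulk_sp1 c6t<LUN-A>d0

    Nine ~244GB LUNs from the other RAID-5 group, as an 8+1 raidz1:

        # zpool create -m /zp2 bulk_zp2 raidz1 c6t<LUN-1>d0 c6t<LUN-2>d0 \
              c6t<LUN-3>d0 c6t<LUN-4>d0 c6t<LUN-5>d0 c6t<LUN-6>d0 \
              c6t<LUN-7>d0 c6t<LUN-8>d0 c6t<LUN-9>d0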
[EMAIL PROTECTED] said:
> Interesting idea.  Please post back to let us know how the performance
> looks.

The short story is: performance is not bad with the raidz arrangement until
you get to doing reads, at which point it looks much worse than the 1-LUN
setup.

Please bear in mind that I'm neither a storage nor a benchmarking expert,
though I'd say I'm not a neophyte either.

Some specifics:  The array is a low-end Hitachi, a 9520V.  My two test
subjects are a pair of RAID-5 groups in the same shelf, each consisting of
6D+1P 400GB SATA drives.  The test host is a Sun T2000 with 16GB RAM,
connected via 2Gb FC links through a pair of switches (the array/mpxio
combination does not support load-balancing, so only one 2Gb channel is in
use at a time).  It is running Solaris 10 U3, with patches current as of
12-Jan-2007.  The array was mostly idle except for my tests, although some
light I/O to other shelves may have come from another host on occasion.
The test host wasn't doing anything else during these tests.

One RAID-5 group was configured as a single 2048GB LUN (about 150GB was
left unallocated because the array has a maximum LUN size); the second
RAID-5 group was set up as nine 244.3GB LUNs.  Here are the zpool
configurations I used for these tests:

# zpool status -v
  pool: bulk_sp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                              STATE     READ WRITE CKSUM
        bulk_sp1                                          ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0   ONLINE       0     0     0

errors: No known data errors

  pool: bulk_zp2
 state: ONLINE
 scrub: none requested
config:

        NAME                                              STATE     READ WRITE CKSUM
        bulk_zp2                                          ONLINE       0     0     0
          raidz1                                          ONLINE       0     0     0
            c6t4849544143484920443630303133323230303330d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303331d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303332d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303333d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303334d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303335d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303336d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303337d0  ONLINE     0     0     0
            c6t4849544143484920443630303133323230303338d0  ONLINE     0     0     0

errors: No known data errors

# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
bulk_sp1    83K  1.95T  24.5K  /sp1
bulk_zp2  73.8K  1.87T  2.67K  /zp2

I used two benchmarks.  The first was a "bunzip2 | tar" extract of the Sun
Studio 11 SPARC distribution tarball, extracting from the T2000's internal
drives onto the test zpools.  For this benchmark, both zpools gave similar
results:

pool sp1 (single-LUN stripe):
    du -s -k:  1155141
    time -p:   real 713.67   user 614.42   sys 7.56
    1.6MB/sec overall

pool zp2 (8+1-LUN raidz1):
    du -s -k:  1169020
    time -p:   real 714.96   user 614.78   sys 7.56
    1.6MB/sec overall

The second benchmark was bonnie++ v1.03, run single-threaded with default
arguments, which means a 32GB dataset made up of 1GB files.  Observations
of "vmstat" and "mpstat" during the tests showed that bonnie++ is
CPU-limited on the T2000, especially for the getc()/putc() tests, so I
later ran three bonnie++ instances simultaneously (13GB dataset each) and
got the same total throughput for the block read/write tests on the
single-LUN zpool (I was not patient enough to sit through the getc/putc
tests again :-).
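For reference, the invocations went roughly like this (a reconstruction
rather than a transcript; the tarball path and the scratch directory names
are stand-ins):

    Benchmark 1, run once per pool (shown here for /sp1):

        cd /sp1
        time -p sh -c 'bunzip2 -c /path/to/studio11-sparc.tar.bz2 | tar xf -'
        du -s -k .

    Benchmark 2, bonnie++ 1.03 with default arguments (it sizes the dataset
    at twice RAM, hence 32GB of 1GB files on this 16GB host; -u is only
    needed when running as root):

        bonnie++ -d /sp1 -u nobody

    and the later three-way run, with a 13GB dataset each:

        bonnie++ -d /sp1/b1 -s 13g -u nobody &
        bonnie++ -d /sp1/b2 -s 13g -u nobody &
        bonnie++ -d /sp1/b3 -s 13g -u nobody &
        wait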
pool sp1 (single-LUN stripe):

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
filer1          32G 15497  99 66245  84 16652  30 15210  90 106600  59 322.3   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5204 100 +++++ +++  8076 100  4551 100 +++++ +++  7509 100
filer1,32G,15497,99,66245,84,16652,30,15210,90,106600,59,322.3,3,16,5204,100,+++++,+++,8076,100,4551,100,+++++,+++,7509,100

pool zp2 (8+1-LUN raidz1):

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
filer1          32G 16118 100 29702  40  7416  13 15828  94 30204  20  25.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5215 100 +++++ +++  8527 100  4453 100 +++++ +++  8918 100
filer1,32G,16118,100,29702,40,7416,13,15828,94,30204,20,25.0,0,16,5215,100,+++++,+++,8527,100,4453,100,+++++,+++,8918,100

I'm not sure what to add in the way of comments.  It seems clear from the
results, and from watching "iostat -xn", "vmstat", "mpstat", etc. during
the tests, that the raidz pool suffers from not being able to make as good
use of the array's 1GB cache (the sequential block read pattern seems to
match Hitachi's read-prefetch algorithms well, I'd guess).  There's also
the potential for too much seeking in the raidz pool, since there are 9
LUNs sitting on top of 7 physical disk drives (though how Hitachi
divides/stripes those LUNs is not clear to me).

One thing I noticed which puzzles me is that in both configurations,
though more so with the divided-up raidz pool, there were long periods
where the LUNs showed as 100% busy in "iostat -xn" output but with no
I/Os happening at all: no paging, CPU 100% idle, no less than 2GB of free
RAM, for as long as 20-30 seconds.  That sure puts a dent in the
throughput.

I'm doing some more testing of NFS throughput over these two zpools, since
the test machine will eventually become an NFS and Samba server.  I've got
some questions about performance in the NFS scenario, but I'll address
those in a separate message.

Questions, observations, and/or suggestions are welcome.

Regards,
Marion

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss