On 22/07/2010 03:25, Edward Ned Harvey wrote:
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Robert Milkowski
I had a quick look at your results a moment ago.
The problem is that you used a server with 4GB of RAM plus a raid card with 256MB of cache.
Then your file size for iozone was set to 4GB - so, random or not, you probably had a relatively good cache hit ratio for the random reads. And
Look again in the raw_results.  I ran it with 4G, and also with 12G.  There
was no significant difference between the two, so I only compiled the 4G
results into a spreadsheet PDF.



The only tests with a 12GB file size in the raw files are for a mirror and a single-disk configuration.
There are no results for raid-z there.
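
Either way, for a rough sense of how much of the 4GB run could have been served from memory, here is a back-of-the-envelope sketch in Python (the ARC footprint and the uniform-access model are assumptions on my part, not measurements from your box):

    # Rough estimate of the random-read cache hit ratio when the iozone
    # working set is about the size of the available caches.  The ARC
    # footprint is a guess (ZFS leaves some of the 4GB to the OS) and
    # prefetch/read-ahead are ignored.
    file_size_gb  = 4.0      # iozone file size
    arc_size_gb   = 3.0      # assumed ARC footprint on a 4GB box
    hba_cache_gb  = 0.256    # 256MB raid controller cache

    hit_ratio = min(1.0, (arc_size_gb + hba_cache_gb) / file_size_gb)
    print(f"approx. hit ratio for uniform random reads: {hit_ratio:.0%}")
    # ~81% - most "random" reads never touch the spindles, so the measured
    # IOPS say little about raw raid-z performance.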


even then the random read from 8 threads gave you only about 40% more IOPS for a RAID-Z made out of 5 disks than for a single drive. The poor result for HW-R5 is surprising, though; it might be that the stripe size was not matched to the ZFS recordsize and iozone block size in this case.
I think what you're saying is "With 5 disks performing well, you should
expect 4x higher iops than a single disk," and "the measured result was only
40% higher, which is a poor result."

I agree.  I guess the 128k recordsize used in iozone is probably large
enough that it frequently causes blocks to span disks?  I don't know.


Probably - but it would also depend on how you configured hw-r5 (mainly its stripe size). The other thing is that you might have had a bottleneck somewhere else, as your results for N-way mirrors aren't that good either.
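
To make the stripe-size point concrete, here is a hypothetical calculation (the 4 data disks and the 64KB controller stripe unit are assumptions, not taken from your configuration):

    # How a single 128KB block maps onto disks under assumed geometries.
    # raid-z always splits the block across the data disks; HW raid-5
    # keeps it on one disk only if the stripe unit is >= the block size.
    record_kb  = 128      # zfs recordsize / iozone block size
    data_disks = 4        # 5-disk raid-z or raid-5 -> 4 data disks

    print("raid-z:", record_kb / data_disks, "KB per disk, all", data_disks, "data disks involved")

    for stripe_unit_kb in (64, 128):
        disks_touched = min(-(-record_kb // stripe_unit_kb), data_disks)
        print(f"raid-5, {stripe_unit_kb}KB stripe unit: one read touches {disks_touched} disk(s)")
    # With a 128KB stripe unit each read stays on one spindle, so
    # independent readers can work different disks in parallel.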

The issue with raid-z and random reads is that as the cache hit ratio goes down to 0, the IOPS approaches that of a single drive. For a little more information see http://blogs.sun.com/roch/entry/when_to_and_not_to
I don't think that's correct, unless you're using a single thread.  As long as multiple threads are issuing random reads on raidz, and those reads are small enough that each one is entirely written on a single disk, then you should be able to get n-1 disks operating simultaneously, achieving (n-1)x the performance of a single disk.

Even if blocks are large enough to span disks, you should be able to get (n-1)x the performance of a single disk for large sequential operations.

While that is true to some degree for hw raid-5, raid-z doesn't work that way.

The issue is that each zfs filesystem block is basically spread across n-1 devices. So every time you want to read back a single fs block you have to wait for all n-1 devices to provide their part of it - and keep in mind that in zfs you can't get a partial block even if that's all you're asking for, as zfs has to verify the checksum of the entire fs block.

Multiple readers actually make it worse for raid-z (assuming a very poor cache hit ratio): because each read from each reader involves all of the disk drives, the other readers can't read anything until it completes. It gets really bad for random reads.

With HW raid-5, if the stripe size matches the block you are reading back, then for random reads it is probable that while reader-X1 is reading from disk-Y1, reader-X2 is reading from disk-Y2 - so you end up with all of the disk drives (minus one) contributing to better overall IOPS.
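
A crude model of what this means for cold-cache random reads - the per-spindle IOPS figure and the one-I/O-per-data-disk simplification are assumptions, not measurements:

    # Cold-cache random-read model.  Assumes ~150 IOPS per 7200rpm spindle,
    # that every raid-z block read costs one I/O on each data disk, and that
    # a well-aligned HW raid-5 read costs one I/O on a single disk.
    # Queueing and seek overlap are ignored.
    disk_iops = 150
    n_disks   = 5

    raidz_iops = disk_iops                  # whole pool behaves like one spindle
    raid5_iops = (n_disks - 1) * disk_iops  # readers spread across data disks

    print(f"raid-z : ~{raidz_iops} random read IOPS")
    print(f"hw r5  : ~{raid5_iops} random read IOPS (stripe unit >= block size)")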

Read Roch's blog entry carefully for more information.

btw: even in your results, 6 disks in raid-z provided less than a third of the IOPS of the zfs raid-10 configuration for random reads. That is a big difference if one needs performance.
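
Back-of-the-envelope, with the same assumed per-spindle figure as above:

    # 6 disks as three 2-way mirrors vs 6 disks in one raid-z, cold cache.
    disk_iops = 150          # assumed per-spindle random read IOPS
    disks     = 6

    raid10_iops = disks * disk_iops   # any spindle can service a read
    raidz_iops  = disk_iops           # whole-block reads involve all data disks

    print(f"raid-10: ~{raid10_iops} IOPS, raid-z: ~{raidz_iops} IOPS "
          f"(up to ~{raid10_iops // raidz_iops}x; you measured more than 3x)")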

--
Robert Milkowski
http://milek.blogspot.com
