On 22/07/2010 03:25, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Robert Milkowski
>>
>> I had a quick look at your results a moment ago.
>> The problem is that you used a server with 4GB of RAM plus a RAID card
>> with 256MB of cache.
>> Then your file size for iozone was set to 4GB - so random or not, you
>> probably had a relatively good cache hit ratio for random reads. And
> Look again in the raw_results. I ran it with 4G, and also with 12G. There
> was no significant difference between the two, so I only compiled the 4G
> results into a spreadsheet PDF.
The only tests with a 12GB file size in the raw files are for a mirror and a
single-disk configuration.
There are no raid-z results there.
>> even then a random read from 8 threads gave you only about 40% more IOPS
>> for a RAID-Z made out of 5 disks than for a single drive. The poor
>> result for HW-R5 is surprising though, but it might be that the stripe
>> size was not matched to the ZFS recordsize and iozone block size in
>> this case.
> I think what you're saying is "With 5 disks performing well, you should
> expect 4x higher IOPS than a single disk," and "the measured result was only
> 40% higher, which is a poor result."
>
> I agree. I guess the 128k recordsize used in iozone is probably large
> enough that it frequently causes blocks to span disks? I don't know.
Probably - but it would also depend on how you configured the HW-R5 (mainly
its stripe size).
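
To make the stripe-size point concrete, here is a rough sketch in Python (my
own illustration, not taken from your results; the 4+1 layout and the
stripe-unit values are assumptions) of how many data disks one aligned
random read has to touch:

# Rough arithmetic only: data disks touched by one stripe-aligned random
# read, for a hypothetical 4+1 HW RAID-5 and 128 KB iozone blocks.
import math

def disks_touched(read_kb, stripe_unit_kb, data_disks):
    """Data disks involved in a single stripe-aligned random read."""
    return min(data_disks, math.ceil(read_kb / stripe_unit_kb))

for stripe_unit in (16, 32, 64, 128):
    n = disks_touched(128, stripe_unit, data_disks=4)
    print(f"{stripe_unit:3d} KB stripe unit -> {n} disk(s) per 128 KB read")

With a 128 KB stripe unit each read lands on a single disk, so four readers
can be served in parallel; with a 32 KB stripe unit every read occupies all
four data disks.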
The other thing is that you might have had some bottleneck somewhere
else, as your results for N-way mirrors aren't that good either.
>> The issue with raid-z and random reads is that as the cache hit ratio goes
>> down to 0, the IOPS approaches the IOPS of a single drive. For a little bit
>> more information see http://blogs.sun.com/roch/entry/when_to_and_not_to
> I don't think that's correct, unless you're using a single thread. As long
> as multiple threads are issuing random reads on raidz, and those reads are
> small enough that each one is entirely written on a single disk, then you
> should be able to get n-1 disks operating simultaneously, to achieve (n-1)x
> the performance of a single disk.
>
> Even if blocks are large enough to span disks, you should be able to get
> (n-1)x the performance of a single disk for large sequential operations.
While that is true to some degree for HW RAID-5, raid-z doesn't work that way.
The issue is that each ZFS filesystem block is basically spread across
n-1 devices.
So every time you want to read back a single fs block, you need to wait
for all n-1 devices to provide you with their part of it - and keep in mind
that in ZFS you can't read back a partial block even if that's all you are
asking for, as ZFS has to verify the checksum of the entire fs block.
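
A minimal sketch of that in Python (assuming a 5-disk raid-z and the default
128 KB recordsize, which also matches the iozone block size here):

# Minimal sketch: one logical read becomes n-1 physical reads, and the block
# can only be checksummed - and therefore returned - once every piece is back.
RECORDSIZE_KB = 128        # zfs recordsize / iozone block size (assumed)
DISKS = 5                  # 5-disk raid-z: 4 data + 1 parity
DATA_DISKS = DISKS - 1

chunk_kb = RECORDSIZE_KB // DATA_DISKS
print(f"one {RECORDSIZE_KB} KB read -> {DATA_DISKS} disk I/Os of "
      f"~{chunk_kb} KB each; all must complete before the checksum "
      f"can be verified")

So the latency of one logical read is the latency of the slowest of those
four disk I/Os, and while they are in flight those spindles can't serve
anybody else.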
Now multiple readers actually make it worse for raid-z (assuming a very
poor cache hit ratio) - because each read from each reader involves all
the disk drives, the others basically can't read anything until it is done.
It gets really bad for random reads. With HW RAID-5, if your stripe size
matches the block you are reading back, then for random reads it is probable
that while reader-X1 is reading from disk-Y1, reader-X2 is reading from
disk-Y2, so you should end up with all the disk drives (minus one)
contributing to better overall IOPS.
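
A back-of-the-envelope model of that difference in Python (all numbers are
my assumptions - a cold cache, reads no larger than the stripe unit, and a
nominal 150 random IOPS per spindle - not figures from your runs):

# Back-of-the-envelope model, assumptions mine: cold cache, small random
# reads, ~150 random IOPS per 7200 rpm spindle, 6 disks in each layout.
DISK_IOPS = 150
DISKS = 6

# raid-z: every logical read occupies all the data disks, so the whole vdev
# delivers roughly single-disk IOPS.
raidz = DISK_IOPS

# HW RAID-5 with stripe unit >= read size: each read lands on one data disk,
# so the data disks can serve independent readers in parallel.
hw_raid5 = (DISKS - 1) * DISK_IOPS

# zfs raid-10 (3 x 2-way mirror): every spindle holds a full copy of its
# mirror's data, so every spindle can service random reads.
raid10 = DISKS * DISK_IOPS

print(f"raid-z:      ~{raidz} IOPS")
print(f"HW RAID-5:   ~{hw_raid5} IOPS")
print(f"zfs raid-10: ~{raid10} IOPS")

It is only a first-order model (it ignores seek ordering, prefetch and the
ARC), but it shows why raid-z falls so far behind on small random reads once
the cache stops helping.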
Read Roch's blog entry carefully for more information.
btw: even in your results, 6 disks in raid-z provided over 3x fewer IOPS
than the zfs raid-10 configuration for random reads. That is a big difference
if one needs performance.
--
Robert Milkowski
http://milek.blogspot.com