On Tue, 17 Jul 2012, Bob Friesenhahn wrote:
> On Tue, 17 Jul 2012, Michael Hase wrote:
>
> If you were to add a second vdev (i.e. stripe) then you should see very
> close to 200% due to the default round-robin scheduling of the writes.

My expectation would be > 200%, as 4 disks are involved. It may not be
perfect 4x scaling, but IMHO it should be (and, on a SCSI system, is) more
than half of the theoretical throughput. This is Solaris or a Solaris
derivative, not Linux ;-)

> Here are some results from my own machine based on the 'virgin mount'
> test approach. The results show less boost than is reported by a
> benchmark tool like 'iozone', which sees benefits from caching.
>
> I get an initial sequential read speed of 657 MB/s on my new pool, which
> has 1200 MB/s of raw bandwidth (if mirrors could produce a 100% boost).
> Reading the file a second time reports 6.9 GB/s.
>
> The numbers below are with a 2.6 GB test file; with a 26 GB test file
> (just add another zero to 'count' and wait longer) I see an initial read
> rate of 618 MB/s and a re-read rate of 8.2 GB/s. The raw disk can
> transfer 150 MB/s.

To work around these caching effects, just use a file > 2 times the size
of RAM; iostat then shows the numbers really coming from disk. I always
test like this. A re-read rate of 8.2 GB/s is really just memory
bandwidth, but quite impressive ;-)

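A minimal sketch of that approach, assuming a box with 8 GB of RAM (the
count of 160000 gives a ~20 GB file; adjust it to your memory size):

% pfexec dd if=/dev/urandom of=random.dat bs=128k count=160000
% dd if=random.dat of=/dev/null bs=128k

and watch the disks from a second terminal while the read runs:

% iostat -xn 5
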
> % pfexec zfs create tank/zfstest/defaults
> % cd /tank/zfstest/defaults
> % pfexec dd if=/dev/urandom of=random.dat bs=128k count=20000
> 20000+0 records in
> 20000+0 records out
> 2621440000 bytes (2.6 GB) copied, 36.8133 s, 71.2 MB/s
> % cd ..
> % pfexec zfs umount tank/zfstest/defaults
> % pfexec zfs mount tank/zfstest/defaults
> % cd defaults
> % dd if=random.dat of=/dev/null bs=128k count=20000
> 20000+0 records in
> 20000+0 records out
> 2621440000 bytes (2.6 GB) copied, 3.99229 s, 657 MB/s
> % pfexec dd if=/dev/rdsk/c7t50000393E8CA21FAd0p0 of=/dev/null bs=128k count=2000
> 2000+0 records in
> 2000+0 records out
> 262144000 bytes (262 MB) copied, 1.74532 s, 150 MB/s
> % bc
> scale=8
> 657/150
> 4.38000000
>
> It is very difficult to benchmark with a cache which works so well:
>
> % dd if=random.dat of=/dev/null bs=128k count=20000
> 20000+0 records in
> 20000+0 records out
> 2621440000 bytes (2.6 GB) copied, 0.379147 s, 6.9 GB/s

This is not my point; I'm pretty sure I did not measure any ARC effects,
maybe with the one exception of the raid0 test on the SCSI array. I don't
know why the ARC had an effect there, since the file size was 2x RAM. The
point is: I'm searching for an explanation for the relative slowness of a
mirror pair of SATA disks, or some tuning knobs, or something like "the
disks are plain crap", or maybe "ZFS throttles SATA disks in general" (I
don't know the internals).

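The only tuning knob I can think of myself is the per-vdev queue depth,
and it is only a guess that it matters here. On Solaris/illumos it can be
inspected like this, and lowered via /etc/system for a test (whether 4 is
a sensible value for plain SATA disks is just an assumption on my part):

% iostat -xn 1
% echo zfs_vdev_max_pending/D | pfexec mdb -k

The asvc_t and %b columns from iostat show whether the disks themselves
are saturated; the /etc/system line would be

set zfs:zfs_vdev_max_pending = 4

followed by a reboot.
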
In the range of > 600 MB/s other issues may show up (PCIe bus contention,
HBA contention, CPU load), and performance at this level could simply be
good enough, not requiring any further tuning. Could you recheck with only
4 disks (2 mirror pairs)? If you only get some 350 MB/s, it could be the
same problem as with my boxes. Are those all SATA disks?

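Something along these lines would match my test, in case you have four
spare disks to play with (device names are invented; the virgin-mount
trick is the same one used above):

% pfexec zpool create testpool mirror c7t0d0 c7t1d0 mirror c7t2d0 c7t3d0
% cd /testpool
% pfexec dd if=/dev/urandom of=random.dat bs=128k count=20000
% cd / ; pfexec zfs umount testpool ; pfexec zfs mount testpool
% dd if=/testpool/random.dat of=/dev/null bs=128k count=20000
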
Michael

> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/