On Tue, 17 Jul 2012, Bob Friesenhahn wrote:

On Tue, 17 Jul 2012, Michael Hase wrote:

If you were to add a second vdev (i.e. stripe) then you should see very close to 200% due to the default round-robin scheduling of the writes.

My expectation would be > 200%, as 4 disks are involved. It may not be perfect 4x scaling, but imho it should be (and is, on a SCSI system) more than half of the theoretical throughput. This is Solaris or a Solaris derivative, not Linux ;-)
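To spell out the arithmetic behind that expectation (a back-of-the-envelope ceiling only - the ~150 MB/s per spindle is an assumption here, though it happens to match the raw-disk figure further down):

% bc
4*150
600

i.e. if reads really fan out across all four disks of the two mirror pairs, the naive ceiling is around 600 MB/s, and "more than half" of that would be anything north of 300 MB/s.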

Here are some results from my own machine, based on the 'virgin mount' test approach. The results show less boost than is reported by a benchmark tool like 'iozone', which sees benefits from caching.

I get an initial sequential read speed of 657 MB/s on my new pool which has 1200 MB/s of raw bandwidth (if mirrors could produce 100% boost). Reading the file a second time reports 6.9 GB/s.

The output below is with a 2.6 GB test file; with a 26 GB test file (just add another zero to 'count' and wait longer) I see an initial read rate of 618 MB/s and a re-read rate of 8.2 GB/s. The raw disk can transfer 150 MB/s.

To work around these caching effects, just use a file > 2 times the size of RAM; iostat then shows the numbers really coming from disk. I always test like this. A re-read rate of 8.2 GB/s is really just memory bandwidth, but quite impressive ;-)
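For example, along these lines (just a sketch - the memory size, file name and dd count are illustrative, not taken from either of our boxes):

% prtconf | grep 'Memory size'    # physical RAM, e.g. "Memory size: 16384 Megabytes"
% pfexec dd if=/dev/urandom of=big.dat bs=128k count=280000    # ~37 GB, comfortably > 2x RAM on such a box
% iostat -xn 5    # in a second terminal during the read-back, to see per-disk throughput

then re-read big.dat with dd after a fresh mount, just like the 2.6 GB example below.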

% pfexec zfs create tank/zfstest/defaults
% cd /tank/zfstest/defaults
% pfexec dd if=/dev/urandom of=random.dat bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 36.8133 s, 71.2 MB/s
% cd ..
% pfexec zfs umount tank/zfstest/defaults
% pfexec zfs mount tank/zfstest/defaults
% cd defaults
% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 3.99229 s, 657 MB/s
% pfexec dd if=/dev/rdsk/c7t50000393E8CA21FAd0p0 of=/dev/null bs=128k count=2000
2000+0 records in
2000+0 records out
262144000 bytes (262 MB) copied, 1.74532 s, 150 MB/s
% bc
scale=8
657/150
4.38000000
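(Put against the pool's stated 1200 MB/s of raw bandwidth:)

% bc
scale=8
657/1200
.54750000

i.e. roughly 55% of the raw figure.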

It is very difficult to benchmark with a cache which works so well:

% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 0.379147 s, 6.9 GB/s
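(As an aside: to take the data cache out of such a test entirely, instead of the umount/mount trick, the per-dataset primarycache property should do it - e.g.

% pfexec zfs set primarycache=metadata tank/zfstest/defaults

keeps only metadata in the ARC, so re-reads have to go back to disk; set it back to 'all' when done.)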

This is not my point; I'm pretty sure I did not measure any ARC effects - maybe with the one exception of the raid0 test on the SCSI array. I don't know why the ARC had an effect there, as the file size was 2x RAM. The point is: I'm searching for an explanation for the relative slowness of a mirror pair of SATA disks, or some tuning knobs, or something like "the disks are plain crap", or maybe: ZFS throttles SATA disks in general (I don't know the internals).
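(On the throttling guess: I don't know the internals either, but one knob that gets pointed at for SATA pools is the per-vdev queue depth, zfs_vdev_max_pending. Purely as an example of where to look - this assumes a kernel where that tunable still exists:

% echo zfs_vdev_max_pending/D | pfexec mdb -k       # read the current per-vdev queue depth
% echo zfs_vdev_max_pending/W0t4 | pfexec mdb -kw   # example only: drop it to 4 for a test

or set zfs:zfs_vdev_max_pending = 4 in /etc/system to persist it. Whether it actually matters for this workload is an open question.)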

In the range of > 600 MB/s other issues may show up (PCIe bus contention, HBA contention, CPU load). And performance at this level may simply be good enough, not requiring any further tuning. Could you recheck with only 4 disks (2 mirror pairs)? If you only get some 350 MB/s, it could be the same problem as with my boxes. Are they all SATA disks?
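Something along these lines, with your own device names of course (the cXtYdZ names below are placeholders):

% pfexec zpool create testpool mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0
% zpool status testpool

and then the same dd / umount / mount sequence as above against a file on testpool.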

Michael


