Hi,

James Lever wrote:
Hi All,

We have recently acquired hardware for a new fileserver, and my task, if I want to use OpenSolaris (osol or SXCE) on it, is to make it perform at least as well as Linux (and our 5-year-old fileserver) in our environment.

Our current file server is a whitebox Debian server with 8x 10,000 RPM SCSI drives behind an LSI MegaRaid controller with a BBU. The filesystem in use is XFS.

The raw performance tests that I have to use to compare them are as follows:

 * Create 100,000 0-byte files over NFS
 * Delete 100,000 0-byte files over NFS
 * Repeat the previous two tasks with 1 KB files
 * Untar a copy of our product, including object files (quite a nasty test)
 * Rebuild the product with "make -j"
 * Delete the build directory

The reason for the 100k-file tests is that they have proven to be a significant indicator of performance on the developers' desktop systems.
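For reference, a minimal sketch of the create/delete part of that test, as it might be run from a shell on an NFS client (the mount point and scratch directory are placeholders, not our actual harness):

# Run on the NFS client; /mnt/nfs/scratch is a placeholder path
cd /mnt/nfs/scratch
time sh -c 'for i in $(seq 1 100000); do : > "f$i"; done'   # create 100,000 0-byte files
time find . -type f -name 'f*' -delete                      # delete them again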

Within the budget we had, we purchased the following system to meet our goals. If the OpenSolaris tests do not meet our requirements, it is certain that the equivalent tests under Linux will. I'm the only person here who specifically wants OpenSolaris, so it is in my interest to get it working at least on par with, if not better than, our current system. So here I am, begging for further help.

Dell R710
2x 2.40 GHz Xeon 5330 CPUs
16GB RAM (4x 4GB)

mpt0 SAS 6/i (LSI 1068E)
2x 1TB SATA-II drives (rpool)
2x 50GB Enterprise SSD (slog) - Samsung MCCOE50G5MPQ-0VAD3

mpt1 SAS 5/E (LSI 1068E)
Dell MD1000 15-bay External storage chassis with 2 heads
10x 450GB Seagate Cheetah 15,000 RPM SAS

We also have a PERC 6/E w/512MB BBWC to test with or fall back to if we go with a Linux solution.

I have installed OpenSolaris 2009.06, updated to b117, and used mdb to modify the kernel to work around a current bug in b117 with the newer Dell systems: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6850943

Keep in mind that for these tests the external MD1000 chassis is connected with a single 4-lane SAS cable, which should give 12 Gbps (roughly 1.2 GB/s) of throughput.

Individually, each disk exhibits about 170 MB/s of raw write performance, e.g.:

jam...@scalzi:~$ pfexec dd if=/dev/zero of=/dev/rdsk/c8t5d0 bs=65536 count=32768
2147483648 bytes (2.1 GB) copied, 12.4934 s, 172 MB/s
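The same raw test can be looped over several targets; a rough sketch (the device names are examples only, and this overwrites the start of each disk, so only use it on disks holding no data):

# Raw write test across several disks (destructive; device names are examples)
for d in c8t9d0 c8t10d0 c8t11d0; do
    echo "== $d =="
    pfexec dd if=/dev/zero of=/dev/rdsk/$d bs=65536 count=32768
done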

A single spindle zpool seems to perform OK.

jam...@scalzi:~$ pfexec zpool create single c8t20d0
jam...@scalzi:~$ pfexec dd if=/dev/zero of=/single/foo bs=65536 count=327680
21474836480 bytes (21 GB) copied, 127.201 s, 169 MB/s

RAID10 tests seem to be quite slow, at about half the speed I would have expected (5 mirror vdevs x 170 MB/s = 850 MB/s, so something around 800 MB/s):

jam...@scalzi:~$ pfexec zpool create fastdata mirror c8t9d0 c8t10d0 mirror c8t11d0 c8t15d0 mirror c8t16d0 c8t17d0 mirror c8t18d0 c8t19d0 mirror c8t20d0 c8t21d0

jam...@scalzi:~$ pfexec dd if=/dev/zero of=/fastdata/foo bs=131072 count=163840
21474836480 bytes (21 GB) copied, 50.3066 s, 427 MB/s

A 5-disk stripe seemed to perform as expected:

jam...@scalzi:~$ pfexec zpool create fastdata c8t10d0 c8t15d0 c8t17d0 c8t19d0 c8t21d0

jam...@scalzi:~$ pfexec dd if=/dev/zero of=/fastdata/foo bs=131072 count=163840
21474836480 bytes (21 GB) copied, 27.7972 s, 773 MB/s

But a 10-disk stripe did not increase throughput significantly:

jam...@scalzi:~$ pfexec zpool create fastdata c8t10d0 c8t15d0 c8t17d0 c8t19d0 c8t21d0 c8t20d0 c8t18d0 c8t16d0 c8t11d0 c8t9d0

jam...@scalzi:~$ pfexec dd if=/dev/zero of=/fastdata/foo bs=131072 count=163840
21474836480 bytes (21 GB) copied, 26.1189 s, 822 MB/s

The best sequential write result I could get with redundancy was a pool of two 5-disk RAIDZ vdevs striped together:

jam...@scalzi:~$ pfexec zpool create fastdata raidz c8t10d0 c8t15d0 c8t16d0 c8t11d0 c8t9d0 raidz c8t17d0 c8t19d0 c8t21d0 c8t20d0 c8t18d0

jam...@scalzi:~$ pfexec dd if=/dev/zero of=/fastdata/foo bs=131072 count=163840
21474836480 bytes (21 GB) copied, 31.3934 s, 684 MB/s

Moving on to testing NFS and the create-100,000-0-byte-files task (i.e. the metadata and NFS sync test): without a slog, the test looked likely to take about half an hour, as I worked out when I killed it. Painfully slow. So I added one of the SSDs to the system as a slog, which improved matters. The client is a Red Hat Enterprise Linux server on modern hardware and has been used for all tests against our old fileserver.
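For anyone wanting to reproduce the workload, a rough sketch of how such a zeroes.tar fixture can be built and replayed (the paths and file count here are placeholders for our actual test data):

# Build a tarball of 100,000 0-byte files (run once, somewhere local)
mkdir zeroes && cd zeroes
for i in $(seq 1 100000); do : > "f$i"; done
cd .. && tar cf zeroes.tar zeroes

# Replay it against the NFS mount under test
cd /mnt/fastdata && time tar xf /path/to/zeroes.tar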

The time to beat: RHEL5 client to Debian4+XFS server:

bash-3.2# time tar xf zeroes.tar

real    2m41.979s
user    0m0.420s
sys     0m5.255s

And on the currently configured system:

jam...@scalzi:~$ pfexec zpool create fastdata mirror c8t9d0 c8t10d0 mirror c8t11d0 c8t15d0 mirror c8t16d0 c8t17d0 mirror c8t18d0 c8t19d0 mirror c8t20d0 c8t21d0 log c7t2d0

jam...@scalzi:~$ pfexec zfs set sharenfs='rw,ro...@10.1.0/23' fastdata
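The share is then mounted on the RHEL client in the usual way; roughly along these lines (server name, mount point and options are placeholders, not necessarily what we use):

# On the RHEL 5 client (names and options are placeholders)
mount -t nfs -o vers=3,rw,hard,intr scalzi:/fastdata /mnt/fastdata
cd /mnt/fastdata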

bash-3.2# time tar xf zeroes.tar

real    8m7.176s
user    0m0.438s
sys     0m5.754s

While this was running, I was watching the output of 'zpool iostat fastdata 10' to see how it was going, and was surprised by the seemingly low IOPS.

Have you tried running this locally on your OpenSolaris box, just to
get an idea of what it could deliver in terms of speed? Which NFS
version are you using?
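Something along these lines, run directly on the server, would take NFS out of the picture entirely (the paths are placeholders):

# Local run on the server: separates pool performance from NFS/sync overhead
cd /fastdata && time tar xf /var/tmp/zeroes.tar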

jam...@scalzi:~$ zpool iostat fastdata 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
fastdata    10.0G  2.02T      0    312    268  3.89M
fastdata    10.0G  2.02T      0    818      0  3.20M
fastdata    10.0G  2.02T      0    811      0  3.17M
fastdata    10.0G  2.02T      0    860      0  3.27M
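(For reference, the steady-state rows above work out to roughly 3.2 MB/s divided by ~818 ops/s, i.e. about 4 KB per write operation.)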

Strangely, when I added a second SSD as a second slog, it made no difference to the write operations.
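For reference, the second log device was added roughly like this (the device name is a placeholder for the second SSD):

# Add a second, separate log device to the pool (c7t3d0 is a placeholder)
pfexec zpool add fastdata log c7t3d0
pfexec zpool status fastdata    # confirm both log devices are listed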

I'm not sure where to go from here; these results are appalling (about 3x the time of the old system with its 8x 10,000 RPM spindles), even with two enterprise SSDs as separate log devices.
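As a pure diagnostic (never with data you care about), temporarily disabling the ZIL would show whether synchronous writes are the limiting factor; on builds of this vintage that is the zil_disable tunable, roughly:

# Diagnostic only: disable the ZIL, reboot, rerun the tar test, then
# remove the line and reboot again (unsafe for real data)
echo "set zfs:zil_disable = 1" | pfexec tee -a /etc/system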

cheers,
James


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tel. 75 53 35 00

ScanNet Group
A/S ScanNet
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
