On 7/17/06, Jonathan Wheeler <[EMAIL PROTECTED]> wrote:
Hi All,
I've just built an 8 disk zfs storage box, and I'm in the testing phase before
I put it into production. I've run into some unusual results, and I was hoping
the community could offer some suggestions. I've basically made the switch to
Solaris on the promises of ZFS alone (yes I'm that excited about it!), so
naturally I'm looking forward to some great performance - but it appears I'm
going to need some help finding all of it.
I was getting even lower numbers with filebench, so I decided to dial back to a
really simple app for testing - bonnie.
The system is a nevada_41 EM64T 3GHz Xeon with 1GB RAM, 8x Seagate SATA II
300GB disks, and a Supermicro SAT2-MV8 8-port SATA controller on a 133MHz
64-bit PCI-X bus.
The bottleneck here, by my thinking, should be the disks themselves.
It's not the disk interfaces (300MB/sec), the disk bus (300MB/sec each), the PCI-X
bus (1.1GB/sec), and I'd hope a 64-bit 3GHz CPU would be sufficient.
Tests were run on a fresh clean zpool, on an idle system. Rogue results were
dropped, and as you can see below, all tests were run more than once. 8GB
should be far more than the 1GB of RAM in the system, eliminating caching
issues.
If I've still managed to overlook something in my testing setup, please let me
know - I sure did try!
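For reference, the pool setup and bonnie invocation looked roughly like this - the
device names are placeholders and the exact flags are from memory, so take it as a
sketch of the procedure rather than a literal transcript:

    # dynamic stripe (raid0) across all 8 disks
    zpool create tank c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

    # bonnie with an 8GB working set so the 1GB of RAM can't cache it
    bonnie -d /tank -s 8196

    # the pool was destroyed and recreated between configurations
    zpool destroy tank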
Sorry about the formatting - this is bound to end up ugly
Bonnie
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
raid0      MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
8 disk   8196 78636 93.0 261804 64.2 125585 25.6 72160 95.3 246172 19.1 286.0  2.0
8 disk   8196 79452 93.9 286292 70.2 129163 26.0 72422 95.5 243628 18.9 302.9  2.1
So ~270MB/sec writes - awesome! 240MB/sec reads though - why would this be
LOWER than writes??
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
mirror     MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
8 disk   8196 33285 38.6  46033  9.9  33077  6.8 67934 90.4  93445  7.7 230.5  1.3
8 disk   8196 34821 41.4  46136  9.0  32445  6.6 67120 89.1  94403  6.9 210.4  1.8
46MB/sec writes - each disk individually can do better, but I guess keeping 8
disks in sync is hurting performance. The 94MB/sec reads are interesting. On
the one hand, that's greater than 1 disk's worth, so I'm getting striping
performance out of a mirror - GO ZFS. On the other, if I can get striping
performance from mirrored reads, why is it only 94MB/sec? Seemingly it's not
CPU bound.
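For reference, the two ways ZFS can lay out an 8-disk mirror config behave quite
differently - a single wide mirror writes every block to all 8 disks, while striped
2-way mirrors only have to keep pairs in sync. Device names below are placeholders:

    # one 8-way mirror: every write hits all 8 disks, reads can be spread across copies
    zpool create tank mirror c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

    # four striped 2-way mirrors: writes stripe across 4 vdevs, each write only hits 2 disks
    zpool create tank mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0 \
                      mirror c2t4d0 c2t5d0 mirror c2t6d0 c2t7d0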
Now for the important test, raid-z
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
raidz      MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
8 disk   8196 61785 70.9 142797 29.3  89342 19.9 64197 85.7 320554 32.6 131.3  1.0
8 disk   8196 62869 72.4 131801 26.7  90692 20.7 63986 85.7 306152 33.4 127.3  1.0
8 disk   8196 63103 72.9 128164 25.9  86175 19.4 64126 85.7 320410 32.7 124.5  0.9
7 disk   8196 51103 58.8  93815 19.1  74093 16.1 64705 86.5 331865 32.8 124.9  1.0
7 disk   8196 49446 56.8  93946 18.7  73092 15.8 64708 86.7 331458 32.7 127.1  1.0
7 disk   8196 49831 57.1  81305 16.2  78101 16.9 64698 86.4 331577 32.7 132.4  1.0
6 disk   8196 62360 72.3 157280 33.4  99511 21.9 65360 87.3 288159 27.1 132.7  0.9
6 disk   8196 63291 72.8 152598 29.1  97085 21.4 65546 87.2 292923 26.7 133.4  0.8
4 disk   8196 57965 67.9 123268 27.6  78712 17.1 66635 89.3 189482 15.9 134.1  0.9
I'm getting distinctly non-linear scaling here.
Writes: 4 disks gives me 123MB/sec. Raid0 was giving me 270/8 = 33MB/sec per disk
with CPU to spare (roughly half of what each individual disk should be capable of).
Here I'm getting 123/4 = 30MB/sec per disk, or should that be 123/3 = 41MB/sec?
Using 30MB/sec per disk as a baseline, I'd be expecting to see around 240MB/sec
with 8 disks. What I end up with is ~135 - clearly not good scaling at all.
The really interesting numbers happen at 7 disks - it's slower than with 4, in
all tests.
I ran it 3x to be sure.
Note this was a native 7 disk raid-z, not an 8 disk raid-z running degraded
with 7.
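(For anyone wanting to double-check that distinction themselves, something like the
following does it - device names are placeholders:)

    # native 7-disk raidz, built with 7 devices from the start
    zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0

    # status should show the pool ONLINE, not DEGRADED with a missing member
    zpool status tank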
Something is really wrong with my write performance here across the board.
Reads: 4 disks gives me 190MB/sec. WOAH! I'm very happy with that. 8 disks
should scale to 380 then; well, 320 isn't all that far off - no biggie.
Looking at the 6 disk raidz is interesting though: 290MB/sec. The disks are
good for 60+MB/sec individually, and 290 is 48/disk - note also that this is
better than my raid0 performance?!
Adding another 2 disks to my raidz gives me a mere 30MB/sec of extra performance?
Something is going very wrong here too.
I'm not an expert, but it would be great if you could run at least one more test:
can you try 2x 4-disk raidz vdevs in one pool, to see if the system does scale
to 380MB/s or whether the CPU gets in the way?
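Something along these lines should do it (the device names are just examples):

    # two 4-disk raidz vdevs in one pool; ZFS stripes writes across the two vdevs
    zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 \
                      raidz c2t4d0 c2t5d0 c2t6d0 c2t7d0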
If another controller card is available, it would also be interesting to split the
8 drives across 2 controllers (4 drives per controller) and see whether you get
any performance change.
I wonder if more CPUs/cores would help this test. Theoretically the single CPU
is fast enough, but when you have checksumming, parity creation, reading and
writing, and the benchmark itself all competing, you may get some strange
interactions.
You may also want to try again in a few weeks; I heard that a change went
into the kernel that makes SATA access more efficient.
James Dickens
uadmin.blogspot.com
The 7 disk raidz read test is about what I'd expect (330/7 = 47/disk), but it
shows that the 8 disk is actually going backwards.
Hmm...
I understand that going for an 8 disk wide raidz isn't optimal in terms of
redundancy and IOPS/sec - but my workload shouldn't involve large amounts of
sustained random IO, so I'm happy to take the loss in favour of absolute
capacity.
My issue here is the scaling on sequential block transfers, not optimal design.
All three raid levels have had unexpected results, and I'd really appreciate
some suggestions on how I can troubleshoot this. I know how to run iostat while
bonnie is running, but that's about it. Incidentally, iostat is telling me that
the disks are at best hitting around 70% busy (%b). With the 8 disk tests, it was
often below 50%....
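In case it matters, this is roughly what I've been running alongside bonnie (take
the exact intervals and flags as a sketch):

    # extended per-device stats every 5 seconds; %b is the busy column I'm quoting
    iostat -xn 5

    # ZFS's own per-vdev view of the same pool
    zpool iostat -v tank 5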
Is my issue perhaps with the SATA card that I'm using? Maybe it's just not able
to handle that much throughput, despite being advertised to do so. With raid0
(aka dynamic stripes), I know that each disk can read at 60-70MB/sec, so why am
I not getting 65*8 (500MB/sec+)? Maybe it's the marvell driver at
fault here?
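One thing I can try, to rule the card/driver in or out, is to re-check raw per-disk
reads outside of ZFS and then run several in parallel to see whether the aggregate
stays near 65MB/sec per disk (device/slice names are placeholders):

    # sequential read straight off one raw disk, bypassing ZFS entirely
    dd if=/dev/rdsk/c2t0d0s0 of=/dev/null bs=1024k count=1000

    # one dd per disk in parallel, watching iostat to see if the controller keeps up
    for d in c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0; do
        dd if=/dev/rdsk/$d of=/dev/null bs=1024k count=1000 &
    done
    wait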
My thinking is that I need to get raid0 performing as expected before looking
at raidz, but I'm afraid I really don't know where to begin.
All thoughts & suggestions welcome. I'm not using the disks yet, so I can blow
the zpool away as needed.
Many thanks,
Jonathan Wheeler
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss