On Mar 30, 2010, at 2:50 PM, Jeroen Roodhart wrote:
> Hi Karsten, Adam, List,
> 
> Adam Leventhal wrote:
> 
>> Very interesting data. Your test is inherently single-threaded so I'm not 
>> surprised that the benefits aren't more impressive -- the flash modules on 
>> the F20 card are optimized more for concurrent IOPS than single-threaded 
>> latency.
> 
> Well, I actually wanted to do a bit more bottleneck searching, but let me 
> weigh in with some measurements of our own :)
> 
> We're on a single X4540 with quad-core CPUs, so we're on the older 
> HyperTransport bus. We connected it up to two X2200s running CentOS 5, each on 
> its own 1Gb link. Disabled cache flushes (telling sd that the F20's write 
> cache is nonvolatile) with the following addition to the /kernel/drv/sd.conf 
> file (Karsten: if you didn't do this already, you _really_ want to :) ):
> 
> # 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes
> # Add whitespace to make the vendor ID (VID) 8 and the Product ID (PID) 16 characters long...
> sd-config-list = "ATA     MARVELL SD88SA02","cache-nonvolatile";
> cache-nonvolatile=1, 0x40000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1;

If you are going to trick the system into thinking a volatile cache is 
nonvolatile, you
might as well disable the ZIL -- the data corruption potential is the same.
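
For a quick apples-to-apples check, the ZIL can also be disabled outright on a
test system -- on builds of this era that's the zil_disable tunable, e.g. via
/etc/system (system-wide, and never on data you care about):

  # /etc/system -- ZIL disabled for all pools after the next reboot
  set zfs:zil_disable = 1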

> As a test we've found that untarring an Eclipse source tarball is a good use 
> case, so we use that. It's called from a shell script that creates a directory, 
> pushd's into it and does the unpacking, 40 times on each machine.
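> 
> A minimal sketch of that driver script (tarball path and directory names are 
> illustrative):
> 
>   #!/bin/bash
>   # unpack the Eclipse source tarball 40 times, each run in its own
>   # directory on the NFS mount served by the X4540
>   for i in $(seq 1 40); do
>       mkdir "run$i"
>       pushd "run$i" > /dev/null
>       tar xzf /nfs/scratch/eclipse-src.tar.gz
>       popd > /dev/null
>   done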
> 
> Now for the interesting bit: 
> 
> When we use one vmod, both machines finish in about 6min45; zilstat maxes out 
> at about 4200 IOPS.
> Using four vmods it takes about 6min55; zilstat maxes out at 2200 IOPS.
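> 
> For context, the slog configurations under test look roughly like this (pool 
> and device names are placeholders):
> 
>   # one F20 vmod as a separate log device
>   zpool add tank log c2t0d0
>   # four F20 vmods striped as log devices
>   zpool add tank log c2t0d0 c2t1d0 c2t2d0 c2t3d0
>   # watch synchronous write activity while the clients untar
>   ./zilstat.ksh 1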
> 
> In both cases, probing the HyperTransport bus seems to show no bottleneck 
> there (although I'd like to see the bidirectional flow, but I know we can't 
> :) ). Network traffic stays comfortably under 400 Mbit/s, and that's the peak 
> load when using 1 vmod.
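> 
> (One way to watch per-interface throughput during the runs; nicstat is a 
> separate download, interval in seconds:)
> 
>   nicstat 1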
> 
> Looking at the I/O-connection architecture, it figures that in this setup we 
> traverse the different HT busses quite a lot. So we've also placed an Intel 
> dual 1Gb NIC in another PCIe slot, so that ZIL traffic should only have to 
> cross one HT bus (not counting offloading intelligence). That helped a bit, 
> but not much:
> 
> Around 6min35 using one vmod and 6min45 using four vmods.
> 
> It made the HT DTrace probing more telling though, since the outgoing HT bus 
> to the F20 (and the e1000s) is now, as expected, a better indication of the 
> ZIL traffic.
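> 
> For correlation, the ZIL activity itself can be watched with a quick DTrace 
> one-liner (fbt probes on the ZIL functions; names may differ between builds):
> 
>   # per-second count of ZIL commits and log-buffer writes
>   dtrace -n 'fbt::zil_commit:entry,fbt::zil_lwb_write_start:entry
>       { @[probefunc] = count(); } tick-1s { printa(@); trunc(@); }'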
> 
> We didn't do the 40 x 2 untar test without an SSD log device. As an 
> indication: unpacking a single tarball then takes about 1min30.
> 
> In case it means anything, here are the single-tarball unpack times for 
> no_zil, 1vmod, 1vmod_Intel, 4vmods and 4vmod_Intel (decimals only used as an 
> indication!):
> 
>   no_zil        4s
>   1vmod        12s
>   1vmod_Intel  11.2s
>   4vmods       12.5s
>   4vmod_Intel  11.6s
> 
> 
> Taking this all into account, I still don't see what's holding it up. 
> Interestingly enough, the client-side times are all within about 10 seconds of 
> each other, but zilstat shows something different. Hypothesis: zilstat shows 
> only one vmod and we're capped in a layer above the ZIL? Can't rule out 
> networking just yet, but my gut tells me we're not network bound here. That 
> leaves the ZFS ZPL/VFS layer?
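> 
> One way to separate the layers might be a per-device view of the log writes 
> while the test runs, to check whether all four vmods actually take traffic 
> (device names are placeholders):
> 
>   # busy%, service times and write rates per device, one-second samples
>   iostat -xn 1 | egrep 'device|c2t[0-3]d0'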

The difference between writing to the ZIL and not writing to the ZIL is 
perhaps thousands of CPU cycles.  For a latency-sensitive workload
this will be noticed.
 -- richard

> 
> I'm very open to suggestions on how to proceed... :)
> 
> With kind regards,
> 
> Jeroen
> --
> Jeroen Roodhart                        University of Amsterdam
> ICT Consultant                         Informatiseringscentrum
> j.r.roodhart uva.nl                    Technical support/ATG
> --
> See http://www.science.uva.nl/~jeroen for openPGP public key

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 




