On Mar 30, 2010, at 2:50 PM, Jeroen Roodhart wrote:

> Hi Karsten, Adam, List,
>
> Adam Leventhal wrote:
>
>> Very interesting data. Your test is inherently single-threaded so I'm not
>> surprised that the benefits aren't more impressive -- the flash modules on
>> the F20 card are optimized more for concurrent IOPS than single-threaded
>> latency.
>
> Well, I actually wanted to do a bit more bottleneck searching, but let me
> weigh in with some measurements of our own :)
>
> We're on a single X4540 with quad-core CPUs, so we're on the older
> HyperTransport bus. We connected it to two X2200s running CentOS 5, each
> on its own 1Gb link. We disabled cache flushes with the following addition
> to the /kernel/drv/sd.conf file (Karsten: if you didn't do this already,
> you _really_ want to :) ):
>
> # http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes
> # Add whitespace to make the vendor ID (VID) 8 ... and Product ID (PID) 16
> # characters long...
> sd-config-list = "ATA     MARVELL SD88SA02", "cache-nonvolatile";
> cache-nonvolatile = 1,0x40000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1;
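[Editor's note: the sd-config-list match only works if the VID field is padded to exactly 8 characters and the PID to 16, which is easy to get wrong by hand. A minimal sketch of generating such a line with printf field widths -- the helper itself is hypothetical; the VID/PID values are the ones quoted above:]

```shell
#!/bin/sh
# Pad a VID to 8 characters and a PID to 16 characters, as required
# for an sd-config-list entry in /kernel/drv/sd.conf.
# Hypothetical helper; values are the ones from this thread.
vid="ATA"
pid="MARVELL SD88SA02"
printf 'sd-config-list = "%-8s%-16s", "cache-nonvolatile";\n' "$vid" "$pid"
```

Left-justified field widths (`%-8s`, `%-16s`) do the padding, so a 3-character VID like "ATA" comes out followed by five spaces.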
If you are going to trick the system into thinking a volatile cache is
nonvolatile, you might as well disable the ZIL -- the data corruption
potential is the same.

> As a test, we've found that untarring an Eclipse source tarball is a good
> use case, so we use that. It is called from a shell script that creates a
> directory, changes into it, and does the unpacking, 40 times on each
> machine.
>
> Now for the interesting bit:
>
> When we use one vmod, both machines finish in about 6min45; zilstat maxes
> out at about 4200 IOPS.
> Using four vmods it takes about 6min55; zilstat maxes out at 2200 IOPS.
>
> In both cases, probing the HyperTransport bus seems to show no bottleneck
> there (although I'd like to see the bidirectional flow, but I know we
> can't :) ). The network stays comfortably under 400 Mbit/s, and that's
> peak load when using 1 vmod.
>
> Looking at the I/O-connection architecture, it figures that in this setup
> we traverse the different HT busses quite a lot. So we also placed an
> Intel dual 1Gb NIC in another PCIe slot, so that ZIL traffic should only
> have to use one HT bus (not counting offloading intelligence). That helped
> a bit, but not much:
>
> around 6min35 using one vmod and 6min45 using four vmods.
>
> It made looking at the HT DTrace probes more telling, though, since the
> outgoing HT bus to the F20 (and the e1000s) is now, as expected, a better
> indication of the ZIL traffic.
>
> We didn't do the 40 x 2 untar test without an SSD device. As an
> indication: unpacking a single tarball then takes about 1min30.
>
> In case it means anything, single-tarball unpack times for no_zil, 1vmod,
> 1vmod_Intel, 4vmods, and 4vmod_Intel measure around (decimals only used as
> an indication!):
>
> 4s, 12s, 11.2s, 12.5s, 11.6s
>
> Taking this all into account, I still don't see what's holding it up.
> Interestingly enough, the client-side times are close, within about 10
> seconds, but zilstat shows something different.
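[Editor's note: the benchmark driver described above -- create a directory, change into it, unpack, repeat 40 times per client -- can be sketched roughly as below. The function name, paths, and tarball location are assumptions, not the original script:]

```shell
#!/bin/sh
# Sketch of the untar benchmark described in the thread: unpack the same
# tarball N times, each iteration into its own fresh directory.
# run_untar_bench, the workdir layout, and the tarball path are hypothetical.
run_untar_bench() {
    tarball=$1
    count=$2
    workdir=$3
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir -p "$workdir/run$i"
        ( cd "$workdir/run$i" && tar xf "$tarball" )
        i=$((i + 1))
    done
}

# On each client, something like:
#   time run_untar_bench /var/tmp/eclipse-src.tar 40 /var/tmp/bench
# while watching zilstat on the server side.
```

Wrapping the call in time(1) gives the per-client wall-clock figures (the "about 6min45" numbers), while zilstat on the X4540 shows the synchronous-write IOPS hitting the log devices.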
> Hypothesis: zilstat shows only one vmod, and we're capped in a layer
> above the ZIL? Can't rule out networking just yet, but my gut tells me
> we're not network bound here. That leaves the ZFS ZPL/VFS layer?

The difference between writing to the ZIL and not writing to the ZIL is
perhaps thousands of CPU cycles. For a latency-sensitive workload this
will be noticed.
 -- richard

> I'm very open to suggestions on how to proceed... :)
>
> With kind regards,
>
> Jeroen
> --
> Jeroen Roodhart
> ICT Consultant
> University of Amsterdam
>
> j.r.roodhart uva.nl              Informatiseringscentrum
> Technical support/ATG
> --
> See http://www.science.uva.nl/~jeroen for openPGP public key
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com