On 6/23/06, Roch <[EMAIL PROTECTED]> wrote:
Joe Little writes:
> On 6/22/06, Bill Moore <[EMAIL PROTECTED]> wrote:
> > Hey Joe.  We're working on some ZFS changes in this area, and if you
> > could run an experiment for us, that would be great.  Just do this:
> >
> >     echo 'zil_disable/W1' | mdb -kw
> >
> > We're working on some fixes to the ZIL so it won't be a bottleneck when
> > fsyncs come around.  The above command will let us know what kind of
> > improvement is on the table.  After our fixes you could get from 30-80%
> > of that improvement, but this would be a good data point.  This change
> > makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a
> > txg every 5 seconds.  So at most, your disk will be 5 seconds out of
> > date compared to what it should be.  It's a pretty small window, but it
> > all depends on your appetite for such windows.  :)
> >
> > After running the above command, you'll need to unmount/mount the
> > filesystem in order for the change to take effect.
> >
> > If you don't have time, no big deal.
> >
> > --Bill
> >
> > On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
> > > On 6/22/06, Jeff Bonwick <[EMAIL PROTECTED]> wrote:
> > > > > a test against the same iscsi targets using linux and XFS and the
> > > > > NFS server implementation there gave me 1.25MB/sec writes. I was
> > > > > about to throw in the towel and deem ZFS/NFS as unusable until B41
> > > > > came along and at least gave me 1.25MB/sec.
> > > >
> > > > That's still super slow -- is this over a 10Mb link or something?
> > > >
> > > > Jeff

I think the performance is in line with expectations for a small-file,
single-threaded, open/write/close NFS workload (NFS must commit on close).
Therefore I expect roughly:

    throughput ~= (avg file size) / (I/O latency)

Joe, does this formula approach the 1.25 MB/s?
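Spelled out end to end, the experiment Bill describes amounts to roughly the
following (a sketch only: the pool/filesystem name is a placeholder, and the
last two lines simply put the default back once the test is done):

    # disable the ZIL in the live kernel, then remount so the change takes effect
    echo 'zil_disable/W1' | mdb -kw
    zfs unmount tank/export && zfs mount tank/export

    # ... run the NFS write test against the share ...

    # restore the default and remount again
    echo 'zil_disable/W0' | mdb -kw
    zfs unmount tank/export && zfs mount tank/export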
Ok. I was given a dtrace script, and it would _appear_ that my average latency for the iscsi devices (regardless of zil on/off) is 387 ms. Using the above calculation with my artificial test of 8k files, that's still only around 20K/sec, which is what I saw in the RAIDZ case. With the zil disabled, it's shoving 256k chunks down the pipe, or 677K/sec by your calculation. I'm seeing about 10x that in practice, so the latency may not be so constant after all; more likely it improves below the device layer through more aggressive ordering of the data once I can write larger chunks and spend less time committing small files.
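Plugging those numbers back into the formula is a quick sanity check. The
dtrace script Joe was given isn't shown here; an io-provider one-liner along
the lines below would produce a comparable average-latency figure, but it is
only a guess at what was actually run:

    # rough average per-I/O latency across block devices (hypothetical script)
    dtrace -n 'io:::start { s[arg0] = timestamp }
               io:::done /s[arg0]/ { @["avg latency (us)"] = avg((timestamp - s[arg0]) / 1000); s[arg0] = 0 }'

    # expected throughput ~= (average write size) / (per-I/O latency)
    awk 'BEGIN {
        lat = 0.387                                            # seconds per I/O
        printf("8k files  : %.0f bytes/s (~20K/sec)\n",   8192 / lat)
        printf("256k I/Os : %.0f bytes/s (~677K/sec)\n", 262144 / lat)
    }'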
> > > Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
> > > iscsi target, and a single gig-e link (nge) to the NFS clients, who are
> > > gig-e.  Sun Ultra20 or AMD Quad Opteron, again with no difference.
> > >
> > > Again, the issue is the multiple fsyncs that NFS requires, and likely
> > > the serialization of those iscsi requests.  Apparently there is a
> > > basic latency in iscsi that one could improve upon with FC, but we are
> > > definitely in the all-ethernet/iscsi camp for multi-building storage
> > > pool growth and don't have interest in an FC-based SAN.
>
> Well, following Bill's advice and the previous note on disabling the zil,
> I ran my test on a B38 opteron initiator, and if you do a time on the
> copy from the client, 6250 8k files transfer at 6MB/sec now.  If you
> watch the entire commit on the backend using "zpool iostat 1", I see
> that it takes a few more seconds, and the actual rate there is 4MB/sec.
> Beats my best of 1.25MB/sec, and this is not B41.

Joe, you know this, but for the benefit of others I have to highlight that
running any NFS server this way may cause silent data corruption from the
client's point of view.  Whenever a server keeps data in RAM this way and
does not commit it to stable storage upon request from clients, that opens
a time window for corruption: a client writes to a page, then reads the
same page, and if the server suffered a crash in between, the data may not
match.  So this is performance at the expense of data integrity.

-r
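For anyone who wants to repeat the kind of measurement Joe describes, the
rough shape of it is below; the mount point, pool name, and file-generation
step are placeholders rather than the exact test he ran:

    # client: stage 6250 8k files locally, then time the copy across NFS
    mkdir /tmp/smallfiles
    i=0
    while [ $i -lt 6250 ]; do
        dd if=/dev/zero of=/tmp/smallfiles/f$i bs=8k count=1 2>/dev/null
        i=$((i+1))
    done
    time cp -r /tmp/smallfiles /mnt/nfs/    # ~50 MB total, roughly 8-9 s at 6MB/sec

    # server, in another window: watch what actually reaches the pool
    zpool iostat tank 1

The few extra seconds of zpool iostat activity after the copy returns are the
outstanding txgs still being pushed out, which is why the backend rate (4MB/sec)
reads lower than the client-side figure (6MB/sec).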
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss