On 6/23/06, Roch <[EMAIL PROTECTED]> wrote:

Joe Little writes:
 > On 6/22/06, Bill Moore <[EMAIL PROTECTED]> wrote:
 > > Hey Joe.  We're working on some ZFS changes in this area, and if you
 > > could run an experiment for us, that would be great.  Just do this:
 > >
 > >     echo 'zil_disable/W1' | mdb -kw
 > >
 > > We're working on some fixes to the ZIL so it won't be a bottleneck when
 > > fsyncs come around.  The above command will let us know what kind of
 > > improvement is on the table.  After our fixes you could get from 30-80%
 > > of that improvement, but this would be a good data point.  This change
 > > makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a
 > > txg every 5 seconds.  So at most, your disk will be 5 seconds out of
 > > date compared to what it should be.  It's a pretty small window, but it
 > > all depends on your appetite for such windows.  :)
 > >
 > > After running the above command, you'll need to unmount/mount the
 > > filesystem in order for the change to take effect.
 > >
 > > If you don't have time, no big deal.
 > >
 > >
 > > --Bill
 > >
 > >
 > > On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
 > > > On 6/22/06, Jeff Bonwick <[EMAIL PROTECTED]> wrote:
 > > > >> a test against the same iscsi targets using linux and XFS and the
 > > > >> NFS server implementation there gave me 1.25MB/sec writes. I was about
 > > > >> to throw in the towel and deem ZFS/NFS as unusable until B41 came
 > > > >> along and at least gave me 1.25MB/sec.
 > > > >
 > > > >That's still super slow -- is this over a 10Mb link or something?
 > > > >
 > > > >Jeff

I think the performance is in line with expectations for a small-file,
single-threaded, open/write/close NFS workload (NFS must commit on
close). Therefore I expect roughly:

        (avg file size) / (I/O latency).

Joe, does this formula approach the 1.25 MB/s?

OK. I was given a DTrace script, and it would _appear_ that my average
latency for the iSCSI devices (regardless of ZIL on/off) is 387 ms.
Using the above formula with my artificial test of 8k files, that's
still only around 20 KB/sec, which is what I saw in the RAID-Z case.
With the ZIL disabled, it's shoving 256k chunks down the pipe, or about
677 KB/sec by your calculation. I'm actually seeing about 10x that, so
the latency may not be constant after all; more likely it is reduced
below the device layer by more aggressive ordering of the data when I
can write larger chunks and spend less time committing small files.
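
(For anyone following the arithmetic, here is a quick back-of-the-envelope
sketch of Roch's formula in plain Python, purely illustrative, using the
387 ms latency and the 8k/256k sizes quoted in this thread:)

    # throughput ~= (avg file size) / (per-file I/O latency)

    def expected_throughput(io_bytes, latency_s):
        """Expected sustained rate for serialized, sync-bound I/O (bytes/sec)."""
        return io_bytes / latency_s

    latency = 0.387  # seconds, from the DTrace measurement above

    for label, size in [("8k sync writes", 8 * 1024),
                        ("256k chunks (zil disabled)", 256 * 1024)]:
        rate = expected_throughput(size, latency)
        print("%-28s ~%3.0f KB/sec" % (label, rate / 1e3))

    # Prints roughly 21 KB/sec and 677 KB/sec -- in line with the ~20 KB/sec
    # and ~677 KB/sec figures above. The measured ~6 MB/sec with the ZIL
    # disabled is about 10x the latter, so the 387 ms latency clearly does
    # not hold for large, well-ordered writes.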




 > > > >
 > > > >
 > > >
 > > > Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
 > > > iscsi target, and single gig-e link (nge) to the NFS clients, who are
 > > > gig-e. Sun Ultra20 or AMD Quad Opteron, again with no difference.
 > > >
 > > > Again, the issue is the multiple fsyncs that NFS requires, and likely
 > > > the serialization of those iscsi requests. Apparently, there is a
 > > > basic latency in iscsi that one could improve upon with FC, but we are
 > > > definitely in the all ethernet/iscsi camp for multi-building storage
 > > > pool growth and don't have interest in a FC-based SAN.
 > >
 >
 > Well, following Bill's advice and the previous note on disabling zil,
 > I ran my test on a B38 opteron initiator and if you do a time on the
 > copy from the client, 6250 8k files transfer at 6MB/sec now. If you
 > watch the entire commit on the backend using "zpool iostat 1" I see
 > that it takes a few more seconds, and the actual rate there is
 > 4MB/sec. Beats my best of 1.25MB/sec, and this is not B41.


Joe, you know this, but for the benefit of others I have to highlight
that running any NFS server this way may cause silent data corruption
from the client's point of view.

Whenever a server keeps data in RAM this way and does not commit it to
stable storage when clients request it, that opens a time window for
corruption: a client writes to a page, later reads the same page back,
and if the server suffered a crash in between, the data may not match
what the client wrote.

So this is performance at the expense of data integrity.
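
(To make the window concrete, here is a toy sketch in plain Python; the
helper names are hypothetical and this is not real NFS or ZFS code,
just the sequence described above with "volatile" and "stable" storage
modeled as two dicts:)

    volatile = {}   # server RAM -- lost on a crash
    stable = {}     # server disk -- survives a crash

    def nfs_write(page, data):
        volatile[page] = data          # server caches the write in RAM

    def nfs_commit():
        # With the ZIL disabled the server acknowledges the commit without
        # flushing; a well-behaved server would move volatile -> stable here.
        return "OK"

    def server_crash():
        volatile.clear()               # everything not yet on disk is gone

    def nfs_read(page):
        # After the crash, reads are served from whatever made it to disk.
        return stable.get(page, b"")

    nfs_write("p0", b"new data")
    nfs_commit()                       # client now believes "p0" is durable
    server_crash()
    print(nfs_read("p0"))              # b'' -- silently not what was written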

-r


