> -----Original Message-----
> From: Garrett D'Amore [mailto:garr...@nexenta.com] 
> Sent: Monday, July 26, 2010 2:27 AM
> To: Mike Gerdts
> Cc: Saxon, Will; zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] NFS performance?
> 
> On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
> > On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore 
> <garr...@nexenta.com> wrote:
> > > On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
> > >>
> > >> I think there may be very good reason to use iSCSI, if you're
> > >> limited to gigabit but need to be able to handle higher throughput
> > >> for a single client. I may be wrong, but I believe iSCSI to/from a
> > >> single initiator can take advantage of multiple links in an
> > >> active-active multipath scenario whereas NFS is only going to be
> > >> able to take advantage of 1 link (at least until pNFS).
> > >
> > > There are other ways to get multiple paths.  First off, there is IP
> > > multipathing, which offers some of this at the IP layer.  There is
> > > also 802.3ad link aggregation (trunking).  So you can still get high
> > > performance beyond a single link with NFS.  (It works with iSCSI
> > > too, btw.)
> > 
> > With both IPMP and link aggregation, each TCP session will go over
> > the same wire.  There is no guarantee that load will be evenly
> > balanced between links when there are multiple TCP sessions.  As
> > such, any scalability you get using these configurations will be
> > dependent on having a complex enough workload, wise configuration
> > choices, and a bit of luck.
> 
> If you're really that concerned, you could use UDP instead of TCP.  But
> that may have other detrimental performance impacts; I'm not sure how
> bad they would be in a data center with generally lossless ethernet
> links.
> 

UDP is an advantage for NFS in this regard, since traffic isn't tied to a 
single long-lived TCP connection that the hash pins to one link. 
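To make the flow-pinning point above concrete, here is a rough sketch (not any particular switch's actual implementation) of how a hash-based aggregation policy maps a TCP 4-tuple to a member link; the addresses and ports are invented for illustration:

```python
import zlib

def pick_link(src, dst, sport, dport, n_links):
    # Hash the flow 4-tuple; a given TCP session always maps to the
    # same member link, so one session can never exceed one link's speed.
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return zlib.crc32(key) % n_links

# One NFS-over-TCP session between a client and server (port 2049):
# every packet of that session lands on the same wire, no matter how
# many links are in the aggregate.
links_used = {pick_link("10.0.0.1", "10.0.0.2", 50000, 2049, 4)
              for _ in range(1000)}
print(len(links_used))  # 1 -- the whole session is pinned to one link
```

A second session with a different source port may (or may not) hash to a different link, which is why scaling depends on having many flows.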

> Btw, I am not certain that the multiple initiator support (mpxio) is
> necessarily any better as far as guaranteed performance/balancing.  (It
> may be; I've not looked closely enough at it.)
> 

I'm not sure I'm referring to multi-initiator support. iSCSI can have multiple 
sessions between an initiator and a target, or multiple sessions by virtue of 
connections to different targets presenting the same LUN (this is the 
multipathing I am talking about). I'm not sure about multiple sessions between 
a single initiator/target pair, but the single-initiator/multiple-target 
config can work in an IPMP scenario to get you more usable capacity between 
your initiator and target(s) over multiple links, using a variety of 
algorithms to balance load amongst the sessions. 

> I should look more closely at NFS as well -- if multiple applications
> on the same client are accessing the same filesystem, do they use a
> single common TCP session, or can they each have separate instances
> open?  Again, I'm not sure.

This is probably going to depend on the software, but in the scenario I am 
personally interested in (VMware), it doesn't really matter: it's a single 
application. VMware says they create two sessions - one for control and one for 
data - so I assume the maximum speed available for data transfer to/from a 
particular mount is going to be the speed of 1 link.

I guess this is getting way off topic for the list, but VMware also computes a 
unique ID for each NFS mount. The ID is computed somehow from the mount 
configuration. If the datastore IDs are not identical, then VMware thinks they 
are different datastores regardless of their contents. I have had a situation 
where some clients thought a particular datastore was different from the same 
datastore on some other clients, which prevented VMs hosted on that datastore 
from migrating between these clients. I traced the problem to inconsistent NFS 
mount configs; I'd used the FQDN for the configuration on most of the hosts but 
an IP address on the others. Reconfiguration resolved the issue. This would 
suggest that at least for this client/server combo, it would not be possible to 
do manual load balancing by pointing some clients at one IP and some at another 
IP for the same export. It would have to be balanced by export instead, which 
is a lot less convenient.

VMware could also be more intelligent about generating their ID.
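To illustrate the failure mode (this is not VMware's actual algorithm, just a hypothetical sketch): if the ID is any deterministic digest of the mount spec as typed on each client, FQDN and IP mounts of the same export get different IDs.

```python
import hashlib

def datastore_id(server, export_path):
    # Hypothetical: derive the ID from the mount spec exactly as
    # configured on the client, rather than from what it resolves to.
    return hashlib.md5(f"{server}:{export_path}".encode()).hexdigest()[:12]

# Host A mounts by FQDN, host B mounts the same export by IP address:
id_a = datastore_id("filer.example.com", "/export/vmfs")
id_b = datastore_id("192.168.1.10", "/export/vmfs")
print(id_a == id_b)  # False -- same export, treated as two datastores
```

Normalizing the server field (e.g. resolving to a canonical address) before hashing would make the IDs agree, which is essentially the "be more intelligent" fix suggested above.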

> 
> > 
> > Note that with Sun Trunking there was an option to load balance
> > using a round robin hashing algorithm.  When pushing high network
> > loads this may cause performance problems with reassembly.
> 
> Yes.  Reassembly is Evil for TCP performance.
> 
> Btw, the iSCSI balancing act that was described does seem a bit
> contrived -- a single initiator and a COMSTAR server, both client *and
> server* with multiple ethernet links instead of a single 10GbE link.
> 
> I'm not saying it doesn't happen, but I think it happens infrequently
> enough that it's reasonable that this scenario wasn't one that popped
> immediately into my head. :-)

I don't agree that it's contrived, but I do agree that it's reasonable you 
didn't think of it :). I don't want to have to create a bunch of custom 
initiator/target configurations to spread load. I want to have a target with 
some particular configuration and a bunch of initiators configured identically 
to each other, and I want load to be spread across the available gigabit links. 

My understanding is that the way to do this means having multiple targets 
configured per LUN on the storage server, with each target set up to be 
available only on a specific network. Each initiator/client is set up with 
interfaces in these networks and pointed at these targets, and an appropriate 
load balancing algorithm is chosen to spread load between the sessions. The 
configuration can use individual links and/or an 802.3ad configuration if the 
aggregate(s) are also 802.1q trunks.

This is obviously dependent on client/initiator support, but I think initiators 
that claim to support MPIO or multipathing implement something like this to get 
it done. I'm pretty sure Solaris/COMSTAR permits this also, with multiple 
targets configured per LUN and each target able to be pinned to specific IP 
addresses. I haven't actually done this though so I guess I'm not 100% certain.
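Not actual COMSTAR or MPIO code, but the round-robin policy described above boils down to something like the following sketch; the portal addresses are invented, one session per gigabit link:

```python
from itertools import cycle

# Hypothetical sessions: one per link, each to a different target
# portal, all presenting the same LUN.
sessions = ["portal 10.1.1.1", "portal 10.1.2.1", "portal 10.1.3.1"]

def make_dispatcher(sessions):
    # Round-robin policy: each new SCSI command goes out the next
    # session in turn, so load spreads evenly across the links.
    ring = cycle(sessions)
    return lambda: next(ring)

dispatch = make_dispatcher(sessions)
order = [dispatch() for _ in range(6)]
print(order.count("portal 10.1.1.1"))  # 2 -- an even share of 6 commands
```

Real MPIO stacks offer other policies too (least-queue-depth, least-blocks), but round-robin is the one that most directly spreads a single initiator's load across identical links.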

-Will 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
