> -----Original Message-----
> From: Garrett D'Amore [mailto:garr...@nexenta.com]
> Sent: Monday, July 26, 2010 2:27 AM
> To: Mike Gerdts
> Cc: Saxon, Will; zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] NFS performance?
>
> On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
> > On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore <garr...@nexenta.com> wrote:
> > > On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
> > >>
> > >> I think there may be very good reason to use iSCSI if you're
> > >> limited to gigabit but need to handle higher throughput for a
> > >> single client. I may be wrong, but I believe iSCSI to/from a
> > >> single initiator can take advantage of multiple links in an
> > >> active-active multipath scenario, whereas NFS is only going to be
> > >> able to take advantage of one link (at least until pNFS).
> > >
> > > There are other ways to get multiple paths. First off, there is IP
> > > multipathing (IPMP), which offers some of this at the IP layer.
> > > There is also 802.3ad link aggregation (trunking). So you can
> > > still get performance beyond a single link with NFS. (It works
> > > with iSCSI too, btw.)
> >
> > With both IPMP and link aggregation, each TCP session will go over
> > the same wire. There is no guarantee that load will be evenly
> > balanced between links when there are multiple TCP sessions. As
> > such, any scalability you get from these configurations will depend
> > on having a complex enough workload, wise configuration choices,
> > and a bit of luck.
>
> If you're really that concerned, you could use UDP instead of TCP.
> But that may have other detrimental performance impacts; I'm not sure
> how bad they would be in a data center with generally lossless
> ethernet links.
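To illustrate Mike's point about per-flow placement: aggregation layers typically hash a flow's addresses/ports to pick an egress link, so every packet of one TCP session lands on the same wire. A minimal sketch (the hash function and addresses are illustrative, not the actual IPMP/802.3ad algorithm):

```python
import hashlib

def flow_hash(src_ip, src_port, dst_ip, dst_port, n_links):
    """Pick an egress link for a flow by hashing its 4-tuple.

    Deterministic: every packet of a given TCP session maps to the
    same link, so a single session never exceeds one link's bandwidth.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % n_links

# One NFS client talking to one server over a 4-link aggregate:
# the same 4-tuple always selects the same link.
a = flow_hash("10.0.0.5", 52311, "10.0.0.9", 2049, 4)
b = flow_hash("10.0.0.5", 52311, "10.0.0.9", 2049, 4)
assert a == b  # same flow -> same wire, no per-packet spreading

# Many sessions (e.g. several iSCSI sessions) may spread across links,
# but only as well as the hash happens to distribute them.
links = {flow_hash("10.0.0.5", p, "10.0.0.9", 3260, 4)
         for p in range(40000, 40032)}
assert links <= {0, 1, 2, 3}
```

This is why multiple iSCSI sessions (or multiple NFS clients) can benefit from a trunk while a single TCP session cannot.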
UDP is an advantage for NFS in this regard.

> Btw, I am not certain that the multiple initiator support (mpxio) is
> necessarily any better as far as guaranteed performance/balancing.
> (It may be; I've not looked closely enough at it.)

I'm not sure I'm referring to multiple initiators. iSCSI can have
multiple sessions between an initiator and a target, or multiple
sessions by virtue of connections to different targets presenting the
same LUN (this is the multipathing I am talking about). I'm not sure
about the multiple-sessions-between-a-single-initiator/target scenario,
but the single initiator/multiple target configuration can work in an
IPMP scenario to give you more usable capacity between your initiator
and target(s) over multiple links, using a variety of algorithms to
balance load among the sessions.

> I should look more closely at NFS as well -- if multiple applications
> on the same client are accessing the same filesystem, do they use a
> single common TCP session, or can they each have separate instances
> open? Again, I'm not sure.

This is probably going to depend on the software, but in the scenario I
am personally interested in (VMware), it doesn't really matter: it's a
single application. VMware says they create two sessions -- one for
control and one for data -- so I assume the maximum speed available for
data transfer to/from a particular mount is going to be the speed of
one link.

I guess this is getting way off topic for the list, but VMware also
computes a unique ID for each NFS mount, derived somehow from the mount
configuration. If the datastore IDs are not identical, VMware treats
them as different datastores regardless of their contents. I have had a
situation where some clients thought a particular datastore was
different from the same datastore on other clients, which prevented VMs
hosted on that datastore from migrating between those clients.
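The practical upshot is that every host must mount the export with an identical server name and path. A sketch using the ESX 4.x-era `esxcfg-nas` tool (the filer name, export path, and label here are illustrative assumptions, not from the thread):

```shell
# Run identically on every ESX host: same server string, same path,
# same label, so the computed datastore ID matches across hosts.
esxcfg-nas -a -o filer.example.com -s /export/vmstore vmstore1

# Verify -- each host should list an identical mount specification.
esxcfg-nas -l
```

Mixing `filer.example.com` on some hosts with the filer's raw IP on others is exactly the mismatch described above.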
I traced the problem to inconsistent NFS mount configs; I'd used the
FQDN for the configuration on most of the hosts but an IP address on
the others. Reconfiguring resolved the issue. This suggests that, at
least for this client/server combination, it would not be possible to
do manual load balancing by pointing some clients at one IP and other
clients at another IP for the same export. It would have to be balanced
per export instead, which is a lot less convenient. VMware could also
be more intelligent about generating the ID.

> > Note that with Sun Trunking there was an option to load balance
> > using a round robin hashing algorithm. When pushing high network
> > loads this may cause performance problems with reassembly.
>
> Yes. Reassembly is Evil for TCP performance.
>
> Btw, the iSCSI balancing act that was described does seem a bit
> contrived -- a single initiator and a COMSTAR server, both client
> *and server* with multiple ethernet links instead of a single 10GbE
> link.
>
> I'm not saying it doesn't happen, but I think it happens infrequently
> enough that it's reasonable that this scenario wasn't one that popped
> immediately into my head. :-)

I don't agree that it's contrived, but I do agree that it's reasonable
you didn't think of it :). I don't want to have to create a bunch of
custom initiator/target configurations to spread load. I want to have a
target with some particular configuration and a bunch of initiators
configured identically to each other, and I want load to be spread
across the available gigabit links. My understanding is that the way to
do this is to configure multiple targets per LUN on the storage server,
with each target set up to be available only on a specific network.
Each initiator/client is set up with interfaces on these networks and
pointed at these targets, and an appropriate load-balancing algorithm
is chosen to spread load between the sessions.
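On the COMSTAR side, the setup described above can be sketched with `itadm` and `stmfadm`. This is an illustrative outline under my assumptions (portal addresses, group names, and IQNs are made up; the LU GUID would come from `stmfadm list-lu`), not a tested recipe:

```shell
# One target portal group per subnet, so each target is reachable
# only over its own link/network.
itadm create-tpg tpg-a 192.168.10.1
itadm create-tpg tpg-b 192.168.20.1

# One target bound to each portal group (itadm prints the IQN).
itadm create-target -t tpg-a
itadm create-target -t tpg-b

# Target groups let a view expose the same LU through both targets.
stmfadm create-tg tg-a
stmfadm create-tg tg-b
stmfadm add-tg-member -g tg-a iqn.2010-07.org.example:target-a
stmfadm add-tg-member -g tg-b iqn.2010-07.org.example:target-b

# Export one logical unit (GUID from stmfadm list-lu) via both groups.
stmfadm add-view -t tg-a 600144F0...
stmfadm add-view -t tg-b 600144F0...
```

The initiator then logs in to both targets, sees the same LU twice, and its multipath layer (e.g. MPxIO) balances across the two sessions.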
The configuration can use individual links and/or an 802.3ad
configuration if the aggregate(s) are also 802.1q trunks. This is
obviously dependent on client/initiator support, but I think initiators
that claim to support MPIO or multipathing implement something like
this to get it done. I'm pretty sure Solaris/COMSTAR permits this too,
with multiple targets configured per LUN and each target able to be
pinned to specific IP addresses. I haven't actually done this, though,
so I guess I'm not 100% certain.

-Will
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss