Could you re-enable the SL param (btl_openib_ib_service_level) for RoCE? Jeff was kind enough to provide a patch to let me specify the gid_index, but that doesn't seem to be working. To get RoCE to work correctly (at least, on Nexus switches) I'll need to specify both a gid_index and an IB service level. I think. :-)
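
To make the request concrete, here is a rough sketch of what I'd expect to put in our site-wide MCA params file (typically $prefix/etc/openmpi-mca-params.conf) once both knobs work for RoCE. Treat it as an illustration under assumptions, not a tested config: btl_openib_gid_index is my guess at the name of the parameter Jeff's patch adds, the values 1 and 3 are placeholders rather than our real settings, and the service level line is the one that is not currently honored for RoCE.

```
# Site-wide settings, identical on every node (sketch only).

# RoCE needs the rdmacm connection manager.
btl_openib_cpc_include = rdmacm

# Assumed parameter name: pick the Nth entry of the port's GID table
# (the entry created for the VLAN interface) instead of entry 0.
btl_openib_gid_index = 1

# Placeholder value: IB service level, which should map to the
# priority bits in the VLAN header.
btl_openib_ib_service_level = 3
```

The same values could be passed per run with "--mca <name> <value>" on the mpirun command line, but a global file keeps N the same everywhere.
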
Also, while the rdmacm connection manager is required for RoCE, it's not selected by default (like it is for iWARP). You still need to add that to a config file or command line, or you get a rather cryptic option (at least up through OpenMPI 1.5.1).

--
Mike Shuey

On Mon, Feb 21, 2011 at 12:34 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Random thought: is there a check to ensure that the SL MCA param is not set
> in a RoCE environment? If not, we should probably add a show_help warning if
> the SL MCA param is set when using RoCE (i.e., that its value will be
> ignored).
>
>
> On Feb 19, 2011, at 12:22 AM, Shamis, Pavel wrote:
>
>> As far as I remember we don't allow to user to specify SL for RoCE. RoCE
>> considered kinda ethernet device and RDMACM connection manager is used to
>> setup the connections. it means that in order to select network X or Y, you
>> may use ip/netmask (btl_openib_ipaddr_include) .
>>
>> Pavel (Pasha) Shamis
>> ---
>> Application Performance Tools Group
>> Computer Science and Math Division
>> Oak Ridge National Laboratory
>>
>>
>> On Feb 18, 2011, at 4:14 PM, Michael Shuey wrote:
>>
>>> Per-node GID & SL settings == bad. Site-wide GID & SL settings == good.
>>>
>>> If this could be an MCA param (like btl_openib_ib_service_level)
>>> that'd be great - we already have a global config file of similar
>>> params. We'd definitely want the same N everywhere.
>>>
>>> --
>>> Mike Shuey
>>>
>>>
>>> On Fri, Feb 18, 2011 at 3:44 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
>>>> On Feb 18, 2011, at 1:39 PM, Michael Shuey wrote:
>>>>
>>>>> RoCE HCAs keep a GID table, like normal HCAs. Every time you bring up
>>>>> a vlan interface, another entry gets automatically added to the table.
>>>>> If I select one of these other GIDs, packets get a VLAN tag, and that
>>>>> contains the necessary priority bits (well, assuming I selected the
>>>>> right IB service level, which is mapped to the priority tag in the
>>>>> VLAN header) for the traffic to match a lossless class of service on
>>>>> the switch.
>>>>
>>>> Ah -- I see it now (it's been a looong time since I've looked in Open
>>>> MPI's verbs code!). We query and simply take the 0th GID from a given IBV
>>>> device port's GID table.
>>>>
>>>>> For this to work, I really need for the IB client to select a
>>>>> non-default GID. A few test programs included in OFED will do this,
>>>>> but I'm not sure OpenMPI will. Any thoughts?
>>>>
>>>> Yes, we can do this. It's pretty easy to add an MCA parameter to select
>>>> the Nth GID rather than always taking the 0th.
>>>>
>>>> To make this simple, can you make it so that the value of N is the same
>>>> across all nodes in your cluster? Then you can set a site-wide MCA param
>>>> for that value of N and be done with this issue. If we have to have a
>>>> per-node setting of N, it could get a little hairy (it's do-able, but...
>>>> it's a heckuva lot easier if N is the same everywhere).
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>