Couple of things.
On Linux, I believe you need the interface instance in the 7th field of the /etc/dat.conf file.
For example:

InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "
should be
InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 "ib0 0 " " "
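
As a quick way to spot this in an existing file, here is a minimal Python sketch that flags dat.conf entries whose 7th field is blank. It assumes the eight-field, whitespace-separated layout shown in the example above, with the last two fields quoted, and it assumes lines starting with "#" are comments. The function name entries_missing_instance is made up for illustration and is not part of uDAPL or Open MPI.

```python
import shlex


def entries_missing_instance(conf_text):
    """Return the names of dat.conf entries whose 7th field is blank.

    Assumes each entry is one line of whitespace-separated fields, with
    the 7th field (instance data such as "ib0 0") in double quotes.
    """
    missing = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and (assumed) '#' comment lines.
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # honours the double-quoted fields
        if len(fields) < 7:
            continue  # not enough fields to judge; leave it alone
        if not fields[6].strip():
            missing.append(fields[0])
    return missing


# The two entries from the example above.
BEFORE = 'InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "'
AFTER = 'InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 "ib0 0 " " "'

print("before:", entries_missing_instance(BEFORE))
print("after: ", entries_missing_instance(AFTER))
```

To check a real file, pass it the contents of /etc/dat.conf, e.g. entries_missing_instance(open("/etc/dat.conf").read()).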


Also, I did see a problem when running with OFED releases older than 1.2, which I did not pursue because v1.2 worked. Last, it appears that you are running uDAPL 1.1; I have only ever run on 1.2, so I don't know what to expect.

-DON

Troy Telford wrote:

OK, I've got a system set up so that it can use uDAPL over IB (not OFED and not Mellanox, though) on Linux.

Running simple dapl test programs (shamelessly pulled from the OFED tree) seems to verify that DAPL is in fact operating properly.

After searching through the mail archives, I found a small test code by Donald Kerr (dat_reg.c), and compiled and ran it successfully. When run, it returns the DAT name (ib0).

I've also been able to run programs using uDAPL with Intel MPI, for example. I'm fairly sure uDAPL is working.

However, when I attempt to run an MPI program over uDAPL (--mca btl udapl,sm,self), I receive the following error:

WARNING: Failed to open "ib0" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.

[0,1,0]: uDAPL on host n02 was unable to find any NICs.


I've also tried using --mca btl_udapl_if_include ib0, but that doesn't seem to have any effect.

Interestingly enough, when I don't specify a DAT provider and I play with the name in /etc/dat.conf, Open MPI seems aware of the name change; it will report 'failed to open "newname"'.


My /etc/dat.conf looks like this:
InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "


Any ideas on why I'm not able to get Open MPI to use uDAPL?
