I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2 processes per node and --mca btl udapl,self. I didn't encounter any problems.
The comment above line 197 says that dat_ep_query() returns wrong port numbers (which it does indeed), but I can't find any call to dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?

Boris

Andrew Friedley wrote:
> You say that fixes the problem, does it work even when running more than
> one MPI process per node? (that is the case the hack fixes) Simply
> doing an mpirun with a -np parameter higher than the number of nodes you
> have set up should trigger this case, and making sure to use '-mca btl
> udapl,self' (i.e. not SM or anything else).
>
> Andrew
>
> Boris Bierbaum wrote:
>> It has been explained in a different thread on [ofa-general] that the
>> problem lies in a combination of the OpenIB-cma provider not setting the
>> local and remote port numbers on endpoints correctly and Open MPI
>> stepping over the IA to save the port number to circumvent this problem,
>> thereby confusing the provider.
>>
>> I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
>> 1.2.1 release) and this fixes the problem. As the problem in the
>> provider is currently being fixed, the whole saving of the port number
>> in the uDAPL BTL code will be unnecessary in the future.
>>
>> Steve Wise wrote:
>>>>> Can the UDAPL OFED wizards shed any light on the error messages that
>>>>> are listed below? In particular, these seem to be worrisome:
>>>>>
>>>>>> setup_listener Permission denied
>>>>> setup_listener Address already in use
>>>> These failures are from rdma_cm_bind indicating the port is already
>>>> bound to this IA address. How are you creating the service point?
>>>> dat_psp_create or dat_psp_create_any? If it is psp_create_any then you
>>>> will see some failures until it gets to a free port. That is normal.
>>>> Just make sure your create call returns DAT_SUCCESS.
>>>>
>>> Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
>>> and let the rdma-cma pick an available port number?
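
The two port-selection strategies discussed in the quoted thread (trying ports until a bind succeeds versus passing port zero and letting the lower layer choose) can be sketched with plain TCP sockets. This is only an analogy in Python: it does not use uDAPL or rdma_cm, and the function names bind_any_by_probing and bind_any_port_zero as well as the default port range are made up for illustration.

```python
import socket


def bind_any_by_probing(host="127.0.0.1", first_port=20000, last_port=20100):
    """Try each port in turn until one can be bound.

    Failed attempts on busy ports are expected here; only the final
    result matters. The port range is an arbitrary, hypothetical choice.
    """
    for port in range(first_port, last_port + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError:
            # Port is in use (or not permitted); close and try the next one.
            sock.close()
    raise OSError("no free port in the given range")


def bind_any_port_zero(host="127.0.0.1"):
    """Bind to port 0 and let the operating system pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    # getsockname() reports the port that was actually assigned.
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    s1, p1 = bind_any_port_zero()
    s2, p2 = bind_any_by_probing()
    print("port chosen by the OS:", p1)
    print("port found by probing:", p2)
    s1.close()
    s2.close()
```

In the probing variant, failed bind attempts on busy ports are part of normal operation, which is comparable to the "Address already in use" messages described above. The port-zero variant avoids those attempts, but the caller then has to query which port was assigned.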

-- 
Boris Bierbaum
Lehrstuhl fuer Betriebssysteme
RWTH Aachen, D-52056 Aachen
Tel: +49-241-80-27805
Fax: +49-241-80-22339