Glad you figured it out!

I was waiting for Mellanox support to jump in and answer here; I am not part of 
the UCX community, so I can't really provide definitive UCX answers.



On Jul 22, 2020, at 1:16 PM, Lana Deere <lana.de...@gmail.com> wrote:

Never mind.  This was apparently because I had UCX configured for static 
libraries while Open MPI was configured for shared libraries.
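
For anyone hitting the same mismatch, a minimal sketch of the fix, assuming 
UCX's stock configure wrapper and a placeholder install prefix: rebuild UCX 
with shared libraries enabled, then re-run Open MPI's configure against it.

    # Build UCX with shared libraries (the prefix is a placeholder)
    ./contrib/configure-release --prefix=/opt/ucx --enable-shared --disable-static
    make -j && make install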

.. Lana (lana.de...@gmail.com)




On Tue, Jul 21, 2020 at 12:58 PM Lana Deere <lana.de...@gmail.com> wrote:
I'm using the InfiniBand drivers in the CentOS 7 distribution, not the Mellanox 
drivers.  The version of Lustre we're using is built against the distro drivers 
and breaks if the Mellanox drivers get installed.

Is there a particular version of UCX that should be used with Open MPI 4.0.4?  
I downloaded UCX 1.8.1 and installed it, then tried to configure Open MPI with 
--with-ucx=<location>, but the configure failed.  Configure finds the UCX 
installation OK but thinks some symbols are undeclared.  I tried to find those 
in the UCX source tree (in case I had configured UCX wrong) but didn't turn 
them up anywhere.  Here is the bottom of the configure output, showing mostly 
"yes" for the checks but a series of "no" at the end.

[...]
checking ucp/api/ucp.h usability... yes
checking ucp/api/ucp.h presence... yes
checking for ucp/api/ucp.h... yes
checking for library containing ucp_cleanup... no
checking whether ucp_tag_send_nbr is declared... yes
checking whether ucp_ep_flush_nb is declared... yes
checking whether ucp_worker_flush_nb is declared... yes
checking whether ucp_request_check_status is declared... yes
checking whether ucp_put_nb is declared... yes
checking whether ucp_get_nb is declared... yes
checking whether ucm_test_events is declared... yes
checking whether UCP_ATOMIC_POST_OP_AND is declared... yes
checking whether UCP_ATOMIC_POST_OP_OR is declared... yes
checking whether UCP_ATOMIC_POST_OP_XOR is declared... yes
checking whether UCP_ATOMIC_FETCH_OP_FAND is declared... yes
checking whether UCP_ATOMIC_FETCH_OP_FOR is declared... yes
checking whether UCP_ATOMIC_FETCH_OP_FXOR is declared... yes
checking whether UCP_PARAM_FIELD_ESTIMATED_NUM_PPN is declared... yes
checking whether UCP_WORKER_ATTR_FIELD_ADDRESS_FLAGS is declared... yes
checking whether ucp_tag_send_nbx is declared... no
checking whether ucp_tag_send_sync_nbx is declared... no
checking whether ucp_tag_recv_nbx is declared... no
checking for ucp_request_param_t... no
configure: error: UCX support requested but not found.  Aborting
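
The first "no" above ("checking for library containing ucp_cleanup") is a link 
test rather than a header check, so config.log normally shows the underlying 
linker error.  A hedged way to reproduce that check by hand, mimicking the 
stub that Autoconf generates (<location> is the same prefix passed to 
--with-ucx):

    # Same trick configure uses: declare the symbol and try to link it
    echo 'char ucp_cleanup(); int main(void){ucp_cleanup(); return 0;}' > conftest.c
    gcc conftest.c -L<location>/lib -lucp -lucs -o conftest && echo "link OK"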


.. Lana (lana.de...@gmail.com)




On Mon, Jul 20, 2020 at 12:43 PM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
Correct, UCX = OpenUCX.org.

If you have the Mellanox drivers package installed, it probably would have 
installed UCX (and Open MPI).  You'll have to talk to your sysadmin and/or 
Mellanox support for details about that.
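
One quick check, if UCX is already on the system: the ucx_info tool ships with 
UCX and prints its version and build configuration.

    ucx_info -v    # shows the installed UCX version and configure flags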


On Jul 20, 2020, at 11:36 AM, Lana Deere <lana.de...@gmail.com> wrote:

I assume UCX is https://www.openucx.org?  (Google 
found several things called UCX when I searched, but that seemed the right 
one.)  I will try installing it and then reinstall OpenMPI.  Hopefully it will 
then choose between network transports automatically based on what's available. 
 I'll also look at the slides and see if I can make sense of them.  Thanks.

.. Lana (lana.de...@gmail.com)




On Sat, Jul 18, 2020 at 9:41 AM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
On Jul 16, 2020, at 2:56 PM, Lana Deere via users <users@lists.open-mpi.org> wrote:

I am new to Open MPI.  I built 4.0.4 on a CentOS 7 machine and tried an mpirun 
of a small program compiled against Open MPI.  It seems to have failed because 
my host does not have InfiniBand.  I can't figure out how I should configure 
the build so that it does what I want: use InfiniBand if there are IB HCAs on 
the system, and otherwise use the Ethernet on the system.

UCX is the underlying library that Mellanox/Nvidia prefers these days for use 
with MPI and InfiniBand.

Meaning: you should first install UCX and then build Open MPI with 
--with-ucx=/directory/of/ucx/installation.
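
For example (a sketch; the install prefix is a placeholder, and ompi_info is 
the standard Open MPI query tool):

    ./configure --prefix=/opt/openmpi --with-ucx=/directory/of/ucx/installation
    make -j && make install

    # Afterward, confirm the UCX components were built:
    ompi_info | grep -i ucx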

We just hosted parts 1 and 2 of a seminar entitled "The ABCs of Open MPI" that 
covered topics like this.  Check out:

https://www.open-mpi.org/video/?category=general#abcs-of-open-mpi-part-1
and
https://www.open-mpi.org/video/?category=general#abcs-of-open-mpi-part-2

In particular, you might want to look at slides 28-42 in part 2 for a bunch of 
discussion about how Open MPI (by default) picks the underlying network / APIs 
to use, and then how you can override that if you want to.
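
For example, the selection can be forced at run time with standard MCA 
parameters (component names as shipped in Open MPI 4.0.x; ./a.out is a 
placeholder):

    # Force UCX (e.g., over InfiniBand):
    mpirun --mca pml ucx -np 4 ./a.out

    # Force plain TCP over Ethernet:
    mpirun --mca pml ob1 --mca btl tcp,self -np 4 ./a.out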

--
Jeff Squyres
jsquy...@cisco.com