Hi Jeff!

> What does "ompi_info | grep openib" show?
>

$ ompi_info | grep openib
                 MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.2)

> Additionally, Mellanox provides alternate support through their MXM
> libraries, if you want to try that.
>

Yes, I know.
But we already have a hybrid cluster with OpenMPI, OpenMP, CUDA, Torque and
many other libraries installed, and because it works perfectly over the
Ethernet interconnect, my idea was to add InfiniBand support with a minimum
of changes, mainly because we already have some custom-written software for
OpenMPI.


> If that shows that you have the openib BTL plugin loaded, try running with
> "mpirun --mca btl_base_verbose 100 ..."  That will provide additional
> output about why / why not each point-to-point plugin is chosen.
>
>
Yes, I have already tried that.
The log shows that rdmacm wants an IP address on the port.
So my question in the message that started this thread was:

Is RDMA alone enough for OpenMPI, or does IPoIB also need to be
installed?

The mpirun output is:

[node1:02674] mca: base: components_register: registering btl components
[node1:02674] mca: base: components_register: found loaded component openib
[node1:02674] mca: base: components_register: component openib register function successful
[node1:02674] mca: base: components_register: found loaded component sm
[node1:02674] mca: base: components_register: component sm register function successful
[node1:02674] mca: base: components_register: found loaded component self
[node1:02674] mca: base: components_register: component self register function successful
[node1:02674] mca: base: components_open: opening btl components
[node1:02674] mca: base: components_open: found loaded component openib
[node1:02674] mca: base: components_open: component openib open function successful
[node1:02674] mca: base: components_open: found loaded component sm
[node1:02674] mca: base: components_open: component sm open function successful
[node1:02674] mca: base: components_open: found loaded component self
[node1:02674] mca: base: components_open: component self open function successful
[node1:02674] select: initializing btl component openib
[node1:02674] openib BTL: rdmacm IP address not found on port
[node1:02674] openib BTL: rdmacm CPC unavailable for use on mlx4_0:1; skipped
[node1:02674] select: init of component openib returned failure
[node1:02674] mca: base: close: component openib closed
[node1:02674] mca: base: close: unloading component openib
[node1:02674] select: initializing btl component sm
[node1:02674] select: init of component sm returned failure
[node1:02674] mca: base: close: component sm closed
[node1:02674] mca: base: close: unloading component sm
[node1:02674] select: initializing btl component self
[node1:02674] select: init of component self returned success
[node1:02674] mca: bml: Using self btl to [[16642,1],0] on node node1
[node1:02674] mca: base: close: component self closed
[node1:02674] mca: base: close: unloading component self
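
In case it helps when comparing nodes, here is a minimal Python sketch that condenses output like the above into one result per BTL component plus any openib diagnostic lines. The function name summarize_btl_log and the SAMPLE excerpt are only illustrative; the regular expressions are based solely on the line shapes visible in this log and may not match other Open MPI versions. Lines that a mail client has wrapped are not rejoined.

```python
import re
from collections import OrderedDict

# Matches lines such as:
#   [node1:02674] select: init of component openib returned failure
INIT_RE = re.compile(r"select: init of component (\S+) returned (success|failure)")

# Matches openib diagnostic lines such as:
#   [node1:02674] openib BTL: rdmacm IP address not found on port
REASON_RE = re.compile(r"\] openib BTL: (.+)$")


def summarize_btl_log(text):
    """Return (results, reasons) extracted from btl_base_verbose output.

    results maps component name -> "success" or "failure", in log order.
    reasons lists the text of every "openib BTL:" diagnostic line.
    """
    results = OrderedDict()
    reasons = []
    for line in text.splitlines():
        match = INIT_RE.search(line)
        if match:
            results[match.group(1)] = match.group(2)
            continue
        match = REASON_RE.search(line)
        if match:
            reasons.append(match.group(1).strip())
    return results, reasons


# Excerpt of the log above, with the wrapped line rejoined.
SAMPLE = """\
[node1:02674] select: initializing btl component openib
[node1:02674] openib BTL: rdmacm IP address not found on port
[node1:02674] openib BTL: rdmacm CPC unavailable for use on mlx4_0:1; skipped
[node1:02674] select: init of component openib returned failure
[node1:02674] select: initializing btl component sm
[node1:02674] select: init of component sm returned failure
[node1:02674] select: initializing btl component self
[node1:02674] select: init of component self returned success
"""

if __name__ == "__main__":
    results, reasons = summarize_btl_log(SAMPLE)
    for name, outcome in results.items():
        print(f"{name}: {outcome}")
    for reason in reasons:
        print(f"openib note: {reason}")
```

For the excerpt, this reports openib and sm as failed and self as successful, with the two rdmacm messages listed as the openib notes.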

Best regards,
Sergei.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
