Encountered a problem when trying to run OpenMPI 1.5.4 with RoCE over a 10GbE fabric.
Got this runtime error:
An invalid CPC name was specified via the btl_openib_cpc_include MCA
parameter.
Local host: atl3-14
btl_openib_cpc_include value: rdmacm
Invalid name: rdmacm
All possible valid names: oob,xoob
--------------------------------------------------------------------------
[atl3-14:07184] mca: base: components_open: component btl / openib open function failed
[atl3-12:09178] mca: base: components_open: component btl / openib open function failed
Used these options to mpirun:
"--mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm -mca btl_openib_if_include mlx4_0:2"
We have a Mellanox LOM with two ports: the first is an IB port, the second is a 10GbE port.
Running over the IB port works fine, as does TCP over the 10GbE port.
Built OpenMPI with the option "--enable-openib-rdmacm".
Our system has OFED 1.5.2 with librdmacm-1.0.13-1.
I noticed this output from the configure script:
checking rdma/rdma_cma.h usability... no
checking rdma/rdma_cma.h presence... no
checking for rdma/rdma_cma.h... no
checking whether IBV_LINK_LAYER_ETHERNET is declared... yes
checking if RDMAoE support is enabled... yes
checking for infiniband/driver.h... yes
checking if ConnectX XRC support is enabled... yes
checking if dynamic SL is enabled... no
checking if OpenFabrics RDMACM support is enabled... no
Are we missing a build option or a piece of software?
Config.log and output from "ompi_info --all" attached.
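In case it helps narrow this down, here is a minimal sketch of how the relevant configure checks and the header itself could be inspected. The function names are hypothetical (not part of Open MPI or OFED), and /usr/include is only an assumed location for the librdmacm headers; OFED may have installed them elsewhere.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a build; not part of Open MPI or OFED.

# Print the configure checks that mention the RDMA CM header or RDMACM support.
# $1: path to saved configure output or config.log.
show_rdmacm_checks() {
    grep -E 'rdma/rdma_cma\.h|RDMACM' "$1"
}

# Report whether rdma/rdma_cma.h exists under an include directory.
# $1: include directory; the /usr/include default is an assumption.
have_rdma_cma_header() {
    incdir="${1:-/usr/include}"
    if [ -f "$incdir/rdma/rdma_cma.h" ]; then
        echo "found $incdir/rdma/rdma_cma.h"
    else
        echo "missing $incdir/rdma/rdma_cma.h"
        return 1
    fi
}
```

For example, "show_rdmacm_checks config.log" followed by "have_rdma_cma_header /usr/include". A missing header would be consistent with the "no" results in the configure output above.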
% ibv_devinfo
hca_id: mlx4_0
        transport:              InfiniBand (0)
        fw_ver:                 2.9.1000
        node_guid:              78e7:d103:0021:4464
        sys_image_guid:         78e7:d103:0021:4467
        vendor_id:              0x02c9
        vendor_part_id:         26438
        hw_ver:                 0xB0
        board_id:               HP_0200000003
        phys_port_cnt:          2
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         34
                        port_lid:       11
                        port_lmc:       0x00
                        link_layer:     IB
                port:   2
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     1024 (3)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet
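To double-check that the port named in btl_openib_if_include is the Ethernet (RoCE) one, the ibv_devinfo output can be filtered for ports whose link_layer is Ethernet. A minimal sketch, assuming the output layout shown above; list_ethernet_ports is a hypothetical helper name, not an OFED tool.

```shell
#!/bin/sh
# Hypothetical helper: read ibv_devinfo output on stdin and print
# "<hca_id>:<port>" for every port whose link_layer is Ethernet.
list_ethernet_ports() {
    awk '
        $1 == "hca_id:"  { hca = $2 }     # remember the current device
        $1 == "port:"    { port = $2 }    # remember the current port number
        $1 == "link_layer:" && $2 == "Ethernet" { print hca ":" port }
    '
}

# Intended use on a node with OFED installed:
#   ibv_devinfo | list_ethernet_ports
```

Fed the output above, this prints mlx4_0:2, which matches the value passed to btl_openib_if_include.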
% /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 78:E7:D1:21:44:60
          inet addr:16.113.180.147  Bcast:16.113.183.255  Mask:255.255.252.0
          inet6 addr: fe80::7ae7:d1ff:fe21:4460/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1861763 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1776402 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:712448939 (679.4 MiB)  TX bytes:994111004 (948.0 MiB)
          Memory:fb9e0000-fba00000

eth2      Link encap:Ethernet  HWaddr 78:E7:D1:21:44:65
          inet addr:10.10.0.147  Bcast:10.10.0.255  Mask:255.255.255.0
          inet6 addr: fe80::78e7:d100:121:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8519814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8555715 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12370127778 (11.5 GiB)  TX bytes:12372246315 (11.5 GiB)

ib0       Link encap:InfiniBand  HWaddr 80:00:00:4D:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.0.147  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::7ae7:d103:21:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:16384  Metric:1
          RX packets:1989 errors:0 dropped:0 overruns:0 frame:0
          TX packets:208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:275196 (268.7 KiB)  TX bytes:19202 (18.7 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:42224 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42224 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3115668 (2.9 MiB)  TX bytes:3115668 (2.9 MiB)
Thanks,
-Jeff
/**********************************************************/
/* Jeff Konz [email protected] */
/* Solutions Architect HPC Benchmarking */
/* Americas Shared Solutions Architecture (SSA) */
/* Hewlett-Packard Company */
/* Office: 248-491-7480 Mobile: 248-345-6857 */
/**********************************************************/
config.log.gz
ompi_info.txt.gz
