I cannot find firmware for my card:
https://www.gigabyte.com/za/Accessory/CLNOQ42-rev-10#ov

Do you have the same model?

I found this zip file on the web: Linux_FWupg_41xxx_2.10.78.zip. It contains a firmware upgrade tool and firmware version 8.50.83, but when I run it I get this error message (card is not supported):

./LnxQlgcUpg.sh
Extracting package contents...
QLogic Firmware Upgrade Utility for Linux: v2.10.78
NIC is not supported.
Quitting program ...
Program Exit Code: (16)
Failed to upgraded MBI

Thank you.

________________________________
From: Llolsten Kaonga <l...@soft-forge.com>
Sent: Wednesday, November 13, 2019 3:25:16 PM
To: 'Open MPI Users'
Cc: Matteo Guglielmi
Subject: RE: [OMPI users] qelr_alloc_context: Failed to allocate context for device.

Hello Matteo,

What version of openmpi are you running?

Also, the OFED-4.17-1 release notes do not claim support for CentOS 7.7, only for CentOS 7.6. Apologies if you have already tried CentOS 7.6.

We were able to run openmpi (earlier this month) with:

OS: CentOS 7.6
mpirun --version: 3.1.4
ofed_info -s: OFED-4.17-1
RNIC fw version: 8.50.9.0

Thanks.
--
Llolsten

-----Original Message-----
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Matteo Guglielmi via users
Sent: Wednesday, November 13, 2019 2:12 AM
To: users@lists.open-mpi.org
Cc: Matteo Guglielmi <matteo.guglie...@dalco.ch>
Subject: [OMPI users] qelr_alloc_context: Failed to allocate context for device.
I'm trying to get openmpi working over RoCE with this setup:

card: https://www.gigabyte.com/Accessory/CLNOQ42-rev-10#ov
OS: CentOS 7.7

modinfo qede
filename: /lib/modules/3.10.0-1062.4.1.el7.x86_64/kernel/drivers/net/ethernet/qlogic/qede/qede.ko.xz
version: 8.37.0.20
license: GPL
description: QLogic FastLinQ 4xxxx Ethernet Driver
retpoline: Y
rhelversion: 7.7
srcversion: A6AFD0788918644F2EFFF31
alias: pci:v00001077d00008090sv*sd*bc*sc*i*
alias: pci:v00001077d00008070sv*sd*bc*sc*i*
alias: pci:v00001077d00001664sv*sd*bc*sc*i*
alias: pci:v00001077d00001656sv*sd*bc*sc*i*
alias: pci:v00001077d00001654sv*sd*bc*sc*i*
alias: pci:v00001077d00001644sv*sd*bc*sc*i*
alias: pci:v00001077d00001636sv*sd*bc*sc*i*
alias: pci:v00001077d00001666sv*sd*bc*sc*i*
alias: pci:v00001077d00001634sv*sd*bc*sc*i*
depends: ptp,qed
intree: Y
vermagic: 3.10.0-1062.4.1.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: 60:48:F2:5B:83:1E:C4:47:02:00:E2:36:02:C5:CA:83:1D:18:CF:8F
sig_hashalgo: sha256
parm: debug: Default debug msglevel (uint)

modinfo qedr
filename: /lib/modules/3.10.0-1062.4.1.el7.x86_64/kernel/drivers/infiniband/hw/qedr/qedr.ko.xz
license: Dual BSD/GPL
author: QLogic Corporation
description: QLogic 40G/100G ROCE Driver
retpoline: Y
rhelversion: 7.7
srcversion: B5B65473217AA2B1F2F619B
depends: qede,qed,ib_core
intree: Y
vermagic: 3.10.0-1062.4.1.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: 60:48:F2:5B:83:1E:C4:47:02:00:E2:36:02:C5:CA:83:1D:18:CF:8F
sig_hashalgo: sha256

ibv_devinfo
hca_id: qedr0
    transport: InfiniBand (0)
    fw_ver: 8.37.7.0
    node_guid: b62e:99ff:fea7:8439
    sys_image_guid: b62e:99ff:fea7:8439
    vendor_id: 0x1077
    vendor_part_id: 32880
    hw_ver: 0x0
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet

hca_id: qedr1
    transport: InfiniBand (0)
    fw_ver: 8.37.7.0
    node_guid: b62e:99ff:fea7:843a
    sys_image_guid: b62e:99ff:fea7:843a
    vendor_id: 0x1077
    vendor_part_id: 32880
    hw_ver: 0x0
    phys_port_cnt: 1
        port: 1
            state: PORT_DOWN (1)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet

RDMA does work at the system level, so I can run RDMA ping-pong tests and similar checks.

But when I try to run openmpi with these options:

mpirun --mca btl openib,self,vader --mca btl_openib_cpc_include rdmacm ...

I get the following error messages:

--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found, but there are no active ports detected (or Open MPI was unable to use them). This is most certainly not what you wanted. Check your cables, subnet manager configuration, etc. The openib BTL will be ignored for this job.

Local host: node001
--------------------------------------------------------------------------
qelr_alloc_context: Failed to allocate context for device.
qelr_alloc_context: Failed to allocate context for device.
qelr_alloc_context: Failed to allocate context for device.
qelr_alloc_context: Failed to allocate context for device.
qelr_alloc_context: Failed to allocate context for device.
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be used on a specific port. As such, the openib BTL (OpenFabrics support) will be disabled for this port.

Local host: node002
Local device: qedr0
Local port: 1
CPCs attempted: rdmacm
--------------------------------------------------------------------------
qelr_alloc_context: Failed to allocate context for device.
qelr_alloc_context: Failed to allocate context for device.
...
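The openib warning says that no active ports were detected. The ibv_devinfo output above shows qedr0 port 1 as PORT_ACTIVE and qedr1 port 1 as PORT_DOWN, with both ports at active_mtu 1024 against max_mtu 4096. When comparing several nodes, it can help to reduce that output to one line per port. The following is a minimal Python sketch that assumes the plain "key: value" layout shown above, where an "hca_id:" line opens a device and a "port:" line opens a port. The script name devinfo_ports.py and all helper names are hypothetical. The sketch only summarizes the fields; it does not diagnose the qelr_alloc_context failure.

```python
import sys
from dataclasses import dataclass, field

# Trimmed copy of the ibv_devinfo output quoted in this message.
SAMPLE = """\
hca_id: qedr0
    transport: InfiniBand (0)
    fw_ver: 8.37.7.0
    node_guid: b62e:99ff:fea7:8439
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            link_layer: Ethernet
hca_id: qedr1
    transport: InfiniBand (0)
    fw_ver: 8.37.7.0
    node_guid: b62e:99ff:fea7:843a
        port: 1
            state: PORT_DOWN (1)
            max_mtu: 4096 (5)
            active_mtu: 1024 (3)
            link_layer: Ethernet
"""


@dataclass
class Port:
    number: int
    fields: dict = field(default_factory=dict)


@dataclass
class Device:
    hca_id: str
    fields: dict = field(default_factory=dict)
    ports: list = field(default_factory=list)


def parse_ibv_devinfo(text):
    """Parse plain ibv_devinfo text into Device objects (assumed layout)."""
    devices = []
    for raw in text.splitlines():
        # Split on the first colon only, so GUID values keep their colons.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank or non key/value line
        key, value = key.strip(), value.strip()
        if key == "hca_id":
            devices.append(Device(hca_id=value))
        elif not devices:
            continue  # ignore anything before the first device
        elif key == "port":
            devices[-1].ports.append(Port(number=int(value)))
        elif devices[-1].ports:
            devices[-1].ports[-1].fields[key] = value  # per-port field
        else:
            devices[-1].fields[key] = value  # device-level field


    return devices


def active_ports(devices):
    """Return (hca_id, port) pairs whose state starts with PORT_ACTIVE."""
    return [(d.hca_id, p.number) for d in devices for p in d.ports
            if p.fields.get("state", "").startswith("PORT_ACTIVE")]


def summarize(devices):
    """One human-readable line per port."""
    lines = []
    for dev in devices:
        for port in dev.ports:
            f = port.fields
            lines.append(
                f"{dev.hca_id} port {port.number}: {f.get('state', '?')}, "
                f"active_mtu {f.get('active_mtu', '?')} / "
                f"max_mtu {f.get('max_mtu', '?')}, "
                f"fw {dev.fields.get('fw_ver', '?')}")
    return lines


def main(argv):
    # Hypothetical usage: python3 devinfo_ports.py devinfo.txt
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as fh:
            text = fh.read()
    else:
        text = SAMPLE  # fall back to the output quoted above
    for line in summarize(parse_ibv_devinfo(text)):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

To use it, save the output on each node with "ibv_devinfo > devinfo.txt" and then run the script on that file. It prints one summary line per port. With no argument, it falls back to the sample copied from this message.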
I've tried several things, such as:

1) upgrading the 3.10 kernel's qed* drivers to the latest stable version, 8.42.9
2) upgrading the CentOS kernel from 3.10 to 5.3 via elrepo
3) installing the latest OFED-4.17-1.tgz stack

but the error messages never go away and are always the same.

Any advice is highly appreciated.
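On the firmware side, ibv_devinfo reports fw_ver 8.37.7.0 for both qedr devices, while Llolsten's working setup used RNIC firmware 8.50.9.0. Below is a small Python helper that compares dotted version strings numerically. This makes 8.50.9.0 sort after 8.37.7.0 and treats 8.50.9 as equal to 8.50.9.0. The function names are hypothetical. Using 8.50.9.0 as a reference point is an assumption based on that single report, not a documented minimum. The thread also does not say whether the 8.50.83 number in the zip file follows the same scheme.

```python
def parse_version(text):
    """Turn a dotted firmware string such as "8.37.7.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def fw_at_least(installed, reference):
    """True if `installed` is the same as or newer than `reference`.

    The shorter version is padded with zeros, so "8.50.9" == "8.50.9.0".
    Assumes purely numeric, dot-separated components.
    """
    a, b = parse_version(installed), parse_version(reference)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b  # tuples compare element by element, left to right


if __name__ == "__main__":
    installed = "8.37.7.0"  # fw_ver reported by ibv_devinfo in this message
    reference = "8.50.9.0"  # RNIC fw in Llolsten's working setup (assumed reference)
    print(f"{installed} >= {reference}: {fw_at_least(installed, reference)}")
```

A plain string comparison would be wrong here. For example, "8.9" > "8.50" is true as strings, so the helper compares integer tuples instead.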