[OMPI users] MPI with RoCE
Hi everyone

Could someone please share any experience using MPI with RoCE? I am trying to set up InfiniBand adapters (Mellanox cards, for example) and run MPI applications over RoCE instead of TCP. As I understand it, there may be environment requirements or restrictions such as kernel version, installed drivers, etc. I have tried many versions of the MPI libraries and could not succeed. I would highly appreciate any hint or shared experience.

Best regards,
Harutyun Umrshatyan
Re: [OMPI users] [EXT] MPI with RoCE
Hi Harutyun,

We use RoCE v2 with OpenMPI on our cluster, and it works great. We used to use the openib BTL, but have moved completely across to UCX.

You have to configure RoCE on your switches and NICs (we use a mixture of Mellanox CX-4, CX-5 and CX-6 NICs, with Mellanox switches running Cumulus). We use DSCP and priority 3 for RoCE traffic tagging, and all our nodes run Mellanox OFED on RHEL7.

Once RoCE is configured and tested (using things like ib_send_bw -d mlx5_bond_0 -x 7 -R -T 106 -D 10), getting UCX to use RoCE is quite easy, and compiling OpenMPI to use UCX is also very easy.

Sean
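For reference, the steps Sean describes might look roughly like the sketch below on a node. The device name mlx5_bond_0, GID index 7, and the UCX_NET_DEVICES / UCX_TLS values are assumptions modelled on his example, and ./my_mpi_app is a placeholder binary; check ibv_devinfo and show_gids on your own nodes for the right device and GID index.

  # Verify RoCE v2 at the verbs level between two nodes (Sean's test command;
  # device name and GID index here are from his bonded setup, adjust to yours).
  ib_send_bw -d mlx5_bond_0 -x 7 -R -T 106 -D 10              # on the server node
  ib_send_bw -d mlx5_bond_0 -x 7 -R -T 106 -D 10 <server-ip>  # on the client node

  # Once that works, run through Open MPI's UCX PML and point UCX at the RoCE port.
  # The transport list is illustrative, not mandatory.
  mpirun --mca pml ucx \
         -x UCX_NET_DEVICES=mlx5_bond_0:1 \
         -x UCX_TLS=rc_x,self,sm \
         ./my_mpi_app

Running ucx_info -d on a node is also a quick way to confirm that UCX sees the RoCE device and transports before involving MPI at all.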
Re: [OMPI users] [EXT] MPI with RoCE
Dear Sean,

You gave me a lot of info! I am now going to set up RHEL7 with Mellanox OFED to test it. Previously I had a setup without Mellanox OFED on Ubuntu. Do you think that might cause issues?

Also, please let me know which OpenMPI version you used. Do I understand correctly that UCX is installed and configured separately, and then OpenMPI is configured to use it?

Thanks again for your help!
Harutyun
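For reference, the workflow Harutyun is asking about is usually exactly that: UCX is installed first (either from the Mellanox OFED packages or built from source), and OpenMPI is then configured against it. A rough sketch, assuming a source build; the install prefixes are placeholders, not recommendations:

  # In the unpacked UCX release tarball:
  ./contrib/configure-release --prefix=/opt/ucx
  make -j && make install

  # In the unpacked OpenMPI source tree, point configure at that UCX install:
  ./configure --prefix=/opt/openmpi --with-ucx=/opt/ucx
  make -j && make install

After that, running with "mpirun --mca pml ucx ..." selects the UCX path at run time, as in the earlier sketch.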