[OMPI users] UCX and MPI_THREAD_MULTIPLE
I have a code that uses MPI_THREAD_MULTIPLE along with MPI-RMA, and I'm running it with OpenMPI 4.0.1. Since 4.0.1 requires UCX, I have UCX installed with MT (multi-threading) enabled (a 1.6.0 build). The problem is that the code keeps stalling once I go above a couple of nodes. UCX is new to our environment; previously we just used regular IB Verbs with no problems. My guess is that there is either some option in OpenMPI I am missing or some UCX variable I am not setting. Any insight into what could be causing the stalls?

-Paul Edmon-
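For anyone debugging a similar hang, a minimal first step is to rule out a silent fallback to another transport by pinning Open MPI to its UCX components and raising UCX's log verbosity. This is only a sketch: the rank count, hostfile, and binary name are placeholders taken from this thread, and the log level is just a reasonable starting point.

    # Force the UCX point-to-point (pml) and one-sided (osc) components and
    # export a more verbose UCX log level to every rank.
    mpirun -np 16 --hostfile hosts \
           --mca pml ucx --mca osc ucx \
           -x UCX_LOG_LEVEL=info \
           ./wombat

If the job still hangs, the per-rank UCX log lines show which transports and devices were actually selected, which is usually the first thing to compare between the working one- or two-node case and the hanging multi-node case.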
Re: [OMPI users] UCX and MPI_THREAD_MULTIPLE
Sure. The code I'm using is the latest version of Wombat (https://bitbucket.org/pmendygral/wombat-public/wiki/Home ; I'm using an unreleased, updated version, as I know the devs). I'm using OMP_THREAD_NUM=12 and the command line is:

mpirun -np 16 --hostfile hosts ./wombat

where the host file lists 4 machines, so 4 ranks per machine and 12 threads per rank. Each node has 48 Intel Cascade Lake cores. I've also tried the Slurm scheduler version, which is:

srun -n 16 -c 12 --mpi=pmix ./wombat

which also hangs. It works if I constrain the job to one or two nodes, but anything greater than that hangs.

As for network hardware:

[root@holy7c02101 ~]# ibstat
CA 'mlx5_0'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.25.6000
        Hardware version: 0
        Node GUID: 0xb8599f0300158f20
        System image GUID: 0xb8599f0300158f20
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 808
                LMC: 1
                SM lid: 584
                Capability mask: 0x2651e848
                Port GUID: 0xb8599f0300158f20
                Link layer: InfiniBand

[root@holy7c02101 ~]# lspci | grep Mellanox
58:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

As for the IB RDMA kernel stack, we are using the default drivers that come with CentOS 7.6.1810, which is rdma-core 17.2-3.

I will note that I successfully ran an old version of Wombat on all 30,000 cores of this system using OpenMPI 3.1.3 and regular IB Verbs with no problem earlier this week, though that was pure MPI ranks with no threads. Nonetheless, the fabric itself is healthy and in good shape. It seems to be this edge case of the latest OpenMPI with UCX and threads that is causing the hang-ups. To be sure, the latest version of Wombat (as I believe the public version does as well) uses many of the state-of-the-art MPI-RMA direct calls, so it's definitely pushing the envelope in ways our typical user base here will not. Still, it would be good to iron out this kink so that if users do hit it we have a solution. As noted, UCX is very new to us, and thus it is entirely possible that we are missing something in its interaction with OpenMPI.

Our MPI is compiled as follows: https://github.com/fasrc/helmod/blob/master/rpmbuild/SPECS/centos7/openmpi-4.0.1-fasrc01.spec

I will note that when I built this, it was built against the default version of UCX that comes with EPEL (1.5.1). We only built 1.6.0 because the version provided by EPEL did not build with MT enabled, which seems strange to me, as I don't see any reason not to build with MT enabled.

Anyway, that's the deeper context.

-Paul Edmon-

On 8/23/2019 5:49 PM, Joshua Ladd via users wrote:

Paul,

Can you provide a repro and command line, please. Also, what network hardware are you using?

Josh
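Before digging into Open MPI options, it is worth confirming that both layers of this stack were actually built with thread support. A quick check, assuming the ucx_info and ompi_info binaries from the installations in question are on the PATH (the grep patterns are only illustrative):

    # The configure line reported by ucx_info should include --enable-mt
    # if the UCX build has multi-threading support.
    ucx_info -v

    # ompi_info reports whether this Open MPI build supports
    # MPI_THREAD_MULTIPLE and whether the UCX pml/osc components are present.
    ompi_info | grep -i thread
    ompi_info | grep -i ucx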
Re: [OMPI users] UCX and MPI_THREAD_MULTIPLE
I forgot to include that we have not rebuilt this OpenMPI 4.0.1 against UCX 1.6.0; it was built against 1.5.1. When we upgraded to 1.6.0, everything seemed to keep working for OpenMPI when we swapped in the new UCX version without recompiling (at least for normal rank-level MPI; we had to do the UCX upgrade to get MPI_THREAD_MULTIPLE to work at all).

-Paul Edmon-
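Since the UCX libraries were swapped underneath an Open MPI that was configured against 1.5.1, one way to see which UCX the UCX components actually resolve to at runtime is to inspect their dynamic linkage. This is only a sketch, assuming a dynamically built Open MPI; the installation prefix is a placeholder for wherever this 4.0.1 build lives:

    # Check which libucp/libuct/libucs the UCX pml component links against.
    # Replace /path/to/openmpi-4.0.1 with the actual installation prefix.
    ldd /path/to/openmpi-4.0.1/lib/openmpi/mca_pml_ucx.so | grep -E 'ucp|uct|ucs'

If this still points at the 1.5.1 libraries, or mixes versions, rebuilding Open MPI against the MT-enabled 1.6.0 install (as described later in the thread) is the cleaner path.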
Re: [OMPI users] UCX and MPI_THREAD_MULTIPLE
It's the public source. The one I'm testing with is the latest internal version. I'm going to cc Pete Mendygral and Julius Donnert on this, as they may be able to provide you the version I'm using (it is not ready for public use).

-Paul Edmon-

On 8/26/19 9:20 PM, Joshua Ladd wrote:

**apropos :-)

On Mon, Aug 26, 2019 at 9:19 PM Joshua Ladd wrote:

Hi, Paul

I must say, this is eerily appropo. I just sent a request for Wombat last week, as I was planning to have my group start looking at the performance of UCX OSC on IB. We are most interested in ensuring UCX OSC MT performs well on Wombat. The bitbucket you're referencing: is this the source code? Can we build and run it?

Best,

Josh
Re: [OMPI users] UCX and MPI_THREAD_MULTIPLE
As a coda to this: I managed to get UCX 1.6.0 built with threading, and OpenMPI 4.0.1 to build, using this: https://github.com/openucx/ucx/issues/4020

That appears to be working.

-Paul Edmon-
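For reference, the general shape of such a build is sketched below. This is not the exact recipe from the spec file or the GitHub issue linked above; the source directories and install prefixes are placeholders, and any site-specific configure flags are omitted.

    # Build UCX with multi-threading support enabled.
    cd ucx-1.6.0
    ./contrib/configure-release --prefix=/opt/ucx-1.6.0-mt --enable-mt
    make -j && make install

    # Build Open MPI against that UCX install.
    cd ../openmpi-4.0.1
    ./configure --prefix=/opt/openmpi-4.0.1 --with-ucx=/opt/ucx-1.6.0-mt
    make -j && make install

Whatever additional flags a site build uses, the two points that matter for this thread are --enable-mt on the UCX side and pointing Open MPI's --with-ucx at that MT-enabled install.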
Re: [OMPI users] Oldest version of SLURM in use?
At FASRC Harvard we generally keep up with the latest, so we are on 22.05.2.

-Paul Edmon-

On 8/16/2022 9:51 AM, Jeff Squyres (jsquyres) via users wrote:

I have a curiosity question for the Open MPI user community: what version of SLURM are you using?

I ask because we're honestly curious about what the expectations are regarding new versions of Open MPI supporting older versions of SLURM. I believe that SchedMD's policy is that they support up to 5-year-old versions of SLURM, which is perfectly reasonable. But then again, there are lots of people who don't have support contracts with SchedMD, and therefore don't want or need support from SchedMD.

Indeed, in well-funded institutions, HPC clusters tend to have a lifetime of 2-4 years before they are refreshed, which fits nicely within that 5-year window. But in less well-funded institutions, HPC clusters could have lifetimes longer than 5 years.

Do any of you run versions of SLURM that are more than 5 years old?

--
Jeff Squyres
jsquy...@cisco.com