Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-16 Thread Stijn De Weirdt
hi max, >> you let pmix do its job and thus simply start the mpi parts with srun > instead of mpirun > > In this case, the srun command works fine for 'srun -N2 -n2 --mpi=pmix > pingpong 100 100', but the IB connection is not used for the > communication, only the tcp connection. h, this od

[slurm-users] [External] Re: openmpi / UCX / srun

2020-08-16 Thread Max Quast
hi stijn, > i would look into using mellanox ofed with rdma-core I will try the new 5.1 Mellanox OFED in the near future. > you let pmix do its job and thus simply start the mpi parts with srun instead of mpirun In this case, the srun command works fine for 'srun -N2 -n2 --mpi=pmix ping

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-14 Thread Stijn De Weirdt
for details and link to YouTube recording) stijn > > Thanks for helping me! > -max > > -Original Message- > From: Stijn De Weirdt > Sent: Wednesday, 12 August 2020 22:30 > To: slurm-users@lists.schedmd.com > Subject: Re: [slurm-users] [External] Re:

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-13 Thread Max Quast
1000'. But if I put srun before mpirun, four tasks will be created, two on each node. Thanks for helping me! -max -Original Message- From: Stijn De Weirdt Sent: Wednesday, 12 August 2020 22:30 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] [External] Re: ope

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-12 Thread Stijn De Weirdt
hi max, are you using rdma-core with mellanox ofed? and do you have any uverbs_write error messages in dmesg on the hosts? there is an issue with rdma vs tcp in ucx+pmix when rdma-core is not used. the workaround for the issue is to start slurmd on the nodes with environment 'UCX_TLS=tcp,self,sm'
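
A minimal sketch of the workaround described in the snippet above, assuming a POSIX shell. The helper name `run_with_ucx_tls` is hypothetical and not part of Slurm or UCX; only the variable value 'UCX_TLS=tcp,self,sm' comes from the thread. How slurmd is actually launched (init script, systemd unit, by hand) is site-specific and not stated in the snippet, so the sketch only shows setting the variable for a child process.

```shell
# Optional check mentioned in the thread: look for uverbs_write messages
# in the kernel log on each host (may need root on some systems).
#   dmesg | grep -i uverbs_write

# Hypothetical helper: run any command with UCX limited to the
# transports named in the thread (tcp, self, sm).
run_with_ucx_tls() {
    UCX_TLS='tcp,self,sm' "$@"
}

# Demonstration: print the value the child process actually sees.
run_with_ucx_tls sh -c 'echo "UCX_TLS=$UCX_TLS"'
```

On a compute node the same helper could wrap the daemon start, e.g. `run_with_ucx_tls slurmd`, assuming slurmd is started by hand; if it is started by a service manager, the variable would have to be set in that manager's configuration instead.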

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-12 Thread Prentice Bisbal
Max, You didn't quote the original e-mail so I'm not sure what the original problem was, or who "you" is. -- Prentice On 8/12/20 6:55 AM, Max Quast wrote: I am also trying to use ucx with slurm/PMIx and get the same error.  Also mpirun with "--mca pml ucx" works fine. Used versions: Ubu

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-12 Thread Max Quast
Hello Prentice, sorry for that. My post refers to a post by Dean Hidas on Mon Jun 17 17:40:56 UTC 2019: > Hello, > > I am trying to use ucx with slurm/pmix and run into the error below. The following works using mpirun, but what I was hoping was the srun equivalent fails. Is there some