hi max,
>> you let pmix do its job and thus simply start the mpi parts with srun
>> instead of mpirun
>
> In this case, the srun command works fine for 'srun -N2 -n2 --mpi=pmix
> pingpong 100 100', but the IB connection is not used for the
> communication, only the tcp connection.
h, this od
hi stijn,
> i would look into using mellanox ofed with rdma-core
I will try the new 5.1 Mellanox OFED in the near future.
> you let pmix do its job and thus simply start the mpi parts with srun
> instead of mpirun
In this case, the srun command works fine for 'srun -N2 -n2 --mpi=pmix
ping
for details and
link to youtube recording)
stijn
>
> Thanks for helping me!
> -max
>
> -Original Message-
> From: Stijn De Weirdt
> Sent: Wednesday, 12 August 2020 22:30
> To: slurm-users@lists.schedmd.com
> Subject: Re: [slurm-users] [External] Re:
1000'.
But if I put srun before mpirun, four tasks will be created, two on each
node.
Thanks for helping me!
-max
-Original Message-
From: Stijn De Weirdt
Sent: Wednesday, 12 August 2020 22:30
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] [External] Re: ope
hi max,
are you using rdma-core with mellanox ofed? and do you have any
uverbs_write error messages in dmesg on the hosts? there is an issue
with rdma vs tcp in ucx+pmix when rdma-core is not used. the workaround
for the issue is to start slurmd on the nodes with environment
'UCX_TLS=tcp,self,sm'
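
One possible way to apply that workaround, sketched under assumptions that are not from this thread: slurmd is managed by systemd as slurmd.service, and a drop-in file (the name ucx.conf is made up here) is an acceptable place for the variable. Note that this setting restricts UCX to tcp, loopback and shared memory, so it sidesteps the error but does not make traffic go over IB.

```shell
#!/bin/sh
# Sketch only: write a systemd drop-in that sets UCX_TLS for slurmd.
# Assumptions (not from the thread): slurmd runs as "slurmd.service",
# and "ucx.conf" is a hypothetical file name chosen for this example.
write_ucx_dropin() {
    dir="$1"   # e.g. /etc/systemd/system/slurmd.service.d
    mkdir -p "$dir" || return 1
    cat > "$dir/ucx.conf" <<'EOF'
[Service]
Environment=UCX_TLS=tcp,self,sm
EOF
}

# Usage on a compute node (as root), followed by a reload and restart:
#   write_ucx_dropin /etc/systemd/system/slurmd.service.d
#   systemctl daemon-reload
#   systemctl restart slurmd
```

If slurmd is started some other way on your nodes (init script, wrapper), the same variable would have to be exported in that environment instead.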
Max,
You didn't quote the original e-mail so I'm not sure what the original
problem was, or who "you" is.
--
Prentice
On 8/12/20 6:55 AM, Max Quast wrote:
I am also trying to use ucx with slurm/PMIx and get the same error.
Also mpirun with "--mca pml ucx" works fine.
Used versions:
Ubu
Hello Prentice,
sorry for that.
My post refers to a post by Dean Hidas on Mon Jun 17 17:40:56 UTC 2019:
> Hello,
>
> I am trying to use ucx with slurm/pmix and run into the error below. The
> following works using mpirun, but what I was hoping was the srun equivalent
> fails. Is there some