hi stijn, 

 

> i would look into using mellanox ofed with rdma-core

I will try the new 5.1 Mellanox OFED in the near future.

 

> you let pmix do it's job and thus simply start the mpi parts with srun
instead of mpirun

In this case, 'srun -N2 -n2 --mpi=pmix pingpong 100 100' works fine, but
the communication goes over the TCP connection only, not over IB.

 

The output of 'pmix_info' is:

                MCA bfrops: v21 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA bfrops: v12 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA bfrops: v20 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA bfrops: v3 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA gds: ds12 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA gds: hash (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA gds: ds21 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA pdl: pdlopen (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA pif: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA pif: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA pinstalldirs: env (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA pinstalldirs: config (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA plog: stdfd (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA plog: syslog (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA plog: default (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA pnet: tcp (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA pnet: test (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA preg: native (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA psec: native (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA psec: none (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA psensor: file (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA psensor: heartbeat (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA pshmem: mmap (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA ptl: usock (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA ptl: tcp (MCA v2.1.0, API v1.0.0, Component v3.1.5)

 

Isn't something with ucx supposed to appear in there?
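As a side note, pmix_info only lists PMIx's own components; whether the MPI traffic can go over UCX is a property of the Open MPI build. A quick way to check that (just a sketch, assuming ompi_info from the same Open MPI installation is in the PATH) is:

```shell
# List Open MPI components mentioning ucx; an empty result suggests
# Open MPI was configured without --with-ucx.
ompi_info | grep -i ucx
```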

 

Thanks :)

max

 

> hi max,

> 

> > I have set: 'UCX_TLS=tcp,self,sm' on the slurmd's.

> > Is it better to build slurm without UCX support or should I simply 

> > install rdma-core?

> i would look into using mellanox ofed with rdma-core, as it is what
mellanox is shifting towards or has already done (not sure what 4.9 has
tbh). or leave the env vars, i think for pmix it's ok unless you have very
large clusters (but i'm no expert here).

> 

> > 

> > How do I use ucx together with OpenMPI and srun now? 

> > It works when I set this manually:

> > 'mpirun -np 2 -H lsm218,lsm219 --mca pml ucx -x UCX_TLS=rc -x

> > UCX_NET_DEVICES=mlx5_0:1 pingpong 1000 1000'.

> > But if I put srun before mpirun four tasks will be created, two on 

> > each node.

> you let pmix do its job and thus simply start the mpi parts with srun
instead of mpirun

> 

> srun pingpong 1000 1000

> 

> if you must tune UCX (as in: default behaviour is not ok), also set it via
env vars. (at least try to use the defaults, it's pretty good i think)

> 

> (shameless plug: one of my colleagues set up a tech talk with openmpi
people wrt pmix, ucx, openmpi etc; see

> https://github.com/easybuilders/easybuild/issues/630 for details and link
to youtube recording)

> 

> stijn

> 
