hi stijn,
> i would look into using mellanox ofed with rdma-core

I will try the new 5.1 Mellanox OFED in the near future.

> you let pmix do its job and thus simply start the mpi parts with srun
> instead of mpirun

In this case, the srun command works fine for
'srun -N2 -n2 --mpi=pmix pingpong 100 100', but the IB connection is not
used for the communication, only the tcp connection.

The output of 'pmix_info' is:

    MCA bfrops: v21 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA bfrops: v12 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA bfrops: v20 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA bfrops: v3 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA gds: ds12 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA gds: hash (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA gds: ds21 (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA pdl: pdlopen (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA pif: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v3.1.5)
    MCA pif: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v3.1.5)
    MCA pinstalldirs: env (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA pinstalldirs: config (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA plog: stdfd (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA plog: syslog (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA plog: default (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA pnet: tcp (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA pnet: test (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA preg: native (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA psec: native (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA psec: none (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA psensor: file (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA psensor: heartbeat (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA pshmem: mmap (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA ptl: usock (MCA v2.1.0, API v1.0.0, Component v3.1.5)
    MCA ptl: tcp (MCA v2.1.0, API v1.0.0, Component v3.1.5)

Isn't something with ucx supposed to show up in there? (I've put a sketch
of the env-var variant below the quoted mail.)

Thanks :)
max

> hi max,
>
> > I have set: 'UCX_TLS=tcp,self,sm' on the slurmds.
> > Is it better to build slurm without UCX support or should I simply
> > install rdma-core?
>
> i would look into using mellanox ofed with rdma-core, as it is what
> mellanox is shifting towards or has already done (not sure what 4.9
> has tbh). or leave the env vars, i think for pmix it's ok unless you
> have very large clusters (but i'm no expert here).
>
> > How do I use ucx together with OpenMPI and srun now?
> > It works when I set this manually:
> > 'mpirun -np 2 -H lsm218,lsm219 --mca pml ucx -x UCX_TLS=rc -x
> > UCX_NET_DEVICES=mlx5_0:1 pingpong 1000 1000'.
> > But if I put srun before mpirun four tasks will be created, two on
> > each node.
>
> you let pmix do its job and thus simply start the mpi parts with srun
> instead of mpirun
>
> srun pingpong 1000 1000
>
> if you must tune UCX (as in: default behaviour is not ok), also set it
> via env vars. (at least try to use the defaults, it's pretty good i
> think)
>
> (shameless plug: one of my colleagues set up a tech talk with openmpi
> people wrt pmix, ucx, openmpi etc; see
> https://github.com/easybuilders/easybuild/issues/630 for details and
> link to youtube recording)
>
> stijn
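P.S. Here is the sketch mentioned above: how I read the env-var suggestion
for the srun case. OMPI_MCA_pml=ucx is just the environment form of the
'--mca pml ucx' flag from my mpirun test, and the UCX_TLS/UCX_NET_DEVICES
values are copied from that same test, so take them as an example rather
than a recommendation:

    export OMPI_MCA_pml=ucx          # same effect as 'mpirun --mca pml ucx'
    export UCX_TLS=rc,self,sm        # RC over IB plus loopback/shared memory
    export UCX_NET_DEVICES=mlx5_0:1  # HCA and port from my mpirun test
    srun -N2 -n2 --mpi=pmix pingpong 100 100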