Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-16 Thread Stijn De Weirdt
hi max, >> you let pmix do its job and thus simply start the mpi parts with srun > instead of mpirun > > In this case, the srun command works fine for 'srun -N2 -n2 --mpi=pmix > pingpong 100 100', but the IB connection is not used for the > communication, only the tcp connection. h, this od

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-14 Thread Stijn De Weirdt
for details and link to youtube recording) stijn > > Thanks for helping me! > -max > > -Original Message- > From: Stijn De Weirdt > Sent: Wednesday, 12 August 2020 22:30 > To: slurm-users@lists.schedmd.com > Subject: Re: [slurm-users] [External] Re:

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-12 Thread Stijn De Weirdt
hi max, are you using rdma-core with mellanox ofed? and do you have any uverbs_write error messages in dmesg on the hosts? there is an issue with rdma vs tcp in ucx+pmix when rdma-core is not used. the workaround for the issue is to start slurmd on the nodes with environment 'UCX_TLS=tcp,self,sm'

Re: [slurm-users] Heterogeneous HPC

2019-09-20 Thread Stijn De Weirdt
hi michael, very interesting feedback! have you ever tried/looked at https://github.com/eth-cscs/sarus? stijn On 9/20/19 9:11 AM, Mahmood Naderan wrote: > I appreciate the replies. > I will try to test Charliecloud to see what is what... > > > On Fri, Sep 20, 2019, 10:37 Fulcomer, Samuel > wr

Re: [slurm-users] MPI jobs via mirun vs. srun through PMIx.

2019-09-17 Thread Stijn De Weirdt
hi jurgen, > For our next cluster we will switch from Moab/Torque to Slurm and have > to adapt the documentation and example batch scripts for the users. heh, we did that a year ago, and we made (well, fixed the slurm one) a qsub wrapper to avoid having to document this and retrain our users. (

Re: [slurm-users] Issue with x11

2019-05-15 Thread Stijn De Weirdt
hi all, we are currently also going through the painful process of making x11 support user-friendly, so i'm also in favour of making this work from eg vnc or nx/x2go. however, we now run 17.11.8, and we already noticed that 17.11.11 has very different x11-related code. is the 19.05 x11 even more d