hi max,
>> you let pmix do its job and thus simply start the mpi parts with srun
>> instead of mpirun
>
> In this case, the srun command works fine for 'srun -N2 -n2 --mpi=pmix
> pingpong 100 100', but the IB connection is not used for the
> communication, only the tcp connection.
h, this od
for details and
link to youtube recording)
stijn
>
> Thanks for helping me!
> -max
>
> -Original Message-
> From: Stijn De Weirdt
> Sent: Wednesday, 12 August 2020 22:30
> To: slurm-users@lists.schedmd.com
> Subject: Re: [slurm-users] [External] Re:
hi max,
are you using rdma-core with mellanox ofed? and do you have any
uverbs_write error messages in dmesg on the hosts? there is an issue
with rdma vs tcp in ucx+pmix when rdma-core is not used. the workaround
for the issue is to start slurmd on the nodes with environment
'UCX_TLS=tcp,self,sm'
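
a rough sketch of the check and workaround described above, in case it helps: it looks for uverbs_write in kernel log text and, only if found, adds the UCX_TLS value to the environment that slurmd would be started with. the function names are made up for illustration, and matching on the bare substring "uverbs_write" is an assumption, since the exact kernel message is not quoted here.

```python
import os

# value taken from the workaround quoted in the message above
WORKAROUND_TLS = "tcp,self,sm"


def has_uverbs_write_errors(dmesg_text):
    """Return True if any line of the kernel log text mentions uverbs_write.

    Assumption: a plain substring match is enough to spot the problem.
    """
    return any("uverbs_write" in line for line in dmesg_text.splitlines())


def slurmd_environment(dmesg_text, base_env=None):
    """Return a copy of base_env (default: the current environment).

    UCX_TLS is added only when the log text shows uverbs_write messages.
    """
    env = dict(os.environ if base_env is None else base_env)
    if has_uverbs_write_errors(dmesg_text):
        env["UCX_TLS"] = WORKAROUND_TLS
    return env
```

the log text would typically come from the output of dmesg on each host, and the returned dict could be passed as the environment when launching slurmd. how slurmd is actually started (systemd unit, init script, ...) differs per site, so that part is left out.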
hi michael,
very interesting feedback!
have you ever tried/looked at https://github.com/eth-cscs/sarus?
stijn
On 9/20/19 9:11 AM, Mahmood Naderan wrote:
> I appreciate the replies.
> I will try to test Charliecloud to see what is what...
>
>
> On Fri, Sep 20, 2019, 10:37 Fulcomer, Samuel
> wr
hi jurgen,
> For our next cluster we will switch from Moab/Torque to Slurm and have
> to adapt the documentation and example batch scripts for the users.
heh, we did that a year ago, and we made (well, fixed the slurm one) a
qsub wrapper to avoid having to document this and retrain our users.
(
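
to illustrate the idea of a qsub wrapper: a minimal sketch that translates a handful of qsub options into their sbatch equivalents. this is not the wrapper mentioned above, only a toy covering a small, hand-picked subset of options; the function name translate_qsub_args is hypothetical, and anything it does not recognise is passed through unchanged.

```python
def translate_qsub_args(args):
    """Translate a small subset of qsub options into sbatch options.

    Illustrative only: handles -N, -q, -o, -e and '-l walltime=...'.
    Unknown arguments (e.g. the job script name) are passed through.
    """
    simple = {
        "-N": "--job-name",
        "-q": "--partition",
        "-o": "--output",
        "-e": "--error",
    }
    out = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in simple and i + 1 < len(args):
            # option with a single value: rename and join with '='
            out.append(f"{simple[arg]}={args[i + 1]}")
            i += 2
        elif arg == "-l" and i + 1 < len(args):
            # resource list: only walltime is handled in this sketch
            for res in args[i + 1].split(","):
                key, _, value = res.partition("=")
                if key == "walltime":
                    out.append(f"--time={value}")
                else:
                    raise ValueError(f"unsupported resource: {res}")
            i += 2
        else:
            out.append(arg)
            i += 1
    return out
```

for example, translate_qsub_args(["-N", "test", "-l", "walltime=01:00:00", "job.sh"]) gives the argument list for an sbatch call with --job-name and --time set. a real wrapper also has to deal with #PBS directives inside the script and with environment variables, which this sketch ignores.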
hi all,
we are currently also going through the painful process of making x11
support user-friendly, so i'm also in favour of making this work from e.g.
vnc or nx/x2go.
however, we now run 17.11.8, and we already noticed that 17.11.11 has
very different x11 related code. is the 19.05 x11 even more d