hi all,
we are currently also going through the painful process of making x11
support user-friendly, so i'm also in favour of making this work from e.g.
vnc or nx/x2go.
however, we now run 17.11.8, and we already noticed that 17.11.11 has
very different x11-related code. is the 19.05 x11 even more different?
hi jurgen,
> For our next cluster we will switch from Moab/Torque to Slurm and have
> to adapt the documentation and example batch scripts for the users.
heh, we did that a year ago, and we made a qsub wrapper (well, fixed the
existing slurm one) to avoid having to document this and retrain our users.
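for illustration, a toy sketch of such a qsub-to-sbatch translation (not our actual wrapper; slurm also ships a fuller one in contribs/torque). it maps a few common torque flags and just prints the equivalent sbatch command line:

```shell
# Toy qsub-to-sbatch translator, illustration only.
qsub_wrap() {
  OPTIND=1                                       # reset for repeated calls
  args=""
  while getopts "N:q:o:e:l:" opt; do
    case "$opt" in
      N) args="$args --job-name=$OPTARG" ;;      # job name
      q) args="$args --partition=$OPTARG" ;;     # queue -> partition
      o) args="$args --output=$OPTARG" ;;
      e) args="$args --error=$OPTARG" ;;
      l) case "$OPTARG" in                       # resource list (subset)
           nodes=*)    args="$args --nodes=${OPTARG#nodes=}" ;;
           walltime=*) args="$args --time=${OPTARG#walltime=}" ;;
         esac ;;
    esac
  done
  shift $((OPTIND - 1))
  echo "sbatch$args $*"
}

qsub_wrap -N mpi_test -q long -l nodes=2 job.sh
# prints: sbatch --job-name=mpi_test --partition=long --nodes=2 job.sh
```

a real wrapper has to cover far more (ppn, mem, arrays, interactive jobs), which is why fixing an existing one beats writing from scratch.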
(
hi michael,
very interesting feedback!
have you ever tried/looked at https://github.com/eth-cscs/sarus?
stijn
On 9/20/19 9:11 AM, Mahmood Naderan wrote:
> I appreciate the replies.
> I will try to test Charliecloud to see what is what...
>
>
> On Fri, Sep 20, 2019, 10:37 Fulcomer, Samuel
> wrote:
hi max,
are you using rdma-core with mellanox ofed? and do you have any
uverbs_write error messages in dmesg on the hosts? there is an issue
with rdma vs tcp in ucx+pmix when rdma-core is not used. the workaround
for the issue is to start slurmd on the nodes with the environment
variable 'UCX_TLS=tcp,self,sm' set.
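a minimal sketch of applying that workaround via a systemd drop-in, assuming slurmd runs under systemd (unit name and paths may differ on your distro):

```shell
# Hypothetical drop-in for the slurmd unit; adjust name/path as needed.
mkdir -p /etc/systemd/system/slurmd.service.d
cat > /etc/systemd/system/slurmd.service.d/ucx.conf <<'EOF'
[Service]
Environment=UCX_TLS=tcp,self,sm
EOF
systemctl daemon-reload && systemctl restart slurmd
```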
for details and
link to youtube recording)
stijn
>
> Thanks for helping me!
> -max
>
> -Original Message-
> From: Stijn De Weirdt
> Sent: Wednesday, 12 August 2020 22:30
> To: slurm-users@lists.schedmd.com
> Subject: Re: [slurm-users] [External] Re:
hi max,
>> you let pmix do its job and thus simply start the mpi parts with srun
> instead of mpirun
>
> In this case, the srun command works fine for 'srun -N2 -n2 --mpi=pmix
> pingpong 100 100', but the IB connection is not used for the
> communication, only the tcp connection.
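to see whether ucx can use the IB verbs transports at all on a node, ucx_info is handy; rc/ud/dc entries indicate InfiniBand, while a tcp-only fallback shows just tcp/self/sm. a sketch (the pingpong invocation assumes the binary from this thread):

```shell
# List the transports UCX detects on this node (guarded in case
# ucx_info is not installed):
if command -v ucx_info >/dev/null; then
  ucx_info -d | grep -i transport
fi

# For a live run, a higher UCX log level shows which transports the
# job actually selects:
# UCX_LOG_LEVEL=info srun -N2 -n2 --mpi=pmix ./pingpong 100 100
```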
hi all,
we are currently going through the process of reviewing our limits after
subtle OOM issues that had nothing to do with jobs. we found out that
idle (just rebooted) nodes were not representative for nodes that were
running for a while: gpfs mmfsd was using up to 2.5GB extra, rsyslogd
w
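one way to leave headroom for that daemon growth is to reserve memory in slurm.conf rather than advertising all physical RAM; a sketch with made-up numbers (MemSpecLimit reserves memory for the slurmd daemons and needs cgroup-based memory enforcement to actually be enforced):

```
# illustrative slurm.conf fragment; numbers are examples, not our values
# advertise less than physical RAM and reserve memory for slurmd/system
NodeName=node[001-100] RealMemory=191000 MemSpecLimit=4096
```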
hi guillaume,
nothing else is different between the v1 and v2 setup? (/tmp is tmpfs on
v2 setup perhaps?)
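a quick way to check that on a node; files a job writes to a tmpfs /tmp live in memory charged to the job's cgroup, which can inflate its reported memory use:

```shell
# Print the filesystem type backing /tmp (tmpfs vs a disk filesystem).
# --target resolves the containing mount even if /tmp is not its own
# mountpoint; 'stat -f -c %T /tmp' is an alternative.
findmnt -n -o FSTYPE --target /tmp
```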
stijn
On 5/22/25 11:10, Guillaume COCHARD via slurm-users wrote:
> Hello,
> We've noticed a recent change in how MaxRSS is reported on our cluster.
> Specifically, the MaxRSS value for m