On 08/06/20 12:16, Diego Zuccato wrote:
> I have another partition on these new nodes. 4 identical machines, new
> installation, ConnectX-5 card, dual Intel Xeon 5120 (14-core, dual-thread).
> No problem running a job requiring 112 threads (on 4 nodes), but can't run
> a single-node job with 5…
Hi,
Can you please help me understand how passwordless SSH works with Slurm?
I was under the assumption that jobs/tasks are ultimately submitted by the
"slurm" Linux user and not by the Linux user who wants to run jobs. Is this
not correct? So is it not sufficient for only the "slurm" Linux user …
Hi Durai,
I can only try to explain how I understand this: the "slurm" user runs
only the slurmctld and slurmdbd central server daemons. On the compute
nodes, the slurmd daemon runs as the root user so that it can start user
tasks on behalf of normal users.
The "slurm" user should *not* have …
On 6/9/20 12:12 PM, Steve Brasier wrote:
Hi all, looking for some advice on the process to follow when making one
of the reconfigurations that require a Slurm daemon restart (as listed
in the docs for "scontrol reconfigure").
When reconfiguring slurm.conf, make sure to propagate that file to all
nodes; a rough sketch of the workflow follows.
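As an illustration (node names and the plain scp/ssh loops are my
assumptions, not part of the original post):

    # On the controller, after editing /etc/slurm/slurm.conf:
    for n in node01 node02 node03 node04; do    # hypothetical node names
        scp /etc/slurm/slurm.conf $n:/etc/slurm/slurm.conf
    done
    scontrol reconfigure             # enough for most parameter changes
    # For the parameters documented as needing a restart:
    systemctl restart slurmctld      # on the controller
    for n in node01 node02 node03 node04; do
        ssh $n systemctl restart slurmd
    done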
All,
Has anyone successfully implemented the DNS SRV records for configless setups?
I am curious about where to put the SRV record (what domain/name), as we
have more than one cluster in the same domain.
Maybe that is not supported; I cannot tell from the documentation at
https://slurm.schedmd.com/. One possible layout is sketched below.
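The configless documentation describes an _slurmctld._tcp SRV record looked
up in the node's own DNS domain, so one plausible approach (my assumption,
not something the docs spell out for this case) is a separate subdomain per
cluster, with each cluster's nodes using that subdomain as their search
domain. In BIND zone-file syntax:

    ; zone cluster1.example.com (hypothetical names)
    _slurmctld._tcp  3600 IN SRV 0 0 6817 ctld1.cluster1.example.com.

    ; zone cluster2.example.com
    _slurmctld._tcp  3600 IN SRV 0 0 6817 ctld2.cluster2.example.com.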
On Tuesday, 09 June 2020, at 12:43:34 (+0200),
Ole Holm Nielsen wrote:
> in which case you need to set up SSH authorized_keys files for such
> users.
I'll admit that I didn't know about this until I came to LANL, but
there's actually a much better alternative than having to create user
key pairs.
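For context, this is the stock OpenSSH host-based setup that alternative
relies on (my sketch; the host names are hypothetical):

    # /etc/ssh/sshd_config on every node accepting intra-cluster logins
    HostbasedAuthentication yes

    # /etc/ssh/ssh_config on every node initiating them
    HostbasedAuthentication yes
    EnableSSHKeysign yes

    # /etc/ssh/shosts.equiv -- hosts trusted for host-based auth
    node01.cluster.example.com
    node02.cluster.example.com

    # ...plus every node's host keys collected in /etc/ssh/ssh_known_hosts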
Hi,
I am encountering a weird issue, and I'm not sure where it is coming from.
I have set up a Slurm-based cluster using AWS ParallelCluster. I have tweaked
the Slurm configuration to enable X forwarding by setting PrologFlags=X11. The
ParallelCluster portion is relevant, as basically every time …
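For reference, the knob being discussed (PrologFlags=X11 is the real
slurm.conf flag; this is just the relevant excerpt, not their full config):

    # slurm.conf (excerpt)
    PrologFlags=X11    # use Slurm's built-in X11 forwarding for srun/salloc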
Sounds like a race condition where slurmd is starting before the node is
truly ready.
You can add dependencies to the slurmd unit so it will not start until
the other services it needs are running; see the sketch after this message.
The benefits of systemd :)
Brian Andrus
On 6/9/2020 10:53 AM, Dumont, Joey wrote:
Hi,
I am …
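A minimal sketch of the dependency approach Brian describes, assuming slurmd
runs from the stock slurmd.service unit (which targets to wait for depends
on what the node actually needs; these are examples):

    # /etc/systemd/system/slurmd.service.d/override.conf
    [Unit]
    Wants=network-online.target
    After=network-online.target remote-fs.target

    # then: systemctl daemon-reload && systemctl restart slurmd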
Hi Michael,
Thanks very much, this is really cool! I need to look into the
HostbasedAuthentication for intra-cluster MPI tasks spawned by SSH (not
using srun).
Presumably external access still needs to use SSH authorized keys?
Best regards,
Ole
On 09-06-2020 17:45, Michael Jennings wrote:
Hi Prentice,
Could you kindly elaborate on this statement? Is host-based security
considered safe inside a compute cluster, compared to user-based SSH keys?
Thanks,
Ole
On 09-06-2020 21:26, Prentice Bisbal wrote:
Host-based security is not considered as safe as user-based security, so
should only be used in special cases.
Host-based security is not considered as safe as user-based security, so
should only be used in special cases.
On 6/9/20 11:45 AM, Michael Jennings wrote:
On Tuesday, 09 June 2020, at 12:43:34 (+0200),
Ole Holm Nielsen wrote:
in which case you need to set up SSH authorized_keys files for such
users.
> Using sacct you can find that information; try the options below and see if
> that works.
>
> sacct -j <jobid> --format=jobid,ReqTRES%50,ReqGres
Thanks, I tried that command, but it seems to show the requested number of GPUs
instead of the GPU index. I tried `sacct -j <jobid> -l` too. However, it seems
to include any GPU index information even in AllocGres and AllocTres columns.
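For anyone following along, the kind of query being discussed looks like
this (the job ID is a placeholder; ReqTRES and AllocTRES are real sacct
fields):

    sacct -j 12345 --format=JobID,ReqTRES%50,AllocTRES%50
    # prints e.g. "gres/gpu=2" -- a GPU count, not the device indices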
On Tuesday, 09 June 2020, at 21:27:27 (+0200),
Ole Holm Nielsen wrote:
> Thanks very much, this is really cool! I need to look into the
> HostbasedAuthentication for intra-cluster MPI tasks spawned by SSH (not
> using srun).
>
> Presumably external access still needs to use SSH authorized keys?
> -j -l` too. However, it seems to include any GPU index information
> even in AllocGres and AllocTres columns.
It DOES NOT seem to include any GPU index, I meant. Sorry.
Best.
露崎 浩太 (Kota Tsuyuzaki)
kota.tsuyuzaki...@hco.ntt.co.jp
NTT Software Innovation Center