I ran into this error when using Slurm:
srun: error: _server_read: fd 18 error reading header: Connection reset by peer
srun: error: step_launch_notify_io_failure: aborting, io error with slurmstepd
on node 1
0: slurmstepd: error: execve(): singularity: No such file or directory
srun: error: master: ta
It probably could not find the singularity executable?
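The execve() message above says the job step could not find singularity on the node where it ran. Here is a minimal sketch of a check, assuming Python 3 is available on the compute nodes. The helper name find_executable is only illustrative. Launch it through srun so that it reports the compute node's PATH rather than the login node's:

```python
import os
import shutil
import socket


def find_executable(name, path=None):
    """Return the full path of `name` if it is on the search path, else None."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    # Report from whichever node actually runs this script.
    host = socket.gethostname()
    location = find_executable("singularity")
    if location is None:
        print(f"{host}: singularity not found, PATH={os.environ.get('PATH', '')}")
    else:
        print(f"{host}: singularity found at {location}")
```

For example, save it as check_singularity.py (a hypothetical file name) and run `srun -N1 python3 check_singularity.py`. If it prints "not found", the binary is either not installed on that node or not on the PATH that the job step inherits.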
On Tue, Oct 17, 2017 at 8:04 AM, Chaofeng Zhang wrote:
> I met the error when using slurm.
>
>
>
> srun: error: _server_read: fd 18 error reading header: Connection reset by
> peer
> srun: error: step_launch_notify_io_failure: ab
Hi,
We have been having some problems with NFS mounts via InfiniBand getting dropped
by nodes. We ended up switching our main admin server, which provides
NFS and Slurm, from one machine to another.
Now, however, if slurmdbd is started, as soon as slurmctld starts,
slurmdbd segfaults. In the slurmdbd.l
You probably have a core file in the directory where slurmdbd writes its
logs; a backtrace from gdb would be most telling.
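One way to pull that backtrace non-interactively is sketched below, assuming gdb is installed on the admin server. The function names and both paths in the usage note are hypothetical, so substitute the real slurmdbd binary and the core file you actually find:

```python
import subprocess


def gdb_backtrace_command(binary, core_file):
    """Build a gdb command line that prints a full backtrace and exits."""
    # -batch makes gdb exit after running the commands given with -ex.
    return ["gdb", "-batch", "-ex", "bt full", binary, core_file]


def run_backtrace(binary, core_file):
    """Run gdb on the core file and return its combined output as text."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout
```

For instance, `print(run_backtrace("/usr/sbin/slurmdbd", "/var/log/slurm/core"))`, where both paths are placeholders. The backtrace is much more readable if slurmdbd was built with debug symbols.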
On Oct 17, 2017 08:17, "Loris Bennett" wrote:
>
> Hi,
>
> We have been having some with NFS mounts via Infiniband getting dropped
> by nodes. We ended up switching our main admin s
Hi, All,
I am gathering hardware requirements for head nodes for my next cluster.
The new cluster will have ~1500 nodes. We ran 5 million jobs last year. I
plan to run the slurmctld on one node and the slurmdbd on another. I also
plan to write the StateSaveLocation to an NFS appliance. Does the f
From my experience that should all be sufficient. The real key is the
high clock rate CPU and sufficient memory. SSDs for the Slurm spool
directory are good too. You will want to put CentOS 7 on this at least
and run the latest version of MariaDB as well.
-Paul Edmon-
On 10/17/2017 5:13 PM
Hey everyone,
I am working at a university and we are trying to set up a Slurm cluster
for courses and research.
For the courses we would like to enforce QoS on users who connect
via pbis-open auth, meaning they authenticate against an AD server.
There are a lot of users and ea
You can use AD, but it is bothersome in many ways. Use a web portal to
manage your users.
On 18 Oct 2017 07:25, "Nadav Toledo" wrote:
> Hey everyone,
> I am working at a university and we trying to setup a slurm cluster for
> courses and research.
> for the courses we would like to enforce
Qos limits associations and AD auth
Sorry for all the weird symbols; I was copying the code from a Linux
terminal. Here is the clean code (I hope):
if ((accounting_enforce & ACCOUNTING_ENFORCE_QOS)
&& assoc_ptr
&& !admin
&& (!assoc_ptr->usage->valid_qos
|| !bit_test(assoc_ptr->
Yo. Put in a freaking web portal. If you add this to the cluster, you and your
students will have to manage it. They will pick up bad habits from it. Or install a
Singularity cluster. You can code all this in an afternoon, easy.
On 18 Oct 2017 07:35, "Nadav Toledo" wrote:
> Sorry for all the wierd symbol
Re: [slurm-dev] Re: Qos limits associations and AD auth
Can you elaborate on what exactly you mean by a web portal?
At the moment users log in to the login server via ssh with their AD
credentials; these credentials are authenticated against AD via pbis-open.
What do you suggest I add to these me
Well, for security, that is the first wrong start. HPC is not secure, unless
you have a 10-person team. I hope that at least you put the cluster behind a
router firewall in a demilitarized zone. If you did not, that is the second
screw-up, man. Also, the third screw-up is that you let ssh access to not
trust
On 17 October 2017 23:12:35 CEST, Daniel Barker wrote:
>Hi, All,
>
>I am gathering hardware requirements for head nodes for my next cluster.
>The new cluster will have ~1500 nodes. We ran 5 million jobs last year. I
>plan to run the slurmctld on one node and the slurmdbd on another. I also