Hi,
No it doesn’t need to be below 1000.
Best
Andreas
On 03.12.2024 at 22:08, Steven Jones via slurm-users wrote:
Hi,
Does the slurm user need to have a UID below 1000? Using IPA, it has a UID of
[root@vuwunicoslurmd1 slurm]# id slurm
uid=126209577(slurm) gid=126209576(slurm) groups=126209576(slurm)
I guess I have the syntax wrong,
[root@node1 slurm]# /usr/sbin/slurmd -D
slurmd: fatal: Unable to create NodeAddr list from node[1-7].ods.vuw.ac.nz
[root@node1 slurm]# tail /etc/slurm/slurm.conf
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeNam
Well that is a start, TY.
[root@node1 slurm]# /usr/sbin/slurmd -D
slurmd: fatal: Unable to determine this slurmd's NodeName
Where is this set?
regards
Steven
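For reference, a minimal sketch of the kind of NodeName line slurmd consults in
slurm.conf; the names, CPU count and state below are purely illustrative, not
taken from this thread. slurmd works out its own NodeName by matching the
host's name against these entries, and if the hostnames don't line up it can
be given the name explicitly on the command line:
NodeName=node[1-7] CPUs=4 State=UNKNOWN
/usr/sbin/slurmd -D -N node1    # foreground, with an explicit NodeName
(NodeAddr= can be appended to the NodeName line when the resolvable addresses
differ from the NodeName values.)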
From: Jeffrey R. Lang
Sent: Wednesday, 4 December 2024 1:17 pm
To: Steven Jones; slurm-us...@schedmd
Steve,
Try running the failing process from the command line and use the -D
option.
Per man page: Run slurmd in the foreground. Error and debug messages will be
copied to stderr.
Jeffrey R. Lang
Advanced Research Computing Center
University of Wyoming, Information Technology Center
1000 E.
Hi,
I have set a log file location in slurm.conf as
SlurmdLogFile=/var/log/slurm/slurmd.log
but the file stays at 0 length.
Slurm will not run; what else do I need to do to log why it's failing, please?
regards
Steven
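As an aside on the empty log file: if slurmd exits before it manages to open
its log, the file stays at zero length, so running the daemon in the
foreground (as suggested elsewhere in this thread) is the quickest way to see
the error. A minimal sketch of the related settings, with illustrative values:
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdDebug=debug
/usr/sbin/slurmd -D -vvv    # foreground, extra verbose; errors go to stderr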
Thanks, but yeah, I do not want to use `--exclusive`; I just want it to be
exclusive for me.
Thanks
On Tue, 3 Dec 2024 at 16:40, Renfro, Michael wrote:
> As Thomas had mentioned earlier in the thread, there is --exclusive with
> no extra additions. But that’d prevent **every** other job from ru
As Thomas had mentioned earlier in the thread, there is --exclusive with no
extra additions. But that’d prevent *every* other job from running on that
node, which unless this is a cluster for you and you alone, sounds like wasting
90% of the resources. I’d be most perturbed at a user doing that
Thanks, nice workaround.
It would be great if there were a way to actually set it so that one can use
only one node per job, a bit like --exclusive.
Thanks
On Tue, 3 Dec 2024 at 16:24, Renfro, Michael wrote:
> I’ve never done this myself, but others probably have. At the end of [1],
> there’s an
I’ve never done this myself, but others probably have. At the end of [1],
there’s an example of making a generic resource for bandwidth. You could set
that to any convenient units (bytes/second or bits/second, most likely), and
assign your nodes a certain amount. Then any network-intensive job c
Thank you Michael,
yeah, you guessed right, Networking.
My job is mostly IO (networking) intensive. My nodes connect to the network
via a non-blocking switch, but the Ethernet cards are not the best,
so I don't need many CPUs per node, but I do want to run on all nodes to
fully utilize the network.
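A minimal sketch of the generic bandwidth resource Michael describes above;
the node names, CPU count and the count of 1000 are assumptions for
illustration, not taken from the linked page. Each node advertises a fixed
bandwidth amount, and a job that requests the whole amount can then only
share a node with jobs that request none of it.
In slurm.conf:
GresTypes=bandwidth
NodeName=node[01-20] Gres=bandwidth:1000 CPUs=8 State=UNKNOWN
In gres.conf on each node:
Name=bandwidth Count=1000 Flags=CountOnly
In the batch script:
#SBATCH --gres=bandwidth:1000    # claims the node's full allotment, so one such job per node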
Hi,
Does the slurm user need to have a UID below 1000? Using IPA, it has a UID of
[root@vuwunicoslurmd1 slurm]# id slurm
uid=126209577(slurm) gid=126209576(slurm) groups=126209576(slurm)
regards
Steven
From: Ole Holm Nielsen via slurm-users
Sent: Wednesday, 4 December
I’ll start with the question of “why spread the jobs out more than required?”
and move on to why the other items didn’t work:
1. exclusive only ensures that others’ jobs don’t run on a node with your
jobs, and does nothing about other jobs you own.
2. spread-job distributes the work of on
Hi,
I have a cluster of 20 nodes, and I want to run a job array on that cluster,
but I want each node to run only one job.
When I do the following:
#!/bin/bash
#SBATCH --job-name=process_images_train    # Job name
#SBATCH --time=50:00:00 # Time limit hrs:min:sec
#SBATCH --tasks=1
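For context, a sketch of what a complete header of this shape might look like;
the array range and the switch to --ntasks are assumptions, since the original
script is truncated above. With only --nodes=1 and --ntasks=1, Slurm is free
to pack several array tasks onto the same node, which is the behaviour the
rest of this thread is about.
#!/bin/bash
#SBATCH --job-name=process_images_train    # Job name
#SBATCH --time=50:00:00                    # Time limit hrs:min:sec
#SBATCH --array=0-19                       # hoped-for one element per node, 20 nodes
#SBATCH --nodes=1
#SBATCH --ntasks=1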
Hi Steven,
On 03-12-2024 19:34, Steven Jones via slurm-users wrote:
I have munge running on the controller and nodes fine, all tests passed.
I have slurmctld running on the controller OK; after checking the logs,
/var/spool/slurmctld was not created, which I assume should have happened
via the
Hi,
I have munge running on the controller and nodes fine, all tests passed.
I have slurmctld running on the controller OK; after checking the logs,
/var/spool/slurmctld was not created, which I assume should have happened via
the RPM install?
Anyway, I can't get slurmd to run on the warewulf node
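On the missing directory: the path is whatever StateSaveLocation points to in
slurm.conf, and the packages generally leave creating it to the admin. A
minimal sketch of the usual steps on the controller (path and ownership assume
the slurm user shown earlier in this digest):
mkdir -p /var/spool/slurmctld
chown slurm:slurm /var/spool/slurmctld
chmod 755 /var/spool/slurmctld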
Not sure anyone would know, but...
If you are running Slurm in HA mode (multiple SlurmctldHost entries), is
it possible to point sackd to more than one using the --conf-server option?
So either specify --conf-server more than once, or have a
comma-delimited list of them?
The docs are a little