[slurm-users] Re: slurmd on a warewulf node - not running

2024-12-03 Thread Henkel, Andreas via slurm-users
Hi, no it doesn’t need to be below 1000. Best, Andreas. On 03.12.2024 at 22:08, Steven Jones via slurm-users wrote: Hi, does the slurm user need to be <1000 UID? Using IPA with a UID of: [root@vuwunicoslurmd1 slurm]# id slurm uid=126209577(slurm) gid=126209576(slurm) groups=126209576(slurm)

[slurm-users] Re: Slurm not running on a warewulf node

2024-12-03 Thread Steven Jones via slurm-users
I guess I have the syntax wrong. [root@node1 slurm]# /usr/sbin/slurmd -D slurmd: fatal: Unable to create NodeAddr list from node[1-7].ods.vuw.ac.nz [root@node1 slurm]# tail /etc/slurm/slurm.conf #ResumeRate= #SuspendExcNodes= #SuspendExcParts= #SuspendRate= #SuspendTime= # # # COMPUTE NODES NodeNam
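The fatal error above is about the hostlist expression `node[1-7].ods.vuw.ac.nz`. The bracketed range is shorthand for seven hostnames; a quick shell loop shows what it is meant to expand to (illustration only — Slurm expands this internally with its own hostlist parser):

```shell
# Expand the hostlist pattern node[1-7].ods.vuw.ac.nz by hand
# (illustration only; Slurm uses its own hostlist parser internally)
for i in $(seq 1 7); do
  echo "node${i}.ods.vuw.ac.nz"
done
```

One common arrangement (an assumption, not confirmed in this thread) is to keep NodeName short and put the fully qualified name in NodeAddr, e.g. `NodeName=node[1-7] NodeAddr=node[1-7].ods.vuw.ac.nz`, so that the NodeName matches what `hostname -s` returns on each node.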

[slurm-users] Re: Slurm not running on a warewulf node

2024-12-03 Thread Steven Jones via slurm-users
Well that is a start, TY. [root@node1 slurm]# /usr/sbin/slurmd -D slurmd: fatal: Unable to determine this slurmd's NodeName Where is this set? regards Steven From: Jeffrey R. Lang Sent: Wednesday, 4 December 2024 1:17 pm To: Steven Jones ; slurm-us...@schedmd
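By default slurmd derives its NodeName from the host's own name and looks for a matching NodeName entry in slurm.conf; it can also be set explicitly on the command line. A sketch (the node name `node1` is taken from the prompt above and is an assumption):

```shell
hostname -s                    # should match a NodeName entry in slurm.conf
/usr/sbin/slurmd -D -N node1   # or name the node explicitly with -N
```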

[slurm-users] Re: Slurm not running on a warewulf node

2024-12-03 Thread Jeffrey R. Lang via slurm-users
Steve, try running the failing process from the command line with the -D option. Per the man page: Run slurmd in the foreground. Error and debug messages will be copied to stderr. Jeffrey R. Lang Advanced Research Computing Center University of Wyoming, Information Technology Center 1000 E.

[slurm-users] slurm not running on a warewulf node

2024-12-03 Thread Steven Jones via slurm-users
Hi, I have set a log creation/location in slurm.conf as SlurmdLogFile=/var/log/slurm/slurmd.log, but it is 0 length. Slurm will not run; what else do I need to do to log why it's failing, please? regards Steven -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send
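A zero-length SlurmdLogFile usually means slurmd exited before it could open the log, or the log directory is missing or unwritable. A sketch of checks, under those assumptions (paths from the message; the ownership line applies only if slurmd is configured to log as the slurm user):

```shell
mkdir -p /var/log/slurm
chown slurm:slurm /var/log/slurm   # assumption: only if slurmd writes as 'slurm'
/usr/sbin/slurmd -D -vvv           # foreground, verbose; errors go to stderr
journalctl -u slurmd --no-pager | tail   # systemd may have captured the failure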

[slurm-users] Re: How can I make sure my user has only one job per node (Job array --exclusive=user,)

2024-12-03 Thread Oren via slurm-users
Thanks, but yeah, I do not want to use `--exclusive`; I just want it to be exclusive for me. Thanks. On Tue, 3 Dec 2024 at 16:40, Renfro, Michael wrote: > As Thomas had mentioned earlier in the thread, there is --exclusive with > no extra additions. But that’d prevent **every** other job from ru

[slurm-users] Re: How can I make sure my user has only one job per node (Job array --exclusive=user,)

2024-12-03 Thread Renfro, Michael via slurm-users
As Thomas had mentioned earlier in the thread, there is --exclusive with no extra additions. But that’d prevent *every* other job from running on that node, which unless this is a cluster for you and you alone, sounds like wasting 90% of the resources. I’d be most perturbed at a user doing that

[slurm-users] Re: How can I make sure my user has only one job per node (Job array --exclusive=user,)

2024-12-03 Thread Oren via slurm-users
Thanks, nice workaround. It would be great if there were a way to actually set it so that one can use only one node per job, a bit like --exclusive. Thanks. On Tue, 3 Dec 2024 at 16:24, Renfro, Michael wrote: > I’ve never done this myself, but others probably have. At the end of [1], > there’s an

[slurm-users] Re: How can I make sure my user has only one job per node (Job array --exclusive=user,)

2024-12-03 Thread Renfro, Michael via slurm-users
I’ve never done this myself, but others probably have. At the end of [1], there’s an example of making a generic resource for bandwidth. You could set that to any convenient units (bytes/second or bits/second, most likely), and assign your nodes a certain amount. Then any network-intensive job c
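The generic-resource idea Michael describes can be sketched roughly as follows. The resource name `bandwidth` comes from the slurm.conf GRES example he cites; the count and units (1000, nominally Mb/s) and the node names are assumptions for illustration:

```shell
# slurm.conf (sketch; counts, units and node names are assumptions)
GresTypes=bandwidth
NodeName=node[1-20] Gres=bandwidth:1000 ...

# Then a network-heavy array element claims the whole node's bandwidth,
# so at most one such job lands per node:
#SBATCH --gres=bandwidth:1000
```

The scheduling effect relies on the GRES being consumable: once one job holds all 1000 units on a node, further bandwidth-requesting jobs go elsewhere, while ordinary CPU jobs can still share the node.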

[slurm-users] Re: How can I make sure my user has only one job per node (Job array --exclusive=user,)

2024-12-03 Thread Oren via slurm-users
Thank you Michael, yeah, you guessed right: networking. My job is mostly I/O (networking) intensive; my nodes connect to the network via a non-blocking switch, but the ethernet cards are not the best. So I don't need many CPUs per node, but I do want to run on all nodes to fully utilize the network

[slurm-users] Re: slurmd on a warewulf node - not running

2024-12-03 Thread Steven Jones via slurm-users
Hi, does the slurm user need to be <1000 UID? Using IPA with a UID of: [root@vuwunicoslurmd1 slurm]# id slurm uid=126209577(slurm) gid=126209576(slurm) groups=126209576(slurm) regards Steven From: Ole Holm Nielsen via slurm-users Sent: Wednesday, 4 December

[slurm-users] Re: How can I make sure my user has only one job per node (Job array --exclusive=user,)

2024-12-03 Thread Renfro, Michael via slurm-users
I’ll start with the question of “why spread the jobs out more than required?” and move on to why the other items didn’t work: 1. exclusive only ensures that others’ jobs don’t run on a node with your jobs, and does nothing about other jobs you own. 2. spread-job distributes the work of on
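The flags being ruled out here can be summarized as a sketch (standard sbatch options; comments paraphrase Michael's points 1 and 2):

```shell
#SBATCH --exclusive        # whole node: blocks *every* other job, wasteful here
#SBATCH --exclusive=user   # blocks other users' jobs, but not your own array tasks
#SBATCH --spread-job       # spreads one job's tasks; doesn't separate array elements
```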

[slurm-users] How can I make sure my user has only one job per node (Job array --exclusive=user,)

2024-12-03 Thread Oren via slurm-users
Hi, I have a cluster of 20 nodes, and I want to run a job array on that cluster, but I want each node to get one job. When I do the following: #!/bin/bash #SBATCH --job-name=process_images_train # Job name #SBATCH --time=50:00:00 # Time limit hrs:min:sec #SBATCH --tasks=1
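The preview cuts the script off mid-directive; the kind of array script being described looks roughly like this (the array range and the payload command are assumptions, reconstructed from the stated 20-node goal):

```shell
#!/bin/bash
#SBATCH --job-name=process_images_train   # Job name
#SBATCH --time=50:00:00                   # Time limit hrs:min:sec
#SBATCH --tasks=1
#SBATCH --array=0-19                      # assumption: one element per node hoped for
srun ./process_images "$SLURM_ARRAY_TASK_ID"   # hypothetical payload
```

Note that nothing in a plain array like this forces one element per node; Slurm may pack several elements onto the same node, which is exactly the problem the rest of the thread discusses.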

[slurm-users] Re: slurmd on a warewulf node - not running

2024-12-03 Thread Ole Holm Nielsen via slurm-users
Hi Steven, On 03-12-2024 19:34, Steven Jones via slurm-users wrote: I have munge running on the controller and nodes fine, all tests passed. I have slurmctld running on the controller ok after checking the logs. /var/spool/slurmctld was not created, which I assume should have happened via the

[slurm-users] slurmd on a warewulf node - not running

2024-12-03 Thread Steven Jones via slurm-users
Hi, I have munge running on the controller and nodes fine; all tests passed. I have slurmctld running on the controller ok after checking the logs. /var/spool/slurmctld was not created, which I assume should have happened via the rpm install? Anyway I can't get slurmd to run on the warewulf node
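If the spool directory wasn't created by the package, creating it by hand is a common fix. A sketch (the controller path is from the message; the slurmd spool path and the ownership choices are assumptions — slurmctld runs as the slurm user, while slurmd typically runs as root):

```shell
# On the controller (path from the message; ownership is an assumption)
mkdir -p /var/spool/slurmctld
chown slurm:slurm /var/spool/slurmctld
chmod 755 /var/spool/slurmctld

# On each compute node (hypothetical default path)
mkdir -p /var/spool/slurmd
```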

[slurm-users] multiple conf-server entries for sackd

2024-12-03 Thread Brian Andrus via slurm-users
Not sure anyone would know, but... If you are running slurm in HA mode (multiple SlurmctldHost entries) is it possible to point sackd to more than one using the --conf-server option? So either specify --conf-server more than once, or have a comma-delimited list of them? The docs are a little
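For concreteness, the two forms the question is asking about look like this (hostnames are placeholders; the thread does not confirm that either form is accepted):

```shell
sackd --conf-server ctl1:6817 --conf-server ctl2:6817   # repeated option?
sackd --conf-server ctl1:6817,ctl2:6817                 # comma-delimited list?
```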