Hello,
I'm new to Slurm (coming from PBS), and so I will likely have a few
questions over the next several weeks, as I work to transition my
infrastructure from PBS to Slurm.
My first question has to do with *adding nodes to Slurm*. According to the
FAQ (and other articles I've read), you need t
I'm working on populating slurm.conf on my nodes, and I noticed that slurmd
-C doesn't always agree with lscpu, and I'm not sure why. Here is what lscpu
reports:
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
And here is what slurmd -C is reporting:
NodeName=devops2
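For reference, on hardware matching the lscpu output above, slurmd -C would
typically print a single line along these lines (RealMemory value hypothetical):
NodeName=devops2 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=2 RealMemory=16000
Here CPUs is Sockets x CoresPerSocket x ThreadsPerCore, i.e. 1 x 2 x 2 = 4 for
the node above.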
Hello all. My team is enabling Slurm (version 20.11.5) in our environment,
and we have a controller up and running, along with 2 nodes. Everything was
working fine. However, when we tried to enable configless mode, I ran into a
problem. The node that has a GPU is coming up in "drained" state, and
s
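For context, a minimal configless setup looks roughly like this (controller
hostname hypothetical); in this mode the controller also serves gres.conf, so
the GPU node's gres definition has to live on the controller side:
# slurm.conf on the controller
SlurmctldParameters=enable_configless
# on each compute node, instead of shipping a local slurm.conf
slurmd --conf-server ctld-host:6817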
We are transitioning from PBS to Slurm. In PBS, we use the following
syntax to add or remove properties on a node:
qmgr -c "set node <nodename> properties += <property>"
qmgr -c "set node <nodename> properties -= <property>"
Is there a similar way to do this for Slurm? Or is it expected that the
administrator will manually edit slurm.co
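The closest Slurm analogue to PBS node properties is node features; a hedged
sketch (node and feature names hypothetical):
# runtime change, not persistent across an slurmctld restart
scontrol update NodeName=node01 AvailableFeatures=ssd,bigmem ActiveFeatures=ssd,bigmem
# persistent form: edit the node's feature list in slurm.conf, then
scontrol reconfigure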
Hello,
I just added a 3rd node to my Slurm partition (called "hsw5"), as we
continue to enable Slurm in our environment. But the new node is not
accepting jobs that require a GPU, despite the fact that it has 3 GPUs.
The other node that has a GPU ("devops3") is accepting GPU jobs as
expected. A
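A hedged checklist for this kind of problem (node name and device paths
hypothetical): check what the controller thinks the node has, and make sure
slurm.conf and gres.conf agree with each other:
scontrol show node newnode | grep -iE 'gres|reason'
# slurm.conf:  NodeName=newnode Gres=gpu:3 ...
# gres.conf on the node:  NodeName=newnode Name=gpu File=/dev/nvidia[0-2]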
Hello,
I am investigating Slurm's ability to requeue jobs. I like the fact that I
can set RequeueExit=<exit codes> in the slurm.conf file, since this will
automatically requeue jobs that exit with the specified exit codes. But is
there a way to limit the number of requeues?
Thanks
David
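I am not aware of a built-in cap on RequeueExit requeues; one hedged
workaround is to have the batch script check SLURM_RESTART_COUNT (which Slurm
sets on requeued jobs) and stop retriggering once a limit is reached:
# near the top of the batch script; the limit of 3 is arbitrary
if [ "${SLURM_RESTART_COUNT:-0}" -ge 3 ]; then
    echo "requeue limit reached, giving up" >&2
    exit 0    # exit with a code that is NOT listed in RequeueExit
fi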
If I execute a bunch of sbatch commands, can I use sacct (or something
else) to show me the original sbatch command line for a given job ID?
Thanks
David
Hello,
I just noticed today that when I run "sinfo --states=idle", I get all the
idle nodes, plus an additional node that is in the "DRAIN" state (notice
how xavier6 is showing up below, even though it's not in the idle state):
(! 807)-> sinfo --states=idle
PARTITION AVAIL TIMELIMIT NODES STATE
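A way to see what is going on here: the node is most likely in a combined
state such as IDLE+DRAIN, which scontrol reports in full, and drained nodes
can also be listed on their own:
scontrol show node xavier6 | grep -i state
sinfo --states=drain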
All,
When my team used PBS, we had several nodes that had a TON of CPUs, so
many, in fact, that we ended up setting np to a smaller value, in order to
not starve the system of memory.
What is the best way to do this with Slurm? I tried modifying # of CPUs in
the slurm.conf file, but I noticed th
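One hedged alternative to understating the CPU count is to reserve cores and
memory for the OS with the specialized-resource options on the NodeName line
in slurm.conf (names and values hypothetical):
NodeName=bignode[01-04] CPUs=256 RealMemory=512000 CoreSpecCount=8 MemSpecLimit=16384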
Hello,
A few weeks ago, we tested Slurm against about 50K jobs, and observed at
least one instance where a node went idle, while there were jobs on the
queue that could have run on the idle node. The best guess as to why this
occurred, at this point, is that the default_queue_depth was set to the
Assuming -N is 1 (meaning, this job needs only one node), then is there a
difference between any of these 3 flag combinations:
-n 64 (leaving cpus-per-task to be the default of 1)
--cpus-per-task 64 (leaving -n to be the default of 1)
--cpus-per-task 32 -n 2
As far as I can tell, there is no fun
is significant.
>
> > On Mar 24, 2022, at 12:32 PM, David Henkemeyer <
> david.henkeme...@gmail.com> wrote:
> >
> > Assuming -N is 1 (meaning, this job needs only one node), then is there
> a difference between any of these 3 flag combinations:
> >
> > -n
; will likely bite you in the end. E.g., the 64 thread case should do
> "--cpus-per-task 64", and the launching processes in the loop should
> _probably_ do "-n 64" (assuming it can handle the tasks being assigned to
> different nodes).
>
> On Thu, Mar 24, 2022 at 3:
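To make the distinction concrete, a hedged sketch (program names hypothetical):
# one task that may use 64 CPUs on a single node (one multithreaded process)
sbatch -N 1 --cpus-per-task=64 --wrap='./threaded_app'
# 64 one-CPU tasks; an srun inside the job would launch 64 processes
sbatch -N 1 -n 64 --wrap='srun ./per_task_app'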
We noticed that we can pass --cpu_bind on an srun command line, but not to
sbatch. Why is that?
Thanks
David
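As far as I understand it, CPU binding is applied when tasks are launched,
and sbatch never launches tasks itself; the usual pattern is to pass the
option to the srun inside the batch script (program name hypothetical):
#!/bin/bash
#SBATCH -n 16
srun --cpu-bind=cores ./my_app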
If I have a large number of heterogeneously named nodes in my cluster, and
several partitions that include the same large subset of those nodes, I
would love to be able to define an env var, and reference that in each
partition specification. For instance, say we have the following:
PartitionName
> ...included as part of this nodeset.
> Nodes
>     List of nodes in this set.
> NodeSet
>     Unique name for a set of nodes. Must not overlap with any NodeName
>     definitions.
>
> Brian Andrus
>
>
> On 4/4/2022 1:08 PM, David Henkemeyer wrote:
>
> If I have a large nu
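For the archives, a hedged slurm.conf sketch of the NodeSet approach Brian
describes (all names hypothetical):
NodeSet=compute Nodes=hsw[1-40],icx[01-60],gpu[01-08]
PartitionName=short Nodes=compute MaxTime=01:00:00 State=UP
PartitionName=long Nodes=compute MaxTime=7-00:00:00 State=UP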
All,
I want to improve our daily Slurm job reports. Can anyone point me to
some good examples? Currently we are reporting on several things, such as
# of jobs that failed to schedule, # of jobs that failed during execution,
node utilization, etc, but the report itself is pretty basic and not
I have found that the "reason" field doesn't get updated after you correct
the issue. For me, it's only when I move the node back to the idle state
that the reason field is reset. So, assuming /dev/nvidia[0-3] is
correct (I've never seen otherwise with nvidia GPUs), then try taking them
back
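For completeness, returning a drained node to service (which also clears the
reason field) is typically done with:
scontrol update NodeName=<nodename> State=RESUME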
I am seeing what I think might be a bug with sacct. When I do the
following:
> sbatch --export=NONE --wrap='uname -a' --exclusive
Submitted batch job 2869585
Then, I ask sacct for the SubmitLine, as such:
> sacct -j 2869586 -o "SubmitLine%-70"
SubmitLine
----------------------------------------------------------------------
sbatch --export=NONE --wrap=uname -a --exclusive
So, it's storing properly; now I need to see if I can figure out how to
preserve/add the quotes on the way out of the DB...
David
On Wed, May 4, 2022 at 11:15 AM Michael Jennings wrote:
> On Wednesday, 04 May 2022, at 10:00:57 (-0700),
> Davi
The Epilog is a feature whereby I can run something after a single job
finishes. Is there a best practice for running a job after a set of jobs?
We submit a bunch of jobs to a bunch of nodes, and after all the jobs are
done, we would like to submit a "utility job" on each node, but it has to
be the last job
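One hedged way to do this with job dependencies (script and node names
hypothetical): collect the job IDs with --parsable and make the utility job
wait on all of them, pinning it to the node with -w:
jid1=$(sbatch --parsable -w node01 work1.sh)
jid2=$(sbatch --parsable -w node01 work2.sh)
sbatch -w node01 --dependency=afterany:${jid1}:${jid2} utility.sh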
Question for the braintrust:
I have 3 partitions:
- Partition A_highpri: 80 nodes
- Partition A_lowpri: same 80 nodes
- Partition B_lowpri: 10 different nodes
There is no overlap between A and B partitions.
Here is what I'm observing. If I fill the queue with ~20-30k jobs for
partiti
5000 jobs being considered, the
> remaining aren't even looked at.
>
> Brian Andrus
> On 5/12/2022 7:34 AM, David Henkemeyer wrote:
>
> Question for the braintrust:
>
> I have 3 partitions:
>
>- Partition A_highpri: 80 nodes
>- Partition A_lowpri: same 80 n
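If the job-depth limit Brian mentions is the culprit, the relevant knobs are
in SchedulerParameters in slurm.conf; a hedged example with arbitrary values:
SchedulerParameters=default_queue_depth=10000,bf_max_job_test=5000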
I would like to remove the restriction that users must be at least operator
level to do "scontrol create reservation". So, either I could:
- Change the default AdminLevel to operator. Is that possible?
- Remove the restriction that a user has to be operator to create a
reservation. Is
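I don't believe there is a way to change the default AdminLevel globally, but
granting Operator to specific users through the accounting database is
straightforward (username hypothetical):
sacctmgr modify user name=alice set AdminLevel=Operator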