Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)

2024-01-18 Thread Baer, Troy
Hi Hafedh, Your job script has the sbatch directive “--gpus-per-node=4” set. I suspect that if you look at what’s allocated to the running job by doing “scontrol show job <jobid>” and looking at the TRES field, it’s been allocated 4 GPUs instead of one. Regards, --Troy From: slurm-us
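
The check suggested above can be scripted. The following is a minimal sketch, not anything from the original thread: it assumes the "scontrol show job" text contains a field of the form TRES=... (or AllocTRES=...) with a "gres/gpu=N" entry. The function name allocated_gpus and the sample text are hypothetical and for illustration only.

```python
import re

def allocated_gpus(scontrol_text):
    """Return the GPU count from the TRES field of `scontrol show job` text.

    Assumption: the field looks like TRES=cpu=8,...,gres/gpu=4 (or AllocTRES=...).
    Returns 0 if no such field or no gres/gpu entry is found.
    """
    # Match TRES= or AllocTRES=, but not ReqTRES= (the request, not the allocation).
    m = re.search(r"(?<![A-Za-z])(?:Alloc)?TRES=(\S+)", scontrol_text)
    if m is None:
        return 0
    for item in m.group(1).split(","):
        key, _, value = item.partition("=")
        if key == "gres/gpu":
            return int(value)
    return 0

# Hypothetical sample text, not real output from any cluster.
sample = "   JobId=12345 JobName=test\n   TRES=cpu=8,mem=32G,node=1,billing=8,gres/gpu=4\n"
print(allocated_gpus(sample))
```

If this reports 4 for a job that was meant to use one GPU, each instance of the script is holding the whole node's GPUs, which would explain why further instances cannot start alongside it.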

Re: [slurm-users] Use all cores when submitting to heterogeneous nodes

2022-03-22 Thread Baer, Troy
Requesting --exclusive and then using $SLURM_CPUS_ON_NODE to determine the number of tasks or threads to use inside the job script would be my recommendation. --Troy -Original Message- From: slurm-users On Behalf Of Tina Friedrich Sent: Tuesday, March 22, 2022 10:43 AM To:
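
As a rough illustration of that recommendation, a program launched from a job submitted with --exclusive could size its worker pool from SLURM_CPUS_ON_NODE. This is only a sketch under assumptions: the helper worker_count, the fallback of 1 outside Slurm, and the toy square task are hypothetical and not from the original message.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def worker_count(env=None, default=1):
    """Return the CPU count Slurm reports for this node, or `default` if unset or invalid."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_ON_NODE", "")
    try:
        n = int(value)
    except ValueError:
        return default
    return n if n > 0 else default

def square(x):
    # Placeholder workload; replace with the real per-task function.
    return x * x

if __name__ == "__main__":
    # On a heterogeneous cluster this adapts to whichever node the job landed on.
    n = worker_count()
    with ProcessPoolExecutor(max_workers=n) as pool:
        print(list(pool.map(square, range(8))))
```

Because the value is read at run time, the same job script can use all cores on nodes with different core counts without hard-coding a number.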

Re: [slurm-users] Database Compression

2021-12-02 Thread Baer, Troy
My site has just updated to Slurm 21.08 and we are looking at moving to the built-in job script capture capability, so I'm curious about this as well. --Troy -Original Message- From: slurm-users On Behalf Of Paul Edmon Sent: Thursday, December 2, 2021 10:30 AM To: slurm-users@l

Re: [slurm-users] Testing Lua job submit plugins

2021-05-06 Thread Baer, Troy
We have developed a set of unit tests based on LuaUnit for our clusters' submit filters. --Troy From: slurm-users On Behalf Of Michael Robbert Sent: Thursday, May 6, 2021 1:11 PM To: Slurm User Community List Subject: [slurm-users] Testing Lua job submit plugins I'm wondering

Re: [slurm-users] sprio not working

2020-10-27 Thread Baer, Troy
What version of Slurm are you running? I had a problem like this in the initial 20.02 release that was fixed in 20.02.1. --Troy From: slurm-users on behalf of Erik Bryer Reply-To: Slurm User Community List Date: Tuesday, October 27, 2020 at 8:30 PM To: "slurm-users@lists.sch

Re: [slurm-users] SLURM reservations with MAGNETIC flag

2020-09-25 Thread Baer, Troy
I've been looking at it for classroom type reservations, but I ran into a bug where jobs that weren't eligible to access the reservation were being attracted to it anyway. That's supposed to be fixed in 20.02.6. See https://bugs.schedmd.com/show_bug.cgi?id=9593 for details. --Troy O

Re: [slurm-users] know time limit from inside job

2020-07-27 Thread Baer, Troy
There's an outstanding feature request for that: https://bugs.schedmd.com/show_bug.cgi?id=8383 While waiting on that, we've taken to injecting it into the job's environment ourselves in the Lua submit filter. --Troy On 7/27/20, 12:45 PM, "slurm-users on behalf of Brian Andrus" wrote

Re: [slurm-users] one job at a time - how to set?

2020-04-29 Thread Baer, Troy
I don’t think there’s a way to do that in Slurm using just the node declaration, other than the previously mentioned way of configuring it to show up as having only 1 core. However, you could put the node in a partition that has OverSubscribe=EXCLUSIVE set, and have that partition be the only w