[slurm-users] Re: [INTERNET] Re: question on sbatch --prefer

2024-02-09 Thread Alan Stange via slurm-users
Chip, Thank you for your prompt response. We could do that, but the helper is optional, and at times might involve additional helpers depending on the inputs to the problem being solved, and we don't know a priori how many helpers might be needed. Alan On 2/9/24 10:59, Chip Seraphine
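
A minimal sketch of one way to cover a variable helper count inside a single allocation, assuming the allocation is sized for the worst case; the need_helper check and the binary names are hypothetical:

    #!/bin/bash
    #SBATCH -N 1 --exclusive           # reserve the whole node up front
    srun -n1 --overlap ./solver input &
    while need_helper; do              # hypothetical readiness check
        srun -n1 --overlap ./helper &  # each helper runs as its own job step
    done
    wait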

[slurm-users] Re: Compilation question

2024-02-09 Thread Davide DelVento via slurm-users
Hi Sylvain, In the spirit of better late than never: is this still a problem? If so, is this a new install or an update? What environment/compiler are you using? The error undefined reference to `__nv_init_env' seems to indicate that you are doing something CUDA-related which I think you should not
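
If the Slurm build accidentally picked up the NVIDIA HPC compiler (whose runtime provides `__nv_init_env'), a minimal sketch of forcing a plain GCC toolchain instead (the install prefix is an assumption):

    export CC=gcc CXX=g++
    ./configure --prefix=/opt/slurm
    make -j && make install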

[slurm-users] Re: Memory used per node

2024-02-09 Thread Davide DelVento via slurm-users
If you would like the high-watermark memory utilization after the job completes, https://github.com/NCAR/peak_memusage is a great tool. Of course it has the limitation that you need to know that you want that information *before* starting the job, which might or might not be a problem for your use case
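
A sketch of the wrapper-style invocation, assuming the tool installs an executable named peak_memusage as in the project README (the solver name is a placeholder):

    srun peak_memusage ./solver inputfile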

[slurm-users] Memory used per node

2024-02-09 Thread Gerhard Strangar via slurm-users
Hello, I'm wondering if there's a way to tell how much memory my job is using per node. I'm doing

    #SBATCH -n 256
    srun solver inputfile

When I run sacct -o maxvmsize, the result apparently is the maximum VSZ of the largest solver process, not the maximum of the sum of them all (unlike when calling
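
One thing worth trying (a sketch with a made-up job id, assuming job accounting gather is enabled): sacct's TRESUsageInTot field reports totals summed across all tasks in a step, whereas MaxRSS/MaxVMSize report only the single largest task:

    sacct -j 12345 --format=JobID,MaxRSS,MaxVMSize,TRESUsageInTot%80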

[slurm-users] Re: Could not find group with gid even when they exist

2024-02-09 Thread Nic Lewis via slurm-users
Managed to narrow it down a little bit. Our groups file is pretty large, and we have a handful of individual groups that are also quite large, as shown below:

    [root@batch1 ~]# wc /etc/group
    6075 6075 349457 /etc/group
    [root@batch1 ~]# grep 8xxx2 /etc/group | wc -c
    56959

It looks like one of the
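
A quick cross-check (a sketch reusing the masked gid from above): compare what NSS resolves against the flat file, since a ~57 KB group line can overflow the fixed getgr* buffers used in some resolver paths:

    getent group 8xxx2 | wc -c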

[slurm-users] Relative QOS limits in sacctmgr?

2024-02-09 Thread Chip Seraphine via slurm-users
Hello, TL;DR: How does the relative QOS flag work? I have a QOS and I want it to be collectively restricted to 50% of the reachable cores in the cluster. I’ve been managing this by dividing my core count by 2 to get N, and doing ‘sacctmgr update qos foobar set MaxTRES=cpu=N’. That’s fine,
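
A sketch automating that workaround, counting each node once even if it appears in several partitions (the qos name is taken from the message above):

    total=$(sinfo -h -N -o "%N %c" | sort -u | awk '{s+=$2} END {print s}')
    sacctmgr -i update qos foobar set MaxTRES=cpu=$((total / 2))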

[slurm-users] Re: question on sbatch --prefer

2024-02-09 Thread Chip Seraphine via slurm-users
Normally I'd address this by having an sbatch script allocate enough resources for both jobs (specifying one node), and then kick off the helper as a separate step (assuming I am understanding your issue correctly). On 2/9/24, 9:57 AM, "Alan Stange via slurm-users"
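
A minimal sketch of that approach (binary names are placeholders): request tasks for both processes on one node, then launch the helper as its own job step:

    #!/bin/bash
    #SBATCH -N 1 -n 2           # one node, two tasks
    srun -n1 ./helper &         # helper as a separate job step
    srun -n1 ./solver inputfile
    wait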

[slurm-users] question on sbatch --prefer

2024-02-09 Thread Alan Stange via slurm-users
Hello all, I'm somewhat new to Slurm, but a long-time user of other batch systems. Assume we have a simple cluster of uniform racks of systems with no special resources, and our jobs are all single-CPU tasks. Let's say I have a long-running job in the cluster, which needs to spawn a helper process
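
For reference, a sketch of the option in the subject line, assuming nodes carry a feature tag such as bigmem (--prefer is a soft constraint: matching nodes are preferred, but the job still runs elsewhere if none are free):

    sbatch --prefer=bigmem -n 1 job.sh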

[slurm-users] Re: [External] Is there a way to list allocated/unallocated resources defined in a QoS?

2024-02-09 Thread Pacey, Mike via slurm-users
Hi Alistair, I was holding off replying in the hope someone would have a good answer. In lieu of that, here’s my partial answer: When I looked at trying to report per-user and per-group qos values a few months ago, I discovered that SLURM reports the information via this command: scontrol -o show a
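
A sketch of pulling QOS records out of the association manager cache, assuming the command being quoted is scontrol's assoc_mgr view (the qos name is a placeholder):

    scontrol -o show assoc_mgr flags=qos qos=foobar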