[slurm-users] sshare is crashing

2020-08-11 Thread Richard Lefebvre
Hi, The command "sshare -l" is crashing. I isolated the problem to a single account. The cause seems to be an extremely large LevelFS, on the order of 4.8x10e16. I can see the value if I add the "-p" option. Is there a way to fix the account? Below are the results of the 2

[slurm-users] Questions about sacctmgr load filename

2020-12-16 Thread Richard Lefebvre
Hi, I would like to do the equivalent of: sacctmgr -i add user namef account=grpa sacctmgr -i add user nameg account=grpa ... sacctmgr -i add user namez account=grpa but with an "sacctmgr -i load filename" in which filename contains the account grpa with its list of users. The documentation mentions the "lo

Re: [slurm-users] Parallel sbatch

2021-11-05 Thread Richard Lefebvre
I would suggest using GNU Parallel (https://www.gnu.org/software/parallel/). Also, if you run that many "srun" in a row on a very large cluster where slurmctld is heavily loaded, some of the srun calls might time out and not run. Richard On Fri, Nov 5, 2021 at 05:45, Marcus Pedersén wrote: > Hi

[slurm-users] Billing/accounting for MIGs is not working

2023-06-06 Thread Richard Lefebvre
We have MIG defined and in use, but the billing for which MIG is used doesn't seem to work. In the partition definitions in slurm.conf I have something like the following for TRESBillingWeights: TRESBillingWeights=CPU=1,Mem=1G,GRES/gpu:3g.20gb=0.375,GRES/gpu:4g.20gb=0.5,GRES/gpu=1.0 Yet, when I do sacct -j I d

[slurm-users] MIG H100 with xeon Intel

2025-06-12 Thread Richard Lefebvre via slurm-users
I'm having problems with Autodetect=nvml in gres.conf. I get the following in the controller log: error: _check_core_range_matches_sock: gres/gpu GRES autodetected core affinity 16-31 on node node001 doesn't match socket boundaries. (Socket 0 is cores 0-31). Consider setting SlurmdParameters=l3ca