Re: [slurm-users] Why does Slurm kill one particular user's jobs after a few seconds?

2021-04-15 Thread Thomas Arildsen
Hi Ole, Thanks for the suggestion. I am afraid the solution is not the same. At least, restarting `slurmdbd` and `slurmctld` on the head node has made no difference either. It puzzles me why Slurm appears to treat this one user differently from all others. Even other users under the same account

Re: [slurm-users] AutoDetect=nvml throwing an error message

2021-04-15 Thread Cristóbal Navarro
Hi Michael, Thanks. Indeed, I don't have it. Slurm must not have detected it. I double-checked and NVML is installed (libnvidia-ml-dev for Ubuntu). Here is some output, including the relevant paths for NVML. Is it possible to tell the Slurm compilation to check these paths for NVML? Best, *NVML PKG

[slurm-users] NHC and slurm

2021-04-15 Thread Heitor
Hello, I'm trying to set up NHC[0] for our Slurm cluster, but I can't get it to work properly. I'm using the dev branch from [0] and compiled it this way: $ ./autogen.sh --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib $ make test $ sudo make install When I run nhc, I get an error that

Re: [slurm-users] AutoDetect=nvml throwing an error message

2021-04-15 Thread Michael Di Domenico
the error message sounds like when you built the slurm source it wasn't able to find the nvml devel packages. if you look where you installed slurm, in lib/slurm you should have a gpu_nvml.so. do you? On Wed, Apr 14, 2021 at 5:53 PM Cristóbal Navarro wrote: > > typing error, should be --> **

Re: [slurm-users] GRES Restrictions

2021-04-15 Thread Stefan Staeglich
Hello, is there a best practice for activating this feature (setting ConstrainDevices=yes)? Do I have to restart the slurmds? Does this affect running jobs? We are using Slurm 19.05. Best, Stefan On Tuesday, 25 August 2020, 17:24:41 CEST, Christoph Brüning wrote: > Hello, > > we're using cgroup

Re: [slurm-users] Why does Slurm kill one particular user's jobs after a few seconds?

2021-04-15 Thread Ole Holm Nielsen
Hi Thomas, I wonder if your problem is related to that reported in this list thread? https://lists.schedmd.com/pipermail/slurm-users/2021-April/007107.html You could try to restart the slurmctld service, and also make sure your configuration (slurm.conf etc.) has been pushed correctly to the sl