Re: [slurm-users] Unable to delete account

2023-03-06 Thread Kilian Cavalotti
Hi Simon,

On Mon, Mar 6, 2023 at 1:34 PM Simon Gao wrote:
> We are experiencing an issue with deleting any Slurm account.
>
> When running a command like: sacctmgr delete account <account>,
> the following errors are returned and the command fails.
>
> # sacctmgr delete account <account>
> Database is busy or waiting for lock from other user. …

[slurm-users] Unable to delete account

2023-03-06 Thread Simon Gao
Hi,

We are experiencing an issue with deleting any Slurm account.

When running a command like: sacctmgr delete account <account>, the following errors are returned and the command fails:

# sacctmgr delete account <account>
Database is busy or waiting for lock from other user.
sacctmgr: error: Getting response to m…
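For anyone reproducing this, a minimal sketch of the failing operation plus one common diagnostic, assuming a MySQL/MariaDB-backed slurmdbd, the default slurm_acct_db database, and a hypothetical account name "myaccount" (the post elides the real one):

    # Hypothetical account name; the post elides the actual one.
    sacctmgr delete account myaccount

    # "Database is busy or waiting for lock" usually points at lock
    # contention inside the accounting database. Listing active sessions
    # can show a blocking transaction:
    mysql -u slurm -p -e "SHOW FULL PROCESSLIST;" slurm_acct_db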

Re: [slurm-users] Cleanup of job_container/tmpfs

2023-03-06 Thread Michael Jennings
On Monday, 06 March 2023, at 10:15:22 (+0100), Niels Carl W. Hansen wrote:
> Seems there still are some issues with the autofs - job_container/tmpfs
> functionality in Slurm 23.02. If the required directories aren't mounted
> on the allocated node(s) before job start, we get:
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory …

Re: [slurm-users] Cleanup of job_container/tmpfs

2023-03-06 Thread Brian Andrus
That looks like the user's home directory doesn't exist on the node. If you are not using a shared home for the nodes, your onboarding process should be reviewed to ensure it can handle any issues that may arise. If you are using a shared home, you should do the above and have the node ensure …
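A node-side guard of the kind Brian describes can be scripted as a Prolog. A minimal sketch, assuming Prolog= points at this script in slurm.conf and that the home directory is automounted; the path handling is an assumption, not from the thread:

    #!/bin/bash
    # Prolog sketch: look up the job owner's home directory and access it,
    # which triggers an autofs mount if one is configured.
    HOMEDIR=$(getent passwd "$SLURM_JOB_UID" | cut -d: -f6)
    if ! ls "$HOMEDIR" >/dev/null 2>&1; then
        echo "Prolog: home directory $HOMEDIR not available" >&2
        exit 1   # a non-zero Prolog exit drains the node instead of
                 # letting the job fail with a chdir error
    fi
    exit 0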

[slurm-users] Partition Hold/Release

2023-03-06 Thread Nicolas Sonoda
Hi! Can I create partitions that hold and release their jobs depending on what is submitted to another partition? For example, partitions one and two would hold their jobs when a job is submitted to partition three, and once that job completes, partitions one and two would release their jobs again.
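Slurm has no per-partition hold/release trigger of exactly this form, but partition-based preemption comes close: with PreemptType=preempt/partition_prio, jobs in a higher-PriorityTier partition suspend jobs in lower tiers, and Slurm resumes them once the preemptor finishes. A minimal sketch; partition names and node lists below are placeholders, not from this thread:

    # slurm.conf sketch; names and node lists are placeholders.
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG
    PartitionName=one   Nodes=n[01-10] PriorityTier=1  PreemptMode=SUSPEND
    PartitionName=two   Nodes=n[01-10] PriorityTier=1  PreemptMode=SUSPEND
    PartitionName=three Nodes=n[01-10] PriorityTier=10 PreemptMode=OFF

The hold/release behaviour as literally described could also be scripted externally:

    # Hold every pending job in partitions one and two; release them later.
    squeue -h -t PD -p one,two -o "%i" | xargs -r -n1 scontrol hold
    squeue -h -t PD -p one,two -o "%i" | xargs -r -n1 scontrol release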

[slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-06 Thread Stefan Staeglich
Hi, we have been using Slurm's suspend/resume support for half a year now. It works quite well, but sometimes it breaks and no nodes are suspended or resumed anymore. In this case we see the following message in the log:

error: power_save module disabled, NULL SuspendProgram

A restart of slurmctld …
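The "NULL SuspendProgram" error means the running slurmctld has no SuspendProgram defined. For reference, a minimal power-save sketch for slurm.conf; the script paths, timings, and node name below are assumptions, not taken from this thread:

    # slurm.conf power-save sketch; paths and values are placeholders.
    SuspendProgram=/usr/local/sbin/node_suspend.sh   # powers a node down
    ResumeProgram=/usr/local/sbin/node_resume.sh     # powers a node back up
    SuspendTime=600          # seconds of idle before suspending a node
    SuspendTimeout=120       # seconds allowed for SuspendProgram to act
    ResumeTimeout=300        # seconds allowed for a node to boot and respond
    SuspendExcNodes=login01  # nodes that must never be suspended

Comparing scontrol show config | grep -i Suspend against the file shows what the running daemon actually loaded; a mismatch would be consistent with a slurmctld restart temporarily fixing the problem.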

Re: [slurm-users] Cleanup of job_container/tmpfs

2023-03-06 Thread Niels Carl W. Hansen
Hi all,

Seems there still are some issues with the autofs - job_container/tmpfs functionality in Slurm 23.02. If the required directories aren't mounted on the allocated node(s) before job start, we get:

slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to …
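For anyone comparing setups, a minimal job_container/tmpfs configuration of the kind this plugin reads; the BasePath below is an assumption, not from the thread:

    # job_container.conf sketch; BasePath is a placeholder.
    AutoBasePath=true
    BasePath=/var/tmp/slurm-containers

    # slurm.conf must also select the plugin:
    JobContainerType=job_container/tmpfs

The chdir failure reported above fits the interaction the thread describes: the per-job namespace is set up and the working directory entered before autofs has mounted the user's home on the node.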