[slurm-users] HOST CPUs not used by jobs

2024-10-21 Thread Bhaskar Chakraborty via slurm-users
I have a SLURM configuration of 2 hosts with 6 + 4 CPUs. I am submitting jobs with sbatch -n. However, I see that even when I have exhausted all 10 CPU slots for the running jobs, it is still allowing subsequent jobs to run! CPU slot availability is also shown as full for the 2 hosts. No job
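
For reference, CPU-level accounting of this kind is governed by the node definitions and the select plugin in slurm.conf. The fragment below is only a sketch of the relevant lines; the hostnames (node1, node2) and partition name are assumptions, not taken from the original post.

    # slurm.conf fragment (sketch): track and allocate individual CPUs
    SelectType=select/cons_tres
    SelectTypeParameters=CR_CPU

    # Node and partition definitions matching a 6 + 4 CPU layout
    NodeName=node1 CPUs=6 State=UNKNOWN
    NodeName=node2 CPUs=4 State=UNKNOWN
    PartitionName=main Nodes=node1,node2 Default=YES MaxTime=INFINITE State=UP

With that in place, repeatedly submitting e.g. sbatch -n 4 --wrap="sleep 300" should leave work pending once the 10 CPUs are allocated, and scontrol show node reports the per-node CPUAlloc/CPUTot counts.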

[slurm-users] Customization of (error) messages after the job submission

2024-10-21 Thread Sebastian Sitkiewicz via slurm-users
Dear SLURM Users and Administrators, I am interested in a way to customize the job submission exit statuses (mainly error codes) after the job has already been queued by the SLURM controller. We aim to provide more user-friendly messages and reminders in case of any errors or obstacles (also adj
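
One mechanism Slurm offers for returning site-specific messages to users is the job_submit Lua plugin (JobSubmitPlugins=lua in slurm.conf, with a job_submit.lua placed next to it). It runs in slurmctld at submit time, before the job is queued, so it covers the submit-time part of this request rather than post-queue handling. A minimal sketch; the time-limit check and message text are placeholders, not anything from the original post:

    -- job_submit.lua (sketch)
    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- job_desc.time_limit is in minutes; slurm.NO_VAL means "not set"
        if job_desc.time_limit == slurm.NO_VAL then
            slurm.log_user("No --time given; please request a time limit.")
            return slurm.ERROR    -- submission is rejected with this message
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
        return slurm.SUCCESS
    end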

[slurm-users] Re: Randomly draining nodes

2024-10-21 Thread laddaoui--- via slurm-users
You were right, I found that the slurm.conf file was different between the controller node and the computes, so I've synchronized it now. I was also considering setting up an epilogue script to help debug what happens after the job finishes. Do you happen to have any examples of what an epilogue
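
A minimal Epilog sketch along those lines is below: it just logs the job that finished and any processes its user still has on the node. The script path and log location are assumptions; it is enabled with Epilog=/etc/slurm/epilog.sh (or whatever path you choose) in slurm.conf, and it must exit 0, since a non-zero exit status drains the node.

    #!/bin/bash
    # Epilog sketch: runs on each compute node as slurmd cleans up a job.
    # SLURM_JOB_ID and SLURM_JOB_USER are provided in the Epilog environment.
    LOG=/var/log/slurm/epilog.log
    {
        echo "$(date '+%F %T') job=${SLURM_JOB_ID} user=${SLURM_JOB_USER} node=$(hostname -s)"
        # Any processes still owned by the job's user (possible stragglers)
        ps -u "${SLURM_JOB_USER}" -o pid,stat,etime,comm --no-headers
    } >> "${LOG}" 2>&1
    exit 0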

[slurm-users] Re: HOST CPUs not used by jobs

2024-10-21 Thread jubhaskar--- via slurm-users
Apologies for the trouble. I just discovered that I had made some temporary tweaks in the code which were preventing the reservation of resources. These were meant to be reverted after testing, which I missed! This in turn allowed all the jobs to run. Please ignore the query. -Bhaskar. -- sl

[slurm-users] Re: Randomly draining nodes

2024-10-21 Thread Christopher Samuel via slurm-users
On 10/21/24 4:35 am, laddaoui--- via slurm-users wrote: It seems like there's an issue with the termination process on these nodes. Any thoughts on what could be causing this? That usually means processes wedged in the kernel for some reason, in an uninterruptible sleep state. You can define
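
For anyone hitting the same symptom: a quick way to confirm the uninterruptible-sleep diagnosis on a drained node is to list D-state processes and look at their kernel stacks. A small sketch, nothing Slurm-specific (reading /proc/<pid>/stack needs root):

    # List processes stuck in uninterruptible sleep ("D" state) and show
    # where in the kernel each one is blocked.
    for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
        echo "PID ${pid}: $(tr '\0' ' ' < /proc/${pid}/cmdline)"
        cat /proc/${pid}/stack 2>/dev/null
    done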