[slurm-users] CPU utilisation using two commands scontrol association and sreport makes a huge difference

2024-02-19 Thread prachikakade.lit--- via slurm-users
Dear Team, I created a small three-node cluster on VMware to work on the CPU utilization concept. I created a user named hpcuser01 and allocated GrpTRESMins=cpu=5940 (CPU minutes) and gpu=0. Now, when I checked his utilization using the scontrol association command # scontrol show ass user=hp

[slurm-users] Slurm Power Saving Guide: Why doesn't slurm mark as failed when resumeProgram returns =/= 0

2024-02-19 Thread Xaver Stiensmeier via slurm-users
Dear slurm-users list, I had cases where our resumeProgram failed due to temporary cloud timeouts. In that case the resumeProgram returns a nonzero value. Why does Slurm still wait until resumeTimeout instead of just accepting the startup as failed, which should then lead to a rescheduling of the job

[slurm-users] slurmdbd 17.02: "cluster not registered" (but things work)

2024-02-19 Thread Matthias Leopold via slurm-users
Hi, I need to take care of a 17.02 Slurm cluster (I'm preparing it for upgrades). I see that slurmdbd logs various "cluster not registered" messages at startup (DBD_CLUSTER_TRES, DBD_JOB_START, DBD_STEP_START), but I don't see a real problem. Accounting works. Do I have to worry? Can this be re