Re: [slurm-users] Can sinfo/scontrol be called from job_submit.lua?

2022-10-11 Thread Thomas M. Payerle
Running scontrol/sinfo from within a job_submit.lua script seems to be opening a big can of worms --- it might be doable, but it would scare me. Since it sounds like you are only doing such for a fairly limited amount of information which presumably does not change frequently, perhaps it would be b

[slurm-users] Can sinfo/scontrol be called from job_submit.lua?

2022-10-11 Thread Groner, Rob
I am testing a method where, when a job gets submitted asking for specific features, then, if those features don't exist, I'll do something. The job_submit.lua plugin has worked to determine when a job is submitted asking for the specific features. I'm at the point of checking if those feature
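One way to picture the "do those features exist" check, outside of job_submit.lua, is the following minimal Python sketch. It assumes the list of available features has already been captured as text, for example from `sinfo -h -o "%f"`, and that nodes with no features show up as `(null)`. The function names and the sample data are hypothetical, and the splitting of the request string only approximates Slurm's constraint syntax.

```python
import re


def available_features(sinfo_output):
    """Collect feature names from text with one comma-separated feature list per line."""
    features = set()
    for line in sinfo_output.splitlines():
        for name in line.strip().split(","):
            # Assumption: nodes without features are printed as "(null)".
            if name and name != "(null)":
                features.add(name)
    return features


def missing_features(requested, available):
    """Return the requested feature names that are not in the available set.

    The request is split on the operator characters & | [ ] ( ) ! and ',',
    and any '*count' suffix is dropped. This is an approximation, not a
    full parser for Slurm constraint expressions.
    """
    names = re.split(r"[&|\[\]()!,]+", requested)
    wanted = {name.split("*", 1)[0].strip() for name in names}
    wanted.discard("")
    return sorted(wanted - available)


# Hypothetical sample data, not taken from the thread.
sample_sinfo_output = "avx2,gpu\navx2\n(null)\n"
known = available_features(sample_sinfo_output)
print(missing_features("gpu&a100", known))
```

Keeping the feature list as cached text rather than querying the controller on every submission is a design choice of this sketch, not something the thread prescribes.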

Re: [slurm-users] slurm_update error: Invalid node state specified

2022-10-11 Thread Sushil Mishra
Thanks so much! Indeed it was a mismatch between the actual and slurm.conf SocketsPerBoard value. Sushil On Tue, Oct 11, 2022 at 11:25 AM Paul H. Hargrove wrote: > I think Rob is "on the right track" here. Specifically, I don't think the > error message means that "RESUME" is unrecognized as t

Re: [slurm-users] slurm_update error: Invalid node state specified

2022-10-11 Thread Paul H. Hargrove
I think Rob is "on the right track" here. Specifically, I don't think the error message means that "RESUME" is unrecognized as the name of a state. Rather the message means that a state transition from "INVAL" to "RESUME" is invalid. I can reproduce that message by trying to "RESUME" an "IDLE" no

Re: [slurm-users] slurm_update error: Invalid node state specified

2022-10-11 Thread Groner, Rob
Have you checked the logs for slurmd and slurmctld? I seem to recall that the "invalid" state for a node meant that there was some discrepancy between what the node says or thinks it has (slurmd -C) and what the slurm.conf says it has. While there is that discrepancy and the node is invalid, y
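The discrepancy described above can be looked for by comparing the node line printed by `slurmd -C` with the matching `NodeName=` line in slurm.conf. The following is a minimal Python sketch of that comparison, assuming both lines have already been copied out as strings. The function names, the node name, and the hardware values are hypothetical, and the sketch does not expand node ranges or follow `Include` directives.

```python
import shlex


def parse_node_line(line):
    """Parse a 'Key=Value Key=Value ...' line into a dict with lower-cased keys."""
    fields = {}
    for token in shlex.split(line):
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key.lower()] = value
    return fields


def hardware_mismatches(reported_line, configured_line,
                        keys=("CPUs", "Boards", "SocketsPerBoard",
                              "CoresPerSocket", "ThreadsPerCore", "RealMemory")):
    """Return {key: (reported, configured)} for keys present in both lines that differ."""
    reported = parse_node_line(reported_line)
    configured = parse_node_line(configured_line)
    mismatches = {}
    for key in keys:
        k = key.lower()
        if k in reported and k in configured and reported[k] != configured[k]:
            mismatches[key] = (reported[k], configured[k])
    return mismatches


# Hypothetical example lines, not taken from the thread.
reported = ("NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=2 "
            "CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128000")
configured = ("NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=1 "
              "CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128000")
print(hardware_mismatches(reported, configured))
```

Values are compared as plain strings, so a configured RealMemory that is deliberately lower than the reported one would also be flagged; treat the result as a list of lines to inspect rather than a verdict.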

[slurm-users] slurm_update error: Invalid node state specified

2022-10-11 Thread Sushil Mishra
Dear all, I am stuck with scontrol not recognizing the state keywords. I wonder if someone can point me to the possible cause of the error. I restarted slurmd a few times, and it didn't help. [sushil@fucose ~]$ sinfo PARTITION AVAIL TIMELIMIT NODES STATE NODELIST LocalQ* up infinite

Re: [slurm-users] Accounting core-hours usages

2022-10-11 Thread Bjørn-Helge Mevik
Sushil Mishra writes: > Dear all, > > I am pretty new to system administration and looking for some help > setting up slurmdbd or MariaDB in a GPU cluster. We bought a machine but the vendor > simply installed slurm and did not install any database for accounting. I > tried installing MariaDB and then sl