All,
I have a cloud-based cluster running Slurm 19.05.0-1.
I removed one of the partitions, but now every time I start slurmctld I get
errors like these:
slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01
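As an aside, one way to avoid leaving saved job state pointing at a removed partition is to drain that partition of jobs before editing slurm.conf. A rough sketch with standard Slurm CLI tools (the partition name is taken from the error above; run these against a live cluster before the config change):

```shell
# List any jobs still referencing the partition that is about to be removed.
squeue --partition=mpi-h44rs --noheader

# Cancel whatever remains, so no recovered job state names the partition
# after slurmctld restarts with the new config.
scancel --partition=mpi-h44rs
```

These commands require a running Slurm cluster, so they are shown here only as a sketch of the cleanup order.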
If you check the source code (src/slurmctld/job_mgr.c), this error is indeed
thrown when slurmctld unpacks the saved job state files. Tracing through
read_slurm_conf() -> load_all_job_state() -> _load_job_state():
part_ptr = find_part_record(partition);
if (part_ptr == NULL) {
	...
On Jul 26, 2019, at 11:28 AM, Jeffrey Frey <f...@udel.edu> wrote:
> If you check the source code (src/slurmctld/job_mgr.c) this error is indeed
> thrown when slurmctld unpacks job state files.
On Thu, Jul 25, 2019 at 10:20 PM Benjamin Redling
<benjamin.ra...@uni-jena.de> wrote:
> If the 30s delay only shows up for jobs after the first full queue, then is
> it backfill in action?
>
>
I'm certain this is not backfill. I see the same behavior when I boot
the controller with all nodes in idle+
On 26/7/19 8:28 am, Jeffrey Frey wrote:
> If you check the source code (src/slurmctld/job_mgr.c) this error is
> indeed thrown when slurmctld unpacks job state files. Tracing through
> read_slurm_conf() -> load_all_job_state() -> _load_job_state():
I don't think that's the actual error that Brian i