[slurm-users] Errors after removing partition

2019-07-26 Thread Brian Andrus
All, I have a cloud-based cluster running Slurm 19.05.0-1. I removed one of the partitions, but now every time I start slurmctld I get errors like:
slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01 s

Re: [slurm-users] Errors after removing partition

2019-07-26 Thread Jeffrey Frey
If you check the source code (src/slurmctld/job_mgr.c), this error is indeed thrown when slurmctld unpacks job state files. Tracing through read_slurm_conf() -> load_all_job_state() -> _load_job_state():
part_ptr = find_part_record (partition);
if (part_ptr == NUL
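To make the failure mode concrete, here is a minimal, simplified sketch of the pattern Jeffrey describes: a saved job record names a partition, the lookup fails because the partition is gone from the config, and the error path fires. This is a hypothetical illustration (the `parts` table, `load_job_state`, and error strings are stand-ins), not Slurm's actual implementation in job_mgr.c.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Slurm's configured partition table. */
struct part_record { const char *name; };

static struct part_record parts[] = { {"debug"}, {"batch"} };

/* Linear lookup by partition name; NULL means the partition is no
 * longer in the configuration. */
static struct part_record *find_part_record(const char *name) {
    for (size_t i = 0; i < sizeof parts / sizeof parts[0]; i++)
        if (strcmp(parts[i].name, name) == 0)
            return &parts[i];
    return NULL;
}

/* Mirrors the shape of the _load_job_state() check: a job unpacked
 * from the state file that references a removed partition hits the
 * "Invalid partition" error path. Returns 0 on success, -1 on error. */
static int load_job_state(const char *partition, unsigned job_id) {
    struct part_record *part_ptr = find_part_record(partition);
    if (part_ptr == NULL) {
        fprintf(stderr, "error: Invalid partition (%s) for JobId=%u\n",
                partition, job_id);
        return -1;
    }
    return 0;
}
```

The point is that the error comes from stale saved state, not from the running configuration: the job record survives in the state file even after its partition is removed from slurm.conf.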

Re: [slurm-users] Errors after removing partition

2019-07-26 Thread Jodie H. Sprouse
fyi… Joe is there now staining front entrance & fixing a few minor touchups, nailing baseboard in basement… Lock box is on the house now w/ key in it…
On Jul 26, 2019, at 11:28 AM, Jeffrey Frey <f...@udel.edu> wrote:
> If you check the source code (src/slurmctld/job_mgr.c) this error is i

Re: [slurm-users] [Long] Why are tasks started on a 30 second clock?

2019-07-26 Thread Kirill Katsnelson
On Thu, Jul 25, 2019 at 10:20 PM Benjamin Redling <benjamin.ra...@uni-jena.de> wrote:
> If the 30s delay is only for jobs after the first full queue, then it is backfill in action?
I'm certain this is not the backfill. I see the same behavior when I boot the controller with all nodes in idle+
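The behavior being debated, jobs launching only on a fixed 30-second cadence, is the signature of a periodic scheduling loop rather than event-driven dispatch. A minimal sketch of that timer arithmetic (purely illustrative; `next_tick` is a hypothetical helper, not Slurm code):

```c
/* Given the time of the last scheduler pass and a fixed period,
 * compute when the next pass fires. Work submitted between passes
 * waits until the next tick -- which is why submissions appear to
 * start "on a 30 second clock" regardless of when they arrive. */
static long next_tick(long last, long now, long period) {
    if (now < last + period)
        return last + period;              /* inside the current window */
    long missed = (now - last) / period;   /* whole periods elapsed */
    return last + (missed + 1) * period;   /* next multiple after now */
}
```

With a 30 s period and a pass at t=0, a job submitted at t=10 is not considered until t=30; one submitted at t=65 waits until t=90.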

Re: [slurm-users] Errors after removing partition

2019-07-26 Thread Chris Samuel
On 26/7/19 8:28 am, Jeffrey Frey wrote:
> If you check the source code (src/slurmctld/job_mgr.c) this error is indeed thrown when slurmctld unpacks job state files. Tracing through read_slurm_conf() -> load_all_job_state() -> _load_job_state():
I don't think that's the actual error that Brian i