Hi all: We're running Slurm 20.11.8. I missed a chance at a recent outage to change our JobAcctGatherType from 'linux' to 'cgroup'. Our ProctrackType has been 'cgroup' for a long time. In short, I'm thinking it would be harmless for me to do this now, with jobs running, and below I discuss the caveats I know of. Have any of you made this change with jobs running, or do you see why in my case I should not?
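For concreteness, the whole change and rollout I have in mind is sketched below. The systemd unit names, the pdsh group name, and <jobid> are just placeholders for how we'd do it locally; adjust to taste.

    # In slurm.conf (the same file on the slurmctld host and every slurmd host),
    # the only accounting change is this one line:
    #
    #   JobAcctGatherType=jobacct_gather/linux  ->  JobAcctGatherType=jobacct_gather/cgroup
    #
    # ProctrackType=proctrack/cgroup stays as it already is.

    # Push the edited slurm.conf everywhere, then restart the daemons (my
    # understanding is that a plugin change needs a restart, not just
    # 'scontrol reconfigure'):
    systemctl restart slurmctld                    # on the controller
    pdsh -g compute 'systemctl restart slurmd'     # on all compute nodes

    # Afterwards, spot-check accounting for a job that was already running at
    # the switch; per bug 861 below, this is where sstat may come back empty or odd:
    sstat -a -j <jobid> --format=JobID,MaxRSS,AveRSS,AveCPU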
More info: I see the warnings in the docs about not changing JobAcctGatherType while jobs are running. Some of you have asked SchedMD about this before:

- In slurm-dev.schedmd.narkive.com/EbK7qgSg/adding-jobacctgather-plugin-causing-rpc-errors#post1 from 2013, Moe says "don't change this while jobs are running; I'll doc that." (Hence it being doc'd now.)
- https://bugs.schedmd.com/show_bug.cgi?id=861 from 2014 mentions that doing so would break 'sstat' for the already-running jobs.
- In https://bugs.schedmd.com/show_bug.cgi?id=2781 from 2016, SchedMD repeated the documented warning. In that case, the user reported job tasks completing while Slurm considered the jobs still running.

On a dev cluster, I started a job, then changed JobAcctGatherType from 'linux' to 'cgroup', then restarted slurmctld, then the slurmds. That job continued to run and was terminated at its time limit. This was replicable.

I also submitted a job with a known RAM-vs-time profile to several otherwise idle nodes. One node I left alone. The other four I switched from 'linux' to 'cgroup' at varied times during the jobs' lives. We have a Prometheus exporter which feeds a Grafana instance to graph the cgroup data. Looking at the 'memory' data across the nodes, one of them reported falsely high memory for the test job. Running the same job again without touching slurmd mid-job yielded identically correct graphs across the nodes.

Suppose I switch my cluster (slurmctld, all slurmds) at time T0. In principle a user might want to size her jobs, happen to look at the affected memory-related metric for a job which was running at T0, and get inaccurate info. Modulo that, we can afford to write off the historical memory-usage info for the jobs running at T0 as suspect (we could tolerate any seeming inaccuracies in fairshare arising from that info being inaccurate, and don't yet have e.g. a MaxTRESPerX limit with some RAM value).

With our 'cgroup' ProctrackType, and requiring a mem spec on all jobs, I think we don't need to worry if a given slurmd sends slurmctld wrong or incomprehensible information about a given job's resource usage. Does anyone know of a reason to think otherwise?

Thanks for reading this far :)

--
Grinning like an idiot,
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center (GACRC)
Enterprise IT Svcs, the University of Georgia