Hi all:

Running Slurm 20.11.8.  I missed a chance at a recent outage to change our 
JobAcctGatherType from 'linux' to 'cgroup'.  Our ProctrackType has been 
'cgroup' for a long time.  In short, I'm thinking it would be harmless for me
to do this now, with jobs running, and below I discuss the caveats I know of.
Have any of you made this change with jobs running, or do you see a reason why
in my case I should not?
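
For concreteness, the change I'm contemplating is this one line in slurm.conf
(ProctrackType shown only for context; it stays as it is):

    ProctrackType=proctrack/cgroup             # unchanged
    # JobAcctGatherType=jobacct_gather/linux   # current
    JobAcctGatherType=jobacct_gather/cgroup    # proposed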

More info:

I see the warnings in the doc about not changing JobAcctGatherType while jobs 
are running.  Some of you have asked SchedMD about this before:

- In 
slurm-dev.schedmd.narkive.com/EbK7qgSg/adding-jobacctgather-plugin-causing-rpc-errors#post1
 from 2013, Moe says "don't change this while jobs are running; I'll doc that." 
 (Hence it being doc'd now.)

- https://bugs.schedmd.com/show_bug.cgi?id=861 in 2014 mentioned that doing so 
would break 'sstat' for the already-running jobs.

- In https://bugs.schedmd.com/show_bug.cgi?id=2781 in 2016, SchedMD repeated the 
doc'd warning.  In that case, the user reported job tasks completing while 
Slurm considered the jobs still running.

On a dev cluster, I started a job, then changed JobAcctGatherType from 'linux' 
to 'cgroup', then restarted slurmctld, then the slurmds.  That job continued to 
run and was terminated by its timelimit.  This was replicable.
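
In case it's useful, the sequence was roughly this (the restart commands and
hostnames are illustrative, not prescriptive):

    # after editing slurm.conf on the controller and pushing it to the nodes:
    systemctl restart slurmctld                        # on the controller
    pdsh -w 'node[01-04]' systemctl restart slurmd     # on the compute nodes
    scontrol show config | grep JobAcctGatherType      # confirm the new value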

I also submitted a job with a known RAM-vs-time profile to several otherwise
idle nodes.  One node I left alone.  The other four I switched from 'linux' to
'cgroup' at various times during the jobs' lives.  We have a Prometheus
exporter which feeds a Grafana instance to graph the cgroup data.  Looking at
the 'memory' data across the nodes, one of the switched nodes reported falsely
high memory usage for the test job.  Running the same job again without
touching slurmd mid-job yielded correct, matching graphs across all the nodes.
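
The test job itself was nothing fancy; something along these lines would do
(the stress-ng invocation here is illustrative, not our exact job):

    #!/bin/bash
    #SBATCH --job-name=memprofile
    #SBATCH --mem=8G
    #SBATCH --time=00:30:00
    # hold ~6 GB for most of the job so the RAM-vs-time curve is easy to
    # eyeball in Grafana
    stress-ng --vm 1 --vm-bytes 6G --vm-keep --timeout 25m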

Suppose I switch my cluster (slurmctld, all slurmds) at time T0.  In principle
a user might want to size her jobs, happen to look at the affected one of the
memory-related metrics for a job which was running at T0, and get inaccurate
info.  Modulo that, we can afford to write off the historical memory-usage info
for the jobs running at T0 (we could tolerate any seeming inaccuracies in
fairshare arising from that info being inaccurate, and we don't yet have e.g. a
MaxTRESPerX with some RAM value).  With our 'cgroup' ProctrackType, and
requiring a mem spec on all jobs, I think we don't need to worry if a given
slurmd sends slurmctld wrong or incomprehensible information about a given
job's resource usage.
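
For what it's worth, the per-job memory cgroup on the node can be inspected
directly, independently of whatever jobacct_gather reports to slurmctld (paths
assume cgroup v1 and the default Slurm hierarchy; the uid and job ID are made
up):

    # on the compute node, for a hypothetical job 12345 run by uid 10001:
    cat /sys/fs/cgroup/memory/slurm/uid_10001/job_12345/memory.usage_in_bytes
    # and, if cgroup.conf sets ConstrainRAMSpace=yes, the enforced limit:
    cat /sys/fs/cgroup/memory/slurm/uid_10001/job_12345/memory.limit_in_bytes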

Does anyone know of a reason to think otherwise?  Thanks for reading this far :)

--
Grinning like an idiot,
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center (GACRC)
Enterprise IT Svcs, the University of Georgia

