No, all archiving does is remove the pointer. What Slurm does right
now is create a hash of the job_script/job_env and check whether that
hash matches one already on record. If not, it adds the script or env
to the record; if it does match, it adds a pointer to the existing
record. So you can think of the job_script/job_env as an internal
database of all the scripts and envs that Slurm has ever seen, and
what ends up in the job record is a pointer into that database.
This way Slurm can deduplicate scripts/envs that are the same. This
works great for job_scripts, as they are functionally the same and thus
many jobs point to the same script, but less so for job_envs.
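To make the pointer idea concrete, here is a minimal Python sketch of a
hash-based store. It is only an illustration, not Slurm's actual code:
the BlobStore class, its method names, and the choice of SHA-256 are
all assumptions made for the example.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: one copy per distinct script or env."""

    def __init__(self):
        self.blobs = {}  # hash -> stored text (the "internal database")
        self.jobs = {}   # job_id -> hash (the pointer kept in the job record)

    def submit(self, job_id, text):
        # SHA-256 is an assumption; the thread does not say which hash Slurm uses.
        digest = hashlib.sha256(text.encode()).hexdigest()
        # Store the text only if this hash has not been seen before.
        self.blobs.setdefault(digest, text)
        self.jobs[job_id] = digest
        return digest

    def archive(self, job_id):
        # Archiving drops only the job's pointer; the stored blob stays behind.
        self.jobs.pop(job_id, None)


store = BlobStore()
script = "#!/bin/bash\nsrun ./my_app\n"  # hypothetical job script
store.submit(1, script)
store.submit(2, script)  # same content, so no second copy is stored
store.archive(1)
store.archive(2)
print(len(store.blobs), len(store.jobs))
```

After both jobs are archived the store still holds the one script, which
is the behavior described above: the pointers go away, the blobs do not.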
-Paul Edmon-
On 9/28/2023 1:55 PM, Ryan Novosielski wrote:
Thank you; we’ll put in a feature request for improvements in that
area, and also thanks for the warning. I thought of that in passing,
but the real-world experience is really useful. I could easily see
wanting that stuff to be retained for a shorter time than the main
records, which is what I’d ask for.
I assume that archiving, in general, would also remove this stuff,
since old jobs themselves will be removed?
--
#BlackLivesMatter
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark
`'
On Sep 28, 2023, at 13:48, Paul Edmon <ped...@cfa.harvard.edu> wrote:
Slurm should take care of it when you add it.
As far as horror stories go, under previous versions our database
size ballooned so much that it actually prevented us from
upgrading, and we had to drop the columns containing the job_script
and job_env. This was back before Slurm started hashing the scripts
so that it would only store one copy of duplicate scripts. After
that change we found that the job_script database stayed at a fairly
reasonable size, as most users use functionally the same script each
time. However, the job_env continued to grow like crazy, as there are
variables in our environment that change regularly depending on
where the user is. Thus job_envs ended up being too massive to keep
around, and so we had to drop them. Frankly, we never really used
them for debugging. The job_scripts, though, are super useful and
not that much overhead.
In summary my recommendation is to only store job_scripts. job_envs
add too much storage for little gain, unless your job_envs are
basically the same for each user in each location.
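A small Python sketch of why the two behave so differently under
hashing. Everything in it is made up for illustration: the script text,
the environment contents, and the use of SSH_CLIENT as the variable that
changes with the user's location are assumptions, as is SHA-256.

```python
import hashlib


def digest(text):
    # SHA-256 is an assumption; the thread does not name Slurm's hash.
    return hashlib.sha256(text.encode()).hexdigest()


# 100 submissions of the same (hypothetical) job script.
script = "#!/bin/bash\nsrun ./my_app\n"
scripts = [script for _ in range(100)]

# 100 environments that differ in a single variable per submission.
envs = [
    f"HOME=/home/alice\nSSH_CLIENT=10.0.0.{i} 50000 22\n"
    for i in range(100)
]

unique_scripts = len({digest(s) for s in scripts})
unique_envs = len({digest(e) for e in envs})
print(unique_scripts, unique_envs)
```

One changing variable is enough to give every environment its own hash,
so each one is stored in full, while the identical scripts collapse to a
single stored copy.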
Also, it should be noted that there is no way to prune job_scripts
or job_envs right now. So the only way to get rid of them if they get
large is to zero out the column in the table. You can ask SchedMD for
the MySQL command to do this, as we had to do it here for our job_envs.
-Paul Edmon-
On 9/28/2023 1:40 PM, Davide DelVento wrote:
In my current Slurm installation (recently upgraded to Slurm
v23.02.3), I only have
AccountingStoreFlags=job_comment
I now intend to add both job_script and job_env (the option takes a
comma-separated list), i.e.
AccountingStoreFlags=job_comment,job_script,job_env
leaving the default 4MB value for max_script_size
Do I need to do anything on the DB myself, or will slurm take care
of the additional tables if needed?
Any comments/suggestions/gotchas/pitfalls/horror_stories to share? I
know about the additional disk space and potentially load needed, and
with our resources and typical workload I should be okay with that.
Thanks!