Fantastic, this is really helpful, thanks!

On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon <ped...@cfa.harvard.edu> wrote:
> Yes, it was later than that. If you are on 23.02 you are good. We've been
> running with job_script storage turned on for years at this point, and that
> part of the database only uses 8.4G. Our entire database takes up 29G on
> disk, so it's about 1/3 of the database. We also have database compression,
> which helps with the on-disk size. Raw and uncompressed, our database is
> about 90G. We keep 6 months of data in our active database.
>
> -Paul Edmon-
>
> On 9/28/2023 1:57 PM, Ryan Novosielski wrote:
>
> Sorry for the duplicate e-mail in a short time: do you (or anyone) know
> when the hashing was added? We were planning to enable this on 21.08, but
> we then had to delay our upgrade to it. I'm assuming later than that, as I
> believe that's when the feature was added.
>
> On Sep 28, 2023, at 13:55, Ryan Novosielski <novos...@rutgers.edu> wrote:
>
> Thank you; we'll put in a feature request for improvements in that area,
> and also thanks for the warning. I thought of that in passing, but the
> real-world experience is really useful. I could easily see wanting that
> stuff to be retained for less time than the main records, which is what
> I'd ask for.
>
> I assume that archiving, in general, would also remove this stuff, since
> old jobs themselves will be removed?
>
> --
> #BlackLivesMatter
> Ryan Novosielski - novos...@rutgers.edu
> Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> Office of Advanced Research Computing - MSB A555B, Newark
> Rutgers, the State University of NJ
>
> On Sep 28, 2023, at 13:48, Paul Edmon <ped...@cfa.harvard.edu> wrote:
>
> Slurm should take care of it when you add it.
>
> As for horror stories: under previous versions our database size
> ballooned to be so massive that it actually prevented us from upgrading,
> and we had to drop the columns containing the job_script and job_env. This
> was back before Slurm started hashing the scripts so that it would only
> store one copy of duplicate scripts. After that point we found that the
> job_script data stayed at a fairly reasonable size, as most users use
> functionally the same script each time. However, the job_env data
> continued to grow like crazy, as there are variables in our environment
> that change fairly consistently depending on where the user is. Thus
> job_envs ended up being too massive to keep around, and so we had to drop
> them. Frankly, we never really used them for debugging. The job_scripts,
> though, are super useful and not that much overhead.
>
> In summary, my recommendation is to only store job_scripts. job_envs add
> too much storage for little gain, unless your job_envs are basically the
> same for each user in each location.
>
> Also, it should be noted that there is no way to prune out job_scripts or
> job_envs right now. So the only way to get rid of them if they get large
> is to zero out the column in the table. You can ask SchedMD for the MySQL
> command to do this, as we had to do it here for our job_envs.
>
> -Paul Edmon-
>
> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>
> In my current Slurm installation (recently upgraded to Slurm v23.02.3), I
> only have
>
> AccountingStoreFlags=job_comment
>
> I now intend to add both
>
> AccountingStoreFlags=job_script
> AccountingStoreFlags=job_env
>
> leaving the default 4MB value for max_script_size.
>
> Do I need to do anything on the DB myself, or will Slurm take care of the
> additional tables if needed?
>
> Any comments/suggestions/gotcha/pitfalls/horror_stories to share? I know
> about the additional disk space and potential load needed, and with our
> resources and typical workload I should be okay with that.
>
> Thanks!
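
For reference, here is a rough sketch of how the recommendation in this thread (store job scripts, skip job environments) might look in slurm.conf. It assumes that AccountingStoreFlags is set once as a comma-separated list, which is how the slurm.conf man page describes it, rather than on separate lines, and that the existing job_comment flag is kept. Treat it as an illustration and check the man page for your version before applying it.

```
# slurm.conf (sketch, not a tested configuration)
# Keeps the existing job_comment flag and adds job_script.
# job_env is deliberately left out, following the advice in this thread
# that stored environments grow quickly and cannot currently be pruned.
AccountingStoreFlags=job_comment,job_script
```

slurmctld has to re-read its configuration for the change to take effect; whether `scontrol reconfigure` is enough or a restart is needed is worth confirming for your version. Scripts stored this way should then be retrievable with something like `sacct -j <jobid> --batch-script`, assuming your sacct has that option.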