A word of warning about going from 21.08 to 22.05: make sure you have enough storage on the database host where you are doing the work, and budget enough time for the upgrade.  We just converted our 198 GB (compressed; 534 GB raw) database this week.  The initial attempt failed after running for 8 hours because we ran out of disk space (part of the reason we had to compress is that the server we use for our Slurm master only has 800 GB of SSD).  That meant we had to reimport our DB, which took 8 hours.  We then had to drop the job scripts and job envs, which took another 5 hours, before we could attempt the upgrade again, which took 2 hours.

Moral of the story: make sure you have enough space and budget sufficient time.  You may want to consider nulling out the job scripts and envs before the upgrade.  In 22.05 they completely redid the way those are stored in the database so that it is more efficient, but getting from here to there is the trick.
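
A rough pre-flight check along these lines can catch the out-of-space case before hours are spent on it.  This is only a sketch: the function names are made up for illustration, and the 2x safety factor is an assumption, not a figure from SchedMD or from our upgrade.  Pick a margin that matches your own experience.

```python
import shutil

GB = 1024 ** 3


def enough_space(free_bytes, raw_db_bytes, factor=2.0):
    """Return True if free space covers `factor` times the raw DB size.

    `factor` is an assumed safety margin (hypothetical), not a documented
    requirement of the 21.08 -> 22.05 conversion.
    """
    return free_bytes >= raw_db_bytes * factor


def check_host(path, raw_db_gb, factor=2.0):
    """Check the filesystem that holds `path`, e.g. the database data directory."""
    free = shutil.disk_usage(path).free
    return enough_space(free, raw_db_gb * GB, factor)
```

With the numbers from this thread, enough_space(800 * GB, 534 * GB) is False even if the whole 800 GB SSD were free, which matches what we ran into.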


For details see the bug report we filed: https://bugs.schedmd.com/show_bug.cgi?id=14514


-Paul Edmon-


On 7/14/2022 2:34 PM, Timony, Mick wrote:


    What I can tell you is that we have never had a problem
    reimporting the data back in that was dumped from older versions
    into a current version database.  So the import using sacctmgr
    must do the conversion from the older formats to the newer formats
    and handle the schema changes.

That's the bit of info I was missing. I didn't realise that it outputs the data in a format that sacctmgr can read.

    I will note that if you are storing job_scripts and envs those can
    eat up a ton of space in 21.08.  It looks like they've solved that
    problem in 22.05 but the archive steps on 21.08 took forever due
    to those scripts and envs.

Yes, we are storing job_scripts with:

AccountingStoreFlags=job_script

I think when we made that decision, we decided that also saving the job_env would take up too much room, as our DB is already pretty big at approx. 300 GB. The o2_step_table and the o2_job_table take up the most space, for obvious reasons:

+----------------------------+-----------+
| Table                      | Size (GB) |
+----------------------------+-----------+
| o2_step_table              |    183.83 |
| o2_job_table               |    128.18 |


That's good advice, Paul, much appreciated.

>took forever and actually caused issues with the archive process
I think that should be highlighted for other users!

For those interested, to find the table sizes I did this:

SELECT table_name AS "Table",
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS "Size (GB)"
FROM information_schema.TABLES
WHERE table_schema = 'slurmdbd'
ORDER BY (data_length + index_length) DESC;

Replace slurmdbd with the name of your database.
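
If you would rather not edit the query by hand, the same lookup can be wrapped in a small Python helper that takes the database name as a bound parameter.  This is a sketch, not something we run in production: table_sizes is a made-up name, and it assumes a DB-API cursor from a MySQL driver that accepts %s placeholders (PyMySQL and mysqlclient both do).

```python
# Same query as above, with the schema name bound as a parameter
# instead of being typed into the SQL text.
TABLE_SIZE_SQL = """
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2)
FROM information_schema.TABLES
WHERE table_schema = %s
ORDER BY (data_length + index_length) DESC
"""


def table_sizes(cursor, schema):
    """Return [(table_name, size_gb), ...] for `schema`, largest first.

    `cursor` is assumed to be an open DB-API cursor on the MySQL/MariaDB
    server that hosts the accounting database.
    """
    cursor.execute(TABLE_SIZE_SQL, (schema,))
    # Sizes can come back as Decimal, or NULL for views; normalise to float.
    return [
        (name, float(size) if size is not None else 0.0)
        for name, size in cursor.fetchall()
    ]
```

Call it as table_sizes(cursor, "slurmdbd"), again replacing slurmdbd with the name of your database.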

Cheers
--Mick
