Re: [slurm-users] enabling job script archival

Paul Edmon Mon, 02 Oct 2023 08:09:20 -0700

At least in our setup, users can see their own scripts by running sacct -B -j JOBID.

I would make sure that the scripts are actually being stored, and check how you have PrivateData set.
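For anyone who wants to script the same lookup, here is a minimal Python sketch. It assumes `sacct` is on the PATH and that the installed Slurm's `sacct` supports `--batch-script` (the long form of `-B`) and, for stored environments, `--env-vars`. The function names are hypothetical and not part of Slurm.

```python
import shutil
import subprocess


def sacct_stored_cmd(job_id: int, env: bool = False) -> list[str]:
    """Build the sacct command that prints a job's stored batch script,
    or its stored environment when env=True (assumes --env-vars exists)."""
    what = "--env-vars" if env else "--batch-script"
    return ["sacct", what, "--jobs", str(job_id)]


def fetch_stored(job_id: int, env: bool = False) -> str:
    """Run sacct and return whatever it prints.

    Raises FileNotFoundError if sacct is not installed, and
    subprocess.CalledProcessError if sacct exits with a non-zero status.
    An empty string means sacct succeeded but printed nothing."""
    if shutil.which("sacct") is None:
        raise FileNotFoundError("sacct not found on PATH")
    result = subprocess.run(
        sacct_stored_cmd(job_id, env),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise on non-zero exit status
    )
    return result.stdout
```

If sacct returns nothing, as in the report quoted below, `fetch_stored` simply returns an empty string, so the slurmdbd log is the place to look for the reason.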


-Paul Edmon-

On 10/2/2023 10:57 AM, Davide DelVento wrote:

I deployed the job_script archival and it is working; however, it can be queried only by root.

A regular user can run sacct -lj against any job (even those by other users, and that's okay in our setup) with no problem. However, if they run sacct -j job_id --batch-script, even against a job they own themselves, nothing is returned and I get the following in the slurmdbd logs:

slurmdbd: error: couldn't get information for this user (null)(xxxxxx)

where xxxxxx is the POSIX ID of the user running the query.

Neither configuration file (slurmdbd.conf or slurm.conf) has any "permission" setting. FWIW, we use LDAP.

Is that the expected behavior, i.e. by default only root can see the job scripts? I was assuming users should be able to debug their own jobs... Any hint on what could be changed to achieve this?


Thanks!

On Fri, Sep 29, 2023 at 5:48 AM Davide DelVento <davide.quan...@gmail.com> wrote:


    Fantastic, this is really helpful, thanks!

    On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon
    <ped...@cfa.harvard.edu> wrote:

        Yes, it was later than that. If you are on 23.02, you are good.
        We've been running with job_script storage on for years at
        this point, and that part of the database only uses 8.4G.
        Our entire database takes up 29G on disk, so it's about 1/3 of
        the database. We also have database compression, which helps
        with the on-disk size; raw and uncompressed, our database is
        about 90G. We keep 6 months of data in our active database.
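To put those figures side by side, here is a quick calculation. It assumes the 8.4G job_script figure is measured on disk, like the 29G total; the numbers are the ones quoted above.

```python
# Figures quoted above, in GB.
script_gb = 8.4    # job_script portion (assumed to be on-disk, compressed size)
on_disk_gb = 29.0  # whole database on disk, with compression
raw_gb = 90.0      # whole database, raw and uncompressed

script_share = script_gb / on_disk_gb    # fraction of the on-disk DB taken by scripts
compression_ratio = raw_gb / on_disk_gb  # how much larger the raw data is than on disk

print(f"job_script share: {script_share:.0%}")
print(f"compression ratio: {compression_ratio:.1f}x")
```

That works out to roughly 29% of the on-disk database (the "about 1/3" above) and about a 3x saving from compression.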

        -Paul Edmon-

        On 9/28/2023 1:57 PM, Ryan Novosielski wrote:

        Sorry for the duplicate e-mail in a short time: do you (or
        anyone) know when the hashing was added? I was planning to
        enable this on 21.08, but we then had to delay our upgrade to
        it. I’m assuming later than that, as I believe that’s when
        the feature was added.

        On Sep 28, 2023, at 13:55, Ryan Novosielski
        <novos...@rutgers.edu> wrote:

        Thank you; we’ll put in a feature request for improvements
        in that area. Also, thanks for the warning. I thought of
        that in passing, but the real-world experience is really
        useful. I could easily see wanting that stuff to be retained
        for a shorter period than the main records, which is what I’d
        ask for.

        I assume that archiving, in general, would also remove this
        stuff, since old jobs themselves will be removed?

        --
        #BlackLivesMatter
        ____
        || \\UTGERS,     |---------------------------*O*---------------------------
        ||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
        || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
        ||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
             `'

        On Sep 28, 2023, at 13:48, Paul Edmon
        <ped...@cfa.harvard.edu> wrote:

        Slurm should take care of it when you add it.

        As far as horror stories go: under previous versions, our
        database grew so large that it actually prevented us from
        upgrading, and we had to drop the columns containing the
        job_script and job_env. This was back before Slurm started
        hashing the scripts so that it would only store one copy of
        duplicate scripts. After that point, we found that the
        job_script storage stayed at a fairly reasonable size, as most
        users use functionally the same script each time. However, the
        job_env continued to grow like crazy, as there are variables in
        our environment that change fairly consistently depending on
        where the user is. Thus job_envs ended up being too massive to
        keep around, and so we had to drop them. Frankly, we never
        really used them for debugging. The job_scripts, though, are
        super useful and not that much overhead.
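Why hashing keeps scripts small while environments keep growing can be illustrated with a toy content-addressed store. This is a hypothetical sketch of hash-based deduplication in general, not Slurm's actual table layout or hash function. Identical scripts collapse to one stored copy, but an environment that differs by a single variable (here a made-up per-job `PWD`) produces a new copy for every job.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (illustrative only, NOT Slurm's schema):
    each distinct blob is stored once; each job records only its blob's hash."""

    def __init__(self) -> None:
        self.blobs: dict[str, str] = {}  # hash -> content, one copy per distinct text
        self.jobs: dict[int, str] = {}   # job id -> hash of that job's content

    def add(self, job_id: int, content: str) -> None:
        # Only byte-for-byte identical content shares a hash, so
        # "functionally the same" scripts with small edits are stored separately.
        digest = hashlib.sha256(content.encode()).hexdigest()
        self.blobs.setdefault(digest, content)  # store the content only if unseen
        self.jobs[job_id] = digest

    def get(self, job_id: int) -> str:
        return self.blobs[self.jobs[job_id]]


scripts = DedupStore()
envs = DedupStore()
script = "#!/bin/bash\n#SBATCH -n 1\nsrun ./a.out\n"
for job_id in range(100):
    scripts.add(job_id, script)                      # same script every time
    envs.add(job_id, f"PWD=/scratch/run{job_id}\n")  # one variable differs per job

print(len(scripts.blobs), len(envs.blobs))  # prints: 1 100
```

A hundred jobs leave one stored script but a hundred stored environments, which matches the growth pattern described above.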

        In summary, my recommendation is to store only job_scripts.
        job_envs add too much storage for little gain, unless your
        job_envs are basically the same for each user in each location.

        Also, it should be noted that there is currently no way to
        prune job_scripts or job_envs. So the only way to get rid of
        them if they get large is to zero out the column in the
        table. You can ask SchedMD for the MySQL command to do this,
        as we had to do it here for our job_envs.

        -Paul Edmon-

        On 9/28/2023 1:40 PM, Davide DelVento wrote:

        In my current Slurm installation (recently upgraded to
        Slurm v23.02.3), I only have

        AccountingStoreFlags=job_comment

        I now intend to add both

        AccountingStoreFlags=job_script
        AccountingStoreFlags=job_env

        leaving the default 4MB value for max_script_size
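One gotcha with the lines above: the slurm.conf man page describes AccountingStoreFlags as a comma-separated list, so the usual form is a single line such as AccountingStoreFlags=job_comment,job_script,job_env. Repeating the key on separate lines will likely keep only the last value rather than combine them (an assumption worth checking against your Slurm version). The hypothetical helper below merges repeated AccountingStoreFlags lines into one; it is a sketch, not a Slurm tool.

```python
def merge_store_flags(conf_text: str) -> str:
    """Hypothetical helper: collapse repeated AccountingStoreFlags= lines into
    one comma-separated line, keeping each flag once in first-seen order.
    All other lines pass through unchanged."""
    flags: list[str] = []
    out: list[str] = []
    insert_at = None  # index where the merged line will go
    for line in conf_text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Assumption: parameter names are matched case-insensitively.
        if sep and key.strip().lower() == "accountingstoreflags":
            if insert_at is None:
                insert_at = len(out)  # put the merged line where the first one was
            value = value.split("#", 1)[0]  # drop a trailing comment, if any
            for flag in value.split(","):
                flag = flag.strip()
                if flag and flag not in flags:
                    flags.append(flag)
            continue
        out.append(line)
    if insert_at is not None:
        out.insert(insert_at, "AccountingStoreFlags=" + ",".join(flags))
    return "\n".join(out)


conf = """AccountingStoreFlags=job_comment
AccountingStoreFlags=job_script
AccountingStoreFlags=job_env"""
print(merge_store_flags(conf))
# prints: AccountingStoreFlags=job_comment,job_script,job_env
```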

        Do I need to do anything on the DB myself, or will slurm
        take care of the additional tables if needed?

        Any comments/suggestions/gotchas/pitfalls/horror_stories to
        share? I know about the additional disk space and the
        potential extra load, and with our resources and typical
        workload I should be okay with that.

        Thanks!
