Thanks Paul, this helps. I don't have any PrivateData line in either config file. According to the docs, "By default, all information is visible to all users," so this should not be an issue. I tried adding a "PrivateData=jobs" line to both conf files, just in case, but that didn't change the behavior.
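For completeness, here is exactly what I tested (a minimal sketch; the job ID below is just a placeholder):

    # slurmdbd.conf (also tried in slurm.conf) -- added just in case,
    # even though the default, with no PrivateData line at all,
    # is supposed to make everything visible to all users
    PrivateData=jobs

    # what a regular user runs against a job they own
    # (12345 is a placeholder job ID):
    sacct -j 12345 --batch-script
    sacct -B -j 12345            # short form, as Paul suggested

Either way, the user gets nothing back and the slurmdbd log shows the "couldn't get information for this user" error below.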
On Mon, Oct 2, 2023 at 9:10 AM Paul Edmon <ped...@cfa.harvard.edu> wrote:

> At least in our setup, users can see their own scripts by doing: sacct -B -j JOBID
>
> I would make sure that the scripts are being stored, and check how you have PrivateData set.
>
> -Paul Edmon-
>
> On 10/2/2023 10:57 AM, Davide DelVento wrote:
>
> I deployed the job_script archival and it is working; however, it can be queried only by root.
>
> A regular user can run sacct -lj against any job (even jobs owned by other users, and that's okay in our setup) with no problem. However, if they run sacct -j job_id --batch-script, even against a job they own, nothing is returned, and in the slurmdbd logs I get
>
> slurmdbd: error: couldn't get information for this user (null)(xxxxxx)
>
> where xxxxxx is the POSIX ID of the user who's running the query.
>
> Neither slurmdbd.conf nor slurm.conf has any "permission" setting. FWIW, we use LDAP.
>
> Is that the expected behavior, i.e. that by default only root can see the job scripts? I was assuming the users themselves should be able to debug their own jobs... Any hint on what could be changed to achieve this?
>
> Thanks!
>
> On Fri, Sep 29, 2023 at 5:48 AM Davide DelVento <davide.quan...@gmail.com> wrote:
>
>> Fantastic, this is really helpful, thanks!
>>
>> On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon <ped...@cfa.harvard.edu> wrote:
>>
>>> Yes, it was later than that; if you are on 23.02 you are good. We've been running with job_script storage on for years at this point, and that part of the database only uses up 8.4G. Our entire database takes up 29G on disk, so it's about 1/3 of the database. We also use database compression, which helps with the on-disk size; raw and uncompressed, our database is about 90G. We keep 6 months of data in our active database.
>>>
>>> -Paul Edmon-
>>>
>>> On 9/28/2023 1:57 PM, Ryan Novosielski wrote:
>>>
>>> Sorry for the duplicate e-mail in a short time: do you (or anyone else) know when the hashing was added? We were planning to enable this on 21.08, but we then had to delay our upgrade to it. I'm assuming the hashing came later than that, as I believe that's when the feature itself was added.
>>>
>>> On Sep 28, 2023, at 13:55, Ryan Novosielski <novos...@rutgers.edu> wrote:
>>>
>>> Thank you; we'll put in a feature request for improvements in that area, and also thanks for the warning! I thought of that in passing, but the real-world experience is really useful. I could easily see wanting that stuff to be retained for less time than the main records, which is what I'd ask for.
>>>
>>> I assume that archiving, in general, would also remove this stuff, since old jobs themselves will be removed?
>>>
>>> --
>>> #BlackLivesMatter
>>> Ryan Novosielski - novos...@rutgers.edu
>>> Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>>> Office of Advanced Research Computing - MSB A555B, Newark
>>>
>>> On Sep 28, 2023, at 13:48, Paul Edmon <ped...@cfa.harvard.edu> wrote:
>>>
>>> Slurm should take care of it when you add it.
>>>
>>> As far as horror stories go: under previous versions, our database size ballooned to be so massive that it actually prevented us from upgrading, and we had to drop the columns containing the job_script and job_env.
>>> This was back before Slurm started hashing the scripts so that only one copy of duplicate scripts is stored. After that point we found that the job_script data stayed at a fairly reasonable size, as most users run functionally the same script each time. However, the job_env data continued to grow like crazy, because there are variables in our environment that change fairly consistently depending on where the user is. The job_envs thus ended up being too massive to keep around, and we had to drop them. Frankly, we never really used them for debugging. The job_scripts, though, are super useful and not that much overhead.
>>>
>>> In summary, my recommendation is to store only job_scripts. job_envs add too much storage for little gain, unless your job_envs are basically the same for each user in each location.
>>>
>>> It should also be noted that there is currently no way to prune out job_scripts or job_envs. So the only way to get rid of them, if they get large, is to zero out the column in the table. You can ask SchedMD for the MySQL command to do this, as we had to do it here for our job_envs.
>>>
>>> -Paul Edmon-
>>>
>>> On 9/28/2023 1:40 PM, Davide DelVento wrote:
>>>
>>> In my current Slurm installation (recently upgraded to Slurm v23.02.3), I only have
>>>
>>> AccountingStoreFlags=job_comment
>>>
>>> I now intend to add both
>>>
>>> AccountingStoreFlags=job_script
>>> AccountingStoreFlags=job_env
>>>
>>> leaving the default 4MB value for max_script_size.
>>>
>>> Do I need to do anything on the DB myself, or will Slurm take care of the additional tables if needed?
>>>
>>> Any comments/suggestions/gotchas/pitfalls/horror_stories to share? I know about the additional disk space and the potential extra load, and with our resources and typical workload I should be okay with that.
>>>
>>> Thanks!
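P.S. One note for anyone copying the config lines quoted above: per the slurm.conf man page, AccountingStoreFlags takes a comma-separated list, so the flags belong on a single line rather than on repeated lines. A minimal sketch of the settings discussed in this thread (the max_script_size line only makes the 4MB default explicit, and it lives under SchedulerParameters):

    # slurm.conf
    AccountingStoreFlags=job_comment,job_script,job_env

    # optional: stored-script size limit; 4194304 bytes (4MB) is the default
    # (append to any existing SchedulerParameters rather than replacing them)
    SchedulerParameters=max_script_size=4194304

Given Paul's experience above, dropping job_env from that list and keeping only job_comment,job_script is probably the safer choice.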