Re: [slurm-users] monitor draining/drain nodes

2021-06-14 Thread Marcus Boden
I think your slurm-user has /sbin/nologin as the shell in /etc/passwd. Try `su -s /bin/bash slurm`. Best, Marcus On 14.06.21 20:52, Rodrigo Santibáñez wrote: Thank you Marcus, Ole and Samuel. Regarding Samuel's answer, I added ifne from moreutils before mail to not have empty emails. Reg

Re: [slurm-users] Slurmrestd unspecified errors.

2021-06-14 Thread Brian Andrus
No problem. You may want to set your variables in your /etc/sysconfig/slurmrestd file. That is where you can set that variable along with others (SLURMRESTD_LISTEN, SLURMRESTD_DEBUG, SLURMRESTD_OPTIONS) and your service file will pick them up. Brian Andrus On 6/14/2021 12:05 PM, Heitor wrot
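A sketch of what such an environment file might contain. The values are illustrative assumptions, not taken from the thread, and whether each variable is honoured depends on the installed Slurm version and on the unit file actually referencing it.

```shell
# /etc/sysconfig/slurmrestd (hypothetical contents, values are assumptions)

# Address and port to listen on; 8081 is the port mentioned in this thread.
SLURMRESTD_LISTEN=localhost:8081

# Debug verbosity; the accepted range depends on the Slurm version (assumption).
SLURMRESTD_DEBUG=4

# Extra command-line options, used only if the unit file expands this variable.
SLURMRESTD_OPTIONS=""
```

After editing the file, `systemctl restart slurmrestd` should make the service pick up the new values, assuming the unit file loads this path as its environment file.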

Re: [slurm-users] Slurmrestd unspecified errors.

2021-06-14 Thread Heitor
On Mon, 14 Jun 2021 11:25:52 -0700 Brian Andrus wrote: > Using v20.11.7 > > I have 8081 because that is the port I am running slurmrestd on. > > How are you starting slurmrestd? If you are using systemd and have > the service file, look inside it. I'm using systemd: $ cat /usr/lib/sys

Re: [slurm-users] monitor draining/drain nodes

2021-06-14 Thread Rodrigo Santibáñez
Thank you Marcus, Ole and Samuel. Regarding Samuel's answer, I added ifne from moreutils before mail to not have empty emails. Regarding strigger, I don't know how to become the slurm user. "su slurm" complains "This account is currently not available.". The user "slurm" exists and is the SlurmUs
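A minimal sketch of the "ifne before mail" idea, assuming moreutils and a working `mail` command are installed. The function names, the subject line and the recipient address are hypothetical and not taken from the thread.

```shell
# Print drained/draining nodes with state and reason, one per line, no header.
# sort -u removes repeats for nodes that belong to several partitions.
list_drained_nodes() {
    sinfo --noheader --Node --states=drain,draining --format='%N %T %E' | sort -u
}

# Mail stdin to the address in $1; ifne (moreutils) runs mail only when
# stdin is non-empty, so no empty emails are sent.
mail_unless_empty() {
    ifne mail -s "Slurm: nodes draining or drained" "$1"
}
```

From cron this could be run as `list_drained_nodes | mail_unless_empty admin@example.org`, where the address is a placeholder.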

Re: [slurm-users] Slurmrestd unspecified errors.

2021-06-14 Thread Brian Andrus
Using v20.11.7 I have 8081 because that is the port I am running slurmrestd on. How are you starting slurmrestd? If you are using systemd and have the service file, look inside it. Brian Andrus On 6/14/2021 9:48 AM, Heitor wrote: On Mon, 14 Jun 2021 08:30:51 -0700 Brian Andrus wrote: Yo

Re: [slurm-users] Information about finished jobs

2021-06-14 Thread Paul Raines
I have been writing my own 'jobinfo' tool for users to see info on a job in any state that is useful and readable by them. Still new to slurm and trying to wrap my head around the database info and the effects of arrays and such. A completed job output looks like this: # jobinfo 357300 ---

Re: [slurm-users] Slurmrestd unspecified errors.

2021-06-14 Thread Heitor
On Mon, 14 Jun 2021 08:30:51 -0700 Brian Andrus wrote: > You don't use the prefix. > > This works for me on the node running slurmrestd on port 8081: > > user=someuser > curl --header "X-SLURM-USER-NAME: ${user}" --header > "X-SLURM-USER-TOKEN: $(sudo scontrol token username=${user}|cut > -d=

Re: [slurm-users] Slurmrestd unspecified errors.

2021-06-14 Thread Brian Andrus
You don't use the prefix. This works for me on the node running slurmrestd on port 8081: user=someuser curl --header "X-SLURM-USER-NAME: ${user}" --header "X-SLURM-USER-TOKEN: $(sudo scontrol token username=${user}|cut -d= -f2-)" http://localhost:8081/slurm/v0.0.36/ping Brian Andrus On 6
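The same request can be wrapped in two small functions. This is only a sketch: it assumes `scontrol token` prints a single line of the form SLURM_JWT=<token>, and the function names are hypothetical.

```shell
# Extract the token from a line of the form SLURM_JWT=<token>.
# Only the text up to and including the first "=" is removed, so any
# "=" characters inside the token itself are preserved.
jwt_from_scontrol_output() {
    printf '%s\n' "${1#*=}"
}

# Ping slurmrestd on localhost as user $1 on port $2 (8081 in this thread).
slurmrestd_ping() {
    user=$1
    port=$2
    token=$(jwt_from_scontrol_output "$(sudo scontrol token username="$user")")
    curl --silent \
         --header "X-SLURM-USER-NAME: $user" \
         --header "X-SLURM-USER-TOKEN: $token" \
         "http://localhost:$port/slurm/v0.0.36/ping"
}
```

Usage would be `slurmrestd_ping someuser 8081`. The v0.0.36 path matches the Slurm 20.11 release discussed here; other releases may expose a different API version.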

[slurm-users] Slurmrestd unspecified errors.

2021-06-14 Thread Heitor
Hello, So far I've been unable to use slurmrestd. I'm running CentOS7 with slurm 20.11.7 from the EPEL7 repo. I generate a token this way: $ sudo scontrol token username=ubuntu SLURM_JWT=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2MjM2ODA2NjUsImlhdCI6MTYyMzY3ODg2NSwic3VuIjoidWJ1bnR1In0.bNIY

Re: [slurm-users] monitor draining/drain nodes

2021-06-14 Thread Ole Holm Nielsen
On 6/14/21 7:50 AM, Marcus Boden wrote: Slurm provides the strigger[1] utility for that. You can set it up to automatically send mails when nodes go into drain. I provide some Slurm triggers examples in https://github.com/OleHolmNielsen/Slurm_tools/tree/master/triggers On 12.06.21 22:29, Rodr
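A rough sketch of a drain trigger, not taken from the linked repository. The script path and function names are hypothetical. It assumes strigger passes the affected node list to the program as its first argument, and that the trigger is registered while running as the SlurmUser.

```shell
# Build the notification text for the node list handed over by strigger.
drain_message() {
    printf 'Slurm nodes drained: %s\n' "$1"
}

# Register a permanent trigger that runs the program in $1 whenever a
# node becomes drained. Run this as the SlurmUser.
register_drain_trigger() {
    strigger --set --node --drained --flags=PERM --program="$1"
}
```

The trigger program itself could then end with something like `drain_message "$1" | mail -s "Drained nodes" admin@example.org` (placeholder address), and would be registered once with `register_drain_trigger /usr/local/sbin/notify_drained.sh`.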

Re: [slurm-users] [EXTERNAL] Re: Information about finished jobs

2021-06-14 Thread Greg Wickham
As others have commented, some information is lost when it is stored in the database. To keep historically accurate data on the job, run a script (refer to PrologSlurmctld in slurm.conf) that runs an "scontrol show -d job " and drops it into a local file. Using "PrologSlurmctld" is neat, as it
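A minimal sketch of such a PrologSlurmctld script. The archive directory and function names are hypothetical. It assumes the argument elided from the command above is the job ID, and that SLURM_JOB_ID is set in the PrologSlurmctld environment.

```shell
# Hypothetical archive directory; it must be writable by the SlurmUser
# on the host running slurmctld.
JOB_ARCHIVE_DIR=${JOB_ARCHIVE_DIR:-/var/spool/slurm/job_archive}

# Path of the snapshot file for job ID $1.
job_snapshot_path() {
    printf '%s/%s.txt\n' "$JOB_ARCHIVE_DIR" "$1"
}

# Save the detailed job record while slurmctld still holds it.
save_job_snapshot() {
    scontrol show -d job "$1" > "$(job_snapshot_path "$1")"
}
```

The script referenced by PrologSlurmctld in slurm.conf would call `save_job_snapshot "$SLURM_JOB_ID"` and then `exit 0`, so that a failed snapshot is not reported to slurmctld as a prolog failure.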

Re: [slurm-users] Information about finished jobs

2021-06-14 Thread Arthur Gilly
Thanks Ole! From one of the threads you referred to: “People have come up with various home grown solutions to save that data.” That’s my case too at the moment, I currently have a daemon that extracts this data as it’s still live in scontrol show jobs, and puts it in a separate database… S

Re: [slurm-users] Information about finished jobs

2021-06-14 Thread Ole Holm Nielsen
On 6/14/21 9:33 AM, Arthur Gilly wrote: A related question, on my setup, scontrol show job displays the standard output, standard error redirections as well as the wd, whereas this info is lost after completion when sacct is required. Is this something that's configurable so that this info is pre

Re: [slurm-users] Information about finished jobs

2021-06-14 Thread Arthur Gilly
Hi all, A related question, on my setup, scontrol show job displays the standard output, standard error redirections as well as the wd, whereas this info is lost after completion when sacct is required. Is this something that's configurable so that this info is preserved with sacct? Cheers, A -