[slurm-users] Re: Slurm 23.11 - Unknown system variable 'wsrep_on'

2024-04-04 Thread Russell Jones via slurm-users
Thanks! I realized I made a mistake and had it still talking to an older slurmdbd system. On Wed, Apr 3, 2024 at 1:54 PM Timo Rothenpieler via slurm-users < slurm-users@lists.schedmd.com> wrote: > On 02.04.2024 22:15, Russell Jones via slurm-users wrote: > > Hi all, > >

[slurm-users] Slurm 23.11 - Unknown system variable 'wsrep_on'

2024-04-02 Thread Russell Jones via slurm-users
Hi all, I am working on upgrading a Slurm cluster from 20 -> 23. I was successfully able to upgrade to 22, however now that I am trying to go from 22 to 23, starting slurmdbd results in the following error being logged: error: mysql_query failed: 1193 Unknown system variable 'wsrep_on' When try

Re: [slurm-users] Slurm account coordinator

2023-10-09 Thread Russell Jones
Coordinator is a type of privilege level, not the name of an already existing account. See https://slurm.schedmd.com/user_permissions.html "Set using an *AdminLevel* option in the user's database record. For configuration information, see Accounting and Resource Limits

Re: [slurm-users] No coffee allowed on BYU campus(!) Suggestions for alternatives?

2023-07-04 Thread Russell Jones
Besides the obvious answer of energy drinks, you could buy canned cold brew coffee. It may not be super obvious to others that you are drinking coffee and still allow you to get your coffee fix. On Tue, Jul 4, 2023, 1:36 PM Bjørn-Helge Mevik wrote: > I've signed up for SLUG 2023, which is on Bri

Re: [slurm-users] Preventing --exclusive on a per-partition basis

2023-03-22 Thread Russell Jones
Thank you! Yes I ended up doing exactly that after finding the job submit API docs. On Wed, Mar 22, 2023 at 3:58 AM Bjørn-Helge Mevik wrote: > I'd simply add a test like > and job_desc.partition == "the_partition" > to the test for exclusiveness. > > -- > Regards, > Bjørn-Helge Mevik, dr. scie
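Piecing the thread together, a minimal job_submit.lua sketch of the approach discussed (the partition name "night" and the message text are placeholders; the field names follow the sample job_submit.lua shipped with Slurm, so verify them against your version):

```lua
-- Sketch (illustrative, not the poster's actual script):
-- reject --exclusive submissions to one partition only.
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- job_desc.shared == 0 corresponds to --exclusive
   if job_desc.partition == "night" and job_desc.shared == 0 then
      slurm.log_user("--exclusive is not permitted in the 'night' partition")
      return slurm.ERROR
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```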

[slurm-users] Preventing --exclusive on a per-partition basis

2023-03-21 Thread Russell Jones
Hi all, We are running into the documented issues of job preemption not working for jobs running in a lower priority queue, but the user used --exclusive=user in the job submission. I have found the example job_submit.lua file for preventing using this flag, but I don't want to prevent it on ever

Re: [slurm-users] Limit partition to 1 job at a time

2022-03-23 Thread Russell Jones
partition, not 1 job at a time total in the partition? On Tue, Mar 22, 2022 at 12:46 PM Gerhard Strangar wrote: > Russell Jones wrote: > > > I am struggling to figure out how to do this. Any tips? > > Create a QoS with GrpJobs=1 and assign it to the partition? > >
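Gerhard's suggestion would look roughly like the following (the QoS and partition names are hypothetical; check the sacctmgr syntax against your Slurm version):

```
# Create a QoS limiting the group to 1 running job at a time:
sacctmgr add qos onejob GrpJobs=1

# Attach it to the partition in slurm.conf:
#   PartitionName=serial ... QOS=onejob
# then apply with:
scontrol reconfigure
```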

[slurm-users] Limit partition to 1 job at a time

2022-03-22 Thread Russell Jones
Hi all, For various reasons, we need to limit a partition to being able to run max 1 job at a time. Not 1 job per user, but 1 job total at a time, while queuing any other jobs to run after this one is complete. I am struggling to figure out how to do this. Any tips? Thanks!

Re: [slurm-users] Secondary Unix group id of users not being issued in interactive srun command

2022-01-31 Thread Russell Jones
I solved this issue by adding a group to IPA that matched the same name and GID of the local groups, then using [SUCCESS=merge] in nsswitch.conf for groups, and on our CentOS 8 nodes adding "enable_files_domain = False" in the sssd.conf file. On Fri, Jan 28, 2022 at 5:02 PM Ratnasamy, Fritz < fr
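As a sketch, the two config fragments described would look something like this (paths and section placement per the sssd.conf man page; verify on your distribution):

```
# /etc/nsswitch.conf -- merge local group members with sssd results:
group: files [SUCCESS=merge] sss

# /etc/sssd/sssd.conf (CentOS 8), in the [sssd] section:
enable_files_domain = False
```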

Re: [slurm-users] [External] How can I do to prevent a specific job from being prempted?

2021-09-14 Thread Russell Jones
The other option is creating a "special" partition that only this user(s) can submit to, where jobs running in that partition have a higher priority than all the others (if you are using partition priority like we are). On Tue, Sep 14, 2021 at 3:26 AM Loris Bennett wrote: > Dear Peter, > > 顏文 w
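A slurm.conf sketch of such a "special" partition (all names, node lists, and tier values here are illustrative, assuming partition-priority preemption is in use):

```
# Higher PriorityTier jobs can preempt lower-tier jobs on shared nodes;
# AllowGroups restricts who may submit to the special partition.
PartitionName=urgent Nodes=cn[01-10] PriorityTier=100 AllowGroups=vipusers
PartitionName=batch  Nodes=cn[01-10] PriorityTier=10  Default=YES
```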

Re: [slurm-users] [External] Re: Preemption not working for jobs in higher priority partition

2021-08-27 Thread Russell Jones
slurmctld.log should give you more data on > how or why it is making its decisions. > > > > Mike > > > > *From: *slurm-users on behalf of > Russell Jones > *Date: *Tuesday, August 24, 2021 at 10:36 > *To: *Slurm User Community List > *Subject: *[Extern

Re: [slurm-users] Preemption not working for jobs in higher priority partition

2021-08-24 Thread Russell Jones
seeing what is wrong with my config, but it also isn't working anymore to allow preemption. On Fri, Aug 20, 2021 at 9:46 AM Russell Jones wrote: > I could have swore I had tested this before implementing it and it worked > as expected. > > If I am dreaming that testing - is there

Re: [slurm-users] Preemption not working for jobs in higher priority partition

2021-08-20 Thread Russell Jones
ode. > > Since your pending job is in the 'day' partition, it will not preempt > something in the 'night' partition (even if the node is in both). > > Brian Andrus > On 8/19/2021 2:49 PM, Russell Jones wrote: > > Hi all, > > I could use some help to under

[slurm-users] Preemption not working for jobs in higher priority partition

2021-08-19 Thread Russell Jones
Hi all, I could use some help to understand why preemption is not working for me properly. I have a job blocking other jobs that doesn't make sense to me. Any assistance is appreciated, thank you! I have two partitions defined in slurm, a day time and a night time partition: Day partition - Pri
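For reference, a day/night setup of the kind described would typically be configured along these lines in slurm.conf (values are illustrative, not the poster's actual config):

```
# Partition-priority preemption: a pending job in the higher-PriorityTier
# partition can preempt running jobs in the lower one on shared nodes.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
PartitionName=day   Nodes=cn[01-10] PriorityTier=10 Default=YES
PartitionName=night Nodes=cn[01-10] PriorityTier=1
```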

Re: [slurm-users] 2 nodes being randomly set to "not responding"

2021-07-22 Thread Russell Jones
> source: https://slurm.schedmd.com/elastic_computing.html > > Cheers > > Josef > > Sent from Nine <http://www.9folders.com/> > > -- > *From:* Russell Jones > *Sent:* Wednesday, 21 July 2021 22:30 > *To:* Slurm User Community List > *Subject

[slurm-users] 2 nodes being randomly set to "not responding"

2021-07-21 Thread Russell Jones
Hi all, We have a single slurm cluster with multiple different architectures and compute clusters talking to a single slurmctld. This slurmctld is dual-homed on two different networks. We have two individual nodes who are by themselves on "network 2" while all of the other nodes are on "network 1"

Re: [slurm-users] Slurm interactive job not populating all groups

2021-05-12 Thread Russell Jones
Brian Andrus wrote: > Ah. > > You should put files first. Otherwise, if it finds an entry in SSS, that > takes precedence and the local groups/users will not be seen. > > Brian Andrus > > > On 5/10/2021 1:09 PM, Russell Jones wrote: > > Thanks! > > No, we are n

Re: [slurm-users] Slurm interactive job not populating all groups

2021-05-10 Thread Russell Jones
> Are you using nss_slurm? > > *enable_nss_slurm* Permits passwd and group resolution for a job to be > serviced by slurmstepd rather than requiring a lookup from a network based > service. See https://slurm.schedmd.com/nss_slurm.html for more > information. > > That could explain it
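Enabling nss_slurm, as suggested, would involve roughly the following (a sketch based on the nss_slurm documentation; confirm against your Slurm version):

```
# slurm.conf:
LaunchParameters=enable_nss_slurm

# /etc/nsswitch.conf on the compute nodes -- let slurmstepd answer
# passwd/group lookups for the job before falling through:
passwd: slurm files sss
group:  slurm files sss
```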

[slurm-users] Slurm interactive job not populating all groups

2021-05-10 Thread Russell Jones
Hello, We have a few users we are needing to add to the local "video" group of a specific set of compute nodes. When submitting a job, slurm appears to not be populating that local group to their list of groups. For example, an interactive job (srun --pty bash -l) is resulting in a bash shell on t

Re: [slurm-users] X11 forwarding issues

2020-11-17 Thread Russell Jones
:21 AM Patrick Bégou < patrick.be...@legi.grenoble-inp.fr> wrote: > Hi Russell Jones, > > did you try to stop firewall on the client cluster-cn02 ? > > Patrick > > Le 16/11/2020 à 19:20, Russell Jones a écrit : > > Here's some debug logs from the compute node

Re: [slurm-users] X11 forwarding issues

2020-11-16 Thread Russell Jones
r-cn02 [0] mpi_pmix.c:180 [p_mpi_hook_slurmstepd_task] mpi/pmix: Patch environment for task 0 [2020-11-16T12:12:34.289] [30873.0] debug: task_p_pre_launch: 30873.0, task 0 [2020-11-16T12:12:39.475] [30873.extern] error: _x11_socket_read: slurm_open_msg_conn: Connection timed out On Mon, Nov 16, 2020

Re: [slurm-users] X11 forwarding issues

2020-11-16 Thread Russell Jones
Hello, Thanks for the reply! We are using Slurm 20.02.0. On Mon, Nov 16, 2020 at 10:59 AM sathish wrote: > Hi Russell Jones, > > I believe you are using a slurm version older than 19.05. X11 forwarding > code has been revamped and it works as expected starting from the 19.05

[slurm-users] X11 forwarding issues

2020-11-16 Thread Russell Jones
Hi all, Hoping I can get pointed in the right direction here. I have X11 forwarding enabled in Slurm, however I cannot seem to get it working properly. It works when I test with "ssh -Y" to the compute node from the login node, however when I try through Slurm the Display variable looks very diff
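For Slurm's built-in X11 forwarding (19.05 and later, which is the fix suggested downthread), the minimal setup is roughly (a sketch; verify against the documentation for your version):

```
# slurm.conf -- required for the internal X11 forwarding code:
PrologFlags=X11

# From a login node with a working $DISPLAY, request forwarding explicitly:
srun --x11 --pty bash -l
```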