[slurm-users] How to clean up?

2025-02-03 Thread Steven Jones via slurm-users
From the logs, 2 errors, 8><---
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: Starting Slurm controller daemon...
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]: slurmctld: error: chdir(/var/log): Permission denied
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slur
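
A likely culprit is that the directory slurmctld tries to chdir into is not accessible to the Slurm user. A minimal sketch of the checks, assuming the common /etc/slurm/slurm.conf location, a "slurm" service user, and a /var/log/slurm log directory (all site-specific assumptions):

    # Which user slurmctld runs as, and where it is told to log:
    grep -Ei 'slurmuser|slurmctldlogfile' /etc/slurm/slurm.conf
    # Confirm that user can actually write there (path is an assumption):
    sudo -u slurm test -w /var/log/slurm && echo writable || echo not writable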

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Steven Jones via slurm-users
No,
[root@node5 log]# ls -la /etc/pam.d/*slurm*
ls: cannot access '/etc/pam.d/*slurm*': No such file or directory
Slurm is installed,
[root@node5 log]# rpm -qi slurm
Name        : slurm
Version     : 22.05.9
Release     : 1.el9
Architecture: x86_64
Install Date: Thu Dec 12 21:02:12 2024
Group
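
For cross-checking versions across nodes (the thread elsewhere contrasts 22.05.9 with 24.11.1), a one-liner that lists every installed Slurm package:

    rpm -qa 'slurm*' | sort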

[slurm-users] Re: Installing slurm*

2025-02-03 Thread Marko Markoc via slurm-users
Hi Steven,

You can find the list of packages to install based on the node role here: https://slurm.schedmd.com/quickstart_admin.html#pkg_install

Thanks,
Marko

On Mon, Feb 3, 2025 at 3:51 PM Steven Jones via slurm-users <slurm-users@lists.schedmd.com> wrote:
> Hi,
>
> After rpmbuilding slurm,
>
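
As a hedged illustration of that role split, using the package file names from Steven's rpmbuild output (run from the directory holding the freshly built RPMs; the exact set your site needs is in the guide's table):

    # Controller host: base package plus slurmctld
    dnf install ./slurm-24.11.1-1.el9.x86_64.rpm ./slurm-slurmctld-24.11.1-1.el9.x86_64.rpm
    # Compute hosts: base package plus slurmd
    dnf install ./slurm-24.11.1-1.el9.x86_64.rpm ./slurm-slurmd-24.11.1-1.el9.x86_64.rpm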

[slurm-users] Installing slurm*

2025-02-03 Thread Steven Jones via slurm-users
Hi,

After rpmbuilding slurm, do I need to install all of these, or just slurm-slurmctld-24.11.1-1.el9.x86_64.rpm on the controller and slurm-slurmd-24.11.1-1.el9.x86_64.rpm on the compute nodes?

-rw-r--r--. 1 root root 18508016 Feb 3 23:46 slurm-24.11.1-1.el9.x86_64.rpm
-rw-r--r--. 1 root roo

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Sean Crosby via slurm-users
Just double checking. Can you check on your worker node:

1. ls -la /etc/pam.d/*slurm* (just checking if there's a specific pam file for slurmd on your system)
2. scontrol show config | grep -i SlurmdUser (checking if slurmd is set up with a different user to SlurmUser)
3. grep slurm /e
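
The third command is cut off above; the first two are runnable exactly as listed:

    ls -la /etc/pam.d/*slurm*
    scontrol show config | grep -i SlurmdUser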

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Steven Jones via slurm-users
I rebuilt 4 nodes as rocky9.5, 8><---
[2025-02-03T21:40:11.978] Node node6 now responding
[2025-02-03T21:41:15.698] _slurm_rpc_submit_batch_job: JobId=17 InitPrio=4294901759 usec=501
[2025-02-03T21:41:16.055] sched: Allocate JobId=17 NodeList=node6 #CPUs=1 Partition=debug
[2025-02-03T21:41:16.059

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Christopher Samuel via slurm-users
On 2/3/25 2:33 pm, Steven Jones via slurm-users wrote:
> Just built 4 x rocky9 nodes and I do not get that error (but I get another I know how to fix, I think) so holistically I am thinking the version difference is too large.

Oh I think I missed this - when you say version difference do you m

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Renfro, Michael via slurm-users
We only do isolated on the students’ VirtualBox setups because it’s simpler for them to get started with. Our production HPC with OpenHPC is definitely integrated with our Active Directory (directly via sssd, not with an intermediate product), etc. Not everyone does it that way, but our scale is
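
For anyone comparing notes, a quick way to confirm a direct sssd/AD join on a node (assumes realmd manages the join; the domain and account are placeholders):

    realm list                # shows joined domains and the client software (sssd)
    id 'EXAMPLE\ad-user'      # resolves an AD account through sssd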

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Steven Jones via slurm-users
Hi, Thanks, but isolated isn't the goal in my case. The goal is to save admin time we can't afford and to have a far-reaching setup. So I have to link the HPC to IPA/IdM and on to AD in a trust; that way user admins can just drop a student or staff member into an AD group and the job is done. That a
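
A hedged sketch of the IdM-plus-AD-trust flow described above (group and domain names are hypothetical; assumes the cross-forest trust is already established):

    # External group that can hold AD members
    ipa group-add --external ad-hpc-ext
    ipa group-add-member ad-hpc-ext --external 'ADDOMAIN\hpc-students'
    # POSIX group visible to sssd on the HPC nodes
    ipa group-add hpc-users
    ipa group-add-member hpc-users --groups ad-hpc-ext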

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Renfro, Michael via slurm-users
Late to the party here, but depending on how much time you have invested, how much you can tolerate reformats or other more destructive work, etc., you might consider OpenHPC and its install guide ([1] for RHEL 8 variants, [2] or [3] for RHEL 9 variants, depending on which version of Warewulf yo

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-03 Thread Steven Jones via slurm-users
Slurm.conf is copied between nodes. Just built 4 x rocky9 nodes and I do not get that error (but I get another I know how to fix, I think), so holistically I am thinking the version difference is too large.

regards
Steven

From: Chris Samuel via slurm-users
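
To pin down how large the version gap actually is, the daemons report their versions directly (standard flags):

    slurmctld -V    # on the controller
    slurmd -V       # on each compute node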

[slurm-users] Re: Assistance with Node Restrictions and Priority for Users in Floating Partition

2025-02-03 Thread Bjørn-Helge Mevik via slurm-users
Manisha Yadav writes:
> Could you please confirm if my setup is correct, or if any modifications are required on my end?

I don't see anything wrong with the part of the setup that you've shown. Have you checked with `sprio -l -j ` whether the jobs get the extra qos priority? If not, perhaps
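
For example, with a hypothetical job ID of 12345:

    sprio -l -j 12345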