Late to the party here, but depending on how much time you have invested, how 
much you can tolerate reformats or other more destructive work, etc., you might 
consider OpenHPC and its install guide ([1] for RHEL 8 variants, [2] or [3] for 
RHEL 9 variants, depending on which version of Warewulf you prefer). I’ve also 
got some workshop materials on building login nodes, GPU drivers, stateful 
provisioning, etc. for OpenHPC 3 and Warewulf 3 at [4].

At least in an isolated VirtualBox environment with no outside IdP or other 
dependencies, my student workers have usually been able to get their first 
batch job running within a day.

[1] https://github.com/openhpc/ohpc/releases/download/v2.9.GA/Install_guide-Rocky8-Warewulf-SLURM-2.9-x86_64.pdf
[2] https://github.com/openhpc/ohpc/releases/download/v3.2.GA/Install_guide-Rocky9-Warewulf-SLURM-3.2-x86_64.pdf
[3] https://github.com/openhpc/ohpc/releases/download/v3.2.GA/Install_guide-Rocky9-Warewulf4-SLURM-3.2-x86_64.pdf
[4] https://github.com/mikerenfro/openhpc-beyond-the-install-guide/blob/main/ohpc-btig-pearc24-handouts.pdf

From: Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
Date: Sunday, February 2, 2025 at 5:48 PM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>, Chris Samuel 
<ch...@csamuel.org>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

________________________________
Hi,

I have never built an HPC cluster before; it is all new to me, so I may be
making "newbie errors". The old HPC has been dumped on us, so I am trying to
build it "professionally", shall we say, i.e. documented and stable, and I will
train people to build it (all this with no money at all).

My understanding is that I log in as a normal user and run a job, and this
worked for me last time. It is possible I have missed something.

[xxxjone...@xxx.ac.nz@xxxunicoslurmd1 ~]$ cat testjob.sh
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=debug
#SBATCH --time=00:10:00
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

echo "Hello World"
echo "Hello Error" 1>&2

This worked on a previous setup; the outputs were in my home directory on the
NFS server, as expected.
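For reference (not part of the original message): in --output and --error, %x expands to the job name and %j to the job ID, and relative paths land in the directory the job was submitted from, so this job should leave test_<jobid>.out and test_<jobid>.err there. A minimal sketch, assuming bash and the Slurm client tools on the login node; expand_slurm_pattern is a hypothetical helper that handles only %x and %j, not every sbatch filename pattern.

```shell
#!/bin/bash
# Sketch: expand the %x (job name) and %j (job ID) filename patterns used by
# testjob.sh, then submit it and print both output files.

# expand_slurm_pattern PATTERN JOBNAME JOBID  (hypothetical helper; %x and %j only)
expand_slurm_pattern() {
  local pattern=$1 name=$2 jobid=$3
  pattern=${pattern//\%x/$name}
  pattern=${pattern//\%j/$jobid}
  printf '%s\n' "$pattern"
}

# Only submit when run directly, sbatch is installed, and the job script exists.
if [[ ${BASH_SOURCE[0]} == "$0" ]] && command -v sbatch >/dev/null && [[ -f testjob.sh ]]; then
  # --parsable prints just the job ID; --wait returns only after the job ends.
  jobid=$(sbatch --parsable --wait testjob.sh)
  jobid=${jobid%%;*}   # drop a ";cluster" suffix if one is printed
  # "test" matches --job-name=test in testjob.sh.
  cat "$(expand_slurm_pattern '%x_%j.out' test "$jobid")"   # expect: Hello World
  cat "$(expand_slurm_pattern '%x_%j.err' test "$jobid")"   # expect: Hello Error
fi
```

If the files do not appear where expected, checking the job's working directory with `scontrol show job <jobid>` while it is still known to the controller is a reasonable next step.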

regards

Steven

________________________________
From: Chris Samuel via slurm-users <slurm-users@lists.schedmd.com>
Sent: Monday, 3 February 2025 11:59 am
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote:

> [2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002

Looking at the source, that error seems to come from a check like this:

         if (!_slurm_authorized_user(msg->auth_uid)) {
                 error("Security violation, batch launch RPC from uid %u",
                       msg->auth_uid);
                 rc = ESLURM_USER_ID_MISSING;  /* or bad in this case */
                 goto done;
         }


and what it is calling is:

/*
  *  Returns true if "uid" is a "slurm authorized user" - i.e. uid == 0
  *   or uid == slurm user id at this time.
  */
static bool
_slurm_authorized_user(uid_t uid)
{
         return ((uid == (uid_t) 0) || (uid == slurm_conf.slurm_user_id));
}


Is it possible you're trying to run Slurm as a user other than root or
the user designated as the "SlurmUser" in your config?
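One way to check that on a given node, sketched under assumptions: the script below mirrors the uid test in _slurm_authorized_user() in bash, reads SlurmUser from /etc/slurm/slurm.conf (the path and the "SlurmUser=<name>" form are assumptions; Slurm uses root when SlurmUser is unset), and lists which accounts slurmctld and slurmd are actually running as. The helper names are hypothetical, and uid 12002 is simply the value from the log line quoted above.

```shell
#!/bin/bash
# Sketch only: mirrors the uid test in _slurm_authorized_user().
# Assumption: slurm.conf is at /etc/slurm/slurm.conf with "SlurmUser=<name>".

# Succeeds when uid is 0 (root) or equals the SlurmUser's uid.
is_slurm_authorized() {
  local uid=$1 slurm_uid=$2
  [[ $uid -eq 0 || $uid -eq $slurm_uid ]]
}

# Prints the SlurmUser name from a slurm.conf file, or root if it is not set.
# Keywords are matched case-insensitively, trailing comments are dropped,
# and the last setting wins.
slurm_user_from_conf() {
  local conf=${1:-/etc/slurm/slurm.conf} user
  user=$(awk -F= '{ k = tolower($1); gsub(/[ \t]/, "", k) }
                  k == "slurmuser" { v = $2; sub(/#.*/, "", v); gsub(/[ \t]/, "", v); print v }' \
         "$conf" 2>/dev/null | tail -n 1)
  printf '%s\n' "${user:-root}"
}

# Only run the live checks when executed directly on a node that has slurm.conf.
if [[ ${BASH_SOURCE[0]} == "$0" && -r /etc/slurm/slurm.conf ]]; then
  slurm_user=$(slurm_user_from_conf)
  if ! slurm_uid=$(id -u "$slurm_user" 2>/dev/null); then
    echo "SlurmUser '$slurm_user' does not exist on $(hostname)" >&2
    exit 1
  fi
  echo "SlurmUser on $(hostname): $slurm_user (uid $slurm_uid)"
  # Which accounts are the daemons actually running as on this host?
  ps -o user=,comm= -C slurmctld,slurmd
  # 12002 is the uid from the "Security violation" log line in this thread.
  if is_slurm_authorized 12002 "$slurm_uid"; then
    echo "uid 12002 would pass this check here"
  else
    echo "uid 12002 would be rejected here"
  fi
fi
```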

Also check that whoever you have set as the SlurmUser has the same UID
everywhere (in fact, every user should).
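A rough way to compare that UID across the cluster, assuming passwordless ssh to every node and a SlurmUser account literally named slurm (both assumptions; adjust to your configuration). find_uid_mismatches is a hypothetical helper that reads "host uid" lines and flags any host that disagrees with the first host that answered.

```shell
#!/bin/bash
# Sketch: report hosts whose SlurmUser UID differs from the first host listed.
# Input on stdin: one "host uid" pair per line (uid empty if the lookup failed).
find_uid_mismatches() {
  awk '$2 == ""  { printf "%s: could not read uid\n", $1; next }
       ref == "" { ref = $2; refhost = $1; next }
       $2 != ref { printf "%s has uid %s (%s has %s)\n", $1, $2, refhost, ref }'
}

# Live check: pass host names as arguments; the account name "slurm" is assumed.
if [[ ${BASH_SOURCE[0]} == "$0" && $# -gt 0 ]]; then
  for h in "$@"; do
    printf '%s %s\n' "$h" "$(ssh "$h" id -u slurm 2>/dev/null)"
  done | find_uid_mismatches
fi
```

Run it with your own host names as arguments (for example, a head node and a couple of compute nodes); no output means every host that answered reported the same UID.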

All the best,
Chris

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com