We only do isolated setups on the students’ VirtualBox environments because it’s 
simpler for them to get started with. Our production HPC with OpenHPC is 
definitely integrated with our Active Directory (directly via sssd, not through 
an intermediate product), etc. Not everyone does it that way, but our scale is 
small enough that we’ve never had a load or other performance issue with our AD.
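
For reference, a direct AD integration via sssd typically boils down to a small config like the sketch below. This is only an illustration: the domain name is a placeholder, and in practice `realm join` generates most of this file for you.

```ini
# /etc/sssd/sssd.conf -- minimal sketch for direct AD integration
# (example.edu is a placeholder domain; adjust to your forest)
[sssd]
domains = example.edu
config_file_version = 2
services = nss, pam

[domain/example.edu]
id_provider = ad
access_provider = ad
# Consistent UIDs across all nodes matter for Slurm. Either rely on
# SSSD's deterministic ID mapping on every node (below), or set this
# to False and store POSIX uidNumber/gidNumber attributes in AD.
ldap_id_mapping = True
```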

From: Steven Jones <steven.jo...@vuw.ac.nz>
Date: Monday, February 3, 2025 at 2:14 PM
To: Renfro, Michael <ren...@tntech.edu>, slurm-users@lists.schedmd.com 
<slurm-users@lists.schedmd.com>, Chris Samuel <ch...@csamuel.org>
Subject: Re: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

External Email Warning

This email originated from outside the university. Please use caution when 
opening attachments, clicking links, or responding to requests.

________________________________
Hi,

Thanks, but isolated isn't the goal in my case. The goal is to save admin time 
we can't afford and to have a far-reaching setup.

So I have to link the HPC to IPA/IdM and on to AD via a trust; that way user 
admins can just drop a student or staff member into an AD group and the job is 
done. That also means we can use Globus to transfer large volumes of data 
globally in and out of the HPC.

I have kept your notes, as they look interesting for "extras" like GPUs, which 
I have not looked at making work yet. If I can get the basics going, then I'll 
look at the icing.


regards

Steven

________________________________
From: Renfro, Michael <ren...@tntech.edu>
Sent: Tuesday, 4 February 2025 8:51 am
To: Steven Jones <steven.jo...@vuw.ac.nz>; slurm-users@lists.schedmd.com 
<slurm-users@lists.schedmd.com>; Chris Samuel <ch...@csamuel.org>
Subject: Re: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld


Late to the party here, but depending on how much time you have invested, how 
much you can tolerate reformats or other more destructive work, etc., you might 
consider OpenHPC and its install guide ([1] for RHEL 8 variants, [2] or [3] for 
RHEL 9 variants, depending on which version of Warewulf you prefer). I’ve also 
got some workshop materials on building login nodes, GPU drivers, stateful 
provisioning, etc. for OpenHPC 3 and Warewulf 3 at [4].



At least in an isolated VirtualBox environment with no outside IdP or other 
dependencies, my student workers have usually been able to get their first 
batch job running within a day.



[1] 
https://github.com/openhpc/ohpc/releases/download/v2.9.GA/Install_guide-Rocky8-Warewulf-SLURM-2.9-x86_64.pdf

[2] 
https://github.com/openhpc/ohpc/releases/download/v3.2.GA/Install_guide-Rocky9-Warewulf-SLURM-3.2-x86_64.pdf

[3] 
https://github.com/openhpc/ohpc/releases/download/v3.2.GA/Install_guide-Rocky9-Warewulf4-SLURM-3.2-x86_64.pdf

[4] 
https://github.com/mikerenfro/openhpc-beyond-the-install-guide/blob/main/ohpc-btig-pearc24-handouts.pdf



From: Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
Date: Sunday, February 2, 2025 at 5:48 PM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>, Chris Samuel 
<ch...@csamuel.org>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld


________________________________

Hi,

I have never done an HPC before; it is all new to me, so I may be making 
"newbie errors". The old HPC has been dumped on us, so I am trying to build it 
"professionally", shall we say, i.e. documented and stable, and I will train 
people to build it (all this with no money at all).

My understanding is that I log in as a normal user and run a job, and this 
worked for me last time. It is possible I have missed something:

[xxxjone...@xxx.ac.nz@xxxunicoslurmd1 ~]$ cat testjob.sh
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=debug
#SBATCH --time=00:10:00
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

echo "Hello World"
echo "Hello Error" 1>&2



This worked on a previous setup; the outputs were in my home directory on the 
NFS server, as expected.



regards

Steven

________________________________

From: Chris Samuel via slurm-users <slurm-users@lists.schedmd.com>
Sent: Monday, 3 February 2025 11:59 am
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld



On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote:

> [2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002

Looking at the code that seems to come from this code:

	if (!_slurm_authorized_user(msg->auth_uid)) {
		error("Security violation, batch launch RPC from uid %u",
		      msg->auth_uid);
		rc = ESLURM_USER_ID_MISSING;	/* or bad in this case */
		goto done;
	}


and what it is calling is:

/*
 * Returns true if "uid" is a "slurm authorized user" - i.e. uid == 0
 * or uid == slurm user id at this time.
 */
static bool
_slurm_authorized_user(uid_t uid)
{
	return ((uid == (uid_t) 0) || (uid == slurm_conf.slurm_user_id));
}


Is it possible you're trying to run Slurm as a user other than root or
the user designated as the "SlurmUser" in your config?

Also check that whoever you have set as the SlurmUser has the same UID
everywhere (in fact, every user should).
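
One way to sanity-check that advice: gather the SlurmUser's passwd entry from every node (e.g. with `pdsh -a getent passwd slurm`) and compare the UID fields. The helper below is a hypothetical sketch of that comparison, not part of Slurm itself; hostnames and passwd lines are made up.

```python
def check_uid_consistency(entries):
    """entries: dict mapping hostname -> passwd line ('name:x:uid:gid:...').
    Returns (consistent, {hostname: uid}) so mismatched nodes are easy to spot."""
    uids = {host: int(line.split(":")[2]) for host, line in entries.items()}
    return len(set(uids.values())) == 1, uids

# Example with made-up hosts: a mismatched UID on one node is exactly the
# situation that triggers the "Security violation ... RPC from uid" error.
ok, uids = check_uid_consistency({
    "slurmctld1": "slurm:x:982:982::/var/lib/slurm:/sbin/nologin",
    "node01":     "slurm:x:982:982::/var/lib/slurm:/sbin/nologin",
    "node02":     "slurm:x:1001:1001::/var/lib/slurm:/sbin/nologin",
})
print(ok, uids)  # ok is False here because node02 disagrees
```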

All the best,
Chris

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com