Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

2023-03-27 Thread Pritchard Jr., Howard
Hi Craig, It's not essential to use the pmix lib used to build the SLURM pmix plugin, but it does reduce the likelihood of problems. I don’t know how, but there is some way that the admin installing SLURM can “name” the available pmix --mpi options. For instance on one of our systems, the admin has bui

Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

2023-03-27 Thread Craig
conf.log... checking if user requested PMI support result: no checking if user requested internal PMIx support(yes) result: no checking for pmix.h in /usr result: not found checking for pmix.h in /usr/include result: not found WARNING: discovered external PMIx version

Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

2023-03-27 Thread Pritchard Jr., Howard
Hi Craig, Your use of --with-pmix on the Open MPI configure line is important. Without any args to this configure option, Open MPI configure will first check if there’s an external pmix which is newer than the one that is included in the Open MPI release tarball. If it is not, the internal

Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

2023-03-27 Thread Craig
srun: MPI types are... srun: none srun: openmpi srun: pmix_v3 srun: pmi2 srun: pmix but I'm not sure that tells me much about how I am supposed to be building OpenMPI? On 3/27/23 14:41, Pritchard Jr., Howard wrote: Hi Craig, If you run srun --mpi=list what does slurm report? That will he
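For anyone reading along, here is a small shell sketch, assuming output shaped like the list quoted above, that pulls out the PMIx entries that could be passed to --mpi. The function name list_pmix_plugins is made up for illustration and is not part of Slurm.

```shell
# Hypothetical helper (not a Slurm command): reads `srun --mpi=list`
# style output on stdin and prints only the PMIx plugin names.
list_pmix_plugins() {
    # Strip an optional "srun:" prefix and leading whitespace,
    # then keep lines that are exactly "pmix" or "pmix_vN".
    sed -E 's/^(srun:)?[[:space:]]*//' | grep -E '^pmix(_v[0-9]+)?$'
}
```

It could be used as `srun --mpi=list 2>&1 | list_pmix_plugins`; the `2>&1` is there on the assumption that srun may print the list on stderr.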

Re: [slurm-users] [EXTERNAL] OpenMPI and Slurm clarification?

2023-03-27 Thread Pritchard Jr., Howard
Hi Craig, If you run srun --mpi=list what does slurm report? That will help in determining what argument you want to supply for the --mpi srun option. Howard

Re: [slurm-users] [External] Power saving method selection for different kinds of hardware

2023-03-27 Thread Ole Holm Nielsen
Hi Prentice, Since the last message I figured out a way to implement power_save: I've documented our setup in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving This page contains a link to power_save scripts on GitHub. Best r

[slurm-users] OpenMPI and Slurm clarification?

2023-03-27 Thread Craig
Can someone please clarify the "best practices" for building OpenMPI compatible with Slurm? https://slurm.schedmd.com/mpi_guide.html#open_mpi tells me what I _can_ do but I'm unclear as to what I _should_ do. I've built OpenMPI 4.1.5 with: --with-pmix --with-libevent=internal --with-hwl

Re: [slurm-users] [External] Power saving method selection for different kinds of hardware

2023-03-27 Thread Prentice Bisbal
I'm just catching up on old mailing list messages now. Why not make your SuspendProgram and ResumeProgram shell scripts that look at some node information in Slurm (look at the features as in your example) or some other source (use a case statement based on node names) and call the correct
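A minimal sketch of the case-statement idea, assuming Slurm passes the nodes to suspend as a hostlist expression in the first argument. The node-name patterns and method names (ipmi, cloud-api, ssh-poweroff) are invented for illustration and are not from this thread.

```shell
#!/bin/bash
# Hypothetical SuspendProgram dispatcher; patterns and method names
# below are illustrative assumptions, not a real site configuration.

# Map a node name to the power-off method that should handle it.
pick_method() {
    case "$1" in
        gpu*)   echo "ipmi" ;;
        cloud*) echo "cloud-api" ;;
        *)      echo "ssh-poweroff" ;;
    esac
}

# Only act when called with a hostlist argument, as Slurm would do.
if [ "$#" -gt 0 ]; then
    # scontrol expands e.g. "gpu[01-04]" into one hostname per line.
    for node in $(scontrol show hostnames "$1"); do
        method=$(pick_method "$node")
        # Placeholder: a real script would call the matching tool here.
        echo "would suspend $node via $method"
    done
fi
```

A ResumeProgram could follow the same pattern with the power-on counterpart of each method.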

Re: [slurm-users] nodes lingering in completion

2023-03-27 Thread Henderson, Brent
Sorry, William, for taking so long to reply (almost exactly a year!). Your note was sent to my spam folder, and I lost access to that cluster, so it became less of a concern. I recently got access to another system and had the same issue even with a local epilog with just /bin/true in it. Thi

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Ole Holm Nielsen
Hi Thomas, FYI: Slurm power_save works very well for us without the issues that you describe below. We run Slurm 22.05.8; what's your version? I've documented our setup in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving T

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Dr. Thomas Orgis
On Mon, 06 Mar 2023 13:35:38 +0100, Stefan Staeglich wrote: > But this fixed not the main error but might have reduced the frequency of > occurring. Has someone observed similar issues? We will try a higher > SuspendTimeout. We had issues with power saving. We powered the idle nodes off, caus

Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-03-27 Thread Laurence Field
Hi Ümit, Thanks for the reply. Yes, it looks like this is the issue. The master branch suggests that claim_field can also be used, but this is not in the version we have deployed. Cheers, Laurence On 24.03.23 16:51, Ümit Seren wrote: Looks like you are missing the userna