[slurm-users] question about hyperthreaded CPUs, --hint=nomultithread and multi-jobstep jobs

2023-05-23 Thread Hans van Schoot
Hi all, I am getting some unexpected behavior with SLURM on a multithreaded CPU (AMD Ryzen 7950X), in combination with a job that uses multiple jobsteps and a program that prefers to run without hyperthreading. My job consists of a simple shell script that does multiple srun executions, and

Re: [slurm-users] cgroups issue

2023-05-23 Thread Alan Orth
Dear Boris, Do you really mean Ubuntu 14.04? I doubt that will work with modern SLURM cgroups, even v1... Regards,
On Tue, Mar 14, 2023 at 5:28 PM Boris Yazlovitsky wrote:
> I sent this a while ago - don't know if it got to the mailing list:
>
> I'm running slurm 23.02.0 on ubuntu 14.04
> when

Re: [slurm-users] Slurmd enabled crash with CgroupV2

2023-05-23 Thread Alan Orth
I notice the exact same behavior as Tristan. My CentOS Stream 8 system is in full unified cgroupv2 mode, the slurmd.service has a "Delegate=Yes" override added to it, and all cgroup stuff is added to slurm.conf and cgroup.conf, yet slurmd does not start after reboot. I don't understand what is happ

[slurm-users] Weird scheduling behaviour

2023-05-23 Thread Badorreck, Holger
Hello, I observe weird behaviour in my SLURM installation (23.02.2). Some tasks take several hours to be scheduled (probably on one specific node); the pending state reason is "Resources", although resources are free. I have tested a bit and get this weird behaviour with the salloc command: "sal

Re: [slurm-users] [EXTERNAL] Re: Question about PMIX ERROR messages being emitted by some child of srun process

2023-05-23 Thread Pritchard Jr., Howard
Thanks Christopher, This doesn't seem to be related to Open MPI at all except that for our 5.0.0 and newer one has to use PMix to talk to the job launcher. I built MPICH 4.1 on Perlmutter using the --with-pmix option and see a similar message from srun --mpi=pmix hpp@nid008589:~/ompi/examples>

Re: [slurm-users] [EXTERNAL] Re: Question about PMIX ERROR messages being emitted by some child of srun process

2023-05-23 Thread Christopher Samuel
On 5/23/23 10:33 am, Pritchard Jr., Howard wrote:
> Thanks Christopher,
No worries!
> This doesn't seem to be related to Open MPI at all except that for our 5.0.0 and newer one has to use PMix to talk to the job launcher. I built MPICH 4.1 on Perlmutter using the --with-pmix option and see a si

[slurm-users] Task launch failure on cloud nodes (Address family '0' not supported)

2023-05-23 Thread Weaver, Christopher
I'm working on setting up a cloud partition, and running into some communications problems between my nodes. This looks like something I have misconfigured, or information I haven't correctly supplied to slurm, but the low-level nature of the error has made it hard for me to figure out what I've