[OMPI users] Slurm or OpenMPI error?

2024-07-01 Thread Mccall, Kurt E. (MSFC-EV41) via users
Using OpenMPI 5.0.3 and Slurm 20.11.8. Is this error message issued by Slurm or by OpenMPI? A Google search on the error message yielded nothing. -- At least one of the requested hosts is not included in the current a

Re: [OMPI users] [EXTERNAL] Slurm or OpenMPI error?

2024-07-01 Thread Pritchard Jr., Howard via users
Hello Kurt, The host name looks a little odd. Do you by chance have a reproducer and instructions on how you’re running it that we could try? Howard From: users on behalf of "Mccall, Kurt E. (MSFC-EV41) via users" Reply-To: Open MPI Users Date: Monday, July 1, 2024 at 9:36 AM To: "OpenMpi

[OMPI users] Slurm or OpenMPI error?

2024-07-01 Thread Mccall, Kurt E. (MSFC-EV41) via users
Howard, I don’t know where that ^X following the hostname came from. The node is definitely named n001. I will try to create a reproducer. Thanks, Kurt From: Pritchard Jr., Howard Sent: Monday, July 1, 2024 11:03 AM To: Open MPI Users Cc: Mccall, Kurt E. (MSFC-EV41) Subject: Re: [EXTERN
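
A stray character like the ^X mentioned above can be located by scanning the host name for non-printable bytes. The following is a minimal Python sketch, not something taken from the thread: it assumes the ^X is caret notation for the ASCII control character 0x18, and the function name `find_control_chars` is hypothetical (it is not part of Open MPI or Slurm).

```python
def find_control_chars(hostname):
    """Return (index, caret-notation) pairs for ASCII control characters.

    Hypothetical helper for illustration only; not an Open MPI or Slurm API.
    """
    hits = []
    for i, ch in enumerate(hostname):
        code = ord(ch)
        if code < 0x20 or code == 0x7F:
            # Caret notation flips bit 6: 0x18 -> "^X", 0x7F -> "^?"
            hits.append((i, "^" + chr(code ^ 0x40)))
    return hits


# Assumed example: the node name "n001" followed by a CAN byte (0x18).
print(find_control_chars("n001\x18"))  # [(4, '^X')]
print(find_control_chars("n001"))      # []
```

Running the same check over each entry of whatever host list is passed to the launcher would show whether the extra byte is present in the input or only in the printed error message.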

[OMPI users] Slurm or OpenMPI error?

2024-07-01 Thread Mccall, Kurt E. (MSFC-EV41) via users
Howard, I should note that this code ran fine up to the point that our sysadmins updated something on the cluster. That makes me think it is a configuration issue, and that it wouldn’t give you any insight if you ran my reproducer. It would succeed for you and still fail for me. What do you t

[OMPI users] Invalid -L flag added to aprun

2024-07-01 Thread Borchert, Christopher B ERDC-RDE-ITL-MS CIV via users
On a Cray XC (requiring aprun launcher to get from batch node to compute node), 4.0.5 works but 4.1.1 and 4.1.6 do not (even on a single node). The newer ones throw this: -- An ORTE daemon has unexpectedly failed after launch a

Re: [OMPI users] [EXTERNAL] Invalid -L flag added to aprun

2024-07-01 Thread Pritchard Jr., Howard via users
Hi Christoph, First a big caveat and disclaimer. I'm not sure if any Open MPI developers have access any longer to Cray XC systems, so all I can do is make suggestions. What's probably happening is orte is thinking it is going to fork off the application processes on the head node itself. Tha