Okay. Something must have broken between 4.0.x and 4.1.x to give the PBS Pro
ras component priority over alps, even on Cray XC systems.
On 7/11/24, 8:21 AM, "Borchert, Christopher B ERDC-RDE-ITL-MS CIV"
<christopher.b.borch...@erdc.dren.mil> wrote:
That did it! Thanks Howard!
-----Original Message-----
From: Pritchard Jr., Howard
Sent: Thursday, July 11, 2024 9:14 AM
To: Borchert, Christopher B ERDC-RDE-ITL-MS CIV; Open MPI Users
Subject: Re: [EXTERNAL] [OMPI users] Invalid -L flag added to aprun
Okay, try setting this environment variable and see if the mpirun command works:
export OMPI_MCA_ras=alps
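
For anyone following along, here is a minimal sketch of two ways to apply that setting. The environment variable is the one given above; the `--mca ras alps` form is the general MCA command-line syntax and is assumed to behave the same way on this system. `./a.out` is the test binary from earlier in the thread, and the guards just skip the commands where Open MPI or the binary is not present.

```shell
# Option 1: select the alps ras component through the environment,
# as suggested above.
export OMPI_MCA_ras=alps
echo "OMPI_MCA_ras=$OMPI_MCA_ras"

# Optional check: list the ras components this Open MPI build knows about.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep "MCA ras" || true
fi

# Option 2: pass the same MCA parameter on the mpirun command line
# (assumed equivalent to the environment variable).
if command -v mpirun >/dev/null 2>&1 && [ -x ./a.out ]; then
    mpirun --mca ras alps -n 1 ./a.out
fi
```

The environment variable applies to every mpirun in the shell session, while the command-line form only affects that one launch.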
On 7/11/24, 8:10 AM, "Borchert, Christopher B ERDC-RDE-ITL-MS CIV"
<christopher.b.borch...@erdc.dren.mil> wrote:
It’s the same output and the same result:
batch13:~> aprun -n 2 -N 1 hostname
nid00418
nid00419
batch13:~> aprun -n 2 -N 1 -L nid00418,nid00419 hostname
aprun: -L node_list contains an invalid entry
Usage: aprun [global_options] [command_options] cmd1
...
Thanks,
Chris
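
One possible follow-up to the test above, as a rough sketch only: it assumes, which nobody in this thread has confirmed, that aprun's -L option wants numeric node IDs (418,419) rather than nidNNNNN hostnames. The `nid_to_id` helper is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: turn a hostname such as "nid00418" into "418".
nid_to_id() {
    name=${1#nid}                                  # drop the "nid" prefix
    id=$(printf '%s' "$name" | sed 's/^0*//')      # strip leading zeros
    printf '%s\n' "${id:-0}"                       # "nid00000" becomes "0"
}

# Build a comma-separated list from the two hostnames reported above.
node_list="$(nid_to_id nid00418),$(nid_to_id nid00419)"
echo "$node_list"

# Retry the failing command with numeric IDs (assumption: this is the
# format -L accepts). Skipped on machines without aprun.
if command -v aprun >/dev/null 2>&1; then
    aprun -n 2 -N 1 -L "$node_list" hostname
fi
```

If the numeric form is accepted, that would suggest the problem lies in how the node list passed to -L is built rather than in alps itself.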
-----Original Message-----
Hi Chris,
I wonder if something's messed up with the way alps is interpreting node names
on the system.
Could you try doing the following:
1. Get a two-node allocation on your cluster.
2. Run aprun -n 2 -N 1 hostname
3. Take the hostnames returned, then run aprun -n 2 -N 1 -L X,Y hostname
Where X=
Thanks Howard. Here is what I got.
batch35:/p/work/borchert> mpirun -n 1 -d ./a.out
[batch35:62735] procdir: /p/work/borchert/ompi.batch35.34110/pid.62735/0/0
[batch35:62735] jobdir: /p/work/borchert/ompi.batch35.34110/pid.62735/0
[batch35:62735] top: /p/work/borchert/ompi.batch35.34110/pid.62735