Hi Gilles,
Thank you for your message and your suggestion. As you suggested, I tried:
mpirun -np 84 --hostfile hostsfile --mca routed direct ./openmpi_hello.c
The command hangs with no output or error message until I hit Ctrl+Z.
Then I get the same error message as before.
To answer your question: while checking, I realized that the Master node's
Open MPI version is 4.0.0, while the other nodes (the computational nodes)
have Open MPI version 4.0.1; see the output of "ompi_info" below. Could that
be the issue?
In my "hostsfile" there are 7 nodes. I followed the FAQ instructions, but I am
not sure whether I created the "hostsfile" correctly. Each node in my cluster
has 32 cores, except the Master node.
hostsfile
radonc-phaser01 slots=12
radonc-phaser02 slots=12
radonc-phaser03 slots=12
radonc-phaser04 slots=12
radonc-phaser05 slots=12
radonc-phaser06 slots=12
radonc-phaser07 slots=12
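
One way to sanity-check the hostsfile above is to total its slots and compare the result with the process count passed to -np (84 in the command above). The following is a minimal Python sketch, not an Open MPI tool; the function name parse_hostfile is hypothetical, and treating a host without a slots= field as 0 slots is a simplification of this sketch, not Open MPI's own default.

```python
def parse_hostfile(text):
    """Return {hostname: slots} for a hostfile given as a string.

    Assumption of this sketch: a host listed without slots= is recorded
    with 0 slots (Open MPI applies its own default in that case).
    """
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        slots = 0
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts[fields[0]] = slots
    return hosts


# Hostsfile contents copied from the message above.
HOSTSFILE = """\
radonc-phaser01 slots=12
radonc-phaser02 slots=12
radonc-phaser03 slots=12
radonc-phaser04 slots=12
radonc-phaser05 slots=12
radonc-phaser06 slots=12
radonc-phaser07 slots=12
"""

hosts = parse_hostfile(HOSTSFILE)
total_slots = sum(hosts.values())
requested = 84  # value passed to -np in the mpirun command
print(len(hosts), "hosts,", total_slots, "slots, requested", requested)
```

With 7 hosts at 12 slots each, the total is 84, which matches the -np value, so the slot count itself does not look oversubscribed.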
For all 7 nodes lscpu is the same
lscpu
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
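
The lscpu fields above can be cross-checked against each other: sockets times cores per socket gives the physical core count, and multiplying by threads per core gives the logical CPU count. A minimal Python sketch of that arithmetic follows; the helper name parse_lscpu is hypothetical, and the input is just the excerpt quoted above rather than full lscpu output.

```python
def parse_lscpu(text):
    """Turn 'Key: value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# lscpu excerpt copied from the message above.
LSCPU = """\
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
"""

info = parse_lscpu(LSCPU)
physical_cores = int(info["Socket(s)"]) * int(info["Core(s) per socket"])
logical_cpus = physical_cores * int(info["Thread(s) per core"])
reported_cpus = int(info["CPU(s)"])
print("physical cores:", physical_cores)
print("logical CPUs:", logical_cpus, "reported CPU(s):", reported_cpus)
```

For this excerpt the computed values are 12 physical cores and 12 logical CPUs, which agrees with the reported CPU(s) line and with slots=12 in the hostsfile.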
There is no firewall running. I also configured password-less ssh on each
node (including the Master node). Every node can ssh into every other node
without a password.
ompi@phaser-manager:~$ ompi_info
Package: Open MPI root@phaser-manager Distribution
Open MPI: 4.0.0
Open MPI repo revision: v4.0.0
Open MPI release date: Nov 12, 2018
Open RTE: 4.0.0
Open RTE repo revision: v4.0.0
Open RTE release date: Nov 12, 2018
OPAL: 4.0.0
OPAL repo revision: v4.0.0
OPAL release date: Nov 12, 2018
MPI API: 3.1.0
Ident string: 4.0.0
Prefix: /usr/local/.openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: phaser-manager
Configured by: root
Configured on: Fri Nov 30 15:16:24 PST 2018
Configure host: phaser-manager
Configure command line: '--prefix=/usr/local/.openmpi'
Built by: root
Built on: Fri Nov 30 15:28:26 PST 2018
Built host: phaser-manager
ompi@radonc-phaser01:~$ ompi_info
Package: Open MPI root@radonc-phaser01 Distribution
Open MPI: 4.0.1
Open MPI repo revision: v4.0.1
Open MPI release date: Mar 26, 2019
Open RTE: 4.0.1
Open RTE repo revision: v4.0.1
Open RTE release date: Mar 26, 2019
OPAL: 4.0.1
OPAL repo revision: v4.0.1
OPAL release date: Mar 26, 2019
MPI API: 3.1.0
Ident string: 4.0.1
Prefix: /usr/local/.openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: radonc-phaser01
Configured by: root
Configured on: Sat Apr 13 22:14:28 PDT 2019
Configure host: radonc-phaser01
Configure command line: '--prefix=/usr/local/.openmpi'
Built by: root
Built on: Sat Apr 13 22:32:50 PDT 2019
Built host: radonc-phaser01
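
A version difference like the one in the two ompi_info outputs above can be spotted mechanically by extracting the "Open MPI:" line from each node's output and comparing the values. The following is a minimal Python sketch under the assumption that each node's ompi_info output has already been collected as text (for example over ssh); the function names and the outputs dictionary are hypothetical, and only a few lines of each output are reproduced.

```python
def open_mpi_version(ompi_info_text):
    """Return the value of the 'Open MPI:' line, or None if it is absent."""
    for line in ompi_info_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so 'Open MPI repo revision' etc. are skipped.
        if sep and key.strip() == "Open MPI":
            return value.strip()
    return None


def compare_versions(outputs):
    """Return ({node: version}, True if every node reports the same version)."""
    versions = {node: open_mpi_version(text) for node, text in outputs.items()}
    return versions, len(set(versions.values())) == 1


# Abbreviated ompi_info output from the two nodes shown above.
outputs = {
    "phaser-manager": (
        "Package: Open MPI root@phaser-manager Distribution\n"
        "Open MPI: 4.0.0\n"
        "Open MPI repo revision: v4.0.0\n"
    ),
    "radonc-phaser01": (
        "Package: Open MPI root@radonc-phaser01 Distribution\n"
        "Open MPI: 4.0.1\n"
        "Open MPI repo revision: v4.0.1\n"
    ),
}

versions, consistent = compare_versions(outputs)
for node, version in sorted(versions.items()):
    print(node, version)
print("all nodes match:", consistent)
```

For the two outputs above, the check reports 4.0.0 on phaser-manager and 4.0.1 on radonc-phaser01, so the nodes do not match.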
Thank you again for your help.
Best,
Eric
_____________________________________________________________________________________________________
Eric F. Alemany
System Administrator for Research
IRT
Division of Radiation & Cancer Biology
Department of Radiation Oncology
Stanford University School of Medicine
Stanford, California 94305
Tel: 1-650-498-7969 (No Texting)
Fax: 1-650-723-7382