[OMPI users] Running mpirun from a "worker" node on two nodes.

2019-10-23 Thread Eric F. Alemany via users
Good morning,

Sorry if the subject line is not very clear. I hope someone can answer my
question. I have two nodes.

Node 1 - radoncjonsnow: 64 cores, runs Ubuntu 18.04, OpenMPI-4.0.2, NFS,
password-less ssh to node 2.
Node 2 - radonc-phaser11: 12 cores, runs Ubuntu 18.04, OpenMPI-4.0.2, NFS,
password-less ssh to node 1.

I created a --hostfile called “hostsfile”:
radoncjonsnow slots=64
radonc-phaser11 slots=12

When I run   mpirun -np 64 mpi_helloJ   on radoncjonsnow, all goes as expected:


egs@radoncjonsnow:~$ cat hostsfile
radoncjonsnow slots=64
radonc-phaser11 slots=12
egs@radoncjonsnow:~$ mpirun -np 64  mpi_helloJ
Hello from processor 3 of 64
Hello from processor 9 of 64
Hello from processor 19 of 64
Hello from processor 26 of 64
Hello from processor 28 of 64

Etc…..
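
(For context, a minimal MPI program along the following lines would produce that
kind of output; this is a sketch only, not necessarily the exact source of
mpi_helloJ. It could be built with something like: mpicc mpi_hello.c -o mpi_helloJ)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from processor %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}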


When I edit my --hostfile to comment out radoncjonsnow and run   mpirun --hostfile
hostsfile -np 12 mpi_helloJ   all goes as expected: mpirun uses the 12
cores of radonc-phaser11.

egs@radoncjonsnow:~$ sudo cat hostsfile
#radoncjonsnow slots=64
radonc-phaser11 slots=12
egs@radoncjonsnow:~$ mpirun --hostfile hostsfile -np 12 mpi_helloJ
Hello from processor 2 of 12
Hello from processor 6 of 12
Hello from processor 10 of 12
Hello from processor 3 of 12
Hello from processor 5 of 12
Hello from processor 8 of 12
Hello from processor 11 of 12
Hello from processor 1 of 12
Hello from processor 7 of 12
Hello from processor 9 of 12
Hello from processor 0 of 12
Hello from processor 4 of 12


BUT when I edit my --hostfile to include both nodes and run mpirun with 76 cores,
it gives the following error message.


egs@radoncjonsnow:~$ mpirun --hostfile hostsfile -np 76 mpi_helloJ
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 76
slots that were requested by the application:

  mpi_helloJ

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
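
(Side note: if I read option 2 above correctly, the same request could also be
expressed with --host instead of a hostfile, something like the sketch below; I
have not tried this form myself.)

egs@radoncjonsnow:~$ mpirun --host radoncjonsnow:64,radonc-phaser11:12 -np 76 mpi_helloJ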


How can I run mpirun from radoncjonsnow and allocate 76 cores (64 cores from
radoncjonsnow and 12 from radonc-phaser11)?

In the past I was successful in creating a “master” node and several “slave”
nodes; running mpirun from the master node successfully launched processes on
all the cores of the “slave” nodes. This time I want the “master” node to
utilize its 64 cores as well.
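
In other words, what I am after is something like the sketch below (assuming
both hostfile lines are uncommented and mpi_helloJ sits at the same NFS path on
both nodes):

egs@radoncjonsnow:~$ cat hostsfile
radoncjonsnow slots=64
radonc-phaser11 slots=12
egs@radoncjonsnow:~$ mpirun --hostfile hostsfile -np 76 mpi_helloJ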


Thank you in advance for your help.

Best,
Eric

Eric F.  Alemany
Systems Administrator for Research
EXO Extended Operations

Stanford Medicine - Technology & Digital Services
Stanford, California 94305


[OMPI users] Open MPI State of the Union BOF at SC'19

2019-10-23 Thread Jeff Squyres (jsquyres) via users
Be sure to come to the Open MPI State of the Union BOF at SC'19 next month!

As usual, we'll discuss the current status and future roadmap for Open MPI, 
answer questions, and generally be available for discussion.

The BOF will be in the Wednesday noon hour: 
https://sc19.supercomputing.org/session/?sess=sess296

The BOF is not live streamed, but the slides will be available after SC.

We only have an hour; it can be helpful to submit your questions ahead of time. 
 That way, we can be sure to answer them during the main presentation:

https://sc19.supercomputing.org/session/?sess=sess296

Hope to see you in Denver!

-- 
Jeff Squyres
jsquy...@cisco.com