Ralph, I haven't set any default hostfile; in any case, how can I check this?


I have 2 machines: a “master” and a “slave”. Master has the Open MPI build. 
Both machines share files (Open MPI bins and libs, etc.) via NFS; the path is 
/cluster/openmpi. My example is in /cluster/examples/martin and my hostfile 
is in /cluster/examples/martin/resources (named “hostsfile”). I attach both 
files.
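
For reference, the attached hostsfile gives both machines 4 slots each; in Open MPI hostfile syntax that is roughly the following (the hostnames here are placeholders for the real ones):

    master slots=4
    slave slots=4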

So, when I run:


$ mpirun -np 1 ./spawn7


I get:


I'm papi 0/1
I'm the spawned 1/7
I'm the spawned 2/7
I'm the spawned 0/7. Received: 99
I'm the spawned 5/7
I'm the spawned 6/7
I'm the spawned 4/7
I'm the spawned 3/7


But when I run:


$ ./spawn7


I get:


I'm papi 0/1
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 7
slots that were requested by the application:

  /cluster/examples/martin/spawn7

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
[master:09093] *** An error occurred in MPI_Comm_spawn
[master:09093] *** reported by process [2032730113,0]
[master:09093] *** on communicator MPI_COMM_WORLD
[master:09093] *** MPI_ERR_SPAWN: could not spawn processes
[master:09093] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[master:09093] *** and potentially your MPI job)


I have:

Open MPI version: 4.0.1
OS: Ubuntu 18.04 (on both machines)

________________________________
From: Ralph Castain <r...@open-mpi.org>
Sent: Wednesday, September 25, 2019 16:50
To: Martín Morales <martineduardomora...@hotmail.com>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Singleton and Spawn

It's a different code path, that's all - just a question of what path gets 
traversed.

Would you mind posting a little more info on your two use-cases? For example, 
do you have a default hostfile telling mpirun what machines to use?
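
(A quick way to check this, assuming a standard 4.0.x install under the /cluster/openmpi prefix, would be to look at the shipped default hostfile and the corresponding MCA parameter, e.g.:

$ cat /cluster/openmpi/etc/openmpi-default-hostfile
$ ompi_info --all | grep default_hostfile
)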


On Sep 25, 2019, at 12:41 PM, Martín Morales <martineduardomora...@hotmail.com> wrote:

Thanks Ralph, but if I have a wrong hostfile path in my MPI_Comm_spawn call, why does it work when I run it with mpirun (e.g. mpirun -np 1 ./spawnExample)?
________________________________
From: Ralph Castain <r...@open-mpi.org>
Sent: Wednesday, September 25, 2019 15:42
To: Open MPI Users <users@lists.open-mpi.org>
Cc: steven.va...@gmail.com; Martín Morales <martineduardomora...@hotmail.com>
Subject: Re: [OMPI users] Singleton and Spawn

Yes, of course it can - however, I believe there is a bug in the add-hostfile 
code path. We can address that problem far easier than moving to a different 
interconnect.


On Sep 25, 2019, at 11:39 AM, Martín Morales via users <users@lists.open-mpi.org> wrote:

Thanks Steven. So it actually can't spawn from a singleton?

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Steven Varga via users <users@lists.open-mpi.org>
Sent: Wednesday, September 25, 2019 14:50
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Steven Varga <steven.va...@gmail.com>
Subject: Re: [OMPI users] Singleton and Spawn

As far as I know you have to wire up the connections among MPI clients, allocate resources, etc. PMIx is a library that sets up all the processes, and it ships with Open MPI.

The standard HPC method of launching tasks is through a job scheduler such as SLURM or Grid Engine. SLURM's srun is very similar to mpirun: it does the resource allocation, then launches the jobs on the allocated nodes and cores, etc. It does this through the PMIx library, or via mpiexec.

When running mpiexec without an integrated job manager, you are responsible for allocating resources yourself. See the mpirun documentation for how to pass host lists, enable oversubscription, etc.
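
For instance, a host list or oversubscription can be given directly on the command line, roughly like this (hostnames and counts here are just placeholders):

$ mpirun -np 8 --host master:4,slave:4 ./spawnExample
$ mpirun -np 8 --hostfile hostsfile --oversubscribe ./spawnExample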

If you are looking for a different, non-MPI-based interconnect, try ZeroMQ or another remote-procedure-call framework -- it won't be simpler, though.

Hope it helps,
Steve

On Wed, Sep 25, 2019, 13:15 Martín Morales via users <users@lists.open-mpi.org> wrote:
Hi all! This is my first post. I'm a newbie with Open MPI (and with MPI in general!). I recently built the current version of this fabulous software (v4.0.1) on two Ubuntu 18 machines (a small part of our Beowulf cluster). I have already read the FAQ and a lot of posts on the users mailing list, but I can't figure out how to do this (if it is possible at all): I need to run my parallel programs without the mpirun/mpiexec commands; I need just one process (on my “master” machine) that dynamically spawns processes (on the “slave” machines). I have already made some dummy test programs and they work fine with the mpirun/mpiexec commands. Via MPI_Info_set I set the key “add-hostfile” with the file listing the 2 machines mentioned above, with 4 slots each. Nevertheless, it doesn't work when I just run the program as a singleton (e.g. ./spawnExample): it throws an error like this: “There are not enough slots available in the system to satisfy the 7 slots that were requested by the application:...”. Here I try to start 8 processes on the 2 machines. It seems that one process executes fine on “master”, and when it tries to spawn the other 7 it crashes.
We need this execution scheme because we already have our software (used for scientific research) and we need to “incorporate” or “embed” Open MPI into it.
Thanks in advance guys!
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    int rank,size,msg;
    int np=7;
    MPI_Info info;
    MPI_Comm parentcomm, intercomm;
    
    MPI_Init( &argc, &argv );
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    
    MPI_Comm_get_parent( &parentcomm );
    
    if (parentcomm == MPI_COMM_NULL) {
        
        MPI_Info_create( &info );
        MPI_Info_set( info, "add-hostfile", "/cluster/examples/martin/resources/hostsfile" );
        
        printf("I'm papi %i/%i\n", rank, size);
        MPI_Comm_spawn( "/cluster/examples/martin/spawn7", MPI_ARGV_NULL, np, info, 0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE );
        if (0 == rank) {
            msg = 99;
            MPI_Send(&msg, 1, MPI_INT, 0, 0, intercomm);
        }
    } else {
        if (0 == rank) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, parentcomm, MPI_STATUS_IGNORE);
            printf("I'm the spawned %i/%i. Received: %i\n", rank, size, msg);
        } else {
            printf("I'm the spawned %i/%i\n", rank, size);
        }
    }
    fflush(stdout);
    MPI_Finalize(); 
    
    return 0;
}
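
As an untested aside: since Ralph suspects the bug is specific to the add-hostfile code path, one variant worth trying is the plain "hostfile" spawn info key (assuming it is honored for MPI_Comm_spawn in this build), i.e. something like:

    MPI_Info_create( &info );
    /* Untested variant: point the "hostfile" key (instead of "add-hostfile") at the same file. */
    MPI_Info_set( info, "hostfile", "/cluster/examples/martin/resources/hostsfile" );
    MPI_Comm_spawn( "/cluster/examples/martin/spawn7", MPI_ARGV_NULL, np, info, 0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE );
    MPI_Info_free( &info );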

Attachment: hostsfile
Description: hostsfile
