Ralph, I haven't set any default hostfile; nevertheless, how can I check this?
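(A minimal way to check, assuming a standard Open MPI 4.x install under /cluster/openmpi; the MCA parameter name and the default file location are assumptions, not confirmed in this thread:)

$ ompi_info --param all all --level 9 | grep default_hostfile
$ cat /cluster/openmpi/etc/openmpi-default-hostfile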
I have 2 machines: a “master” and a “slave”. Master has the Open MPI build. Both machines share files (Open MPI bins and libs, etc.) over NFS. The path is /cluster/openmpi. My example is in /cluster/examples/martin and my hostfile is in /cluster/examples/martin/resources (named “hostsfile”). I attach both files.

So, when I run:

$ mpirun -np 1 ./spawn7

I get:

I'm papi 0/1
I'm the spawned 1/7
I'm the spawned 2/7
I'm the spawned 0/7. Received: 99
I'm the spawned 5/7
I'm the spawned 6/7
I'm the spawned 4/7
I'm the spawned 3/7

But when I run:

$ ./spawn7

I get:

I'm papi 0/1
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 7
slots that were requested by the application:

  /cluster/examples/martin/spawn7

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
[master:09093] *** An error occurred in MPI_Comm_spawn
[master:09093] *** reported by process [2032730113,0]
[master:09093] *** on communicator MPI_COMM_WORLD
[master:09093] *** MPI_ERR_SPAWN: could not spawn processes
[master:09093] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[master:09093] ***    and potentially your MPI job)

I have:
Open MPI version: 4.0.1
OS: Ubuntu 18.04 (on both machines)

________________________________
From: Ralph Castain <r...@open-mpi.org>
Sent: Wednesday, September 25, 2019 16:50
To: Martín Morales <martineduardomora...@hotmail.com>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Singleton and Spawn

It's a different code path, that's all - just a question of what path gets traversed. Would you mind posting a little more info on your two use-cases? For example, do you have a default hostfile telling mpirun what machines to use?

On Sep 25, 2019, at 12:41 PM, Martín Morales <martineduardomora...@hotmail.com> wrote:

Thanks Ralph, but if I have a wrong hostfile path in my MPI_Comm_spawn function, why does it work if I run it with mpirun (e.g. mpirun -np 1 ./spawnExample)?
________________________________
From: Ralph Castain <r...@open-mpi.org>
Sent: Wednesday, September 25, 2019 15:42
To: Open MPI Users <users@lists.open-mpi.org>
Cc: steven.va...@gmail.com; Martín Morales <martineduardomora...@hotmail.com>
Subject: Re: [OMPI users] Singleton and Spawn

Yes, of course it can - however, I believe there is a bug in the add-hostfile code path. We can address that problem far more easily than moving to a different interconnect.

On Sep 25, 2019, at 11:39 AM, Martín Morales via users <users@lists.open-mpi.org> wrote:

Thanks Steven. So, it actually can't spawn from a singleton?

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Steven Varga via users <users@lists.open-mpi.org>
Sent: Wednesday, September 25, 2019 14:50
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Steven Varga <steven.va...@gmail.com>
Subject: Re: [OMPI users] Singleton and Spawn

As far as I know, you have to wire up the connections among MPI clients, allocate resources, etc. PMIx is a library to set up all processes, and it is shipped with Open MPI. The standard HPC method to launch tasks is through job schedulers such as SLURM or Grid Engine. SLURM's srun is very similar to mpirun: it does the resource allocation, then launches the jobs on the allocated nodes and cores, etc. It does this through the PMIx library, or mpiexec.

When running mpiexec without an integrated job manager, you are responsible for allocating resources. See mpirun for details on passing host lists, oversubscription, etc.

If you are looking for a different, non-MPI-based interconnect, try ZeroMQ or other remote procedure calls -- it won't be simpler, though.

Hope it helps,
Steve

On Wed, Sep 25, 2019, 13:15 Martín Morales via users <users@lists.open-mpi.org> wrote:

Hi all! This is my first post. I'm a newbie with Open MPI (and with MPI likewise!). I recently built the current version of this fabulous software (v4.0.1) on two Ubuntu 18 machines (a small part of our Beowulf cluster). I have already read the FAQ and the posts on the users mailing list (a lot of them), but I can't figure out how to do this (if it is possible at all): I need to run my parallel programs without the mpirun/mpiexec commands; I need just one process (on my “master” machine) that will spawn processes dynamically (on the “slave” machines). I have already made some dummy test scripts and they work fine with mpirun/mpiexec. I set in MPI_Info_set the key “add-hostfile” with the file containing those 2 machines that I mentioned before, with 4 slots each. Nevertheless, it doesn't work when I run it as a singleton program (e.g. ./spawnExample): it throws an error like this: “There are not enough slots available in the system to satisfy the 7 slots that were requested by the application:...”. Here I try to start 8 processes on the 2 machines. It seems that one process executes fine on “master” and when it tries to spawn the other 7 it crashes. We need this execution scheme because we already have our software (used for scientific research) and we need to “incorporate” or “embed” Open MPI into it. Thanks in advance, guys!
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
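(For what it's worth, the alternatives listed in the error text above can be sketched roughly as follows; the host names and slot counts are taken from the description in this thread, and the MCA environment variable for the singleton case is an assumption, not something confirmed here:)

$ mpirun --host master:4,slave:4 -np 1 ./spawn7   # explicit hosts with ":N" slot suffixes
$ mpirun --oversubscribe -np 1 ./spawn7           # ignore the available-slot check
$ OMPI_MCA_rmaps_base_oversubscribe=1 ./spawn7    # rough singleton equivalent (assumed parameter name)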
#include "mpi.h" #include <stdio.h> #include <stdlib.h> int main( int argc, char *argv[] ) { int rank,size,msg; int np=7; MPI_Info info; MPI_Comm parentcomm, intercomm; MPI_Init( &argc, &argv ); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_get_parent( &parentcomm ); if (parentcomm == MPI_COMM_NULL) { MPI_Info_create( &info ); MPI_Info_set( info, "add-hostfile", "/cluster/examples/martin/resources/hostsfile" ); printf("I'm papi %i/%i\n", rank, size); MPI_Comm_spawn( "/cluster/examples/martin/spawn7", MPI_ARGV_NULL, np, info, 0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE ); if (0 == rank) { msg = 99; MPI_Send(&msg, 1, MPI_INT, 0, 0, intercomm); } } else { if (0 == rank) { MPI_Recv(&msg, 1, MPI_INT, 0, 0, parentcomm, MPI_STATUS_IGNORE); printf("I'm the spawned %i/%i. Received: %i\n", rank, size, msg); } else { printf("I'm the spawned %i/%i\n", rank, size); } } fflush(stdout); MPI_Finalize(); return 0; }
(Attachment: hostsfile)
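(The attached hostsfile is not reproduced in the thread; based on the description of two machines with 4 slots each, its contents are presumably something like the following sketch, with the host names assumed:)

master slots=4
slave slots=4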