Hi,
I installed openmpi-v2.x-dev-1468-g6011906 on my "SUSE Linux Enterprise
Server 12 (x86_64)" with Sun C 5.13 and gcc-6.1.0. Unfortunately, a
small program fails when I combine "--host" and "--slot-list", while it
runs as expected with either option alone. I already reported this
problem for openmpi-1.10.3, where I got and still get a segmentation
fault for the same combination. By the way, the combination works
without problems for openmpi-dev-4221-gb707d13.
loki spawn 162 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
OPAL repo revision: v2.x-dev-1468-g6011906
C compiler absolute: /opt/solstudio12.4/bin/cc
loki spawn 162 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master
Parent process 0 running on loki
I create 4 slave processes
Slave process 0 of 4 running on loki
spawn_slave 0: argv[0]: spawn_slave
Slave process 1 of 4 running on loki
spawn_slave 1: argv[0]: spawn_slave
Slave process 2 of 4 running on loki
spawn_slave 2: argv[0]: spawn_slave
Slave process 3 of 4 running on loki
spawn_slave 3: argv[0]: spawn_slave
Parent process 0: tasks in MPI_COMM_WORLD: 1
tasks in COMM_CHILD_PROCESSES local group: 1
tasks in COMM_CHILD_PROCESSES remote group: 4
loki spawn 163 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
Parent process 0 running on loki
I create 4 slave processes
[loki:19507] OPAL ERROR: Timeout in file ../../../../openmpi-v2.x-dev-1468-g6011906/opal/mca/pmix/base/pmix_base_fns.c at line 195
[loki:19507] *** An error occurred in MPI_Comm_spawn
[loki:19507] *** reported by process [3549560833,0]
[loki:19507] *** on communicator MPI_COMM_WORLD
[loki:19507] *** MPI_ERR_UNKNOWN: unknown error
[loki:19507] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[loki:19507] *** and potentially your MPI job)
loki spawn 164
loki spawn 164 mpiexec -np 1 --slot-list 0:0-5,1:0-5 spawn_master
Parent process 0 running on loki
I create 4 slave processes
Slave process 0 of 4 running on loki
Slave process 1 of 4 running on loki
Slave process 2 of 4 running on loki
spawn_slave 2: argv[0]: spawn_slave
Slave process 3 of 4 running on loki
spawn_slave 3: argv[0]: spawn_slave
Parent process 0: tasks in MPI_COMM_WORLD: 1
tasks in COMM_CHILD_PROCESSES local group: 1
tasks in COMM_CHILD_PROCESSES remote group: 4
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
loki spawn 165
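As a sanity check that the slot list itself names enough cores for the one master and four slaves, the following sketch counts the cores in "0:0-5,1:0-5". It assumes the list is a comma-separated sequence of "socket:first-last" entries. The function name count_slots and the grammar given in its comment are illustrative assumptions and are not taken from the Open MPI sources.

```c
#include <stdlib.h>

/* Count the cores named by a slot list such as "0:0-5,1:0-5".
 *
 * Assumed grammar (an illustration, not Open MPI's own parser):
 *   list  := entry { "," entry }
 *   entry := socket ":" first [ "-" last ]
 *
 * Returns the number of cores, or -1 if the list is malformed.
 */
int count_slots (const char *list)
{
  const char *p = list;
  int total = 0;

  while (*p != '\0')
  {
    char *end;
    long socket, first, last;

    /* socket number followed by ':' */
    socket = strtol (p, &end, 10);
    if (end == p || *end != ':' || socket < 0)
    {
      return -1;
    }
    p = end + 1;

    /* first core of the range */
    first = strtol (p, &end, 10);
    if (end == p || first < 0)
    {
      return -1;
    }
    last = first;
    p = end;

    /* optional "-last" part of the range */
    if (*p == '-')
    {
      ++p;
      last = strtol (p, &end, 10);
      if (end == p || last < first)
      {
        return -1;
      }
      p = end;
    }
    total += (int) (last - first + 1);

    /* either another entry follows or the list ends here */
    if (*p == ',')
    {
      ++p;
      if (*p == '\0')
      {
        return -1;                      /* trailing comma */
      }
    }
    else if (*p != '\0')
    {
      return -1;
    }
  }
  return total;
}
```

Under that reading, count_slots ("0:0-5,1:0-5") returns 12, so the five processes should fit, and the timeout is probably not caused by a shortage of slots.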
I would be grateful if somebody could fix the problem. Thank you very much
in advance for any help.
Best regards
Siegmar
/* The program demonstrates how to spawn some dynamic MPI processes.
* This version uses one master process which creates some slave
* processes.
*
* A process or a group of processes can create another group of
* processes with "MPI_Comm_spawn ()" or "MPI_Comm_spawn_multiple ()".
* In general it is best (better performance) to start all processes
* statically with "mpiexec" via the command line. If you want to use
* dynamic processes you will normally have one master process which
* starts a lot of slave processes. In some cases it may be useful to
* enlarge a group of processes, e.g., if the MPI universe provides
* more virtual CPUs than the current number of processes and the
* program may benefit from additional processes. You will use
* "MPI_Comm_spawn_multiple ()" if you must start different
* programs or if you want to start the same program with different
* parameters.
*
* There are some reasons to prefer "MPI_Comm_spawn_multiple ()"
* instead of calling "MPI_Comm_spawn ()" multiple times. If you
* spawn new (child) processes they start up like any MPI application,
* i.e., they call "MPI_Init ()" and can use the communicator
* MPI_COMM_WORLD afterwards. This communicator contains only the
* child processes which have been created with the same call of
* "MPI_Comm_spawn ()" and is distinct from MPI_COMM_WORLD of the
* parent process and of processes created in other calls of
* "MPI_Comm_spawn ()". The natural communication mechanism between
* the groups of parent and child processes is via an
* inter-communicator which will be returned from the above
* MPI functions to spawn new processes. The local group of the
* inter-communicator contains the parent processes and the remote
* group contains the child processes. The child processes can get
* the same inter-communicator calling "MPI_Comm_get_parent ()".
* Now it is obvious that calling "MPI_Comm_spawn ()" multiple
* times will create many sets of children with different
* communicators MPI_COMM_WORLD whereas "MPI_Comm_spawn_multiple ()"
* creates child processes with a single MPI_COMM_WORLD. Furthermore
* spawning several processes in one call may be faster than spawning
* them sequentially and perhaps even the communication between
* processes spawned at the same time may be faster than communication
* between sequentially spawned processes.
*
* For collective operations it is sometimes easier if all processes
* belong to the same intra-communicator. You can use the function
* "MPI_Intercomm_merge ()" to merge the local and remote group of
* an inter-communicator into an intra-communicator.
*
*
* Compiling:
* Store executable(s) into local directory.
* mpicc -o <program name> <source code file name>
*
* Store executable(s) into predefined directories.
* make
*
* Make program(s) automatically on all specified hosts. You must
* edit the file "make_compile" and specify your host names before
* you execute it.
* make_compile
*
* Running:
* LAM-MPI:
* mpiexec -boot -np <number of processes> <program name>
* or
* mpiexec -boot \
* -host <hostname> -np <number of processes> <program name> : \
* -host <hostname> -np <number of processes> <program name>
* or
* mpiexec -boot [-v] -configfile <application file>
* or
* lamboot [-v] [<host file>]
* mpiexec -np <number of processes> <program name>
* or
* mpiexec [-v] -configfile <application file>
* lamhalt
*
* OpenMPI:
* "host1", "host2", and so on can all have the same name,
* if you want to start a virtual computer with some virtual
* CPUs on the local host. The name "localhost" is allowed
* as well.
*
* mpiexec -np <number of processes> <program name>
* or
* mpiexec --host <host1,host2,...> \
* -np <number of processes> <program name>
* or
* mpiexec -hostfile <hostfile name> \
* -np <number of processes> <program name>
* or
* mpiexec -app <application file>
*
* Cleaning:
* local computer:
* rm <program name>
* or
* make clean_all
* on all specified computers (you must edit the file "make_clean_all"
* and specify your host names before you execute it).
* make_clean_all
*
*
* File: spawn_master.c Author: S. Gross
* Date: 28.09.2013
*
*/
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define NUM_SLAVES 4 /* create NUM_SLAVES processes */
#define SLAVE_PROG "spawn_slave" /* slave program name */
int main (int argc, char *argv[])
{
  MPI_Comm COMM_CHILD_PROCESSES;        /* inter-communicator           */
  int      ntasks_world,                /* # of tasks in MPI_COMM_WORLD */
           ntasks_local,                /* COMM_CHILD_PROCESSES local   */
           ntasks_remote,               /* COMM_CHILD_PROCESSES remote  */
           mytid,                       /* my task id                   */
           namelen;                     /* length of processor name     */
  char     processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  /* check that only the master process is running in MPI_COMM_WORLD. */
  if (ntasks_world > 1)
  {
    if (mytid == 0)
    {
      fprintf (stderr, "\n\nError: Too many processes (only one "
               "process allowed).\n"
               "Usage:\n"
               " mpiexec %s\n\n",
               argv[0]);
    }
    MPI_Finalize ();
    exit (EXIT_SUCCESS);
  }
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("\nParent process %d running on %s\n"
          " I create %d slave processes\n\n",
          mytid, processor_name, NUM_SLAVES);
  MPI_Comm_spawn (SLAVE_PROG, MPI_ARGV_NULL, NUM_SLAVES,
                  MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                  &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  MPI_Comm_size (COMM_CHILD_PROCESSES, &ntasks_local);
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
  printf ("Parent process %d: "
          "tasks in MPI_COMM_WORLD: %d\n"
          " tasks in COMM_CHILD_PROCESSES local "
          "group: %d\n"
          " tasks in COMM_CHILD_PROCESSES remote "
          "group: %d\n\n",
          mytid, ntasks_world, ntasks_local, ntasks_remote);
  MPI_Comm_free (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
/* The program demonstrates how to spawn some dynamic MPI processes.
* This program is the slave part for the programs "spawn_master" and
* "spawn_multiple_master".
*
* A process or a group of processes can create another group of
* processes with "MPI_Comm_spawn ()" or "MPI_Comm_spawn_multiple ()".
* In general it is best (better performance) to start all processes
* statically with "mpiexec" via the command line. If you want to use
* dynamic processes you will normally have one master process which
* starts a lot of slave processes. In some cases it may be useful to
* enlarge a group of processes, e.g., if the MPI universe provides
* more virtual CPUs than the current number of processes and the
* program may benefit from additional processes. You will use
* "MPI_Comm_spawn_multiple ()" if you must start different
* programs or if you want to start the same program with different
* parameters.
*
* There are some reasons to prefer "MPI_Comm_spawn_multiple ()"
* instead of calling "MPI_Comm_spawn ()" multiple times. If you
* spawn new (child) processes they start up like any MPI application,
* i.e., they call "MPI_Init ()" and can use the communicator
* MPI_COMM_WORLD afterwards. This communicator contains only the
* child processes which have been created with the same call of
* "MPI_Comm_spawn ()" and is distinct from MPI_COMM_WORLD of the
* parent process and of processes created in other calls of
* "MPI_Comm_spawn ()". The natural communication mechanism between
* the groups of parent and child processes is via an
* inter-communicator which will be returned from the above
* MPI functions to spawn new processes. The local group of the
* inter-communicator contains the parent processes and the remote
* group contains the child processes. The child processes can get
* the same inter-communicator calling "MPI_Comm_get_parent ()".
* Now it is obvious that calling "MPI_Comm_spawn ()" multiple
* times will create many sets of children with different
* communicators MPI_COMM_WORLD whereas "MPI_Comm_spawn_multiple ()"
* creates child processes with a single MPI_COMM_WORLD. Furthermore
* spawning several processes in one call may be faster than spawning
* them sequentially and perhaps even the communication between
* processes spawned at the same time may be faster than communication
* between sequentially spawned processes.
*
* For collective operations it is sometimes easier if all processes
* belong to the same intra-communicator. You can use the function
* "MPI_Intercomm_merge ()" to merge the local and remote group of
* an inter-communicator into an intra-communicator.
*
*
* Compiling:
* Store executable(s) into local directory.
* mpicc -o <program name> <source code file name>
*
* Store executable(s) into predefined directories.
* make
*
* Make program(s) automatically on all specified hosts. You must
* edit the file "make_compile" and specify your host names before
* you execute it.
* make_compile
*
* Running:
* LAM-MPI:
* mpiexec -boot -np <number of processes> <program name>
* or
* mpiexec -boot \
* -host <hostname> -np <number of processes> <program name> : \
* -host <hostname> -np <number of processes> <program name>
* or
* mpiexec -boot [-v] -configfile <application file>
* or
* lamboot [-v] [<host file>]
* mpiexec -np <number of processes> <program name>
* or
* mpiexec [-v] -configfile <application file>
* lamhalt
*
* OpenMPI:
* "host1", "host2", and so on can all have the same name,
* if you want to start a virtual computer with some virtual
* CPUs on the local host. The name "localhost" is allowed
* as well.
*
* mpiexec -np <number of processes> <program name>
* or
* mpiexec --host <host1,host2,...> \
* -np <number of processes> <program name>
* or
* mpiexec -hostfile <hostfile name> \
* -np <number of processes> <program name>
* or
* mpiexec -app <application file>
*
* Cleaning:
* local computer:
* rm <program name>
* or
* make clean_all
* on all specified computers (you must edit the file "make_clean_all"
* and specify your host names before you execute it).
* make_clean_all
*
*
* File: spawn_slave.c Author: S. Gross
* Date: 30.08.2012
*
*/
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main (int argc, char *argv[])
{
  int  ntasks_world,                    /* # of tasks in MPI_COMM_WORLD */
       mytid,                           /* my task id                   */
       namelen,                         /* length of processor name     */
       i;                               /* loop variable                */
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code will
   * print one line on the display. It may happen that the lines will
   * get mixed up because the display is a critical section. In general
   * only one process (mostly the process with rank 0) will print on
   * the display and all other processes will send their messages to
   * this process. Nevertheless for debugging purposes (or to
   * demonstrate that it is possible) it may be useful if every
   * process prints itself.
   */
  fprintf (stdout, "Slave process %d of %d running on %s\n",
           mytid, ntasks_world, processor_name);
  fflush (stdout);
  MPI_Barrier (MPI_COMM_WORLD);         /* wait for all other processes */
  for (i = 0; i < argc; ++i)
  {
    printf ("%s %d: argv[%d]: %s\n", argv[0], mytid, i, argv[i]);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}