Hi,

I have built openmpi-v1.10.2-142-g5cd9490 on my machines
(Solaris 10 SPARC, Solaris 10 x86_64, and openSUSE Linux
12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. Unfortunately I get
runtime errors for some programs.


Sun C 5.13:
===========

tyr spawn 116 mpiexec -np 1 --host tyr,sunpc1,linpc1,linpc1,ruester spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
  I create 4 slave processes

Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (proc_pointer))->obj_magic_id, file ../../openmpi-v1.10.2-142-g5cd9490/ompi/group/group_init.c, line 215, function ompi_group_increment_proc_count
[ruester:10077] *** Process received signal ***
[ruester:10077] Signal: Abort (6)
[ruester:10077] Signal code:  (-1)
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x1c
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:0x1b10f0
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 2091943080 (?)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert_c99+0x78
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0x10c
/usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0xe758
/usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0x113d4
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x188c
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:MPI_Init+0x26c
/home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x18
/home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x108
[ruester:10077] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 3 with PID 0 on node ruester exited on signal 6 (Abort).
--------------------------------------------------------------------------





GCC-5.1.0:
==========

tyr spawn 129 mpiexec -np 1 --host ruester,ruester,sunpc1,linpc1,linpc1 spawn_master

Parent process 0 running on ruester.informatik.hs-fulda.de
  I create 4 slave processes

[ruester.informatik.hs-fulda.de:09823] [[60617,1],0] ORTE_ERROR_LOG: Unreachable in file ../../../../../openmpi-v1.10.2-142-g5cd9490/ompi/mca/dpm/orte/dpm_orte.c at line 523
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[60617,1],0]) is on host: ruester
  Process 2 ([[0,0],0]) is on host: unknown!
  BTLs attempted: tcp self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[ruester:9823] *** An error occurred in MPI_Comm_spawn
[ruester:9823] *** reported by process [3972595713,0]
[ruester:9823] *** on communicator MPI_COMM_WORLD
[ruester:9823] *** MPI_ERR_INTERN: internal error
[ruester:9823] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ruester:9823] ***    and potentially your MPI job)
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[60617,1],0]
  Exit code:    17
--------------------------------------------------------------------------
tyr spawn 130


tyr spawn 133 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester spawn_multiple_master

Parent process 0 running on tyr.informatik.hs-fulda.de
  I create 3 slave processes.

Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (proc_pointer))->obj_magic_id, file ../../openmpi-v1.10.2-142-g5cd9490/ompi/group/group_init.c, line 215, function ompi_group_increment_proc_count
[ruester:09954] *** Process received signal ***
[ruester:09954] Signal: Abort (6)
[ruester:09954] Signal code:  (-1)
/usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x2c
/usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:0xc2c0c
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert_c99+0x78
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0xf0
/usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x6638
/usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x948c
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x1978
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:MPI_Init+0x2a8
/home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x10
/home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x7c
[ruester:09954] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 0 on node ruester exited on signal 6 (Abort).
--------------------------------------------------------------------------
tyr spawn 134



I would be grateful if somebody could fix these problems. Thank you
very much in advance for any help.


Kind regards

Siegmar
/* The program demonstrates how to spawn some dynamic MPI processes.
 * This version uses one master process which creates some slave
 * processes.
 *
 * A process or a group of processes can create another group of
 * processes with "MPI_Comm_spawn ()" or "MPI_Comm_spawn_multiple ()".
 * In general it is best (better performance) to start all processes
 * statically with "mpiexec" via the command line. If you want to use
 * dynamic processes you will normally have one master process which
 * starts a lot of slave processes. In some cases it may be useful to
 * enlarge a group of processes, e.g., if the MPI universe provides
 * more virtual CPUs than the current number of processes and the
 * program may benefit from additional processes. You will use
 * "MPI_Comm_spawn_multiple ()" if you must start different
 * programs or if you want to start the same program with different
 * parameters.
 *
 * There are some reasons to prefer "MPI_Comm_spawn_multiple ()"
 * instead of calling "MPI_Comm_spawn ()" multiple times. If you
 * spawn new (child) processes they start up like any MPI application,
 * i.e., they call "MPI_Init ()" and can use the communicator
 * MPI_COMM_WORLD afterwards. This communicator contains only the
 * child processes which have been created with the same call of
 * "MPI_Comm_spawn ()" and which is distinct from MPI_COMM_WORLD
 * of the parent process or processes created in other calls of
 * "MPI_Comm_spawn ()". The natural communication mechanism between
 * the groups of parent and child processes is via an
 * inter-communicator which will be returned from the above
 * MPI functions to spawn new processes. The local group of the
 * inter-communicator contains the parent processes and the remote
 * group contains the child processes. The child processes can get
 * the same inter-communicator by calling "MPI_Comm_get_parent ()"
 * (a minimal sketch of such a child program follows this listing).
 * Now it is obvious that calling "MPI_Comm_spawn ()" multiple
 * times will create many sets of children with different
 * communicators MPI_COMM_WORLD whereas "MPI_Comm_spawn_multiple ()"
 * creates child processes with a single MPI_COMM_WORLD. Furthermore
 * spawning several processes in one call may be faster than spawning
 * them sequentially and perhaps even the communication between
 * processes spawned at the same time may be faster than communication
 * between sequentially spawned processes.
 *
 * For collective operations it is sometimes easier if all processes
 * belong to the same intra-communicator. You can use the function
 * "MPI_Intercomm_merge ()" to merge the local and remote group of
 * an inter-communicator into an intra-communicator.
 * 
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     mpicc -o <program name> <source code file name>
 *
 *   Store executable(s) into predefined directories.
 *     make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 *     make_compile
 *
 * Running:
 *   LAM-MPI:
 *     mpiexec -boot -np <number of processes> <program name>
 *     or
 *     mpiexec -boot \
 *	 -host <hostname> -np <number of processes> <program name> : \
 *	 -host <hostname> -np <number of processes> <program name>
 *     or
 *     mpiexec -boot [-v] -configfile <application file>
 *     or
 *     lamboot [-v] [<host file>]
 *       mpiexec -np <number of processes> <program name>
 *	 or
 *	 mpiexec [-v] -configfile <application file>
 *     lamhalt
 *
 *   OpenMPI:
 *     "host1", "host2", and so on can all have the same name,
 *     if you want to start a virtual computer with some virtual
 *     CPUs on the local host. The name "localhost" is allowed
 *     as well.
 *
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec --host <host1,host2,...> \
 *	 -np <number of processes> <program name>
 *     or
 *     mpiexec -hostfile <hostfile name> \
 *	 -np <number of processes> <program name>
 *     or
 *     mpiexec -app <application file>
 *
 * Cleaning:
 *   local computer:
 *     rm <program name>
 *     or
 *     make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it).
 *     make_clean_all
 *
 *
 * File: spawn_master.c			Author: S. Gross
 * Date: 28.09.2013
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES	4		/* create NUM_SLAVES processes	*/
#define SLAVE_PROG	"spawn_slave"	/* slave program name		*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_CHILD_PROCESSES;	/* inter-communicator		*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   mytid,			/* my task id			*/
	   namelen;			/* length of processor name	*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  /* check that only the master process is running in MPI_COMM_WORLD.   */
  if (ntasks_world > 1)
  {
    if (mytid == 0)
    {
      fprintf (stderr, "\n\nError: Too many processes (only one "
	       "process allowed).\n"
	       "Usage:\n"
	       "  mpiexec %s\n\n",
	       argv[0]);
    }
    MPI_Finalize ();
    exit (EXIT_SUCCESS);
  }
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("\nParent process %d running on %s\n"
	  "  I create %d slave processes\n\n",
	  mytid,  processor_name, NUM_SLAVES);
  MPI_Comm_spawn (SLAVE_PROG, MPI_ARGV_NULL, NUM_SLAVES,
		  MPI_INFO_NULL, 0, MPI_COMM_WORLD,
		  &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  MPI_Comm_size	(COMM_CHILD_PROCESSES, &ntasks_local);
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
  printf ("Parent process %d: "
	  "tasks in MPI_COMM_WORLD:                    %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES local "
	  "group:  %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES remote "
	  "group: %d\n\n",
	  mytid, ntasks_world, ntasks_local, ntasks_remote);
  MPI_Comm_free (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
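
The source of the spawn_slave program is not included in this mail.
The following is only a minimal sketch of such a child program (an
illustration, not my actual spawn_slave source); it shows the
"MPI_Comm_get_parent ()" call mentioned in the header comment above.

/* Minimal sketch of a child ("slave") program, for illustration only.
 * It is started with "MPI_Comm_spawn ()" or
 * "MPI_Comm_spawn_multiple ()" from one of the master programs and
 * retrieves the inter-communicator to its parent group with
 * "MPI_Comm_get_parent ()".
 */

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm COMM_PARENT;                 /* inter-communicator to parent */
  int      mytid,                       /* my task id                   */
           ntasks,                      /* tasks in my MPI_COMM_WORLD   */
           namelen;                     /* length of processor name     */
  char     processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  /* COMM_PARENT is MPI_COMM_NULL if this process was not spawned.      */
  MPI_Comm_get_parent (&COMM_PARENT);
  if (COMM_PARENT == MPI_COMM_NULL)
  {
    printf ("Process %d of %d on %s was not spawned.\n",
            mytid, ntasks, processor_name);
  }
  else
  {
    printf ("Spawned process %d of %d running on %s.\n",
            mytid, ntasks, processor_name);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
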
/* The program demonstrates how to spawn some dynamic MPI processes.
 * This version uses one master process which creates two types of
 * slave processes with different argument vectors. The argument
 * vector contains the parameters passed to the program. Basically it
 * corresponds to a normal argument vector for C programs. The main
 * difference is that p_argv[0] contains the first parameter and not
 * the name of the program. The function which you will use to spawn
 * processes will build a normal argument vector consisting of the
 * program name followed by the parameters in "p_argv".
 *
 * A process or a group of processes can create another group of
 * processes with "MPI_Comm_spawn ()" or "MPI_Comm_spawn_multiple ()".
 * In general it is best (better performance) to start all processes
 * statically with "mpiexec" via the command line. If you want to use
 * dynamic processes you will normally have one master process which
 * starts a lot of slave processes. In some cases it may be useful to
 * enlarge a group of processes, e.g., if the MPI universe provides
 * more virtual CPUs than the current number of processes and the
 * program may benefit from additional processes. You will use
 * "MPI_Comm_spawn_multiple ()" if you must start different
 * programs or if you want to start the same program with different
 * parameters.
 *
 * There are some reasons to prefer "MPI_Comm_spawn_multiple ()"
 * instead of calling "MPI_Comm_spawn ()" multiple times. If you
 * spawn new (child) processes they start up like any MPI application,
 * i.e., they call "MPI_Init ()" and can use the communicator
 * MPI_COMM_WORLD afterwards. This communicator contains only the
 * child processes which have been created with the same call of
 * "MPI_Comm_spawn ()" and which is distinct from MPI_COMM_WORLD
 * of the parent process or processes created in other calls of
 * "MPI_Comm_spawn ()". The natural communication mechanism between
 * the groups of parent and child processes is via an
 * inter-communicator which will be returned from the above
 * MPI functions to spawn new processes. The local group of the
 * inter-communicator contains the parent processes and the remote
 * group contains the child processes. The child processes can get
 * the same inter-communicator by calling "MPI_Comm_get_parent ()".
 * Now it is obvious that calling "MPI_Comm_spawn ()" multiple
 * times will create many sets of children with different
 * communicators MPI_COMM_WORLD whereas "MPI_Comm_spawn_multiple ()"
 * creates child processes with a single MPI_COMM_WORLD. Furthermore
 * spawning several processes in one call may be faster than spawning
 * them sequentially and perhaps even the communication between
 * processes spawned at the same time may be faster than communication
 * between sequentially spawned processes.
 *
 * For collective operations it is sometimes easier if all processes
 * belong to the same intra-communicator. You can use the function
 * "MPI_Intercomm_merge ()" to merge the local and remote group of
 * an inter-communicator into an intra-communicator (a short
 * self-contained sketch of this follows this listing).
 * 
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     mpicc -o <program name> <source code file name>
 *
 *   Store executable(s) into predefined directories.
 *     make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 *     make_compile
 *
 * Running:
 *   LAM-MPI:
 *     mpiexec -boot -np <number of processes> <program name>
 *     or
 *     mpiexec -boot \
 *	 -host <hostname> -np <number of processes> <program name> : \
 *	 -host <hostname> -np <number of processes> <program name>
 *     or
 *     mpiexec -boot [-v] -configfile <application file>
 *     or
 *     lamboot [-v] [<host file>]
 *       mpiexec -np <number of processes> <program name>
 *	 or
 *	 mpiexec [-v] -configfile <application file>
 *     lamhalt
 *
 *   OpenMPI:
 *     "host1", "host2", and so on can all have the same name,
 *     if you want to start a virtual computer with some virtual
 *     CPUs on the local host. The name "localhost" is allowed
 *     as well.
 *
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec --host <host1,host2,...> \
 *	 -np <number of processes> <program name>
 *     or
 *     mpiexec -hostfile <hostfile name> \
 *	 -np <number of processes> <program name>
 *     or
 *     mpiexec -app <application file>
 *
 * Cleaning:
 *   local computer:
 *     rm <program name>
 *     or
 *     make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it).
 *     make_clean_all
 *
 *
 * File: spawn_multiple_master.c	Author: S. Gross
 * Date: 28.09.2013
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_PROGS	2		/* # of programs		*/
#define NUM_SLAVES_1	1		/* # of slave processes, type 1	*/
#define NUM_SLAVES_2	2		/* # of slave processes, type 2	*/
#define SLAVE_PROG_1	"spawn_slave"	/* slave program name, type 1	*/
#define SLAVE_PROG_2	"spawn_slave"	/* slave program name, type 2	*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_CHILD_PROCESSES;	/* inter-communicator		*/
  MPI_Info array_of_infos[NUM_PROGS];	/* startup hints for each cmd	*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   mytid,			/* my task id			*/
	   namelen,			/* length of processor name	*/
	   array_of_n_procs[NUM_PROGS],	/* number of processes		*/
	   count_slaves,		/* total number of slaves	*/
	   i;				/* loop variable		*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME],
	   *array_of_commands[NUM_PROGS],
	   **array_of_argvs[NUM_PROGS],
	   *p_argv_1[] = {"program type 1", NULL},
	   *p_argv_2[] = {"program type 2", "another parameter", NULL};

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  /* check that only the master process is running in MPI_COMM_WORLD.   */
  if (ntasks_world > 1)
  {
    if (mytid == 0)
    {
      fprintf (stderr, "\n\nError: Too many processes (only one "
	       "process allowed).\n"
	       "Usage:\n"
	       "  mpiexec %s\n\n",
	       argv[0]);
    }
    MPI_Finalize ();
    exit (EXIT_SUCCESS);
  }
  MPI_Get_processor_name (processor_name, &namelen);
  count_slaves = 0;
  for (i = 0; i < NUM_PROGS; ++i)
  {
    if ((i % 2) == 0)
    {
      array_of_commands[i] = SLAVE_PROG_1;
      array_of_argvs[i]	   = p_argv_1;
      array_of_n_procs[i]  = NUM_SLAVES_1;
      array_of_infos[i]	   = MPI_INFO_NULL;
      count_slaves	   += NUM_SLAVES_1;
    }
    else
    {
      array_of_commands[i] = SLAVE_PROG_2;
      array_of_argvs[i]	   = p_argv_2;
      array_of_n_procs[i]  = NUM_SLAVES_2;
      array_of_infos[i]	   = MPI_INFO_NULL;
      count_slaves	   += NUM_SLAVES_2;
    }
  }
  printf ("\nParent process %d running on %s\n"
	  "  I create %d slave processes.\n\n",
	  mytid,  processor_name, count_slaves);
  MPI_Comm_spawn_multiple (NUM_PROGS, array_of_commands,
			   array_of_argvs, array_of_n_procs,
			   array_of_infos, 0, MPI_COMM_WORLD,
			   &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  MPI_Comm_size	(COMM_CHILD_PROCESSES, &ntasks_local);
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
  printf ("Parent process %d: "
	  "tasks in MPI_COMM_WORLD:                    %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES local "
	  "group:  %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES remote "
	  "group: %d\n\n",
	  mytid, ntasks_world, ntasks_local, ntasks_remote);
  MPI_Comm_free (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
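
Both header comments mention "MPI_Intercomm_merge ()", although
neither of my programs uses it. The following self-contained program
is only a sketch (the name "merge_demo" and its structure are just an
example, not one of my programs): it spawns copies of itself, so that
both the parent and the child group take part in the collective
"MPI_Intercomm_merge ()" call, and then merges both groups into one
intra-communicator.

/* Sketch only: spawn copies of this program and merge the parent and
 * child groups of the resulting inter-communicator into a single
 * intra-communicator with "MPI_Intercomm_merge ()".
 *
 * Compile:  mpicc -o merge_demo merge_demo.c
 * Run:      mpiexec -np 1 merge_demo
 */

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES      2               /* processes to spawn           */

int main (int argc, char *argv[])
{
  MPI_Comm COMM_PARENT,                 /* inter-comm to parent group   */
           COMM_CHILDREN,               /* inter-comm to child group    */
           COMM_ALL;                    /* merged intra-communicator    */
  int      myrank_all,                  /* rank in merged communicator  */
           ntasks_all;                  /* size of merged communicator  */

  MPI_Init (&argc, &argv);
  MPI_Comm_get_parent (&COMM_PARENT);
  if (COMM_PARENT == MPI_COMM_NULL)
  {
    /* parent process: spawn NUM_SLAVES copies of this program          */
    MPI_Comm_spawn (argv[0], MPI_ARGV_NULL, NUM_SLAVES, MPI_INFO_NULL,
                    0, MPI_COMM_WORLD, &COMM_CHILDREN,
                    MPI_ERRCODES_IGNORE);
    /* "high = 0": the parent group is ordered first in the result      */
    MPI_Intercomm_merge (COMM_CHILDREN, 0, &COMM_ALL);
    MPI_Comm_free (&COMM_CHILDREN);
  }
  else
  {
    /* spawned process: merge with the parent group ("high = 1")        */
    MPI_Intercomm_merge (COMM_PARENT, 1, &COMM_ALL);
  }
  MPI_Comm_rank (COMM_ALL, &myrank_all);
  MPI_Comm_size (COMM_ALL, &ntasks_all);
  printf ("Process %d of %d in the merged intra-communicator.\n",
          myrank_all, ntasks_all);
  MPI_Comm_free (&COMM_ALL);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}

"MPI_Intercomm_merge ()" is collective over both groups of the
inter-communicator, which is why the spawned processes call it on the
communicator returned by "MPI_Comm_get_parent ()".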
