Hi Jeff,

today I upgraded to the latest version and I still have
problems. I compiled with gcc-6.1.0, and I also tried to compile
with Sun C 5.14 beta. Sun C still breaks with "unrecognized
option '-path'", which I reported before, so I used my gcc
build instead. By the way, this problem is solved in
openmpi-v2.x-dev-1425-ga558e90 and openmpi-dev-4050-g7f65c2b.
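
For reference, the gcc build was configured roughly along these lines
(a minimal sketch; my actual configure line contains a few more
options, and the source directory name is abbreviated here):

  ../openmpi-1.10.3/configure --prefix=/usr/local/openmpi-1.10.3_64_gcc \
    CC=/usr/local/gcc-6.1.0/bin/gcc
  make && make install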

loki hello_2 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
      OPAL repo revision: v1.10.2-189-gfc05056
     C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
loki hello_2 125 mpiexec -np 1 --host loki hello_2_mpi : -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 1 slots
that were requested by the application:
  hello_2_slave_mpi

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------



I get a result if I add "--slot-list" to the master process as
well. I changed "-np 2" to "-np 1" for the slave processes to
demonstrate the next problem.

loki hello_2 126 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
Process 0 of 2 running on loki
Process 1 of 2 running on loki

Now 1 slave tasks are sending greetings.

Greetings from task 1:
  message type:        3
  msg length:          132 characters
  message:
    hostname:          loki
    operating system:  Linux
    release:           3.12.55-52.42-default
    processor:         x86_64


Now let's increase the number of slave processes to 2. I still
get a greeting from only one slave process, and if I increase the
number of slave processes to 3, I get a segmentation fault. It's
nearly the same for openmpi-v2.x-dev-1425-ga558e90 (the only
difference is that with 3 slave processes the program hangs
forever, for both my cc and gcc builds). Everything works as
expected with openmpi-dev-4050-g7f65c2b (although it takes very
long until I receive all messages). It even works if I put
"--slot-list" only once on the command line, as you can see
below.

loki hello_2 127 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
Process 0 of 2 running on loki
Process 1 of 2 running on loki

Now 1 slave tasks are sending greetings.

Greetings from task 1:
  message type:        3
  msg length:          132 characters
  message:
    hostname:          loki
    operating system:  Linux
    release:           3.12.55-52.42-default
    processor:         x86_64


loki hello_2 128 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
[loki:28536] *** Process received signal ***
[loki:28536] Signal: Segmentation fault (11)
[loki:28536] Signal code: Address not mapped (1)
[loki:28536] Failing at address: 0x8
[loki:28536] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7fd40eb75870]
[loki:28536] [ 1] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7fd40edd85b0]
[loki:28536] [ 2] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7fd40edb7b08]
[loki:28536] [ 3] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7fd40eddde8a]
[loki:28536] [ 4] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7fd40ee1a28e]
[loki:28536] [ 5] hello_2_slave_mpi[0x400bee]
[loki:28536] [ 6] *** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[loki:28534] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[loki:28535] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd40e7dfb05]
[loki:28536] [ 7] hello_2_slave_mpi[0x400fb0]
[loki:28536] *** End of error message ***
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

  Process name: [[61640,1],0]
  Exit code:    1
--------------------------------------------------------------------------
loki hello_2 129



loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
      OPAL repo revision: dev-4050-g7f65c2b
     C compiler absolute: /opt/solstudio12.5b/bin/cc
loki hello_2 115 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
Process 0 of 4 running on loki
Process 1 of 4 running on loki
Process 2 of 4 running on loki
Process 3 of 4 running on loki
...


It even works if I put "--slot-list" only once on the command
line.

loki hello_2 116 mpiexec -np 1 --host loki hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
Process 1 of 4 running on loki
Process 2 of 4 running on loki
Process 0 of 4 running on loki
Process 3 of 4 running on loki
...


Hopefully you know what happens and why, so that you can fix the
problem in openmpi-1.10.x and openmpi-2.x. My three spawn programs
work with openmpi-master as well, while "spawn_master" breaks on
both openmpi-1.10.x and openmpi-2.x with the same failure as my
hello master/slave programs.
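
For context, the core of my spawn test follows the standard
MPI_Comm_spawn pattern. The following is only a minimal sketch of
that pattern, not the actual attached program, and the spawned
command name "spawn_slave" is a placeholder:

#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm intercomm;              /* communicator to the spawned tasks */
  int      errcodes[3];            /* one error code per spawned task   */

  MPI_Init (&argc, &argv);
  /* start three dynamically created worker processes                  */
  MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, 3, MPI_INFO_NULL,
                  0, MPI_COMM_WORLD, &intercomm, errcodes);
  /* ... communicate with the workers via "intercomm" ...              */
  MPI_Comm_disconnect (&intercomm);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}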

Do you know when the Java problem will be solved?


Kind regards

Siegmar



On 15.05.2016 at 01:27, Ralph Castain wrote:

On May 7, 2016, at 1:13 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi,

yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. The
following programs no longer run.


loki hello_2 112 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
     OPAL repo revision: v1.10.2-176-g9d45e07
    C compiler absolute: /opt/solstudio12.4/bin/cc
loki hello_2 113 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
 hello_2_slave_mpi

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
loki hello_2 114


The above worked fine for me with:

OPAL repo revision: v1.10.2-182-g52c7573

You might try updating.



Everything worked as expected with openmpi-v1.10.0-178-gb80f802.

loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
     OPAL repo revision: v1.10.0-178-gb80f802
    C compiler absolute: /opt/solstudio12.4/bin/cc
loki hello_2 115 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
Process 0 of 3 running on loki
Process 1 of 3 running on loki
Process 2 of 3 running on loki

Now 2 slave tasks are sending greetings.

Greetings from task 2:
 message type:        3
...


I have the same problem with openmpi-v2.x-dev-1404-g74d8ea0 if I use
the following commands.

mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi
mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi


I also have the same problem with openmpi-dev-4010-g6c9d65c if I use
the following command.

mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi


openmpi-dev-4010-g6c9d65c works as expected with the following commands.

mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi
mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi


Has the interface changed, so that I'm no longer allowed to use some
of my commands? I would be grateful if somebody could fix the
problem, if it is indeed a problem. Thank you very much in advance
for any help.



Kind regards

Siegmar
<hello_2_mpi.c><hello_2_slave_mpi.c>


/* Another MPI version of the "hello world" program, which delivers
 * some information about its machine and operating system. In this
 * version the functions "master" and "slave" from "hello_1_mpi.c"
 * are implemented as independent processes. This is the file for the
 * "master".
 *
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     mpicc -o <program name> <source code file name>
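 *     For example, for the master program in this report (assuming
 *     the source file name matches the attachment name):
 *       mpicc -o hello_2_mpi hello_2_mpi.c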
 *
 *   Store executable(s) into predefined directories.
 *     make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 *     make_compile
 *
 * Running:
 *   LAM-MPI:
 *     mpiexec -boot -np <number of processes> <program name>
 *     or
 *     mpiexec -boot \
 *       -host <hostname> -np <number of processes> <program name> : \
 *       -host <hostname> -np <number of processes> <program name>
 *     or
 *     mpiexec -boot [-v] -configfile <application file>
 *     or
 *     lamboot [-v] [<host file>]
 *       mpiexec -np <number of processes> <program name>
 *       or
 *       mpiexec [-v] -configfile <application file>
 *     lamhalt
 *
 *   OpenMPI:
 *     "host1", "host2", and so on can all have the same name,
 *     if you want to start a virtual computer with some virtual
 *     CPUs on the local host. The name "localhost" is allowed
 *     as well.
 *
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec --host <host1,host2,...> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -hostfile <hostfile name> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -app <application file>
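 *     or
 *     mpiexec -np <number of processes> <program name> : \
 *       -np <number of processes> <program name>
 *     (the MPMD form with ":", used for the master/slave pair in
 *     the transcripts above)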
 *
 * Cleaning:
 *   local computer:
 *     rm <program name>
 *     or
 *     make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it).
 *     make_clean_all
 *
 *
 * File: hello_2_mpi.c                  Author: S. Gross
 * Date: 01.10.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

#define BUF_SIZE        255             /* message buffer size          */
#define MAX_TASKS       12              /* max. number of tasks         */
#define SENDTAG         1               /* send message command         */
#define EXITTAG         2               /* termination command          */
#define MSGTAG          3               /* normal message token         */

#define ENTASKS         -1              /* error: too many tasks        */
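
/* Message protocol implemented below: the master sends an empty
 * message with tag SENDTAG to request a greeting from a slave, the
 * slave answers with one MSGTAG message, and an empty EXITTAG
 * message tells the slave to terminate.
 */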

int main (int argc, char *argv[])
{
  int  mytid,                           /* my task id                   */
       ntasks,                          /* number of parallel tasks     */
       namelen,                         /* length of processor name     */
       num,                             /* number of chars in buffer    */
       i;                               /* loop variable                */
  char processor_name[MPI_MAX_PROCESSOR_NAME],
       buf[BUF_SIZE + 1];               /* message buffer (+1 for '\0') */
  MPI_Status    stat;                   /* message details              */

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code will
   * print one line on the display. The lines may get mixed up,
   * because the display is a critical section. In general only one
   * process (mostly the process with rank 0) will print on the
   * display, and all other processes will send their messages to
   * this process. Nevertheless, for debugging purposes (or to
   * demonstrate that it is possible) it may be useful if every
   * process prints its own message.
   */
  fprintf (stdout, "Process %d of %d running on %s\n",
           mytid, ntasks, processor_name);
  fflush (stdout);
  MPI_Barrier (MPI_COMM_WORLD);         /* wait for all other processes */

  if (ntasks > MAX_TASKS)
  {
    fprintf (stderr, "Error: Too many tasks. Try again with at most "
             "%d tasks.\n", MAX_TASKS);
    /* terminate all slave tasks                                        */
    for (i = 1; i < ntasks; ++i)
    {
      MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
    }
    MPI_Finalize ();
    exit (ENTASKS);
  }
  printf ("\n\nNow %d slave tasks are sending greetings.\n\n",
          ntasks - 1);
  /* request messages from slave tasks                                  */
  for (i = 1; i < ntasks; ++i)
  {
    MPI_Send ((char *) NULL, 0, MPI_CHAR, i, SENDTAG, MPI_COMM_WORLD);
  }
  /* wait for messages and print greetings                              */
  for (i = 1; i < ntasks; ++i)
  {
    MPI_Recv (buf, BUF_SIZE, MPI_CHAR, MPI_ANY_SOURCE,
              MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
    MPI_Get_count (&stat, MPI_CHAR, &num);
    buf[num] = '\0';                    /* add missing end-of-string    */
    printf ("Greetings from task %d:\n"
            "  message type:        %d\n"
            "  msg length:          %d characters\n"
            "  message:             %s\n\n",
            stat.MPI_SOURCE, stat.MPI_TAG, num, buf);
  }
  /* terminate all slave tasks                                          */
  for (i = 1; i < ntasks; ++i)
  {
    MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
/* Another MPI version of the "hello world" program, which delivers
 * some information about its machine and operating system. In this
 * version the functions "master" and "slave" from "hello_1_mpi.c"
 * are implemented as independent processes. This is the file for the
 * "slave".
 *
 * To simulate "real work" the "slave"-process waits up to MAX_WTIME
 * seconds before replying to a message request.
 *
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     mpicc -o <program name> <source code file name>
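 *     For example, for the slave program in this report (assuming
 *     the source file name matches the attachment name):
 *       mpicc -o hello_2_slave_mpi hello_2_slave_mpi.c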
 *
 *   Store executable(s) into predefined directories.
 *     make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 *     make_compile
 *
 * Running:
 *   LAM-MPI:
 *     mpiexec -boot -np <number of processes> <program name>
 *     or
 *     mpiexec -boot \
 *       -host <hostname> -np <number of processes> <program name> : \
 *       -host <hostname> -np <number of processes> <program name>
 *     or
 *     mpiexec -boot [-v] -configfile <application file>
 *     or
 *     lamboot [-v] [<host file>]
 *       mpiexec -np <number of processes> <program name>
 *       or
 *       mpiexec [-v] -configfile <application file>
 *     lamhalt
 *
 *   OpenMPI:
 *     "host1", "host2", and so on can all have the same name,
 *     if you want to start a virtual computer with some virtual
 *     CPUs on the local host. The name "localhost" is allowed
 *     as well.
 *
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec --host <host1,host2,...> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -hostfile <hostfile name> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -app <application file>
 *
 * Cleaning:
 *   local computer:
 *     rm <program name>
 *     or
 *     make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it).
 *     make_clean_all
 *
 *
 * File: hello_2_slave_mpi.c            Author: S. Gross
 * Date: 01.10.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/utsname.h>
#include "mpi.h"

#define BUF_SIZE        255             /* message buffer size          */
#define MAX_TASKS       12              /* max. number of tasks         */
#define MAX_WTIME       30              /* max. waiting time            */
#define SENDTAG         1               /* send message command         */
#define EXITTAG         2               /* termination command          */
#define MSGTAG          3               /* normal message token         */

#define ENTASKS         -1              /* error: too many tasks        */

int main (int argc, char *argv[])
{
  struct utsname sys_info;              /* system information           */
  int  mytid,                           /* my task id                   */
       ntasks,                          /* number of parallel tasks     */
       namelen,                         /* length of processor name     */
       more_to_do;
  char processor_name[MPI_MAX_PROCESSOR_NAME],
       buf[BUF_SIZE];                   /* message buffer               */
  MPI_Status stat;                      /* message details              */

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code will
   * print one line on the display. The lines may get mixed up,
   * because the display is a critical section. In general only one
   * process (mostly the process with rank 0) will print on the
   * display, and all other processes will send their messages to
   * this process. Nevertheless, for debugging purposes (or to
   * demonstrate that it is possible) it may be useful if every
   * process prints its own message.
   */
  fprintf (stdout, "Process %d of %d running on %s\n",
           mytid, ntasks, processor_name);
  fflush (stdout);
  MPI_Barrier (MPI_COMM_WORLD);         /* wait for all other processes */

  /* seed the pseudo-random number generator differently in each task */
  srand ((unsigned int) time ((time_t *) NULL) * mytid * mytid);
  more_to_do = 1;
  while (more_to_do == 1)
  {
    /* wait for a message from the master task                          */
    MPI_Recv (buf, BUF_SIZE, MPI_CHAR, 0, MPI_ANY_TAG,
              MPI_COMM_WORLD, &stat);
    if (stat.MPI_TAG != EXITTAG)
    {
      uname (&sys_info);
      /* Build the reply message. snprintf guarantees that "buf" is
       * always '\0'-terminated, even if the message would have to be
       * truncated.
       */
      snprintf (buf, BUF_SIZE,
                "\n    hostname:          %s"
                "\n    operating system:  %s"
                "\n    release:           %s"
                "\n    processor:         %s",
                sys_info.nodename, sys_info.sysname,
                sys_info.release, sys_info.machine);
      sleep (rand () % MAX_WTIME);
      MPI_Send (buf, strlen (buf), MPI_CHAR, stat.MPI_SOURCE,
                MSGTAG, MPI_COMM_WORLD);
    }
    else
    {
      more_to_do = 0;                   /* terminate                    */
    }
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
