I just upgraded the OS on one of my workstations from Fedora 17 to 18
and now I can't run even the simplest MPI programs.

I filed a bug report with Fedora's bug tracker:

https://bugzilla.redhat.com/show_bug.cgi?id=986409

My simple program is attached as mpi_simple.c

mpicc works:

  mpicc -g -o mpi_simple mpi_simple.c

I can even take the generated program to another computer and it runs fine.
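
In case it's useful, I can also dump what the wrapper actually does and what
the binary links against (assuming the mpicc on my path is Open MPI's wrapper,
which the ORTE errors below suggest):

  mpicc -showme
  ldd ./mpi_simple | grep -i mpi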

I can run non-MPI programs with mpirun:

  mpirun -n 4 hostname
  murron.hobbs-hancock
  murron.hobbs-hancock
  murron.hobbs-hancock
  murron.hobbs-hancock
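
Since this started right after the 17 -> 18 upgrade, I'm wondering whether I
have a leftover or mismatched install. These are the generic checks I'd run to
confirm everything comes from the current Fedora openmpi packages (I haven't
attached their output, and the package name I'm grepping for is just a guess
at what's relevant):

  mpirun --version
  which mpicc mpirun
  rpm -qa | grep -i openmpi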

When I run a program that calls MPI_Init, I get an error which includes:

--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
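
My (possibly wrong) understanding from the file names in the log is that this
code path reads job information that mpirun exports into each rank's
environment, so dumping those variables might show what's missing or
malformed:

  mpirun -n 1 env | grep '^OMPI' | sort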

The output of:

 mpirun -n 1 mpi_simple

is attached as error.txt

I suspect it matters that this is a Lenovo S10 with what /proc/cpuinfo
calls an "Intel(R) Core(TM)2 Quad CPU    Q6600".

I did a bit of poking around in gdb, but I don't know what I'm looking for.
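
For reference, this is roughly how I've been running it under the debugger, a
single rank with gdb in batch mode so it prints a backtrace if the process
dies (there may well be a better way to do this):

  mpirun -n 1 gdb -batch -ex run -ex bt ./mpi_simple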

Does anybody have an idea what's going on?

mpi_simple.c:

#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int main( int argc, char * argv[] )
{
  int rank, size;

  /* Initialize MPI, then query this rank's id and the total job size. */
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("my rank is %i of %i\n", rank, size);

  MPI_Finalize();

  return EXIT_SUCCESS;
}

error.txt:

[murron.hobbs-hancock:22465] [[38938,1],0] ORTE_ERROR_LOG: Error in file util/nidmap.c at line 148
[murron.hobbs-hancock:22465] [[38938,1],0] ORTE_ERROR_LOG: Error in file ess_env_module.c at line 174
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[murron.hobbs-hancock:22465] [[38938,1],0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[murron.hobbs-hancock:22465] *** An error occurred in MPI_Init
[murron.hobbs-hancock:22465] *** on a NULL communicator
[murron.hobbs-hancock:22465] *** Unknown error
[murron.hobbs-hancock:22465] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: murron.hobbs-hancock
  PID:        22465
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 22465 on
node murron.hobbs-hancock exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
