The test code looks ok to me.

I will mention that Open MPI 1.4.3 is *very* old; it is now two generations
behind the current one.  The current stable release is 1.6.5, and the current
feature series (1.7.x) is likely to transition to stable (1.8.x) in a few
months.  I don't follow Ubuntu at all, but I'm a bit surprised that a) they're
so far out of date, and b) they don't even have the last release of the
Open MPI 1.4.x series (which was 1.4.5, released Feb 14, 2012).

So yes, it could be a bug in Open MPI; it's really hard to say with a version
that old.  I would say that the first step is upgrading to at least Open MPI
1.4.5, or to 1.6.5 if possible.


On Jan 10, 2014, at 5:49 AM, David Froger <david.fro...@inria.fr> wrote:

> Dear all,
> 
> We are migrating a code using OpenMPI from Ubuntu 10.04 to Ubuntu 12.04, and
> have encountered some problems.
> 
> Below is a test code that works on Ubuntu 10.04, but fails on Ubuntu 12.04.
> 
> The question is: is there a bug in the test code, or is it due to a bug in
> OpenMPI?
> 
> Thanks for any help,
> David
> 
> ==============================================================================
> OpenMPI versions
> ==============================================================================
> 
> We use the default OpenMPI versions on both versions of Ubuntu:
> 
> $ apt-cache policy openmpi-bin # On Ubuntu 10.04
> openmpi-bin:
>  Installed: 1.4.1-2
>  Candidate: 1.4.1-2
>  Version table:
> *** 1.4.1-2 0
>        500 http://ubuntu.lucid.miroir.rocq.inria.fr/ lucid/universe Packages
>        100 /var/lib/dpkg/status
> 
> $ apt-cache policy openmpi-bin # On Ubuntu 12.04
> openmpi-bin:
>  Installed: 1.4.3-2.1ubuntu3
>  Candidate: 1.4.3-2.1ubuntu3
>  Version table:
> *** 1.4.3-2.1ubuntu3 0
>        500 http://ubuntu.precise.miroir.rocq.inria.fr/ precise/universe amd64 Packages
>        100 /var/lib/dpkg/status
> 
> ==============================================================================
> Error messages
> ==============================================================================
> 
> The test code given below works on Ubuntu 10.04, but sometimes fails on
> 12.04, with the following output for example:
> 
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 10 in communicator MPI_COMM_WORLD 
> with errorcode 1.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> Error rank 10 tab[0] = 8
> Error rank 11 tab[0] = 7
> Error rank 12 tab[0] = 6
> Error rank 13 tab[0] = 10
> Error rank 14 tab[2] = 10
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 10 with PID 10284 on
> node saphene exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --------------------------------------------------------------------------
> [saphene:10273] 4 more processes have sent help message help-mpi-api.txt / mpi-abort
> [saphene:10273] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> 
> ==============================================================================
> Test code
> ==============================================================================
> 
> Here is the code:
> 
> #include <iostream>
> #include <mpi.h>
> 
> using namespace std;
> 
> int main(int argc, char** argv)
> {
>       int ierr;
>       ierr = MPI_Init(&argc, &argv);
> 
>       if(ierr != MPI_SUCCESS){
>               cout << "Error initializing mpi" << endl;
>               MPI_Abort(MPI_COMM_WORLD, ierr);
>       }
> 
>       // get the number of processes
>       int numProcess;
>       MPI_Comm_size(MPI_COMM_WORLD, &numProcess);
> 
>       // get the rank of the process
>       int rank;
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       
>       for(int it=0; it<20; it++){
>               // gather all ranks in an array
>               int *tab = new int[numProcess];
>               ierr = MPI_Allgather(&rank, 1, MPI_INT, tab, 1, MPI_INT, MPI_COMM_WORLD);
> 
>               if(ierr != MPI_SUCCESS){
>                       cout << "Error MPI_Allgather rank:" << rank << endl;
>                       MPI_Abort(MPI_COMM_WORLD, ierr);
>               }
> 
>               // check that everything is ok
>               for(int i=0; i<numProcess; i++){
>                       if(tab[i] != i){
>                               cout << "Error rank " << rank << " tab[" << i << "] = " << tab[i] << endl;
>                               MPI_Abort(MPI_COMM_WORLD, 1);
>                       }
>               }
>               delete [] tab;
>       }
> 
>       MPI_Finalize();
>       cout << "Exit normally" << endl;
>       return 0;
> }


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
