On Tue, 13 Mar 2012 at 7:53pm, Gutierrez, Samuel K wrote

> The failure signature isn't exactly what we were seeing here at LANL, but there were misplaced memory barriers in Open MPI 1.4.3. Ticket 2619 talks about this issue (https://svn.open-mpi.org/trac/ompi/ticket/2619). This doesn't explain, however, the failures that you are experiencing within Open MPI 1.5.4. Can you give 1.4.4 a whirl and see if this fixes the issue?

Would it be best to use 1.4.4 specifically, or simply the most recent 1.4.x (which appears to be 1.4.5 at this point)?

> Any more information about your failures in 1.5.4 is greatly appreciated.

I'm happy to provide it, but what exactly are you looking for? The test code I'm running is *very* simple:

#include <stdio.h>
#include <mpi.h>

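int main(int argc, char **argv)
{
   int node;

   /* long long: the loop bound below does not fit in a 32-bit int,
      so an int counter would overflow (undefined behavior) */
   long long i;
   /* volatile keeps the compiler from optimizing the busy loop away */
   volatile float f;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &node);

   printf("Hello World from Node %d.\n", node);

   /* Burn CPU for a long time before finalizing */
   for (i = 0; i <= 1000000000000LL; i++)
       f = i * 2.718281828 * i + i + i * 3.141592654;

   MPI_Finalize();
   return 0;
}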

And my environment is a pretty standard CentOS-6.2 install.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
