Hi Nathan,

On Thu, Jan 21, 2010 at 02:48:51PM -0600, Champagne, Nathan J. (JSC-EV)[Jacobs Technology] wrote:
>
> We started having a problem with OpenMPI beginning with version 1.3.2
> where the program output can be correct, junk, or NaNs (result is not
> predictable). The output is the solution of a matrix equation solved by
> ScaLAPACK. We are using the Intel Fortran compiler (version 11.1) and
> the GNU compiler (version 4.1.2) on Gentoo Linux. So far, the problem
> manifests itself for a matrix (N X N) with N ~ 10,000 or more with a
> processor count ~ 64 or more. Note that the problem still occurs using
> OpenMPI 1.4.1.
>
>
> We build the ScaLAPACK and BLACS libraries locally and use the LAPACK
> and BLAS libraries supplied by Intel.
>
>
> We wrote a test program to demonstrate the problem. The matrix is built
> on each processor (no communication). Then, the matrix is factored and
> solved. The solution vector is collected from the processors and
> printed to a file by the master processor. The program and associated
> OpenMPI information (ompi_info --all) are available at:
>
>
> http://www.em-stuff.com/files/files.tar.gz
>
>
> The file "compile" in the "test" directory is used to create the
> executable. Edit it to reflect libraries on your local machine. Data
> created using OpenMPI 1.3.1 and 1.4.1 are in the "output" directory for
> reference.

For what it is worth: I compiled and ran your code using 64 processors.

# diff -u output/sol_1.3.1_96.txt test/mkl/solution_vector.txt
--- output/sol_1.3.1_96.txt     2010-01-20 06:46:41.000000000 -0800
+++ test/mkl/solution_vector.txt        2010-01-21 14:41:59.000000000 -0800
@@ -4786,7 +4786,7 @@
  4785 -0.3914681E+00  0.1178753E-03
  4786 -0.3913341E+00  0.7695833E-04
  4787 -0.3912001E+00  0.3607245E-04
- 4788 -0.3910662E+00 -0.4782369E-05
+ 4788 -0.3910662E+00 -0.4782368E-05
  4789 -0.3909323E+00 -0.4560614E-04
  4790 -0.3907985E+00 -0.8639889E-04
  4791 -0.3906647E+00 -0.1271607E-03

In other words: I do not see a problem.

This is with openmpi-1.3.3, scalapack-1.8.0, mpiblacs-1.1p3,
ifort-11.1.038, mkl-10.2.0.013.

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: sieg...@sfu.ca
Canada  V5A 1S6
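
P.S. A plain diff flags last-digit rounding differences like the one at entry 4788, so a tolerance-based comparison may be more useful when checking many runs. The following is only a rough sketch, not part of the test program. It assumes each line of the solution file is "<index> <real part> <imaginary part>" (my reading of the columns), and the function names and the tolerance of 1e-5 are made up for illustration.

```python
import math


def parse_solution(text):
    """Parse lines of the form '<index> <col1> <col2>' into {index: complex}.

    Assumption: the two numeric columns are the real and imaginary parts.
    """
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        values[int(fields[0])] = complex(float(fields[1]), float(fields[2]))
    return values


def differing_entries(ref, new, rel_tol=1e-5):
    """Return indices where 'new' is missing, NaN, or off by more than rel_tol.

    The difference is measured relative to the magnitude of the reference
    entry, so noise in the last printed digit is ignored.
    """
    bad = []
    for idx, r in sorted(ref.items()):
        n = new.get(idx)
        if n is None or math.isnan(n.real) or math.isnan(n.imag):
            bad.append(idx)
        elif abs(n - r) > rel_tol * max(abs(r), 1e-300):
            bad.append(idx)
    return bad


if __name__ == "__main__":
    # The two versions of entry 4788 from the diff above.
    ref = parse_solution(" 4788 -0.3910662E+00 -0.4782369E-05\n")
    new = parse_solution(" 4788 -0.3910662E+00 -0.4782368E-05\n")
    print(differing_entries(ref, new))
```

To compare real files, read each one with open(path).read() and pass the text to parse_solution. A run that produced junk or NaNs should then show up as a non-empty list of indices.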