Hi Nathan,

On Thu, Jan 21, 2010 at 02:48:51PM -0600, Champagne, Nathan J. (JSC-EV)[Jacobs Technology] wrote:
>
>    We started having a problem with OpenMPI beginning with version 1.3.2
>    where the program output can be correct, junk, or NaNs (result is not
>    predictable). The output is the solution of a matrix equation solved by
>    ScaLAPACK. We are using the Intel Fortran compiler (version 11.1) and
>    the GNU compiler (version 4.1.2) on Gentoo Linux. So far, the problem
>    manifests itself for a matrix (N X N) with N ~ 10,000 or more with a
>    processor count ~ 64 or more. Note that the problem still occurs using
>    OpenMPI 1.4.1.
>
>
>    We build the ScaLAPACK and BLACS libraries locally and use the LAPACK
>    and BLAS libraries supplied by Intel.
>
>
>    We wrote a test program to demonstrate the problem. The matrix is built
>    on each processor (no communication). Then, the matrix is factored and
>    solved. The solution vector is collected from the processors and
>    printed to a file by the master processor. The program and associated
>    OpenMPI information (ompi_info --all) are available at:
>
>
>    http://www.em-stuff.com/files/files.tar.gz
>
>
>    The file "compile" in the "test" directory is used to create the
>    executable. Edit it to reflect libraries on your local machine. Data
>    created using OpenMPI 1.3.1 and 1.4.1 are in the "output" directory for
>    reference.

For what it is worth:
I compiled and ran your code using 64 processors. 

# diff -u output/sol_1.3.1_96.txt test/mkl/solution_vector.txt
--- output/sol_1.3.1_96.txt     2010-01-20 06:46:41.000000000 -0800
+++ test/mkl/solution_vector.txt        2010-01-21 14:41:59.000000000 -0800
@@ -4786,7 +4786,7 @@
        4785     -0.3914681E+00   0.1178753E-03
        4786     -0.3913341E+00   0.7695833E-04
        4787     -0.3912001E+00   0.3607245E-04
-       4788     -0.3910662E+00  -0.4782369E-05
+       4788     -0.3910662E+00  -0.4782368E-05
        4789     -0.3909323E+00  -0.4560614E-04
        4790     -0.3907985E+00  -0.8639889E-04
        4791     -0.3906647E+00  -0.1271607E-03

In other words: I do not see a problem. The only difference is in the last printed digit of a single entry.
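A textual diff flags even last-digit rounding differences, so a numerical comparison of the two solution vectors is a more direct check. Below is a minimal Python sketch. It assumes each data line holds an index followed by two values, as in the excerpt above. The script and its function names (read_solution, relative_difference) are hypothetical and are not part of the test package.

```python
import math
import sys

def read_solution(lines):
    """Parse 'index v1 v2' lines into {index: (v1, v2)}; skip anything else.

    Assumed format (hypothetical): three whitespace-separated fields per line.
    """
    sol = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # blank or differently shaped line
        try:
            sol[int(parts[0])] = (float(parts[1]), float(parts[2]))
        except ValueError:
            continue  # header or other non-numeric line
    return sol

def relative_difference(a, b):
    """Return ||a - b||_2 / ||a||_2 over all stored values.

    Raises ValueError if the two vectors do not cover the same indices.
    """
    if a.keys() != b.keys():
        raise ValueError("files contain different index sets")
    num = den = 0.0
    for idx in a:
        for x, y in zip(a[idx], b[idx]):
            num += (x - y) ** 2
            den += x ** 2
    return math.sqrt(num) / math.sqrt(den) if den > 0 else math.sqrt(num)

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        ref, new = read_solution(f1), read_solution(f2)
    print(f"relative 2-norm difference: {relative_difference(ref, new):.3e}")
```

Usage (hypothetical file name): `python compare_solutions.py output/sol_1.3.1_96.txt test/mkl/solution_vector.txt`. A result near machine precision indicates agreement. Junk or NaN output would show up as a large value or `nan`.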

This is with openmpi-1.3.3, scalapack-1.8.0, mpiblacs-1.1p3,
ifort-11.1.038, mkl-10.2.0.013.

Cheers,
Martin

--
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: sieg...@sfu.ca
Canada  V5A 1S6
