Sorry for not replying earlier.

I'm not a SCALAPACK expert, but a common mistake I've seen users make is to use the mpif.h from a different MPI implementation when compiling their Fortran programs. Can you verify that you're getting the Open MPI mpif.h?
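One quick way to check is to ask the Open MPI wrapper compiler what it actually adds to the compile line, e.g.:

    mpif90 --showme:compile

and confirm that the -I directory it reports is the one containing Open MPI's mpif.h, and that no other MPI implementation's include directory appears earlier on your compile line.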

Also, there is a known problem with the Pathscale compiler that they have stubbornly refused to comment on for about a year now (meaning: a problem was identified many moons ago, and it has not been tracked down to be either a Pathscale compiler problem or an Open MPI problem -- we did as much as we could and handed it off to Pathscale, but there has been no forward progress since then). So you *may* be running into that issue...? FWIW, we only saw the Pathscale problem when running on InfiniBand hardware, so YMMV.
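If your cluster does have InfiniBand, one quick (admittedly blunt) experiment is to force the TCP transport and see whether the behavior changes, e.g.:

    /opt/openmpi/1.2.4/bin/mpirun --mca btl tcp,self -np 6 example1.exe

If the segfault goes away over TCP, that would point toward the IB-related issue above; if it persists, the problem is probably elsewhere.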

Can you run any other MPI programs with Open MPI?
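If you still have the Open MPI source tree around, the small test codes under examples/ (e.g., hello_f90.f90 and ring_f90.f90 in recent tarballs -- exact names may vary) make a quick sanity check, something like:

    mpif90 examples/hello_f90.f90 -o hello_f90
    mpirun -np 6 hello_f90

If those run cleanly across your nodes, that would help narrow this down to the SCALAPACK / ACML side of things.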



On Jan 22, 2008, at 4:06 PM, Backlund, Daniel wrote:


Hello all, I am using OMPI 1.2.4 on a Linux cluster (Rocks 4.2). OMPI was configured to use the Pathscale Compiler Suite installed in /home/PROGRAMS/pathscale (NFS-mounted on the nodes). I am trying to compile and run the example1.f that comes with the ACML package from AMD, but I am unable to get it to run. All nodes have the same Opteron processors and 2 GB of RAM per core. OMPI was configured as below.

export CC=pathcc
export CXX=pathCC
export FC=pathf90
export F77=pathf90

./configure --prefix=/opt/openmpi/1.2.4 --enable-static --without-threads --without-memory-manager \
 --without-libnuma --disable-mpi-threads

The configuration was successful, the install was successful, and I can even run a sample mpihello.f90 program. I would eventually like to link the ACML SCALAPACK and BLACS libraries into our code, but I need some help. The ACML version is 3.1.0 for pathscale64. I go into the scalapack_examples directory, modify GNUmakefile to the correct values, and compile successfully. I have made openmpi into an rpm and pushed it to the nodes, modified LD_LIBRARY_PATH and PATH, and made sure I can see it on all nodes.

When I try to run the generated example1.exe using /opt/openmpi/1.2.4/bin/mpirun -np 6 example1.exe, I get the following output:

<<<< example1.res >>>>

[XXXXXXX:31295] *** Process received signal ***
[XXXXXXX:31295] Signal: Segmentation fault (11)
[XXXXXXX:31295] Signal code: Address not mapped (1)
[XXXXXXX:31295] Failing at address: 0x44000070
[XXXXXXX:31295] *** End of error message ***
[XXXXXXX:31298] *** Process received signal ***
[XXXXXXX:31298] Signal: Segmentation fault (11)
[XXXXXXX:31298] Signal code: Address not mapped (1)
[XXXXXXX:31298] Failing at address: 0x44000070
[XXXXXXX:31298] *** End of error message ***
[XXXXXXX:31299] *** Process received signal ***
[XXXXXXX:31299] Signal: Segmentation fault (11)
[XXXXXXX:31299] Signal code: Address not mapped (1)
[XXXXXXX:31299] Failing at address: 0x44000070
[XXXXXXX:31299] *** End of error message ***
[XXXXXXX:31300] *** Process received signal ***
[XXXXXXX:31300] Signal: Segmentation fault (11)
[XXXXXXX:31300] Signal code: Address not mapped (1)
[XXXXXXX:31300] Failing at address: 0x44000070
[XXXXXXX:31300] *** End of error message ***
[XXXXXXX:31296] *** Process received signal ***
[XXXXXXX:31296] Signal: Segmentation fault (11)
[XXXXXXX:31296] Signal code: Address not mapped (1)
[XXXXXXX:31296] Failing at address: 0x44000070
[XXXXXXX:31296] *** End of error message ***
[XXXXXXX:31297] *** Process received signal ***
[XXXXXXX:31297] Signal: Segmentation fault (11)
[XXXXXXX:31297] Signal code: Address not mapped (1)
[XXXXXXX:31297] Failing at address: 0x44000070
[XXXXXXX:31297] *** End of error message ***
mpirun noticed that job rank 0 with PID 31295 on node XXXXXXX.ourdomain.com exited on signal 11 (Segmentation fault).
5 additional processes aborted (not shown)

<<<< end example1.res >>>>

Here is the result of ldd example1.exe

<<<< ldd example1.exe >>>>
        libmpi_f90.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f90.so.0 (0x0000002a9557d000)
        libmpi_f77.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f77.so.0 (0x0000002a95681000)
        libmpi.so.0 => /opt/openmpi/1.2.4/lib/libmpi.so.0 (0x0000002a957b3000)
        libopen-rte.so.0 => /opt/openmpi/1.2.4/lib/libopen-rte.so.0 (0x0000002a959fb000)
        libopen-pal.so.0 => /opt/openmpi/1.2.4/lib/libopen-pal.so.0 (0x0000002a95be7000)
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e7cd00000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003e7c200000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003e79e00000)
        libmv.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmv.so.1 (0x0000002a95d4d000)
        libmpath.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmpath.so.1 (0x0000002a95e76000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e77a00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003e77c00000)
        libpathfortran.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libpathfortran.so.1 (0x0000002a95f97000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e77700000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e78200000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003e76800000)
<<<< end ldd >>>>

Like I said, the compilation of the example program yields no errors, it just will not run.
Does anybody have any suggestions? Am I doing something wrong?



--
Jeff Squyres
Cisco Systems
