Sorry for not replying earlier.
I'm not a SCALAPACK expert, but a common mistake I've seen users make
is to use the mpif.h from a different MPI implementation when
compiling their Fortran programs. Can you verify that you're getting
the Open MPI mpif.h?
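A quick way to check is to ask the wrapper compiler what it actually
adds (a minimal sketch, assuming a default install under
/opt/openmpi/1.2.4):

# Show the flags the wrapper adds; the -I directory should contain
# Open MPI's mpif.h, not one from another MPI implementation
/opt/openmpi/1.2.4/bin/mpif77 --showme:compile

# Confirm the mpif.h in that include directory actually exists
ls /opt/openmpi/1.2.4/include/mpif.h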
Also, there is a known problem with the Pathscale compiler that
they have stubbornly refused to comment on for about a year now
(meaning: a problem was identified many moons ago, and it has not been
tracked down to be either a Pathscale compiler problem or an Open MPI
problem -- we did as much as we could and handed it off to Pathscale,
but with no forward progress since then). So you *may* be running into
that issue...? FWIW, we only saw the Pathscale problem when running
on InfiniBand hardware, so YMMV.
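If you want to test that theory, one cheap experiment is to take the
openib BTL out of the picture and see whether the behavior changes
(sketch, assuming your normal mpirun invocation):

# Force the TCP and shared-memory transports only, excluding openib
mpirun --mca btl tcp,sm,self -np 6 example1.exe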
Can you run any other MPI programs with Open MPI?
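For example, a trivial Fortran send/receive test would tell us whether
basic MPI communication works at all with your build (sketch; assumes
the Open MPI wrappers are first in your PATH):

cat > ping.f <<'EOF'
      program ping
      include 'mpif.h'
      integer rank, ierr, buf, status(MPI_STATUS_SIZE)
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      buf = rank
      if (rank .eq. 0) then
        call MPI_SEND(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
      else if (rank .eq. 1) then
        call MPI_RECV(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD,
     &                status, ierr)
        print *, 'rank 1 received', buf
      end if
      call MPI_FINALIZE(ierr)
      end
EOF
mpif77 ping.f -o ping
mpirun -np 2 ping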
On Jan 22, 2008, at 4:06 PM, Backlund, Daniel wrote:
Hello all, I am using OMPI 1.2.4 on a Linux cluster (Rocks 4.2).
OMPI was configured to use the Pathscale Compiler Suite installed in
/home/PROGRAMS/pathscale (NFS-mounted on the nodes). I am trying to
compile and run the example1.f that comes with the ACML package from
AMD, and I am unable to get it to run. All nodes have the same Opteron
processors and 2 GB of RAM per core. OMPI was configured as below.
export CC=pathcc
export CXX=pathCC
export FC=pathf90
export F77=pathf90
./configure --prefix=/opt/openmpi/1.2.4 --enable-static --without-threads \
    --without-memory-manager --without-libnuma --disable-mpi-threads
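For reference, ompi_info can confirm which compilers the build
actually picked up (sketch):

# ompi_info reports the C/C++/Fortran compilers used at configure time
/opt/openmpi/1.2.4/bin/ompi_info | grep -i compiler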
The configuration was successful, the install was successful, and I
can even run a sample mpihello.f90 program. I would eventually like to
link the ACML SCALAPACK and BLACS libraries to our code, but I need
some help. The ACML version is 3.1.0 for pathscale64. I go into the
scalapack_examples directory, modify the GNUmakefile to the correct
values, and compile successfully.
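(The effective link step is roughly the sketch below; the ACML install
path and the library names are placeholders here -- the real values
come from the GNUmakefile.)

# Illustrative only: the -L path and -lscalapack/-lblacs/-lacml are
# placeholders; use the exact names from the ACML GNUmakefile
mpif77 example1.f -o example1.exe \
    -L/home/PROGRAMS/acml3.1.0/pathscale64/lib \
    -lscalapack -lblacs -lacml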
I have packaged Open MPI into an RPM and pushed it to the nodes,
modified LD_LIBRARY_PATH and PATH, and made sure I can see it on all
nodes.
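To double-check that every node resolves the same installation,
something like this can be run from the head node (sketch):

# Each rank should report the same mpirun and resolve all libraries
/opt/openmpi/1.2.4/bin/mpirun -np 6 which mpirun
/opt/openmpi/1.2.4/bin/mpirun -np 6 ldd ./example1.exe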
When I try to run the generated example1.exe using
/opt/openmpi/1.2.4/bin/mpirun -np 6 example1.exe, I get the following
output:
<<<< example1.res >>>>
[XXXXXXX:31295] *** Process received signal ***
[XXXXXXX:31295] Signal: Segmentation fault (11)
[XXXXXXX:31295] Signal code: Address not mapped (1)
[XXXXXXX:31295] Failing at address: 0x44000070
[XXXXXXX:31295] *** End of error message ***
[XXXXXXX:31298] *** Process received signal ***
[XXXXXXX:31298] Signal: Segmentation fault (11)
[XXXXXXX:31298] Signal code: Address not mapped (1)
[XXXXXXX:31298] Failing at address: 0x44000070
[XXXXXXX:31298] *** End of error message ***
[XXXXXXX:31299] *** Process received signal ***
[XXXXXXX:31299] Signal: Segmentation fault (11)
[XXXXXXX:31299] Signal code: Address not mapped (1)
[XXXXXXX:31299] Failing at address: 0x44000070
[XXXXXXX:31299] *** End of error message ***
[XXXXXXX:31300] *** Process received signal ***
[XXXXXXX:31300] Signal: Segmentation fault (11)
[XXXXXXX:31300] Signal code: Address not mapped (1)
[XXXXXXX:31300] Failing at address: 0x44000070
[XXXXXXX:31300] *** End of error message ***
[XXXXXXX:31296] *** Process received signal ***
[XXXXXXX:31296] Signal: Segmentation fault (11)
[XXXXXXX:31296] Signal code: Address not mapped (1)
[XXXXXXX:31296] Failing at address: 0x44000070
[XXXXXXX:31296] *** End of error message ***
[XXXXXXX:31297] *** Process received signal ***
[XXXXXXX:31297] Signal: Segmentation fault (11)
[XXXXXXX:31297] Signal code: Address not mapped (1)
[XXXXXXX:31297] Failing at address: 0x44000070
[XXXXXXX:31297] *** End of error message ***
mpirun noticed that job rank 0 with PID 31295 on node XXXXXXX.ourdomain.com exited on signal 11 (Segmentation fault).
5 additional processes aborted (not shown)
<<<< end example1.res >>>>
Here is the result of ldd example1.exe:
<<<< ldd example1.exe >>>>
libmpi_f90.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f90.so.0 (0x0000002a9557d000)
libmpi_f77.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f77.so.0 (0x0000002a95681000)
libmpi.so.0 => /opt/openmpi/1.2.4/lib/libmpi.so.0 (0x0000002a957b3000)
libopen-rte.so.0 => /opt/openmpi/1.2.4/lib/libopen-rte.so.0 (0x0000002a959fb000)
libopen-pal.so.0 => /opt/openmpi/1.2.4/lib/libopen-pal.so.0 (0x0000002a95be7000)
librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e7cd00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003e7c200000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003e79e00000)
libmv.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmv.so.1 (0x0000002a95d4d000)
libmpath.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmpath.so.1 (0x0000002a95e76000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e77a00000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003e77c00000)
libpathfortran.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libpathfortran.so.1 (0x0000002a95f97000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e77700000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e78200000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e76800000)
<<<< end ldd >>>>
As I said, compiling the example program yields no errors; it just
will not run.
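If a backtrace would help, this is the kind of check I can try next
(sketch; assumes gdb is available and core dumps are enabled):

ulimit -c unlimited
/opt/openmpi/1.2.4/bin/mpirun -np 6 example1.exe
# then, on the node that dumped core (core file name varies by system):
gdb -batch -ex bt ./example1.exe core.<pid>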
Does anybody have any suggestions? Am I doing something wrong?
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems