Dear all,

I have recently started working on a project using OpenMPI. Basically,
I have been given some C++ code, a cluster to play with, and a deadline
by which to make the C++ code run faster. The cluster was a bit
crowded, so I started working on my laptop (g++ 4.3.3 from the Ubuntu
repos, OpenMPI 1.3.2 compiled with no options). After one week I
had something that was running on my computer, so I decided to move
to the cluster. Since the cluster is very old and was using g++ 3.2
and an old version of OpenMPI, I decided to install both from source
in my home folder (g++ 4.4, OpenMPI 1.3.2).
The issue is that when I run the program (after it compiled
flawlessly on the machine), I get these error messages:

[denali:30134] *** Process received signal ***
[denali:30134] Signal: Segmentation fault (11)
[denali:30134] Signal code: Address not mapped (1)
[denali:30134] Failing at address: 0x18

(the full output of mpirun -np 4 ray-trace is in the attached file)

All this morning I have gone through the mailing lists and found people
experiencing similar problems, but their solutions did not work for me.
Using simple debugging (cout), I was able to determine where the error
comes from:

//Initialize step
MPI_Init(&argc,&argv);
//Here it breaks!!! Memory allocation issue!
MPI_Comm_size(MPI_COMM_WORLD, &pool);
std::cout<<"I'm here"<<std::endl; //this statement is never reached
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

When trying to debug via gdb, the problem seems to be:

Program received signal SIGSEGV, Segmentation fault.
0xb7524772 in ompi_comm_invalid (comm=Could not find the frame base
for "ompi_comm_invalid".) at communicator.h:261
261     communicator.h: No such file or directory.
        in communicator.h

which might indicate a problem with paths. For now, my LD_LIBRARY_PATH
is set to "/users/cluster/cdavid/local/lib/" (the local folder in my
home folder emulates the directory structure of the / folder).
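
To make the path question concrete, here is a minimal sketch (not part of
the ray-trace code) of a helper that reports where a directory sits in a
colon-separated search path such as LD_LIBRARY_PATH or PATH. The function
names are made up for illustration. Note that LD_LIBRARY_PATH only affects
which libmpi.so is loaded at run time; which mpicxx and mpirun are used
depends on PATH, so both are worth checking.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a colon-separated search path (the format used by LD_LIBRARY_PATH
// and PATH) into its entries, preserving their order.
std::vector<std::string> split_search_path(const std::string& value) {
    std::vector<std::string> entries;
    if (value.empty()) return entries;
    std::size_t start = 0;
    while (true) {
        std::size_t colon = value.find(':', start);
        if (colon == std::string::npos) {
            entries.push_back(value.substr(start));
            break;
        }
        entries.push_back(value.substr(start, colon - start));
        start = colon + 1;
    }
    return entries;
}

// Drop trailing slashes so that "/x/lib/" and "/x/lib" compare equal.
std::string strip_trailing_slashes(std::string dir) {
    while (dir.size() > 1 && dir[dir.size() - 1] == '/') {
        dir.erase(dir.size() - 1);
    }
    return dir;
}

// Return the zero-based position of dir in the search path, or -1 if it
// does not appear. Position 0 means the directory is searched first.
int position_in_search_path(const std::string& value, const std::string& dir) {
    const std::vector<std::string> entries = split_search_path(value);
    const std::string wanted = strip_trailing_slashes(dir);
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (strip_trailing_slashes(entries[i]) == wanted) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Called with the value of std::getenv("LD_LIBRARY_PATH") (after checking it
for NULL) and "/users/cluster/cdavid/local/lib", a result of 0 would mean
the local install is searched before any system-wide MPI libraries.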

Moreover, to check whether the installation itself is OK, I tried
running this example program:

http://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program

with exactly the same results; the code breaks when the memory address
of variable pool is referenced.


So, if you have any ideas or you think I might have missed something,
please let me know.



Thanks,

Catalin

-- 

******************************
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen

Phone: +49-(0)1577-49-38-667

College Ring 4, #343
Bremen, 28759
Germany
******************************
rm -f bin/*.o
rm -f src/*~
mpicxx -Wall -g -c -o bin/building.o src/building.cpp
mpicxx -Wall -g -c -o bin/complex.o src/complex.cpp
mpicxx -Wall -g -c -o bin/ComplexVector.o src/ComplexVector.cpp
mpicxx -Wall -g -c -o bin/environment.o src/environment.cpp
mpicxx -Wall -g -c -o bin/face.o src/face.cpp
mpicxx -Wall -g -c -o bin/image.o src/image.cpp
mpicxx -Wall -g -c -o bin/point.o src/point.cpp
mpicxx -Wall -g -c -o bin/ray.o src/ray.cpp
mpicxx -Wall -g -c -o bin/raylist.o src/raylist.cpp
mpicxx -Wall -g -c -o bin/raynode.o src/raynode.cpp
mpicxx -Wall -g -c -o bin/ray-trace.o src/ray-trace.cpp 
mpicxx -Wall -g -c -o bin/segment.o src/segment.cpp
mpicxx -Wall -g -o bin/ray-trace bin/building.o bin/complex.o bin/ComplexVector.o bin/environment.o bin/face.o bin/image.o bin/point.o bin/ray.o bin/raylist.o bin/raynode.o bin/ray-trace.o bin/segment.o

[denali:31245] *** Process received signal ***
[denali:31245] Signal: Segmentation fault (11)
[denali:31245] Signal code: Address not mapped (1)
[denali:31245] Failing at address: 0x18
[denali:31245] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31245] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31245] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31245] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31243] *** Process received signal ***
[denali:31243] Signal: Segmentation fault (11)
[denali:31243] Signal code: Address not mapped (1)
[denali:31243] Failing at address: 0x18
[denali:31244] *** Process received signal ***
[denali:31243] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31243] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31243] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31243] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31243] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31243] *** End of error message ***
[denali:31244] Signal: Segmentation fault (11)
[denali:31244] Signal code: Address not mapped (1)
[denali:31244] Failing at address: 0x18
[denali:31245] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31245] *** End of error message ***
[denali:31244] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31244] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31244] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31244] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31244] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31244] *** End of error message ***
[denali:31246] *** Process received signal ***
[denali:31246] Signal: Segmentation fault (11)
[denali:31246] Signal code: Address not mapped (1)
[denali:31246] Failing at address: 0x18
[denali:31246] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31246] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31246] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31246] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31246] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31246] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 31243 on node denali exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
