Dear all,

I have recently started working on a project using OpenMPI. Basically, I have been given some C++ code, a cluster to play with, and a deadline by which to make the C++ code run faster. The cluster was a bit crowded, so I started working on my laptop (g++ 4.3.3 from the Ubuntu repos, OpenMPI 1.3.2 compiled with no options), and after one week I had something that ran on my computer, so I decided to move to the cluster. Since the cluster is very old and was using g++ 3.2 and an old version of OpenMPI, I decided to install both from source in my home folder (g++ 4.4, OpenMPI 1.3.2). The issue is that when I run the program (after it compiled flawlessly on the machine), I get these error messages:

    [denali:30134] *** Process received signal ***
    [denali:30134] Signal: Segmentation fault (11)
    [denali:30134] Signal code: Address not mapped (1)
    [denali:30134] Failing at address: 0x18

(more in the attached file -- mpirun -np 4 ray-trace)

I have spent all morning going through the mailing lists and found people experiencing similar problems, but their solutions did not work for me. By using simple debugging (cout), I was able to determine where the error comes from:

    //Initialize step
    MPI_Init(&argc,&argv);
    //Here it breaks!!! Memory allocation issue!
    MPI_Comm_size(MPI_COMM_WORLD, &pool);
    std::cout<<"I'm here"<<std::endl; //this statement is never reached
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

When trying to debug via gdb, the problem seems to be:

    Program received signal SIGSEGV, Segmentation fault.
    0xb7524772 in ompi_comm_invalid (comm=Could not find the frame base for "ompi_comm_invalid".
    ) at communicator.h:261
    261     communicator.h: No such file or directory.
            in communicator.h

which might indicate a problem with paths. For now, my LD_LIBRARY_PATH is set to "/users/cluster/cdavid/local/lib/" (the local folder in my home folder emulates the directory structure of the / folder).

Moreover, I wanted to see whether the installation itself is OK, so I tried running this program:

http://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program

with exactly the same results; the code breaks when the memory address of the variable pool is referenced.

So, if you have any ideas or you think I might have missed something, please let me know.

Thanks,
Catalin

--
******************************
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen
Phone: +49-(0)1577-49-38-667
College Ring 4, #343
Bremen, 28759
Germany
******************************

rm -f bin/*.o
rm -f src/*~
mpicxx -Wall -g -c -o bin/building.o src/building.cpp
mpicxx -Wall -g -c -o bin/complex.o src/complex.cpp
mpicxx -Wall -g -c -o bin/ComplexVector.o src/ComplexVector.cpp
mpicxx -Wall -g -c -o bin/environment.o src/environment.cpp
mpicxx -Wall -g -c -o bin/face.o src/face.cpp
mpicxx -Wall -g -c -o bin/image.o src/image.cpp
mpicxx -Wall -g -c -o bin/point.o src/point.cpp
mpicxx -Wall -g -c -o bin/ray.o src/ray.cpp
mpicxx -Wall -g -c -o bin/raylist.o src/raylist.cpp
mpicxx -Wall -g -c -o bin/raynode.o src/raynode.cpp
mpicxx -Wall -g -c -o bin/ray-trace.o src/ray-trace.cpp
mpicxx -Wall -g -c -o bin/segment.o src/segment.cpp
mpicxx -Wall -g -o bin/ray-trace bin/building.o bin/complex.o bin/ComplexVector.o bin/environment.o bin/face.o bin/image.o bin/point.o bin/ray.o bin/raylist.o bin/raynode.o bin/ray-trace.o bin/segment.o

[denali:31245] *** Process received signal ***
[denali:31245] Signal: Segmentation fault (11)
[denali:31245] Signal code: Address not mapped (1)
[denali:31245] Failing at address: 0x18
[denali:31245] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31245] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31245] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31245] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31243] *** Process received signal ***
[denali:31243] Signal: Segmentation fault (11)
[denali:31243] Signal code: Address not mapped (1)
[denali:31243] Failing at address: 0x18
[denali:31244] *** Process received signal ***
[denali:31243] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31243] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31243] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31243] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31243] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31243] *** End of error message ***
[denali:31244] Signal: Segmentation fault (11)
[denali:31244] Signal code: Address not mapped (1)
[denali:31244] Failing at address: 0x18
[denali:31245] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31245] *** End of error message ***
[denali:31244] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31244] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31244] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31244] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31244] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31244] *** End of error message ***
[denali:31246] *** Process received signal ***
[denali:31246] Signal: Segmentation fault (11)
[denali:31246] Signal code: Address not mapped (1)
[denali:31246] Failing at address: 0x18
[denali:31246] [ 0] /lib/tls/libpthread.so.0 [0x403b8d20]
[denali:31246] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x103) [0x400a48b3]
[denali:31246] [ 2] ray-trace(main+0x11a) [0x805f8e8]
[denali:31246] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x403d3768]
[denali:31246] [ 4] ray-trace(_ZNSt8ios_base4InitD1Ev+0x35) [0x804f1a1]
[denali:31246] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 31243 on node denali exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
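
The backtrace shows libmpi.so.0 being loaded from /users/cluster/cdavid/local/lib/, so the LD_LIBRARY_PATH setting does take effect at run time. What the log does not show is which mpicxx was found on PATH at compile time. One possible check, sketched here under assumptions, is whether the home-folder install comes first in a colon-separated search path. The function names below are hypothetical helpers, not part of OpenMPI, and the directory /users/cluster/cdavid/local/bin used in the usage note is only inferred from the statement that the local folder mirrors the layout of /.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a colon-separated search path (such as the
// value of PATH or LD_LIBRARY_PATH) into its entries. Empty entries are
// skipped here for simplicity, although a shell treats an empty PATH
// entry as the current directory.
std::vector<std::string> split_search_path(const std::string& value) {
    std::vector<std::string> dirs;
    std::stringstream stream(value);
    std::string dir;
    while (std::getline(stream, dir, ':')) {
        if (!dir.empty()) {
            dirs.push_back(dir);
        }
    }
    return dirs;
}

// Hypothetical helper: drop trailing slashes so that "/a/b/" and "/a/b"
// compare equal. A lone "/" is left untouched.
std::string strip_trailing_slashes(std::string dir) {
    while (dir.size() > 1 && dir.back() == '/') {
        dir.pop_back();
    }
    return dir;
}

// Hypothetical helper: true if `wanted` is the first entry of the search
// path, meaning tools or libraries found there shadow any system-wide
// copies listed later.
bool is_first_entry(const std::string& value, const std::string& wanted) {
    const std::vector<std::string> dirs = split_search_path(value);
    return !dirs.empty() &&
           strip_trailing_slashes(dirs.front()) == strip_trailing_slashes(wanted);
}
```

As a usage sketch, a small main could pass the result of std::getenv("PATH") together with "/users/cluster/cdavid/local/bin" to is_first_entry, and do the same for LD_LIBRARY_PATH with "/users/cluster/cdavid/local/lib/". If the first check fails, the system's older mpicxx (and its mpi.h) may have been used to build a binary that then runs against the newer libmpi.so.0. This is a possibility worth ruling out, not a confirmed diagnosis. Open MPI's wrapper compilers also accept --showme, which prints the underlying compiler command line and can be used to confirm which include and library directories are in play.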