The line
Signal code: Address not mapped (1)
indicates that there is probably a mismatch between the runtime
library and the linked version. Make sure that you link the program
and run it using the same installation base. Are the libraries in /
usr/mpi/fsl_openmpi_gcc_1.2.6 the same you used at link time?
On Sep 19, 2008, at 2:42 PM, Daniel Hansen wrote:
I work for a supercomputing organization and we just installed the
latest version of rocks/centos on our cluster. We compiled openmpi
from source to customize it for our purposes. Since switching we
have receive messages from users about errors, segfaults, etc. that
we didn't see before. Here is one such segfault message that I
don't have enough knowledge to figure out or even have a clue about
what is going on. Here it is:
[m4b-1-8:11830] *** Process received signal ***
[m4b-1-8:11830] Signal: Segmentation fault (11)
[m4b-1-8:11830] Signal code: Address not mapped (1)
[m4b-1-8:11830] Failing at address: 0x2abcdff475b0
[m4b-1-8:11830] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
[m4b-1-8:11830] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
[m4b-1-8:11830] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
[m4b-1-8:11830] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
[m4b-1-8:11830] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
[m4b-1-8:11830] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
[m4b-1-8:11830] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
[m4b-1-8:11830] [ 7] /fslhome/wshuai/compute/for_Shuai2/
mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
[m4b-1-8:11830] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x33e841d8a4]
[m4b-1-8:11830] [ 9] /fslhome/wshuai/compute/for_Shuai2/
mpi_md_bgo_twham_12sept08_debug [0x404109]
[m4b-1-8:11830] *** End of error message ***
[m4b-1-9:11772] *** Process received signal ***
[m4b-1-9:11772] Signal: Segmentation fault (11)
[m4b-1-9:11772] Signal code: Address not mapped (1)
[m4b-1-9:11772] Failing at address: 0x2abcdff475b0
[m4b-1-9:11772] [ 0] /lib64/libpthread.so.0 [0x302380de70]
[m4b-1-9:11772] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
[m4b-1-9:11772] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
[m4b-1-9:11772] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
[m4b-1-9:11772] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
[m4b-1-9:11772] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
[m4b-1-9:11772] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
[m4b-1-9:11772] [ 7] /fslhome/wshuai/compute/for_Shuai2/
mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
[m4b-1-9:11772] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x302301d8a4]
[m4b-1-9:11772] [ 9] /fslhome/wshuai/compute/for_Shuai2/
mpi_md_bgo_twham_12sept08_debug [0x404109]
[m4b-1-9:11772] *** End of error message ***
[m4b-1-7i][0,1,7][btl_tcp_endpoint.c:
572:mca_btl_tcp_endpoint_complete_connect] connect() failed with
errno=111
[m4b-1-7i][0,1,8][btl_tcp_endpoint.c:
572:mca_btl_tcp_endpoint_complete_connect] connect() failed with
errno=111
[m4b-1-7i][0,1,9][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=104
[m4b-1-7i][0,1,9][btl_tcp_endpoint.c:
572:mca_btl_tcp_endpoint_complete_connect] connect() failed with
errno=111
[m4b-1-9:11773] *** Process received signal ***
[m4b-1-9:11773] Signal: Segmentation fault (11)
[m4b-1-9:11773] Signal code: Address not mapped (1)
[m4b-1-9:11773] Failing at address: 0x2abcdff475b0
[m4b-1-9:11773] [ 0] /lib64/libpthread.so.0 [0x302380de70]
[m4b-1-9:11773] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
[m4b-1-9:11773] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
[m4b-1-9:11773] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
[m4b-1-9:11773] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
[m4b-1-9:11773] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
[m4b-1-9:11773] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0
(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
[m4b-1-9:11773] [ 7] /fslhome/wshuai/compute/for_Shuai2/
mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
[m4b-1-9:11773] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x302301d8a4]
[m4b-1-9:11773] [ 9] /fslhome/wshuai/compute/for_Shuai2/
mpi_md_bgo_twham_12sept08_debug [0x404109]
[m4b-1-9:11773] *** End of error message ***
orterun noticed that job rank 0 with PID 12338 on node m4b-1-10i
exited on signal 15 (Terminated).
Can someone give me some clues as to what is going wrong here or
possibly point me in the right direction? Is there something I or
the user can do to get more informative error messages? The user
mentioned that this particular program ran fine before we upgraded
to the current openmpi version, and that he can't find any bugs in
his code.
Thanks for your help,
Daniel Hansen
Systems Administrator
BYU Fulton Supercomputing Lab
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users