The line

Signal code: Address not mapped (1)

indicates that there is probably a mismatch between the runtime library and the version the program was linked against. Make sure that you link the program and run it using the same installation base. Are the libraries in /usr/mpi/fsl_openmpi_gcc_1.2.6 the same ones you used at link time?
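
One rough way to check this is to run ldd on the executable on a compute node, in the same environment the job uses, and look at where the MPI libraries resolve. The sketch below only parses ldd-style output. The function names, the sample text and the /opt/old_openmpi path are made up for illustration and are not taken from the report, and the library names it looks for (libmpi, libopen-rte, libopen-pal) are an assumption about what this build installs.

```python
import re

def mpi_libs_from_ldd(ldd_output):
    """Map each MPI-related library named in ldd output to its resolved path.

    A library that ldd reports as "not found" maps to None.
    """
    libs = {}
    for line in ldd_output.splitlines():
        # Typical ldd line: "libmpi.so.0 => /some/path/libmpi.so.0 (0x...)"
        m = re.match(r"\s*(\S+)\s+=>\s+(.*)", line)
        if not m:
            continue
        name, rest = m.group(1), m.group(2).strip()
        # Assumed library names; adjust for the installation at hand.
        if not re.match(r"lib(mpi|open-rte|open-pal)\b", name):
            continue
        libs[name] = rest.split()[0] if rest.startswith("/") else None
    return libs

def libs_outside_prefix(libs, prefix):
    """Return names of libraries that are missing or resolve outside prefix."""
    prefix = prefix.rstrip("/") + "/"
    return sorted(name for name, path in libs.items()
                  if path is None or not path.startswith(prefix))

# Hypothetical ldd output, for illustration only.
sample = (
    "\tlibmpi.so.0 => /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0 (0x00002aaaaaaad000)\n"
    "\tlibopen-rte.so.0 => /opt/old_openmpi/lib/libopen-rte.so.0 (0x00002aaaaad0e000)\n"
    "\tlibopen-pal.so.0 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00000033e8400000)\n"
)

suspect = libs_outside_prefix(mpi_libs_from_ldd(sample),
                              "/usr/mpi/fsl_openmpi_gcc_1.2.6")
print(suspect)
```

Any name this reports is a library the executable would pick up from somewhere other than the installation it is supposed to run with, which is the kind of mismatch described above.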

On Sep 19, 2008, at 2:42 PM, Daniel Hansen wrote:

I work for a supercomputing organization and we just installed the latest version of rocks/centos on our cluster. We compiled openmpi from source to customize it for our purposes. Since switching, we have received messages from users about errors, segfaults, etc. that we didn't see before. Below is one such segfault message; I don't know enough to work out what is going on:

[m4b-1-8:11830] *** Process received signal ***
[m4b-1-8:11830] Signal: Segmentation fault (11)
[m4b-1-8:11830] Signal code: Address not mapped (1)
[m4b-1-8:11830] Failing at address: 0x2abcdff475b0
[m4b-1-8:11830] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
[m4b-1-8:11830] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
[m4b-1-8:11830] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
[m4b-1-8:11830] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
[m4b-1-8:11830] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
[m4b-1-8:11830] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
[m4b-1-8:11830] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
[m4b-1-8:11830] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
[m4b-1-8:11830] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x33e841d8a4]
[m4b-1-8:11830] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
[m4b-1-8:11830] *** End of error message ***
[m4b-1-9:11772] *** Process received signal ***
[m4b-1-9:11772] Signal: Segmentation fault (11)
[m4b-1-9:11772] Signal code: Address not mapped (1)
[m4b-1-9:11772] Failing at address: 0x2abcdff475b0
[m4b-1-9:11772] [ 0] /lib64/libpthread.so.0 [0x302380de70]
[m4b-1-9:11772] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
[m4b-1-9:11772] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
[m4b-1-9:11772] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
[m4b-1-9:11772] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
[m4b-1-9:11772] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
[m4b-1-9:11772] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
[m4b-1-9:11772] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
[m4b-1-9:11772] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x302301d8a4]
[m4b-1-9:11772] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
[m4b-1-9:11772] *** End of error message ***
[m4b-1-7i][0,1,7][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
[m4b-1-7i][0,1,8][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
[m4b-1-7i][0,1,9][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
[m4b-1-7i][0,1,9][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
[m4b-1-9:11773] *** Process received signal ***
[m4b-1-9:11773] Signal: Segmentation fault (11)
[m4b-1-9:11773] Signal code: Address not mapped (1)
[m4b-1-9:11773] Failing at address: 0x2abcdff475b0
[m4b-1-9:11773] [ 0] /lib64/libpthread.so.0 [0x302380de70]
[m4b-1-9:11773] [ 1] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_btl_sm_send+0xf1) [0x2aaaaab541d1]
[m4b-1-9:11773] [ 2] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_send_request_start_copy+0x15e) [0x2aaaaaba0e2e]
[m4b-1-9:11773] [ 3] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(mca_pml_ob1_isend+0x217) [0x2aaaaab9be37]
[m4b-1-9:11773] [ 4] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_sendrecv_actual+0xda) [0x2aaaaab5acaa]
[m4b-1-9:11773] [ 5] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(ompi_coll_tuned_barrier_intra_bruck+0x9f) [0x2aaaaab5f81f]
[m4b-1-9:11773] [ 6] /usr/mpi/fsl_openmpi_gcc_1.2.6/lib/libmpi.so.0(PMPI_Barrier+0x6f) [0x2aaaaab1eadf]
[m4b-1-9:11773] [ 7] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug(main+0x5d9) [0x413179]
[m4b-1-9:11773] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x302301d8a4]
[m4b-1-9:11773] [ 9] /fslhome/wshuai/compute/for_Shuai2/mpi_md_bgo_twham_12sept08_debug [0x404109]
[m4b-1-9:11773] *** End of error message ***
orterun noticed that job rank 0 with PID 12338 on node m4b-1-10i exited on signal 15 (Terminated).

Can someone give me some clues as to what is going wrong here, or point me in the right direction? Is there something I or the user can do to get more informative error messages? The user mentioned that this particular program ran fine before we upgraded to the current openmpi version, and that he can't find any bugs in his code.

Thanks for your help,

Daniel Hansen
Systems Administrator
BYU Fulton Supercomputing Lab
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
