When running over either Myrinet or Gigabit Ethernet, one of our codes (Gadget2)
fails predictably with the following error message.
From the backtrace it looks as if the SEGV occurs in
ompi_coll_tuned_reduce_generic.

Have there been similar reports, and/or is there a fix for this?

Lydia Heck


[m2042:08002] *** Process received signal ***
[m2042:08002] Signal: Segmentation Fault (11)
[m2042:08002] Signal code: Address not mapped (1)
[m2042:08002] Failing at address: 92
/opt/OMPI/ompi-1.2b4r13488/lib/libopen-pal.so.0.0.0:opal_backtrace_print+0x26
/opt/OMPI/ompi-1.2b4r13488/lib/libopen-pal.so.0.0.0:0xc3874
/lib/amd64/libc.so.1:0xcb686
/lib/amd64/libc.so.1:0xc0a52
/opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_generic+0x11b
[ Signal 11 (SEGV)]
/opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_intra_binary+0x162
/opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_intra_dec_fixed+0x28d
/opt/OMPI/ompi-1.2b4r13488/lib/libmpi.so.0.0.0:PMPI_Reduce+0x3f6
/data/4/nil/tak_gadget/gadget2/P-Gadget2:gravity_tree+0x146c
/data/4/nil/tak_gadget/gadget2/P-Gadget2:compute_accelerations+0x7e
/data/4/nil/tak_gadget/gadget2/P-Gadget2:run+0xa5
/data/4/nil/tak_gadget/gadget2/P-Gadget2:main+0x22f
/data/4/nil/tak_gadget/gadget2/P-Gadget2:0x7c3c
[m2042:08002] *** End of error message ***
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_gridengine_module.c at line 793
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun noticed that job rank 2 with PID 0 on node m2043 exited on signal 11 (Segmentation Fault).
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_gridengine_module.c at line 828
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value
Timeout instead of ORTE_SUCCESS.




------------------------------------------
Dr E L  Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.h...@durham.ac.uk

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________
