Hello Lydia,
What does the call to MPI_Reduce look like in your application? Is the
code available?
Thank you,
Jelena
On Wed, 14 Feb 2007, Lydia Heck wrote:
When we run one of our codes (Gadget2) over either Myrinet or Gigabit Ethernet,
it fails predictably with the following error message.
From the backtrace it looks as if the SEGV is in
ompi_coll_tuned_reduce_generic.
Have there been similar reports, and/or is there a fix for this?
Lydia Heck
[m2042:08002] *** Process received signal ***
[m2042:08002] Signal: Segmentation Fault (11)
[m2042:08002] Signal code: Address not mapped (1)
[m2042:08002] Failing at address: 92
/opt/OMPI/ompi-1.2b4r13488/lib/libopen-pal.so.0.0.0:opal_backtrace_print+0x26
/opt/OMPI/ompi-1.2b4r13488/lib/libopen-pal.so.0.0.0:0xc3874
/lib/amd64/libc.so.1:0xcb686
/lib/amd64/libc.so.1:0xc0a52
/opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_generic+0x11b
[ Signal 11 (SEGV)]
/opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_intra_binary+0x162
/opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_intra_dec_fixed+0x28d
/opt/OMPI/ompi-1.2b4r13488/lib/libmpi.so.0.0.0:PMPI_Reduce+0x3f6
/data/4/nil/tak_gadget/gadget2/P-Gadget2:gravity_tree+0x146c
/data/4/nil/tak_gadget/gadget2/P-Gadget2:compute_accelerations+0x7e
/data/4/nil/tak_gadget/gadget2/P-Gadget2:run+0xa5
/data/4/nil/tak_gadget/gadget2/P-Gadget2:main+0x22f
/data/4/nil/tak_gadget/gadget2/P-Gadget2:0x7c3c
[m2042:08002] *** End of error message ***
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_gridengine_module.c at line 793
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun noticed that job rank 2 with PID 0 on node m2043 exited on signal 11 (Segmentation Fault).
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_gridengine_module.c at line 828
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value
Timeout instead of ORTE_SUCCESS.
------------------------------------------
Dr E L Heck
University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road
DURHAM, DH1 3LE
United Kingdom
e-mail: lydia.h...@durham.ac.uk
Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________
--
Jelena Pjesivac-Grbovic, Pjesa
Graduate Research Assistant
Innovative Computing Laboratory
Computer Science Department, UTK
Claxton Complex 350
(865) 974 - 6722
(865) 974 - 6321
jpjes...@utk.edu
Murphy's Law of Research:
Enough research will tend to support your theory.