Hi Marcin,

What version of Open MPI did you use?
Is the crash still occurring?
It is also possible that the connection went down during execution, although even then a segfault really should not occur.
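
To rule out the routing issue before launching a job, a quick reachability check along the following lines might help. This is only a sketch: the function name `can_reach`, the default port 22 (taken from the ssh error in your message), and the 3-second timeout are my own assumptions. It is not something Open MPI runs itself, and it does not diagnose the segfault.

```python
import socket


def can_reach(host, port=22, timeout=3.0):
    """Try a plain TCP connection to host:port.

    Returns (True, "") on success, otherwise (False, reason), where reason
    is the operating system's error text (e.g. "No route to host").
    """
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:  # covers resolution errors, timeouts, refusals
        return False, str(exc)


if __name__ == "__main__":
    # Check whether this node can reach itself by name, as "ssh our-host" did.
    host = socket.gethostname()
    ok, reason = can_reach(host)
    if ok:
        print(f"{host}:22 reachable")
    else:
        print(f"{host}:22 NOT reachable: {reason}")
```

Running it on each node, before and after the routing change is reverted, would at least show whether the "No route to host" condition is still present.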

Thanks,
Jelena

On Tue, 29 May 2007, Marcin Skoczylas wrote:

Hello,

Recently my administrator made some changes on our cluster, and now I
get a crash during MPI_Barrier:

[our-host:12566] *** Process received signal ***
[our-host:12566] Signal: Segmentation fault (11)
[our-host:12566] Signal code: Address not mapped (1)
[our-host:12566] Failing at address: 0x4
[our-host:12566] [ 0] /lib/tls/libpthread.so.0 [0xa22f80]
[our-host:12566] [ 1] /usr/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x68f) [0xcd86d7]
[our-host:12566] [ 2] /usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x32) [0xcb7e3a]
[our-host:12566] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0xed) [0xc2b221]
[our-host:12566] [ 4] /usr/lib/libmpi.so.0 [0x3aecc5]
[our-host:12566] [ 5] /usr/lib/libmpi.so.0(ompi_request_wait_all+0xec) [0x3ae784]
[our-host:12566] [ 6] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual+0x77) [0xd025bb]
[our-host:12566] [ 7] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_recursivedoubling+0xde) [0xd05e3a]
[our-host:12566] [ 8] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_dec_fixed+0x44) [0xd027d8]
[our-host:12566] [ 9] /usr/lib/libmpi.so.0(PMPI_Barrier+0x176) [0x3c0cea]

Actually, I did a small investigation and realised that:

[user@our-host]$ ssh our-host
ssh(12704) ssh: connect to host our-host port 22: No route to host

That could be the cause. I'm going to talk with my admin soon about this
routing change. However, if this really is the problem, shouldn't it be
detected during startup, e.g. in MPI_Init? Actually, I'm not sure...
Your comments?

            greetings, Marcin


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jelena Pjesivac-Grbovic, Pjesa
Graduate Research Assistant
Innovative Computing Laboratory
Computer Science Department, UTK
Claxton Complex 350
(865) 974 - 6722 (865) 974 - 6321
jpjes...@utk.edu

Murphy's Law of Research:
        Enough research will tend to support your theory.
