I should add that the problem disappears if I insert the line

  MPI::COMM_WORLD.Barrier ();

just before the loop that frees the intercommunicators.

I should not need to do this, right?
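
To make the workaround concrete, here is a minimal sketch of the ordering I mean. The names StubWorld, StubConnector, freeAllAfterBarrier and demoOrder are hypothetical and exist only for illustration; the stubs stand in for MPI::COMM_WORLD and the Connector class so that the sketch compiles without MPI headers. It only shows the call order (one barrier, then the frees) and does not claim to explain why Finalize fails without it.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch builds without MPI headers.
// In the real program, the world object is MPI::COMM_WORLD and the
// connector type is the Connector class from the original post.
struct StubWorld {
  std::vector<std::string>* log;
  void Barrier () const { log->push_back ("barrier"); }
};

struct StubConnector {
  std::vector<std::string>* log;
  void freeIntercomm () { log->push_back ("free"); }
};

// Synchronize all ranks first, then free every intercommunicator.
template <class World, class Connector>
void
freeAllAfterBarrier (const World& world, std::vector<Connector*>& connectors)
{
  world.Barrier ();
  for (typename std::vector<Connector*>::iterator c = connectors.begin ();
       c != connectors.end ();
       ++c)
    (*c)->freeIntercomm ();
}

// Runs the helper on two stub connectors and returns the recorded call order.
std::vector<std::string>
demoOrder ()
{
  std::vector<std::string> log;
  StubWorld world = { &log };
  StubConnector a = { &log };
  StubConnector b = { &log };
  std::vector<StubConnector*> connectors;
  connectors.push_back (&a);
  connectors.push_back (&b);
  freeAllAfterBarrier (world, connectors);
  return log;
}
```

In the real program the call would be freeAllAfterBarrier (MPI::COMM_WORLD, connectors), placed just before MPI::Finalize (), assuming connectors is the std::vector<Connector*> from the quoted code below.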

On Thu, Mar 12, 2009 at 4:57 PM, Mikael Djurfeldt <mik...@djurfeldt.com> wrote:
> Dear list,
>
> I get "Connection reset by peer" in Finalize (see log below), but
> *only* if I free my intercommunicators:
>
>    ...
>    for (std::vector<Connector*>::iterator connector = connectors.begin ();
>         connector != connectors.end ();
>         ++connector)
>      (*connector)->freeIntercomm ();
>
>    MPI::Finalize ();
>    ...
>
> where freeIntercomm is defined as:
>
>  void
>  Connector::freeIntercomm ()
>  {
>    intercomm.Free ();
>  }
>
> What could be the reason for this?  I'm using Open MPI, Ubuntu
> package version 1.2.7~rc2-1ubuntu2.  (The problem does not occur on
> the other MPI implementations I've tested.)
>
> [swish:10019] [ 0] /lib/libpthread.so.0 [0x7f0dc32610f0]
> [swish:10019] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f0dbe1ed460]
> [swish:10019] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x670) [0x7f0dbd79ee60]
> [swish:10019] [ 3] /usr/lib/openmpi/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x7f0dbdfe318b]
> [swish:10019] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x4a) [0x7f0dc4248f5a]
> [swish:10019] [ 5] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x7f0dc189691d]
> [swish:10019] [ 6] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x437) [0x7f0dc189a037]
> [swish:10019] [ 7] /usr/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x7f0dc44cbd43]
> [swish:10019] [ 8] /usr/lib/openmpi/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_increment_value+0x1e2) [0x7f0dc14826a2]
> [swish:10019] [ 9] /usr/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x2ac) [0x7f0dc44e28fc]
> [swish:10019] [10] /usr/lib/libmpi.so.0(ompi_mpi_finalize+0x111) [0x7f0dc4733521]
> [swish:10019] [11] /home/mdj/music/trunk/src/.libs/libmusic.so.1(_ZN5MUSIC7Runtime8finalizeEv+0x7d) [0x7f0dc4bed7ed]
> [swish:10019] [12] /home/mdj/music/trunk/test/.libs/lt-contdelay(main+0x347) [0x40a297]
> [swish:10019] [13] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f0dc2efe466]
> [swish:10019] [14] /home/mdj/music/trunk/test/.libs/lt-contdelay [0x409539]
> [swish:10019] *** End of error message ***
> [swish:10015] [0,0,0]-[0,1,1] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
> mpirun noticed that job rank 0 with PID 10018 on node swish exited on
> signal 15 (Terminated).
> 3 additional processes aborted (not shown)
>
