I should add that the problem disappears if I add the line

MPI::COMM_WORLD.Barrier ()

just before the loop which frees the intercommunicators. I should not need to do this, right?

On Thu, Mar 12, 2009 at 4:57 PM, Mikael Djurfeldt <mik...@djurfeldt.com> wrote:
> Dear list,
>
> I get "Connection reset by peer" in Finalize (see log below), but
> *only* if I free my intercommunicators:
>
> ...
> for (std::vector<Connector*>::iterator connector = connectors.begin ();
>      connector != connectors.end ();
>      ++connector)
>   (*connector)->freeIntercomm ();
>
> MPI::Finalize ();
> ...
>
> where freeIntercomm is defined:
>
> void
> Connector::freeIntercomm ()
> {
>   intercomm.Free ();
> }
>
> What could be the reason for this? I'm using 1.2.7~rc2-1ubuntu2.
> (The problem does not occur on the other MPI implementations I've
> tested.)
>
> [swish:10019] [ 0] /lib/libpthread.so.0 [0x7f0dc32610f0]
> [swish:10019] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f0dbe1ed460]
> [swish:10019] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x670) [0x7f0dbd79ee60]
> [swish:10019] [ 3] /usr/lib/openmpi/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x7f0dbdfe318b]
> [swish:10019] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x4a) [0x7f0dc4248f5a]
> [swish:10019] [ 5] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x7f0dc189691d]
> [swish:10019] [ 6] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x437) [0x7f0dc189a037]
> [swish:10019] [ 7] /usr/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x7f0dc44cbd43]
> [swish:10019] [ 8] /usr/lib/openmpi/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_increment_value+0x1e2) [0x7f0dc14826a2]
> [swish:10019] [ 9] /usr/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x2ac) [0x7f0dc44e28fc]
> [swish:10019] [10] /usr/lib/libmpi.so.0(ompi_mpi_finalize+0x111) [0x7f0dc4733521]
> [swish:10019] [11] /home/mdj/music/trunk/src/.libs/libmusic.so.1(_ZN5MUSIC7Runtime8finalizeEv+0x7d) [0x7f0dc4bed7ed]
> [swish:10019] [12] /home/mdj/music/trunk/test/.libs/lt-contdelay(main+0x347) [0x40a297]
> [swish:10019] [13] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f0dc2efe466]
> [swish:10019] [14] /home/mdj/music/trunk/test/.libs/lt-contdelay [0x409539]
> [swish:10019] *** End of error message ***
> [swish:10015] [0,0,0]-[0,1,1] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
> mpirun noticed that job rank 0 with PID 10018 on node swish exited on signal 15 (Terminated).
> 3 additional processes aborted (not shown)