Did "--mca mpi_preconnect_all 1" work?
I also hit this "readv failed: Connection timed out" problem in a production environment. One of our engineers reproduced it on 20 nodes with gigabit Ethernet, with one interface throttled to 2 MB/s, running an MPI_Isend/MPI_Recv ring: each node calls MPI_Isend to send data to the next node and then MPI_Recv to receive from the previous node, with large messages, for many cycles (a minimal sketch of that pattern is at the end of this message). We then get the following error:

[btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection timed out (110)

Our environment: Open MPI 1.3.1, using the btl tcp component.

My guess is that, because the socket fd is set non-blocking, the non-blocking connect() can fail; epoll_wait() is then woken up by the error but it is treated as success and mca_btl_tcp_endpoint_recv_handler() is called, so the non-blocking readv() runs on a fd whose connection never actually completed, returns -1, and sets errno to 110, which means connection timed out. A sketch of the kind of check I have in mind is also below.

> From: [email protected]
> Date: Tue, 20 Apr 2010 09:24:17 -0400
> To: [email protected]
> Subject: Re: [OMPI users] 'readv failed: Connection timed out' issue
>
> On 2010-04-20, at 9:18AM, Terry Dontje wrote:
>
> > Hi Jonathan,
> >
> > Do you know what the top level function is or communication pattern? Is it
> > some type of collective or a pattern that has a many to one.
>
> Ah, should have mentioned. The best-characterized code that we're seeing this
> with is an absolutely standard (logically) regular grid hydrodynamics code,
> only does nearest neighbour communication for exchanging guardcells; the Wait
> in this case is, I think, just a matter of overlapping communication with
> computation of the inner zones. There are things like allreduces in there, as
> well, for setting timesteps, but the communication pattern is overall
> extremely regular and well-behaved.
>
> > What might be happening is that since OMPI uses a lazy connections by
> > default if all processes are trying to establish communications to the same
> > process you might run into the below.
> >
> > You might want to see if setting "--mca mpi_preconnect_all 1" helps any.
> > But beware this will cause your startup to increase. However, this might
> > give us insight as to whether the problem is flooding a single rank with
> > connect requests.
>
> I'm certainly willing to try it.
>
> - Jonathan
>
> --
> Jonathan Dursi <[email protected]>
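For reference, a hypothetical sketch of the ring pattern described above; it is not the actual reproducer our engineer ran, and the message size, cycle count, and MPI_Wait placement are illustrative assumptions only:

/* Sketch of an MPI_Isend/MPI_Recv ring: each rank sends a large message to
 * the next rank and receives from the previous one, for many cycles.
 * Sizes and counts are made up; compile with mpicc and run on many nodes. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int count = 1 << 22;                 /* "large" message: 4M ints */
    int *sendbuf = malloc(count * sizeof(int));
    int *recvbuf = malloc(count * sizeof(int));
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    for (int cycle = 0; cycle < 1000; cycle++) {
        MPI_Request req;
        MPI_Isend(sendbuf, count, MPI_INT, next, 0, MPI_COMM_WORLD, &req);
        MPI_Recv(recvbuf, count, MPI_INT, prev, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Wait(&req, MPI_STATUS_IGNORE);     /* complete the outstanding send */
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}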
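And to make the suspected failure mode concrete, here is a generic non-blocking-connect check; this is not Open MPI's actual btl tcp code, just an illustration of the idea: if SO_ERROR is not consulted after a non-blocking connect() appears to complete, a later readv() on that fd can come back with errno 110 (ETIMEDOUT).

/* Generic illustration (not Open MPI source): after a non-blocking connect()
 * is reported ready by the event loop, check SO_ERROR before treating the
 * socket as connected; otherwise a later readv() on a connection that never
 * completed can fail with errno 110 (Connection timed out). */
#include <sys/socket.h>
#include <errno.h>

static int connect_succeeded(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return 0;          /* getsockopt itself failed */
    if (err != 0) {        /* e.g. ETIMEDOUT: the connect never completed */
        errno = err;
        return 0;
    }
    return 1;              /* safe to go on to readv()/writev() */
}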
