On May 10, 2010, at 11:00 AM, Guanyinzhu wrote:
> Did "--mca mpi_preconnect_all 1" work?
>
> I also face this problem "readv failed: connection timed out" in the
> production environment, and our engineer has reproduced this scenario on 20
> nodes with gigabit ethernet and limit one ethernet sp [...] in
> endpoint_recv_handler(), the nonblocking readv() call on the fd of the
> failed connection returns -1 and sets errno to 110 (ETIMEDOUT), which
> means the connection timed out.
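Just to spell out the failure mode you're describing: a rough sketch in plain-socket terms (not the actual Open MPI btl_tcp code; the handler name and buffer below are made up) would look like

    /* Sketch only: a nonblocking readv() on a socket whose connection has
     * failed returns -1 with errno == ETIMEDOUT (110 on Linux).  Plain
     * POSIX sockets, not the real Open MPI endpoint handler. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    static void recv_handler(int fd)        /* hypothetical handler name */
    {
        char buf[4096];
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        ssize_t n = readv(fd, &iov, 1);

        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
            /* errno 110 (ETIMEDOUT) ends up here and the endpoint is dropped */
            fprintf(stderr, "readv failed: %s (errno %d)\n",
                    strerror(errno), errno);
        }
    }

i.e., the readv() itself is only the messenger; errno 110 says the kernel gave up on the TCP connection underneath it.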
> From: ljdu...@scinet.utoronto.ca
> Date: Tue, 20 Apr 2010 09:24:17 -0400
> To: us...@open-mpi.org
> Subject: Re: [OMPI users]
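For anyone following along who wants to try that parameter: it forces all MPI connections to be set up during MPI_Init instead of lazily on first use, and can be given on the mpirun command line (or via the OMPI_MCA_mpi_preconnect_all environment variable). The binary name and process count below are just placeholders:

    mpirun --mca mpi_preconnect_all 1 -np <nprocs> ./your_app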
On Apr 20, 2010, at 8:55 AM, Jonathan Dursi wrote:
> We've got OpenMPI 1.4.1 and Intel MPI running on our 3000-node system. We
> like OpenMPI for large jobs, because the startup time is much faster (and
> startup is more reliable) than the current defaults with Intel MPI; but we're
> having some pretty serious problems when the jobs are actually running.
On 2010-04-20, at 9:18AM, Terry Dontje wrote:
> Hi Jonathan,
>
> Do you know what the top-level function or communication pattern is? Is it
> some type of collective, or a pattern that has many-to-one communication?
Ah, should have mentioned. The best-characterized code that we're seeing this
with is an
Hi Jonathan,
Do you know what the top-level function or communication pattern is? Is
it some type of collective, or a pattern that has many-to-one
communication? What might be happening is that, since OMPI uses lazy
connections by default, if all processes are trying to establish
communications to the same
Hi:
We've got OpenMPI 1.4.1 and Intel MPI running on our 3000-node system. We
like OpenMPI for large jobs, because the startup time is much faster (and
startup is more reliable) than the current defaults with Intel MPI; but we're
having some pretty serious problems when the jobs are actually running.
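As a footnote to Terry's question about many-to-one patterns above: a toy example (illustration only, not the actual application from this thread) in which every rank sends to rank 0, so that under the default lazy connection setup all of rank 0's connections get established at essentially the same moment:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, msg = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* rank 0 is the hot spot: size-1 peers all connect to it at once */
            for (int i = 1; i < size; i++) {
                MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            printf("rank 0 received %d messages\n", size - 1);
        } else {
            msg = rank;
            MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

At a few thousand ranks, that burst of simultaneous connection attempts to a single node is the sort of load where the lazy connection setup Terry describes can run into trouble.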