On Mar 1, 2012, at 10:47 PM, Barnet Wagman wrote:

> I've run into a problem upgrading from 1.4.3 to 1.4.4 or 1.4.5
> 
> With 1.4.4 and 1.4.5, I'm getting error messages like
> 
> [[59597,1],0] routed:binomial: Connection to lifeline [[59597,0],0] lost
> 
> The error does not occur if I restrict the host list to localhost.
> 
> Basic tests like 'mpirun hello_c' work properly.  The problem occurs when 
> using the R package Rmpi.  (I've tried the R mailing lists, but so far to 
> no avail.) This R package does work reliably with openmpi 1.4.3.
> 
> Could someone explain what an error message like this indicates? Is 
> something timing out? Any idea what changed after 1.4.3 that might lead to 
> this kind of problem?

Is the job completing? This message usually appears because mpirun terminates 
before everything else does. My only concern is that the process that issued 
your example message is an application process, but I'm assuming it was 
running local to mpirun - yes?
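
To make it easier to see which process is reporting, here is a minimal Python 
sketch that pulls the two process names out of a message like the one quoted 
above. It is not part of Open MPI; the function name parse_lifeline_message 
and the returned keys are made up for illustration. It assumes the name format 
is [[job family,local job],rank], and that local job 0 means mpirun and its 
daemons while any other local job is an application job.

```python
import re

# One process name as printed in the message: [[job_family,local_job],rank]
NAME = r"\[\[(\d+),(\d+)\],(\d+)\]"

# Matches e.g. "... routed:binomial: Connection to lifeline ... lost"
LIFELINE_RE = re.compile(
    NAME + r" routed:\w+: Connection to lifeline " + NAME + r" lost"
)


def parse_lifeline_message(line):
    """Return the reporter and lifeline names, or None if the line does not match."""
    m = LIFELINE_RE.search(line)
    if m is None:
        return None
    fam, job, rank, l_fam, l_job, l_rank = map(int, m.groups())
    return {
        "reporter": (fam, job, rank),
        "lifeline": (l_fam, l_job, l_rank),
        # Assumption: local job 0 is mpirun and its daemons, so a nonzero
        # local job means the reporter is an application process.
        "reporter_is_app": job != 0,
    }


if __name__ == "__main__":
    msg = "[[59597,1],0] routed:binomial: Connection to lifeline [[59597,0],0] lost"
    print(parse_lifeline_message(msg))
```

Under those assumptions, the quoted message comes from rank 0 of the 
application job, and the lifeline it lost is rank 0 of job 0.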

If the job is completing, then you can just ignore the message. I'm not aware 
of anything that changed in the 1.4 series that would have impacted termination 
procedures, and I haven't been seeing this behavior myself (caveat: I don't run 
1.4 very often).

> 
> FYI I'm running ompi under Debian 6.0.4 (squeeze).  
> 
> thanks