On Mar 1, 2012, at 10:47 PM, Barnet Wagman wrote:

> I've run into a problem upgrading from 1.4.3 to 1.4.4 or 1.4.5
>
> With 1.4.4 and 1.4.5, I'm getting error messages like
>
> [[59597,1],0] routed:binomial: Connection to lifeline [[59597,0],0] lost
>
> The error does not occur if I restrict the host list to localhost.
>
> Basic tests like 'mpirun hello_c' work properly. The problem occurs using
> the R package Rmpi package. (I've tried the R mailing lists, but so far to
> no avail.) This R package does work reliably with openmpi 1.4.3.
>
> Could some one explain what an error message like this indicates? Is
> something timing out? Any idea what changed after 1.4.3 that might lead to
> this kind of problem?

Is the job completing? Usually this message appears because mpirun terminates before everything else does. Only concern I have is that the process that issued your example message is an application process, but I'm assuming it was running local to mpirun - yes?

If the job is completing, then you can just ignore the message. I'm not aware of anything that changed in the 1.4 series that would have impacted termination procedures, and I haven't been seeing this behavior myself (caveat: I don't run 1.4 very often).

>
> FYI I'm running ompi under Debian 6.0.4 (squeeze).
>
> thanks
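
As a side note on reading the process names in that message, here is a minimal sketch. It assumes the names are printed in the usual ORTE form [[job family, local job id], vpid], where local job id 0 denotes mpirun (vpid 0) and its daemons, and anything else is an application process. The helper names parse_names and role are hypothetical, made up for this illustration, and are not part of Open MPI.

```python
import re

# Assumed printed form of an ORTE process name: [[jobfamily,localjob],vpid]
NAME_RE = re.compile(r"\[\[(\d+),(\d+)\],(\d+)\]")

def parse_names(line):
    """Return (job_family, local_job, vpid) for each process name found in line."""
    return [tuple(int(g) for g in m.groups()) for m in NAME_RE.finditer(line)]

def role(name):
    """Classify a name, assuming local job 0 is mpirun (vpid 0) or a daemon."""
    _, local_job, vpid = name
    if local_job == 0:
        return "mpirun" if vpid == 0 else "daemon"
    return "application process"

msg = "[[59597,1],0] routed:binomial: Connection to lifeline [[59597,0],0] lost"
for name in parse_names(msg):
    print(name, "->", role(name))
```

Under those assumptions, the reporter [[59597,1],0] is classified as an application process and the lifeline [[59597,0],0] as mpirun, which matches the reading above.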