Or this could be caused by a firewall ...
v1.10 and earlier use TCP for the OOB;
from v2.x, Unix sockets are used.

Checking that the versions are consistent is a good idea;
printing them (for mpirun, orted and a.out) can be a first step.

My idea is:
- mpirun invokes orted with '--ompi_version=x.y.z'
- orted checks that it is running version x.y.z, and sets the OMPI_VERSION
  environment variable
- a.out checks that it is running version x.y.z
/* we might have to check the opal, orte and ompi versions, except in orted,
which should not require MPI */
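
To make the a.out side of this concrete, here is a rough sketch in C of what
the check could look like. It is only an illustration of the idea, not Open MPI
code: BUILD_VERSION, versions_match() and check_launch_version() are
hypothetical names, and it assumes orted has exported the proposed OMPI_VERSION
environment variable as a plain "x.y.z" string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical placeholder: the version this binary was built against. */
#define BUILD_VERSION "1.10.2"

/* Return 1 when both strings are non-NULL and identical, 0 otherwise. */
static int versions_match(const char *expected, const char *built)
{
    return expected != NULL && built != NULL && strcmp(expected, built) == 0;
}

/*
 * Compare the version advertised by the launcher (assumed to be exported
 * by orted in the proposed OMPI_VERSION variable) with our own version.
 * Returns 0 on match, 1 on mismatch, 2 when the variable is not set.
 */
static int check_launch_version(const char *built)
{
    const char *expected = getenv("OMPI_VERSION");

    if (expected == NULL) {
        fprintf(stderr, "OMPI_VERSION is not set; cannot verify the launcher version\n");
        return 2;
    }
    if (!versions_match(expected, built)) {
        fprintf(stderr, "version mismatch: launcher is %s, this binary is %s\n",
                expected, built);
        return 1;
    }
    return 0;
}
```

a.out would call check_launch_version(BUILD_VERSION) early during startup and
abort with a clear message on a non-zero return. A plain string comparison is
the strictest policy; a real check might only compare major.minor.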

Any thoughts?

Cheers,

Gilles



On Tuesday, May 17, 2016, Dave Love <d.l...@liverpool.ac.uk> wrote:

> Ralph Castain <r...@open-mpi.org> writes:
>
> > This usually indicates that the remote process is using a different OMPI
> > version. You might check to ensure that the paths on the remote nodes are
> > correct.
>
> That seems quite a common problem with non-obvious failure modes.
>
> Is it not possible to have a mechanism that checks the consistency of
> the components and aborts in a clear way?  I've never thought it out,
> but it seems that some combination of OOB messages, library versioning
> (at least with ELF) and environment variables might do it.
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29215.php
>
