Or this could be caused by a firewall... v1.10 and earlier use TCP for OOB; from v2.x, Unix sockets are used.
Detecting version inconsistencies is a good idea; printing the versions (of mpirun, orted and a.out) could be a first step.

My idea is:
- mpirun invokes orted with '--ompi_version=x.y.z'
- orted checks that it is running version x.y.z, and sets the OMPI_VERSION environment variable
- a.out checks that it is running version x.y.z

/* we might have to check the opal, orte and ompi versions, except in orted, which should not require MPI */

Any thoughts?

Cheers,

Gilles

On Tuesday, May 17, 2016, Dave Love <d.l...@liverpool.ac.uk> wrote:

> Ralph Castain <r...@open-mpi.org> writes:
>
> > This usually indicates that the remote process is using a different OMPI
> > version. You might check to ensure that the paths on the remote nodes are
> > correct.
>
> That seems quite a common problem with non-obvious failure modes.
>
> Is it not possible to have a mechanism that checks the consistency of
> the components and aborts in a clear way? I've never thought it out,
> but it seems that some combination of OOB messages, library versioning
> (at least with ELF) and environment variables might do it.
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29215.php
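A rough sketch of the proposed handshake, in plain C so that orted would not need MPI. Everything here is hypothetical: '--ompi_version' and OMPI_VERSION are only the names proposed above (not existing Open MPI options), the helper function names are made up for illustration, and MY_VERSION is hard-coded where a real check would take the value from the build (e.g. the version macros in mpi.h). Versions are compared as exact strings, matching the "checks it is running version x.y.z" wording.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv()/unsetenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: the version this binary was built as (hard-coded here). */
#define MY_VERSION "2.0.0"

/* Returns 1 if `expected` names exactly the same version as `actual`. */
static int version_matches(const char *expected, const char *actual)
{
    return expected != NULL && actual != NULL && strcmp(expected, actual) == 0;
}

/* orted side (sketch): `requested` is the value of the proposed
 * --ompi_version=x.y.z option. On a match, export OMPI_VERSION so that
 * a.out can run the same check. Returns 0 on success, -1 on mismatch
 * (after printing a clear error) or if setenv() fails. */
static int orted_check_and_export(const char *requested)
{
    if (!version_matches(requested, MY_VERSION)) {
        fprintf(stderr,
                "orted: version mismatch: mpirun is %s, this orted is %s\n",
                requested ? requested : "(unknown)", MY_VERSION);
        return -1;
    }
    /* overwrite=1: always reflect the version that was just verified. */
    return setenv("OMPI_VERSION", requested, 1) == 0 ? 0 : -1;
}

/* a.out side (sketch): compare OMPI_VERSION (set by orted) with our own
 * version. If the variable is absent, the launcher predates the check,
 * so nothing can be verified and 0 is returned. */
static int app_check_version(void)
{
    const char *launcher = getenv("OMPI_VERSION");
    if (launcher == NULL)
        return 0;
    if (!version_matches(launcher, MY_VERSION)) {
        fprintf(stderr,
                "a.out: version mismatch: orted is %s, this binary is %s\n",
                launcher, MY_VERSION);
        return -1;
    }
    return 0;
}
```

On a -1 return, orted or a.out would abort with the printed message instead of failing later with an obscure OOB error. Checking opal, orte and ompi separately would just mean repeating the comparison per layer.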