Well, I checked, and it looks to me like --hetero-apps is a stale option, at least in master; I don't see where it gets used.
Looking at the code, I would suspect that something didn't get configured correctly: either the --enable-heterogeneous flag didn't get set on one side, or we incorrectly failed to identify the BE machine, or both. You might run ompi_info on the two sides and verify that they were both built correctly (a quick check is sketched at the end of this message).

> On Jun 1, 2015, at 7:40 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Just to check the obvious: I assume that the /usr/mpi directory is not
> network mounted, and that both the application and the OMPI code are
> appropriately compiled on each side?
>
> There is another mpirun flag, --hetero-apps, that you may need to provide.
> It has been so long since someone tried this that I'd have to look to
> remember what it does.
>
>
>> On Jun 1, 2015, at 7:28 AM, Steve Wise <sw...@opengridcomputing.com> wrote:
>>
>> Hello,
>>
>> I'm seeing an error trying to run a simple OMPI job on a 2-node cluster
>> where one node is PPC64 (BE byte order) and the other is x86_64 (LE byte
>> order). OMPI 1.8.4 is configured with --enable-heterogeneous:
>>
>> ./configure --with-openib=/usr CC=gcc CXX=g++ F77=gfortran FC=gfortran
>> --enable-mpirun-prefix-by-default --prefix=/usr/mpi/gcc/openmpi-1.8.4/
>> --with-openib-libdir=/usr/lib64/ --libdir=/usr/mpi/gcc/openmpi-1.8.4/lib64/
>> --with-contrib-vt-flags=--disable-iotrace --enable-mpi-thread-multiple
>> --with-threads=posix --enable-heterogeneous && make -j8 && make -j8 install
>>
>> And the job is started this way:
>>
>> /usr/mpi/gcc/openmpi-1.8.4/bin/mpirun -np 2 -host ppc64,atlas3
>> --allow-run-as-root --mca btl_openib_addr_include 102.1.1.0/24
>> --mca btl openib,sm,self /usr/mpi/gcc/openmpi-1.8.4/tests/IMB-3.2/IMB-MPI1
>> pingpong
>>
>> But we see the following error. Note that atlas3 is reporting the vendor
>> ID in the wrong byte order (0x25140000 instead of 0x1425):
>>
>> The Open MPI receive queue configuration for the OpenFabrics devices
>> on two nodes are incompatible, meaning that MPI processes on two
>> specific nodes were unable to communicate with each other. This
>> generally happens when you are using OpenFabrics devices from
>> different vendors on the same network. You should be able to use the
>> mca_btl_openib_receive_queues MCA parameter to set a uniform receive
>> queue configuration for all the devices in the MPI job, and therefore
>> be able to run successfully.
>>
>> Local host:     ppc64-rhel71
>> Local adapter:  cxgb4_0 (vendor 0x1425, part ID 21505)
>> Local queues:   P,65536,64
>>
>> Remote host:    atlas3
>> Remote adapter: (vendor 0x25140000, part ID 22282240)
>> Remote queues:
>> P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64
>>
>> Am I missing some OMPI parameter to allow this job to run?
>>
>> Thanks,
>>
>> Steve.
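
As a concrete version of the ompi_info check suggested above (a minimal sketch, assuming the install prefix from Steve's configure line and the stock ompi_info output wording, which can vary a little between OMPI versions), run on each node:

  /usr/mpi/gcc/openmpi-1.8.4/bin/ompi_info | grep -i hetero

Both builds should report heterogeneous support as enabled; if the x86_64 side reports "no", that build was configured without --enable-heterogeneous, which would fit the byte-swapped vendor ID shown in the error.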
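
If you also want to try the workaround the quoted error text itself suggests, forcing the same receive queue specification on both sides would look roughly like the command below (the queue string is simply the one atlas3 already reports, not a recommendation, and the rest is Steve's original mpirun line):

  /usr/mpi/gcc/openmpi-1.8.4/bin/mpirun -np 2 -host ppc64,atlas3 --allow-run-as-root \
    --mca btl_openib_addr_include 102.1.1.0/24 --mca btl openib,sm,self \
    --mca btl_openib_receive_queues P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64 \
    /usr/mpi/gcc/openmpi-1.8.4/tests/IMB-3.2/IMB-MPI1 pingpong

Given the byte-swapped vendor ID, though, the mismatch looks like a symptom of the heterogeneous build/configuration problem rather than a genuine queue difference, so the ompi_info check is the first thing to verify.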