Just to check the obvious: I assume that the /usr/mpi directory is not network
mounted, and that both the application and the OMPI code were compiled natively
on each side?

There is another mpirun flag --hetero-apps that you may need to provide. It has
been so long since someone tried this that I'd have to look to remember what it
does.
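For reference, here is a sketch of how the job might be relaunched with that
flag, and with the receive-queue spec from the error text below forced to the
same value on both sides. This is untested and from memory; verify that
--hetero-apps still exists in 1.8.4 with `mpirun --help` before relying on it.

```shell
# Sketch only, not a verified invocation: --hetero-apps is recalled from
# memory, and the receive_queues value simply copies the remote side's spec
# from the error output so both nodes use an identical configuration.
/usr/mpi/gcc/openmpi-1.8.4/bin/mpirun --hetero-apps -np 2 \
    -host ppc64,atlas3 --allow-run-as-root \
    --mca btl_openib_addr_include 102.1.1.0/24 \
    --mca btl_openib_receive_queues \
        P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64 \
    --mca btl openib,sm,self \
    /usr/mpi/gcc/openmpi-1.8.4/tests/IMB-3.2/IMB-MPI1 pingpong
```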



> On Jun 1, 2015, at 7:28 AM, Steve Wise <sw...@opengridcomputing.com> wrote:
> 
> Hello,
> 
> I'm seeing an error trying to run a simple OMPI job on a 2-node cluster where
> one node is PPC64 (big-endian byte order) and the other is x86_64
> (little-endian byte order).  OMPI 1.8.4 is configured with
> --enable-heterogeneous:
> 
> ./configure --with-openib=/usr  CC=gcc CXX=g++ F77=gfortran FC=gfortran
> --enable-mpirun-prefix-by-default --prefix=/usr/mpi/gcc/openmpi-1.8.4/
> --with-openib-libdir=/usr/lib64/ --libdir=/usr/mpi/gcc/openmpi-1.8.4/lib64/
> --with-contrib-vt-flags=--disable-iotrace --enable-mpi-thread-multiple
> --with-threads=posix --enable-heterogeneous && make -j8 && make -j8 install
> 
> And the job started this way:
> 
> /usr/mpi/gcc/openmpi-1.8.4/bin/mpirun -np 2 -host
> ppc64,atlas3 --allow-run-as-root --mca btl_openib_addr_include 102.1.1.0/24
> --mca btl openib,sm,self /usr/mpi/gcc/openmpi-1.8.4/tests/IMB-3.2/IMB-MPI1
> pingpong
> 
> But we see the following error.  Note that atlas3 reports the vendor ID in
> the wrong byte order (0x25140000 instead of 0x1425):
> 
> The Open MPI receive queue configuration for the OpenFabrics devices
> on two nodes are incompatible, meaning that MPI processes on two
> specific nodes were unable to communicate with each other.  This
> generally happens when you are using OpenFabrics devices from
> different vendors on the same network.  You should be able to use the
> mca_btl_openib_receive_queues MCA parameter to set a uniform receive
> queue configuration for all the devices in the MPI job, and therefore
> be able to run successfully.
> 
>  Local host:       ppc64-rhel71
>  Local adapter:    cxgb4_0 (vendor 0x1425, part ID 21505)
>  Local queues:     P,65536,64
> 
>  Remote host:      atlas3
>  Remote adapter:   (vendor 0x25140000, part ID 22282240)
>  Remote queues:   
> P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64
> 
> 
> Am I missing some OMPI parameter to allow this job to run?
> 
> Thanks,
> 
> Steve.
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/06/27010.php
