Hello, I'm seeing an error trying to run a simple OMPI job on a 2 node cluster where one node is a PPC64 BE byte order and the other is a X86_64 LE byte order node. OMPI 1.8.4 is configured with --enable-heterogeneous:
./configure --with-openib=/usr CC=gcc CXX=g++ F77=gfortran FC=gfortran --enable-mpirun-prefix-by-default --prefix=/usr/mpi/gcc/openmpi-1.8.4/ --with-openib-libdir=/usr/lib64/ --libdir=/usr/mpi/gcc/openmpi-1.8.4/lib64/ --with-contrib-vt-flags=--disable-iotrace --enable-mpi-thread-multiple --with-threads=posix --enable-heterogeneous && make -j8 && make -j8 install And the job started this way: /usr/mpi/gcc/openmpi-1.8.4/bin/mpirun -np 2 -host ppc64,atlas3 --allow-run-as-root --mca btl_openib_addr_include 102.1.1.0/24 --mca btl openib,sm,self /usr/mpi/gcc/openmpi-1.8.4/tests/IMB-3.2/IMB-MPI1 pingpong But we see the following error. Note atlas3 is using the vendor ID that is in the wrong byte order (0x25140000 instead of 0x1425): The Open MPI receive queue configuration for the OpenFabrics devices on two nodes are incompatible, meaning that MPI processes on two specific nodes were unable to communicate with each other. This generally happens when you are using OpenFabrics devices from different vendors on the same network. You should be able to use the mca_btl_openib_receive_queues MCA parameter to set a uniform receive queue configuration for all the devices in the MPI job, and therefore be able to run successfully. Local host: ppc64-rhel71 Local adapter: cxgb4_0 (vendor 0x1425, part ID 21505) Local queues: P,65536,64 Remote host: atlas3 Remote adapter: (vendor 0x25140000, part ID 22282240) Remote queues: P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64 Am I missing some OMPI parameter to allow this job to run? Thanks, Steve.