Open MPI List,

I recently encountered an odd bug with Open MPI 1.8.1 and GCC 4.9.1 on our cluster (reported on this list), and decided to try it with 1.8.2. However, we seem to be having an issue with Open MPI 1.8.2 and SLURM. Even weirder, Open MPI 1.8.2rc4 doesn't show the bug.

And the bug is: I get no stdout with Open MPI 1.8.2. That is, HelloWorld doesn't work.

To wit, our sysadmin has two tarballs:

    (1441) $ sha1sum openmpi-1.8.2rc4.tar.bz2
    7e7496913c949451f546f22a1a159df25f8bb683  openmpi-1.8.2rc4.tar.bz2
    (1442) $ sha1sum openmpi-1.8.2.tar.gz
    cf2b1e45575896f63367406c6c50574699d8b2e1  openmpi-1.8.2.tar.gz

I then build each with a script, the same way our sysadmin usually does:

    #!/bin/sh
    set -x
    export PREFIX=/discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/nlocal/slurm/2.6.3/lib64
    build() {
      echo `pwd`
      ./configure --with-slurm --disable-wrapper-rpath --enable-shared --enable-mca-no-build=btl-usnic \
        CC=gcc CXX=g++ F77=gfortran FC=gfortran \
        CFLAGS="-mtune=generic -fPIC -m64" CXXFLAGS="-mtune=generic -fPIC -m64" FFLAGS="-mtune=generic -fPIC -m64" \
        F77FLAGS="-mtune=generic -fPIC -m64" FCFLAGS="-mtune=generic -fPIC -m64" F90FLAGS="-mtune=generic -fPIC -m64" \
        LDFLAGS="-L/usr/nlocal/slurm/2.6.3/lib64" CPPFLAGS="-I/usr/nlocal/slurm/2.6.3/include" LIBS="-lpciaccess" \
        --prefix=${PREFIX} 2>&1 | tee configure.1.8.2.log
      make 2>&1 | tee make.1.8.2.log
      make check 2>&1 | tee makecheck.1.8.2.log
      make install 2>&1 | tee makeinstall.1.8.2.log
    }
    echo "calling build"
    build
    echo "exiting"

The only difference between the two builds is '1.8.2' versus '1.8.2rc4' in the PREFIX and in the log file names passed to tee.

Now, let us test.

First, I grab some nodes with SLURM:

    $ salloc --nodes=6 --ntasks-per-node=16 --constraint=sand --time=09:00:00 \
        --account=g0620 --mail-type=BEGIN

Once I get my nodes, I compile and run with 1.8.2rc4:

    (1142) $ /discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2rc4/bin/mpifort \
        -o helloWorld.182rc4.x helloWorld.F90
    (1143) $ /discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2rc4/bin/mpirun \
        -np 8 ./helloWorld.182rc4.x
    Process 0 of 8 is on borg01w044
    Process 5 of 8 is on borg01w044
    Process 3 of 8 is on borg01w044
    Process 7 of 8 is on borg01w044
    Process 1 of 8 is on borg01w044
    Process 2 of 8 is on borg01w044
    Process 4 of 8 is on borg01w044
    Process 6 of 8 is on borg01w044

Now 1.8.2:

    (1144) $ /discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2/bin/mpifort \
        -o helloWorld.182.x helloWorld.F90
    (1145) $ /discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2/bin/mpirun \
        -np 8 ./helloWorld.182.x
    (1146) $

No output at all. But if I take the helloWorld.182.x built with 1.8.2 and run it with 1.8.2rc4's mpirun:

    (1146) $ /discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2rc4/bin/mpirun \
        -np 8 ./helloWorld.182.x
    Process 5 of 8 is on borg01w044
    Process 7 of 8 is on borg01w044
    Process 2 of 8 is on borg01w044
    Process 4 of 8 is on borg01w044
    Process 1 of 8 is on borg01w044
    Process 3 of 8 is on borg01w044
    Process 6 of 8 is on borg01w044
    Process 0 of 8 is on borg01w044

So the executable built with 1.8.2 prints fine under 1.8.2rc4's mpirun, and only 1.8.2's own mpirun produces no output. Any idea what is happening here? There did seem to be a few SLURM-related changes between the two tarballs involving /dev/null, but it's a bit above me to decipher.
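
In case it helps anyone reproduce the comparison, here is a minimal sketch of how the runs above could be checked in one go. The function name count_stdout is my own invention (it is not part of Open MPI or SLURM); it simply runs whatever command it is given and reports how many lines reached stdout, along with the exit status. I have only shown the mpirun invocations as comments, since they assume the install paths from this post and an active salloc session.

```shell
#!/bin/sh
# Hypothetical helper, not part of Open MPI or SLURM: run a command,
# discard its stderr, and report how many lines it wrote to stdout.
count_stdout() {
    out=$("$@" 2>/dev/null)   # capture stdout of the command
    rc=$?                     # exit status of the command itself
    if [ -n "$out" ]; then
        lines=$(printf '%s\n' "$out" | wc -l | tr -d ' ')
    else
        lines=0               # nothing at all reached stdout
    fi
    echo "$lines line(s) of stdout, exit status $rc"
}

# Intended use inside the allocation (paths are the ones from this post):
#   BASE=/discover/nobackup/mathomp4/MPI
#   for v in 1.8.2rc4 1.8.2; do
#       echo "mpirun from $v:"
#       count_stdout "$BASE/gcc_4.9.1-openmpi_$v/bin/mpirun" -np 8 ./helloWorld.182.x
#   done
```

Reporting the exit status next to the line count is deliberate: it would distinguish "mpirun succeeded but printed nothing" from "mpirun failed outright", which the bare empty prompt above does not show.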

You can find the ompi_info, build, make, config, etc. logs at these links (they are ~300kB, which is over the mailing list limit according to the Open MPI web page):

    https://dl.dropboxusercontent.com/u/61696/OMPI-1.8.2rc4-Output.tar.bz2
    https://dl.dropboxusercontent.com/u/61696/OMPI-1.8.2-Output.tar.bz2

Thank you for any help, and please let me know if you need more information,
Matt

-- 
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is one trick: rational thinking. But when you're good and crazy, oooh, oooh, oooh, the sky is the limit!" -- The Tick