I'm seeing hangs when MPI_Abort is called. This is with Open MPI 1.10.3, e.g.:
program output:

Testing -- big dataset test (bigdset)
Proc 3: *** Parallel ERROR ***
VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
aborting MPI processes
Testing -- big dataset test (bigdset)
Proc 0: *** Parallel ERROR ***
VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
aborting MPI processes
Testing -- big dataset test (bigdset)
Proc 2: *** Parallel ERROR ***
VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Testing -- big dataset test (bigdset)
Proc 5: *** Parallel ERROR ***
VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
aborting MPI processes
aborting MPI processes
Testing -- big dataset test (bigdset)
Proc 1: *** Parallel ERROR ***
VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
aborting MPI processes
Testing -- big dataset test (bigdset)
Proc 4: *** Parallel ERROR ***
VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
aborting MPI processes

strace of the mpiexec process (it appears to be sitting in this poll() call):

poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=14, events=POLLIN}], 4, -1

Open files of the mpiexec process (lsof):

mpiexec 21511 orion 1w REG 8,3 10547 17696145 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
mpiexec 21511 orion 2w REG 8,3 10547 17696145 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
mpiexec 21511 orion 3u unix 0xdaedbc80 0t0 4818918 type=STREAM
mpiexec 21511 orion 4u unix 0xdaed8000 0t0 4818919 type=STREAM
mpiexec 21511 orion 5u a_inode 0,11 0 8731 [eventfd]
mpiexec 21511 orion 6u REG 0,38 0 4818921 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/dev/shm/open_mpi.0000 (deleted)
mpiexec 21511 orion 7r FIFO 0,10 0t0 4818922 pipe
mpiexec 21511 orion 8w FIFO 0,10 0t0 4818922 pipe
mpiexec 21511 orion 9r DIR 8,3 4096 15471703 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root
mpiexec 21511 orion 10r DIR 0,16 0 82 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/sys/firmware/devicetree/base/cpus
mpiexec 21511 orion 11u IPv4 4818926 0t0 TCP *:39619 (LISTEN)
mpiexec 21511 orion 12r FIFO 0,10 0t0 4818927 pipe
mpiexec 21511 orion 13w FIFO 0,10 0t0 4818927 pipe
mpiexec 21511 orion 14r FIFO 8,3 0t0 17965730 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/tmp/openmpi-sessions-mockbuild@arm03-packager00_0/46622/0/debugger_attach_fifo

Any suggestions on what to look for?  FWIW, it was a 6 process run on a
4 core machine.

Thanks.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                    or...@nwra.com
Boulder, CO 80301                     http://www.nwra.com
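P.S. In case it helps, here is a stripped-down sketch of what the test is
effectively doing when the VRFY check fails: one rank calls MPI_Abort while
the other ranks are still running. This is only my own reduction for
illustration (the file name and the sleep are made up), not the actual HDF5
test code, and I haven't yet confirmed it triggers the same hang on its own:

/* abort_hang.c -- hypothetical minimal reproducer, not the HDF5 test itself.
 * Rank 0 aborts while the remaining ranks are still busy, mimicking the
 * failed VRFY() path in t_mdset.c. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        fprintf(stderr, "Proc %d: aborting MPI processes\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* The other ranks keep working so mpiexec still has live children to
     * clean up when the abort arrives. */
    sleep(30);

    MPI_Finalize();
    return 0;
}

I would build it with mpicc and run it oversubscribed like the original test
(mpiexec -np 6 ./abort_hang on the 4-core box) to see whether mpiexec parks
in the same poll().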