No, just mpiexec is still running. Single node. I only see the hang when the test is executed with "make check"; I don't see it if I just run mpiexec -n 6 ./testphdf5 by hand.
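For reference, here is a minimal sketch of what I think the abort path looks like (this is not the real testphdf5/t_mdset.c code; the check, the messages, and the error code are taken from the quoted output below, everything else is my own guess):

/* abort_repro.c - minimal sketch, not the real test code.
 * Build:              mpicc abort_repro.c -o abort_repro
 * Run by hand:        mpiexec -n 6 ./abort_repro
 * Mimic "make check": mpiexec -n 6 ./abort_repro > repro.chklog 2>&1
 * (the lsof output below shows mpiexec's stdout/stderr pointing at
 *  testphdf5.chklog, so the redirection is one obvious difference
 *  between the two invocations)
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The check that apparently fails at t_mdset.c line 479 on this armhfp
     * build; on platforms where MPI_Offset is 8 bytes it passes, so force
     * the condition to 0 there to exercise the abort path. */
    if (!(sizeof(MPI_Offset) > 4)) {
        fprintf(stderr, "Proc %d: *** Parallel ERROR ***\n", rank);
        fprintf(stderr, "VRFY (sizeof(MPI_Offset)>4) failed\n");
        fprintf(stderr, "aborting MPI processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);  /* errorcode 1, as in the output below */
    }

    MPI_Finalize();
    return 0;
}

Running that both ways (directly, and with output redirected as in the comment) might show whether the redirection alone is enough to make mpiexec hang here.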
On 06/30/2016 09:58 AM, Ralph Castain wrote:
> Are the procs still alive? Is this on a single node?
>
>> On Jun 30, 2016, at 8:49 AM, Orion Poplawski <or...@cora.nwra.com> wrote:
>>
>> I'm seeing hangs when MPI_Abort is called. This is with openmpi 1.10.3.
>> e.g:
>>
>> program output:
>>
>> Testing -- big dataset test (bigdset)
>> Proc 3: *** Parallel ERROR ***
>> VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> Testing -- big dataset test (bigdset)
>> Proc 0: *** Parallel ERROR ***
>> VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> Testing -- big dataset test (bigdset)
>> Proc 2: *** Parallel ERROR ***
>> VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> Testing -- big dataset test (bigdset)
>> Proc 5: *** Parallel ERROR ***
>> VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> aborting MPI processes
>> Testing -- big dataset test (bigdset)
>> Proc 1: *** Parallel ERROR ***
>> VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> Testing -- big dataset test (bigdset)
>> Proc 4: *** Parallel ERROR ***
>> VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>>
>> strace of mpiexec process:
>>
>> poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=14, events=POLLIN}], 4, -1
>>
>> mpiexec 21511 orion  1w  REG      8,3         10547  17696145  /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
>> mpiexec 21511 orion  2w  REG      8,3         10547  17696145  /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
>> mpiexec 21511 orion  3u  unix     0xdaedbc80  0t0    4818918   type=STREAM
>> mpiexec 21511 orion  4u  unix     0xdaed8000  0t0    4818919   type=STREAM
>> mpiexec 21511 orion  5u  a_inode  0,11        0      8731      [eventfd]
>> mpiexec 21511 orion  6u  REG      0,38        0      4818921   /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/dev/shm/open_mpi.0000 (deleted)
>> mpiexec 21511 orion  7r  FIFO     0,10        0t0    4818922   pipe
>> mpiexec 21511 orion  8w  FIFO     0,10        0t0    4818922   pipe
>> mpiexec 21511 orion  9r  DIR      8,3         4096   15471703  /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root
>> mpiexec 21511 orion 10r  DIR      0,16        0      82        /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/sys/firmware/devicetree/base/cpus
>> mpiexec 21511 orion 11u  IPv4     4818926     0t0    TCP *:39619 (LISTEN)
>> mpiexec 21511 orion 12r  FIFO     0,10        0t0    4818927   pipe
>> mpiexec 21511 orion 13w  FIFO     0,10        0t0    4818927   pipe
>> mpiexec 21511 orion 14r  FIFO     8,3         0t0    17965730  /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/tmp/openmpi-sessions-mockbuild@arm03-packager00_0/46622/0/debugger_attach_fifo
>>
>> Any suggestions on what to look for? FWIW, it was a 6 process run on a 4 core machine.
>>
>> Thanks.
--
Orion Poplawski
Technical Manager                          303-415-9701 x222
NWRA, Boulder/CoRA Office                  FAX: 303-415-9702
3380 Mitchell Lane                         or...@nwra.com
Boulder, CO 80301                          http://www.nwra.com