No, the procs are gone; only mpiexec is still running.  This is on a
single node.  I only see the hang when the test is executed via "make
check"; it does not happen if I run mpiexec -n 6 ./testphdf5 by hand.
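
To help isolate it, here is a minimal sketch of the failing pattern (a
hypothetical standalone reproducer, not the HDF5 test itself): one rank
calls MPI_Abort while the others block, which is the same shape as
testphdf5's abort path.  Running it both by hand and from a make target
might show whether the hang is tied to how "make check" drives mpiexec
(e.g. with stdout/stderr redirected to the .chklog file).

/* abort_test.c - hypothetical minimal reproducer (a sketch, not the
 * HDF5 test): rank 0 calls MPI_Abort while the other ranks block in a
 * barrier, mimicking testphdf5's "aborting MPI processes" path.
 *
 * Build: mpicc abort_test.c -o abort_test
 * Run:   mpiexec -n 6 ./abort_test
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        fprintf(stderr, "rank 0: aborting MPI processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);   /* should tear down all ranks */
    }

    /* The remaining ranks block here; MPI_Abort is responsible for
     * killing them.  If mpiexec never exits, the abort path is stuck. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}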

On 06/30/2016 09:58 AM, Ralph Castain wrote:
> Are the procs still alive? Is this on a single node?
> 
>> On Jun 30, 2016, at 8:49 AM, Orion Poplawski <or...@cora.nwra.com> wrote:
>>
>> I'm seeing hangs when MPI_Abort is called.  This is with Open MPI 1.10.3,
>> e.g.:
>>
>> program output:
>>
>> Testing  -- big dataset test (bigdset)
>> Proc 3: *** Parallel ERROR ***
>>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> Testing  -- big dataset test (bigdset)
>> Proc 0: *** Parallel ERROR ***
>>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> Testing  -- big dataset test (bigdset)
>> Proc 2: *** Parallel ERROR ***
>>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> Testing  -- big dataset test (bigdset)
>> Proc 5: *** Parallel ERROR ***
>>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> aborting MPI processes
>> Testing  -- big dataset test (bigdset)
>> Proc 1: *** Parallel ERROR ***
>>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>> Testing  -- big dataset test (bigdset)
>> Proc 4: *** Parallel ERROR ***
>>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
>> aborting MPI processes
>>
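>> For reference, that VRFY line checks that MPI_Offset is wider than 4
>> bytes (the big dataset test presumably needs 64-bit file offsets), and
>> it fails here because this is a 32-bit armhfp build.  A standalone
>> sketch of the same check (hypothetical, not HDF5's actual VRFY macro):
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>>     /* The test needs offsets larger than a 32-bit type can hold;
>>      * on this armhfp build MPI_Offset is evidently only 4 bytes. */
>>     printf("sizeof(MPI_Offset) = %zu\n", sizeof(MPI_Offset));
>>     if (sizeof(MPI_Offset) <= 4) {
>>         fprintf(stderr, "aborting MPI processes\n");
>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>     }
>>     MPI_Finalize();
>>     return 0;
>> }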
>>
>> strace of mpiexec process:
>>
>> poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN},
>> {fd=14, events=POLLIN}], 4, -1
>>
>> lsof of mpiexec process:
>>
>> mpiexec 21511 orion    1w      REG        8,3    10547 17696145
>> /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
>> mpiexec 21511 orion    2w      REG        8,3    10547 17696145
>> /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
>> mpiexec 21511 orion    3u     unix 0xdaedbc80      0t0  4818918 type=STREAM
>> mpiexec 21511 orion    4u     unix 0xdaed8000      0t0  4818919 type=STREAM
>> mpiexec 21511 orion    5u  a_inode       0,11        0     8731 [eventfd]
>> mpiexec 21511 orion    6u      REG       0,38        0  4818921
>> /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/dev/shm/open_mpi.0000
>> (deleted)
>> mpiexec 21511 orion    7r     FIFO       0,10      0t0  4818922 pipe
>> mpiexec 21511 orion    8w     FIFO       0,10      0t0  4818922 pipe
>> mpiexec 21511 orion    9r      DIR        8,3     4096 15471703
>> /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root
>> mpiexec 21511 orion   10r      DIR       0,16        0       82
>> /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/sys/firmware/devicetree/base/cpus
>> mpiexec 21511 orion   11u     IPv4    4818926      0t0      TCP *:39619 
>> (LISTEN)
>> mpiexec 21511 orion   12r     FIFO       0,10      0t0  4818927 pipe
>> mpiexec 21511 orion   13w     FIFO       0,10      0t0  4818927 pipe
>> mpiexec 21511 orion   14r     FIFO        8,3      0t0 17965730
>> /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/tmp/openmpi-sessions-mockbuild@arm03-packager00_0/46622/0/debugger_attach_fifo
>>
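>> Note that the poll() in the strace above has a timeout of -1, so
>> mpiexec is blocked indefinitely waiting for an event on those
>> descriptors (one of them is the debugger_attach_fifo) rather than
>> spinning.  A tiny illustration of that blocking behavior (a sketch,
>> unrelated to Open MPI's actual event loop):
>>
>> #include <poll.h>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>>     struct pollfd pfd = { .fd = STDIN_FILENO, .events = POLLIN };
>>
>>     /* A timeout of -1 means wait forever: poll() returns only when
>>      * an fd becomes ready.  mpiexec is parked in this state, waiting
>>      * for a termination event that apparently never arrives. */
>>     int n = poll(&pfd, 1, -1);
>>     printf("poll returned %d\n", n);
>>     return 0;
>> }
>>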
>> Any suggestions on what to look for?  FWIW, it was a 6-process run on a
>> 4-core machine.
>>
>> Thanks.
>>


-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       or...@nwra.com
Boulder, CO 80301                   http://www.nwra.com
