Related to this or not, I also get a hang on MVAPICH2 2.3 compiled with GCC 
8.2, but on t_filters_parallel rather than t_mpi. With that combo, though, I 
get a segfault, or at least a message about one; it’s only “Alarm clock” on 
the GCC 4.8 with OpenMPI 3.1.3 combo. It also happens at the ~20-minute mark, 
FWIW.
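
For what it’s worth, the “Alarm clock” message and the consistent 20-minute 
timing look like a plain POSIX watchdog: if the test harness arms alarm(1200) 
and the ranks then block (say, in a hung MPI collective), SIGALRM kills them 
after 20 minutes and the shell/srun reports “Alarm clock”. The 1200-second 
figure is just my inference from the timing, not something I’ve checked in the 
HDF5 sources. A minimal sketch of that generic mechanism, unrelated to HDF5 
itself:

  /* watchdog.c -- demonstrates only the generic alarm()/SIGALRM behavior;
   * the 1200-second value is an assumption based on the ~20-minute timing. */
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      alarm(1200);                  /* arm a 20-minute watchdog            */
      printf("simulating a hang...\n");
      pause();                      /* block forever, as a hung test would */
      return 0;                     /* never reached: SIGALRM's default
                                     * action terminates the process, and
                                     * the shell reports "Alarm clock"     */
  }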

Testing  t_filters_parallel
============================
 t_filters_parallel  Test Log
============================
srun: job 84117363 queued and waiting for resources
srun: job 84117363 has been allocated resources
[slepner063.amarel.rutgers.edu:mpi_rank_0][error_sighandler] Caught error: 
Segmentation fault (signal 11)
srun: error: slepner063: task 0: Segmentation fault
srun: error: slepner063: tasks 1-3: Alarm clock
0.01user 0.01system 20:01.44elapsed 0%CPU (0avgtext+0avgdata 5144maxresident)k
0inputs+0outputs (0major+1524minor)pagefaults 0swaps
make[4]: *** [t_filters_parallel.chkexe_] Error 1
make[4]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[2]: *** [test] Error 2
make[2]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory 
`/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make: *** [check-recursive] Error 1

> On Feb 21, 2019, at 3:03 PM, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
> 
> Yes, I was talking about the same thing, although for me it was not t_mpi, 
> but t_shapesame that was hanging. It might be an indication of the same 
> issue, however.
> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
>> Novosielski
>> Sent: Thursday, February 21, 2019 1:59 PM
>> To: Open MPI Users <users@lists.open-mpi.org>
>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
>> 3.1.3
>> 
>> 
>>> On Feb 21, 2019, at 2:52 PM, Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>>> 
>>>> -----Original Message-----
>>>>> Does it always occur at 20+ minutes elapsed ?
>>>> 
>>>> Aha! Yes, you are right: every time it fails, it’s at the 20 minute
>>>> and a couple of seconds mark. For comparison, every time it runs, it
>>>> runs for 2-3 seconds total. So it seems like what might actually be
>>>> happening here is a hang, and not a failure of the test per se.
>>>> 
>>> 
>>> I *think* I can confirm that. I compiled 3.1.3 yesterday with gcc 4.8 
>>> (although this was OpenSuSE, not Redhat), and it looked to me like one of 
>>> the tests was hanging, but I didn't have time to investigate it further.
>> 
>> Just to be clear, the hanging test I have is t_mpi from HDF5 1.10.4. The
>> OpenMPI 3.1.3 make check passes just fine on all of our builds. But I don’t
>> believe it ever launches any jobs or anything like that.
>> 
>> --
>> ____
>> || \\UTGERS,          
>> |---------------------------*O*---------------------------
>> ||_// the State       |         Ryan Novosielski - novos...@rutgers.edu
>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>> ||  \\    of NJ       | Office of Advanced Research Computing - MSB C630,
>> Newark
>>     `'
