On Wed, 20 Feb 2019 10:46:10 -0500
Adam LeBlanc wrote:
> Hello,
>
> When I do a run with OpenMPI v4.0.0 on Infiniband with this command:
> mpirun --mca btl_openib_warn_no_device_params_found 0 --map-by node
> --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca
> pml ob1 --mca btl_
I have the following, rather unusual, scenario...
I have a program running with OpenMP on a multicore computer. At one point
in the program, I want to use an external package that is written to
exploit MPI, not OpenMP, parallelism. So a (rather awkward) solution could
be to launch the program in M
> On Feb 20, 2019, at 7:14 PM, Gilles Gouaillardet wrote:
>
> Ryan,
>
> as Edgar explained, that could be a compiler issue (fwiw, I am unable to
> reproduce the bug)
Was that with the same combination (OpenMPI 3.1.3, GCC 4.8.5, and HDF5 1.10.4
make check)? Just making sure, because that makes it seem like there’s something
> -----Original Message-----
> > Does it always occur at 20+ minutes elapsed ?
>
> Aha! Yes, you are right: every time it fails, it’s at the mark of 20 minutes
> and a couple of seconds. For comparison, every time it succeeds, it runs for
> 2-3 seconds
> total. So it seems like what might actually be hap
> On Feb 21, 2019, at 2:52 PM, Gabriel, Edgar wrote:
>
>> -----Original Message-----
>>> Does it always occur at 20+ minutes elapsed ?
>>
>> Aha! Yes, you are right: every time it fails, it’s at the mark of 20 minutes
>> and a couple of seconds. For comparison, every time it succeeds, it runs for 2
Yes, I was talking about the same thing, although for me it was not t_mpi but
t_shapesame that was hanging. It might be an indication of the same issue,
however.
> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
> Novosielski
> Sent: Thursday,
Related to this or not, I also get a hang on MVAPICH2 2.3 compiled with GCC
8.2, but on t_filters_parallel, not t_mpi. With that combo, though, I get a
segfault, or at least a message about one. It’s only “Alarm clock” on the GCC
4.8 with OpenMPI 3.1.3 combo. It also happens at the ~20 minute ma
> On Feb 20, 2019, at 7:14 PM, Gilles Gouaillardet wrote:
>
> Ryan,
>
> That being said, the "Alarm clock" message looks a bit suspicious.
>
> Does it always occur at 20+ minutes elapsed ?
>
> Is there some mechanism that automatically kills a job if it does not write
> anything to stdout for
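
For reference, "Alarm clock" is the text that shells on Linux typically print
when a child process is terminated by SIGALRM, which would be consistent with a
timer armed via alarm() firing while a test hangs. Nothing in this thread
confirms where such a timer is set, so the following is only a minimal Python
sketch of the mechanism, not the HDF5 test code. The one-second timeout, the
CHILD snippet and run_hanging_child() are made up for illustration, and a POSIX
system is assumed.

```python
import signal
import subprocess
import sys

# Hypothetical child: arms a 1-second alarm, then "hangs" for much longer.
# No SIGALRM handler is installed, so the default action (terminate) applies.
CHILD = "import signal, time; signal.alarm(1); time.sleep(30)"


def run_hanging_child():
    """Run the child and return the number of the signal that killed it, or None."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    # On POSIX, a negative returncode -N means the child was killed by signal N.
    return -proc.returncode if proc.returncode < 0 else None


if __name__ == "__main__":
    sig = run_hanging_child()
    print("killed by:", signal.Signals(sig).name if sig else "no signal")
```

With a 1200-second timeout in place of the one-second one, a hang would surface
at the 20-minute mark, much like the failures described above.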