On Thu, Nov 30, 2017 at 6:32 PM, Jeff Squyres (jsquyres) wrote:
> Ah, I was misled by the subject.
>
> Can you provide more information about "hangs", and your environment?
>
> You previously cited:
>
> - E5-2697A v4 CPUs and Mellanox ConnectX-3 FDR Infiniband
> - SLURM
> - Open MPI v3.0.0
> - IMB
On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk wrote:
> I have attached my slurm job script, it will simply do an mpirun
> IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for
> instance, vader is enabled.
I have tested again with

mpirun --mca btl "^vader" IMB-MPI1

and it made no difference.
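The attached Slurm job script itself is not reproduced in the thread; a minimal sketch of the kind of script described (an sbatch job that simply runs IMB-MPI1 under mpirun with 1024 processes) might look like the following. The time limit, tasks-per-node split, and module line are assumptions, not values from the original script.

```shell
#!/bin/bash
#SBATCH --ntasks=1024          # 1024 MPI processes, as stated in the thread
#SBATCH --ntasks-per-node=32   # assumed split across nodes; adjust to the cluster
#SBATCH --time=02:00:00        # assumed wall-clock limit

# Assumed environment setup; the actual module name is site-specific.
# module load openmpi/3.0.0

mpirun IMB-MPI1
```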
> On Dec 1, 2017, at 8:10 AM, Götz Waschk wrote:
FWIW, pstack is a gdb wrapper that displays the stack trace.
PADB (http://padb.pittman.org.uk) is a great OSS tool that automatically collects the
stack traces of all the MPI tasks (and can do some grouping, similar to dshbak).
Cheers,
Gilles
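For a quick manual check without PADB, the pstack approach above can be sketched as a loop over the local ranks of the hung benchmark. The binary name IMB-MPI1 is taken from the thread; running this on each compute node of the job (e.g. via srun or ssh) is assumed.

```shell
#!/bin/sh
# Sketch: dump a stack trace of every local process matching the
# benchmark name, using pstack. Run once per compute node of the hung job.
target="${1:-IMB-MPI1}"
for pid in $(pgrep -f "$target"); do
    echo "=== PID $pid ==="
    pstack "$pid" 2>/dev/null
done
```

Comparing the traces across ranks (what PADB's grouping automates) usually shows which collective the processes are stuck in.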
Noam Bernstein wrote:
> On Dec 1, 2017, at 8:10 AM, Götz Waschk wrote:
Thanks,
I've tried padb first to get stack traces. This is from IMB-MPI1
hanging after one hour; the last output was:
# Benchmarking Alltoall
# #processes = 1024
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]