Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Götz Waschk
On Thu, Nov 30, 2017 at 6:32 PM, Jeff Squyres (jsquyres) wrote:
> Ah, I was misled by the subject.
>
> Can you provide more information about "hangs", and your environment?
>
> You previously cited:
>
> - E5-2697A v4 CPUs and Mellanox ConnectX-3 FDR Infiniband
> - SLURM
> - Open MPI v3.0.0
> - IMB-MPI1
>
> Can you send the information listed here:
>
> https://www.open-mpi.org/community/help/
>
> BTW, since you fixed the last error by growing the tmpdir size (admittedly: 
> we should probably have a better error message here, and shouldn't just segv 
> like you were seeing -- I'll open a bug on that), you can probably remove 
> "--mca btl ^vader" and similar CLI options.  vader and sm were [probably?] 
> failing because the memory-mapped files on that filesystem ran out of space 
> and Open MPI didn't handle it well.  Meaning: in general, you don't want to 
> turn off shared memory support, because it will likely always be the fastest 
> option for on-node communication.
Hi Jeff,

yes, it was wrong to simply close the issue with openmpi 1.10. But now
about the current problem:

I am using the packages provided by OpenHPC, so I didn't build Open MPI
myself and don't have a config.log. The package version is
openmpi3-gnu7-ohpc-3.0.0-35.1.x86_64.
Attached is the output of ompi_info --all.
The FAQ entry must be outdated, as this is what happened:
% ompi_info -v ompi full --parsable
ompi_info: Error: unknown option "-v"
Type 'ompi_info --help' for usage.

I have attached my slurm job script; it simply runs mpirun IMB-MPI1
with 1024 processes, roughly as in the sketch below. I haven't set any
MCA parameters, so, for instance, vader is enabled.
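
In outline it does something like this (a minimal sketch; the job name and
module names are illustrative, the exact script is in the attachment):

#!/bin/bash
#SBATCH --job-name=mpitest-openmpi3    # illustrative job name
#SBATCH --ntasks=1024                  # 1024 MPI processes
#SBATCH --time=02:00:00                # the 2-hour limit mentioned below

# toolchain modules as packaged by OpenHPC (module names are assumptions)
module load gnu7 openmpi3 imb

# no MCA parameters are set, so the default BTLs (including vader) are used
mpirun IMB-MPI1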

The effect of the bug is that the program produces standard output for
over 30 minutes, then all processes keep spinning at 100% CPU without
making further progress until they are killed by the slurm job time
limit (2 hours in this example).

The InfiniBand network seems to be working fine. I'm using the OFED
stack shipped with RHEL 7.4 (actually Scientific Linux 7.4), and I am
running opensm on one of the nodes.


Regards, Götz


ompi_info.txt.bz2
Description: BZip2 compressed data


slurm-mpitest-openmpi3.job
Description: Binary data


slurm-2715.out.bz2
Description: BZip2 compressed data
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Götz Waschk
On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk  wrote:
> I have attached my slurm job script, it will simply do an mpirun
> IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for
> instance, vader is enabled.
I have tested again, with
mpirun --mca btl "^vader" IMB-MPI1
it made no difference.
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Noam Bernstein

> On Dec 1, 2017, at 8:10 AM, Götz Waschk  wrote:
> 
> On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk  wrote:
>> I have attached my slurm job script, it will simply do an mpirun
>> IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for
>> instance, vader is enabled.
> I have tested again, with
>mpirun --mca btl "^vader" IMB-MPI1
> it made no difference.

I’ve lost track of the earlier parts of this thread, but has anyone suggested 
logging into the nodes it’s running on, doing “gdb -p PID” for each of the mpi 
processes, and doing “where” to see where it’s hanging?

I use this script (trace_all), which depends on a variable $process
containing a grep regexp that matches the MPI executable:
echo "where" > /tmp/gf

pids=`ps aux | grep $process | grep -v grep | grep -v trace_all | awk '{print 
\$2}'`
for pid in $pids; do
   echo $pid
   prog=`ps auxw | grep " $pid " | grep -v grep | awk '{print $11}'`
   gdb -x /tmp/gf -batch $prog $pid
   echo ""
done
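
For example, assuming the benchmark binary is IMB-MPI1 and the script above
is saved as trace_all, running something like

   process=IMB-MPI1 sh ./trace_all > node_traces.txt

on one of the affected nodes should dump a backtrace of every matching rank there.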

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Gilles Gouaillardet
FWIW,

pstack <pid>
is a gdb wrapper that displays the stack trace of a process.

PADB (http://padb.pittman.org.uk) is a great OSS tool that automatically collects
the stack traces of all the MPI tasks (and can do some grouping, similar to dshbak)
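
If padb is not available, a quick-and-dirty alternative is something along
these lines on each compute node (the pgrep pattern is just an example for
this benchmark):

# print a stack trace of every IMB-MPI1 rank found on this node
for pid in $(pgrep -f IMB-MPI1); do
    echo "=== PID $pid ==="
    pstack "$pid"
done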

Cheers,

Gilles

Noam Bernstein  wrote:
>
>On Dec 1, 2017, at 8:10 AM, Götz Waschk  wrote:
>
>On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk  wrote:
>
>I have attached my slurm job script, it will simply do an mpirun
>IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for
>instance, vader is enabled.
>
>I have tested again, with
>   mpirun --mca btl "^vader" IMB-MPI1
>it made no difference.
>
>I’ve lost track of the earlier parts of this thread, but has anyone suggested
>logging into the nodes it’s running on, doing “gdb -p PID” for each of the mpi
>processes, and doing “where” to see where it’s hanging?
>
>I use this script (trace_all), which depends on a variable process that is a
>grep regexp that matches the mpi executable:
>
>echo "where" > /tmp/gf
>
>pids=`ps aux | grep $process | grep -v grep | grep -v trace_all | awk '{print $2}'`
>for pid in $pids; do
>   echo $pid
>   prog=`ps auxw | grep " $pid " | grep -v grep | awk '{print $11}'`
>   gdb -x /tmp/gf -batch $prog $pid
>   echo ""
>done
>
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Götz Waschk
Thanks,

I've tried padb first to get stack traces. This is from IMB-MPI1
hanging after one hour; the last output was:
# Benchmarking Alltoall
# #processes = 1024
#
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.04         0.09         0.05
            1         1000       253.40       335.35       293.06
            2         1000       266.93       346.65       306.23
            4         1000       303.52       382.41       342.21
            8         1000       383.89       493.56       439.34
           16         1000       501.27       627.84       569.80
           32         1000      1039.65      1259.70      1163.12
           64         1000      1710.12      2071.47      1910.62
          128         1000      3051.68      3653.44      3398.65

On Fri, Dec 1, 2017 at 4:23 PM, Gilles Gouaillardet wrote:
> FWIW,
>
> pstack <pid>
> is a gdb wrapper that displays the stack trace of a process.
>
> PADB (http://padb.pittman.org.uk) is a great OSS tool that automatically
> collects the stack traces of all the MPI tasks (and can do some grouping,
> similar to dshbak)
>
> Cheers,
>
> Gilles
>
>
> Noam Bernstein  wrote:
>
> On Dec 1, 2017, at 8:10 AM, Götz Waschk  wrote:
>
> On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk  wrote:
>
> I have attached my slurm job script, it will simply do an mpirun
> IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for
> instance, vader is enabled.
>
> I have tested again, with
>mpirun --mca btl "^vader" IMB-MPI1
> it made no difference.
>
>
> I’ve lost track of the earlier parts of this thread, but has anyone
> suggested logging into the nodes it’s running on, doing “gdb -p PID” for
> each of the mpi processes, and doing “where” to see where it’s hanging?
>
> I use this script (trace_all), which depends on a variable process that is a
> grep regexp that matches the mpi executable:
>
> echo "where" > /tmp/gf
>
> pids=`ps aux | grep $process | grep -v grep | grep -v trace_all | awk '{print $2}'`
> for pid in $pids; do
>echo $pid
>prog=`ps auxw | grep " $pid " | grep -v grep | awk '{print $11}'`
>gdb -x /tmp/gf -batch $prog $pid
>echo ""
> done
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users



-- 
AL I:40: Do what thou wilt shall be the whole of the Law.
Stack trace(s) for thread: 1
-
[0-1023] (1024 processes)
-
main() at ?:?
  IMB_init_buffers_iter() at ?:?
IMB_alltoall() at ?:?
  -
  [0-31,35,42,118,163,235] (37 processes)
  -
  PMPI_Barrier() at ?:?
ompi_coll_base_barrier_intra_recursivedoubling() at ?:?
  ompi_request_default_wait() at ?:?
opal_progress() at ?:?
  -
  [32-34,36-41,43-117,119-162,164-234,236-1023] (987 processes)
  -
  PMPI_Alltoall() at ?:?
ompi_coll_base_alltoall_intra_basic_linear() at ?:?
  ompi_request_default_wait_all() at ?:?
-

[32-34,36-41,43-117,119-162,164-234,236-413,415-532,534-651,653-744,746-894,896-1023] (982 processes)
-
opal_progress() at ?:?
-
[533] (1 processes)
-
opal_progress@plt() at ?:?
Stack trace(s) for thread: 2
-
[0-1023] (1024 processes)
-
start_thread() at ?:?
  progress_engine() at ?:?
opal_libevent2022_event_base_loop() at event.c:1630
  epoll_dispatch() at epoll.c:407
epoll_wait() at ?:?
Stack trace(s) for thread: 3
-
[0-1023] (1024 processes)
-
start_thread() at ?:?
  progress_engine() at ?:?
opal_libevent2022_event_base_loop() at event.c:1630
  poll_dispatch() at poll.c:165
poll() at ?:?
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users