Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs

2014-05-27, Nathan Hjelm
On Wed, May 28, 2014 at 12:32:35AM +0200, Alain Miniussi wrote: > Unfortunately, the debug library works like a charm (which makes the > uninitialized variable issue more likely). > > Still, the stack trace points to mca_btl_openib_add_procs in > ompi/mca/btl/openib/btl_openib.c and there is only on

Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs

2014-05-27, Ralph Castain
On May 27, 2014, at 3:32 PM, Alain Miniussi wrote: > Unfortunately, the debug library works like a charm (which makes the > uninitialized variable issue more likely). Indeed, it sounds like some optimization is triggering the problem. > > Still, the stack trace points to mca_
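If an uninitialized variable is the culprit, Valgrind's memcheck tool may flag the read of the uninitialised value in the optimized, icc-built library. The sketch below is one way to try that, assuming Valgrind is installed on the compute nodes and that `./a.out` is the minimal reproducer from this thread; `memcheck_ranks` and `uninit_reports` are hypothetical helper names. Open MPI itself will likely produce some unrelated reports unless it was configured with memchecker support.

```shell
# Hypothetical helpers for testing the uninitialized-variable theory.

# Run every rank under memcheck. Valgrind replaces %p in --log-file with
# the process ID, so each rank writes its own vg.<pid>.log.
memcheck_ranks() {
  np=$1; shift
  mpiexec -np "$np" valgrind --track-origins=yes --log-file=vg.%p.log "$@"
}

# Count memcheck reports in one log file that mention uninitialised
# values. grep -c exits with status 1 when the count is 0, so mask it.
uninit_reports() {
  grep -c 'uninitialised value' "$1" || true
}

# Usage (not executed here):
#   memcheck_ranks 2 ./a.out
#   for f in vg.*.log; do echo "$f: $(uninit_reports "$f")"; done
```

Reports whose stack passes through mca_btl_openib_add_procs would be the ones to read first; `--track-origins=yes` makes the run slower but points at where the uninitialised value was created.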

Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs

2014-05-27, Alain Miniussi
Unfortunately, the debug library works like a charm (which makes the uninitialized variable issue more likely). Still, the stack trace points to mca_btl_openib_add_procs in ompi/mca/btl/openib/btl_openib.c, and there is only one division in that function (an integer one, not floating point), at the end:

Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI

2014-05-27, Rolf vandeVaart
>-----Original Message----- >From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault >Sent: Tuesday, May 27, 2014 4:07 PM >To: Open MPI Users >Subject: Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI > >Answers inline too. >>> 2) Is the absence of btl_ope

Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI

2014-05-27, Maxime Boissonneault
Answers inline too. 2) Is the absence of btl_openib_have_driver_gdr an indicator of something missing? Yes, that means that somehow GPU Direct RDMA is not installed correctly. All that check does is make sure that the file /sys/kernel/mm/memory_peers/nv_mem/version exists. Does that exis
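Since, according to the answer quoted here, the btl_openib_have_driver_gdr check only tests whether /sys/kernel/mm/memory_peers/nv_mem/version exists, the same test is easy to repeat by hand on each compute node. A minimal sketch; `gdr_sysfs_present` is a hypothetical name, and the optional path argument exists only so the helper can be exercised against another file.

```shell
# Hypothetical helper mirroring the check described above: GPU Direct RDMA
# is considered available only if this sysfs file exists.
NV_MEM_VERSION=/sys/kernel/mm/memory_peers/nv_mem/version

gdr_sysfs_present() {
  # An optional first argument overrides the path (useful for testing).
  f=${1:-$NV_MEM_VERSION}
  if [ -e "$f" ]; then
    echo "found $f (contents: $(cat "$f" 2>/dev/null))"
    return 0
  fi
  echo "missing $f: GPU Direct RDMA support is not installed or not loaded"
  return 1
}

# Usage on a compute node:
#   gdr_sysfs_present || echo "expect btl_openib_have_driver_gdr to be absent"
```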

Re: [OMPI users] mpiifort mpiicc not found

2014-05-27, Brock Palen
mpiifort and mpiicc are Intel MPI compiler wrapper commands; in Open MPI and other implementations, the equivalents are mpifort and mpicc. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On May 27, 2014, at 2:11 PM, Lorenzo Donà wrote: > Dear all > I in
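The correspondence can be written down as a small lookup. A sketch only: `ompi_wrapper_for` is a hypothetical helper covering just the three Intel MPI wrappers `mpiicc`, `mpiicpc` and `mpiifort`. Which back-end compiler an Open MPI wrapper calls (icc/ifort or gcc/gfortran) is fixed when Open MPI is configured, not by the wrapper's name.

```shell
# Map an Intel MPI wrapper name to the Open MPI wrapper with the same role.
ompi_wrapper_for() {
  case $1 in
    mpiicc)   echo mpicc ;;
    mpiicpc)  echo mpicxx ;;
    mpiifort) echo mpifort ;;
    *) echo "no known Open MPI equivalent for '$1'" >&2; return 1 ;;
  esac
}
```

For example, `$(ompi_wrapper_for mpiifort) -o hello hello.f90` runs mpifort, which in an Open MPI built with FC=ifort compiles with ifort.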

Re: [OMPI users] mpiifort mpiicc not found

2014-05-27, Lori 91
Thanks a lot! I thought that if I compiled with ifort and so on, I would get mpiifort and the others. Thanks again :) > On 27 May 2014, at 20:23, "Fabricio Cannini" > wrote: > > On 27-05-2014 15:10, Lorenzo Donà wrote: >> Dear all >> I installed openmpi with intel compiler in this w

Re: [OMPI users] mpiifort mpiicc not found

2014-05-27, Fabricio Cannini
On 27-05-2014 15:10, Lorenzo Donà wrote: Dear all, I installed openmpi with the Intel compilers this way: ./configure FC=ifort CC=icc CXX=icpc F77=ifort --prefix=/Users/lorenzodona/Documents/openmpi-1.8.1/ but in the bin dir I did not find mpiifort or mpiicc. Please, can you help me to install openm

[OMPI users] mpiifort mpiicc not found

2014-05-27, Lorenzo Donà
Dear all, I installed Open MPI with the Intel compilers this way: ./configure FC=ifort CC=icc CXX=icpc F77=ifort --prefix=/Users/lorenzodona/Documents/openmpi-1.8.1/ but in the bin dir I did not find mpiifort or mpiicc. Can you please help me install Open MPI with the Intel compilers? Thanks for helping me, and
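A sketch of the whole procedure, reusing the configure line and prefix from the question and then checking which compiler each wrapper drives, via the Open MPI wrappers' `--showme:command` option. The function names are hypothetical and `make -j4` is an arbitrary choice.

```shell
PREFIX=/Users/lorenzodona/Documents/openmpi-1.8.1   # prefix from the question

# Configure, build and install Open MPI with the Intel compilers.
# Run from the top of the Open MPI source tree.
build_ompi_intel() {
  ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix="$PREFIX" &&
    make -j4 all && make install
}

# Succeed if wrapper $1 invokes compiler $2. --showme:command prints the
# underlying compiler command; keep its first word and strip any directory.
wrapper_uses() {
  cmd=$("$1" --showme:command) || return 1
  cmd=${cmd%% *}
  [ "${cmd##*/}" = "$2" ]
}

# Usage, after build_ompi_intel:
#   wrapper_uses "$PREFIX/bin/mpicc"   icc   && echo "mpicc uses icc"
#   wrapper_uses "$PREFIX/bin/mpifort" ifort && echo "mpifort uses ifort"
```

The installed bin directory will contain mpicc, mpicxx and mpifort rather than mpiicc and mpiifort; the check confirms they call the Intel compilers.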

Re: [OMPI users] Deadly warning "Epoll ADD(4) on fd 2 failed." ?

2014-05-27, Ralph Castain
I'm unaware of any OMPI error message like that. It might come from libevent, which could be using epoll, so it could still be caused by us. However, I'm a little concerned about the use of the prerelease version of Slurm, as we know that PMI is having some problems over there. So out o

Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs

2014-05-27, Ralph Castain
Ah, good. On the setup that fails, could you use gdb to find the line number where it is dividing by zero? It could be an uninitialized variable that gcc inits one way and icc inits another. On May 27, 2014, at 4:49 AM, Alain Miniussi wrote: > So it's working with a gcc-compiled openmpi: > >
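Ralph's suggestion can be scripted roughly as follows. This assumes gdb is available on the node and that the failing icc build is rebuilt with debug symbols but the same optimization (for example by adding `-g` to the `CFLAGS` given to configure; the exact flags are an assumption), since the debug library does not crash. `gdb_backtrace` and `frame_location` are hypothetical helper names, and the parser assumes each backtrace frame is printed on a single line.

```shell
# Run a program under gdb in batch mode: it runs until it exits or a signal
# (such as SIGFPE from an integer division by zero) stops it, then gdb
# prints a backtrace.
gdb_backtrace() {
  gdb -q -batch -ex run -ex bt --args "$@"
}

# Read a gdb backtrace on stdin and print "file:line" for the first frame
# in function $1. Prints nothing if that frame has no line information
# (for example "... from /path/libmpi.so").
frame_location() {
  grep -m1 "$1 (" | sed -n 's/.* at \([^ ]*:[0-9][0-9]*\)$/\1/p'
}

# Usage:
#   gdb_backtrace ./a.out 2>&1 | frame_location mca_btl_openib_add_procs
#   mpiexec -np 2 gdb -q -batch -ex run -ex bt --args ./a.out
```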

Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI

2014-05-27, Rolf vandeVaart
Answers inline... >-----Original Message----- >From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault >Sent: Friday, May 23, 2014 4:31 PM >To: Open MPI Users >Subject: [OMPI users] Advices for parameter tuning for CUDA-aware MPI > >Hi, >I am currently configuring a GPU c

Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs

2014-05-27, Alain Miniussi
So it's working with a gcc-compiled openmpi: [alainm@gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpicc --showme gcc -I/softs/openmpi-1.8.1-gnu447/include -pthread -Wl,-rpath -Wl,/softs/openmpi-1.8.1-gnu447/lib -Wl,--enable-new-dtags -L/softs/openmpi-1.8.1-gnu447/lib -lmpi (reverse-i-search)`m

Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs

2014-05-27, Alain Miniussi
Hi Gus, Yes, I did, with the same result on each process. Actually, the problem was spotted in a real code, although I only posted the minimal version. Alain On 26/05/2014 17:14, Gustavo Correa wrote: Hi Alain, Have you tried this? mpiexec -np 2 ./a.out Note: mpicc to compile, mpiexec to exec

[OMPI users] Deadly warning "Epoll ADD(4) on fd 2 failed." ?

2014-05-27, Filippo Spiga
Dear all, I am using an Open MPI v1.8.2 nightly snapshot compiled with SLURM support (version 14.03pre5). The two messages below appeared during a job of 2048 MPI processes that died after 24 hours! [warn] Epoll ADD(1) on fd 0 failed. Old events were 0; read change was 1 (add); write change was 0 (none):