Re: [OMPI users] Seg fault in MPI_FINALIZE

2015-10-16 Thread Jeff Squyres (jsquyres)
If you are using Intel 16, yes, 1.10.1 would be a good choice. If you're not using Fortran, you can disable the MPI Fortran bindings, and you should be OK, too. > On Oct 16, 2015, at 3:54 PM, Nick Papior wrote: > > @Jeff, Kevin > > Shouldn't Kevin wait for 1.10.1 with the intel 16 compiler?

Re: [OMPI users] Seg fault in MPI_FINALIZE

2015-10-16 Thread Nick Papior
@Jeff, Kevin: Shouldn't Kevin wait for 1.10.1 with the Intel 16 compiler? A bugfix for Intel 16 has been committed with fb49a2d71ed9115be892e8a22643d9a1c069a8f9. (At least I am anxiously awaiting 1.10.1, because I cannot get my builds to complete successfully.) 2015-10-16 19:33 GMT+00:00 Jeff

Re: [OMPI users] Seg fault in MPI_FINALIZE

2015-10-16 Thread Jeff Squyres (jsquyres)
> On Oct 16, 2015, at 3:25 PM, McGrattan, Kevin B. Dr. wrote: > > I cannot nail this down any better because this happens like every other night, with about 1 out of a hundred jobs. Can anyone think of a reason why the job would seg fault in MPI_FINALIZE, but only under conditions where

[OMPI users] Seg fault in MPI_FINALIZE

2015-10-16 Thread McGrattan, Kevin B. Dr.
My group is running a fairly large CFD code compiled with Intel Fortran 16.0.0 and OpenMPI 1.8.4. Each night we run hundreds of simple test cases, using a range of MPI processes from 1 to 16. I have noticed that if we submit these jobs on our Linux cluster and assign each job exclusive rights to

Re: [OMPI users] openib issue with 1.6.5 but not later releases

2015-10-16 Thread John Marshall
On 10/16/2015 02:27 PM, Shamis, Pavel wrote: Well, OMPI will see this as 14 separate devices and will create ~28 openib btl instances (one per port). Can you try to limit OpenMPI to run with a single device/port and see what happens? We are running inside an LXC container and only 1 i

Re: [OMPI users] openib issue with 1.6.5 but not later releases

2015-10-16 Thread Shamis, Pavel
Well, OMPI will see this as 14 separate devices and will create ~28 openib btl instances (one per port). Can you try to limit OpenMPI to run with a single device/port and see what happens? Best, Pasha From: users <users-boun...@open-mpi.org> on behalf of John Marshall mailto:jo

Re: [OMPI users] openib issue with 1.6.5 but not later releases

2015-10-16 Thread John Marshall
On 10/16/2015 01:35 PM, Shamis, Pavel wrote: Did you try to run ibdiagnet to check the network? Also, how many devices do you have on the same node? It says "mlx4_14" - do you have 14 HCAs on the same machine?! Yes. ibdiagnet seems to check out fine except for a few warnings which do not seem to b

Re: [OMPI users] openib issue with 1.6.5 but not later releases

2015-10-16 Thread Shamis, Pavel
Did you try to run ibdiagnet to check the network? Also, how many devices do you have on the same node? It says "mlx4_14" - do you have 14 HCAs on the same machine?! Best, Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory

[OMPI users] openib issue with 1.6.5 but not later releases

2015-10-16 Thread John Marshall
Hi, I have encountered a problem when running with 1.6.5 over IB (openib, ConnectX-3): [[51298,1],2][btl_openib_component.c:3496:handle_wc] from ib7-bc2qq42-be01p02 to: 3 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 217ce00 opcode 0 vendor error 129 qp_i

Re: [OMPI users] openMPI programs not using IB network

2015-10-16 Thread Gilles Gouaillardet
David, ib0 means IP over IB; this is *not* what you want to use, since it is way slower than native InfiniBand. If you run mpirun --mca btl self,sm,openib ... on more than one node, the only btl usable for inter-node communication is openib, so if communication happens, that means openib is used. in order

[OMPI users] openMPI programs not using IB network

2015-10-16 Thread David Arnold
Hi, We appear to have a correctly set up Mellanox IB network (ibdiagnet, ibstat, iblinkinfo, ibqueryerrors(*)). It's operating at Rate 40 FDR10. But openMPI programs (test and user) that are specifying the 'openib,self,sm' parameters do not seem to be using the IB network according to network- m

Re: [OMPI users] mpirun/mpiexec requires su

2015-10-16 Thread Jeff Squyres (jsquyres)
On Oct 15, 2015, at 2:58 PM, Brant Abbott wrote: > > If I use mpirun.openmpi everything works as normal. I suppose mpirun is executing the MPICH version. I'm not entirely sure why it behaves differently when logged in as root, but good enough for me to just use the alternative command. T

Re: [OMPI users] MPI_GATHERV error

2015-10-16 Thread Diego Avesani
Dear Georg, thanks a lot for your explanations. Now everything works and it is clearer to me. Best regards, Diego On 14 October 2015 at 17:16, Georg Geiser wrote: > Hi Diego, > > displacements start at 0, so 0 means no displacement, i.e., the corresponding data starts at the first entry
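
The point about displacements starting at 0 can be illustrated without MPI itself. The following is a minimal Python sketch, not code from the thread: the helper name `displacements` and the receive counts are hypothetical, chosen only to show how gatherv-style offsets are usually derived from per-rank counts.

```python
def displacements(counts):
    """Return 0-based offsets for a gatherv-style receive buffer.

    counts[i] is the number of elements contributed by rank i
    (hypothetical values in this sketch). The offset of rank i is the
    sum of all earlier counts, so rank 0 always starts at offset 0.
    """
    displs = []
    offset = 0
    for n in counts:
        displs.append(offset)  # where this rank's data begins
        offset += n            # next rank starts right after it
    return displs


# Hypothetical counts for three ranks.
counts = [3, 1, 4]
displs = displacements(counts)

# Emulate the gather: place each rank's data in one flat buffer at its offset.
recvbuf = [None] * sum(counts)
for rank, (n, d) in enumerate(zip(counts, displs)):
    recvbuf[d:d + n] = [rank] * n
```

A displacement of 0 therefore places the data at the first entry of the receive buffer. This holds for MPI_GATHERV called from Fortran as well, even though Fortran arrays are indexed from 1 by default.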