[OMPI users] configuring Open MPI 1.10.2 with CUDA on NVIDIA TK1

2016-01-21 Thread Kuhl, Spencer J
Openmpi 1.10.2. cuda.h and cuda_runtime_api.h exist in /usr/local/cuda-6.5/include. Using the configure trigger ./configure --with-cuda does not find cuda.h or cuda_runtime_api.h. Using the configure trigger ./configure --with-cuda=/usr/local/cuda-6.5 does not find cuda.h or cuda_runtime_api.h e
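
Before re-running configure, it may help to confirm independently that both headers really are where the post says they are. The following is only a minimal sketch of such a check; the helper name missing_cuda_headers is hypothetical, and it assumes the headers are expected directly under <prefix>/include. It does not reproduce configure's own search logic.

```python
import os


def missing_cuda_headers(cuda_prefix, headers=("cuda.h", "cuda_runtime_api.h")):
    """Return the header names that are NOT regular files in <cuda_prefix>/include.

    Hypothetical helper for illustration only; assumes the headers are
    expected directly under the prefix's include directory.
    """
    include_dir = os.path.join(cuda_prefix, "include")
    return [h for h in headers if not os.path.isfile(os.path.join(include_dir, h))]


if __name__ == "__main__":
    # Prefix taken from the post; an empty list means both headers were found.
    print(missing_cuda_headers("/usr/local/cuda-6.5"))
```

If the list comes back empty, the files are present and readable at that prefix, which suggests looking at config.log for the actual reason the configure test failed.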

Re: [OMPI users] MPI, Fortran, and GET_ENVIRONMENT_VARIABLE

2016-01-21 Thread Thomas Jahns
Hi Matt, On 01/15/2016 03:53 PM, Matt Thompson wrote: There is a chance in the future I might want/need to query an environment variable in a Fortran program, namely to figure out what switch a currently running process is on (via SLURM_TOPOLOGY_ADDR in my case) and perhaps make a "per-switch" c
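
The idea in the quoted question (read SLURM_TOPOLOGY_ADDR, work out the switch, then group processes per switch) can be sketched outside Fortran as follows. This is an illustration in Python, not the approach proposed in the thread. It assumes the variable holds period-separated component names ending with the node name, so the leaf switch is the second-to-last component. The helper names leaf_switch and switch_color are hypothetical, and the list of all switch names would have to be gathered across ranks by the application.

```python
import os


def leaf_switch(topology_addr):
    """Return the component just before the node name, or None if absent.

    Assumes a period-separated value such as "s0.s1.node17", with the
    node name as the last component.
    """
    parts = [p for p in topology_addr.split(".") if p]
    return parts[-2] if len(parts) >= 2 else None


def switch_color(switch_name, all_switch_names):
    """Map a switch name to a small non-negative integer.

    The integer could serve as the "color" argument of a communicator
    split; all_switch_names is assumed to be collected from every rank.
    """
    return sorted(set(all_switch_names)).index(switch_name)


if __name__ == "__main__":
    addr = os.environ.get("SLURM_TOPOLOGY_ADDR", "")
    print(leaf_switch(addr))  # None when the variable is unset or has no switch part
```

Sorting the set of names gives every rank the same name-to-integer mapping, provided each rank sees the same collection of names.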

Re: [OMPI users] Open MPI MPI-OpenMP Hybrid Binding Question

2016-01-21 Thread Jeff Hammond
On Thu, Jan 21, 2016 at 4:07 AM, Dave Love wrote: > > Jeff Hammond writes: > > > Just using Intel compilers, OpenMP and MPI. Problem solved :-) > > > > (I work for Intel and the previous statement should be interpreted as a > > joke, > > Good! > > > although Intel OpenMP and MPI interoperate as

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Jeff Squyres (jsquyres)
On Jan 21, 2016, at 7:40 AM, Eva wrote: > > Thanks Jeff. > > >>1. Can you create a small example to reproduce the problem? > > >>2. The TCP and verbs-based transports use different thresholds and > >>protocols, and can sometimes bring to light errors in the application > >>(e.g., the applica

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
Thanks Jeff. >>1. Can you create a small example to reproduce the problem? >>2. The TCP and verbs-based transports use different thresholds and protocols, and can sometimes bring to light errors in the application (e.g., the application is making assumptions that just happen to be true for TCP, b

Re: [OMPI users] MPI, Fortran, and GET_ENVIRONMENT_VARIABLE

2016-01-21 Thread Dave Love
Matt Thompson writes: > All, > > I'm not too sure if this is an MPI issue, a Fortran issue, or something > else but I thought I'd ask the MPI gurus here first since my web search > failed me. > > There is a chance in the future I might want/need to query an environment > variable in a Fortran pro

Re: [OMPI users] Openmpi 1.8.8 and affinity

2016-01-21 Thread Dave Love
twu...@goodyear.com writes: > In the past (v 1.6.4-) we used mpirun args of > > --mca mpi_paffinity_alone 1 --mca btl openib,tcp,sm,self > > with lsf 7.0.6, and this was enough to make cores not be oversubscribed when > submitting 2 or more jobs to the same node. [I'm puzzled by that. It should

Re: [OMPI users] Open MPI MPI-OpenMP Hybrid Binding Question

2016-01-21 Thread Dave Love
Jeff Hammond writes: > Just using Intel compilers, OpenMP and MPI. Problem solved :-) > > (I work for Intel and the previous statement should be interpreted as a > joke, Good! > although Intel OpenMP and MPI interoperate as well as any > implementations of which I am aware.) Better than MPC (

Re: [OMPI users] cleaning up old ROMIO (MPI-IO) drivers

2016-01-21 Thread Dave Love
[Catching up...] Rob Latham writes: > Do you use any of the other ROMIO file system drivers? If you don't > know if you do, or don't know what a ROMIO file system driver is, then > it's unlikely you are using one. > > What if you use a driver and it's not on the list? First off, let me > know

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Jeff Squyres (jsquyres)
Can you create a small example to reproduce the problem? The TCP and verbs-based transports use different thresholds and protocols, and can sometimes bring to light errors in the application (e.g., the application is making assumptions that just happen to be true for TCP, but not necessarily fo

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Gilles Gouaillardet
That could be a bug in openib, openmpi and/or your application. For example, memory corruption could go unnoticed with tcp but might cause openib to hang. You can start by running your program under a memory debugger (valgrind, ddt or other) and confirm your application works fine. You can also up

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
Gilles, actually, there are some more strange things. With the same environment and MPI version, I wrote a simple program using the same communication logic as my hanging program. The simple program works without hanging. So is there any possible reason? I can try them one by one. Or can I debug

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Gilles Gouaillardet
You can try a more recent version of openmpi (1.10.2 was released recently), or try with a nightly snapshot of master. If all of these still fail, can you post a trimmed version of your program so we can investigate? Cheers, Gilles Eva wrote: >Gilles, > >>>Can you try to >>>mpirun --mca btl t

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
Gilles, >>Can you try to >>mpirun --mca btl tcp,self --mca btl_tcp_eager_limit 56 ... >>and confirm it works fine with TCP *and* without eager ? I have tried this and it works. So what should I do next? 2016-01-21 16:25 GMT+08:00 Eva : > Thanks Gilles. > it works fine on tcp > So I use this to

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Gilles Gouaillardet
Can you try to mpirun --mca btl tcp,self --mca btl_tcp_eager_limit 56 ... and confirm it works fine with TCP *and* without eager ? Cheers, Gilles On 1/21/2016 5:25 PM, Eva wrote: Thanks Gilles. it works fine on tcp So I use this to disable eager: -mca btl_openib_use_eager_rdma 0 -mca btl_open

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
Thanks Gilles. it works fine on tcp So I use this to disable eager: -mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0 2016-01-21 13:10 GMT+08:00 Eva : > I run with two machines, 2 process per node: process0, process1, process2, > process3. > After some random rounds of communicat

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Gilles Gouaillardet
and by the way, you did mpirun --mca btl_tcp_eager_limit 56 in order to disable eager mode, right ? --mca btl_tcp_rndv_eager_limit 0 does something different Cheers, Gilles On 1/21/2016 2:10 PM, Eva wrote: I run with two machines, 2 process per node: process0, process1, process2, process3. Af

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Gilles Gouaillardet
Hi, can you post a trimmed version of your program so we can reproduce and analyze the hang ? Cheers, Gilles On 1/21/2016 2:10 PM, Eva wrote: I run with two machines, 2 process per node: process0, process1, process2, process3. After some random rounds of communications, the communication ha

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
I run with two machines, 2 processes per node: process0, process1, process2, process3. After some random rounds of communication, the communication hangs. When I debugged the program, I found: process1 sent a message to process2; process2 received the message from process1 and then start to receiv