Ramps up over time; we had a bunch of locked-up nodes over the weekend and have
traced it back to this.
Let me see if I can share more details.
I will review with everyone tomorrow and get back to you.
Rolf vandeVaart wrote:
Hi Steven,
Thanks for the report. Very little has changed between 1.8.5 and 1.8.6 within
the CUDA-aware-specific code, so I am perplexed. It is also interesting that you
do not see the issue with 1.8.5 and CUDA 7.0.
You mentioned that it is hard to share the code on this, but maybe you could
share how
Hi Nick,
No. You have to use mpirun in this case. You need to ask for a larger batch
allocation than the initial mpirun requires. You do need to ask for a batch
allocation, though. Also note that mpirun doesn't currently work with nativized
Slurm. It's on my to-do list to fix.
Howard
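For illustration, a minimal sketch of the kind of manager program this advice applies to (not code from this thread; the worker executable name "./worker" is a placeholder). The idea is to request more slots from the batch system than the initial mpirun uses, e.g. start mpirun with a single process inside an allocation large enough for the spawned workers as well:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Spawn 3 workers into the unused slots of the batch allocation.
       "./worker" is a placeholder for whatever binary gets spawned. */
    MPI_Comm intercomm;
    int errcodes[3];
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 3, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &intercomm, errcodes);

    int nworkers;
    MPI_Comm_remote_size(intercomm, &nworkers);
    printf("manager: spawned %d workers\n", nworkers);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}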
Saliya,
On Tue, Jun 30, 2015 at 10:50 AM, Saliya Ekanayake wrote:
> Hi,
>
> I am experiencing a bottleneck with the allgatherv routine in one of our
> programs and wonder how it works internally. Could you please share some
> details on this?
>
Open MPI has a tunable approach to all the collecti
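To give a feel for what those tunable algorithms look like, here is a rough sketch of one classic choice, a ring-style allgatherv built from point-to-point calls. This is not Open MPI's actual implementation (the tuned component selects among several algorithms based on message and communicator size); it assumes a contiguous datatype and is only meant as intuition:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Ring allgatherv sketch: after p-1 steps every rank holds every block. */
static void ring_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype dtype,
                            void *recvbuf, const int *recvcounts, const int *displs,
                            MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    MPI_Aint lb, extent;
    MPI_Type_get_extent(dtype, &lb, &extent);
    char *rbuf = (char *)recvbuf;

    /* Place our own contribution (assumes a contiguous datatype). */
    memcpy(rbuf + (MPI_Aint)displs[rank] * extent, sendbuf,
           (size_t)sendcount * extent);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    /* In step s we forward the block that originated s ranks behind us
       and receive the next older block from the left neighbor. */
    for (int s = 0; s < size - 1; s++) {
        int send_block = (rank - s + size) % size;
        int recv_block = (rank - s - 1 + size) % size;
        MPI_Sendrecv(rbuf + (MPI_Aint)displs[send_block] * extent,
                     recvcounts[send_block], dtype, right, 0,
                     rbuf + (MPI_Aint)displs[recv_block] * extent,
                     recvcounts[recv_block], dtype, left, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Small demo: rank i contributes i+1 ints (assumes <= 64 ranks). */
    int counts[64], displs[64], total = 0;
    for (int i = 0; i < size; i++) { counts[i] = i + 1; displs[i] = total; total += counts[i]; }

    int sendbuf[64], recvbuf[64 * 65 / 2];
    for (int i = 0; i < counts[rank]; i++) sendbuf[i] = rank;

    ring_allgatherv(sendbuf, counts[rank], MPI_INT, recvbuf, counts, displs, MPI_COMM_WORLD);
    if (rank == 0) printf("gathered %d ints\n", total);

    MPI_Finalize();
    return 0;
}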
Hi All,
Looks like we have found a large memory leak. It is very difficult to share code
on this, but here are some details:
1.8.5 w/ CUDA 7.0 — no memory leak
1.8.5 w/ CUDA 6.5 — no memory leak
1.8.6 w/ CUDA 7.0 — large memory leak
MVAPICH2 2.1 GDR — no issue on eith
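Not from the original report (the actual reproducer was not shared), but a rough sketch of the kind of loop one could use to watch for this: two ranks repeatedly exchange GPU buffers through the CUDA-aware path while printing the host's max RSS, so growth shows up across iterations:

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Device buffers handed straight to MPI (requires a CUDA-aware build).
       Assumes exactly 2 ranks. */
    const size_t nbytes = 1 << 20;
    void *dsend = NULL, *drecv = NULL;
    cudaMalloc(&dsend, nbytes);
    cudaMalloc(&drecv, nbytes);

    int peer = rank ^ 1;
    for (int i = 0; i < 100000; i++) {
        MPI_Sendrecv(dsend, (int)nbytes, MPI_BYTE, peer, 0,
                     drecv, (int)nbytes, MPI_BYTE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (rank == 0 && i % 10000 == 0) {
            struct rusage ru;
            getrusage(RUSAGE_SELF, &ru);
            printf("iter %d: max RSS %ld kB\n", i, ru.ru_maxrss);
        }
    }

    cudaFree(dsend);
    cudaFree(drecv);
    MPI_Finalize();
    return 0;
}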
Howard,
I have one more question. Is it possible to use MPI_Comm_spawn when
launching an Open MPI job with aprun? I'm getting this error when I try:
nradclif@kay:/lus/scratch/nradclif> aprun -n 1 -N 1 ./manager
[nid00036:21772] [[14952,0],0] ORTE_ERROR_LOG: Not available in file dpm_orte.c
at lin
Hi,
I am experiencing a bottleneck with the allgatherv routine in one of our
programs and wonder how it works internally. Could you please share some
details on this?
I found this paper [1] from Gropp discussing an efficient implementation.
Is this similar to what we get in Open MPI?
[1]
http://
Hi Thomas,
As far as I know, MPI does _not_ guarantee asynchronous progress
(unlike OpenSHMEM), because it would require some implementations to
start a progress thread.
Jeff has a nice blog post regarding this:
http://blogs.cisco.com/performance/mpi-progress
I was surprised to see this behavior i
On 06/29/15 17:25, Nathan Hjelm wrote:
This is not a configuration issue. On 1.8.x and master we use two-sided
communication to emulate one-sided. Since we do not currently have
async progress, this requires the target to call into MPI to progress RMA
communication.
This will change in 2.x. I w
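As a concrete illustration of that last point (a sketch, not code from the thread): with the emulated one-sided path, an origin's passive-target epoch may not complete until the target makes some MPI call, so a target that is busy in a long compute phase can stall the origin unless it periodically pokes the library, e.g. with MPI_Iprobe:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

/* Run with 2 ranks. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf = 0;
    MPI_Win win;
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        /* Origin: put a value into rank 1's window. With one-sided
           emulated over two-sided, the unlock may not return until
           rank 1 enters the MPI library. */
        int one = 1;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
        MPI_Put(&one, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_unlock(1, win);
        printf("origin: put completed\n");
    } else if (rank == 1) {
        /* Target: a long "compute" phase with no MPI calls would delay
           rank 0. Calling MPI_Iprobe now and then drives the progress
           engine so the emulated put can complete sooner. */
        for (int i = 0; i < 100; i++) {
            usleep(10000);               /* stand-in for real work */
            int flag;
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                       &flag, MPI_STATUS_IGNORE);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}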