Re: [OMPI users] Intermittent, somewhat architecture-dependent hang with Open MPI 1.8.1

2014-08-16 Thread Matt Thompson
Jeff, I've tried moving the backing file and it doesn't matter. I can say that PGI 14.7 + Open MPI 1.8.1 does not show this issue. I can run that on 96 cores just fine. Heck, I've run it on a few hundred. As for the 96, they are either on 8 Westmere nodes (each with two 6-core sockets) or 6 Sand...

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-16 Thread Maxime Boissonneault
There is indeed also a problem with MPI + CUDA. This problem, however, is deeper, since it happens with MVAPICH2 1.9, OpenMPI 1.6.5/1.8.1/1.8.2rc4, and CUDA 5.5.22/6.0.37. From my tests, everything works fine with MPI + CUDA on a single node, but as soon as I go to MPI + CUDA across nodes, I get s...
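
The single-node-works / multi-node-fails pattern described above can be isolated with a minimal CUDA-aware MPI test. The sketch below is not the poster's actual code; it assumes an MPI library built with CUDA support (so a device pointer can be handed straight to MPI_Send/MPI_Recv), and the program name cuda_mpi_ping and the build line are placeholders that may need local adjustment. Running it with one rank per node exercises exactly the inter-node GPU path that appears to be failing.

    /*
     * Minimal CUDA-aware MPI ping test (a sketch for isolating the issue;
     * not the poster's code). Assumes the MPI library was built with CUDA
     * support, so device pointers may be passed directly to MPI calls.
     * Build (approximate): mpicc cuda_mpi_ping.c -o cuda_mpi_ping -lcudart
     * Run:                 mpirun -np 2 -npernode 1 ./cuda_mpi_ping
     */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0) fprintf(stderr, "need at least 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        const int n = 1024;
        double *dbuf = NULL;  /* buffer in GPU memory */
        if (cudaMalloc((void **)&dbuf, n * sizeof(double)) != cudaSuccess) {
            fprintf(stderr, "rank %d: cudaMalloc failed\n", rank);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        cudaMemset(dbuf, 0, n * sizeof(double));

        if (rank == 0) {
            /* Device pointer handed directly to MPI: this is the path that
             * reportedly works intra-node but fails across nodes. */
            MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            printf("rank 0: sent device buffer\n");
        } else if (rank == 1) {
            MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1: received device buffer\n");
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }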

Re: [OMPI users] Intermittent, somewhat architecture-dependent hang with Open MPI 1.8.1

2014-08-16 Thread Jeff Squyres (jsquyres)
Have you tried moving your shared memory backing file directory, as the warning message suggests? I haven't seen a shared memory file on a network share cause correctness issues before (just performance issues), but I could see how that could be in the realm of possibility... Also, are you r...
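
The warning in question refers to the session directory that holds the shared-memory backing file. On a typical Open MPI 1.8 installation it can be moved onto node-local storage by overriding the temporary-directory base, for example (the exact MCA parameter name should be confirmed against the local build, and ./your_app is a placeholder):

    mpirun --mca orte_tmpdir_base /tmp -np 96 ./your_app

Exporting TMPDIR to point at a local filesystem before launching should have a similar effect.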

Re: [OMPI users] Segmentation fault in OpenMPI 1.8.1

2014-08-16 Thread Jeff Squyres (jsquyres)
Just out of curiosity, I saw that one of the segv stack traces involved the CUDA stack. Can you try a build without CUDA and see if that resolves the problem? On Aug 15, 2014, at 6:47 PM, Maxime Boissonneault wrote: > Hi Jeff, > > On 2014-08-15 17:50, Jeff Squyres (jsquyres) wrote: >> O...
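
For reference, a CUDA-free build amounts to omitting (or negating) the CUDA option at configure time; Open MPI only compiles its CUDA support when --with-cuda is given. A rough sketch, with the install prefix and job width as placeholders:

    ./configure --prefix=$HOME/openmpi-1.8.1-nocuda --without-cuda
    make -j8 && make install

Rebuilding the application against this installation and re-running the failing case would show whether the segfault is tied to the CUDA code path.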