Jeff,
I've tried moving the backing file and it makes no difference. I can say that
PGI 14.7 + Open MPI 1.8.1 does not show this issue. I can run that combination on 96
cores just fine. Heck, I've run it on a few hundred.
As for the 96 cores, they are either on 8 Westmere nodes (8 nodes with two 6-core
sockets each) or 6 Sandy Bridge nodes.
There is indeed also a problem with MPI + CUDA.
This problem, however, is deeper, since it happens with MVAPICH2 1.9,
Open MPI 1.6.5/1.8.1/1.8.2rc4, and CUDA 5.5.22/6.0.37. From my tests,
everything works fine with MPI + CUDA on a single node, but as soon as I
go to MPI + CUDA across nodes, I get segfaults.
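To be concrete, the comparison I'm making is roughly the following (host names and the test binary are just placeholders for my actual setup):

  # fine: both ranks on the same node
  mpirun -np 2 --host node1,node1 ./mpi_cuda_test

  # segfaults: one rank per node
  mpirun -np 2 --host node1,node2 ./mpi_cuda_test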
Have you tried moving your shared memory backing file directory, as the
warning message suggests?
I haven't seen a shared memory backing file on a network share cause correctness
issues before (just performance issues), but I could see it being within the
realm of possibility...
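In case it helps, with Open MPI you can usually relocate the session directory (and with it the shared memory backing files) either by pointing TMPDIR at a node-local filesystem before launching, or via the orte_tmpdir_base MCA parameter. Something like the following (paths and app name are placeholders; double-check the parameter name with ompi_info on your build):

  export TMPDIR=/tmp
  mpirun -np 96 ./your_app

  # or, equivalently, via an MCA parameter
  mpirun --mca orte_tmpdir_base /tmp -np 96 ./your_app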
Also, are you r
Just out of curiosity, I saw that one of the segv stack traces involved the
CUDA stack.
Can you try a build without CUDA and see if that resolves the problem?
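Something along these lines should give you a CUDA-free build (the prefix and job width are placeholders; CUDA support is only compiled in when --with-cuda is given, so omitting it, or passing --without-cuda explicitly, should be enough):

  ./configure --prefix=$HOME/openmpi-1.8.1-nocuda --without-cuda
  make -j8 && make install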
On Aug 15, 2014, at 6:47 PM, Maxime Boissonneault wrote:
> Hi Jeff,
>
> On 2014-08-15 17:50, Jeff Squyres (jsquyres) wrote:
>> O