Yes, there is a second HPC Sun Grid Engine cluster on which I've run
this openMPI test code dozens of times on upwards of 400 slots
through SGE using qsub and qrsh, but that was with a much
older version of openMPI (1.3.3, I believe). On that particular cluster the
open files hard and soft limits were raised to 4096.
Not for this test case size. You should be just fine with the default values.
If I understand you correctly, you've run this app at scale before on another
cluster without problems?
On Jul 19, 2014, at 1:34 PM, Lane, William wrote:
> Ralph,
>
> It's hard to imagine it's the openMPI code because I've tested this code
> extensively on another cluster with 400 nodes and never had any problems.
Ralph,
It's hard to imagine it's the openMPI code because I've tested this code
extensively on another cluster with 400 nodes and never had any problems.
But I'll try using the hello_c example in any case. Is it still recommended to
raise the open files soft and hard limits to 4096? Or should even higher
values be used?
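As an aside, the per-process limits in question can also be inspected (and, up to the hard limit, raised) from inside the process itself. A minimal sketch using POSIX getrlimit/setrlimit; the 4096 target is just the figure discussed in this thread, not a recommendation:

```c
/* Sketch: report the open-files limits, then try to raise the soft
 * limit toward 4096. An unprivileged process can raise its soft
 * limit only up to the existing hard limit (rlim_max). */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    rlim_t target = 4096;              /* example value from the thread */
    if (rl.rlim_cur < target && target <= rl.rlim_max) {
        rl.rlim_cur = target;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            perror("setrlimit");
    }
    return 0;
}
```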
That's a pretty old OMPI version, and we don't really support it any longer.
However, I can provide some advice:
* have you tried running the simple "hello_c" example we provide? This would at
least tell you whether the problem is in your app, which is what I'd expect given
your description (a minimal sketch of such a program appears after this list)
* try updating to a more recent OMPI release
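For reference, a minimal MPI hello-world in C along these lines (the actual hello_c that ships in Open MPI's examples/ directory differs in its details) would be:

```c
/* Minimal MPI "hello": each rank reports its rank, the job size,
 * and the host it landed on. If this also segfaults past ~28 ranks,
 * the problem is below the application layer. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("Hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```

Compiled with `mpicc hello.c -o hello` and launched with `mpirun -np 32 ./hello` (or through the same qsub/qrsh path), this isolates the launcher and runtime from the application code.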
I'm getting consistent errors of the form:
"mpirun noticed that process rank 3 with PID 802 on node csclprd3-0-8 exited on
signal 11 (Segmentation fault)."
whenever I request more than 28 slots. These errors occur even when I run
mpirun locally on a compute node that has 32 slots (8 cores, 16 with
hyperthreading).
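One way to get more out of a bare "exited on signal 11" report is to have each rank install its own SIGSEGV handler that prints the rank and a small backtrace before dying. A rough sketch of my own (it relies on glibc's backtrace facilities and is not anything from this thread); compile with -g and link with -rdynamic so symbol names appear:

```c
/* Debugging sketch: a per-rank SIGSEGV handler. The snprintf/write
 * combination is not strictly async-signal-safe, but it is a common
 * best-effort pattern in crash handlers. */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <execinfo.h>
#include <mpi.h>

static int my_rank = -1;

static void segv_handler(int sig)
{
    void *frames[32];
    int n = backtrace(frames, 32);
    char msg[64];
    int len = snprintf(msg, sizeof msg,
                       "rank %d caught signal %d\n", my_rank, sig);

    write(STDERR_FILENO, msg, len);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    _exit(1);                  /* die immediately, skip atexit handlers */
}

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Registered after MPI_Init so my_rank is set; signal() replaces
     * any handler the runtime may have installed. */
    signal(SIGSEGV, segv_handler);

    /* ... application work that segfaults past ~28 ranks ... */

    MPI_Finalize();
    return 0;
}
```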