I saved two cores, which might be of interest. However, they
are so large that I cannot attach them to an email. I am
happy to submit them if requested.
Lydia
--
Dr E L Heck
University of Durham
Institute for Computational Cosmology
Ogden Ce
I believe this is "too many open files". Try raising the limit with:
ulimit -n some_number
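
For anyone who wants to inspect or raise the same limit from inside a program rather than from the shell, here is a minimal sketch in Python. It is not from the original thread: the function name raise_open_file_limit and the value 1024 are illustrative assumptions. It uses the standard resource module (Unix only) and never goes above the hard limit, which an unprivileged process cannot raise.

```python
import resource


def raise_open_file_limit(wanted):
    """Raise the soft open-file limit toward `wanted` (hypothetical helper).

    The soft limit is never lowered and never pushed past the hard limit.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Nothing to do if the soft limit is already unlimited.
    if soft == unlimited:
        return soft, hard

    # Cap the request at the hard limit when one is set.
    new_soft = wanted if hard == unlimited else min(wanted, hard)

    # Only ever raise the limit, never lower it.
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_open_file_limit(1024))  # 1024 is an arbitrary example
```

Resource limits are inherited by child processes, so a launcher that raises its own limit this way passes the higher value on to whatever it starts.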
Regards,
Mostyn
On Wed, 22 Nov 2006, Lydia Heck wrote:
I have - again - successfully built and installed
mx and openmpi, and I can run 64- and 128-CPU jobs on a 256-CPU cluster.
Version of openmpi: 1.2b1
Compiler used: studio11
The same run on 32 CPUs almost completes: it starts to write 32 restart
files and then fails with the same problem:
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:33
/opt/ompi/lib/libopal.so.0.0.0:opal_backtrace_print+0x10
/opt/ompi/lib/libopal.so.0.0.0:0x99df5
/lib/amd64/li
One of our users/friends has also sent us some example code to do this
internally - I hope to find the time to include that capability in the code
base shortly. I'll advise when we do.
On 11/22/06 2:16 PM, "Rolf Vandevaart" wrote:
>
> Hi Lydia:
>
> errno 24 means "Too many open files". When
Hi Lydia:
errno 24 means "Too many open files". When we have seen this, I believe
we increased the number of file descriptors available to the mpirun process
to get past this.
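
To confirm what an errno value such as 24 means on a particular machine, a short Python sketch like the one below can look it up. It is an illustration only, not something from the original exchange, and describe_errno is a hypothetical helper name. Errno numbering is platform-specific, although 24 corresponds to EMFILE on Linux and Solaris.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value (hypothetical helper)."""
    # errno.errorcode maps numbers to names such as "EMFILE";
    # fall back to "UNKNOWN" for values this platform does not define.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the platform's human-readable message for the code.
    return name, os.strerror(code)


if __name__ == "__main__":
    print(describe_errno(24))
```
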
In my case, my shell (tcsh) defaults to 256. I increase it with a call
to "limit descriptors"
as shown below. I th