Re: [OMPI users] openmpi - mx - solaris and Gadget2 - add on

2006-11-24 Thread Lydia Heck
I saved two cores, which might be of interest. However, they are so large that I cannot attach them to any email. I am very willing to submit them if requested. Lydia -- Dr E L Heck University of Durham Institute for Computational Cosmology Ogden Ce

Re: [OMPI users] openmpi, mx

2006-11-23 Thread Mostyn Lewis
I believe this is "too many open files". Try raising the limit: ulimit -n some_number Regards, Mostyn On Wed, 22 Nov 2006, Lydia Heck wrote: I have - again - successfully built and installed mx and openmpi, and I can run 64- and 128-CPU jobs on a 256-CPU cluster. Version of openmpi is 1.2b1; compiler used: studio11
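
For anyone who wants to inspect or raise the descriptor limit from inside a launcher script rather than in the shell, here is a minimal sketch of the same idea as "ulimit -n some_number", using Python's standard resource module. The function name raise_open_file_limit and the value 1024 are illustrative assumptions, not anything from Open MPI or from this thread. The change only affects the calling process and the children it starts afterwards.

```python
import resource


def raise_open_file_limit(wanted):
    """Raise the soft open-file limit toward `wanted` (hypothetical helper).

    The soft limit is never lowered and never pushed past the hard limit.
    Returns the soft limit in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Cap the request at the hard limit unless the hard limit is unlimited.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or wanted <= soft:
        return soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

A wrapper could call raise_open_file_limit(1024) before starting mpirun as a child process. Whether 1024 is enough for a given job size is something to check on the cluster itself.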

Re: [OMPI users] openmpi - mx - solaris and Gadget2

2006-11-23 Thread Lydia Heck
The same run on 32 CPUs almost completes, starting to write 32 re-start files and fails with the same problem: Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR) Failing at addr:33 /opt/ompi/lib/libopal.so.0.0.0:opal_backtrace_print+0x10 /opt/ompi/lib/libopal.so.0.0.0:0x99df5 /lib/amd64/li

Re: [OMPI users] openmpi, mx

2006-11-22 Thread Ralph Castain
One of our users/friends has also sent us some example code to do this internally - I hope to find the time to include that capability in the code base shortly. I'll advise when we do. On 11/22/06 2:16 PM, "Rolf Vandevaart" wrote: > > Hi Lydia: > > errno 24 means "Too many open files". When

Re: [OMPI users] openmpi, mx

2006-11-22 Thread Rolf Vandevaart
Hi Lydia: errno 24 means "Too many open files". When we have seen this, I believe we increased the number of file descriptors available to the mpirun process to get past this. In my case, my shell (tcsh) defaults to 256. I increase it with a call to "limit descriptors" as shown below. I th
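
As a small illustration of the errno value mentioned above: errno 24 corresponds to the symbolic name EMFILE, which can be checked with Python's standard errno module. The helper name is_too_many_open_files is a hypothetical one chosen for this sketch and is not part of Open MPI or MX.

```python
import errno


def is_too_many_open_files(exc):
    """Return True if `exc` is an OSError carrying EMFILE (hypothetical helper)."""
    return isinstance(exc, OSError) and exc.errno == errno.EMFILE


# Print the numeric value and symbolic name the platform uses for this error.
print(errno.EMFILE, errno.errorcode[errno.EMFILE])
```

A script that opens many files or sockets could use such a check in an except clause to report the descriptor limit as the likely cause, instead of failing with a generic I/O error.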