Hmmm...tell you what. I'll add the ability for OMPI to set the limit to a user-specified level upon launch of each process. This will give you some protection and flexibility.
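Until then, a rough sketch of one possible workaround: launch the application through a small wrapper that raises the limit before exec'ing the real binary. The script name and the 100 MB value below are only examples, not something that has been tested here:

  #!/bin/bash
  # run_with_stack.sh (example name): raise the per-process stack limit,
  # then replace this shell with the real application.
  ulimit -s 102400      # kbytes, i.e. ~100 MB soft stack limit; pick your own value
  exec "$@"

and then something like:

  $ mpirun -np 4 ./run_with_stack.sh /opt/apps/abinit/bin/abinit < input.files >& output.log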
I forget, so please forgive the old man's fading memory - what version of OMPI are you using? I'll backport a patch for you.

On Apr 2, 2013, at 8:40 AM, Duke Nguyen <duke.li...@gmx.com> wrote:

> On 3/30/13 8:46 PM, Patrick Bégou wrote:
>> Ok, so your problem is identified as a stack size problem. I ran into these limitations using Intel Fortran compilers on large data problems.
>>
>> First, it seems you can increase your stack size, since "ulimit -s unlimited" works (you didn't hit the system hard limit). The best way is to put this setting in your .bashrc file so it works on every node.
>> But setting it to unlimited may not be really safe. E.g., if you end up in a badly coded recursive function calling itself without a stop condition, you can request all the system memory and crash the node. So set a large but limited value; it's safer.
>>
>
> Now I feel the pain you mentioned :). With -s unlimited, some of our nodes now go down easily (completely) and need to be hard reset!!! (whereas we never had a node go down like that before, even with killed or badly coded jobs).
>
> Looking for a safer value of ulimit -s other than "unlimited" now... :(
>
>> I'm managing a cluster and I always set a maximum value for the stack size. I also limit the memory available to each core, for system stability. If a user requests only one of the 12 cores of a node, he can only access 1/12 of the node's memory. If he needs more memory, he has to request 2 cores, even if he runs a sequential code. This avoids crashing other users' jobs on the same node because of memory requirements. But this is not configured on your node.
>>
>> Duke Nguyen wrote:
>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>>> I do not know about your code, but:
>>>>
>>>> 1) Did you check stack limitations? Typically, Intel Fortran codes need a large amount of stack when the problem size increases.
>>>> Check ulimit -a
>>>
>>> First time I've heard of stack limitations. Anyway, ulimit -a gives
>>>
>>> $ ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 127368
>>> max locked memory       (kbytes, -l) unlimited
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 1024
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 10240
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 1024
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>>
>>> So stack size is 10MB??? Does this create a problem? How do I change it?
>>>
>>>>
>>>> 2) Does your node use cpusets and memory limitation, like fake NUMA, to set the maximum amount of memory available for a job?
>>>
>>> I don't really understand (also the first time I've heard of fake NUMA), but I am pretty sure we do not have such things. The server I tried was a dedicated server with 2 X5420s and 16GB of physical memory.
>>>
>>>>
>>>> Patrick
>>>>
>>>> Duke Nguyen wrote:
>>>>> Hi folks,
>>>>>
>>>>> I am sorry if this question has been asked before, but after ten days of searching/working on the system, I surrender :(. We try to use mpirun to run abinit (abinit.org), which in turn reads an input file to run some simulation.
>>>>> The command to run is pretty simple:
>>>>>
>>>>> $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
>>>>>
>>>>> We ran this command on a server with two quad-core X5420s and 16GB of memory. I used only 4 cores, so I guess in theory each core should be able to take up to 2GB.
>>>>>
>>>>> In the output log, there is something about memory:
>>>>>
>>>>> P This job should need less than 717.175 Mbytes of memory.
>>>>> Rough estimation (10% accuracy) of disk space for files :
>>>>> WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
>>>>>
>>>>> So basically it reported that the above job should not take more than 718MB per core.
>>>>>
>>>>> But I still get the Segmentation Fault error:
>>>>>
>>>>> mpirun noticed that process rank 0 with PID 16099 on node biobos exited on signal 11 (Segmentation fault).
>>>>>
>>>>> The system already has limits set to unlimited:
>>>>>
>>>>> $ cat /etc/security/limits.conf | grep -v '#'
>>>>> * soft memlock unlimited
>>>>> * hard memlock unlimited
>>>>>
>>>>> I also tried to run
>>>>>
>>>>> $ ulimit -l unlimited
>>>>>
>>>>> before the mpirun command above, but it did not help at all.
>>>>>
>>>>> If we adjust the parameters in input.files so that the reported memory per core is less than 512MB, then the job runs fine.
>>>>>
>>>>> Please help,
>>>>>
>>>>> Thanks,
>>>>>
>>>>> D.
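On the question above about a safer ulimit -s value than "unlimited": a minimal sketch of the kind of large-but-limited setting Patrick describes, with purely illustrative numbers (not recommendations from this thread):

  # /etc/security/limits.conf: cap the stack instead of leaving it unlimited
  # (values in kbytes; tune to the node's memory and job mix)
  *    soft    stack    102400
  *    hard    stack    204800

  # or per login shell, e.g. in ~/.bashrc on every node:
  ulimit -s 102400      # ~100 MB stack per process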