Hi,

On 30.03.2013 at 15:35, Gustavo Correa wrote:
> On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:
>
>> On 3/30/13 8:20 PM, Reuti wrote:
>>> On 30.03.2013 at 13:26, Tim Prince wrote:
>>>
>>>> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
>>>>> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>>>>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>>>>>> I do not know about your code, but:
>>>>>>>
>>>>>>> 1) Did you check stack limitations? Typically Intel Fortran codes
>>>>>>> need a large amount of stack when the problem size increases.
>>>>>>> Check ulimit -a.
>>>>>>
>>>>>> First time I have heard of stack limitations. Anyway, ulimit -a gives
>>>>>>
>>>>>> $ ulimit -a
>>>>>> core file size          (blocks, -c) 0
>>>>>> data seg size           (kbytes, -d) unlimited
>>>>>> scheduling priority             (-e) 0
>>>>>> file size               (blocks, -f) unlimited
>>>>>> pending signals                 (-i) 127368
>>>>>> max locked memory       (kbytes, -l) unlimited
>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>> open files                      (-n) 1024
>>>>>> pipe size            (512 bytes, -p) 8
>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>> real-time priority              (-r) 0
>>>>>> stack size              (kbytes, -s) 10240
>>>>>> cpu time               (seconds, -t) unlimited
>>>>>> max user processes              (-u) 1024
>>>>>> virtual memory          (kbytes, -v) unlimited
>>>>>> file locks                      (-x) unlimited
>>>>>>
>>>>>> So the stack size is 10 MB??? Does this cause the problem? How do I
>>>>>> change it?
>>>>>
>>>>> I ran $ ulimit -s unlimited to make the stack size unlimited, and the
>>>>> job ran fine!!! So it looks like the stack limit was the problem. The
>>>>> questions are:
>>>>>
>>>>> * how do I set this automatically (and permanently)?
>>>>> * should I set all the other ulimits to unlimited as well?
>>>>>
>>>> In our environment, the only solution we found is to have mpirun run a
>>>> script on each node which sets ulimit (as well as environment variables
>>>> which are more convenient to set there than on the mpirun command line)
>>>> before starting the executable. We had expert recommendations against
>>>> this but no other working solution. It seems unlikely that you would
>>>> want to remove any limits which work at their defaults.
>>>> An "unlimited" stack size is in reality not unlimited; it may still be
>>>> capped by a system limit or by the implementation. As we run up to 120
>>>> threads per rank and many applications have threadprivate data regions,
>>>> the ability to run without considering the stack limit is the exception
>>>> rather than the rule.
>>>
>>> Even if I were the only user on a cluster of machines, I would define
>>> this in the queuing system so that it sets the limits for the job.
>>
>> Sorry if I don't get this correctly, but do you mean I should set this
>> using Torque/Maui (our queuing manager) instead of on the system itself
>> (/etc/security/limits.conf and /etc/profile.d/)?

Yes, or per queue/job.

> Hi Duke,
>
> We do both.
> Set memlock and stack size to unlimited, and increase the maximum number
> of open files in the pbs_mom script in /etc/init.d, and do the same in
> /etc/security/limits.conf.
> This may be an overzealous "belt and suspenders" policy, but it works.
> As everybody else said, a small stack size is a common cause of
> segmentation faults in large codes.

This way it would be fixed for the whole cluster and not per job - or? I have seen situations where, with limited virtual memory for a job, the stack size had to be set to a low value of only a few tens of megabytes. Whether such a request is possible depends on the queuing system; in GridEngine it is, but I'm not sure about Torque/PBS.
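For illustration, such a per-job request under GridEngine could look like the following sketch (it assumes the cluster makes the h_vmem and h_stack resource limits requestable; the values are only examples):

    # hypothetical submission: cap the job's virtual memory, and give it
    # an explicit, modest stack instead of "unlimited"
    qsub -l h_vmem=2G -l h_stack=32M job_script.sh

Whether these particular resource names are available depends on how the complexes are configured on the cluster in question.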
-- Reuti

> Basically all of the codes that we run here have this problem, with too
> many automatic arrays, structures, etc. in functions and subroutines.
> But a small memlock is also trouble for OFED/InfiniBand, and the small
> (default) maximum number of open file handles may hit the limit easily if
> many programs (or poorly written programs) are running on the same node.
> The default Linux distribution limits don't seem to be tailored for HPC,
> I guess.
>
> I hope this helps,
> Gus Correa
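A rough sketch of the "belt and suspenders" setup Gus describes might look like this (file locations and numbers are illustrative, not a verified recipe; nofile does not accept "unlimited", so a large number is used instead):

    # /etc/security/limits.conf -- cluster-wide defaults for login sessions
    *   soft   memlock   unlimited
    *   hard   memlock   unlimited
    *   soft   stack     unlimited
    *   hard   stack     unlimited
    *   soft   nofile    65536
    *   hard   nofile    65536

    # /etc/init.d/pbs_mom -- raise the limits just before the daemon is
    # started, so every batch job it spawns inherits them
    ulimit -l unlimited    # max locked memory (OFED/InfiniBand needs this)
    ulimit -s unlimited    # stack size
    ulimit -n 65536        # open file handles

The init-script part matters because pbs_mom, not a login shell, is the parent of every batch job, so settings made only in limits.conf or /etc/profile.d/ may never reach the jobs themselves.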