Re: [OMPI users] memory per core/process

Reuti Tue, 2 Apr 2013 09:08:38 -0400

Hi,

On 30.03.2013 at 15:35, Gustavo Correa wrote:


> On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:
> 
>> On 3/30/13 8:20 PM, Reuti wrote:
>>> On 30.03.2013 at 13:26, Tim Prince wrote:
>>> 
>>>> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
>>>>> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>>>>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>>>>>> I do not know about your code but:
>>>>>>> 
>>>>>>> 1) Did you check the stack limits? Intel Fortran codes typically 
>>>>>>> need a large amount of stack as the problem size increases.
>>>>>>> Check ulimit -a
>>>>>> This is the first time I have heard of stack limits. Anyway, ulimit -a gives
>>>>>> 
>>>>>> $ ulimit -a
>>>>>> core file size          (blocks, -c) 0
>>>>>> data seg size           (kbytes, -d) unlimited
>>>>>> scheduling priority             (-e) 0
>>>>>> file size               (blocks, -f) unlimited
>>>>>> pending signals                 (-i) 127368
>>>>>> max locked memory       (kbytes, -l) unlimited
>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>> open files                      (-n) 1024
>>>>>> pipe size            (512 bytes, -p) 8
>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>> real-time priority              (-r) 0
>>>>>> stack size              (kbytes, -s) 10240
>>>>>> cpu time               (seconds, -t) unlimited
>>>>>> max user processes              (-u) 1024
>>>>>> virtual memory          (kbytes, -v) unlimited
>>>>>> file locks                      (-x) unlimited
>>>>>> 
>>>>>> So the stack size is 10 MB? Could this cause the problem? How do I 
>>>>>> change this?
>>>>> I ran $ ulimit -s unlimited to make the stack size unlimited, and the 
>>>>> job ran fine! So it looks like the stack limit is the problem. My 
>>>>> questions are:
>>>>> 
>>>>> * how do I set this automatically (and permanently)?
>>>>> * should I set all other ulimits to be unlimited?
>>>>> 
>>>> In our environment, the only solution we found is to have mpirun run a 
>>>> script on each node which sets ulimit (as well as environment variables 
>>>> which are more convenient to set there than in the mpirun), before 
>>>> starting the executable.  We had expert recommendations against this but 
>>>> no other working solution.  It seems unlikely that you would want to 
>>>> remove any limits which work at default.
>>>> A stack size of "unlimited" is not really unlimited; it may still be 
>>>> capped by a system limit or by the implementation.  As we run up to 120 
>>>> threads per rank and many applications have threadprivate data regions, 
>>>> being able to run without considering the stack limit is the exception 
>>>> rather than the rule.
>>> Even if I were the only user on a cluster of machines, I would define 
>>> this in whatever queuing system is used, to set the limits for the job.
>> 
>> Sorry if I don't understand this correctly, but do you mean I should set 
>> this using Torque/Maui (our queuing manager) instead of the system itself 
>> (/etc/security/limits.conf and /etc/profile.d/)?

Yes, or per queue/job.
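
A per-job variant could look like the wrapper Tim described: mpirun starts a small script on each node, and the script adjusts the limit before starting the real executable. This is only a minimal sketch, assuming bash; the names run_with_stack_limit, STACK_KB, with_stack.sh and my_app are made up for illustration and are not Open MPI or Torque settings.

```shell
#!/bin/bash
# Hypothetical wrapper script (illustrative name: with_stack.sh), started e.g. as
#   mpirun -np 16 ./with_stack.sh ./my_app

# Run a command with the soft stack limit set to $1 (in kbytes, or "unlimited").
run_with_stack_limit() {
    local limit="$1"
    shift
    (
        # Try the requested soft limit; if that is refused, go as high as
        # the hard limit allows. Only this subshell and its child are affected.
        ulimit -S -s "$limit" 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"
        exec "$@"
    )
}

# When called with a command line, run it under the adjusted limit.
# STACK_KB is an assumed, optional override; the default is "unlimited".
if [ "$#" -gt 0 ]; then
    run_with_stack_limit "${STACK_KB:-unlimited}" "$@"
fi
```

Setting the limit inside the wrapper, right before the exec, means it does not matter which limits the launching daemon on the node happened to start with, as long as the hard limit is high enough.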


> Hi Duke
> 
> We do both.
> Set memlock and stacksize to unlimited, and increase the maximum number of
> open files in the pbs_mom script in /etc/init.d, and do the same in 
> /etc/security/limits.conf.
> This may be an overzealous "belt and suspenders" policy, but it works.
> As everybody else said, a small stacksize is a common cause of segmentation 
> faults in large codes.

This way it would be fixed for the whole cluster and not per job, wouldn't it? 
I have seen situations where, with limited virtual memory for a job, the 
stack size had to be set to a low value, in the range of only a few tens of 
megabytes.

Whether such a request is possible depends on the queuing system, though. In 
GridEngine it's possible; I'm not sure about Torque/PBS.
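
Whichever layer sets the limits, one way to see what a job really ends up with is to print them from inside the job itself. A minimal sketch, assuming Linux (it reads /proc/<pid>/limits); show_job_limits is a made-up helper name.

```shell
#!/bin/bash
# Print the stack, address space (virtual memory), locked memory and open files
# limits of a process as the kernel reports them in /proc/<pid>/limits (Linux only).
# Note that the stack value there is in bytes, while "ulimit -s" reports kbytes.
# Defaults to the calling shell when no PID is given.
show_job_limits() {
    local pid="${1:-$$}"
    grep -E '^Max (stack size|address space|locked memory|open files) ' \
        "/proc/${pid}/limits"
}

# Inside a job script this shows what the job's shell actually got:
show_job_limits
```

Running the same grep through mpirun on the compute nodes, and not only in the job script's own shell, should show whether the ranks see the values from limits.conf, from the pbs_mom init script, or from the job.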

-- Reuti


> Basically all codes that we run here have this problem, with too many
> automatic arrays, structures, etc. in functions and subroutines. 
> But a small memlock is also trouble for OFED/InfiniBand, and the small 
> (default) max number of open file handles may easily be hit if many 
> programs (or poorly written programs) are running on the same node.
> The default Linux distribution limits don't seem to be tailored for HPC, I 
> guess.
> 
> I hope this helps,
> Gus Correa 
> 
> 
