Note that you were successful in changing the value on the right side of
that error message, so you may just need to keep increasing it until it is
large enough for the calculation, while, of course, checking that the
total memory available on a node is sufficient. Sometimes I have done a
representative test run with sbatch --exclusive --mem=0 job.sh and
followed that job's memory usage closely: logging in and using ps or
top, and/or using sacct to find the peak RSS (MaxRSS) of the completed
job, then rounding up and using that value next time. I believe the
--exclusive option typically allocates an entire node to just that one
job, and --mem=0 effectively disables Slurm's memory limits. That depends
on the Slurm setup though...

On Wed, Sep 6, 2017, 03:38 Sema Atasever <[email protected]> wrote:

> Dear Batsirai,
>
> I tried the line of code that you recommended, but unfortunately it still
> generates an error.
>
> On Thu, Aug 24, 2017 at 5:19 PM, Batsirai Mabvakure <[email protected]>
> wrote:
>
>>
>> Try:
>>
>> sbatch -J jobname --mem=18000 -D $(pwd) submit_job.sh
>>
>> From: Sema Atasever <[email protected]>
>> Reply-To: slurm-dev <[email protected]>
>> Date: Thursday 24 August 2017 at 15:58
>> To: slurm-dev <[email protected]>
>> Subject: [slurm-dev] Re: Exceeded job memory limit problem
>>
>> Dear Lev,
>>
>> I have already tried the --mem parameter with different values, for
>> example:
>>
>> sbatch --mem=5GB submit_job.
>> sbatch --mem=18000 submit_job.
>>
>> but every time it gave the same error, unfortunately.
>>
>> On Thu, Aug 24, 2017 at 2:32 AM, Lev Lafayette
>> <[email protected]> wrote:
>>
>> On Wed, 2017-08-23 at 01:26 -0600, Sema Atasever wrote:
>>
>> > Computing predictions by SVM...
>> > slurmstepd: Job 3469 exceeded memory limit (4235584 > 2048000), being
>> > killed
>> > slurmstepd: Exceeded job memory limit
>> >
>> > How can I fix this problem?
>>
>> Error messages often give useful information. In this case, you haven't
>> requested enough memory in your Slurm script.
>>
>> Memory can be set with the `#SBATCH --mem=[mem][M|G|T]` directive (for
>> the entire job) or `#SBATCH --mem-per-cpu=[mem][M|G|T]` (per core).
>>
>> As a rule of thumb, the maximum request per node should be based on the
>> total cores minus one (leaving a core for system processes).
>>
>> All the best,
>>
>> --
>> Lev Lafayette, BA (Hons), GradCertTerAdEd (Murdoch), GradCertPM, MBA
>> (Tech Mngmnt) (Chifley)
>> HPC Support and Training Officer +61383444193 +61432255208
>> Department of Infrastructure Services, University of Melbourne
>>
>
--
Chris Harwell
