Marcus, maybe you can try playing with --mem instead? We recommend that our 
users use --mem instead of --mem-per-cpu/task, as it makes it easier to 
request the right amount of memory for the job. --mem is the amount of memory 
for the whole job, so there is no multiplying of memory by the number of CPUs 
involved.
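
For example, something like this (just a sketch with made-up numbers):

    #SBATCH --nodes=1
    #SBATCH --ntasks=2
    #SBATCH --mem=10000     # 10000 MB for the whole job, no per-CPU math

rather than combining the task count with --mem-per-cpu.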

Strange that the cgroup has more memory than possible though.
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 

On 8/20/19, 11:27 PM, "slurm-users on behalf of Marcus Wagner" 
<slurm-users-boun...@lists.schedmd.com on behalf of wag...@itc.rwth-aachen.de> 
wrote:

    One thing I forgot.
    
    On 8/20/19 4:58 PM, Christopher Benjamin Coffey wrote:
    > Hi Marcus,
    >
    > What is the reason to add "--mem-per-cpu" when the job already has
    > exclusive access to the node?
    The user (normally) does not set --exclusive directly. We have several 
    accounts whose jobs should run exclusively by default, so we set that 
    in the job_submit plugin.
    > Your job has access to all of the memory and all of the cores on the
    > system already. Also note: for non-MPI code, like a single-core job or
    > a shared-memory threaded job, you want to ask for the number of CPUs
    > with --cpus-per-task, or -c, unless you are running MPI code, in which
    > case you will want to use -n or --ntasks instead, to launch n copies of
    > the code on n cores. In this case, because you asked for -n2 and also
    > specified a mem-per-cpu request, the scheduler is doling out the memory
    > as requested (2 tasks x mem-per-cpu), likely due to having
    > SelectTypeParameters=CR_Core_Memory in slurm.conf.
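
    In other words, if I understand you correctly, the recommended pattern 
    is roughly (just a sketch):

        # non-MPI, shared-memory / threaded job: one task, several CPUs
        #SBATCH --ntasks=1
        #SBATCH --cpus-per-task=8

        # MPI job: n tasks (n copies of the program), one CPU each
        #SBATCH --ntasks=8
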
    I must say, we would be much happier with a --mem-per-task option 
    instead. I still do not understand why one should, logically, ask for 
    memory per CPU, since in a shared-memory job you start one process and 
    the threads share the memory.
    With a hybrid MPI code (MPI code with OpenMP parallelization within the 
    tasks), it makes even less sense. If I know how much memory my tasks 
    need, e.g. 10 GB, I still have to divide that by the number of threads 
    (-c) to get the right memory request. For me as an administrator, an 
    OpenMP job is just a special hybrid job with only one requested task, 
    so it is the same for a shared-memory job: I always have to divide the 
    actually needed memory by the number of threads (or cpus-per-task).
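
    To make that concrete (just a sketch with made-up numbers): for a hybrid 
    job where every task needs 10 GB and runs 4 threads, the request has to be

        #SBATCH --ntasks=2
        #SBATCH --cpus-per-task=4
        #SBATCH --mem-per-cpu=2500   # 10000 MB per task / 4 cpus-per-task

    instead of simply saying "10 GB per task".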
    
    Is there anyone who can enlighten me?
    Why does one have to ask for memory per smallest schedulable unit? 
    Isn't it better to ask for memory per task/process?
    
    
    Best
    Marcus
    >
    > Best,
    > Chris
    >
    > —
    > Christopher Coffey
    > High-Performance Computing
    > Northern Arizona University
    > 928-523-1167
    >   
    >
    > On 8/20/19, 1:37 AM, "slurm-users on behalf of Marcus Wagner" 
<slurm-users-boun...@lists.schedmd.com on behalf of wag...@itc.rwth-aachen.de> 
wrote:
    >
    >      Just made another test.
    >      
    >      
    >      Thank god, the exclusivity is not "destroyed" completely; only
    >      one job can run on the node when the job is exclusive.
    >      Nonetheless, this is somewhat unintuitive.
    >      I wonder if that also has an influence on the cgroups and the
    >      process affinity/binding.
    >      
    >      I will do some more tests.
    >      
    >      
    >      Best
    >      Marcus
    >      
    >      On 8/20/19 9:47 AM, Marcus Wagner wrote:
    >      > Hi Folks,
    >      >
    >      >
    >      > I think I've stumbled over a BUG in Slurm regarding
    >      > exclusiveness. It might also be that I've misinterpreted
    >      > something; in that case, I would be happy if someone could
    >      > explain it to me.
    >      >
    >      > Some background: I have set PriorityFlags=MAX_TRES, and the
    >      > TRESBillingWeights are "CPU=1.0,Mem=0.1875G" for a partition
    >      > with 48-core nodes and RealMemory 187200.
    >      >
    >      > ---
    >      >
    >      > I have two jobs:
    >      >
    >      > job 1:
    >      > #SBATCH --exclusive
    >      > #SBATCH --ntasks=2
    >      > #SBATCH --nodes=1
    >      >
    >      > scontrol show <jobid> =>
    >      >    NumNodes=1 NumCPUs=48 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    >      >    TRES=cpu=48,mem=187200M,node=1,billing=48
    >      >
    >      > Exactly what I expected: I got 48 CPUs, and therefore the
    >      > billing is 48.
    >      >
    >      > ---
    >      >
    >      > job 2 (just added mem-per-cpu):
    >      > #SBATCH --exclusive
    >      > #SBATCH --ntasks=2
    >      > #SBATCH --nodes=1
    >      > #SBATCH --mem-per-cpu=5000
    >      >
    >      > scontrol show <jobid> =>
    >      >    NumNodes=1-1 NumCPUs=2 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    >      >    TRES=cpu=2,mem=10000M,node=1,billing=2
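
As a rough sanity check (assuming the G suffix in the Mem weight means "per 
GB"), both billing values are what MAX_TRES, i.e. the maximum of the weighted 
TRES values, should give:

    job 1: max(48 cpu x 1.0, ~183 G x 0.1875 ≈ 34.3) = 48
    job 2: max( 2 cpu x 1.0,  ~10 G x 0.1875 ≈  1.8) =  2

so the billing just follows from cpu=2/mem=10000M; the surprising part is 
that the exclusive job's TRES are not scaled up to the whole node.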
    >      >
    >      > Why does '--mem-per-cpu' "destroy" exclusivity?
    >      >
    >      >
    >      >
    >      > Best
    >      > Marcus
    >      >
    >      
    >      --
    >      Marcus Wagner, Dipl.-Inf.
    >      
    >      IT Center
    >      Abteilung: Systeme und Betrieb
    >      RWTH Aachen University
    >      Seffenter Weg 23
    >      52074 Aachen
    >      Tel: +49 241 80-24383
    >      Fax: +49 241 80-624383
    >      wag...@itc.rwth-aachen.de
    >      
    >      www.itc.rwth-aachen.de
    >      
    >      
    >      
    >
    
    -- 
    Marcus Wagner, Dipl.-Inf.
    
    IT Center
    Abteilung: Systeme und Betrieb
    RWTH Aachen University
    Seffenter Weg 23
    52074 Aachen
    Tel: +49 241 80-24383
    Fax: +49 241 80-624383
    wag...@itc.rwth-aachen.de
    
    www.itc.rwth-aachen.de
    
    
    
