Awesome, thanks! I didn't know about the "scontrol -o show assoc_mgr" command.
Thanks guys!

Best,
Chris
 
-- 
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 
 

On 6/23/22, 10:22 AM, "slurm-users on behalf of Miguel Oliveira" 
<slurm-users-boun...@lists.schedmd.com on behalf of miguel.olive...@uc.pt> 
wrote:

    Hi Chris,
    We use a Python wrapper to do this, but the basic command to retrieve 
account minutes is:

    'scontrol -o show assoc_mgr | grep "^QOS='+account+'"'

    You then have to parse the output for "GrpTRESMins=". The value consists of 
two numbers: the first is the limit, or N for no limit, and the second, in 
parentheses, is the amount consumed.
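
A minimal Python sketch of that parsing step. The sample line is hypothetical and shortened, and the regular expression assumes each entry has the form name=limit(used), as described above; it is not the wrapper mentioned in this message.

```python
import re

# Hypothetical, shortened sample of one QOS line from
# `scontrol -o show assoc_mgr`; real lines carry many more fields.
SAMPLE_LINE = (
    "QOS=proj1(5) UsageRaw=7380.000000 "
    "GrpTRESMins=cpu=600(123),mem=N(0),gres/gpu=N(0) MaxJobs=N"
)


def parse_grp_tres_mins(line):
    """Return {tres_name: (limit, used)}; limit is None where the field shows N."""
    match = re.search(r"GrpTRESMins=(\S+)", line)
    if match is None:
        return {}
    result = {}
    # Each entry is assumed to look like name=limit(used), e.g. cpu=600(123).
    for name, limit, used in re.findall(r"([^=,]+)=(N|\d+)\((\d+)\)", match.group(1)):
        result[name] = (None if limit == "N" else int(limit), int(used))
    return result


usage = parse_grp_tres_mins(SAMPLE_LINE)
print(usage["cpu"])  # (600, 123)
```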

    You can also report by user with:

    'sreport -t minutes -T cpu,gres/gpu -nP cluster AccountUtilizationByUser 
start='+date_start+' end='+date_end+' account='+account+' format=login,used'

    If you are willing to accept some rounding errors!
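
As a rough sketch, the same sreport call can be built as an argument list instead of by string concatenation, and its pipe-delimited output split into rows. The function names, the sample output, and the assumption that each row holds exactly two fields (login|used) with integer minutes are illustrative only.

```python
def build_sreport_cmd(account, date_start, date_end):
    """Build the sreport command from this message as an argument list."""
    return [
        "sreport", "-t", "minutes", "-T", "cpu,gres/gpu", "-nP",
        "cluster", "AccountUtilizationByUser",
        f"start={date_start}", f"end={date_end}",
        f"account={account}", "format=login,used",
    ]


def parse_sreport(output):
    """Split 'login|used' rows; assumes two fields per row and integer minutes."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        login, used = line.split("|", 1)
        rows.append((login, int(used)))
    return rows


# Hypothetical output, not taken from a real cluster.
sample_output = "alice|120\nbob|30\n"
cmd = build_sreport_cmd("proj1", "2022-06-01", "2022-06-23")
rows = parse_sreport(sample_output)
print(rows)
```

The list form can be passed directly to subprocess.run(cmd, capture_output=True, text=True), which avoids shell quoting problems with account names.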

    With slight variations, and some oddities, this can also be used to limit 
GPU utilisation, as we do in our case, as you can deduce from the previous command.

    Best,

    Miguel Afonso Oliveira





    On 23 Jun 2022, at 17:58, Christopher Benjamin Coffey 
<chris.cof...@nau.edu> wrote:

    Hi Miguel, 

    This is intriguing, as I didn't know it was possible to deal with 
fairshare and a QOS with limited priority minutes at the same time. How can you 
verify how many minutes have been used of a QOS that has been set up with 
GrpTRESMins? Is that possible? Thanks.

    Best,
    Chris

    -- 
    Christopher Coffey
    High-Performance Computing
    Northern Arizona University
    928-523-1167



    On 6/23/22, 9:44 AM, "slurm-users on behalf of Miguel Oliveira" 
<slurm-users-boun...@lists.schedmd.com on behalf of miguel.olive...@uc.pt> 
wrote:

    Hi Gérard,
    It is not exactly true that you have no solution to limit projects. If you 
implement each project as an account, then you can create an account QOS with 
the NoDecay flag.
    This will not affect associations so priority and fair share are not 
impacted.

    The way we do it is to create a qos:

    sacctmgr -i --quiet create qos "{{ item.account }}" set 
flags=DenyOnLimit,NoDecay GrpTRESMins=cpu=600


    And then use this qos when the account (project) is created:

    sacctmgr -i --quiet add account "{{ item.account }}" Parent="{{ item.parent }}" 
QOS="{{ item.account }}" Fairshare=1 Description="{{ item.description }}"
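
For illustration, here is a small Python sketch that fills in the two templated commands for one project. The function name and the example values (proj1, root, 600 CPU minutes) are hypothetical, and the limit is written with the documented spelling GrpTRESMins.

```python
import shlex


def qos_and_account_cmds(account, parent, description, cpu_minutes):
    """Return the two sacctmgr commands (as argument lists) for one project."""
    create_qos = [
        "sacctmgr", "-i", "--quiet", "create", "qos", account, "set",
        "flags=DenyOnLimit,NoDecay", f"GrpTRESMins=cpu={cpu_minutes}",
    ]
    add_account = [
        "sacctmgr", "-i", "--quiet", "add", "account", account,
        f"Parent={parent}", f"QOS={account}", "Fairshare=1",
        f"Description={description}",
    ]
    return create_qos, add_account


# Example values are placeholders, not from a real site.
create_qos, add_account = qos_and_account_cmds("proj1", "root", "Demo project", 600)
for cmd in (create_qos, add_account):
    # shlex.join (Python 3.8+) shows the command as it would be typed in a shell.
    print(shlex.join(cmd))
```

The QOS has to exist before the account that references it, so the commands are returned in that order.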

    We even have a Slurm bank implementation to play along with this technique, 
and it has not failed us too much yet! :)

    Hope that helps,

    Miguel Afonso Oliveira



    On 23 Jun 2022, at 14:57, gerard....@cines.fr wrote:

    Hi Ole and B/H,

    Thanks for your answers.



    You're right, B/H: since I tuned the TRESBillingWeights option to count only 
CPUs, in my case the number of reserved cores equals the "TRES billing cost".

    You're right again: I forgot the PriorityDecayHalfLife parameter, which is 
also used by the fairshare multifactor priority.
    We use multifactor priority to manage the priority of jobs in the queue, 
and we set the values of PriorityDecayHalfLife and PriorityUsageResetPeriod 
according to these needs.
    So PriorityDecayHalfLife will decay GrpTRESRaw and GrpTRESMins can't be 
used as we want.

    Setting the NoDecay flag on a QOS could be an option, but I suppose it also 
impacts the fairshare multifactor priority of all jobs using this QOS.

    This means I have no solution to limit a project as we want, unless SchedMD 
changes its behavior or adds a new feature.

    Thanks a lot.

    Regards, 
    Gérard
    <http://www.cines.fr/>


    ________________________________________

    From: "Bjørn-Helge Mevik" <b.h.me...@usit.uio.no>
    To: slurm-us...@schedmd.com
    Sent: Thursday, 23 June 2022 12:39:27
    Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage




    Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:

    Hi Bjørn-Helge,



    Hello, Ole! :)

    On 6/23/22 09:18, Bjørn-Helge Mevik wrote:

    In Slurm, the same internal variables are used for fairshare calculations as
    for GrpTRESMins (and similar), so when fair share priorities are in use,
    slurm will reduce accumulated GrpTRESMins over time. This means that it
    is impossible(*) to use GrpTRESMins limits and fairshare
    priorities at the same time.



    This is a surprising observation!



    I discovered it quite a few years ago, when we wanted to use Slurm to
    enforce cpu hour quota limits (instead of using Maui+Gold). Can't
    remember anymore if I was surprised or just sad. :D

    We use a 14 days HalfLife in slurm.conf:
    PriorityDecayHalfLife=14-0

    Since our longest running jobs can run only 7 days, maybe our limits
    never get reduced as you describe?



    The accumulated usage is reduced every 5 minutes (by default; see
    PriorityCalcPeriod). The reduction is done by multiplying the
    accumulated usage by a number slightly less than 1. The number is
    chosen so that the accumulated usage is reduced to 50 % after
    PriorityDecayHalfLife (given that you don't run anything more in
    between, of course). With a halflife of 14 days and the default calc
    period, that number is very close to 1 (0.9998281 if my calculations are
    correct :).
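
The arithmetic above can be checked with a short Python sketch. It assumes the per-period multiplier is 0.5 raised to (calc period / half-life), which is the description given in this message rather than something verified against the Slurm source.

```python
def decay_factor(half_life_s, calc_period_s):
    """Per-period multiplier that halves usage after one half-life."""
    return 0.5 ** (calc_period_s / half_life_s)


HALF_LIFE = 14 * 24 * 3600   # PriorityDecayHalfLife=14-0, in seconds
CALC_PERIOD = 5 * 60         # default PriorityCalcPeriod of 5 minutes, in seconds

factor = decay_factor(HALF_LIFE, CALC_PERIOD)
print(f"{factor:.7f}")  # 0.9998281

# Applying the factor once per period for 14 days halves the usage.
periods = HALF_LIFE // CALC_PERIOD  # 4032 periods
print(round(factor ** periods, 6))  # 0.5
```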

    Note: I read all about these details on the schedmd web pages some years
    ago. I cannot find them again (the parts about the multiplication with
    a number smaller than 1 to get the half life), so I might be wrong on
    some of the details.

    BTW, I've written a handy script for displaying user limits in a
    readable format:
    https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits



    Nice!

    --
    B/H




