Roshan,

It depends on several things, including whether the start time is set when slurmctld says "launch" or when the job actually launches (don't know), whether the end time is set based on an event or on polling (don't know), whether the end time is set at the moment the batch script exits or when slurmctld gets the message that it has exited (don't know), etc. I haven't looked at that part of the code before. Basically, I don't know the answer to your question, but my guess is that the difference is not significant.
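If you want to at least see which timestamps the accounting ends up with, sacct can print them and you can compare them against the wall clock you observed. A rough check, using the job IDs from your test:

sacct -j 2,3 --format=JobID,Submit,Start,End,Elapsed,CPUTimeRAW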

Ryan

On 11/26/2014 11:13 AM, Roshan Mathew wrote:
sshare reports the same values even now (after a few hours) - I have not changed anything since the previous test.

[root@slurm-login slurm-scripts]# sshare
             Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          1.000000          79      1.000000   0.500000
 premium1                             50      0.500000          36      0.455696   0.531672
 premium2                             50      0.500000          43      0.544304   0.470215

Ryan,

    sshare doesn't actually have access to the cputime that is
    accounted for.  All sacct does is (elapsed time * CPU count),
    something that you can check in the manpage or sshare code.  This
    can be slightly different from what was actually accounted for in
    _apply_new_usage().  Slurm doesn't do very well at granularity in
    the seconds (not necessarily a bad thing).

On Slurm not being good at granularity in seconds: the deltas between the reported 36 and 43 and the actual 60-second runtimes are significant, aren't they?
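For reference, the per-job elapsed time and CPU count that sacct multiplies together can be listed next to CPUTimeRAW with something like:

sacct -j 2,3 --format=JobID,Elapsed,AllocCPUS,CPUTimeRAW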

Thanks!
Roshan
________________________________________
From: Lech Nieroda <lech.nier...@uni-koeln.de>
Sent: 26 November 2014 15:36
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage

Hello Mathew,

Just to check the basics - did you wait a few minutes before executing sshare? As far as I remember, the RawUsage value is updated every 5 minutes by default, so these odd values might be caused by a measurement that was taken while the jobs were still running.
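You can check what your controller is actually using with something like:

scontrol show config | grep -i '^Priority'

That should list PriorityCalcPeriod and PriorityDecayHalfLife among the other priority settings.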

Regards,
Lech


Am 26.11.2014 um 12:14 schrieb Roshan Mathew <r.t.mat...@bath.ac.uk>:

> My fairshare test scenario - as it stands, the fairshare is not distributed correctly.
>
> *Accounts*
>
> [root@slurm-login slurm-scripts]# sacctmgr list accounts
>    Account                Descr                  Org
> ---------- -------------------- --------------------
>   premium1      primary account                  root
>   premium2      primary account                  root
>       root default root account                  root
>
> *Users*
>
> [root@slurm-login slurm-scripts]# sacctmgr list users
>       User   Def Acct     Admin
> ---------- ---------- ---------
>      mm339   premium1      None
>      sy223   premium2      None
>
> *Initial Shares*
>
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
> -------------------- ---------- ---------- ----------- ----------- ------------- ----------
> root                                          1.000000           0      1.000000   0.500000
>  premium1                             50      0.500000           0      0.000000   1.000000
>  premium2                             50      0.500000           0      0.000000   1.000000
>
>
> *Job script*
>
> [root@slurm-login slurm-scripts]# cat stress.slurm
> #!/bin/bash
>
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --job-name=stress
> #SBATCH --time=10
> #SBATCH --output=stress.%j-out
> #SBATCH --error=stress.%j-out
>
> time /opt/shared/apps/stress/app/bin/stress --cpu 1 --timeout 1m
>
>
> *Job Submission*
>
> [root@slurm-login slurm-scripts]# runuser sy223 -c 'sbatch stress.slurm'
> Submitted batch job 2
> [root@slurm-login slurm-scripts]# runuser mm339 -c 'sbatch stress.slurm'
> Submitted batch job 3
>
>
> *SACCT Information*
>
> [root@slurm-login slurm-scripts]# sacct --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode CPUTimeRAW
> ------------ ---------- ---------- ---------- ---------- ---------- -------- ----------
> 2                stress      batch   premium2          1  COMPLETED      0:0         60
> 2.batch           batch              premium2          1  COMPLETED      0:0         60
> 3                stress      batch   premium1          1  COMPLETED      0:0         60
> 3.batch           batch              premium1          1  COMPLETED      0:0         60
>
>
> *SSHARE Information*
>
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
> -------------------- ---------- ---------- ----------- ----------- ------------- ----------
> root                                          1.000000          79      1.000000   0.500000
>  premium1                             50      0.500000          36      0.455696   0.531672
>  premium2                             50      0.500000          43      0.544304   0.470215
>
>
> *Slurm.conf - priority/multifactor*
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
>
> # apply no decay
> PriorityDecayHalfLife=0
> PriorityCalcPeriod=1
> # reset usage after 1 month
> PriorityUsageResetPeriod=MONTHLY
>
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
>
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
>
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
>
>
> *Questions*
>
> 1. Given that I have set PriorityDecayHalfLife=0, i.e. no decay applied at any stage, shouldn't both jobs have the same Raw Usage reported by sshare?
>
> 2. Also, shouldn't CPUTimeRAW in sacct be the same as Raw Usage in sshare?
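>
> A direct comparison after the next calculation period would be something like:
>
> sacct -X -j 2,3 --format=JobID,Account,Elapsed,AllocCPUS,CPUTimeRAW
> sshare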
>
>
> From: Skouson, Gary B <gary.skou...@pnnl.gov>
> Sent: 25 November 2014 21:09
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> I believe the fair-share usage data is kept by slurmctld in memory. As far as I could tell from the code, it should be checkpointing that info to the assoc_usage file wherever Slurm saves its state information. I couldn't find any docs on that; you'd have to check the code for more information.
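>
> If you want to poke at it, the directory is whatever StateSaveLocation points to, so something along these lines should show the checkpoint file (it's a binary state file, not meant to be read directly):
>
> scontrol show config | grep StateSaveLocation
> ls -l <StateSaveLocation>/assoc_usage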
>
> However, if you just want to see what was used, you can get the raw usage using sacct. For example, for a given job, you can do something like:
>
> sacct -X -a -j 1182128 --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>
> -----
> Gary Skouson
>
>
> From: Roshan Mathew [mailto:r.t.mat...@bath.ac.uk]
> Sent: Tuesday, November 25, 2014 9:51 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> Thanks Ryan,
>
> Is this value stored anywhere in the SLURM accounting DB? I could not find any value for the JOB that corresponds to this RAW usage.
>
> Roshan
> From: Ryan Cox <ryan_...@byu.edu>
> Sent: 25 November 2014 17:43
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> Raw usage is a long double and the time added by jobs can be off by a few seconds. You can take a look at _apply_new_usage() in src/plugins/priority/multifactor/priority_multifactor.c to see exactly what happens.
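>
> As a naive reference point (ignoring the caveat above), a single-CPU job that runs for one minute should add on the order of:
>
> awk 'BEGIN { cpus = 1; elapsed = 60; print cpus * elapsed, "cpu-seconds" }'
>
> i.e. about 60 cpu-seconds, with the decay from PriorityDecayHalfLife applied on top when the half-life is non-zero; the few-second discrepancies are the granularity issue mentioned above.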
>
> Ryan
>
> On 11/25/2014 10:34 AM, Roshan Mathew wrote:
> Hello SLURM users,
>
> http://slurm.schedmd.com/sshare.html
> Raw Usage
> The number of cpu-seconds of all the jobs that charged the account by the user. This number will decay over time when PriorityDecayHalfLife is defined.
>
> I am getting different Raw Usage values for the same job every time it is executed. The job I am using is a CPU stress test that runs for 1 minute.
>
> It would be very useful to understand the formula for how this RAW Usage is calculated when we are using the plugin PriorityType=priority/multifactor.
>
> Snip of my slurm.conf file:-
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
>
> # apply no decay
> PriorityDecayHalfLife=0
>
> PriorityCalcPeriod=1
> PriorityUsageResetPeriod=MONTHLY
>
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
>
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
>
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
>
> Thanks!
>
>
>

--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
