sshare reports the same values even now (after a few hours); I have not
changed anything since the previous test.
[root@slurm-login slurm-scripts]# sshare
             Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          1.000000          79      1.000000   0.500000
 premium1                               50    0.500000          36      0.455696   0.531672
 premium2                               50    0.500000          43      0.544304   0.470215
Ryan,
"sshare doesn't actually have access to the cputime that is accounted
for. All sacct does is (elapsed time * CPU count), something that you
can check in the manpage or sshare code. This can be slightly different
than what was actually accounted for in _apply_new_usage(). Slurm
doesn't do very well at granularity in the seconds (not necessarily a
bad thing)."
Regarding Slurm not being good at granularity in seconds: the deltas
between 36 and 43 and the actual runtime of 60 seconds are significant,
aren't they?
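As a toy illustration of the granularity issue Ryan describes (this is not Slurm's actual accounting code; every name below is made up for the example), an accounting thread that charges a whole period's worth of cpu-seconds whenever it observes a job running can drift from sacct's elapsed * CPUs by up to a period, depending on when the job starts relative to the passes:

```python
# Toy model of sampled, period-based usage accrual vs sacct's
# elapsed * CPUs. Illustrative only, NOT Slurm's real implementation.

def sacct_cputime(elapsed_s, cpus):
    """sacct's CPUTimeRAW: wall-clock elapsed seconds * allocated CPUs."""
    return elapsed_s * cpus

def accrued_usage(start, end, cpus, pass_times, period):
    """Charge a full period of cpu-seconds at every accounting pass
    that observes the job running (a crude model of sampling)."""
    return sum(period * cpus for t in pass_times if start <= t < end)

passes = list(range(0, 61, 7))  # accounting passes every 7 s

# A 60 s single-CPU job always shows 60 in sacct...
print(sacct_cputime(60, 1))                  # 60
# ...but the sampled accrual depends on when the job started:
print(accrued_usage(0, 60, 1, passes, 7))    # 63 (over-counted)
print(accrued_usage(3, 63, 1, passes, 7))    # 56 (under-counted)
```

Two identical jobs can therefore land on different Raw Usage values even with decay disabled, which is consistent with the 36 vs 43 split above.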
Thanks!
Roshan
________________________________________
From: Lech Nieroda <lech.nier...@uni-koeln.de>
Sent: 26 November 2014 15:36
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage
Hello Mathew,
just to check the basics - did you wait a few minutes before executing
sshare?
As far as I remember, the RawUsage value is updated every 5 minutes (by
default), so these erroneous values might be caused by a measurement
taken while the jobs were still running.
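For reference, the decay applied each calc period follows a half-life curve. A small sketch of that documented relationship (illustrative code, not Slurm's implementation; the function name is made up):

```python
# Sketch of half-life decay: every PriorityCalcPeriod the stored usage
# is multiplied by a constant factor so that it halves once per
# PriorityDecayHalfLife. Illustrative only, not Slurm source.

def decay_factor(calc_period_s, half_life_s):
    if half_life_s == 0:
        return 1.0  # PriorityDecayHalfLife=0 disables decay entirely
    return 0.5 ** (calc_period_s / half_life_s)

# With a 5-minute calc period and a 7-day half-life, usage halves
# after one half-life's worth of periods:
periods = 7 * 24 * 3600 // 300
print(decay_factor(300, 7 * 24 * 3600) ** periods)  # ~0.5
print(decay_factor(300, 0))  # 1.0 -> no decay, as in this thread's config
```

With PriorityDecayHalfLife=0, as configured below, the factor is 1.0, so decay cannot explain the odd Raw Usage values here.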
Regards,
Lech
Am 26.11.2014 um 12:14 schrieb Roshan Mathew <r.t.mat...@bath.ac.uk>:
> My fairshare test scenario - as it stands, fairshare is not
distributed correctly.
>
> *Accounts*
>
> [root@slurm-login slurm-scripts]# sacctmgr list accounts
> Account Descr Org
> ---------- -------------------- --------------------
> premium1 primary account root
> premium2 primary account root
> root default root account root
>
> *Users*
>
> [root@slurm-login slurm-scripts]# sacctmgr list users
> User Def Acct Admin
> ---------- ---------- ---------
> mm339 premium1 None
> sy223 premium2 None
>
> *Initial Shares*
>
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
> -------------------- ---------- ---------- ----------- ----------- ------------- ----------
> root                                          1.000000           0      1.000000   0.500000
>  premium1                               50    0.500000           0      0.000000   1.000000
>  premium2                               50    0.500000           0      0.000000   1.000000
>
>
> *Job script*
>
> [root@slurm-login slurm-scripts]# cat stress.slurm
> #!/bin/bash
>
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --job-name=stress
> #SBATCH --time=10
> #SBATCH --output=stress.%j-out
> #SBATCH --error=stress.%j-out
>
> time /opt/shared/apps/stress/app/bin/stress --cpu 1 --timeout 1m
>
>
> *Job Submission*
>
> [root@slurm-login slurm-scripts]# runuser sy223 -c 'sbatch stress.slurm'
> Submitted batch job 2
> [root@slurm-login slurm-scripts]# runuser mm339 -c 'sbatch stress.slurm'
> Submitted batch job 3
>
>
> *SACCT Information*
>
> [root@slurm-login slurm-scripts]# sacct --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode CPUTimeRAW
> ------------ ---------- ---------- ---------- ---------- ---------- -------- ----------
> 2                stress      batch   premium2          1  COMPLETED      0:0         60
> 2.batch           batch              premium2          1  COMPLETED      0:0         60
> 3                stress      batch   premium1          1  COMPLETED      0:0         60
> 3.batch           batch              premium1          1  COMPLETED      0:0         60
>
>
> *SSHARE Information*
>
> [root@slurm-login slurm-scripts]# sshare
>              Account       User Raw Shares Norm Shares   Raw Usage Effectv Usage  FairShare
> -------------------- ---------- ---------- ----------- ----------- ------------- ----------
> root                                          1.000000          79      1.000000   0.500000
>  premium1                               50    0.500000          36      0.455696   0.531672
>  premium2                               50    0.500000          43      0.544304   0.470215
>
>
> *Slurm.conf - priority/multifactor*
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
>
> # apply no decay
> PriorityDecayHalfLife=0
> PriorityCalcPeriod=1
> # reset usage after 1 month
> PriorityUsageResetPeriod=MONTHLY
>
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
>
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
>
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
>
>
> *Questions*
>
> 1. Given that I have set PriorityDecayHalfLife=0, i.e. no decay
applied at any stage, shouldn't both jobs have the same Raw Usage
reported by sshare?
>
> 2. Also, shouldn't CPUTimeRAW in sacct be the same as Raw Usage in
sshare?
>
>
> From: Skouson, Gary B <gary.skou...@pnnl.gov>
> Sent: 25 November 2014 21:09
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> I believe that the share data is kept by slurmctld in memory. As far
as I could tell from the code, it should be checkpointing the info to
the assoc_usage file wherever Slurm is saving state information. I
couldn't find any docs on that; you'd have to check the code for more
information.
>
> However, if you just want to see what was used, you can get the raw
usage using sacct. For example, for a given job, you can do something
like:
>
> sacct -X -a -j 1182128 --format Jobid,jobname,partition,account,alloccpus,state,exitcode,cputimeraw
>
> -----
> Gary Skouson
>
>
> From: Roshan Mathew [mailto:r.t.mat...@bath.ac.uk]
> Sent: Tuesday, November 25, 2014 9:51 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> Thanks Ryan,
>
> Is this value stored anywhere in the SLURM accounting DB? I could
not find any value for the JOB that corresponds to this RAW usage.
>
> Roshan
> From: Ryan Cox <ryan_...@byu.edu>
> Sent: 25 November 2014 17:43
> To: slurm-dev
> Subject: [slurm-dev] Re: [ sshare ] RAW Usage
>
> Raw usage is a long double, and the time added by jobs can be off by
a few seconds. You can take a look at _apply_new_usage() in
src/plugins/priority/multifactor/priority_multifactor.c to see exactly
what happens.
>
> Ryan
>
> On 11/25/2014 10:34 AM, Roshan Mathew wrote:
> Hello SLURM users,
>
> http://slurm.schedmd.com/sshare.html
> Raw Usage
> The number of cpu-seconds of all the jobs that charged the account by
the user. This number will decay over time when PriorityDecayHalfLife
is defined.
> I am getting different Raw Usage values for the same job every time
it is executed. The job I am using is a CPU stress test that runs for
1 minute.
>
> It would be very useful to understand the formula for how this Raw
Usage is calculated when we are using the plugin
PriorityType=priority/multifactor.
>
> Snip of my slurm.conf file:-
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
>
> # apply no decay
> PriorityDecayHalfLife=0
>
> PriorityCalcPeriod=1
> PriorityUsageResetPeriod=MONTHLY
>
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
>
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
>
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=0
> PriorityWeightFairshare=100
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=0 # don't use the qos factor
>
> Thanks!
>
>
>