Hi,

I'm testing the newest version of Slurm and am seeing an issue when using the 
newer billing TRES to charge for CPU time on a partition. I've read that 
billing should now be used instead of cpu in order to use the 
"TRESBillingWeights" option on a partition properly.

In my test case, I gave an account 2 hours of billing time (GrpTRESMins 
billing=120). I used 1 hour of this while the partition had 
TRESBillingWeights="CPU=1.0", and that hour seemed to be billed properly.
Next, I set TRESBillingWeights="CPU=0.5" on the same partition. I ran several 
jobs, but the billing never seemed to increase. RawUsage, however, did 
increment correctly.
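To make the expected numbers concrete, here is a small hypothetical sketch of the billing arithmetic (not Slurm's source code). It assumes Slurm's default behaviour of summing weighted TRES (i.e. PriorityFlags=MAX_TRES is not set) and a 1-CPU, 1-node job, which the cpu=60,node=60 run minutes in the sshare output below suggest for a 1-hour job. The function names are made up for illustration.

```python
# Hypothetical sketch of the billing arithmetic; not Slurm's source code.
# Assumes the default summing behaviour (PriorityFlags=MAX_TRES not set):
#   billing = sum(weight[tres] * allocated[tres])


def billing_value(allocated: dict, weights: dict) -> float:
    """Weighted sum of allocated TRES; a TRES with no weight contributes 0."""
    return sum(count * weights.get(tres, 0.0) for tres, count in allocated.items())


def billing_minutes(allocated: dict, weights: dict, elapsed_minutes: float) -> float:
    """Billing-minutes a job would charge against a GrpTRESMins billing limit."""
    return billing_value(allocated, weights) * elapsed_minutes


if __name__ == "__main__":
    job = {"cpu": 1, "node": 1}  # assumed: 1 CPU on 1 node
    for w in (1.0, 0.5):
        weights = {"cpu": w}
        print(f"CPU={w}: billing={billing_value(job, weights)}, "
              f"1-hour job = {billing_minutes(job, weights, 60)} billing-minutes")
```

Under these assumptions, a 1-hour job charges 60 billing-minutes at CPU=1.0 and 30 at CPU=0.5, so jobs at the lower weight should still count against the 120-minute limit rather than adding nothing.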

Here's an example of sshare reporting no billing run minutes when CPU=0.5 and 
I start a job with a walltime of 1 hour. Even though RawUsage is well past 
2 hours, a job can still run when it shouldn't.

# sshare -A test -l -o RawUsage,GrpTRESMins,TRESRunMins%60
   RawUsage                    GrpTRESMins                                                  TRESRunMins
----------- ------------------------------ ------------------------------------------------------------
      11068                    billing=120                      cpu=60,mem=0,energy=0,node=60,billing=0

If I set CPU=1.0 and start, say, a 2-hour job, I get this in the logs:
debug2: Job 32 being held, the job is at or exceeds assoc 239(test/(null)/(null)) group max tres(billing) minutes of 120 of which 60 are still available but request is for 120 (plus 0 already in use) tres minutes (request tres count 1)
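The hold message reads like a simple "requested plus in-use minutes must fit in what is still available" comparison. The hypothetical sketch below restates that check with the numbers from the log; the function name and parameters are made up, and the exact comparison inside Slurm may differ (for example, > versus >= at the boundary).

```python
# Hypothetical sketch of the limit check implied by the debug2 message;
# the exact comparison inside Slurm may differ (e.g. > versus >=).


def grp_tres_mins_allows(limit: int, used: int, in_use: int, request: int) -> bool:
    """Return True if a job requesting `request` TRES-minutes may start.

    limit   -- GrpTRESMins value for the TRES (billing=120 here)
    used    -- TRES-minutes already consumed by the association
    in_use  -- TRES-minutes reserved by jobs that are currently running
    request -- TRES count * time limit in minutes for the new job
    """
    available = limit - used
    return request + in_use <= available


if __name__ == "__main__":
    # Values from the log: limit 120, 60 still available, request 120, 0 in use.
    print(grp_tres_mins_allows(limit=120, used=60, in_use=0, request=120))  # False
    print(grp_tres_mins_allows(limit=120, used=60, in_use=0, request=60))   # True
```

With the logged values, the 120-minute request exceeds the 60 minutes still available, which matches the job being held.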

This makes sense, because I previously ran a job at a weight of 1.0 for an 
hour, so 1 hour was "billed" at that time. How can I query the "available" 
billing hours if it's not RawUsage?

Going back to a CPU billing weight of 0.5, the logs seem inconsistent too. 
This first line shows the right thing:
debug:  TRES Weight: cpu = 1.000000 * 0.500000 = 0.500000

but a few lines further down, it does not:
debug2: acct_policy_job_begin: after adding job 45, assoc 239(test/(null)/(null)) grp_used_tres_run_secs(billing) is 0
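One way to reconcile the two log lines is the following hypothesis, which is not confirmed by the logs or by Slurm's documentation: if the fractional billing value (0.5) were converted to an integer count before the run seconds are computed, a 1-CPU job would contribute 0. The sketch below only illustrates that arithmetic; `run_secs` and its `truncate` flag are made up for illustration.

```python
# Hypothesis only: if the fractional billing value were stored as an integer
# count before run-seconds are computed, a 1-CPU job at CPU=0.5 would
# contribute 0. This is not confirmed Slurm behaviour.


def run_secs(billing: float, time_limit_minutes: int, truncate: bool) -> float:
    """Billing run-seconds for one job, with or without integer truncation."""
    count = int(billing) if truncate else billing  # int() drops the fraction
    return count * time_limit_minutes * 60


if __name__ == "__main__":
    billing = 1 * 0.5  # 1 CPU * CPU weight 0.5, as in the "TRES Weight" line
    print(run_secs(billing, 60, truncate=False))  # 1800.0
    print(run_secs(billing, 60, truncate=True))   # 0
```

If that hypothesis held, it would also explain billing=0 in the TRESRunMins column of the sshare output above.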

Again, RawUsage increases correctly, but Slurm appears to be using some other 
field for billing when deciding whether a job can run.

My questions are: How can I set the CPU billing weight to less than 1, and how 
can I make sure jobs don't run once the account is out of time in this case? 
What is Slurm using for billing, since it's clearly not RawUsage? Or am I 
simply misunderstanding the billing and/or weights fields?

Thanks for any help...
