***AHA*** I FOUND IT!

FairShare=parent <https://slurm.schedmd.com/classic_fair_share.html#parent>

It is possible to disable the fairshare at certain levels of the fair share 
hierarchy by using the FairShare=parent option of sacctmgr. For users and 
accounts with FairShare=parent the normalized shares and effective usage values 
from the parent in the hierarchy will be used when calculating fairshare 
priorities.

If all users in an account are configured with FairShare=parent the result is 
that all the jobs drawing from that account will get the same fairshare 
priority, based on the account's total usage. No additional fairness is added 
based on a user's individual usage.
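
To make that concrete, here is a minimal Python sketch (not Slurm code) of the 
classic formula quoted further down this thread, f = 2^(-EffectvUsage/NormShares). 
The Assoc class and the inputs_for helper are hypothetical names used only for 
illustration. The numbers are the 'mic' account's NormShares and EffectvUsage 
from the sshare output below, and the sketch assumes the default dampening 
factor and ignores everything else the scheduler does.

```python
from dataclasses import dataclass

@dataclass
class Assoc:
    name: str
    fairshare: str           # "parent", or a numeric share written as a string
    norm_shares: float
    effective_usage: float

def inputs_for(user: Assoc, account: Assoc):
    """Return the (NormShares, EffectvUsage) pair used for this user."""
    if user.fairshare == "parent":
        # FairShare=parent: the user's own values are ignored.
        return account.norm_shares, account.effective_usage
    return user.norm_shares, user.effective_usage

def classic_factor(norm_shares: float, effective_usage: float) -> float:
    """Classic fairshare factor, assuming a dampening factor of 1."""
    return 2 ** (-effective_usage / norm_shares)

# Account values taken from the 'mic' row of the sshare output in this thread.
mic = Assoc("mic", "120", 0.991736, 1.000000)

# Per-user usage is taken from the NormUsage column; it is unused under "parent".
users = [
    Assoc("acataldo", "parent", 0.0, 0.231476),
    Assoc("achowdhury", "parent", 0.0, 0.0),
]

factors = {u.name: classic_factor(*inputs_for(u, mic)) for u in users}
for name, f in factors.items():
    print(f"{name}: {f:.6f}")
```

Both users come out at roughly 0.4971, in line with the 0.497120 in the 
FairShare column, no matter how different their individual usage is.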



On Aug 10, 2024, at 6:21 PM, Daniel M. Drucker <ddruc...@mclean.harvard.edu> 
wrote:

Yes, there is 'root' and 'mic', and everyone is under 'mic'.

No, I don't know any Steve.

So what you're saying is I *must* at account-creation time explicitly assign a 
fairshare value?
Would it be sufficient to just say, in my account creation script,

sacctmgr modify user $NEWUSERNAME set fairshare=1

?

I'm still struggling to understand why that is different from fairshare=parent, 
if everyone has the same value.

Daniel




On Aug 10, 2024, at 2:34 PM, Fulcomer, Samuel <samuel_fulco...@brown.edu> wrote:



...and there's not actually one account in your setup, is there? There should 
at least be a "root" and a "mic" account, I think.

I don't recall whether you'd sent the output of "sshare | head -15"...

On Sat, Aug 10, 2024 at 2:30 PM Fulcomer, Samuel 
<samuel_fulco...@brown.edu> wrote:
We use the following relevant settings...

PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=00:02:00
PriorityMaxAge=3-0
PriorityWeightAge=0
PriorityWeightFairshare=2000000
PriorityWeightJobSize=1
PriorityWeightPartition=200
PriorityWeightQOS=1000000
PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000

...however, that doesn't provide any information about the account 
organization and the RawShares assignments to the accounts (and fairshare in the 
gpu partitions is yet another rathole....).
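
For a feel of how those weights interact, here is a rough Python sketch of the 
weighted sum that, as I understand the multifactor plugin documentation, makes 
up a job's priority. It is a simplification: it assumes each factor is already 
normalized to the range 0.0 to 1.0, and it leaves out the site factor, the 
association factor and the nice adjustment. The function and dictionary names 
are made up for illustration.

```python
# Weights copied from the settings listed above.
WEIGHTS = {
    "age": 0,
    "fairshare": 2_000_000,
    "job_size": 1,
    "partition": 200,
    "qos": 1_000_000,
}
TRES_WEIGHTS = {"cpu": 1000, "mem": 2000, "gres/gpu": 3000}

def job_priority(factors, tres_factors=None):
    """Weighted sum of normalized factors (simplified, see assumptions above)."""
    tres_factors = tres_factors or {}
    base = sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())
    tres = sum(w * tres_factors.get(name, 0.0) for name, w in TRES_WEIGHTS.items())
    return base + tres

# Two otherwise identical jobs that differ only in the fairshare factor.
light_user = job_priority({"fairshare": 1.0, "qos": 0.5})
heavy_user = job_priority({"fairshare": 0.5, "qos": 0.5})
print(light_user - heavy_user)
```

With these weights the fairshare term dominates, so two jobs can only end up 
with the same priority if their fairshare factors (and everything else) match.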

We do use the Tree feature, which is required in our environment. It's what 
enables (I think) the proper division of share among accounts in subaccounts. 
It's been years since I've looked at this, so YMMV...

We have "condo" accounts, and a non-condo account called "default". When an 
investigator or group buys equipment, we create a SLURM account and QoS for it. 
We actually set the TRES limits on the QoS to 1.25X the number of cores and GB 
of memory of the purchase, but assign a RawShares value based on the actual 
number of cores purchased divided by the total cores in the cluster, multiplied 
by 1000 (to give a meaningful integer for RawShares - maybe we should bump that 
to 10000). The condo QoS Priorities are set to 10000.
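
As a sketch of that arithmetic in Python (the function names, the rounding to 
the nearest integer, and the cluster and purchase sizes are all made-up 
assumptions, not our actual tooling or numbers):

```python
def condo_raw_shares(cores_purchased, cluster_cores, scale=1000):
    """RawShares = purchased cores / total cluster cores * scale, as an integer."""
    return round(cores_purchased / cluster_cores * scale)

def condo_qos_limits(cores_purchased, mem_gb_purchased, headroom=1.25):
    """QoS TRES limits set to 1.25X the purchased cores and GB of memory."""
    return {
        "cpu": int(cores_purchased * headroom),
        "mem_gb": int(mem_gb_purchased * headroom),
    }

# Hypothetical cluster of 20480 cores and a condo purchase of 512 cores / 4096 GB.
CLUSTER_CORES = 20480
shares = condo_raw_shares(512, CLUSTER_CORES)
limits = condo_qos_limits(512, 4096)
print(shares, limits)

# A very small purchase shows why a larger multiplier might be needed.
small_at_1000 = condo_raw_shares(8, CLUSTER_CORES)
small_at_10000 = condo_raw_shares(8, CLUSTER_CORES, scale=10000)
print(small_at_1000, small_at_10000)
```

In this made-up case an 8-core purchase rounds down to 0 shares with a 
multiplier of 1000 but gets a usable integer with 10000.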

The "default" account is assigned a RawShares value based on the number of cores 
purchased by the university, and provides access to exploratory (no charge) and 
premium (more cores, higher Priority, but still a lot less than 10000). The 
default account is oversubscribed when comparing QoS TRES limits to RawShares, 
but that's OK, or... that's just the way it is. We want the condo accounts to 
have the most benefit from the FairShare mechanism.

So....

The root account has children. The root account does not have a RawShares 
assignment.

The default account is one child with the root account as parent.

The primary condo accounts are children of the root account. They have the 
RawShares set based on purchased cores.

Some of the primary condo accounts where the equipment was purchased by 
multiple investigator groups have child condo accounts and QoS', but without 
their own RawShares assignments.

With the FairTree mechanism, this gives us...

FairShare between condos (and the default account)...

FairShare within sub-account condos, as part of the parent condo...

FairShare within the leaf condo among users.
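
Until someone draws the diagram, here is a heavily simplified Python sketch of 
how I understand the Fair Tree ranking to work: at each level, siblings are 
sorted by LevelFS (their share of the siblings' shares divided by their share 
of the siblings' usage), the tree is walked depth-first in that order, and 
users get a factor from their position in the resulting list. The Node class, 
the account and user names and all numbers are hypothetical, and the sketch 
ignores ties, FairShare=parent and usage decay.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    shares: float
    usage: float = 0.0                             # accounts: summed from children
    children: list = field(default_factory=list)   # empty list means a user

def total_usage(node):
    """Fill in account usage as the sum of everything underneath it."""
    if node.children:
        node.usage = sum(total_usage(c) for c in node.children)
    return node.usage

def level_fs(node, siblings):
    """Shares fraction divided by usage fraction, both relative to siblings."""
    s = node.shares / sum(x.shares for x in siblings)
    usage_sum = sum(x.usage for x in siblings)
    u = node.usage / usage_sum if usage_sum else 0.0
    return float("inf") if u == 0 else s / u

def rank_users(root):
    """Depth-first walk in descending LevelFS order, then rank-based factors."""
    total_usage(root)
    ordered = []

    def visit(account):
        kids = sorted(account.children,
                      key=lambda c: level_fs(c, account.children),
                      reverse=True)
        for kid in kids:
            if kid.children:
                visit(kid)
            else:
                ordered.append(kid.name)

    visit(root)
    n = len(ordered)
    return {name: (n - i) / n for i, name in enumerate(ordered)}

root = Node("root", 1, children=[
    Node("condo_a", 300, children=[
        Node("alice", 1, usage=100.0),
        Node("bob", 1, usage=0.0),
    ]),
    Node("default", 700, children=[
        Node("carol", 1, usage=100.0),
    ]),
])
result = rank_users(root)
print(result)
```

In this toy tree "default" has used less than its share relative to "condo_a", 
so carol ranks first, and within condo_a the idle user bob ranks ahead of alice.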

One of us obviously needs to diagram this...

regards,
s




On Sat, Aug 10, 2024 at 10:05 AM Drucker, Daniel 
<ddruc...@mclean.harvard.edu> wrote:
And now, a few hours later - with no changes made - everyone has the same 
fairshare?

$ sshare -l -a
Account                    User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ------------------------------ ------------------------------
root                                          0.000000    63235972                  0.000000   1.000000                                cpu=188835,mem=1546941371,ene+
 root                      root          1    0.008264           0    0.000000      0.000000   1.000000                                cpu=0,mem=0,energy=0,node=0,b+
 mic                                   120    0.991736    63235972    1.000000      1.000000   0.497120                                cpu=188835,mem=1546941371,ene+
  mic                  aamedina     parent    0.991736     2351906    0.037193      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 aaruldass     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                  acataldo     parent    0.991736    14637614    0.231476      1.000000   0.497120                                cpu=188031,mem=1540350361,ene+
  mic                achowdhury     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    ajajoo     parent    0.991736     2053441    0.032473      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    ajanes     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 amandacao     parent    0.991736         200    0.000003      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                    aromer     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                aweerasek+     parent    0.991736        1048    0.000017      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                   batwood     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                     bleng     parent    0.991736           3    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                 cdemirlek     parent    0.991736        6110    0.000097      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+
  mic                      chun     parent    0.991736           0    0.000000      1.000000   0.497120                                cpu=0,mem=0,energy=0,node=0,b+


I am so confused.



On Aug 10, 2024, at 8:11 AM, Drucker, Daniel 
<ddruc...@mclean.harvard.edu> wrote:

Hmm, no. That solved the problem of everyone having the same FairShare, but 
even after restarting slurmd and doing reconfigure, if I submit a job as 
someone with a huge usage and someone with zero usage, they both end up with 
the same Priority.



On Aug 10, 2024, at 8:05 AM, Daniel M. Drucker 
<ddruc...@mclean.harvard.edu> wrote:

I just set
PriorityFlags=NO_FAIR_TREE
and this seems to have solved the problem!




On Aug 10, 2024, at 7:45 AM, Drucker, Daniel 
<ddruc...@mclean.harvard.edu> wrote:

According to https://docs.rc.fas.harvard.edu/kb/fairshare/ and 
https://slurm.schedmd.com/SUG14/fair_tree.pdf :


"The Fairshare score is calculated using the following formula. f = 
2^(-EffectvUsage/NormShares)"

This is clearly not happening on my system:

Account                    User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
...
  mic                  acataldo     parent    0.991736    13066208    0.210193      0.210193   0.983871                                           cpu=169648,mem=1389757781,ene+
  mic                achowdhury     parent    0.991736           0    0.000000      0.000000   0.983871                                           cpu=0,mem=0,energy=0,node=0,b+
...


Every user has 0.991736 NormShares.
Acataldo has EffectvUsage = 0.210193
Achowdhury has EffectvUsage = 0

But both users have the same FairShare. The correct values according to the 
above formula would be 0.863 and 1.0 respectively.
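
For reference, those two expected values can be reproduced with a few lines of 
Python. This just evaluates the quoted formula with the NormShares and 
EffectvUsage numbers from the sshare rows above; the function name is made up, 
and nothing here reflects how Slurm actually computes the value internally.

```python
def fairshare_factor(effective_usage: float, norm_shares: float) -> float:
    """f = 2^(-EffectvUsage/NormShares), as quoted above."""
    return 2 ** (-effective_usage / norm_shares)

NORM_SHARES = 0.991736  # same for every user in the output above

acataldo = fairshare_factor(0.210193, NORM_SHARES)
achowdhury = fairshare_factor(0.0, NORM_SHARES)

print(f"acataldo:   {acataldo:.3f}")
print(f"achowdhury: {achowdhury:.3f}")
```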

So what's going on?



On Aug 10, 2024, at 7:36 AM, Daniel M. Drucker 
<ddruc...@mclean.harvard.edu> wrote:

Here is what is confusing me I guess. Look at the below. You can see that some 
people have no usage and some people have a lot of usage. But their FairShare 
values are all identical.

https://lists.schedmd.com/mailman3/hyperkitty/list/slurm-users@lists.schedmd.com/thread/I53OEJSNBT2BMXYVFEFHQQKKAHIUYA53/
  seems to say that fairshare=parent should work just fine, but what I am 
seeing is that it is NOT altering people's FairShare?







The information in this e-mail is intended only for the person to whom it is 
addressed.  If you believe this e-mail was sent to you in error and the e-mail 
contains patient information, please contact the Mass General Brigham 
Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline .

Please note that this e-mail is not secure (encrypted).  If you do not wish to 
continue communication over unencrypted e-mail, please notify the sender of 
this message immediately.  Continuing to send or respond to e-mail after 
receiving this message means you understand and accept this risk and wish to 
continue to communicate over unencrypted e-mail.


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
