...and there isn't actually just one account in your setup, is there? There
should be at least a "root" and a "mic" account, I think.

I don't recall whether you'd sent the output of "sshare | head -15"...

On Sat, Aug 10, 2024 at 2:30 PM Fulcomer, Samuel <samuel_fulco...@brown.edu>
wrote:

> We use the following relevant settings...
>
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityCalcPeriod=00:02:00
> PriorityMaxAge=3-0
> PriorityWeightAge=0
> PriorityWeightFairshare=2000000
> PriorityWeightJobSize=1
> PriorityWeightPartition=200
> PriorityWeightQOS=1000000
> PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
>
> ...however, that doesn't provide any information about the account
> organization and RawShares assignments to the accounts (and fairshare in
> the gpu partitions is yet another rathole....).
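For what it's worth, here is a rough sketch of how those weights combine, assuming the usual multifactor weighted sum in which each factor is a float between 0.0 and 1.0 (site factor, association weight and nice are left out). The factor values and the function name are made up for illustration; only the weights come from the settings above.

```python
# Weights copied from the slurm.conf settings quoted above.
WEIGHTS = {
    "age": 0,
    "fairshare": 2_000_000,
    "job_size": 1,
    "partition": 200,
    "qos": 1_000_000,
}
TRES_WEIGHTS = {"cpu": 1000, "mem": 2000, "gres/gpu": 3000}


def job_priority(factors, tres_factors):
    """Weighted sum of per-job factors, each expected to lie in [0.0, 1.0]."""
    base = sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)
    tres = sum(TRES_WEIGHTS[name] * tres_factors.get(name, 0.0) for name in TRES_WEIGHTS)
    return base + tres


# Hypothetical factor values: two otherwise identical jobs whose owners
# differ only in their fairshare factor.
common = {"age": 0.5, "job_size": 0.25, "partition": 1.0, "qos": 0.5}
tres = {"cpu": 0.25, "mem": 0.25, "gres/gpu": 0.0}

heavy_user = job_priority({**common, "fairshare": 0.5}, tres)
idle_user = job_priority({**common, "fairshare": 1.0}, tres)
print(heavy_user, idle_user, idle_user - heavy_user)
```

With PriorityWeightFairshare this large, any difference in the fairshare factor should dominate the result, so two jobs with identical priorities most likely have identical fairshare factors. "sprio -l" shows the per-factor breakdown for pending jobs.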
>
> We do use the Tree feature, which is required in our environment. It's
> what enables (I think) the proper division of share among accounts in
> subaccounts. It's been years since I've looked at this, so YMMV...
>
> We have "condo" accounts, and a non-condo account called "default". When
> an investigator or group buys equipment, we create a SLURM account and QoS
> for it. We actually set the TRES limits on the QoS to 1.25X the number of
> cores and GB memory of the purchase, but assign a RawShares value based on
> the actual number of cores purchased divided by the total cores in the
> cluster, multiplied by 1000 (to give a meaningful integer for RawShares -
> maybe we should bump that to 10000). The condo QoS Priorities are set to
> 10000.
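The arithmetic in that paragraph, as a minimal sketch. The function names, the rounding to the nearest integer, and the example purchase are all assumptions for illustration; the post only gives the 1.25X factor and the cores/total*1000 rule.

```python
def condo_raw_shares(purchased_cores, cluster_cores, scale=1000):
    """RawShares = purchased fraction of the cluster, scaled to an integer."""
    # Rounding to the nearest integer is an assumption; the post only says
    # the result should be "a meaningful integer".
    return round(purchased_cores / cluster_cores * scale)


def condo_qos_limits(purchased_cores, purchased_mem_gb, headroom=1.25):
    """QoS TRES limits set to 1.25X the purchased cores and memory."""
    return {
        "cpu": int(purchased_cores * headroom),
        "mem_gb": int(purchased_mem_gb * headroom),
    }


# Hypothetical purchase: 512 cores and 2048 GB on a 20000-core cluster.
shares = condo_raw_shares(512, 20000)
limits = condo_qos_limits(512, 2048)
print(shares, limits)
```

Passing scale=10000 instead gives finer-grained integers for small purchases, which is the reason for the "maybe we should bump that" remark.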
>
> The "default" account is assigned a RawShares value based on the number of
> cores purchased by the university, and provides access to the exploratory
> (no charge) and premium (more cores, higher Priority, but still a lot less
> than 10000) QoSes. The default account is oversubscribed when comparing QoS
> TRES limits to RawShares, but that's OK, or... that's just the way it is. We
> want the condo accounts to have the most benefit from the FairShare
> mechanism.
>
> So....
>
> The root account has children. The root account does not have a RawShares
> assignment.
>
> The default account is one child with the root account as parent.
>
> The primary condo accounts are children of the root account. They have the
> RawShares set based on purchased cores.
>
> Some of the primary condo accounts, where the equipment was purchased by
> multiple investigator groups, have child condo accounts and QoSes, but
> without their own RawShares assignments.
>
> With the FairTree mechanism, this gives us...
>
> FairShare between condos (and the default account)...
>
> FairShare within sub-account condos, as part of the parent condo...
>
> FairShare within the leaf condo among users.
>
> One of us obviously needs to diagram this...
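In lieu of a proper diagram, a small sketch that prints the shape of the hierarchy described above. Every account name and RawShares number here is hypothetical; only the structure (root without RawShares, default and primary condos as its children, child condos without their own RawShares) follows the description.

```python
# (name, RawShares or None, children); all names and values are made up.
TREE = ("root", None, [
    ("default", 400, []),
    ("condo_a", 26, []),
    ("condo_b", 51, [
        ("condo_b_lab1", None, []),
        ("condo_b_lab2", None, []),
    ]),
])


def render(node, depth=0):
    """Return the tree as a list of indented lines, one per account."""
    name, raw_shares, children = node
    if raw_shares is None:
        label = "no RawShares of its own"
    else:
        label = f"RawShares={raw_shares}"
    lines = ["  " * depth + f"{name} ({label})"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(TREE)))
```

Users would hang off the leaf accounts (default, condo_a, condo_b_lab1, condo_b_lab2 in this made-up layout).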
>
> regards,
> s
>
>
>
>
> On Sat, Aug 10, 2024 at 10:05 AM Drucker, Daniel <
> ddruc...@mclean.harvard.edu> wrote:
>
>> And now, a few hours later - with no changes made - everyone has the same
>> fairshare?
>>
>> $ sshare -l -a
>> Account                    User  RawShares  NormShares    RawUsage
>> NormUsage  EffectvUsage  FairShare                    GrpTRESMins
>>          TRESRunMins
>> -------------------- ---------- ---------- ----------- -----------
>> ----------- ------------- ---------- ------------------------------
>> ------------------------------
>> root                                          0.000000    63235972
>>            0.000000   1.000000
>>  cpu=188835,mem=1546941371,ene+
>>  root                      root          1    0.008264           0
>>  0.000000      0.000000   1.000000
>>  cpu=0,mem=0,energy=0,node=0,b+
>>  mic                                   120    0.991736    63235972
>>  1.000000      1.000000   0.497120
>>  cpu=188835,mem=1546941371,ene+
>>   mic                  aamedina     parent    0.991736     2351906
>>  0.037193      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                 aaruldass     parent    0.991736           0
>>  0.000000      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                  acataldo     parent    0.991736    14637614
>>  0.231476      1.000000   0.497120
>>  cpu=188031,mem=1540350361,ene+
>>   mic                achowdhury     parent    0.991736           0
>>  0.000000      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                    ajajoo     parent    0.991736     2053441
>>  0.032473      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                    ajanes     parent    0.991736           0
>>  0.000000      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                 amandacao     parent    0.991736         200
>>  0.000003      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                    aromer     parent    0.991736           0
>>  0.000000      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                aweerasek+     parent    0.991736        1048
>>  0.000017      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                   batwood     parent    0.991736           0
>>  0.000000      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                     bleng     parent    0.991736           3
>>  0.000000      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                 cdemirlek     parent    0.991736        6110
>>  0.000097      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
>>   mic                      chun     parent    0.991736           0
>>  0.000000      1.000000   0.497120
>>  cpu=0,mem=0,energy=0,node=0,b+
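The numbers in this listing can be reproduced by hand, which may help narrow things down. A sketch, assuming NormShares is RawShares divided by the sum of the siblings' RawShares, and that with NO_FAIR_TREE set the 2^(-EffectvUsage/NormShares) formula is what gets applied; the function names are mine.

```python
def norm_shares(raw_shares, sibling_raw_shares):
    """Share of one association relative to all siblings under the same parent."""
    return raw_shares / sum(sibling_raw_shares)


def classic_fairshare(effective_usage, normalized_shares):
    """FairShare = 2^(-EffectvUsage / NormShares)."""
    return 2.0 ** (-effective_usage / normalized_shares)


# RawShares directly under root in the listing: user "root" (1), account "mic" (120).
siblings = [1, 120]
root_user = norm_shares(1, siblings)
mic = norm_shares(120, siblings)

# Every "mic" user row shows EffectvUsage 1.000000 and NormShares 0.991736.
mic_fairshare = classic_fairshare(1.0, mic)
print(f"{root_user:.6f} {mic:.6f} {mic_fairshare:.6f}")
```

Since every user row under "mic" has RawShares=parent and shows the account's own EffectvUsage and NormShares, the formula necessarily returns the same FairShare for all of them, which matches the 0.497120 in the listing.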
>>
>>
>> I am so confused.
>>
>>
>>
>> On Aug 10, 2024, at 8:11 AM, Drucker, Daniel <ddruc...@mclean.harvard.edu>
>> wrote:
>>
>> Hmm, no. That solved the problem of everyone having the same FairShare,
>> but even after restarting slurmd and doing reconfigure, if I submit a job
>> as someone with a huge usage and someone with zero usage, they both end up
>> with the same Priority.
>>
>>
>>
>> On Aug 10, 2024, at 8:05 AM, Daniel M. Drucker <
>> ddruc...@mclean.harvard.edu> wrote:
>>
>> I just set
>> PriorityFlags=NO_FAIR_TREE
>> and this seems to have solved the problem!
>>
>>
>>
>>
>> On Aug 10, 2024, at 7:45 AM, Drucker, Daniel <ddruc...@mclean.harvard.edu>
>> wrote:
>>
>> According to https://docs.rc.fas.harvard.edu/kb/fairshare/  and
>> https://slurm.schedmd.com/SUG14/fair_tree.pdf :
>>
>>
>> "The Fairshare score is calculated using the following formula.
>> f = 2^(-EffectvUsage/NormShares)"
>>
>> This is clearly not happening on my system:
>>
>> Account                    User  RawShares  NormShares    RawUsage
>> NormUsage  EffectvUsage  FairShare    LevelFS
>>  GrpTRESMins                    TRESRunMins
>> -------------------- ---------- ---------- ----------- -----------
>> ----------- ------------- ---------- ----------
>> ------------------------------ ------------------------------
>> ...
>>   mic                  acataldo     parent    0.991736    13066208
>>  0.210193      0.210193   0.983871
>>   cpu=169648,mem=1389757781,ene+
>>   mic                achowdhury     parent    0.991736           0
>>  0.000000      0.000000   0.983871
>>   cpu=0,mem=0,energy=0,node=0,b+
>> ...
>>
>>
>> Every user has 0.991736 NormShares.
>> Acataldo has EffectvUsage = 0.210193
>> Achowdhury has EffectvUsage = 0
>>
>> But both users have the same FairShare. The correct values according to
>> the above formula would be 0.863 and 1.0 respectively.
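Checking that arithmetic with the quoted formula, as a minimal sketch (the function name is mine; the inputs are the EffectvUsage and NormShares values from the listing above):

```python
def classic_fairshare(effective_usage, norm_shares):
    """FairShare = 2^(-EffectvUsage / NormShares), as quoted from the docs."""
    return 2.0 ** (-effective_usage / norm_shares)


acataldo = classic_fairshare(0.210193, 0.991736)
achowdhury = classic_fairshare(0.0, 0.991736)
print(f"acataldo={acataldo:.3f} achowdhury={achowdhury:.3f}")
```

So the expected 0.863 and 1.0 do follow from the formula. As an aside, the reported 0.983871 equals 61/62 to six decimals, which would be consistent with a rank-based value (rank divided by the number of user associations) rather than this formula, though that is only a guess from the number itself.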
>>
>> So what's going on?
>>
>>
>>
>> On Aug 10, 2024, at 7:36 AM, Daniel M. Drucker <
>> ddruc...@mclean.harvard.edu> wrote:
>>
>> Here is what is confusing me, I guess. Look at the below. You can see that
>> some people have no usage and some people have a lot of usage. But their
>> FairShare values are all identical.
>>
>>
>> https://lists.schedmd.com/mailman3/hyperkitty/list/slurm-users@lists.schedmd.com/thread/I53OEJSNBT2BMXYVFEFHQQKKAHIUYA53/
>> seems to say that fairshare=parent should work just fine, but what I am
>> seeing is that it is NOT altering people's FairShare?
>>
>>
>
