We use the following relevant settings...

PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=00:02:00
PriorityMaxAge=3-0
PriorityWeightAge=0
PriorityWeightFairshare=2000000
PriorityWeightJobSize=1
PriorityWeightPartition=200
PriorityWeightQOS=1000000
PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
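For anyone trying to map those weights onto an actual job priority, here is a rough sketch of the weighted sum that the multifactor plugin documentation describes. The weights are the ones listed above; the factor values, the function name job_priority and the two example users are made up for illustration, and the site and association terms are left out for brevity. Treat it as a back-of-the-envelope aid, not as what slurmctld literally executes.

```python
# Weights copied from the settings above; everything else is illustrative.
WEIGHTS = {
    "age": 0,
    "fairshare": 2_000_000,
    "job_size": 1,
    "partition": 200,
    "qos": 1_000_000,
}
TRES_WEIGHTS = {"cpu": 1000, "mem": 2000, "gres/gpu": 3000}


def job_priority(factors, tres_factors=None, nice=0):
    """Weighted sum of normalized factors, each assumed to lie in [0.0, 1.0].

    Hypothetical helper: missing factors are treated as 0.0.
    """
    tres_factors = tres_factors or {}
    total = sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)
    total += sum(
        TRES_WEIGHTS[name] * tres_factors.get(name, 0.0) for name in TRES_WEIGHTS
    )
    return total - nice


# Two otherwise identical jobs with hypothetical fairshare factors.
idle_user = job_priority({"fairshare": 1.0, "qos": 0.5})
heavy_user = job_priority({"fairshare": 0.5, "qos": 0.5})
print(idle_user, heavy_user, idle_user - heavy_user)
```

With PriorityWeightAge=0 and the other weights this small, the fairshare and QOS terms dominate, so two jobs that end up with the same priority most likely have the same fairshare factor as well.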
...however, those settings don't provide any information about the account organization and the RawShares assignments to the accounts (and fairshare in the gpu partitions is yet another rathole...).

We do use the Fair Tree feature, which is required in our environment. It's what enables (I think) the proper division of shares among accounts and their subaccounts. It's been years since I've looked at this, so YMMV...

We have "condo" accounts, and a non-condo account called "default". When an investigator or group buys equipment, we create a SLURM account and QoS for it. We set the TRES limits on the QoS to 1.25x the number of cores and GB of memory in the purchase, but assign a RawShares value based on the actual number of cores purchased divided by the total cores in the cluster, multiplied by 1000 (to give a meaningful integer for RawShares - maybe we should bump that to 10000). The condo QoS priorities are set to 10000.

The "default" account is assigned a RawShares value based on the number of cores purchased by the university, and provides access to exploratory (no charge) and premium (more cores, higher Priority, but still a lot less than 10000). The default account is oversubscribed when comparing QoS TRES limits to RawShares, but that's OK, or... that's just the way it is. We want the condo accounts to have the most benefit from the FairShare mechanism.

So....

The root account has children. The root account does not have a RawShares assignment.
The default account is one child with the root account as parent.
The primary condo accounts are children of the root account. They have RawShares set based on purchased cores.
Some of the primary condo accounts, where the equipment was purchased by multiple investigator groups, have child condo accounts and QoSes, but without their own RawShares assignments.

With the FairTree mechanism, this gives us...
FairShare between condos (and the default account)...
FairShare within sub-account condos, as part of the parent condo...
FairShare within the leaf condo among users.

One of us obviously needs to diagram this...

regards,
s

On Sat, Aug 10, 2024 at 10:05 AM Drucker, Daniel <ddruc...@mclean.harvard.edu> wrote:

> And now, a few hours later - with no changes made - everyone has the same
> fairshare?
>
> $ sshare -l -a
> Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare GrpTRESMins TRESRunMins
> root 0.000000 63235972 0.000000 1.000000 cpu=188835,mem=1546941371,ene+
> root root 1 0.008264 0 0.000000 0.000000 1.000000 cpu=0,mem=0,energy=0,node=0,b+
> mic 120 0.991736 63235972 1.000000 1.000000 0.497120 cpu=188835,mem=1546941371,ene+
> mic aamedina parent 0.991736 2351906 0.037193 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic aaruldass parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic acataldo parent 0.991736 14637614 0.231476 1.000000 0.497120 cpu=188031,mem=1540350361,ene+
> mic achowdhury parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic ajajoo parent 0.991736 2053441 0.032473 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic ajanes parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic amandacao parent 0.991736 200 0.000003 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic aromer parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic aweerasek+ parent 0.991736 1048 0.000017 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic batwood parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic bleng parent 0.991736 3 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic cdemirlek parent 0.991736 6110 0.000097 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
> mic chun parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>
> I am so confused.
>
> On Aug 10, 2024, at 8:11 AM, Drucker, Daniel <ddruc...@mclean.harvard.edu> wrote:
>
> Hmm, no. That solved the problem of everyone having the same FairShare,
> but even after restarting slurmd and doing reconfigure, if I submit a job
> as someone with a huge usage and someone with zero usage, they both end up
> with the same Priority.
>
> On Aug 10, 2024, at 8:05 AM, Daniel M. Drucker <ddruc...@mclean.harvard.edu> wrote:
>
> I just set
> PriorityFlags=NO_FAIR_TREE
> and this seems to have solved the problem!
>
> On Aug 10, 2024, at 7:45 AM, Drucker, Daniel <ddruc...@mclean.harvard.edu> wrote:
>
> According to https://docs.rc.fas.harvard.edu/kb/fairshare/ and
> https://slurm.schedmd.com/SUG14/fair_tree.pdf :
>
> "The Fairshare score is calculated using the following formula.
> f = 2^(-EffectvUsage/NormShares)"
>
> This is clearly not happening on my system:
>
> Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare LevelFS GrpTRESMins TRESRunMins
> ...
> mic acataldo parent 0.991736 13066208 0.210193 0.210193 0.983871 cpu=169648,mem=1389757781,ene+
> mic achowdhury parent 0.991736 0 0.000000 0.000000 0.983871 cpu=0,mem=0,energy=0,node=0,b+
> ...
>
> Every user has 0.991736 NormShares.
> Acataldo has EffectvUsage = 0.210193
> Achowdhury has EffectvUsage = 0
>
> But both users have the same FairShare. The correct values according to
> the above formula would be 0.863 and 1.0 respectively.
>
> So what's going on?
>
> On Aug 10, 2024, at 7:36 AM, Daniel M. Drucker <ddruc...@mclean.harvard.edu> wrote:
>
> Here is what is confusing me I guess. Look at the below.
> You can see that some people have no usage and some people have a lot of
> usage. But their FairShare value is all identical.
>
> https://lists.schedmd.com/mailman3/hyperkitty/list/slurm-users@lists.schedmd.com/thread/I53OEJSNBT2BMXYVFEFHQQKKAHIUYA53/
> seems to say that fairshare=parent should work just fine, but what I am
> seeing is that it is NOT altering people's FairShare?
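For reference, the 0.863 and 1.0 figures in the quoted 7:45 AM message can be reproduced with a few lines of Python. This is only a sketch of the formula exactly as quoted there, f = 2^(-EffectvUsage/NormShares); the function name classic_fairshare is made up, and the inputs are the two rows from that message. It checks the arithmetic and nothing more: it says nothing about which algorithm a given Slurm configuration actually applies.

```python
def classic_fairshare(effective_usage, norm_shares):
    """Formula as quoted in the thread: f = 2^(-EffectvUsage / NormShares)."""
    return 2.0 ** (-effective_usage / norm_shares)


# NormShares and EffectvUsage values taken from the quoted sshare rows.
NORM_SHARES = 0.991736
for user, usage in (("acataldo", 0.210193), ("achowdhury", 0.0)):
    print(f"{user}: {classic_fairshare(usage, NORM_SHARES):.3f}")
```

Both users reporting 0.983871 therefore cannot come from applying this formula per user with those inputs, which is the discrepancy the quoted message is pointing at.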