...and there isn't actually just one account in your setup, is there? There should at least be a "root" and a "mic" account, I think.
I don't recall whether you'd sent the output of "sshare | head -15"...

On Sat, Aug 10, 2024 at 2:30 PM Fulcomer, Samuel <samuel_fulco...@brown.edu> wrote:

> We use the following relevant settings...
>
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityCalcPeriod=00:02:00
> PriorityMaxAge=3-0
> PriorityWeightAge=0
> PriorityWeightFairshare=2000000
> PriorityWeightJobSize=1
> PriorityWeightPartition=200
> PriorityWeightQOS=1000000
> PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
>
> ...however, that doesn't provide any information about the account
> organization and RawShares assignments to the accounts (and fairshare in
> the gpu partitions is yet another rathole....).
>
> We do use the Tree feature, which is required in our environment. It's
> what enables (I think) the proper division of share among accounts in
> subaccounts. It's been years since I've looked at this, so YMMV...
>
> We have "condo" accounts, and a non-condo account called "default". When
> an investigator or group buys equipment, we create a SLURM account and QoS
> for it. We actually set the TRES limits on the QoS to 1.25X the number of
> cores and GB memory of the purchase, but assign a RawShares value based on
> the actual number of cores purchased divided by the total cores in the
> cluster, multiplied by 1000 (to give a meaningful integer for RawShares -
> maybe we should bump that to 10000). The condo QoS Priorities are set to
> 10000.
>
> The "default" account is assigned a RawShares value based on the number of
> cores purchased by the university, and provides access to exploratory (no
> charge) and premium (more cores, higher Priority, but still a lot less than
> 10000). The default account is oversubscribed when comparing QoS TRES
> limits to RawShares, but that's OK, or... that's just the way it is. We
> want the condo accounts to have the most benefit from the FairShare
> mechanism.
>
> So....
>
> The root account has children.
> The root account does not have a RawShares assignment.
>
> The default account is one child with the root account as parent.
>
> The primary condo accounts are children of the root account. They have the
> RawShares set based on purchased cores.
>
> Some of the primary condo accounts where the equipment was purchased by
> multiple investigator groups have child condo accounts and QoS', but
> without their own RawShares assignments.
>
> With the FairTree mechanism, this gives us...
>
> FairShare between condos (and the default account)...
>
> FairShare within sub-account condos, as part of the parent condo...
>
> FairShare within the leaf condo among users.
>
> One of us obviously needs to diagram this...
>
> regards,
> s
>
> On Sat, Aug 10, 2024 at 10:05 AM Drucker, Daniel <ddruc...@mclean.harvard.edu> wrote:
>
>> And now, a few hours later - with no changes made - everyone has the same
>> fairshare?
>>
>> $ sshare -l -a
>> Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare GrpTRESMins TRESRunMins
>> root 0.000000 63235972 0.000000 1.000000 cpu=188835,mem=1546941371,ene+
>> root root 1 0.008264 0 0.000000 0.000000 1.000000 cpu=0,mem=0,energy=0,node=0,b+
>> mic 120 0.991736 63235972 1.000000 1.000000 0.497120 cpu=188835,mem=1546941371,ene+
>> mic aamedina parent 0.991736 2351906 0.037193 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic aaruldass parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic acataldo parent 0.991736 14637614 0.231476 1.000000 0.497120 cpu=188031,mem=1540350361,ene+
>> mic achowdhury parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic ajajoo parent 0.991736 2053441 0.032473 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic ajanes parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic amandacao parent 0.991736 200 0.000003 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic aromer parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic aweerasek+ parent 0.991736 1048 0.000017 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic batwood parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic bleng parent 0.991736 3 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic cdemirlek parent 0.991736 6110 0.000097 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>> mic chun parent 0.991736 0 0.000000 1.000000 0.497120 cpu=0,mem=0,energy=0,node=0,b+
>>
>> I am so confused.
>>
>> On Aug 10, 2024, at 8:11 AM, Drucker, Daniel <ddruc...@mclean.harvard.edu> wrote:
>>
>> Hmm, no. That solved the problem of everyone having the same FairShare,
>> but even after restarting slurmd and doing reconfigure, if I submit a job
>> as someone with a huge usage and someone with zero usage, they both end up
>> with the same Priority.
>>
>> On Aug 10, 2024, at 8:05 AM, Daniel M. Drucker <ddruc...@mclean.harvard.edu> wrote:
>>
>> I just set
>> PriorityFlags=NO_FAIR_TREE
>> and this seems to have solved the problem!
>>
>> On Aug 10, 2024, at 7:45 AM, Drucker, Daniel <ddruc...@mclean.harvard.edu> wrote:
>>
>> According to https://docs.rc.fas.harvard.edu/kb/fairshare/ and
>> https://slurm.schedmd.com/SUG14/fair_tree.pdf :
>>
>> "The Fairshare score is calculated using the following formula.
>> f = 2^(-EffectvUsage/NormShares)"
>>
>> This is clearly not happening on my system:
>>
>> Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare LevelFS GrpTRESMins TRESRunMins
>> ...
>> mic acataldo parent 0.991736 13066208 0.210193 0.210193 0.983871 cpu=169648,mem=1389757781,ene+
>> mic achowdhury parent 0.991736 0 0.000000 0.000000 0.983871 cpu=0,mem=0,energy=0,node=0,b+
>> ...
>>
>> Every user has 0.991736 NormShares.
>> Acataldo has EffectvUsage = 0.210193
>> Achowdhury has EffectvUsage = 0
>>
>> But both users have the same FairShare. The correct values according to
>> the above formula would be 0.863 and 1.0 respectively.
>>
>> So what's going on?
>>
>> On Aug 10, 2024, at 7:36 AM, Daniel M. Drucker <ddruc...@mclean.harvard.edu> wrote:
>>
>> Here is what is confusing me I guess. Look at the below. You can see that
>> some people have no usage and some people have a lot of usage. But their
>> FairShare value is all identical.
>>
>> https://lists.schedmd.com/mailman3/hyperkitty/list/slurm-users@lists.schedmd.com/thread/I53OEJSNBT2BMXYVFEFHQQKKAHIUYA53/
>> seems to say that fairshare=parent should work just fine, but what I am
>> seeing is that it is NOT altering people's FairShare?
>>
>> The information in this e-mail is intended only for the person to whom it
>> is addressed.
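For anyone wanting to recheck the arithmetic in the oldest message above, here is a minimal Python sketch. It assumes nothing beyond the formula quoted there, f = 2^(-EffectvUsage/NormShares), and the NormShares and EffectvUsage values from that sshare output. The function name classic_fairshare is made up for illustration and is not part of Slurm. It only reproduces the 0.863 and 1.0 that the quoted formula predicts; it says nothing about how sshare arrived at 0.983871.

```python
def classic_fairshare(effective_usage: float, norm_shares: float) -> float:
    """Evaluate f = 2^(-EffectvUsage / NormShares) as quoted in the thread.

    Hypothetical helper for illustration only; not a Slurm API.
    """
    if norm_shares <= 0:
        raise ValueError("norm_shares must be positive")
    return 2.0 ** (-effective_usage / norm_shares)


# Values copied from the sshare output in the thread.
NORM_SHARES = 0.991736
effective_usage_by_user = {"acataldo": 0.210193, "achowdhury": 0.0}

scores = {
    user: classic_fairshare(usage, NORM_SHARES)
    for user, usage in effective_usage_by_user.items()
}

for user, score in scores.items():
    print(f"{user}: {score:.3f}")
```

Both users come out different under this formula, which is the mismatch with the identical FairShare column that the original question was about.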
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com