Paul, you refer to banking resources, which leads me to ask: are schemes such as Gold still used with Slurm these days? Gold was a utility where groups could top up an account with a virtual amount of money, which was then spent as they consumed resources. Altair also wrote a similar system for PBS, which they offered to us when I was in Formula 1 - it was quite a good system, and at the time we had a requirement for allocating resources to groups of users.
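As far as I can tell, the nearest thing to that in stock Slurm now is the accounting limits, which can be made to behave like a crude bank. A minimal sketch, assuming limit enforcement is on (AccountingStorageEnforce=limits in slurm.conf) and using an account name I've invented:

    # Grant the account a "balance" of one million CPU-minutes.
    sacctmgr modify account astro_lab set GrpTRESMins=cpu=1000000

    # "Top up" later by raising the limit; consumption is tracked
    # automatically as jobs run under the account.
    sacctmgr modify account astro_lab set GrpTRESMins=cpu=1500000

My understanding is that for this to act as a true bank rather than a rolling limit you also have to stop usage from decaying (PriorityDecayHalfLife=0, with PriorityUsageResetPeriod deciding when the balance is reset), so it doesn't combine cleanly with the halflife-based fairshare setups described below.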
I guess the sophisticated fairshare mechanisms discussed in this thread make schemes like Gold obsolete. (For anyone who wants to see what the settings mentioned below look like in practice, I've put a couple of rough config sketches at the very bottom, under the quoted thread.)

On Thu, 20 Jun 2019 at 15:24, Paul Edmon <ped...@cfa.harvard.edu> wrote:

> People will specify which partition they need, or if they want multiple they use this:
>
> #SBATCH -p general,shared,serial_requeue
>
> The scheduler will then just select which partition they will run in first. Naturally there is a risk that you will end up running in a more expensive partition.
>
> Our time limit is only applied to our public partitions; our owned partitions (of which we have roughly 80) have no time limit. So if they run on their dedicated resources they have no penalty. We've been working on getting rid of owned partitions and moving to school/department-based partitions, where all the purchased resources for different PIs go into the same bucket and compete against themselves rather than the wider community. We've found that this ends up working pretty well, as most PIs only use their purchased resources sporadically. Thus there are usually idle cores lying around that we backfill with our serial queues. Since those are requeueable, we can get an immediate response to access that idle space. We are also toying with a high-priority partition that is open to people with high fairshare, so that they can get an immediate response, as those with high fairshare tend to be bursty users.
>
> Our current halflife is set to a month and we keep 6 months of data in our database. I'd actually like to get rid of the halflife and just go to a 3-month moving window to allow people to bank their fairshare, but we haven't done that yet as people have been having a hard enough time understanding our current system. It's not due to its complexity but more that most people just flat out aren't cognizant of their usage and think the resource is functionally infinite.
>
> -Paul Edmon-
>
> On 6/19/19 5:16 PM, Fulcomer, Samuel wrote:
>
> Hi Paul,
>
> Thanks. Your setup is interesting. I see that you have your processor types segregated in their own partitions (with the exception of the requeue partition), and that's how you get at the weighting mechanism. Do you have your users explicitly specify multiple partitions in their batch commands/scripts in order to take advantage of this, or do you use a plugin for it?
>
> It sounds like you don't impose any hard limit on simultaneous resource use, and allow everything to fairshare out with the help of the 7-day TimeLimit. We haven't been imposing any TimeLimit on our condo users, which would be an issue for us with your config. For our exploratory and priority users, we impose an effective time limit with GrpTRESRunMins=cpu (and gres/gpu= for the GPU usage). In addition, since we have so many priority users, we don't explicitly set a rawshare value for them (they all execute under the "default" account). We set rawshare for the condo accounts as cores-purchased/total-cores*1000.
>
> What's your fairshare decay setting (I don't remember the proper name at the moment)?
>
> Regards,
> Sam
>
> On Wed, Jun 19, 2019 at 3:44 PM Paul Edmon <ped...@cfa.harvard.edu> wrote:
>
>> We do a similar thing here at Harvard:
>>
>> https://www.rc.fas.harvard.edu/fairshare/
>>
>> We simply weight all the partitions based on their core type and then we allocate Shares for each account based on what they have purchased. We don't use QoS at all, so we just rely purely on fairshare weighting for resource usage.
>> It has worked pretty well for our purposes.
>>
>> -Paul Edmon-
>>
>> On 6/19/19 3:30 PM, Fulcomer, Samuel wrote:
>>
>> (...and yes, the name is inspired by a certain OEM's software licensing schemes...)
>>
>> At Brown we run a ~400-node cluster containing nodes of multiple architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade), purchased in some cases with University funds and in others with investigator funding (~50:50). They all appear in the default SLURM partition. We have 3 classes of SLURM users:
>>
>> 1. Exploratory - no-charge access to up to 16 cores.
>> 2. Priority - $750/quarter for access to up to 192 cores (and with a GrpTRESRunMins=cpu limit). Each user has their own QoS.
>> 3. Condo - an investigator group who paid for nodes added to the cluster. The group has its own QoS and SLURM Account. The QoS allows use of the number of cores purchased and has a much higher priority than the QoSes of the "priority" users.
>>
>> The first problem with this scheme is that condo users who purchased the older hardware now have access to the newest without penalty. In addition, we're encountering resistance to the idea of turning off their hardware and terminating their condos (despite MOUs stating a 5-year life). The pushback is the stated belief that the hardware should run until it dies.
>>
>> What I propose is a new TRES called a Processor Performance Unit (PPU) that would be specified on the Node line in slurm.conf and used such that GrpTRES=ppu=N was calculated as the number of allocated cores multiplied by their associated PPU numbers.
>>
>> We could then assign a base PPU to the oldest hardware, say "1" for Sandy/Ivy, and increase it for later architectures based on performance improvement. We'd set the condo QoS to GrpTRES=ppu=N*X+M*Y,..., where N is the number of cores of the oldest architecture and X is its configured PPU per core, repeating for any newer nodes/cores the investigator has purchased since.
>>
>> The result is that the investigator group gets to run on an approximation of the performance they've purchased, rather than on the raw purchased core count.
>>
>> Thoughts?
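Stepping back out of the quoted thread: for anyone following along who hasn't configured these knobs before, here is roughly what the settings Paul and Sam mention look like. The account names and numbers are invented, so please read this as a sketch rather than either site's actual configuration.

    # slurm.conf - the decay setting Sam asked about; Paul's site uses a
    # one-month halflife on historical usage.
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=30-0

    # Sam's "effective time limit": limit the CPU-minutes (and GPU-minutes)
    # outstanding across the account's running jobs, based on their time limits.
    sacctmgr modify account exploratory set GrpTRESRunMins=cpu=1440000,gres/gpu=28800

    # Rawshare for a condo account per Sam's formula
    # (cores-purchased / total-cores * 1000), e.g. 512 of 12800 cores -> 40.
    sacctmgr modify account some_condo set Fairshare=40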
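And on the PPU proposal itself: since the architectures are already split into separate partitions at both sites, I wonder whether something close could be approximated today with per-partition billing weights and a limit on the billing TRES, rather than a brand-new TRES. A rough, untested sketch (the weights, node names and numbers are made up):

    # slurm.conf: charge each partition's cores according to the relative
    # performance of its architecture, with Sandy/Ivy as the 1.0 baseline.
    PartitionName=sandy   Nodes=sandy[001-100] TRESBillingWeights="CPU=1.0"
    PartitionName=haswell Nodes=has[001-100]   TRESBillingWeights="CPU=1.4"
    PartitionName=cascade Nodes=cas[001-100]   TRESBillingWeights="CPU=1.8"

    # Condo QoS: cap the group at the "performance units" it purchased,
    # e.g. 192 Sandy-era cores x 1.0 = 192 billing units.
    sacctmgr modify qos condo_groupA set GrpTRES=billing=192

It's not as tidy as a PPU column on the Node line - billing weights are per partition rather than per node - but it would let a condo group spill onto newer hardware while being charged proportionally more of its allocation for doing so.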