Hey Samuel,

Can't you just adjust the existing "cpu" limit numbers using those same multipliers? Someone bought 100 CPUs 5 years ago, now that's ~70 CPUs.
Or vice versa, someone buys 100 CPUs today, they get a setting of 130 CPUs because the CPUs are normalized to the old performance. It would probably look bad politically to reduce someone's number, but giving a new customer a larger number should be fine.

Regards,
Alex

On Wed, Jun 19, 2019 at 12:32 PM Fulcomer, Samuel <samuel_fulco...@brown.edu> wrote:
>
> (...and yes, the name is inspired by a certain OEM's software licensing
> schemes...)
>
> At Brown we run a ~400 node cluster containing nodes of multiple
> architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased in
> some cases by University funds and in others by investigator funding
> (~50:50). They all appear in the default SLURM partition. We have 3
> classes of SLURM users:
>
> 1. Exploratory - no-charge access to up to 16 cores
> 2. Priority - $750/quarter for access to up to 192 cores (and with a
>    GrpTRESRunMins=cpu limit). Each user has their own QoS
> 3. Condo - an investigator group who paid for nodes added to the
>    cluster. The group has its own QoS and SLURM Account. The QoS allows
>    use of the number of cores purchased and has a much higher priority
>    than the QoSes of the "priority" users.
>
> The first problem with this scheme is that condo users who have purchased
> the older hardware now have access to the newest without penalty. In
> addition, we're encountering resistance to the idea of turning off their
> hardware and terminating their condos (despite MOUs stating a 5yr life).
> The pushback is the stated belief that the hardware should run until it
> dies.
>
> What I propose is a new TRES called a Processor Performance Unit (PPU)
> that would be specified on the Node line in slurm.conf, and used such that
> GrpTRES=ppu=N was calculated as the number of allocated cores multiplied by
> their associated PPU numbers.
>
> We could then assign a base PPU to the oldest hardware, say, "1" for
> Sandy/Ivy and increase for later architectures based on performance
> improvement.
> We'd set the condo QoS to GrpTRES=ppu=N*X+M*Y,..., where N is
> the number of cores of the oldest architecture, X is the configured
> PPU/core for that architecture, and M*Y and any later terms repeat this
> for any newer nodes/cores the investigator has purchased since.
>
> The result is that the investigator group gets to run on an approximation
> of the performance that they've purchased, rather than on the raw
> purchased core count.
>
> Thoughts?
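
For anyone who wants to see the arithmetic side by side, here is a rough Python sketch of both ideas: the proposed N*X+M*Y PPU limit, and the alternative of folding the same multipliers into a plain "cpu" limit. The per-core weights (1.0 / 1.15 / 1.3), the architecture keys, and both function names are made up for illustration. The thread only fixes Sandy/Ivy at 1, and "ppu" is the proposed TRES, not something Slurm provides today.

```python
# Hypothetical per-core PPU weights. Only the Sandy/Ivy baseline of 1 comes
# from the thread; the other two values are placeholders, not measurements.
PPU_PER_CORE = {
    "sandy_ivy": 1.0,
    "haswell_broadwell": 1.15,
    "sky_cascade": 1.3,
}


def ppu_limit(purchased_cores):
    """Return N*X + M*Y + ... for a dict of {architecture: cores purchased}."""
    return sum(cores * PPU_PER_CORE[arch] for arch, cores in purchased_cores.items())


def normalized_cpu_limit(cores, arch, baseline="sandy_ivy"):
    """Express a purchase as a plain core count on the baseline architecture."""
    return round(cores * PPU_PER_CORE[arch] / PPU_PER_CORE[baseline])


# A condo that bought 128 Sandy/Ivy cores and later 64 Sky/Cascade cores.
condo = {"sandy_ivy": 128, "sky_cascade": 64}

# Proposed scheme: a limit on the (not yet existing) ppu TRES.
print(f"GrpTRES=ppu={round(ppu_limit(condo))}")

# Alternative: scale the existing cpu limit, normalized to the oldest hardware.
print(f"GrpTRES=cpu={normalized_cpu_limit(100, 'sky_cascade')}")
```

With these placeholder weights, 100 cores bought today come out as a cpu limit of 130 when normalized to the oldest hardware, which matches the "100 becomes 130" example above. Normalizing in the other direction (baseline="sky_cascade") shrinks an old purchase instead.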