Dear slurm-user list,

I am currently investigating ways to evaluate Slurm's cloud scheduling performance. As we are all aware, there are many knobs to tune when it comes to cloud scheduling.
We can change the regular scheduling (prioritizing, ...) as well as the power-up and power-down times, and there are probably many more.

However, my question today is not about improving cloud scheduling performance, but about how to collect data such as: When were nodes powered up (or down)? To what degree were the powered-up machines used? Were the "right" instances started for the given jobs, or were larger instances started than needed? ...

I know that this question is still very open-ended; I am trying to narrow down where I have to look. The final goal is, of course, to use this evaluation to pick better timeout values and improve cloud scheduling.

Best regards,
Xaver Stiensmeier