Thanks, Ole, for giving so much thought to my question. I'll pass along these suggestions. Unfortunately, as a user there's not a whole lot I can do about the choice of hardware.
Thanks for the link to the guide, I'll have a look at it. Even as a user it's helpful to be well informed on the admin side :)

Regards,
Guillaume

On Tue, Aug 27, 2019 at 4:26 AM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
> Hi Guillaume,
>
> The performance of the slurmctld server depends strongly on the server
> hardware on which it is running! This should be taken into account when
> considering your question.
>
> SchedMD recommends that the slurmctld server should have only a few, but
> very fast CPU cores, in order to ensure the best responsiveness.
>
> The file system for /var/spool/slurmctld/ should be mounted on the
> fastest possible disks (SSD or NVMe if possible).
>
> You should also read the Large Cluster Administration Guide at
> https://slurm.schedmd.com/big_sys.html
>
> Furthermore, it may perhaps be a good idea to have the MySQL database
> server installed on a separate server so that it doesn't slow down the
> slurmctld.
>
> Best regards,
> Ole
>
> On 8/27/19 9:45 AM, Guillaume Perrault Archambault wrote:
> > Hi Paul,
> >
> > Thanks a lot for your suggestion.
> >
> > The cluster I'm using has thousands of users, so I'm doubtful the admins
> > will change this setting just for me. But I'll mention it to the support
> > team I'm working with.
> >
> > I was hoping more for something that can be done on the user end.
> >
> > Is there some way for the user to measure whether the scheduler is in
> > RPC saturation? And then if it is, I could make sure my script doesn't
> > launch too many jobs in parallel.
> >
> > Sorry if my question is too vague; I don't understand the backend of the
> > SLURM scheduler too well, so my questions are using the limited
> > terminology of a user.
> >
> > My concern is just to make sure that my scripts don't send out more
> > commands (simultaneously) than the scheduler can handle.
> >
> > For example, as an extreme scenario, suppose a user forks off 1000
> > sbatch commands in parallel, is that more than the scheduler can handle?
> > As a user, how can I know whether it is?
> >
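[Editor's note on the RPC-saturation question above: Slurm does ship a user-runnable diagnostic, `sdiag`, whose output includes the controller's "Server thread count" and "Agent queue size" fields. A minimal sketch of using it to throttle a submission loop follows; the `sdiag` field name is real, but the threshold, sleep interval, and helper function names are illustrative choices, not SchedMD recommendations.]

```shell
#!/bin/sh
# Extract the "Agent queue size" value from sdiag output on stdin.
# This field counts RPCs that slurmctld has had to queue for deferred
# delivery; a value that stays high suggests the controller is saturated.
agent_queue_size() {
    awk -F': *' '/Agent queue size/ {print $2}'
}

# Throttled submission (hypothetical helper): before each sbatch call,
# wait until the agent queue drops below a threshold.
submit_throttled() {
    script="$1"
    max_queue="${2:-10}"   # illustrative threshold, not a Slurm default
    while [ "$(sdiag | agent_queue_size)" -gt "$max_queue" ]; do
        sleep 5            # back off while slurmctld catches up
    done
    sbatch "$script"
}

# Usage: submit many job scripts without flooding the controller.
# for job in job_*.sh; do submit_throttled "$job"; done
```

`sdiag` also breaks down RPC counts by message type, which can show which calls from a user's scripts dominate the controller's load.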