Hi Guillaume,

The performance of the slurmctld server depends strongly on the hardware it runs on, so this should be taken into account when considering your question.

SchedMD recommends that the slurmctld server have a few very fast CPU cores rather than many slower ones, in order to ensure the best responsiveness.

The file system for /var/spool/slurmctld/ should be mounted on the fastest possible disks (SSD or NVMe if possible).
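As a quick sanity check, you can look at which device actually backs that directory. A minimal sketch (note that /var/spool/slurmctld is only the common default; the real path is whatever StateSaveLocation is set to in slurm.conf):

```shell
# Show which filesystem/device backs the slurmctld state directory.
# /var/spool/slurmctld is a placeholder default; adjust to your StateSaveLocation.
STATEDIR=${STATEDIR:-/var/spool/slurmctld}
df -h "$STATEDIR" 2>/dev/null || df -h /
# ROTA=0 in lsblk output means a non-rotational device (SSD/NVMe).
lsblk -d -o NAME,ROTA,TYPE 2>/dev/null || true
```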

You should also read the Large Cluster Administration Guide at https://slurm.schedmd.com/big_sys.html

Furthermore, it may be a good idea to run the MySQL database server on a separate host so that it doesn't slow down slurmctld.
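For reference, moving the database off the controller mostly comes down to pointing slurmdbd at the remote host. A minimal sketch of the relevant slurmdbd.conf lines, assuming a dedicated database host (the hostname and credentials below are placeholders, not recommendations):

```
# slurmdbd.conf (fragment) -- point slurmdbd at a dedicated MySQL host.
# "db.example.com" and the credentials are placeholders; use your own.
StorageType=accounting_storage/mysql
StorageHost=db.example.com
StoragePort=3306
StorageUser=slurm
StoragePass=changeme
StorageLoc=slurm_acct_db
```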

Best regards,
Ole

On 8/27/19 9:45 AM, Guillaume Perrault Archambault wrote:
Hi Paul,

Thanks a lot for your suggestion.

The cluster I'm using has thousands of users, so I'm doubtful the admins will change this setting just for me. But I'll mention it to the support team I'm working with.

I was hoping more for something that can be done on the user end.

Is there some way for the user to measure whether the scheduler is in RPC saturation? And then if it is, I could make sure my script doesn't launch too many jobs in parallel.

Sorry if my question is too vague; I don't understand the backend of the SLURM scheduler too well, so my questions use the limited terminology of a user.

My concern is just to make sure that my scripts don't send out more commands (simultaneously) than the scheduler can handle.

For example, as an extreme scenario, suppose a user forks off 1000 sbatch commands in parallel. Is that more than the scheduler can handle? And as a user, how can I know whether it is?
