On 10/24/22 09:57, Diego Zuccato wrote:
On 24/10/2022 09:32, Ole Holm Nielsen wrote:

 > It is definitely a BAD idea to store Slurm StateSaveLocation on a slow
 > NFS directory!  SchedMD recommends to use local NVME or SSD disks
 > because there will be many IOPS to this file system!

IIUC it does have to be shared between controllers, right?

Possibly use an NVME-backed (or, even better, NVDIMM-backed) NFS share. Or a replica-3 Gluster volume with NVDIMMs for the bricks, for the paranoid :)

IOPS is the key parameter! Local NVME or SSD should beat any networked storage. The original question refers to having StateSaveLocation on a standard (slow) NFS drive, AFAICT.

I don't know how many people prefer to run 2 slurmctld hosts (primary and backup). I certainly don't. Slurm does have a configurable SlurmctldTimeout parameter, so you can reboot the server quickly when needed.
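
As a rough sketch of the single-controller setup described above, a slurm.conf fragment might look like the following. The hostname and the path are hypothetical placeholders, and the timeout shown is simply the documented default. The exact values would depend on the site.

```
# slurm.conf (fragment, illustrative only)
SlurmctldHost=ctl1                        # hypothetical hostname of the single controller
#SlurmctldHost=ctl2                       # a backup controller would need a StateSaveLocation both hosts can reach
StateSaveLocation=/var/spool/slurmctld    # hypothetical path, assumed to sit on local NVME or SSD
SlurmctldTimeout=120                      # seconds; 120 is the documented default
```

With only one SlurmctldHost line, the state directory does not need to be shared, so it can stay on local disk.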

It would be nice if people with experience in HA storage for slurmctld could comment.

/Ole
