Hello, everyone.

I'm also fairly new to Slurm and still in the conceptual phase rather than in testing or production. Currently I am trying to find out which files and directories should be created on the local host and which in a network directory.
I'm a little confused by the descriptions in the slurm.conf manpage.
For example, JobCheckpointDir should be accessible from both the primary and the backup controller. It seems clear (at least I believe so) that this has to be in a network directory, for example: if the primary controller goes down, the backup controller must still be able to access it. On the other hand, SlurmctldPidFile should also be available on both the primary and the backup controller. Since it usually lives in /var/run, I assume that this should be a local path, and it should then be unique to each controller. The manpage is not quite clear in its description. What about SlurmctldLogFile, for example? Theoretically, both controllers could write to the same file.

If anyone has any advice or would like to tell me how this was solved at your site, I would be very happy.


best
Marcus

--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
