On 10-12-2023 17:29, Ryan Novosielski wrote:
This is basically always somebody filling up /tmp while /tmp resides on the same
filesystem as the actual SlurmdSpoolDirectory.
/tmp, without modifications, is almost certainly the wrong place for
temporary HPC files; they are simply too large for it.
Agreed! That's why we maintain /tmp as a separate partition on all nodes, to
mitigate this exact scenario. It doesn't necessarily need to be part of the
primary system RAID; there is no need for /tmp resiliency.
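Something along these lines in /etc/fstab is enough (device name, filesystem
type, and mount options here are just placeholders, use whatever local disk
fits your nodes):

    # example only - separate local partition for /tmp
    /dev/sdb1   /tmp   xfs   defaults,nodev,nosuid   0 0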
Regards,
Peter
Peter Goode
Research Computing Systems Administrator
Lafayette College
Hello Brian Andrus,
we ran 'df -h' to determine the amount of free space I mentioned below.
I should also add that at the time we inspected the node, there was
still around 38 GB of space left. However, we were unable to watch the
remaining space while the error occurred, so maybe the large file(s) had
already been removed again by the time we looked.
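Something like the following would let us watch it live next time (the
interval is arbitrary and the second path is just wherever SlurmdSpoolDir
points on the workers):

    watch -n 5 'df -h /tmp /var/spool/slurmd'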
Xaver,
It is likely your /var or /var/spool mount.
That may be a separate partition or part of your root partition. It is
the partition that is full, not the directory itself, so the cause could
very well be log files in /var/log. I would check to see which (if any)
partitions are getting filled.
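For example (the paths are just the usual suspects, adjust to your layout):

    df -h /var /var/spool /var/log /tmp
    du -xsh /var/log/* 2>/dev/null | sort -h | tail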
Dear slurm-user list,
during a larger cluster run (the same one I mentioned earlier, 242 nodes), I
got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a
directory on the workers that is used for job state information
(https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir). How can this
happen, and what is the recommended way to avoid it?
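For context, the directory in question can be located on a node with
something like this (the grep pattern is just the option name; the df path
should be replaced with whatever the first command reports):

    scontrol show config | grep -i SlurmdSpoolDir
    df -h /var/spool/slurmd   # replace with the path shown above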
Maybe this was a noob question; I've just solved my problem.
I'll share my thoughts. I returned to my original settings
and reran the Ansible playbook, reconfiguring the SlurmdSpoolDir.
* https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir_1
Maybe it is writable by root, because root can write to it regardless of
the permission bits.
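For anyone who runs into the same thing, the relevant pieces look roughly
like this (the path is only an example; see the OPT_SlurmdSpoolDir docs
linked above):

    # slurm.conf on the compute nodes - path is illustrative
    SlurmdSpoolDir=/var/spool/slurmd

    # the directory must exist and be writable by the user slurmd runs as (normally root)
    mkdir -p /var/spool/slurmd
    chown root:root /var/spool/slurmd
    chmod 755 /var/spool/slurmd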
Dear Slurm Users,
recently, I have started a new instance of my cluster with Slurm 22.05.2
(built from source). Everything seems to be configured properly and
working fine except "sbatch". The error is quite self-explanatory, and
I thought it would be easy to fix the directory permissions.
slur