<< wrote:
> Hi Robert,
>
> On 2/23/24 17:38, Robert Kudyba via slurm-users wrote:
>
> > We switched over from using systemctl for tmp.mount and changed to zram,
> > e.g.,
> > modprobe zram
> > echo 20GB > /sys/block/zram0/disksize
> > mkfs.xfs /dev/zram0
> > mount -o discard /dev/zram0 /tmp
> [...]
On 24/2/24 06:14, Robert Kudyba via slurm-users wrote:
> For now I just set it to chmod 777 on /tmp and that fixed the errors. Is
> there a better option?

Traditionally /tmp and /var/tmp have been 1777 (that "1" being the
sticky bit, originally invented to indicate that the OS should attempt
to
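
To make the 777 versus 1777 distinction concrete, here is a minimal sketch. It
assumes GNU coreutils (for `stat -c`) and uses a scratch directory as a
hypothetical stand-in for the freshly mounted /tmp, so it can be tried without
root. On a real node the equivalent step would presumably be a `chmod 1777 /tmp`
run as root right after the mount command quoted earlier in the thread.

```shell
# Sketch only: demo_dir is a hypothetical stand-in for the new /tmp mount point.
demo_dir=$(mktemp -d)

# 1777 = world-writable (777) plus the sticky bit (the leading 1), so users
# can create files but cannot delete or rename files owned by others.
chmod 1777 "$demo_dir"

# Read the octal mode back to confirm the sticky bit took effect (GNU stat).
mode=$(stat -c '%a' "$demo_dir")
echo "mode of $demo_dir: $mode"

# Clean up the scratch directory.
rmdir "$demo_dir"
```

With plain 777 the directory is still writable by everyone, but any user can
remove another user's files, which is the reason /tmp traditionally carries the
sticky bit.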
The scontrol subcommands uhold/hold/release/requeuehold are not mentioned in
FAQ 21, which describes how to place a job on hold, and the FAQ never explains
why its method is the best one; it simply states that it is. Does
anyone know why the FAQ method is better than using the s
Now what would be causing this? The srun just hangs and these are the only
logs from slurmctld:
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node007
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node006
[2024-02-24T23:23:26.003] error: Orph