We're in the process of installing some racks with Lenovo SD665 V3 [1]
water-cooled servers. A Lenovo DW612S chassis contains six 1U trays, with
two SD665 V3 servers mounted side-by-side in each tray.
Lenovo delivers the SD665 V3 servers with water-cooled NVIDIA InfiniBand
"SharedIO" adapters [2]
Dear slurm-user list,
I have a cloud node that is powered up and down on demand. Rarely, it can
happen that Slurm's ResumeTimeout is reached and the node is therefore
powered down. We have set ReturnToService=2 in order to avoid the node
being marked DOWN, because the instance behind that node is
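For context, a minimal slurm.conf sketch of this kind of cloud power-saving
setup; the node name, program paths and values below are placeholders rather
than anything taken from the message, and only ResumeTimeout and
ReturnToService=2 correspond to what is described above:
SuspendProgram=/usr/local/sbin/cloud_suspend.sh   # placeholder: powers the instance off
ResumeProgram=/usr/local/sbin/cloud_resume.sh     # placeholder: starts the instance
SuspendTime=600        # example: power a node down after 10 minutes idle
ResumeTimeout=900      # seconds a resumed node has to register before Slurm gives up on it
ReturnToService=2      # a DOWN node becomes usable again once it registers with a valid config
NodeName=cloud[01-04] CPUs=16 RealMemory=64000 State=CLOUD   # example cloud node definition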
Hello!
In our current cluster the workflows are quite diverse (a bunch of large,
long (24-72 h) jobs; medium-size <4 h jobs; and many small 1-node jobs). The
current priority is fair share only (averaged over a timescale of a few months).
For the new setup we would like to
(1) discourage the 1-node jobs [espe
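As a rough sketch of one way to approach (1) with the multifactor priority
plugin (the weights below are made up for illustration; the idea is that
PriorityFavorSmall=NO plus a non-zero job-size weight makes multi-node jobs
score higher than 1-node jobs, while fair share stays the dominant factor):
PriorityType=priority/multifactor
PriorityDecayHalfLife=30-0       # fair-share usage half-life of roughly a month; tune to the averaging window
PriorityWeightFairshare=100000   # keep fair share as the dominant factor
PriorityWeightJobSize=10000      # add a job-size contribution
PriorityFavorSmall=NO            # larger allocations get the higher job-size factor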
That's a very interesting design. Looking at the SD665 V3 documentation,
am I correct that each node has dual 25 Gb/s SFP28 interfaces?
If so, then despite the dual nodes in a 1U configuration, you actually have
two separate servers?
Sid
On Fri, 23 Feb 2024, 22:40 Ole Holm Nielsen via slurm-users, <
slurm-u
We switched over from using systemctl for tmp.mount and changed to zram,
e.g.,
modprobe zram
echo 20GB > /sys/block/zram0/disksize
mkfs.xfs /dev/zram0
mount -o discard /dev/zram0 /tmp
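Purely as a sketch, the same commands with one step added that is not in the
list above: a freshly created XFS root directory is mode 0755 and owned by
root, while /tmp normally needs the world-writable sticky mode 1777, so
anything that writes under /tmp can break without it; on RHEL 9 with SELinux
the context may also need restoring.
modprobe zram
echo 20GB > /sys/block/zram0/disksize   # size the zram block device
mkfs.xfs /dev/zram0                     # fresh filesystem; its root directory is mode 0755
mount -o discard /dev/zram0 /tmp
chmod 1777 /tmp                         # restore the sticky, world-writable mode /tmp needs
restorecon /tmp                         # reset the SELinux context (tmp_t) on RHEL 9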
srun with --x11 was working before this change. We're on RHEL 9.
The slurmctld logs show this whenever --x11 is used
Hi Robert,
On 2/23/24 17:38, Robert Kudyba via slurm-users wrote:
> We switched over from using systemctl for tmp.mount and changed to zram,
> e.g.,
> modprobe zram
> echo 20GB > /sys/block/zram0/disksize
> mkfs.xfs /dev/zram0
> mount -o discard /dev/zram0 /tmp
[...]
> [2024-02-23T20:26:15.881] [530.exter