Re: [slurm-users] "Low RealMem" after upgrade

2021-10-05 Thread Diego Zuccato
On 05/10/2021 09:22, Ole Holm Nielsen wrote:
> What is a "frontend"? Do you mean the slurmctld server?
Yes, sorry. "Frontend" is what we call the node(s) used by users to submit jobs, where slurmctld and slurmdbd run. We'll probably move slurmdbd and slurmctld to a dedicated VM in a future

Re: [slurm-users] "Low RealMem" after upgrade

2021-10-05 Thread Ole Holm Nielsen
On 10/5/21 8:05 AM, Diego Zuccato wrote:
> I already tried multiple times, both RESUME and IDLE, and it didn't work: it just returned to "IDLE+DRAIN" with 'Reason="low realmem"'. :( I just tried again (after an unplanned shutdown of the frontend) and it
What is a "frontend"? Do you mean the slu

Re: [slurm-users] "Low RealMem" after upgrade

2021-10-04 Thread Diego Zuccato
Hi. I already tried multiple times, both RESUME and IDLE, and it didn't work: it just returned to "IDLE+DRAIN" with 'Reason="low realmem"'. :( I just tried again (after an unplanned shutdown of the frontend) and it worked with IDLE (RESUME gives "Invalid node state specified"). SLURM 20.11.4.

Re: [slurm-users] "Low RealMem" after upgrade

2021-10-01 Thread Paul Brunk
ting Resource Center
Enterprise IT Svcs, the University of Georgia
-Original Message-
From: slurm-users On Behalf Of Diego Zuccato
Sent: Friday, October 1, 2021 04:23
To: Slurm User Community List
Subject: [slurm-users] "Low RealMem" after upgrade
[EXTERNAL SENDER - PROCEED CA

Re: [slurm-users] "Low RealMem" after upgrade

2021-10-01 Thread Brian Andrus
Not unusual. You should set your amount of memory a bit below what slurmd reports. Different kernel modules that get upgraded may use a little more memory, causing just this situation. There are other causes as well, but by providing the kernel/system some wiggle room, you prevent any issues.
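To make the "a bit below what slurmd reports" advice concrete, here is a minimal sketch in Python. It assumes you feed it the text printed by `slurmd -C` on the node, which includes a `RealMemory=` field in MB. The helper names, the 2% headroom, the 64 MB floor, and the sample node line are all illustrative assumptions, not values recommended anywhere in this thread; pick whatever margin suits your nodes.

```python
import re


def parse_real_memory(slurmd_c_output: str) -> int:
    """Extract the RealMemory value (MB) from `slurmd -C` style output."""
    match = re.search(r"\bRealMemory=(\d+)", slurmd_c_output)
    if match is None:
        raise ValueError("no RealMemory= field found in the given text")
    return int(match.group(1))


def suggest_real_memory(detected_mb: int,
                        headroom_percent: int = 2,
                        min_headroom_mb: int = 64) -> int:
    """Return a RealMemory value slightly below the detected one.

    headroom_percent and min_headroom_mb are assumed defaults for
    illustration only; integer arithmetic avoids rounding surprises.
    """
    headroom = max(detected_mb * headroom_percent // 100, min_headroom_mb)
    if headroom >= detected_mb:
        raise ValueError("headroom is larger than the detected memory")
    return detected_mb - headroom


# Hypothetical sample of what `slurmd -C` might print on a node.
sample = ("NodeName=node01 CPUs=16 Boards=1 SocketsPerBoard=2 "
          "CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64000")

detected = parse_real_memory(sample)
print(f"RealMemory={suggest_real_memory(detected)}")
```

The printed value would then go into the node's `RealMemory=` entry in slurm.conf, so a small drop in detected memory after a kernel or slurmd upgrade no longer pushes the node below its configured value.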

[slurm-users] "Low RealMem" after upgrade

2021-10-01 Thread Diego Zuccato
Hello all. I just upgraded to Debian 11 that brings Slurm 21.08 and the newer nodes upgraded w/o too many issues (just minor config changes, one being RealMemory value in slurm.conf, since for some reason it seems the new slurmd detects about 12MB less memory than before). But the older node