On 19.02.2025 14:06, Luke Sudbery via slurm-users wrote:
How much RAM does your laptop have? How much have you told slurm it has?
How much is needed by the system? Does your task actually need 2GB?
Also your CPU/cores/threads counts don't appear to make sense.
It's probably one of those newer
On 27/01/2025 12:00, karl--- via slurm-users wrote:
Hello,
Last weekend, I updated my server to Ubuntu 24.04 and instead of using
my own compiled version of slurm, I switched to
the version (23.11.4) that comes with Ubuntu 24.04. Since then, I have
been having problems with slurmdbd.
If th
On 25.06.2024 17:54, stth via slurm-users wrote:
Hi Timo,
Thanks, The old data wasn’t important so I did that. I changed the line
as follows in the
/usr/lib/systemd/system/slurmctld.service :
ExecStart=/usr/sbin/slurmctld --systemd -i $SLURMCTLD_OPTIONS
You should be able to immediately remo
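Rather than editing the packaged unit file under /usr/lib (which a package upgrade can overwrite), the same change can be made as a systemd drop-in override. This is a sketch of that approach, not the poster's exact setup; the empty ExecStart= line is required to clear the packaged value before replacing it:

```
# /etc/systemd/system/slurmctld.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/slurmctld --systemd -i $SLURMCTLD_OPTIONS
```

Apply it with `systemctl daemon-reload && systemctl restart slurmctld`, and drop the `-i` flag again once slurmctld has started cleanly.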
On 25/06/2024 12:20, stth via slurm-users wrote:
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: fatal: Can not
recover last_conf_lite, incompatible version, (9472 not between 9728 and
10240), start with '-i' to ignore this. Warning: using -i will lose the
data that can't be recovered.
Se
On 20/06/2024 10:57, hermes via slurm-users wrote:
Hello,
I’d like to ask if there is any safe method to rename an existing slurm
user to a new username with the same uid?
As for Linux itself, it’s quite common to have two users share the same uid.
So if we already have 2 system users, for exampl
On 14.06.2024 17:51, Rafał Lalik via slurm-users wrote:
Hello,
I have encountered issues with running slurmctld.
From logs, I see these errors:
[2024-06-14T17:37:57.587] slurmctld version 24.05.0 started on cluster
laura
[2024-06-14T17:37:57.587] error: plugin_load_from_file:
dlopen(/usr/li
On 02.04.2024 22:15, Russell Jones via slurm-users wrote:
Hi all,
I am working on upgrading a Slurm cluster from 20 -> 23. I was
successfully able to upgrade to 22, however now that I am trying to go
from 22 to 23, starting slurmdbd results in the following error being
logged:
error: mysql_
On 19/07/2023 15:04, Jan Andersen wrote:
Hmm, OK - but that is the only nvml.h I can find, as shown by the find
command. I downloaded the official NVIDIA-Linux-x86_64-535.54.03.run and
ran it successfully; do I need to install something else beside? A
google search for 'CUDA SDK' leads directly
On 19/07/2023 11:47, Jan Andersen wrote:
I'm trying to build slurm with nvml support, but configure doesn't find it:
root@zorn:~/slurm-23.02.3# ./configure --with-nvml
...
checking for hwloc installation... /usr
checking for nvml.h... no
checking for nvmlInit in -lnvidia-ml... yes
configure: err
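For what it's worth, the driver .run installer ships the runtime library (libnvidia-ml) but not the nvml.h header, which comes with the CUDA toolkit. A command sketch, assuming the toolkit was installed under the default /usr/local/cuda prefix (both paths are assumptions, not from the original thread):

```
# Check where (or whether) the header actually exists:
find /usr -name nvml.h 2>/dev/null

# Point configure at the toolkit root that contains include/nvml.h:
./configure --with-nvml=/usr/local/cuda
```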
…multiple jobs. Since all
those will be unique per job?
On Fri, 17 Mar 2023 at 11:17, Timo Rothenpieler
wrote:
Hello!
I'm currently facing a bit of an issue regarding cleanup after a job
completed.
I've added the following bit of shell script to our cluster's Epilog script:
for d in "${SLURM_JOB_ID}" "${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
"${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"; do
WO
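The loop body above is cut off. Purely as a hedged sketch of what such a per-job scratch cleanup might look like (the base-directory argument and the name `cleanup_job_dirs` are my assumptions, not the poster's code):

```shell
# Sketch: remove per-job scratch directories named after the Slurm job id.
# The base directory passed as $1 is an assumed layout, not the poster's path.
cleanup_job_dirs() {
    base="$1"
    for d in "${SLURM_JOB_ID}" \
             "${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}" \
             "${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"; do
        # Skip malformed names produced when the array variables are unset,
        # e.g. "123_" for a non-array job.
        case "$d" in
            ""|_|_*|*_) continue ;;
        esac
        [ -d "${base}/${d}" ] && rm -rf "${base:?}/${d}"
    done
    return 0
}
```

Guarding against empty or half-empty names matters here, because for non-array jobs the `SLURM_ARRAY_*` variables are unset and naive string concatenation would otherwise produce (and delete) wrong paths.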
Ideally, the systemd service would specify the User/Group already, and
then also specify RuntimeDirectory=slurmrestd.
It then pre-creates a slurmrestd directory in /run for the service to
put its runtime files (like sockets) into, avoiding any permission issues.
Having service files in top leve
On 17.05.2022 15:58, Brian Andrus wrote:
You are starting to understand a major issue with most containers.
I suggest you check out Singularity, which was built from the ground up
to address most issues. And it can run other container types (eg: docker).
Brian Andrus
Side-Note to this, sing
Make sure you properly configured nsswitch.conf.
Most commonly this kind of issue indicates that you forgot to define
initgroups correctly.
It should look something like this:
...
group: files [SUCCESS=merge] systemd [SUCCESS=merge] ldap
...
initgroups: files [SUCCESS=continue] ldap
...
Are you using LDAP for your users?
This sounds exactly like what I was seeing on our cluster when
nsswitch.conf was not properly set up.
In my case, I was missing a line like
> initgroups: files [SUCCESS=continue] ldap
Just adding ldap to group: was not enough, and only got the primary
group
I'm immediately running into an issue when updating our Gentoo packages:
> checking for netloc installation...
> configure: error: unable to locate netloc installation
That happens even though --without-netloc was specified when configuring.
Looking at the following patch:
https://github.com/S
You shouldn't need this script and pam_exec.
You can set those limits directly in the systemd config to match every user.
On 20.05.2021 16:28, Bas van der Vlies wrote:
same here, we use the systemd user slice in our pam stack:
```
# Setup for local and ldap logins
session required pam_systemd.
```
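As a sketch of setting such limits directly in systemd rather than via pam_exec, a drop-in that applies to every user slice (path per systemd conventions; the values are illustrative, not from the thread):

```
# /etc/systemd/system/user-.slice.d/50-limits.conf
[Slice]
MemoryMax=16G
CPUQuota=200%
TasksMax=512
```

After a `systemctl daemon-reload`, every `user-UID.slice` inherits these limits, so no per-login script is needed.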
On 24.04.2021 04:37, Cristóbal Navarro wrote:
Hi Community,
I have a set of users still not so familiar with slurm, and yesterday
they bypassed srun/sbatch and just ran their CPU program directly on the
head/login node thinking it would still run on the compute node. I am
aware that I will nee
This has started happening after upgrading slurm from 20.02 to latest 20.11.
It seems like something exits too early, before slurm, or whatever else
is writing that file, has a chance to flush the final output buffer to disk.
For example, take this very simple batch script, which gets submitted
My slurm logrotate file looks like this:
/var/log/slurm/*.log {
weekly
compress
missingok
nocopytruncate
nocreate
nodelaycompress
nomail
notifempty
noolddir
rotate 5
sharedscripts
size=5M
create 640 slurm slurm
postrotate
systemctl
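The postrotate command above is cut off. A common pattern, which follows Slurm's own logrotate guidance (the exact daemon list is an assumption about the poster's setup), is to send SIGUSR2 so the daemons reopen their log files without a restart:

```
postrotate
    pkill -x --signal SIGUSR2 slurmctld slurmd slurmdbd
endscript
```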
On 02.07.2020 20:28, Luis Huang wrote:
You can look into the CR_LLN feature. It works fairly well in our
environment and jobs are distributed evenly.
SelectTypeParameters=CR_Core_Memory,CR_LLN
From how I understand it, CR_LLN will schedule jobs to the least used
node. But if there's nearly n
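For reference, a minimal slurm.conf sketch of the two ways to enable least-loaded-node placement (node and partition names here are illustrative, not from the thread):

```
# Cluster-wide, via the select plugin:
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_LLN

# Or restricted to a single partition:
PartitionName=batch Nodes=node[01-16] Default=YES LLN=yes
```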
Hello,
Our cluster is very rarely fully utilized, often only a handful of jobs
are running.
This has the effect that the first couple of nodes get used a whole lot
more frequently than the ones nearer the end of the list.
This is primarily a problem because of the SSDs in the nodes. They