On 19.02.2025 14:06, Luke Sudbery via slurm-users wrote:
How much RAM does your laptop have? How much have you told slurm it has?
How much is needed by the system? Does your task actually need 2GB?
Also your CPU/cores/threads counts don't appear to make sense.
It's probably one of those newer
On 27/01/2025 12:00, karl--- via slurm-users wrote:
Hello,
Last weekend, I updated my server to Ubuntu 24.04 and instead of using
my own compiled version of slurm, I switched to
the version (23.11.4) that comes with Ubuntu 24.04. Since then, I have
been having problems with slurmdbd.
If th
On 25.06.2024 17:54, stth via slurm-users wrote:
Hi Timo,
Thanks, The old data wasn’t important so I did that. I changed the line
as follows in the
/usr/lib/systemd/system/slurmctld.service :
ExecStart=/usr/sbin/slurmctld --systemd -i $SLURMCTLD_OPTIONS
You should be able to immediately remo
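Rather than editing the packaged unit file under /usr/lib (which a package upgrade can overwrite), the same change can be made as a systemd drop-in override. This is a sketch of that approach, not the poster's exact setup; the empty ExecStart= line is required to clear the packaged value before replacing it:

```
# /etc/systemd/system/slurmctld.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/slurmctld --systemd -i $SLURMCTLD_OPTIONS
```

Apply it with `systemctl daemon-reload && systemctl restart slurmctld`, and drop the `-i` flag again once slurmctld has started cleanly.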
On 25/06/2024 12:20, stth via slurm-users wrote:
Jun 25 10:06:39 server slurmctld[63738]: slurmctld: fatal: Can not
recover last_conf_lite, incompatible version, (9472 not between 9728 and
10240), start with '-i' to ignore this. Warning: using -i will lose the
data that can't be recovered.
Se
On 20/06/2024 10:57, hermes via slurm-users wrote:
Hello,
I’d like to ask if there is any safe method to rename an existing slurm
user to a new username with the same uid?
As for Linux itself, it’s quite common to have two users share the same uid.
So if we already have 2 system users, for exampl
On 14.06.2024 17:51, Rafał Lalik via slurm-users wrote:
Hello,
I have encountered issues with running slurmctld.
From logs, I see these errors:
[2024-06-14T17:37:57.587] slurmctld version 24.05.0 started on cluster
laura
[2024-06-14T17:37:57.587] error: plugin_load_from_file:
dlopen(/usr/li
On 02.04.2024 22:15, Russell Jones via slurm-users wrote:
Hi all,
I am working on upgrading a Slurm cluster from 20 -> 23. I was
successfully able to upgrade to 22, however now that I am trying to go
from 22 to 23, starting slurmdbd results in the following error being
logged:
error: mysql_
On 19/07/2023 15:04, Jan Andersen wrote:
Hmm, OK - but that is the only nvml.h I can find, as shown by the find
command. I downloaded the official NVIDIA-Linux-x86_64-535.54.03.run and
ran it successfully; do I need to install something else beside? A
google search for 'CUDA SDK' leads directly
On 19/07/2023 11:47, Jan Andersen wrote:
I'm trying to build slurm with nvml support, but configure doesn't find it:
root@zorn:~/slurm-23.02.3# ./configure --with-nvml
...
checking for hwloc installation... /usr
checking for nvml.h... no
checking for nvmlInit in -lnvidia-ml... yes
configure: err
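For what it's worth, the driver .run installer ships the runtime library (libnvidia-ml) but not the nvml.h header, which comes with the CUDA toolkit. A command sketch, assuming the toolkit was installed under the default /usr/local/cuda prefix (both paths are assumptions, not from the original thread):

```
# Check where (or whether) the header actually exists:
find /usr -name nvml.h 2>/dev/null

# Point configure at the toolkit root that contains include/nvml.h:
./configure --with-nvml=/usr/local/cuda
```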
…multiple jobs. Since all
those will be unique per job?
On Fri, 17 Mar 2023 at 11:17, Timo Rothenpieler
wrote:
Hello!
I'm currently facing a bit of an issue regarding cleanup after a job
completed.
I've added the following bit of shell script to our cluster's Epilog script:
for d in "${SLURM_JOB_ID}" "${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
"${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"; do
WO
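The loop body above is cut off. Purely as a hedged sketch of what such a per-job scratch cleanup might look like (the base-directory argument and the name `cleanup_job_dirs` are my assumptions, not the poster's code):

```shell
# Sketch: remove per-job scratch directories named after the Slurm job id.
# The base directory passed as $1 is an assumed layout, not the poster's path.
cleanup_job_dirs() {
    base="$1"
    for d in "${SLURM_JOB_ID}" \
             "${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}" \
             "${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"; do
        # Skip malformed names produced when the array variables are unset,
        # e.g. "123_" for a non-array job.
        case "$d" in
            ""|_|_*|*_) continue ;;
        esac
        [ -d "${base}/${d}" ] && rm -rf "${base:?}/${d}"
    done
    return 0
}
```

Guarding against empty or half-empty names matters here, because for non-array jobs the `SLURM_ARRAY_*` variables are unset and naive string concatenation would otherwise produce (and delete) wrong paths.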
Ideally, the systemd service would specify the User/Group already, and
then also specify RuntimeDirectory=slurmrestd.
It then pre-creates a slurmrestd directory in /run for the service to
put its runtime files (like sockets) into, avoiding any permission issues.
Having service files in top leve
On 17.05.2022 15:58, Brian Andrus wrote:
You are starting to understand a major issue with most containers.
I suggest you check out Singularity, which was built from the ground up
to address most issues. And it can run other container types (eg: docker).
Brian Andrus
Side-Note to this, sing
Make sure you properly configured nsswitch.conf.
Most commonly this kind of issue indicates that you forgot to define
initgroups correctly.
It should look something like this:
...
group: files [SUCCESS=merge] systemd [SUCCESS=merge] ldap
...
initgroups: files [SUCCESS=continue] ldap
...
Are you using LDAP for your users?
This sounds exactly like what I was seeing on our cluster when
nsswitch.conf was not properly set up.
In my case, I was missing a line like
> initgroups: files [SUCCESS=continue] ldap
Just adding ldap to group: was not enough, and only got the primary
group
I'm immediately running into an issue when updating our Gentoo packages:
> checking for netloc installation...
> configure: error: unable to locate netloc installation
That happens even though --without-netloc was specified when configuring.
Looking at the following patch:
https://github.com/S
You shouldn't need this script and pam_exec.
You can set those limits directly in the systemd config to match every user.
On 20.05.2021 16:28, Bas van der Vlies wrote:
same here, we use the systemd user slice in our pam stack:
```
# Setup for local and ldap logins
session required pam_systemd.
```
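As a sketch of setting such limits directly in systemd rather than via pam_exec, a drop-in that applies to every user slice (path per systemd conventions; the values are illustrative, not from the thread):

```
# /etc/systemd/system/user-.slice.d/50-limits.conf
[Slice]
MemoryMax=16G
CPUQuota=200%
TasksMax=512
```

After a `systemctl daemon-reload`, every `user-UID.slice` inherits these limits, so no per-login script is needed.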
On 24.04.2021 04:37, Cristóbal Navarro wrote:
Hi Community,
I have a set of users still not so familiar with slurm, and yesterday
they bypassed srun/sbatch and just ran their CPU program directly on the
head/login node thinking it would still run on the compute node. I am
aware that I will nee
This has started happening after upgrading slurm from 20.02 to latest 20.11.
It seems like something exits too early, before slurm, or whatever else
is writing that file, has a chance to flush the final output buffer to disk.
For example, take this very simple batch script, which gets submitted
My slurm logrotate file looks like this:
/var/log/slurm/*.log {
weekly
compress
missingok
nocopytruncate
nocreate
nodelaycompress
nomail
notifempty
noolddir
rotate 5
sharedscripts
size=5M
create 640 slurm slurm
postrotate
systemctl
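The postrotate command above is cut off. A common pattern, which follows Slurm's own logrotate guidance (the exact daemon list is an assumption about the poster's setup), is to send SIGUSR2 so the daemons reopen their log files without a restart:

```
postrotate
    pkill -x --signal SIGUSR2 slurmctld slurmd slurmdbd
endscript
```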
On 02.07.2020 20:28, Luis Huang wrote:
You can look into the CR_LLN feature. It works fairly well in our
environment and jobs are distributed evenly.
SelectTypeParameters=CR_Core_Memory,CR_LLN
From how I understand it, CR_LLN will schedule jobs to the least used
node. But if there's nearly n
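For reference, a minimal slurm.conf sketch of the two ways to enable least-loaded-node placement (node and partition names here are illustrative, not from the thread):

```
# Cluster-wide, via the select plugin:
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_LLN

# Or restricted to a single partition:
PartitionName=batch Nodes=node[01-16] Default=YES LLN=yes
```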
Hello,
Our cluster is very rarely fully utilized, often only a handful of jobs
are running.
This has the effect that the first couple of nodes get used a whole lot
more frequently than the ones nearer the end of the list.
This is primarily a problem because of the SSDs in the nodes. They