[slurm-users] IMEX plugin in Slurm 24.05

2024-06-19 Thread Taras Shapovalov via slurm-users
Hello, Does anyone know if there is any documentation for the NVIDIA IMEX plugin in Slurm 24.05? It is not even in the man page for slurm.conf, though it is mentioned in the release notes. Best regards, Taras
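Not an answer on the docs, but for anyone else searching the archives: the 24.05 release notes describe the IMEX support as a new switch plugin. A minimal slurm.conf sketch, assuming the plugin name switch/nvidia_imex from the release notes; the channel-count parameter below is an assumption, verify it against your build's man page:

    # slurm.conf -- NVIDIA IMEX channel management (plugin name per 24.05 release notes)
    SwitchType=switch/nvidia_imex
    # imex_channel_count is an assumed SwitchParameters option; check your slurm.conf man page
    SwitchParameters=imex_channel_count=2048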

[slurm-users] Correct way to do logrotation

2023-10-16 Thread Taras Shapovalov
Hello, In the past it was recommended to reconfigure the Slurm daemons in the logrotate script; sending a signal was, I believe, also a way to go. But recently I retested manual log rotation and I see that removal of a log file (for slurmctld, slurmdbd or slurmd) does not affect the logging of the daemo
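For reference, the approach that works is to let logrotate move the files and then send SIGUSR2, which slurmctld, slurmd and slurmdbd all catch to reopen their log files (this is documented in their man pages) -- simply deleting the file leaves the daemon writing to the old, unlinked inode. A sketch, assuming logs live under /var/log/slurm:

    /var/log/slurm/*.log {
        weekly
        rotate 4
        compress
        missingok
        notifempty
        sharedscripts
        # SIGUSR2 makes the Slurm daemons reopen their log files
        postrotate
            pkill -USR2 -x slurmctld || true
            pkill -USR2 -x slurmd || true
            pkill -USR2 -x slurmdbd || true
        endscript
    }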

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-13 Thread Taras Shapovalov
Taras Shapovalov writes: > Are the older versions affected as well? Yes, all older versions are affected. -- B/H

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-11 Thread Taras Shapovalov
Are the older versions affected as well? Best regards, Taras From: slurm-users on behalf of Tim Wickberg Sent: Thursday, October 12, 2023 00:01 To: slurm-annou...@schedmd.com ; slurm-us...@schedmd.com Subject: [slurm-users] Slurm versions 23.02.6 and 22.05.1

[slurm-users] Why job memory request may be automatically set by Slurm to RealMemory of some node?

2022-11-04 Thread Taras Shapovalov
Hey, I noticed weird behavior in Slurm 21 and 22. When the following conditions are satisfied, Slurm implicitly sets the job's memory request equal to the RealMemory of some node (perhaps the first node that satisfies the job's other requests, but this is not documented, or I could not find it in the documen
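In case it helps others hitting this: the fall-through to a node's RealMemory can be avoided by giving jobs an explicit default memory request. A sketch, assuming per-CPU memory accounting with the cons_res/cons_tres selector; the value itself is site-specific:

    # slurm.conf -- give every job an explicit default memory request
    SelectTypeParameters=CR_Core_Memory
    DefMemPerCPU=2048    # MB per allocated CPU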

Re: [slurm-users] How to view GPU indices of the completed jobs?

2020-06-23 Thread Taras Shapovalov
Hi Marcus, This may depend on ConstrainDevices in cgroup.conf. I guess it is set to "no" in your case. Best regards, Taras On Tue, Jun 23, 2020 at 4:02 PM Marcus Wagner wrote: > Hi Kota, > > thanks for the hint. > > Yet, I'm still a little bit astonished, as if I remember right, > CUDA_VISIBL
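For comparison, the relevant setting is a one-liner; with it enabled, the device cgroup hides all GPUs not allocated to the job, so CUDA_VISIBLE_DEVICES inside the job is re-indexed from 0 rather than showing the physical indices:

    # cgroup.conf
    ConstrainDevices=yes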

Re: [slurm-users] Node appears to have a different slurm.conf than the slurmctld; update_node: node reason set to: Kill task failed

2020-02-12 Thread Taras Shapovalov
Hey Robert, Ask Bright support; they will help you figure out what is going on there. Best regards, Taras On Tue, Feb 11, 2020 at 8:26 PM Robert Kudyba wrote: > This is still happening. Nodes are being drained after a kill task failed. > Could this be related to https://bugs.schedmd.com/sho

[slurm-users] Build datawarp plugin on non-Cray machine

2020-01-22 Thread Taras Shapovalov
Hey guys, Do you know if there is a way to build Slurm with the datawarp plugin on a regular RHEL7 machine without the Cray environment (without DataWarp installed)? Best regards, Taras
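A sketch of what I would expect the build to look like; both the --with-datawarp option and the path below are assumptions, and the configure check needs Cray's DataWarp headers and libraries, which is exactly what a plain RHEL7 box lacks:

    # sketch only: assumes DataWarp headers/libs have been copied to the
    # given path (not normally available off-Cray)
    ./configure --sysconfdir=/etc/slurm --with-datawarp=/opt/cray/dws
    make -j
    make install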

Re: [slurm-users] Replacement for FastSchedule since 19.05.3

2019-11-06 Thread Taras Shapovalov
from your configuration now. The error message suggests we "consider" this somehow, but I don't get how we should consider it. Best regards, Taras On Wed, Nov 6, 2019 at 5:30 AM Chris Samuel wrote: > On 5/11/19 6:36 am, Taras Shapovalov wrote: > > > Since Slurm 19.0
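For the archives: as I read the 19.05 release notes, FastSchedule=2 maps onto a SlurmdParameters option, while FastSchedule=0 (schedule on the hardware slurmd actually reports) has no direct equivalent -- the newer code validates node definitions against what slurmd registers instead. A sketch of the replacement setting:

    # slurm.conf -- replacement for FastSchedule=2 in 19.05.3 and later
    SlurmdParameters=config_overrides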

[slurm-users] Replacement for FastSchedule since 19.05.3

2019-11-05 Thread Taras Shapovalov
Hey guys, Since Slurm 19.05.3 we get an error message that FastSchedule is deprecated. But I cannot find in the documentation what the alternative to FastSchedule=0 is. Do you know how we can get that behavior without the option since 19.05.3? Best regards, Taras

[slurm-users] RHEL8 support

2019-10-26 Thread Taras Shapovalov
Hey guys, Do I understand correctly that Slurm19 is not compatible with RHEL8? It is not in the list at https://slurm.schedmd.com/platforms.html Has anyone successfully built Slurm19 on RHEL8 (or CentOS8)? Best regards, Taras
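For what it's worth, the standard RPM route is what I would try first; a sketch on CentOS 8, assuming the usual build dependencies are available from the distro repos (package names below are the EL7 ones and may differ on EL8):

    # common build dependencies
    dnf install -y rpm-build gcc make munge-devel pam-devel readline-devel perl
    # build binary RPMs directly from the release tarball
    rpmbuild -ta slurm-19.05.3.tar.bz2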

Re: [slurm-users] How to turn off core specialization?

2019-08-28 Thread Taras Shapovalov
Hi Dave, I can confirm that CoreSpecCount cannot be reset to 0 once it is set >0 (at least for FastSchedule>0). As a workaround for this bug you can try to stop slurmctld, remove the node_state file and start slurmctld again. Best regards, Taras On Fri, Aug 9, 2019 at 11:54 PM Guertin, David S. w
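Spelled out, the workaround looks like this; the directory comes from StateSaveLocation in slurm.conf (the path below is an assumption), and note that deleting the file also discards saved node state such as drain flags and reasons, so treat it as a last resort:

    systemctl stop slurmctld
    # path is StateSaveLocation from slurm.conf (assumed here)
    rm /var/spool/slurmctld/node_state /var/spool/slurmctld/node_state.old
    systemctl start slurmctld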

[slurm-users] Slurm cannot kill a job which time limit exhausted

2019-03-19 Thread Taras Shapovalov
Hey guys, When a job's max time is exceeded, Slurm tries to kill the job and fails:

[2019-03-15T09:44:03.589] sched: _slurm_rpc_allocate_resources JobId=1325 NodeList=rn003 usec=355
[2019-03-15T09:44:03.928] prolog_running_decr: Configuration for JobID=1325 is complete
[2019-03-15T09:45:12.739
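If the job's processes are stuck in uninterruptible I/O (a common cause of kill failures like this), giving slurmd more time before it gives up can help; a sketch, with an illustrative value:

    # slurm.conf -- allow more time for job steps to die before the node
    # is drained with "Kill task failed" (the default is 60 seconds)
    UnkillableStepTimeout=180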

Re: [slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-11 Thread Taras Shapovalov
Thank you, guys. Let's wait for 17.11.8. Any estimate of the release date? Best regards, Taras On Wed, Jul 11, 2018 at 12:11 AM Kilian Cavalotti < kilian.cavalotti.w...@gmail.com> wrote: > On Tue, Jul 10, 2018 at 10:34 AM, Taras Shapovalov > wrote: > > I noticed the

[slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-10 Thread Taras Shapovalov
Hey guys, After we upgraded to 17.11.7, on some clusters all jobs are killed with these messages:

slurmstepd: error: Job 374 exceeded memory limit (1308 > 1024), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 374 ON node002 CANCELLED AT 2018-06-28T0
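A quick way to confirm whether the running controller has picked up a bogus default (this looks like the 17.11.7 regression discussed in the replies) is to ask it directly:

    # show the memory defaults the controller is actually using
    scontrol show config | grep -i mem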

[slurm-users] Why SlurmUser is set to slurm by default?

2018-05-24 Thread Taras Shapovalov
Hey guys, We always use the default value for SlurmUser, but now we have realized that we don't really get why it is the slurm user and not root. Sometimes it is useful to run PrologSlurmctld as root, but then slurmctld will also run as root. Other workload managers are ok to run their control daemons