[slurm-users] Re: TRES cpu vs tasks

2024-12-04 Thread Henkel, Andreas via slurm-users
Hi Miriam, The definition of "cpu" is fluid: it depends on hardware and configuration. If threads are defined, then a CPU may refer to one thread, whereas on hardware configurations without threads it refers to a physical core. https://slurm.schedmd.com/mc_support.html#defs Didn't you set min
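A minimal slurm.conf sketch of the difference (hostnames and counts are invented): on a node that exposes its threads, CPUs counts hardware threads; without threads, it counts physical cores.

    # threads exposed: one "cpu" = one hardware thread (64 CPUs on 32 cores)
    NodeName=node01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 CPUs=64
    # threads not exposed: one "cpu" = one physical core
    NodeName=node02 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 CPUs=32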

[slurm-users] Re: slurmd on a warewulf node - not running

2024-12-03 Thread Henkel, Andreas via slurm-users
Hi, No, it doesn't need to be below 1000. Best Andreas On 03.12.2024 at 22:08, Steven Jones via slurm-users wrote: Hi, Does the slurm user need to be <1000 UID? Using IPA with a UID of [root@vuwunicoslurmd1 slurm]# id slurm uid=126209577(slurm) gid=126209576(slurm) groups=126209576(slurm)

Re: [slurm-users] Database cluster

2024-01-24 Thread Henkel, Andreas
Hi Daniel, We run a simple Galera MySQL cluster and have an HAProxy running on all clients to steer the requests (round-robin) to one of the DB nodes that answers the health check properly. Best, Andreas On 23.01.2024 at 15:35, Daniel L'Hommedieu wrote: Xand, Thanks - that's great to hear.
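A minimal HAProxy sketch of such a setup (names, addresses, and the check user are invented):

    listen galera
        bind 127.0.0.1:3306
        mode tcp
        balance roundrobin
        option mysql-check user haproxy_check
        server db1 10.0.0.11:3306 check
        server db2 10.0.0.12:3306 check
        server db3 10.0.0.13:3306 check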

Re: [slurm-users] slurm reporting

2019-11-27 Thread Henkel, Andreas
Hi Mark, Thanks for your insight. We also work with Elasticsearch, and I appreciate the easy analysis (once one understands the Kibana logic). Do you use the job completion plugin as is? Or did you modify it to account for SSL or additional metrics? Best Andreas On 26.11.2019 at 18:27, Mark Ha
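For context, the stock plugin is enabled in slurm.conf along these lines (the endpoint is a placeholder):

    JobCompType=jobcomp/elasticsearch
    JobCompLoc=http://elastic.example.org:9200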

Re: [slurm-users] Limiting the number of CPU

2019-11-14 Thread Henkel, Andreas
Hi again, I'm pretty sure that's not valid, since your scontrol show job output shows a MinMemoryNode much bigger than 1G. Best Andreas On 14.11.2019 at 14:37, Nguyen Dai Quy <q...@vnoss.org> wrote: On Thu, Nov 14, 2019 at 1:59 PM Sukman <suk...@pusat.itb.ac.id> wrote: Hi Brian, th
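A quick check along these lines (the job ID is a placeholder) shows the effective memory request:

    scontrol show job 12345 | grep -E 'MinMemory|TRES'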

Re: [slurm-users] Limiting the number of CPU

2019-11-14 Thread Henkel, Andreas
Hi, Is lowercase #sbatch really valid? > On 14.11.2019 at 14:09, Sukman wrote: > > Hi Brian, > > thank you for the suggestion. > > It appears that my node is in drain state. > I rebooted the node and everything became fine. > > However, the QOS still cannot be applied properly. > Do you hav
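For illustration: sbatch only recognizes the uppercase directive, so a lowercase line is silently treated as a plain comment.

    #!/bin/bash
    #SBATCH --ntasks=1      # parsed as a directive
    #sbatch --mem=1G        # NOT parsed: just a shell comment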

Re: [slurm-users] After reboot nodes are in state = down

2019-09-26 Thread Henkel, Andreas
Hi Rafal, How do you restart the nodes? If you don't use scontrol reboot, Slurm doesn't expect the nodes to reboot, which is why you see that reason in those cases. Best Andreas On 27.09.2019 at 07:53, Rafał Kędziorski <rafal.kedzior...@gmail.com> wrote: Hi, I'm working with slurm-wlm 18.08.
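A sketch of the scontrol variant (the nodelist is a placeholder; exact options vary by Slurm version):

    scontrol reboot ASAP node[01-04]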

Re: [slurm-users] One time override to force run job

2019-09-07 Thread Henkel, Andreas
Hi Tina, We have an additional partition with a partition QOS that increases the limits and allows running short jobs over the limits if nodes are idle. On submission to the standard partitions, we automatically add the additional partition via a job_submit plugin; see the sketch below. Best, Andreas > On 04.09
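A rough sketch of such a setup (all names and limits are invented; the details depend on which limits are being relaxed):

    # QOS for the extra partition, capped to short jobs
    sacctmgr add qos burstqos MaxWall=04:00:00
    # slurm.conf: extra partition on the same nodes, tied to that QOS
    PartitionName=burst Nodes=node[001-100] MaxTime=04:00:00 QOS=burstqos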

[slurm-users] Fwd: Getting information about AssocGrpCPUMinutesLimit for a job

2019-08-08 Thread Henkel, Andreas
Sorry, didn't send to the list. Begin forwarded message: From: Henkel <hen...@uni-mainz.de> Date: 8 August 2019 at 09:21:55 CEST To: "Sarlo, Jeffrey S" <jsa...@central.uh.edu> Subject: Re: [slurm-users] Getting information about AssocGrpCPUMinutesLimit for a job Hi

Re: [slurm-users] Slurm configuration

2019-08-03 Thread Henkel, Andreas
Hi, Have you checked the documentation of MinMemory in slurm.conf for the node definition? Best, Andreas > On 02.08.2019 at 23:53, Sistemas NLHPC wrote: > > Hi all, > > Currently we have two types of nodes, one with 192GB and another with 768GB > of RAM, it is required that in nodes of 768 GB it is
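For the node definitions themselves, a hypothetical sketch with RealMemory (values in MB); a job whose memory request exceeds a node's RealMemory will never be scheduled there:

    NodeName=small[01-10] RealMemory=191000
    NodeName=big[01-02] RealMemory=771000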

Re: [slurm-users] Rename account or move user from one account to another

2019-06-17 Thread Henkel, Andreas
Hi Christoph, I think the only way is to modify the database directly. I don't know whether Slurm likes that, and I would personally try it on a copy of the DB with a separate slurmdbd, to see if the reported values are still correct. Best regards, Andreas Henkel > On 14.06.2019 at 16:16, Sam Ga

Re: [slurm-users] Pending with resource problems

2019-04-17 Thread Henkel, Andreas
I think there isn't enough memory: AllocTRES shows mem=55G, and your job wants another 40G, although the node only has 63G in total. Best, Andreas On 17.04.2019 at 16:45, Mahmood Naderan <mahmood...@gmail.com> wrote: Hi, Although it was fine for previous job runs, the following script now
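One way to compare configured versus allocated resources on the node (the node name is a placeholder):

    scontrol show node node01 | grep -E 'CfgTRES|AllocTRES|RealMemory'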

Re: [slurm-users] Priority access for a group of users

2019-02-17 Thread Henkel, Andreas
Hi David, I think there is another option if you don't want to use preemption. If the max run limit is small (several hours, for example), working without preemption may be acceptable. Assign a QOS with a priority boost to the owners of the node. Then whenever they submit jobs to the partition the
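A sketch of that QOS approach with sacctmgr (the name, priority value, and user are placeholders):

    sacctmgr add qos owner_boost priority=10000
    sacctmgr modify user where name=alice set qos+=owner_boost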

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Henkel, Andreas
> On 14.02.2019 at 09:32, Marcus Wagner wrote: > > Hi Andreas, > > > >> On 2/14/19 8:56 AM, Henkel, Andreas wrote: >> Hi Marcus, >> >> More ideas: >> a CPU doesn't always count as a core but may take the meaning of one thread, >> hence makes

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Henkel, Andreas
Hi Marcus, More ideas: a CPU doesn't always count as a core but may take the meaning of one thread, hence makes a difference. Maybe the behavior of CR_ONE_TASK is still neither solid nor properly documented, and ntasks and ntasks-per-node are honored differently internally. If so, solely using ntasks can mea

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Henkel, Andreas
Hi Marcus, What just came to my mind: if you don't set --ntasks, isn't the default just 1? All examples I know that use ntasks-per-node also set ntasks, with ntasks >= ntasks-per-node. Best, Andreas > On 14.02.2019 at 06:33, Marcus Wagner wrote: > > Hi all, > > I have narrowed this down a litt
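An illustrative batch header with both options set consistently (numbers arbitrary):

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks=8
    #SBATCH --ntasks-per-node=4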

Re: [slurm-users] How to request ONLY one CPU instead of one socket or one node?

2019-02-12 Thread Henkel, Andreas
Hi Leon, If the partition is defined to run jobs exclusively, you always get a full node. You'll have to either split up your analysis into independent subtasks to be run in parallel by dividing the data, or make use of some Perl parallelization package like Parallel::ForkManager to run steps of
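Staying inside the batch system, a shell-level sketch of the divide-the-data idea (chunk names and the analysis command are made up):

    #!/bin/bash
    #SBATCH --ntasks=4
    # run one independent step per data chunk, in parallel
    for chunk in data.part1 data.part2 data.part3 data.part4; do
        srun --ntasks=1 --exclusive ./analyze "$chunk" &
    done
    wait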

Re: [slurm-users] SLURM_JOB_GPU not set in salloc

2019-01-21 Thread Henkel, Andreas
Thank you, Chris. This is what I assumed, since setting those variables for complicated allocations may just be useless. Yet I wasn't sure whether it was possible at all. Best, Andreas > On 19.01.2019 at 08:39, Chris Samuel wrote: > >> On 18/1/19 3:18 am, Henkel wrote: >> >> we just found that

Re: [slurm-users] [Slurm 18.08.4] sacct/seff Inaccurate usercpu values

2019-01-17 Thread Henkel, Andreas
n't show up until the next release, but at least > there is a fix available. > > Mike Robbert > >> On 1/15/19 11:43 PM, Henkel, Andreas wrote: >> Bad news for the cgroup users, seems like the bug is “resolved” by the site >> switching to task/Linux instead

Re: [slurm-users] [Slurm 18.08.4] sacct/seff Inaccurate usercpu values

2019-01-15 Thread Henkel, Andreas
Bad news for the cgroup users, seems like the bug is “resolved” by the site switching to task/Linux instead :-( > On 09.01.2019 at 22:06, Christopher Benjamin Coffey wrote: > > Thanks... looks like the bug should get some attention now that a paying site > is complaining: > > https://bugs

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Henkel, Andreas
Hi, As far as I understand, salloc makes the allocation but initiates a shell (whatever SallocDefaultCommand specifies) on the node where you called salloc. If you're looking for an interactive session, you'll probably have to use srun --pty xterm. This will allocate the resources AND initia
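For example (resource counts arbitrary):

    # allocates resources, but the shell runs where salloc was called
    salloc --ntasks=4
    # allocates resources AND starts the interactive command on the compute node
    srun --ntasks=1 --pty bash -i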

Re: [slurm-users] Disabling --nodelist

2018-11-27 Thread Henkel, Andreas
Hi, A SPANK plugin would probably work. A job_submit plugin which replaces the nodelist with an empty string could work as well. What about just changing the .profile and setting the environment variable for the nodelist to an empty string? “Note that environment variables will override any options set in a batch scri

Re: [slurm-users] UsageFactor in combination with GrpTRESRunMins

2018-03-21 Thread Henkel, Andreas
PS: we're using Slurm 17.11.5. On 21.03.2018 at 16:18, Henkel, Andreas <hen...@uni-mainz.de> wrote: Hi, recently, while trying a new configuration I came across a problem. In principle, we have one big partition containing all nodes with PriorityTier=2. Each account got Gr

[slurm-users] UsageFactor in combination with GrpTRESRunMins

2018-03-21 Thread Henkel, Andreas
Hi, recently, while trying a new configuration I came across a problem. In principle, we have one big partition containing all nodes with PriorityTier=2. Each account got GrpTRESRunMin=cpu=<#somelimit> set. Every now and then we have the situation that part of the nodes are idling. For this we
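For reference, such limits are set per account roughly as follows (account name, QOS name, and values are placeholders; a reduced UsageFactor would be attached to a QOS):

    sacctmgr modify account myaccount set GrpTRESRunMins=cpu=1000000
    sacctmgr modify qos cheapqos set UsageFactor=0.5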