[slurm-users] Re: Assistance with Node Restrictions and Priority for Users in Floating Partition

2025-02-03 Thread Bjørn-Helge Mevik via slurm-users
Manisha Yadav writes: > Could you please confirm if my setup is correct, or if any modifications are > required on my end? I don't see anything wrong with the part of the setup that you've shown. Have you checked with `sprio -l -j ` whether the jobs get the extra qos priority? If not, perhaps

[slurm-users] Re: Assistance with Node Restrictions and Priority for Users in Floating Partition

2025-01-27 Thread Bjørn-Helge Mevik via slurm-users
rhaps simply GrpTRES=node=3) should work. -- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of Oslo signature.asc Description: PGP signature -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to sl

[slurm-users] Re: The hostname resolution case sensitive

2024-11-07 Thread Bjørn-Helge Mevik via slurm-users
Ole Holm Nielsen via slurm-users writes: > Is Slurm's NodeName case sensitivity a bug or a feature? Preventing people from using UPPERCASE hostnames, usernames, group names etc. is IMNSHO a feature. :D

[slurm-users] Re: The hostname resolution case sensitive

2024-11-07 Thread Bjørn-Helge Mevik via slurm-users
m-per-cpu=100 --wrap='sleep 60' --nodelist=C3-1
sbatch: error: Batch job submission failed: Invalid node name specified

Looks like the answer is Yes. :)

[slurm-users] Re: I need to limit the number of jobs per user per partition

2024-10-28 Thread Bjørn-Helge Mevik via slurm-users
Have you set the AccountingStorageEnforce parameter in slurm.conf? I believe that it should be set to at least "limits", but do check "man slurm.conf" to be sure.

[slurm-users] Re: GPU Accounting

2024-10-03 Thread Bjørn-Helge Mevik via slurm-users
clusters, we have AccountingStorageTRES=gres/gpu,gres/gpu:a100,gres/gpu:rtx30,gres/gpu:1g.20gb,gres/gpu:a40 Then AllocTRES from sacct will show things like billing=19,cpu=6,gres/gpu:a100=1,gres/gpu=1,mem=12G,node=1 depending on what the job specifies.
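A minimal bash sketch for extracting one TRES value from an AllocTRES string such as the example above. The function name get_tres is hypothetical, and the string is the sample from this message; on a live system the input could come from something like "sacct -X -n -P -o AllocTRES -j <jobid>". Keys are matched exactly, so "gres/gpu" and "gres/gpu:a100" are kept apart.

```bash
#!/bin/bash
# get_tres KEY TRES_STRING
# Prints the value for KEY (e.g. "gres/gpu", "mem") from a comma-separated
# TRES string. Returns 1 (and prints nothing) if KEY is not present.
get_tres() {
    local key=$1 pair
    local -a pairs
    IFS=, read -ra pairs <<< "$2"      # split on commas only
    for pair in "${pairs[@]}"; do
        if [[ ${pair%%=*} == "$key" ]]; then   # part before the first "="
            printf '%s\n' "${pair#*=}"          # part after the first "="
            return 0
        fi
    done
    return 1
}

alloc='billing=19,cpu=6,gres/gpu:a100=1,gres/gpu=1,mem=12G,node=1'
get_tres gres/gpu:a100 "$alloc"   # prints 1
get_tres mem "$alloc"             # prints 12G
```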

[slurm-users] Re: A note on updating Slurm from 23.02 to 24.05 & multi-cluster

2024-09-26 Thread Bjørn-Helge Mevik via slurm-users
do you mean all the slurmctlds, or also all slurmds?

[slurm-users] Re: Detailed locations for SLUG'24

2024-09-10 Thread Bjørn-Helge Mevik via slurm-users
Bjørn-Helge Mevik via slurm-users writes: > Dear all SLUG attendees! > > The information about which buildings/addresses the SLUG reception and > presentations are to be held is not very visible on > the https://slug24.splashthat.com. There is a map there with all loc

[slurm-users] Detailed locations for SLUG'24

2024-09-09 Thread Bjørn-Helge Mevik via slurm-users
ere!

[slurm-users] Re: Unable to run sequential jobs simultaneously on the same node

2024-08-19 Thread Bjørn-Helge Mevik via slurm-users
Brian Andrus via slurm-users writes: > IIRC, slurm parses the batch file as options until it hits the first > non-comment line, which includes blank lines. Blank lines do not stop sbatch from parsing the file. (But commands do.)
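To check which directives sbatch will pick up, here is a rough bash approximation of the rule described in "man sbatch": #SBATCH directives are processed until the first line that is neither blank nor a comment. The helper list_sbatch_directives is hypothetical and is not Slurm's own parser, so it does not try to reproduce every edge case.

```bash
#!/bin/bash
# list_sbatch_directives FILE
# Prints the #SBATCH lines that appear before the first command.
# Blank lines and ordinary comments (including the shebang) are skipped
# without stopping the scan; the first other line ends it.
list_sbatch_directives() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ ^[[:space:]]*$ ]]; then
            continue                      # blank line: keep scanning
        elif [[ $line == '#SBATCH'* ]]; then
            printf '%s\n' "$line"         # directive that would be used
        elif [[ $line == '#'* ]]; then
            continue                      # ordinary comment or shebang
        else
            break                         # first command: stop here
        fi
    done < "$1"
}
```

Run on a script with an "echo" line between two groups of "#SBATCH" lines, it lists only the directives above the "echo", including any that follow a blank line.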

[slurm-users] Re: Slurm sacct ResvCPURAW invalid field in version 24.12.5

2024-07-29 Thread Bjørn-Helge Mevik via slurm-users
Perhaps PlannedCPURAW?

[slurm-users] Re: Unsupported RPC version by slurmctld 19.05.3 from client slurmd 22.05.11

2024-06-17 Thread Bjørn-Helge Mevik via slurm-users
Paul Edmon via slurm-users writes: > https://slurm.schedmd.com/upgrades.html#compatibility_window > > Looks like no. You have to be with in 2 major releases. Also, server must be newer than client.
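A small bash sketch of the rule quoted above: the client's major release must not be newer than the server's, and must lie within a fixed window of major releases behind it. The release list and the function names are assumptions for illustration; the default window of 2 follows the quoted message, but newer Slurm releases may allow a wider window, so check the upgrade guide for your version.

```bash
#!/bin/bash
# Major releases in order (assumption: extend the list as new ones appear).
SLURM_RELEASES=(17.02 17.11 18.08 19.05 20.02 20.11 21.08 22.05 23.02 23.11 24.05 24.11 25.05)

# release_index VERSION: position of VERSION's major release in the list
# (e.g. 22.05.11 -> 7). Returns 1 for an unknown release.
release_index() {
    local major minor rest i
    IFS=. read -r major minor rest <<< "$1"
    for i in "${!SLURM_RELEASES[@]}"; do
        if [[ ${SLURM_RELEASES[i]} == "$major.$minor" ]]; then
            printf '%s\n' "$i"
            return 0
        fi
    done
    return 1
}

# rpc_compatible SERVER CLIENT [WINDOW]
# Succeeds if CLIENT is not newer than SERVER and at most WINDOW major
# releases older. Returns 2 if either version is not in the list.
rpc_compatible() {
    local s c window=${3:-2}
    s=$(release_index "$1") || return 2
    c=$(release_index "$2") || return 2
    (( c <= s && s - c <= window ))
}
```

For example, "rpc_compatible 19.05.3 22.05.11" fails (the client is newer than the server), while "rpc_compatible 23.02.6 22.05.10" succeeds.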

[slurm-users] Re: Performance Discrepancy between Slurm and Direct mpirun for VASP Jobs.

2024-05-26 Thread Bjørn-Helge Mevik via slurm-users
Ole Holm Nielsen via slurm-users writes: > Whether or not to enable Hyper-Threading (HT) on your compute nodes > depends entirely on the properties of applications that you wish to > run on the nodes. Some applications are faster without HT, others are > faster with HT. When HT is enabled, the

[slurm-users] Re: scrontab question

2024-05-07 Thread Bjørn-Helge Mevik via slurm-users
t look like ordinary ASCII, for instance a "non-breaking space". I tend to just pipe the text through "od -a".
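A quick bash sketch for spotting such characters. The od calls follow the suggestion above; "od -c" is added because it prints the raw bytes in octal. With GNU od, "-a" ignores the high-order bit, so a non-breaking space shows up as "B sp" rather than a plain "sp". find_non_ascii is a hypothetical helper that lists lines containing any byte outside printable ASCII and whitespace, using GNU grep in the C locale.

```bash
#!/bin/bash
# An ordinary space versus a non-breaking space (UTF-8 bytes 0xC2 0xA0,
# written as octal escapes so printf handles them portably).
printf 'a b\n' | od -a           # the space is shown as "sp"
printf 'a\302\240b\n' | od -a    # GNU od -a: the two bytes appear as "B sp"
printf 'a\302\240b\n' | od -c    # raw bytes shown in octal: 302 240

# find_non_ascii FILE...: print "line:text" for every line containing a
# byte that is neither printable ASCII nor whitespace.
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}
```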

[slurm-users] Re: Convergence of Kube and Slurm?

2024-05-06 Thread Bjørn-Helge Mevik via slurm-users
Tim Wickberg via slurm-users writes: > [1] Slinky is not an acronym (neither is Slurm [2]), but loosely > stands for "Slurm in Kubernetes". And not at all inspired by Slinky Dog in Toy Story, I guess. :D

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-17 Thread Bjørn-Helge Mevik via slurm-users
:) (Except for number of procs and number of pending signals, according to "man setrlimit".) Then 1024 might not be so low for ulimit -n after all.

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Bjørn-Helge Mevik via slurm-users
Ole Holm Nielsen writes: > Hi Bjørn-Helge, > > That sounds interesting, but which limit might affect the kernel's > fs.file-max? For example, a user already has a narrow limit: > > ulimit -n > 1024 AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n" is per user. Now that I t
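To compare the two kinds of limit on a Linux node, a short bash sketch. fs.file-max (readable in /proc/sys/fs/file-max) is the node-wide kernel ceiling, while "ulimit -n" reports RLIMIT_NOFILE, which, per "man setrlimit" as the 2024-04-17 reply notes, applies per process. open_fds is a hypothetical helper that counts the descriptors a given process currently has open.

```bash
#!/bin/bash
# Node-wide kernel ceiling on open file handles (Linux).
cat /proc/sys/fs/file-max
# Handles currently allocated, allocated-but-unused, and the maximum.
cat /proc/sys/fs/file-nr
# Per-process RLIMIT_NOFILE of this shell (inherited by its children).
ulimit -Sn   # soft limit, which is what plain "ulimit -n" prints
ulimit -Hn   # hard limit, the ceiling for raising the soft limit

# open_fds PID: number of file descriptors PID currently has open
# (needs permission to read /proc/PID/fd).
open_fds() {
    ls "/proc/$1/fd" | wc -l
}
```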

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Bjørn-Helge Mevik via slurm-users
d/. Then users would be blocked from opening unreasonably many files. One could use this to find which applications are responsible, and try to get them fixed.

[slurm-users] Re: Increasing SlurmdTimeout beyond 300 Seconds

2024-02-12 Thread Bjørn-Helge Mevik via slurm-users
We've been running one cluster with SlurmdTimeout = 1200 sec for a couple of years now, and I haven't seen any problems due to that.

[slurm-users] Re: Starting a job after a file is created in previous job (dependency looking for soluton)

2024-02-06 Thread Bjørn-Helge Mevik via slurm-users
. :)

[slurm-users] Re: Why is Slurm 20 the latest RPM in RHEL 8/Fedora repo?

2024-01-31 Thread Bjørn-Helge Mevik via slurm-users
can tailor the rpms/build to your needs (IB? Slingshot? Nvidia? etc.).

Re: [slurm-users] propose environment variables SLURM_STDOUT, SLURM_STDERR, SLURM_STDIN

2024-01-21 Thread Bjørn-Helge Mevik
I would find that useful, yes. Especially if the variables were made available for the Prolog and Epilog scripts.

Re: [slurm-users] slurm.conf

2024-01-18 Thread Bjørn-Helge Mevik
the Slurm configuration file.

Re: [slurm-users] SLURM Reservation for GPU

2023-12-04 Thread Bjørn-Helge Mevik
Bjørn-Helge Mevik writes: > (Unfortunately, the page is so "wisely" created that it is impossible > to cut'n'paste from it.) That turned out to be a PEBKAC. :) cut'n'paste *is* possible. :)

Re: [slurm-users] SLURM Reservation for GPU

2023-12-04 Thread Bjørn-Helge Mevik
Minulakshmi S writes: > I am not able to find any supporting statements in Release Notes ... could > you please point. https://www.schedmd.com/news.php, the "Slurm version 23.11.0 is now available" section, the seventh bullet point. (Unfortunately, the page is so "wisely" created that it is imp

Re: [slurm-users] SLURM Reservation for GPU

2023-11-29 Thread Bjørn-Helge Mevik
I believe support for this was implemented in 23.11.0.

Re: [slurm-users] Releasing stale allocated TRES

2023-11-23 Thread Bjørn-Helge Mevik
ite an old version.

Re: [slurm-users] --partition requests ignored in scripts

2023-11-08 Thread Bjørn-Helge Mevik
a batch script, and command line options will override any environment variables.

Re: [slurm-users] RES: multiple srun commands in the same SLURM script

2023-11-01 Thread Bjørn-Helge Mevik
s changed quite a bit in the recent versions, and the example above is for the latest version, so check the srun man page for your version. (And unfortunately, the documentation in the srun man page has not always been correct, so you might need to experiment. For instance, I believe Example 7 above i

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-15 Thread Bjørn-Helge Mevik
on a couple of our clusters due to this.)

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-12 Thread Bjørn-Helge Mevik
Taras Shapovalov writes: > Are the older versions affected as well? Yes, all older versions are affected.

Re: [slurm-users] New member , introduction

2023-09-30 Thread Bjørn-Helge Mevik
Welcome! :)

Re: [slurm-users] question about configuration in slurm.conf

2023-09-26 Thread Bjørn-Helge Mevik
d work. I'd personally use the second one.

[slurm-users] Transport from SLC to Provo?

2023-08-14 Thread Bjørn-Helge Mevik
t any alternative way to get to Provo on a Sunday night?

[slurm-users] No coffee allowed on BYU campus(!) Suggestions for alternatives?

2023-07-04 Thread Bjørn-Helge Mevik
"no smoking or drinking of alcohol, coffee, or tea is permitted on the BYU campus, though other caffeinated beverages are allowed." So, any suggestions for "other caffeinated beverages" I'd be able to buy and bring with me to the sessions?

Re: [slurm-users] Job step do not take the hole allocation

2023-06-30 Thread Bjørn-Helge Mevik
Hi, Ole! :) Ole Holm Nielsen writes: > Can anyone shed light on the relationship between Tommi's > slurm_cli_pre_submit function and the ones defined in the > cli_filter_plugins page? I think the *_p_* functions are functions you need to implement if you write a cli plugin in C. When you write

Re: [slurm-users] Limit run time of interactive jobs

2023-05-08 Thread Bjørn-Helge Mevik
Ole Holm Nielsen writes: > On 5/8/23 08:39, Bjørn-Helge Mevik wrote: >> Angel de Vicente writes: >> >>> But one possible way to something similar is to have a partition only >>> for interactive jobs and a different partition for batch jobs, and then >>

Re: [slurm-users] Limit run time of interactive jobs

2023-05-07 Thread Bjørn-Helge Mevik
e (check the job_submit.lua > example). Wouldn't it be simpler to just reject overly long interactive jobs in job_submit.lua?

Re: [slurm-users] [EXT] Submit sbatch to multiple partitions

2023-04-17 Thread Bjørn-Helge Mevik
" containing all nodes will work.

Re: [slurm-users] Preventing --exclusive on a per-partition basis

2023-03-22 Thread Bjørn-Helge Mevik
I'd simply add a test like 'and job_desc.partition == "the_partition"' to the test for exclusiveness.

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Bjørn-Helge Mevik
and the other 1 GiB without the step being killed. You can inspect the memory limits that are in effect in cgroups (v1) in /sys/fs/cgroup/memory/slurm/uid_/job_ (usual location, at least).
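A bash sketch for looking at those limits from inside a running job, assuming the cgroup v1 layout mentioned above (a uid_<uid>/job_<jobid> directory with one step_* subdirectory per job step). The function names are hypothetical; on nodes using cgroup v2 the hierarchy and file names differ.

```bash
#!/bin/bash
# job_mem_cgroup: print the job's cgroup v1 memory directory, assuming the
# usual layout /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>.
job_mem_cgroup() {
    if [[ -z ${SLURM_JOB_ID:-} ]]; then
        echo "SLURM_JOB_ID is not set; run this inside a job" >&2
        return 1
    fi
    printf '/sys/fs/cgroup/memory/slurm/uid_%s/job_%s\n' "$(id -u)" "$SLURM_JOB_ID"
}

# show_job_mem_limits: print the job-level and per-step memory limits.
show_job_mem_limits() {
    local dir f step
    dir=$(job_mem_cgroup) || return 1
    # memory.limit_in_bytes is the RAM limit; memory.memsw.limit_in_bytes
    # (RAM + swap) only exists when swap accounting is enabled.
    for f in memory.limit_in_bytes memory.memsw.limit_in_bytes; do
        if [[ -r $dir/$f ]]; then
            printf 'job %s: %s\n' "$f" "$(< "$dir/$f")"
        fi
    done
    # Each job step (step_batch, step_0, ...) has its own subdirectory.
    for step in "$dir"/step_*; do
        if [[ -r $step/memory.limit_in_bytes ]]; then
            printf '%s memory.limit_in_bytes: %s\n' "${step##*/}" "$(< "$step/memory.limit_in_bytes")"
        fi
    done
}
```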

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Bjørn-Helge Mevik
Slurm does job cleanup. Step epilogs and/or SPANK plugins can further delay the release of step resources.

Re: [slurm-users] job_container/tmpfs and autofs

2023-01-12 Thread Bjørn-Helge Mevik
ger needed.

Re: [slurm-users] How to read job accounting data long output? `sacct -l`

2022-12-15 Thread Bjørn-Helge Mevik
Marcus Wagner writes: > That depends on what is meant with formatting argument. Yes, they could surely have defined that. > etc. And I would assume, that -S, -E and -T are filtering options, not > formatting options. I'd describe -T as a formatting option: -T, --truncate

Re: [slurm-users] How to read job accounting data long output? `sacct -l`

2022-12-14 Thread Bjørn-Helge Mevik
Marcus Wagner writes: > it it important to know, that the json output seems to be broken. > > First of all, it does not (compared to the normal output) obey to the > truncate option -T. > But more important, I saw a job, where in a "day output" (-S -E > ) no steps were recorded. > Using sacct

Re: [slurm-users] How to read job accounting data long output? `sacct -l`

2022-12-13 Thread Bjørn-Helge Mevik
ttribute and am too lazy to read the man page. Then I use -o to specify what I want returned.) Also, in newer versions at least, there are --json and --yaml options that give you output you can parse with other tools (or read, if you really want :).

Re: [slurm-users] Test Suite problems related to requesting tasks

2022-10-26 Thread Bjørn-Helge Mevik
"Groner, Rob" writes: > For your "special testing config", do you just mean the > slurm.conf/gres.conf/*.conf files? Yes. > So when you want to test a new > version of slurm, you replace the conf files and then restart all of > the daemons? Exactly. (We usually don't do this on our production

Re: [slurm-users] Test Suite problems related to requesting tasks

2022-10-25 Thread Bjørn-Helge Mevik
"Groner, Rob" writes: > I'm wondering OVERALL if the test suite is supposed to work on ANY > working slurm system. I could not find any documentation on how the > slurm configuration and nodes were required to be setup in order for > the test to workno indication that the test suite requires

Re: [slurm-users] Accounting core-hours usages

2022-10-11 Thread Bjørn-Helge Mevik
g MariaDB and then slurmdb as described in the manual but > looks like I am missing something. I wonder if someone can help us with > this off the list? Perhaps the eminent guide of Ole Nielsen can help you: https://wiki.fysik.dtu.dk/niflheim/SLURM

Re: [slurm-users] Use cases for "include" in slurm.conf?

2022-09-21 Thread Bjørn-Helge Mevik
a git repo without spreading the password around.

Re: [slurm-users] How to debug a prolog script?

2022-09-18 Thread Bjørn-Helge Mevik
ith a > non-executable with with "sh filename", so I made the incorrect > assumption that slurm would have invoked the prolog that way. Slurm prologs can be written in any language - we used to have perl prolog scripts. :)

Re: [slurm-users] How to debug a prolog script?

2022-09-16 Thread Bjørn-Helge Mevik
but some things can be tested even though the setups are not exactly the same (for instance, in my experience, CentOS and Rocky are close enough to RHEL for most slurm-related things). One takes what one has. :)

Re: [slurm-users] How to debug a prolog script?

2022-09-16 Thread Bjørn-Helge Mevik
Davide DelVento writes: > 2. How to debug the issue? I'd try capturing all stdout and stderr from the script into a file on the compute node, for instance like this:

    exec &> /root/prolog_slurmd.$$
    set -x # To print out all commands

before any other commands in the script. The "prolog_slurmd.
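The same idea wrapped in a reusable bash function, as a sketch (start_debug_log is a hypothetical name). Since slurmd executes the prolog file directly rather than through "sh filename", the script also needs a shebang line and the execute bit.

```bash
#!/bin/bash
# start_debug_log FILE: send all further stdout and stderr of this script
# to FILE and trace every command from then on (set -x). Call it first.
start_debug_log() {
    exec &> "$1"
    set -x
}

# Typical use at the top of a Prolog script ($$ is the script's PID, so
# each run writes its own file):
#   start_debug_log "/root/prolog_slurmd.$$"
#   ... the real prolog work ...
#   exit 0   # a non-zero exit drains the node and requeues the job
```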

Re: [slurm-users] Cgroup task plugin fails if ConstrainRAMSpace and ConstrainKmemSpace are enabled

2022-08-22 Thread Bjørn-Helge Mevik
This doesn't answer your question, but still: I'd be wary about using ConstrainKmemSpace at all. At least in the kernels on RedHat/CentOS <= 7.9, there is a bug that eventually prevents Slurm from starting new job steps on a node, and the node has to be rebooted to be usable again. See for ins

Re: [slurm-users] "slurmd -C" reduce by xx GB or yy %

2022-08-10 Thread Bjørn-Helge Mevik
small C program that mallocs and fills a large array, and see how big I can make the array before the node starts to swap.

Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

2022-06-24 Thread Bjørn-Helge Mevik
Miguel Oliveira writes: > Hi Bjørn-Helge, > > Long time! Hi Miguel! Yes, definitely a long time! :D > Why not? You can have multiple QoSs and you have other techniques to change > priorities according to your policies. A job can only run in a single QoS, so if you submit a job with "sbatch

Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

2022-06-24 Thread Bjørn-Helge Mevik
. Yes, that will work. But it has the drawback that you cannot use QoS'es for *anything else*, like a QoS for development jobs or similar. So either way it is a trade-off.

Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

2022-06-23 Thread Bjørn-Helge Mevik
Ole Holm Nielsen writes: > Hi Bjørn-Helge, Hello, Ole! :) > On 6/23/22 09:18, Bjørn-Helge Mevik wrote: > >> Slurm the same internal variables are used for fairshare calculations as >> for GrpTRESMins (and similar), so when fair share priorities are in use, >> sl

Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

2022-06-23 Thread Bjørn-Helge Mevik
writes: > TRESRaw cpu is lower than before as I'm alone on the system an no other job > was submitted. > Any explanation of this ? I'd guess you have turned on FairShare priorities. Unfortunately, in Slurm the same internal variables are used for fairshare calculations as for GrpTRESMins (an

Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

2022-06-23 Thread Bjørn-Helge Mevik
riorityFlags=MAX_TRES or not.

Re: [slurm-users] Need to restart slurmctld for gres jobs to start

2022-06-02 Thread Bjørn-Helge Mevik
tluchko writes: > Jobs only sit in the queue with RESOURCES as the REASON when we > include the flag --gres=bandwidth:ib. If we remove the flag, the jobs > run fine. But we need the flag to ensure that we don't get a mix of IB > and ethernet nodes because they fail in this case. This doesn't ans

Re: [slurm-users] High log rate on messages like "Node nodeXX has low real_memory size"

2022-05-12 Thread Bjørn-Helge Mevik
Per Lönnborg writes: > I "forgot" to tell our version because it´s a bit embarrising - 19.05.8... Haha! :D

Re: [slurm-users] High log rate on messages like "Node nodeXX has low real_memory size"

2022-05-12 Thread Bjørn-Helge Mevik
ter once if it has too little memory, thus only giving one such message. (The node will then have state "inval" in sinfo.)

Re: [slurm-users] Strange memory limit behavior with --mem-per-gpu

2022-04-08 Thread Bjørn-Helge Mevik
Paul Raines writes: > Basically, it appears using --mem-per-gpu instead of just --mem gives > you unlimited memory for your job. > > $ srun --account=sysadm -p rtx8000 -N 1 --time=1-10:00:00 > --ntasks-per-node=1 --cpus-per-task=1 --gpus=1 --mem-per-gpu=8G > --mail-type=FAIL --pty /bin/bash > rtx

Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Bjørn-Helge Mevik
Hermann Schwärzler writes: > Do you happen to know if there is a difference between setting CPUs > explicitely like you do it and not setting it but using > "ThreadsPerCore=1"? > > My guess is that there is no difference and in both cases only the > physical cores are "handed out to jobs". But ma

Re: [slurm-users] Disable exclusive flag for users

2022-03-25 Thread Bjørn-Helge Mevik
any times more. There are better ways to specify using whole nodes, for instance using all cpus on the node or all memory on the node.") end (both of these just warn, though, but should be easy to change into rejecting the job.)

Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Bjørn-Helge Mevik
eters=CR_CPU_Memory and node definitions like NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=182784 Gres=localscratch:330G Weight=1000 (so we set CPUs to the number of *physical cores*, not *hyperthreads*).

Re: [slurm-users] monitoring and update regime for Power Saving nodes

2022-02-23 Thread Bjørn-Helge Mevik
thing up, to > work on during the maintenance)? For the slurm.conf part, I'd suggest using the "configless" mode - that way at least the slurm config will always be up-to-date. See, e.g., https://slurm.schedmd.com/configless_slurm.html

Re: [slurm-users] Problems with sun and TaskProlog

2022-02-11 Thread Bjørn-Helge Mevik
"Putnam, Harry" writes: > /opt/slurm/task_epilog > > #!/bin/bash > mytmpdir=/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID > rm -Rf $mytmpdir > exit; This might not be the reason for what you observe, but I believe deleting the scratch dir in the task epilog is not a good idea. The task epilog is run a

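A sketch of doing the cleanup from the job-level Epilog (the slurm.conf "Epilog=" script) instead, which runs once per node when the job ends rather than after each task. The function name and the SCRATCH_ROOT override are hypothetical; the /scratch/$SLURM_JOB_USER/$SLURM_JOB_ID layout is the one from the quoted script. The guard ensures that an unset variable can never turn the rm into a removal of a whole user or scratch directory.

```bash
#!/bin/bash
# cleanup_job_scratch: remove $SCRATCH_ROOT/$SLURM_JOB_USER/$SLURM_JOB_ID.
# SCRATCH_ROOT is a hypothetical override (default /scratch).
cleanup_job_scratch() {
    local root=${SCRATCH_ROOT:-/scratch}
    # Never run with an empty user or job ID; that would widen the rm.
    if [[ -z ${SLURM_JOB_USER:-} || -z ${SLURM_JOB_ID:-} ]]; then
        echo "SLURM_JOB_USER/SLURM_JOB_ID not set; skipping cleanup" >&2
        return 0
    fi
    rm -rf -- "$root/$SLURM_JOB_USER/$SLURM_JOB_ID"
}

# In the real Epilog script:
#   cleanup_job_scratch
#   exit 0   # a non-zero Epilog exit would drain the node
```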
Re: [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information

2022-02-04 Thread Bjørn-Helge Mevik
ing the upgrade, shouldn't it? As I understand it, without any running jobs, you can do pretty much what you want on the compute nodes. Or am I missing something here?

Re: [slurm-users] Compute nodes cycling from idle to down on a regular basis ?

2022-02-01 Thread Bjørn-Helge Mevik
This might not apply to your setup, but historically when we've seen similar behaviour, it was often due to the affected compute nodes missing from /etc/hosts on some *other* compute nodes.
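A bash sketch for checking this on a node: it asks NSS (so /etc/hosts and/or DNS, depending on /etc/nsswitch.conf) to resolve each node name and prints the ones that fail. check_resolvable is a hypothetical helper; since the problem is resolution from *other* nodes, it has to be run on every compute node.

```bash
#!/bin/bash
# check_resolvable NAME...: print each name this host cannot resolve.
# Returns 1 if at least one name failed.
check_resolvable() {
    local name missing=0
    for name in "$@"; do
        if ! getent hosts "$name" > /dev/null; then
            printf '%s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Example, checking all node names known to Slurm:
#   check_resolvable $(sinfo -h -N -o %N | sort -u)
```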

Re: [slurm-users] Questions about default_queue_depth

2022-01-12 Thread Bjørn-Helge Mevik
David Henkemeyer writes: > 3) Is there a way to see the order of the jobs in the queue? Perhaps > squeue lists the jobs in order? "squeue -S -p" sorts jobs in descending priority order.

Re: [slurm-users] Is this a known error?

2021-12-08 Thread Bjørn-Helge Mevik
N:9472 UID:51568 IP:10.2.3.185 CONN:8 in slurmdbd.log. But perhaps that will not happen if slurmdbd fails to unpack the header?

Re: [slurm-users] slurmstepd: error: Too many levels of symbolic links

2021-12-03 Thread Bjørn-Helge Mevik
Adrian Sevcenco writes: > On 01.12.2021 10:25, Bjørn-Helge Mevik wrote: > >> In the end we had to give up >> using automount, and implement a manual procedure that mounts/umounts >> the needed nfs areas. > > Thanks a lot for info! manual as in "script" o

Re: [slurm-users] slurmstepd: error: Too many levels of symbolic links

2021-12-01 Thread Bjørn-Helge Mevik
Adrian Sevcenco writes: > Hi! Does anyone know what could the the cause of such error? > I have a shared home, slurm 20.11.8 and i try a simple script in the submit > directory > which is in the home that is nfs shared... We had the "Too many levels of symbolic links" error some years ago, whil

Re: [slurm-users] Per-job TMPDIR: how to lookup gres allocation in prolog?

2021-11-17 Thread Bjørn-Helge Mevik
og, please? We are using basically the same setup, and have not found any other way than running "scontrol show job ..." in the prolog (even though it is not recommended). I have yet to see any problems arising from it, but YMMV. If you find a different way, please share it with the list!

Re: [slurm-users] Warning: can't honor --ntasks-per-node

2021-11-17 Thread Bjørn-Helge Mevik
ing in some situations with IntelMPI. In all our cases, "srun hostname" or "mpirun hostname" shows that it *does* honor --ntasks-per-node. (So we generally just ask our users to check with "srun hostname", and ignore the warning if it works as expected.)
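When checking with "srun hostname", counting the lines per host makes the per-node task count easy to read. A minimal bash sketch; tasks_per_node is a hypothetical name.

```bash
#!/bin/bash
# tasks_per_node: read one hostname per line on stdin and print
# "<count> <hostname>" for each distinct host, sorted by hostname.
tasks_per_node() {
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Inside a job allocated with e.g. --nodes=2 --ntasks-per-node=4:
#   srun hostname | tasks_per_node
```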

Re: [slurm-users] Bug when I run "sinfo --states=idle"

2021-10-29 Thread Bjørn-Helge Mevik
David Henkemeyer writes: > I just noticed today that when I run "sinfo --states=idle", I get all the > idle nodes, plus an additional node that is in the "DRAIN" state (notice > how xavier6 is showing up below, even though its not in the idle state): I *think* this could be because if you drain

Re: [slurm-users] Secondary Unix group id of users not being issued in interactive srun command

2021-09-21 Thread Bjørn-Helge Mevik
p to the job step processes. See the enable_nss_slurm LaunchParameters in man slurm.conf, and the URL in that description.

Re: [slurm-users] Is this a known error?

2021-09-17 Thread Bjørn-Helge Mevik
Andreas Davour writes: > [2021-09-17T08:53:49.166] error: unpack_header: protocol_version 8448 > not supported > [2021-09-17T08:53:49.166] error: unpacking header > [2021-09-17T08:53:49.166] error: destroy_forward: no init > [2021-09-17T08:53:49.166] error: slurm_receive_msg_and_forward: > Messag

Re: [slurm-users] FreeMem is not equal to (RealMem - AllocMem)

2021-09-14 Thread Bjørn-Helge Mevik
Pavel Vashchenkov writes: > There is a line "RealMemory=257433 AllocMem=155648 FreeMem=37773 > Sockets=2 Boards=1" > > > My question is: Why there is so few FreeMem (37 GB instead of expected > 100 GB (RealMem - AllocMem))? If I recall correctly, RealMem is what you have configured in slurm.con

Re: [slurm-users] draining nodes due to failed killing of task?

2021-08-08 Thread Bjørn-Helge Mevik
no program is run. See section UNKILLABLE STEP PROGRAM SCRIPT for more information.

Re: [slurm-users] Building SLURM with X11 support

2021-05-28 Thread Bjørn-Helge Mevik
Thekla Loizou writes: > Also, when compiling SLURM in the config.log I get: > > configure:22291: checking whether Slurm internal X11 support is enabled > configure:22306: result: > > The result is empty. I read that X11 is build by default so I don't > expect a special flag to be given during com

Re: [slurm-users] schedule mixed nodes first

2021-05-17 Thread Bjørn-Helge Mevik
Durai Arasan writes: > Is there a way of improving this situation? E.g. by not blocking IDLE nodes > with jobs that only use a fraction of the 8 GPUs? Why are single GPU jobs > not scheduled to fill already MIXED nodes before using IDLE ones? > > What parameters/configuration need to be adjusted

Re: [slurm-users] How can I get complete field values with without specify the length

2021-03-08 Thread Bjørn-Helge Mevik
an option --parsable2(*) specifically designed for parsing output, and which does not truncate long field values. (*) There is also --parsable, but that puts an extra "|" at the end of the line, so I prefer --parsable2.
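Since --parsable2 output is plain "|"-separated text without truncation, it can be read straight into shell variables. A minimal bash sketch; failed_jobs is a hypothetical helper that assumes the field list JobID,JobName,State (job names containing "|" would break it).

```bash
#!/bin/bash
# failed_jobs: read "JobID|JobName|State" lines on stdin and print
# "<jobid> <jobname>" for every line whose State is exactly FAILED.
failed_jobs() {
    local jobid name state
    while IFS='|' read -r jobid name state; do
        if [[ $state == FAILED ]]; then
            printf '%s %s\n' "$jobid" "$name"
        fi
    done
}

# Usage:
#   sacct --parsable2 --noheader -o JobID,JobName,State | failed_jobs
```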

Re: [slurm-users] Exclude Slurm packages from the EPEL yum repository

2021-01-24 Thread Bjørn-Helge Mevik
Thanks for the heads-up, Ole!

Re: [slurm-users] Set a ramdom offset when starting node health check in SLURM

2020-11-27 Thread Bjørn-Helge Mevik
tions."

Re: [slurm-users] Slurm User Group Meeting (SLUG'20) Agenda Posted

2020-08-31 Thread Bjørn-Helge Mevik
Just wondering, will we get our t-shirts by email? :D

Re: [slurm-users] GrpMEMRunMins equivalent?

2020-06-06 Thread Bjørn-Helge Mevik
s in the list. Interesting to know!

Re: [slurm-users] GrpMEMRunMins equivalent?

2020-06-04 Thread Bjørn-Helge Mevik
missing something? GrpTRESRunMins. For instance: GrpTRESRunMins=Memory=1000,Cpu=2000 (see man sacctmgr for details).
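As a rough illustration of what a TRES-minutes limit counts: the sacctmgr man page describes GrpTRESRunMins as limiting the combined TRES-minutes of the association's running jobs, taking their time limits into account, so that no new jobs start while the limit is reached. The helper below is a hypothetical back-of-the-envelope check for the cpu part of such a limit (allocated CPUs times remaining time-limit minutes); it is not Slurm's actual implementation, and the account name in the commented command is a placeholder.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope sketch (not Slurm's code): would a new job fit under a
# cpu GrpTRESRunMins limit? Each running job is assumed to count as
# allocated CPUs x remaining time-limit minutes.
# Usage: cpu_runmins_fits LIMIT IN_USE NEW_CPUS NEW_TIMELIMIT_MINUTES
# Returns 0 (true) if the new job would stay within the limit.
cpu_runmins_fits() {
  local limit=$1 in_use=$2 cpus=$3 minutes=$4
  (( in_use + cpus * minutes <= limit ))
}

# Setting such a limit ("myaccount" is a placeholder account name):
#   sacctmgr modify account where name=myaccount set GrpTRESRunMins=cpu=2000,mem=1000
```

For example, with a cpu limit of 2000 and nothing running, a 4-CPU job with a 500-minute time limit just fits (4 x 500 = 2000), while the same job with a 600-minute limit (2400 CPU-minutes) would not start.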

Re: [slurm-users] Problem with permisions. CentOS 7.8

2020-05-28 Thread Bjørn-Helge Mevik
Ferran Planas Padros writes: > I run the command as slurm user, and the /var/log/munge folder does belong to > slurm. For security reasons, I strongly advise that you run munged as a separate user, which is unprivileged and not used for anything else.

Re: [slurm-users] How to get command from a finished job

2020-04-30 Thread Bjørn-Helge Mevik
ir is available with sacct, IIRC. For other types of information, I believe you can add code to your job_submit.lua that stores it in the job's AdminComment field, which sacct can display.

Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Bjørn-Helge Mevik
Jean-mathieu CHANTREIN writes: > But that is not enough, it is also necessary to use srun in > test.slurm, because the signals are sent to the child processes only > if they are also children in the JOB sense. Good to know!

Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Bjørn-Helge Mevik
t builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed. So try using "sleep 200 & wait" instead.
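A small self-contained bash demonstration of the "sleep & wait" pattern, assuming bash (the behaviour relied on is the one quoted above from the bash manual's description of the wait builtin). The function names and messages are illustrative. It traps TERM as well as INT; the reasoning is the same for either signal, and TERM is easier to deliver to a background shell when exercising the demo from a script.

```shell
#!/usr/bin/env bash
# Demo: a trapped signal interrupts the `wait` builtin right away, whereas
# with a plain foreground `sleep 200` the trap would only run after sleep ends.
handle_signal() {
  echo "trap ran"
}

# $1: seconds the placeholder workload (sleep) should run
wait_interruptibly() {
  local child status=0
  trap handle_signal INT TERM
  sleep "$1" &              # run the workload in the background...
  child=$!
  wait "$child" || status=$? # ...and block in wait, which a trapped signal interrupts
  echo "wait exited with status $status"
  kill "$child" 2>/dev/null || true   # clean up the sleep if it is still running
  return 0
}
```

In a batch script the same idea is simply: set the trap, start the work with "&", then "wait".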

Re: [slurm-users] log rotation for slurmctld.

2020-03-16 Thread Bjørn-Helge Mevik
configure, just reopens the log file. Thanks for the reminder!

Re: [slurm-users] log rotation for slurmctld.

2020-03-13 Thread Bjørn-Helge Mevik
endscript (That is for both slurmctld.log and slurmdbd.log.)
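In a logrotate stanza of this kind, the postrotate ... endscript block typically only has to tell the daemons to reopen their log files. A minimal bash sketch of that step, assuming each daemon reopens its log on SIGUSR2 as the slurmctld man page describes (check the man page of every daemon you rotate logs for, e.g. slurmdbd, on your Slurm version). The function name reopen_slurm_logs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical postrotate helper: after logrotate has moved the old log files
# aside, ask each named Slurm daemon to reopen its log by sending SIGUSR2.
reopen_slurm_logs() {
  local daemon
  for daemon in "$@"; do
    # -x matches the process name exactly; a daemon that is not running
    # on this host is not treated as an error
    pkill -x --signal SIGUSR2 "$daemon" || true
  done
}

# e.g. inside the postrotate ... endscript block:
#   reopen_slurm_logs slurmctld slurmdbd
```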

Re: [slurm-users] Question about SacctMgr....

2020-02-28 Thread Bjørn-Helge Mevik
aracters left justified. (in addition to using it in a couple of examples). :)

Re: [slurm-users] memory in job_submit.lua

2020-02-27 Thread Bjørn-Helge Mevik
"bigmem" then slurm.log_info( "non-bigmem job from uid %d with memory specification: Denying.", job_desc.user_id) slurm.user_msg("Memory specification only allowed for bigmem jobs") return 2044 -- Signal ESLURM_INVALID_TASK

Re: [slurm-users] Question on how to make slurm aware of a CVMFS revision

2020-02-27 Thread Bjørn-Helge Mevik
)? Perhaps the daemon process could simply run "scontrol update node= ..." when it detects a change?
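A minimal polling sketch of such a daemon loop in bash. How to read the CVMFS revision and which node fields the scontrol update should set are not specified in the message, so both are passed in as commands; watch_revision, the reader command and the action command are hypothetical, and a real action would presumably wrap the scontrol update call. The MAX_POLLS argument exists only so the loop can be tested; a real daemon would loop indefinitely.

```shell
#!/usr/bin/env bash
# Hypothetical change-detection loop: poll a revision source and run an
# action whenever the value differs from the previous poll.
# Usage: watch_revision READ_CMD ON_CHANGE_CMD INTERVAL_SECONDS MAX_POLLS
#   READ_CMD prints the current revision on stdout;
#   ON_CHANGE_CMD is called with the new revision as its only argument.
watch_revision() {
  local read_cmd=$1 on_change=$2 interval=$3 max_polls=$4
  local last="" current i
  for (( i = 0; i < max_polls; i++ )); do
    current=$("$read_cmd")
    # The first poll only records a baseline; later polls compare against it
    if (( i > 0 )) && [[ $current != "$last" ]]; then
      "$on_change" "$current"
    fi
    last=$current
    sleep "$interval"
  done
}
```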

Re: [slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

2019-12-12 Thread Bjørn-Helge Mevik
tend to ban --exclusive on our clusters (or at least warn about it). I haven't looked at the code for a long time, so I don't know whether this is still the current behaviour, but every time I've tested, I've seen the same problem. I believe I've tested on 19.05 (but I
