[slurm-users] sending mails with smail on rocky9

2024-12-19 Thread Marcus Wagner via slurm-users
92] error: MailProg returned error, it's output was '' 27450:[2024-12-19T15:56:58.849] slurmscriptd: error: run_command: killing MailProg operation on shutdown 27451:[2024-12-19T15:56:58.859] slurmscriptd: _run_script: JobId=0 MailProg killed by signal 0 any hints? Best Marcus

Re: [slurm-users] How to read job accounting data long output? `sacct -l`

2022-12-14 Thread Marcus Wagner
On 15.12.2022 at 08:23, Bjørn-Helge Mevik wrote: Marcus Wagner writes: it is important to know that the JSON output seems to be broken. First of all, unlike the normal output, it does not obey the truncate option -T. But more importantly, I saw a job where in a "day output

Re: [slurm-users] How to read job accounting data long output? `sacct -l`

2022-12-14 Thread Marcus Wagner
other tools (or read, if you really want :). -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Server, Storage, HPC Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de S

Re: [slurm-users] CPUSpecList confusion

2022-12-14 Thread Marcus Wagner
967214 | grep CPU_ID     Nodes=r17 CPU_IDs=8-11,20-23 Mem=51200 GRES= # cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1967214/cpuset.cpus 16-23 I am totally lost now. Seems totally random. SLURM devs?  Any insight? -- Paul Raines (http://help.nmr.mgh.harvard.edu) On Wed, 14 Dec 2022 1:33am, Marc

Re: [slurm-users] CPUSpecList confusion

2022-12-13 Thread Marcus Wagner
patient information, please contact the Mass General Brigham Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline <https://www.massgeneralbrigham.org/complianceline> . Please note that this e-mail is not secure (encrypted).  If you do not wish to continue communication over unencr

Re: [slurm-users] can a job run across partition in slurm

2022-09-12 Thread Marcus Wagner
are using slurm 21 Thanks & Regards, Purvesh -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Server, Storage, HPC Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aache

Re: [slurm-users] container on slurm cluster

2022-05-16 Thread Marcus Wagner
. In container I switch user to jack, now, if I submit a job to slurm cluster, the job owner is jack. So I use the tom account submit a jack's job. Any help will be appreciated. --GHui -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Server, Storage, HPC Abteilung: Systeme und Betrieb

Re: [slurm-users] Add partition to existing user association

2022-01-24 Thread Marcus Wagner
nown option: partition=gpu  Use keyword 'where' to modify condition This is not possible? The only solution I found to that is to delete the association and create it again with the partition: sacctmgr del user thekla account=ops sacctmgr add user thekla account=ops partition=gpu

Re: [slurm-users] How to get an estimate of job completion for planned maintenance?

2021-11-09 Thread Marcus Wagner
rds. -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Server, Storage, HPC Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de Social Media Kanäle des IT Centers: https://blog.r

Re: [slurm-users] slurm.conf syntax checker?

2021-10-26 Thread Marcus Wagner
Hi Diego, sorry for the delay. On 10/18/21 14:20, Diego Zuccato wrote: On 15/10/2021 06:02, Marcus Wagner wrote: mostly, our problem was that we forgot to add/remove a node to/from the partitions/topology file, which caused slurmctld to refuse to start. So I wrote a simple checker for

Re: [slurm-users] slurm.conf syntax checker?

2021-10-14 Thread Marcus Wagner
ugh before committing it?  (And sometimes crashing slurmctld in the process...) Thanks! -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Server, Storage, HPC Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@it

Re: [slurm-users] is there a way to temporarily freeze an account?

2021-10-06 Thread Marcus Wagner
associated with the accounts or zeroing their hours? We are using slurm version 19.05.7 Thanks -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Server, Storage, HPC Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag

Re: [slurm-users] Slurm does not set memory.limit_in_bytes for tasks (but does for steps)

2021-06-23 Thread Marcus Wagner
is anyway and see if anyone had any thoughts. Thanks, __ *Jacob D. Chappell, CSM* Research Computing | Research Computing Infrastructure Information Technology Services | University of Kentucky jacob.chapp...@uky.edu <mailto:jacob.chapp...@uky.edu

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-21 Thread Marcus Wagner
Hi Loris, pam_slurm_adopt only allows or denies a user's login to a node, depending on whether the user has a job running there. You still have to arrange for the user to be able to log in without a password, e.g. through host-based authentication. Best Marcus On 21.05.2021 at 14:53, Loris Bennett wrote: Hi, We have se

Re: [slurm-users] [External] Re: What is an easy way to prevent users run programs on the master/login node.

2021-05-19 Thread Marcus Wagner
ry core is at 100% utilization? Or, what if the application is MPI + OpenMP? In that case, that one process on the login node could spawn multiple threads that use the remaining cores on the login node. Prentice On 4/26/21 2:01 AM, Marcus Wagner wrote: Hi, we also have a wrapper script, together

Re: [slurm-users] What is an easy way to prevent users run programs on the master/login node.

2021-04-25 Thread Marcus Wagner
  hard    rss 5000 *   hard    data    5000 *   soft    stack   4000 *   hard    stack   5000 *   hard    nproc   250 /Ole -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Sys

Re: [slurm-users] Slurm - sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused

2021-02-03 Thread Marcus Wagner
(-2) [2021-02-02T17:16:17.141] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds. [2021-02-02T17:16:22.804] error: mysql_real_connect failed: 2005 Unknown MySQL server host 'smater' (-2) [2021-02-02T17:16:22.804] error: The database must be up whe

Re: [slurm-users] Hidden partition visibility issue

2021-01-21 Thread Marcus Wagner
wAccounts=, and AllowGroups= We will be having a number of condo partitions, and would be nice if only the slurm admins, and condo owner could see the partition only. Anyone have a work around?  Best, Chris -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Systemgruppe Linux Abteilung: System

Re: [slurm-users] failed to send msg type 6002: No route to host

2020-11-12 Thread Marcus Wagner
host [2020-11-10T11:21:40.372] [11.0] debug:  Message thread exited [2020-11-10T11:21:40.372] [11.0] debug:  mpi/pmi2: agent thread exit [2020-11-10T11:21:40.372] [11.0] *done with job* But I do not understand what this "No route to host" means. Thanks for your help. Patri

Re: [slurm-users] Simple free for all cluster

2020-10-07 Thread Marcus Wagner
tte College Information Technology Services 710 Sullivan Rd | Easton, PA 18042 Office: 112 Skillman Library p: (610) 330-5632 -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Systemgruppe Linux Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 F

Re: [slurm-users] error: user not found

2020-09-30 Thread Marcus Wagner
I've been places where that can take 24 hours. It's been more than a week since the first failure :( And our forest usually propagates changes in just a few minutes (more often in seconds). -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Systemgruppe Linux Abteilung: Systeme und Betrieb RW

Re: [slurm-users] Tool-wrapper "sinteractive"

2020-06-25 Thread Marcus Wagner
source files and tree and it doesn’t appear. How can I install (and use) this tool-wrapper? Thanks. -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Systemgruppe Linux Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383

Re: [slurm-users] How to view GPU indices of the completed jobs?

2020-06-23 Thread Marcus Wagner
t. Best Marcus On 23.06.2020 at 15:41, Taras Shapovalov wrote: Hi Marcus, This may depend on ConstrainDevices in cgroup.conf. I guess it is set to "no" in your case. Best regards, Taras On Tue, Jun 23, 2020 at 4:02 PM Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:

Re: [slurm-users] How to view GPU indices of the completed jobs?

2020-06-23 Thread Marcus Wagner
-Original Message- From: slurm-users On Behalf Of Marcus Wagner Sent: Tuesday, June 16, 2020 9:17 PM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] How to view GPU indices of the completed jobs? Hi David, if I remember right, if you use cgroups, CUDA_VISIBLE_DEVICES always s

Re: [slurm-users] How to view GPU indices of the completed jobs?

2020-06-16 Thread Marcus Wagner
uzaki) kota.tsuyuzaki...@hco.ntt.co.jp <mailto:kota.tsuyuzaki...@hco.ntt.co.jp> NTTソフトウェアイノベーションセンタ 分散処理基盤技術プロジェクト 0422-59-2837 - -- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Systemgruppe Linux Abteilung: Systeme und Betrieb RWTH Aachen University Se

Re: [slurm-users] Gres GPU Resource Issue

2020-05-18 Thread Marcus Wagner
=N/A MCS_label=N/A    Partitions=pharmacy    BootTime=2020-05-15T09:26:45 SlurmdStartTime=2020-05-15T16:35:13    CfgTRES=cpu=40,mem=48000M,billing=40    AllocTRES=    CapWatts=n/a    CurrentWatts=0 AveWatts=0    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s -- Marcus Wagner, Dipl.-Inf. IT Cent

Re: [slurm-users] Reset TMPDIR for All Jobs

2020-05-13 Thread Marcus Wagner
be give an example? https://slurm.schedmd.com/prolog_epilog.html Just a thought, Erik -- Erik Ellestad Wynton Cluster SysAdmin UCSF -Original Message- From: slurm-users On Behalf Of Marcus Wagner Sent: Tuesday, May 12, 2020 10:08 PM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-us

Re: [slurm-users] Slurm queue seems to be completely blocked

2020-05-12 Thread Marcus Wagner
Hi Joakim, one more thing to mention: On 11.05.2020 at 19:23, Joakim Hove wrote: ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ scontrol show node NodeName=ip-172-31-80-232 Arch=x86_64 CoresPerSocket=1    Reason=Low RealMemory [root@2020-05-11T16:20:02] The "State=IDLE+DRAIN" looks a bit susp

Re: [slurm-users] Reset TMPDIR for All Jobs

2020-05-12 Thread Marcus Wagner
Hi Erik, the output of the task prolog is evaluated in the job environment (I am not sure exactly how). So you do not export a variable inside the task prolog itself; instead you echo the export statement, e.g. echo export TMPDIR=/scratch/$SLURM_JOB_ID The variable will then be set in the job environment. Best Marcu
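The echo approach described above could look roughly like the following sketch. It assumes a node-local /scratch directory, as in the example command; the function name task_prolog is made up for illustration, and the guard at the end only exists so the script does nothing when run outside a job.

```shell
#!/bin/bash
# Hypothetical body of the script that TaskProlog= in slurm.conf points to.
# Slurm reads the prolog's standard output; a line of the form
# "export NAME=value" sets NAME in the environment of the task.
task_prolog() {
    # /scratch is an assumed node-local path; adjust to the local setup.
    echo "export TMPDIR=/scratch/${SLURM_JOB_ID}"
}

# Only emit the line when running inside a job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    task_prolog
fi
```

Note that the script only announces the variable; creating and cleaning up the directory itself would still have to happen elsewhere, for example in a prolog and epilog.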

Re: [slurm-users] Do not upgrade mysql to 5.7.30!

2020-05-07 Thread Marcus Wagner
Definitely not so far. I just checked the sources of 20.02.2, and the same problem is there. It seems someone with a support contract needs to open a ticket. Best Marcus On 07.05.2020 at 10:50, Bill Broadley wrote: On 5/6/20 11:30 AM, Dustin Lang wrote: Hi, Ubuntu has made mysql 5.7.30 the default vers

Re: [slurm-users] Do not upgrade mysql to 5.7.30!

2020-05-06 Thread Marcus Wagner
Yeah, and I found the reason. It seems that (at least for the MySQL procedure get_parent_limits) MySQL 5.7.30 returns NULL where MySQL 5.7.29 returned an empty string. Running MySQL < 5.7.30 is a bad idea, as there are two remotely exploitable bugs with a CVSS score of 9.8! (see also https:

Re: [slurm-users] "sacctmgr add cluster" crashing slurmdbd

2020-05-06 Thread Marcus Wagner
Sorry, I forgot to mention: we use Slurm 18.08.7, by the way. I just saw in an earlier coredump that another (earlier) line is involved: 2136: if (row2[ASSOC2_REQ_MTPJ][0]) the corresponding mysql response was: +-+--+--+--+--+---+---+---++---

Re: [slurm-users] "sacctmgr add cluster" crashing slurmdbd

2020-05-06 Thread Marcus Wagner
Hi, same here :/ the segfault happens after the procedure call in mysql: call get_parent_limits('assoc_table', 'rwth0515', 'rcc', 0); select @par_id, @mj, @mja, @mpt, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos; The mysql answer is: +-+--+--+-

Re: [slurm-users] Show detailed information from a finished job

2020-04-23 Thread Marcus Wagner
How about sacct -o ALL On 23.04.2020 at 09:33, Gestió Servidors wrote: Hello, When a job is “pending” or “running”, with “scontrol show jobid=#jobnumber” I can get some useful information, but when the job has finished, that command doesn’t return anything. For example, if I run a “sacct”

Re: [slurm-users] Executing slurm command from Lua job_submit script?

2020-04-14 Thread Marcus Wagner
t, this sinfo command returned no result. Regards, Chansup On Fri, Apr 3, 2020 at 1:28 AM Marcus Wagner <wag...@itc.rwth-aachen.de> wrote: Hi Chansup, could you provide a code snippet? Best Marcus On 02.04.2020 at 19:43, CB wrote: > Hi, > > I'm

Re: [slurm-users] Executing slurm command from Lua job_submit script?

2020-04-02 Thread Marcus Wagner
Hi Chansup, could you provide a code snippet? Best Marcus On 02.04.2020 at 19:43, CB wrote: Hi, I'm running Slurm 19.05. I'm trying to execute some Slurm commands from the Lua job_submit script for a certain condition. But I found that they are not executed and return nothing. For example, I

Re: [slurm-users] Job are pending when plenty of resources available

2020-03-30 Thread Marcus Wagner
happening. -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Slurm Perl API use and examples

2020-03-25 Thread Marcus Wagner
a job_desc_msg_t structure? Thanks, John -Original Message- From: slurm-users on behalf of Marcus Wagner Reply-To: Slurm User Community List Date: Tuesday, March 24, 2020 at 9:49 AM To: "slurm-users@lists.schedmd.com" Subject: Re: [slurm-users] Slurm Perl API use an

Re: [slurm-users] Slurm Perl API use and examples

2020-03-24 Thread Marcus Wagner
In fact, we ARE using the Perl API, but there are some flaws, e.g. the array_task_str of the jobinfo structure. Slurm abbreviates long lists of array indices, like scontrol does: e.g. 1-3,5-8,45-... Yes, you really find three literal dots there. In my opinion, this is ok for a general tool like s
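To illustrate why the abbreviation is a problem for scripts, here is a minimal sketch that expands such an index list and reports when it has been cut off. It is not part of the Slurm Perl API; the function name expand_indices is invented for this example, and it only handles plain comma-separated indices and ranges (no step or limit suffixes).

```shell
#!/bin/bash
# Expand an abbreviated index list such as "1-3,5-8,45" into one index per line.
# Returns 1 if the list ends in "...", i.e. it was cut off and the remaining
# indices cannot be recovered from the string alone.
expand_indices() {
    local list="$1" truncated=0 part
    if [[ "$list" == *... ]]; then
        truncated=1
        list="${list%...}"   # drop the three dots
        list="${list%,}"     # drop a trailing comma, if any
        list="${list%-}"     # drop a dangling range dash, as in "45-..."
    fi
    local IFS=','
    for part in $list; do
        if [[ "$part" == *-* ]]; then
            seq "${part%-*}" "${part#*-}"   # expand "a-b"
        else
            echo "$part"
        fi
    done
    return "$truncated"
}

expand_indices "1-3,5-8,45"
```

A caller can check the return status and fall back to another data source whenever the list was truncated, instead of silently working with an incomplete set of indices.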

Re: [slurm-users] Accounting Information from slurmdbd does not reach slurmctld

2020-03-23 Thread Marcus Wagner
Hi Pascal, are slurmdbd and slurmctld running on the same host? Best Marcus On 20.03.2020 at 18:12, Pascal Klink wrote: Hi Chris, Thanks for the quick answer! I tried the 'sacctmgr show clusters' command, which gave Cluster ControlHost ControlPort RPC Share ... QOS

Re: [slurm-users] Normal user cancelling a job

2020-03-18 Thread Marcus Wagner
Hi Gestio, yes, that is something we have done several times. The coordinators are able to cancel other users' jobs in the account. We have instructed the coordinators not to change anything in the accounting database (the things described in the manual); it is primarily used to cancel

Re: [slurm-users] *****SPAM*****Re: Normal user cancelling a job

2020-03-16 Thread Marcus Wagner
sacctmgr add coordinator account= names= Best Marcus P.S.: sorry, the right term was coordinator, not account administrator. Sorry for the confusion. On 16.03.2020 at 13:17, Sysadmin CAOS wrote: and how can I add "Account Administrators"? in the accounting database? or in a configuration fil

Re: [slurm-users] *****SPAM***** Normal user cancelling a job

2020-03-16 Thread Marcus Wagner
Hi, you can add Account Administrators, but they are also allowed to create subaccounts. Best Marcus On 16.03.2020 at 10:46, Sysadmin CAOS wrote: Hi, is there any way to configure Slurm so that a normal user can cancel jobs of users that belong to the same system group or account group?

Re: [slurm-users] log rotation for slurmctld.

2020-03-13 Thread Marcus Wagner
o pid) kill -hup $(ps -C slurmdbd h -o pid) endscript (That is for both slurmctld.log and slurmdbd.log.) -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aach

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-06 Thread Marcus Wagner
:  Leaving _msg_thr_internal salloc: debug2: spank: spank_cloud.so: exit = 0 salloc: debug2: spank: spank_nv_gpufreq.so: exit = 0 So good idea, seems someone defined "SLURM_HINT=nomultithread" in all users env. Removing that makes the allocation succeed. -- Marcus Wagner, Dipl.-Inf.

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-05 Thread Marcus Wagner
" So in summary: "CPU" for the srun/sbatch/salloc means "(physical) core". "CPU" as for scontrol (and pyslurm which seems to wrap this) means "Thread". This is confusing but at least the question seems to be answered now. -- Marcus Wagner, Dipl

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Marcus Wagner
pus-per-task` can be as high as 176 on this node and `--mem-per-cpu` can be up to the reported "RealMemory"/176? Yes. Cheers, Loris -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de
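The upper bound confirmed above is a simple division. The following sketch works it through, assuming RealMemory is given in megabytes (as scontrol show node reports it); the value 512000 is a made-up figure, not taken from the thread, and the function name is hypothetical.

```shell
#!/bin/bash
# Largest --mem-per-cpu (in MB) that still fits when all CPUs of a node are used:
# RealMemory divided by the number of CPUs, rounded down.
max_mem_per_cpu() {
    local real_memory_mb="$1" cpus="$2"
    echo $(( real_memory_mb / cpus ))
}

# Hypothetical node with RealMemory=512000 and 176 CPUs (hardware threads).
max_mem_per_cpu 512000 176
```

Rounding down matters here: asking for even one megabyte more per CPU than this value would exceed the node's RealMemory once all 176 CPUs are requested.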

Re: [slurm-users] Slurm 19.05 X11-forwarding

2020-02-28 Thread Marcus Wagner
> >>> This message is from an external sender. Learn more about why this >>> matters. >>> <https://ut.service-now.com/sp?id=kb_article&number=KB0011401> >>> >>> >> - --    || \\UTGERS,  |-

[slurm-users] memory in job_submit.lua

2020-02-27 Thread Marcus Wagner
Hi folks, does anyone know how to detect in the lua submission script, if the user used --mem or --mem-per-cpu? And also, if it is possible to "unset" this setting? The reason is, we want to remove all memory thingies set by the user for exclusive jobs. Best Marcus -- Marcus Wa

Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Marcus Wagner
Research is a company limited by guarantee, registered in England at Harpenden, Hertfordshire, AL5 2JQ under the registration number 2393175 and a not for profit charity number 802038. -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074

Re: [slurm-users] Increasing the OpenFile under SLURM ....

2020-02-13 Thread Marcus Wagner
Hi Matthias, the jobs are always children of slurmd, so they inherit slurmd's limits. So you have to modify the systemd unit, e.g. like the following: [Service] LimitNOFILE=51200 LimitMEMLOCK=infinity LimitSTACK=infinity LimitCORE=8388608:infinity Best Marcus On 12.02.2020 at 13:17
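One way to apply the [Service] settings quoted above without editing the packaged unit file is a systemd drop-in. The sketch below assumes the unit is called slurmd.service; the file name limits.conf and the function name are arbitrary choices for illustration. The target directory is passed as an argument so the function can be tried in a scratch directory first.

```shell
#!/bin/bash
# Write a systemd drop-in containing the limits from the message above.
# On a real node the directory would usually be
# /etc/systemd/system/slurmd.service.d (assumed unit name: slurmd.service).
write_limits_dropin() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/limits.conf" <<'EOF'
[Service]
LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity
LimitCORE=8388608:infinity
EOF
}

# Example (as root), followed by a reload so systemd and slurmd pick it up:
#   write_limits_dropin /etc/systemd/system/slurmd.service.d
#   systemctl daemon-reload
#   systemctl restart slurmd
```

Because jobs inherit the limits of the running slurmd, only jobs started after the restart see the new values.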

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-05 Thread Marcus Wagner
On Behalf Of Marcus Wagner Sent: Tuesday, February 4, 2020 2:31 AM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu Hi Dean, could you please try to restart the slurmctld? This usually helps on our site. Never saw

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-04 Thread Marcus Wagner
es all exist in /dev. What's the controller complaining about? -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Node node00x has low real_memory size & slurm_rpc_node_registration node=node003: Invalid argument

2020-01-20 Thread Marcus Wagner
20 13:34:00 2020 [root@node001 ~]# free -h               total        used        free      shared  buff/cache   available Mem:           187G         69G         96G        4.0G   21G        112G Swap:           11G         11G         55M What setting is incorrect here? -- Marcus Wagner, Dipl.

Re: [slurm-users] How to print a user's creation timestamp from the Slurm database?

2020-01-20 Thread Marcus Wagner
uld like inquire at a later date about the timestamp for the user creation. As far as I can tell, the sacctmgr command cannot show such timestamps. Hi Ole, for me (currently running Slurm version 19.05.2) the command sacctmgr list transactions Action="Add Users" also shows timestamps. Isn't this what you are looking for? Best regards Jürgen -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

2020-01-12 Thread Marcus Wagner
in v18. Sincerely, Béatrice On 12 Dec 2019 at 12:10, Marcus Wagner wrote: Hi Beatrice and Bjørn-Helge, I can confirm that it works with 18.08.7. We additionally use TRESBillingWeights together with PriorityFlags=MAX_TRES. For example: TRESBillingWeights="CPU=1.0,Mem=0.1875G,gr

Re: [slurm-users] Slurm 19-05-4-1 and Centos8

2020-01-10 Thread Marcus Wagner
early about CentOS8, which does. On Fri, 10 Jan 2020 at 12:56, Marcus Wagner <wag...@itc.rwth-aachen.de> wrote: Hi William, a RuntimeDirectory=slurm should suffice. "If set, one or more directories by the specified names will be created bel

Re: [slurm-users] Slurm 19-05-4-1 and Centos8

2020-01-10 Thread Marcus Wagner
C Garscube Campus University of Glasgow shane.ke...@glasgow.ac.uk ext: 3031 -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Serial jobs on multi-core nodes using whole compute node

2020-01-01 Thread Marcus Wagner
FINITE State=UP - Cheers -- Nicholas Yue Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5 Custom Dev - C++ porting, OSX, Linux, Windows http://au.linkedin.com/in/nicholasyue https://vimeo.com/channels/naiadtools -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aa

Re: [slurm-users] Question about memory allocation

2019-12-17 Thread Marcus Wagner
abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log    StdIn=/dev/null   StdOut=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log    Power= I can not figure out what is the root of the problem. Regards, Mahmood On Tue, Dec 17,

Re: [slurm-users] Question about memory allocation

2019-12-16 Thread Marcus Wagner
d not 200 GB per node. For all nodes this counts in total to 40 GB as you request 4 nodes. The number of tasks per node does not matter for this limit. Best ;-) Sebastian -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter

Re: [slurm-users] slurmd.service fails to register

2019-12-16 Thread Marcus Wagner
ownership and permissions and then changing them back). Apparently the node is communicating with the controller, but munge thinks I have a bad credential. Any idea how to troubleshoot this? -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University

Re: [slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

2019-12-12 Thread Marcus Wagner
warn about it). I haven't looked at the code for a long time, so I don't know whether this is still the current behaviour, but every time I've tested, I've seen the same problem. I believe I've tested on 19.05 (but I might remember wrong). -- Marcus Wagner, Dipl.-Inf.

Re: [slurm-users] oom-kill events for no good reason

2019-11-08 Thread Marcus Wagner
:05.882] [164977.extern] *_oom_event_monitor: oom-kill event count: 1* [2019-11-07T16:16:05.886] [164977.extern] done with job -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth

Re: [slurm-users] Running job using our serial queue

2019-11-06 Thread Marcus Wagner
y=yes"? I assume the interaction between jobs (sometimes jobs can get stalled) is due to context switching at the kernel level, however (apart from educating users) how can we minimise that switching on the serial nodes? Best regards, David -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: S

Re: [slurm-users] Running job using our serial queue

2019-11-04 Thread Marcus Wagner
educating users) how can we minimise that switching on the serial nodes? Best regards, David -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] jobacct_gather/linux vs jobacct_gather/cgroup

2019-11-04 Thread Marcus Wagner
the extern step which is a bonus I guess. It would be nice to have some more clarification from other sites, or devs on this. Best, Chris -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 24

Re: [slurm-users] jobacct_gather/linux vs jobacct_gather/cgroup

2019-10-25 Thread Marcus Wagner
m the jobacct_gather/linux plugin vs the cgroup version. In fact, the extern step now has data where as it is empty when using the cgroup version. Anyone know the differences? Best, Chris -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter W

Re: [slurm-users] Removing user from slurm configuration

2019-10-10 Thread Marcus Wagner
    root       root       local       local    mahmood       local     teshyt       local       test       local     test10       local     test11       local     test12 -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen T

Re: [slurm-users] srun: Error generating job credential

2019-10-08 Thread Marcus Wagner
2019 at 2:41 PM Marcus Wagner <wag...@itc.rwth-aachen.de> wrote: Hi Eddy, what is the result of "id 1000" on the submithost and on piglet-18? Best Marcus On 10/7/19 8:07 AM, Eddy

Re: [slurm-users] srun: Error generating job credential

2019-10-07 Thread Marcus Wagner
getpwuid(0x9e7e, 0x7f0850f18a00, 9, 9) = 0x7f0850f19260 Did you restart munge and afterwards restart the slurm daemons? Though the errors with wrong munge keys look more like "zero bytes transmitted". I'm a little bit confused. -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung:

Re: [slurm-users] srun: Error generating job credential

2019-10-06 Thread Marcus Wagner
:39:47.148] [20.0] debug:  Message thread exited [2019-10-07T13:39:47.149] [20.0] done with job I am not sure what i am missing. Hope someone can point out what i am doing wrong here. Thank you. Best regards, Eddy Swan -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betr

Re: [slurm-users] "--batch" option of the sbatch command

2019-10-01 Thread Marcus Wagner
I'm not sure how a constraint with 'or' is resolved if multiple solutions are available. What happens if you write the constraint as --constraint="broadwell|haswell" ? Cheers, Loris -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Marcus Wagner
e, is more like: "What difference does it make whether I use 'srun' or 'mpirun' within a batch file started with 'sbatch'." That's exactly the question I wanted to ask. Thanks again. Best regards Jürgen -- Marcus Wagner, Dipl.-Inf. IT Cente

Re: [slurm-users] exclusive or not exclusive, that is the question

2019-08-20 Thread Marcus Wagner
that word right?) unit? Isn't it better to ask for memory per task/process? Best Marcus Best, Chris — Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 8/20/19, 1:37 AM, "slurm-users on behalf of Marcus Wagner" wrote:

Re: [slurm-users] exclusive or not exclusive, that is the question

2019-08-20 Thread Marcus Wagner
lurm.conf. Best, Chris — Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 8/20/19, 1:37 AM, "slurm-users on behalf of Marcus Wagner" wrote: Just made another test. Thank god, the exclusivity is not "destroyed"

Re: [slurm-users] exclusive or not exclusive, that is the question

2019-08-20 Thread Marcus Wagner
ll do some more tests. Best Marcus On 8/20/19 9:47 AM, Marcus Wagner wrote: Hi Folks, I think I've stumbled over a bug in Slurm regarding exclusiveness. It may also be that I've misinterpreted something. I would be happy if someone could explain that to me in the latter case. To th

[slurm-users] exclusive or not exclusive, that is the question

2019-08-20 Thread Marcus Wagner
*    TRES=cpu=2,mem=1M,node=1,billing=2 Why does '--mem-per-cpu' "destroy" exclusivity? Best Marcus -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Job not running of the specified node

2019-07-09 Thread Marcus Wagner
run "salloc ./run.sh", it puts the job on the frontend. Is that normal? If there is any problem with the node I have specified, then I should receive an error or waiting message. Isn't that? Regards, Mahmood -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Bet

Re: [slurm-users] Problem with sbatch

2019-07-09 Thread Marcus Wagner
l. @Jeffrey It is expected to be multi-user. As for your third option, I think you refer to something similar to what I wrote for Patrick. -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax

Re: [slurm-users] Host not being a valid controller

2019-06-28 Thread Marcus Wagner
mmented out, and I have no intention in using one. I have searched documentation and previous posted question of this, but have not found a solution. Any help is much appreciated, thank you! Best regards, Palle -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aac

Re: [slurm-users] What means this error ?

2019-06-25 Thread Marcus Wagner
a valid controller Thanks -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Random "sbatch" failure: "Socket timed out on send/recv operation"

2019-06-12 Thread Marcus Wagner
problem a bit. We're also experimenting with GroupUpdateForce and GroupUpdateTime to reduce the number of times slurmctld needs to ask about groups, but I'm unsure how much that helps. -- Bjørn-Helge Mevik, dr. scient, Department for Research Computing, Unive

Re: [slurm-users] slurmctl listening on IPv4 only

2019-06-04 Thread Marcus Wagner
  0 127.0.0.11:45412 0.0.0.0:* LISTEN  - tcp6   0  0 :::22 :::*    LISTEN 20/sshd udp    0  0 127.0.0.11:57504 0.0.0.0:*   - [root@slurmctld_container ~]# cheers josef -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme

Re: [slurm-users] Issue with x11

2019-05-16 Thread Marcus Wagner
lan.o...@gmail.com> https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ―Friedrich Nietzsche -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel:

Re: [slurm-users] Issue with x11

2019-05-15 Thread Marcus Wagner
ind of means that you are physically logged into the machine I am connecting through a VNC session. Right now, I have access to the desktop of the frontend and I open a terminal and run things. Even xclock is working on the frontend and on compute-0-0 (through ssh -Y). Regards, Mahmood

[slurm-users] scontrol reboot issue

2019-04-29 Thread Marcus Wagner
and ncg10. This one, as these nodes were already drained, slurmctld issued the reboot and the nodes are now up again. Does anyone have similar issues, or a clue where this behaviour might come from? Best Marcus -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aach

Re: [slurm-users] How to apply for multiple GPU cards from different worker nodes?

2019-04-16 Thread Marcus Wagner
le GPU card allocation on different worker nodes is not available; the post is from 2017. Is it still true at present? Thanks a lot for your help. Best regards, Ran -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betr

Re: [slurm-users] How does cgroups limit user access to GPUs?

2019-04-12 Thread Marcus Wagner
github.com/SchedMD/slurm/commit/cecb39ff087731d29252bbc36b00abf814a3c5ac So recent versions should already have this. All the best, Chris -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax:

Re: [slurm-users] How does cgroups limit user access to GPUs?

2019-04-11 Thread Marcus Wagner
ry-gpu=index,name --format=csv index, name 0, Tesla T4 [computelab-136:~]$ sudo systemctl daemon-reload; sudo systemctl restart slurmd [computelab-136:~]$ nvidia-smi --query-gpu=index,name --format=csv index, name 0, Tesla T4 On Thu, Apr 11, 2019 at 7:53 AM Marcus Wagner mailto:wag...@itc.rw

Re: [slurm-users] How does cgroups limit user access to GPUs?

2019-04-11 Thread Marcus Wagner
this is probably systemd messing up with cgroups and deciding it's the king of cgroups on the host. You'll find more context and details in https://bugs.schedmd.com/show_bug.cgi?id=5292 Cheers, -- Kilian -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Sy

Re: [slurm-users] X11 forwarding and VNC?

2019-04-09 Thread Marcus Wagner
forwarding through SLURM. Best Marcus On 3/29/19 7:45 PM, Marcus Wagner wrote: Hi Loris, On 29.03.2019 at 14:01, Loris Bennett wrote: Hi Marcus, Marcus Wagner writes: Hi Loris, On 3/25/19 1:42 PM, Loris Bennett wrote: 3. salloc works fine too without --x11, subsequent srun with an x11 app

Re: [slurm-users] Backfill isn’t working for a node with two GPUs that have different GRES types.

2019-04-03 Thread Marcus Wagner
1, 2019 at 11:24 PM Marcus Wagner <wag...@itc.rwth-aachen.de> wrote: Dear Randall, could you please also provide scontrol -d show node computelab-134 scontrol -d show job 100091 scontrol -d show job 100094 Best Marcus On 4/1/19 4:31 PM, Rand

Re: [slurm-users] Backfill isn’t working for a node with two GPUs that have different GRES types.

2019-04-01 Thread Marcus Wagner
93 Prio=1 Partition=test-backfill [2019-04-01T08:16:53.281] backfill test for JobID=100094 Prio=1 Partition=test-backfill [2019-04-01T08:16:53.281] backfill: reached end of job queue [2019-04-01T08:16:53.281] backfill: completed testing 2(2) jobs, usec=707 -- Marcus Wagner, Dipl.-Inf.

Re: [slurm-users] X11 forwarding and VNC?

2019-03-29 Thread Marcus Wagner
Hi Loris, On 29.03.2019 at 14:01, Loris Bennett wrote: Hi Marcus, Marcus Wagner writes: Hi Loris, On 3/25/19 1:42 PM, Loris Bennett wrote: 3. salloc works fine too without --x11, subsequent srun with a x11 app works great Doing 'salloc' followed by 'ssh -X' wor

Re: [slurm-users] number of nodes varies for no reason?

2019-03-29 Thread Marcus Wagner
ys gets 2 nodes. Noam -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] X11 forwarding and VNC?

2019-03-29 Thread Marcus Wagner
   412K   00:01:43  00:00.158 1053837.ext+ extern    543880K   00:01:42  02:00.001 Best Marcus Cheers, Loris -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Marcus Wagner
w.x ghatee   16396  0.0  0.0 112664   952 pts/16   S+ 11:00   0:00 grep pw.x process ids 16219, 16220, 16221 and 16222 Or did I miss something? Best Marcus Regards, Mahmood -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg

Re: [slurm-users] Source of SIGTERM

2019-03-08 Thread Marcus Wagner
node logs or slurmctl logs suggesting the source of the SIGTERM. Thank you, Doug Meyer -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de

Re: [slurm-users] Fairshare - root user

2019-02-27 Thread Marcus Wagner
luster, leaving the other 50% to all the other users? If it's not an issue, OK, but if it is, any way to reduce the 'root' share? Thanks, Will -- Marcus Wagner, Dipl.-Inf. IT Center Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241
