92] error: MailProg returned error, it's output was ''
27450:[2024-12-19T15:56:58.849] slurmscriptd: error: run_command: killing MailProg operation on shutdown
27451:[2024-12-19T15:56:58.859] slurmscriptd: _run_script: JobId=0 MailProg killed by signal 0
Any hints?
Best
Marcus
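If it helps anyone else hitting this: two quick checks, assuming a standard setup (the mail command and address below are just placeholders):
# Which mail program is slurmctld configured to call?
scontrol show config | grep -i MailProg
# Try running that program by hand; a non-zero exit or error output here
# would explain the "MailProg returned error" message above
printf "test body\n" | /bin/mail -s "slurm mail test" someone@example.org
echo "mail exit code: $?"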
On 15.12.2022 at 08:23, Bjørn-Helge Mevik wrote:
Marcus Wagner writes:
it is important to know that the JSON output seems to be broken.
First of all, it does not (unlike the normal output) obey the truncate option -T.
But more importantly, I saw a job where, in a "day output
other tools (or read, if you really want :).
--
Dipl.-Inf. Marcus Wagner
IT Center
Gruppe: Server, Storage, HPC
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
967214 | grep CPU_ID
Nodes=r17 CPU_IDs=8-11,20-23 Mem=51200 GRES=
# cat /sys/fs/cgroup/cpuset/slurm/uid_5164679/job_1967214/cpuset.cpus
16-23
I am totally lost now. Seems totally random. SLURM devs? Any insight?
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Wed, 14 Dec 2022 1:33am, Marc
are using slurm 21
Thanks & Regards,
Purvesh
. In
the container I switch user to jack; now, if I submit a job to the Slurm cluster, the
job owner is jack.
So I use the tom account to submit jack's job.
Any help will be appreciated.
--GHui
Unknown option: partition=gpu
Use keyword 'where' to modify condition
Is this not possible?
The only solution I found is to delete the association and create it again with the partition:
sacctmgr del user thekla account=ops
sacctmgr add user thekla account=ops partition=gpu
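If it helps, one way to check the result afterwards (assuming current sacctmgr field names):
# list the user's associations including the partition column
sacctmgr show assoc user=thekla format=Cluster,Account,User,Partition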
Regards.
Hi Diego,
sorry for the delay.
On 10/18/21 14:20, Diego Zuccato wrote:
On 15/10/2021 06:02, Marcus Wagner wrote:
Mostly, our problem was that we forgot to add/remove a node to/from
the partitions/topology file, which caused slurmctld to refuse to start up.
So I wrote a simple checker for
ugh before committing it? (And sometimes crashing slurmctld in the
process...)
Thanks!
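For the curious, a minimal sketch of the kind of check described above (illustrative only: it assumes nodes are declared with NodeName= lines and partitions with Nodes= lists in slurm.conf, and it does not expand bracketed hostlist ranges):
#!/bin/bash
# Rough pre-restart sanity check: every node declared in slurm.conf should
# appear in at least one partition's Nodes= list, and vice versa.
# Hostlist expressions like node[01-10] are NOT expanded, so treat the
# output as a hint, not a verdict.  (GNU grep with -P assumed.)
CONF=${1:-/etc/slurm/slurm.conf}
declared=$(grep -oP '^\s*NodeName=\K\S+' "$CONF" | sort -u)
in_parts=$(grep -oP 'PartitionName=.*\bNodes=\K\S+' "$CONF" | tr ',' '\n' | sort -u)
for n in $declared; do
    echo "$in_parts" | grep -qx "$n" || echo "WARNING: $n is declared but not in any partition"
done
for n in $in_parts; do
    echo "$declared" | grep -qx "$n" || echo "WARNING: $n is used in a partition but never declared"
done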
associated with the accounts or zeroing their hours?
We are using slurm version 19.05.7
Thanks
is
anyway and see if anyone had any thoughts.
Thanks,
__
*Jacob D. Chappell, CSM*
Research Computing | Research Computing Infrastructure
Information Technology Services | University of Kentucky
jacob.chapp...@uky.edu
Hi Loris,
pam_slurm_adopt just allows or disallows a user to log in to a node, depending on whether one of their jobs is running there or not.
You still have to do something so that the user can log in without a password, e.g. through host-based authentication.
Best
Marcus
On 21.05.2021 at 14:53, Loris Bennett wrote:
Hi,
We have se
ry core is at 100% utilization? Or, what if the
application is MPI + OpenMP? In that case, that one process on the login node
could spawn multiple threads that use the remaining cores on the login node.
Prentice
On 4/26/21 2:01 AM, Marcus Wagner wrote:
Hi,
we also have a wrapper script, together
* hard rss 5000
* hard data 5000
* soft stack 4000
* hard stack 5000
* hard nproc 250
/Ole
(-2)
[2021-02-02T17:16:17.141] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2021-02-02T17:16:22.804] error: mysql_real_connect failed: 2005 Unknown MySQL server host 'smater' (-2)
[2021-02-02T17:16:22.804] error: The database must be up whe
wAccounts=,
and AllowGroups=
We will be having a number of condo partitions, and it would be nice if only the Slurm admins and the condo owner could see the partition.
Does anyone have a workaround?
Best,
Chris
host
[2020-11-10T11:21:40.372] [11.0] debug: Message thread exited
[2020-11-10T11:21:40.372] [11.0] debug: mpi/pmi2: agent thread exit
[2020-11-10T11:21:40.372] [11.0] *done with job*
But I do not understand what this "No route to host" means.
Thanks for your help.
Patri
tte College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
I've been places where that can take 24 hours.
It's been more than a week since the first failure :( And our forest
usually propagates changes in just a few minutes (more often in seconds).
source files and tree
and it doesn’t appear. How can I install (and use) this tool-wrapper?
Thanks.
t.
Best
Marcus
On 23.06.2020 at 15:41, Taras Shapovalov wrote:
Hi Marcus,
This may depend on ConstrainDevices in cgroups.conf. I guess it is set
to "no" in your case.
Best regards,
Taras
On Tue, Jun 23, 2020 at 4:02 PM Marcus Wagner wrote:
-Original Message-
From: slurm-users On Behalf Of Marcus
Wagner
Sent: Tuesday, June 16, 2020 9:17 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
Hi David,
if I remember right, if you use cgroups, CUDA_VISIBLE_DEVICES always
s
uzaki)
kota.tsuyuzaki...@hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
-
=N/A MCS_label=N/A
Partitions=pharmacy
BootTime=2020-05-15T09:26:45 SlurmdStartTime=2020-05-15T16:35:13
CfgTRES=cpu=40,mem=48000M,billing=40
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
be give an example?
https://slurm.schedmd.com/prolog_epilog.html
Just a thought,
Erik
--
Erik Ellestad
Wynton Cluster SysAdmin
UCSF
-Original Message-
From: slurm-users On Behalf Of Marcus
Wagner
Sent: Tuesday, May 12, 2020 10:08 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-us
Hi Joakim,
one more thing to mention:
On 11.05.2020 at 19:23, Joakim Hove wrote:
ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ scontrol show node
NodeName=ip-172-31-80-232 Arch=x86_64 CoresPerSocket=1
Reason=Low RealMemory [root@2020-05-11T16:20:02]
The "State=IDLE+DRAIN" looks a bit susp
Hi Erik,
the output of the task prolog is sourced/evaluated (I'm not really sure how) in the job environment.
Thus you don't export a variable in the task prolog; instead, you echo the export, e.g.
echo export TMPDIR=/scratch/$SLURM_JOB_ID
The variable will then be set in the job environment.
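A minimal TaskProlog script along those lines (the scratch path is just an example; adjust to your site):
#!/bin/bash
# TaskProlog: lines printed to stdout in the form "export NAME=value"
# are added to the environment of the task about to start.
echo "export TMPDIR=/scratch/$SLURM_JOB_ID"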
Best
Marcu
Definitely not as of now; I just checked the sources of 20.02.2, and the same problem is there.
It seems someone with a contract needs to open a ticket.
Best
Marcus
On 07.05.2020 at 10:50, Bill Broadley wrote:
On 5/6/20 11:30 AM, Dustin Lang wrote:
Hi,
Ubuntu has made mysql 5.7.30 the default vers
Yeah,
and I found the reason. It seems that (at least for the MySQL procedure get_parent_limits) MySQL 5.7.30 returns NULL where MySQL 5.7.29 returned an empty string.
Running MySQL < 5.7.30 is a bad idea, as there exist two remotely exploitable bugs with a CVSS score of 9.8!
(see also https:
Sorry, I forgot to mention: we are using Slurm 18.08.7, by the way.
I just saw, in an earlier coredump, that there is another (earlier) line
involved:
2136: if (row2[ASSOC2_REQ_MTPJ][0])
the corresponding mysql response was:
+-+--+--+--+--+---+---+---++---
Hi, same here :/
the segfault happens after the procedure call in mysql:
call get_parent_limits('assoc_table', 'rwth0515', 'rcc', 0); select
@par_id, @mj, @mja, @mpt, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm,
@def_qos_id, @qos, @delta_qos;
The mysql answer is:
+-+--+--+-
How about
sacct -o ALL
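Or, if ALL is too noisy, a narrower query for a finished job (the job id is just an example):
# accounting data for a completed job; pick the fields you care about
sacct -j 1234567 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode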
On 23.04.2020 at 09:33, Gestió Servidors wrote:
Hello,
When a job is “pending” or “running”, with “scontrol show jobid=#jobnumber” I can get some useful information, but when the job has finished, that command doesn’t return anything. For example, if I run a “sacct”
t, this sinfo command returned no result.
Regards,
Chansup
On Fri, Apr 3, 2020 at 1:28 AM Marcus Wagner wrote:
Hi Chansup,
could you provide a code snippet?
Best
Marcus
On 02.04.2020 at 19:43, CB wrote:
> Hi,
>
> I
Hi Chansup,
could you provide a code snippet?
Best
Marcus
On 02.04.2020 at 19:43, CB wrote:
Hi,
I'm running Slurm 19.05.
I'm trying to execute some Slurm commands from the Lua job_submit script
for a certain condition.
But I found that it's not executed and returns nothing.
For example, I
happening.
a job_desc_msg_t structure? Thanks,
John
-Original Message-
From: slurm-users on behalf of Marcus Wagner
Reply-To: Slurm User Community List
Date: Tuesday, March 24, 2020 at 9:49 AM
To: "slurm-users@lists.schedmd.com"
Subject: Re: [slurm-users] Slurm Perl API use an
In fact, we ARE using the Perl API, but there are some flaws.
E.g. the array_task_str of the jobinfo structure: Slurm abbreviates long lists of array indices, like scontrol does, e.g.
1-3,5-8,45-...
Yes, you can really find three dots there. In my opinion, this is OK for a general tool like s
Hi Pascal,
are the slurmdbd and slurmctld running on the same host?
Best
Marcus
On 20.03.2020 at 18:12, Pascal Klink wrote:
Hi Chris,
Thanks for the quick answer! I tried the 'sacctmgr show clusters‘ command,
which gave
Cluster ControlHost ControlPort RPC Share ... QOS
Hi Gestio,
yes, that is something we have done several times.
The coordinators are able to cancel other users' jobs in the account.
We have instructed the coordinators not to change anything regarding the accounting database (the things described in the manual); it is primarily used to cancel
sacctmgr add coordinator account= names=
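A concrete example (account and user names are placeholders):
# make alice a coordinator of the "ops" account
sacctmgr add coordinator account=ops names=alice
# alice can then cancel another user's jobs in that account, e.g.
scancel --account=ops --user=bob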
Best
Marcus
P.S.:
sorry, the right term was coordinator, not account administrator. Apologies for the confusion.
On 16.03.2020 at 13:17, Sysadmin CAOS wrote:
and how can I add "Account Administrators"? In the accounting database, or in a configuration fil
Hi,
you can add Account Administrators, but they are also allowed to create
subaccounts.
Best
Marcus
On 16.03.2020 at 10:46, Sysadmin CAOS wrote:
Hi,
is there any way to configure things so that a normal user can cancel jobs of users that belong to the same system group or account group?
kill -hup $(ps -C slurmctld h -o pid)
kill -hup $(ps -C slurmdbd h -o pid)
endscript
(That is for both slurmctld.log and slurmdbd.log.)
: Leaving _msg_thr_internal
salloc: debug2: spank: spank_cloud.so: exit = 0
salloc: debug2: spank: spank_nv_gpufreq.so: exit = 0
So, good idea: it seems someone defined "SLURM_HINT=nomultithread" in all users' environments. Removing that makes the allocation succeed.
"
So in summary: "CPU" for the srun/sbatch/salloc means "(physical)
core". "CPU" as for scontrol (and pyslurm which seems to wrap this)
means "Thread". This is confusing but at least the question seems to
be answered now.
`--cpus-per-task` can be as high as
176 on this node and `--mem-per-cpu` can be up to the reported
"RealMemory"/176?
Yes.
Cheers,
Loris
Hi folks,
does anyone know how to detect, in the Lua job_submit script, whether the user used --mem or --mem-per-cpu?
And also, is it possible to "unset" this setting?
The reason is that we want to remove all memory settings specified by the user for exclusive jobs.
Best
Marcus
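In case it helps, a rough job_submit.lua fragment for the detection part; untested, and field names differ between Slurm releases (this assumes the older behaviour where both options land in job_desc.pn_min_memory, with the high MEM_PER_CPU bit marking the per-CPU case):
-- untested sketch; verify field names against your Slurm release
local MEM_PER_CPU = 0x8000000000000000

function slurm_job_submit(job_desc, part_list, submit_uid)
   local mem = job_desc.pn_min_memory
   if mem ~= nil and mem ~= slurm.NO_VAL64 then
      if mem >= MEM_PER_CPU then
         slurm.log_info(string.format("uid %d used --mem-per-cpu", submit_uid))
      else
         slurm.log_info(string.format("uid %d used --mem", submit_uid))
      end
      -- "unsetting" is version dependent; writing slurm.NO_VAL64 back is one
      -- approach that has been reported, but test it on your installation:
      -- job_desc.pn_min_memory = slurm.NO_VAL64
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end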
Hi Matthias,
the jobs are always children of slurmd, so they inherit slurmd's settings. So you have to modify the systemd unit, e.g. like the following:
[Service]
LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity
LimitCORE=8388608:infinity
Best
Marcus
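One way to apply such an override without touching the packaged unit file (standard systemd drop-in mechanics):
# create a drop-in for slurmd with the desired limits
sudo mkdir -p /etc/systemd/system/slurmd.service.d
sudo tee /etc/systemd/system/slurmd.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity
LimitCORE=8388608:infinity
EOF
# reload systemd and restart slurmd; jobs started afterwards inherit the new limits
sudo systemctl daemon-reload
sudo systemctl restart slurmd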
Am 12.02.2020 um 13:17 schrie
On Behalf Of
Marcus Wagner
Sent: Tuesday, February 4, 2020 2:31 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sbatch script won't accept --gres that
requires more than 1 gpu
Hi Dean,
could you please try to restart the slurmctld?
This usually helps on our site.
Never saw
es all exist in /dev.
What's the controller complaining about?
20 13:34:00 2020
[root@node001 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           187G         69G         96G        4.0G         21G        112G
Swap:           11G         11G         55M
What setting is incorrect here?
uld like to inquire at a later date about the timestamp for the user creation. As far as I can tell, the sacctmgr command cannot show such timestamps.
Hi Ole,
for me (currently running Slurm version 19.05.2) the command
sacctmgr list transactions Action="Add Users"
also shows timestamps. Isn't this what you are looking for?
Best regards
Jürgen
in v18.
Sincerely,
Béatrice
On 12 Dec 2019 at 12:10, Marcus Wagner wrote:
Hi Beatrice and Bjørn-Helge,
I can confirm that it works with 18.08.7. We additionally use TRESBillingWeights together with PriorityFlags=MAX_TRES. For example:
TRESBillingWeights="CPU=1.0,Mem=0.1875G,gr
early about CentOS8, which does.
On Fri, 10 Jan 2020 at 12:56, Marcus Wagner wrote:
Hi William,
a
RuntimeDirectory=slurm
should suffice.
"If set, one or more directories by the specified names will be
created
bel
C
Garscube Campus
University of Glasgow
shane.ke...@glasgow.ac.uk
ext: 3031
FINITE State=UP
-
Cheers
--
Nicholas Yue
Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue
https://vimeo.com/channels/naiadtools
abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
StdIn=/dev/null
StdOut=/home/abspou/OpenFOAM/abbaspour-6/run/laminarSMOKEPhi1U1/alpha3.45U1phi1lamSmoke.log
Power=
I cannot figure out what the root of the problem is.
Regards,
Mahmood
On Tue, Dec 17,
d not 200 GB per node. For all nodes this adds up to 40 GB in total, as you request 4 nodes. The number of tasks per node does not matter for this limit.
Best ;-)
Sebastian
ownership and permissions and then
changing them back).
Apparently the node is communicating with the controller, but munge
thinks I have a bad credential.
Any idea how to troubleshoot this?
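For reference, the standard munge sanity checks that usually narrow this down (the node name below is a placeholder):
# encode and decode a credential locally: checks that munged runs and the local key is usable
munge -n | unmunge
# encode on one side, decode on the other; failures here usually mean
# mismatched munge keys or too much clock skew
munge -n | ssh node001 unmunge
ssh node001 munge -n | unmunge
# compare key checksums and clocks (needs read access to the key)
md5sum /etc/munge/munge.key
ssh node001 md5sum /etc/munge/munge.key
date; ssh node001 date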
warn about it).
I haven't looked at the code for a long time, so I don't know whether
this is still the current behaviour, but every time I've tested, I've
seen the same problem. I believe I've tested on 19.05 (but I might
remember wrong).
:05.882] [164977.extern] *_oom_event_monitor:
oom-kill event count: 1*
[2019-11-07T16:16:05.886] [164977.extern] done with job
y=yes"? I assume the interaction between jobs (sometimes jobs can get stalled) is due to context switching at the kernel level; however, apart from educating users, how can we minimise that switching on the serial nodes?
Best regards,
David
educating users) how can we minimise that
switching on the serial nodes?
Best regards,
David
the extern step which is a bonus I guess.
It would be nice to have some more clarification from other sites, or devs on
this.
Best,
Chris
m the jobacct_gather/linux plugin vs the cgroup version. In fact, the extern step now has data, whereas it is empty when using the cgroup version.
Does anyone know the differences?
Best,
Chris
root root
local
local mahmood
local teshyt
local test
local test10
local test11
local test12
2019 at 2:41 PM Marcus Wagner wrote:
Hi Eddy,
what is the result of "id 1000" on the submithost and on
piglet-18?
Best
Marcus
On 10/7/19 8:07 AM, Eddy
getpwuid(0x9e7e, 0x7f0850f18a00, 9, 9) = 0x7f0850f19260
Did you restart munge and afterwards restart the slurm daemons?
Though, the error with wrong munge keys is more like "zero bytes transmitted".
I'm a little bit confused.
:39:47.148] [20.0] debug: Message thread exited
[2019-10-07T13:39:47.149] [20.0] done with job
I am not sure what I am missing. I hope someone can point out what I am doing wrong here.
Thank you.
Best regards,
Eddy Swan
I'm not sure how a constraint with 'or' is resolved if multiple solutions are available.
What happens if you write the constraint as
--constraint="broadwell|haswell"
?
Cheers,
Loris
e, is more like:
"What difference does it make whether I use 'srun' or 'mpirun' within
a batch file started with 'sbatch'."
That's exactly the question I wanted to ask.
Thanks again.
Best regards
Jürgen
that
word right?) unit? Isn't it better to ask for memory per task/process?
Best
Marcus
Best,
Chris
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 8/20/19, 1:37 AM, "slurm-users on behalf of Marcus Wagner"
wrote:
lurm.conf.
Best,
Chris
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 8/20/19, 1:37 AM, "slurm-users on behalf of Marcus Wagner"
wrote:
Just made another test.
Thank god, the exclusivity is not "destroyed"
ll do some more tests.
Best
Marcus
On 8/20/19 9:47 AM, Marcus Wagner wrote:
Hi Folks,
I think I've stumbled over a BUG in Slurm regarding exclusiveness. It might also be that I've misinterpreted something; I would be happy if someone could explain it to me in the latter case.
To th
*
TRES=cpu=2,mem=1M,node=1,billing=2
Why "destroys" '--mem-per-cpu' exclusivity?
Best
Marcus
run "salloc ./run.sh",
it puts the job on the frontend.
Is that normal? If there is any problem with the node I have specified, then I should receive an error or a waiting message, shouldn't I?
Regards,
Mahmood
l.
@Jeffrey It is expected to be multi-user. As for your third option, I
think you refer to something similar to what I wrote for Patrick.
mmented out, and I have no intention of using one.
I have searched the documentation and previously posted questions about this, but have not found a solution.
Any help is much appreciated, thank you!
Best regards,
Palle
a valid controller
Thanks
problem a bit. We're also experimenting with GroupUpdateForce and
GroupUpdateTime to reduce the number of times slurmctld needs to ask
about groups, but I'm unsure how much that helps.
--
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, Unive
0 127.0.0.11:45412 0.0.0.0:* LISTEN -
tcp6 0 0 :::22 :::* LISTEN 20/sshd
udp 0 0 127.0.0.11:57504
0.0.0.0:* -
[root@slurmctld_container ~]#
cheers
josef
lan.o...@gmail.com>
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche
ind of means that you are physically logged into the
machine
I am connecting through a vnc session. Right now, I have access to the
desktop of the frontend and I open a terminal and run things.
I even
xclock is working on the frontend and compute-0-0 (through ssh -Y).
Regards,
Mahmood
and ncg10.
This one, as these nodes were already drained, slurmctld issued the
reboot and the nodes are now up again.
Does anyone have similar issues, or a clue where this behaviour might come from?
Best
Marcus
le GPU cards allocation on different worker nodes is not available; the post is from 2017. Is it still true at present?
Thanks a lot for your help.
Best regards,
Ran
github.com/SchedMD/slurm/commit/cecb39ff087731d29252bbc36b00abf814a3c5ac
So recent versions should already have this.
All the best,
Chris
ry-gpu=index,name --format=csv
index, name
0, Tesla T4
[computelab-136:~]$ sudo systemctl daemon-reload; sudo systemctl
restart slurmd
[computelab-136:~]$ nvidia-smi --query-gpu=index,name --format=csv
index, name
0, Tesla T4
On Thu, Apr 11, 2019 at 7:53 AM Marcus Wagner wrote:
this is probably systemd messing with cgroups and deciding it's the king of cgroups on the host.
You'll find more context and details in
https://bugs.schedmd.com/show_bug.cgi?id=5292
Cheers,
--
Kilian
forwarding through SLURM.
Best
Marcus
On 3/29/19 7:45 PM, Marcus Wagner wrote:
Hi Loris,
On 29.03.2019 at 14:01, Loris Bennett wrote:
Hi Marcus,
Marcus Wagner writes:
Hi Loris,
On 3/25/19 1:42 PM, Loris Bennett wrote:
3. salloc works fine too without --x11, subsequent srun with a x11
app
1, 2019 at 11:24 PM Marcus Wagner wrote:
Dear Randall,
could you please also provide
scontrol -d show node computelab-134
scontrol -d show job 100091
scontrol -d show job 100094
Best
Marcus
On 4/1/19 4:31 PM, Rand
93 Prio=1
Partition=test-backfill
[2019-04-01T08:16:53.281] backfill test for JobID=100094 Prio=1
Partition=test-backfill
[2019-04-01T08:16:53.281] backfill: reached end of job queue
[2019-04-01T08:16:53.281] backfill: completed testing 2(2) jobs, usec=707
Hi Loris,
On 29.03.2019 at 14:01, Loris Bennett wrote:
Hi Marcus,
Marcus Wagner writes:
Hi Loris,
On 3/25/19 1:42 PM, Loris Bennett wrote:
3. salloc works fine too without --x11, subsequent srun with a x11 app works
great
Doing 'salloc' followed by 'ssh -X' wor
ys gets 2 nodes.
Noam
412K 00:01:43 00:00.158
1053837.ext+ extern 543880K 00:01:42 02:00.001
Best
Marcus
Cheers,
Loris
w.x
ghatee 16396 0.0 0.0 112664 952 pts/16 S+ 11:00 0:00 grep pw.x
process ids 16219, 16220, 16221 and 16222
Or did I miss something?
Best
Marcus
Regards,
Mahmood
node logs or slurmctl logs
suggesting the source of the SIGTERM.
Thank you,
Doug Meyer
luster,
leaving the other 50% to all the other users? If it's not an issue, OK, but if it is, is there any way to reduce the 'root' share?
Thanks,
Will