libopamgt-devel packages.
So, since you seem to use the same version as me, I'm not sure why you
have these linking problems :/
Best
Marcus
On 06/14/2018 09:17 AM, Ole Holm Nielsen wrote:
Hi Jeffrey,
On 06/13/2018 10:35 PM, Jeffrey Frey wrote:
Intel's OPA doesn't include the
On 07/18/2018 10:56 AM, Roshan Thomas Mathew wrote:
We ran into this issue trying to move from 16.05.3 -> 17.11.7 with 1.5M
records in job table.
In our first attempt, MySQL reported "ERROR 1206 The total number of
locks exceeds the lock table size" after about 7 hours.
Increased InnoDB Buff
Hi Slurm users,
We have found the need to execute a parallel command on all nodes
running jobs belonging to a particular user.
I have made a configuration for the excellent ClusterShell tool as
documented in https://wiki.fysik.dtu.dk/niflheim/SLURM#clustershell
If you add a "slurmuser" secti
On 06-08-2018 12:53, Bjørn-Helge Mevik wrote:
There is also a Slurm plugin for pdsh (unfortunately not enabled in the
default redhat/centos RPMs) that lets you run a command on each node
belonging to a specific job with "pdsh -j ". Not
exactly the same, though. :)
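For reference, a generic way to run a command on all nodes used by one user, without the pdsh Slurm plugin, is to expand the node lists from squeue and hand them to ClusterShell's clush. This is only a sketch; "someuser" and the uptime command are placeholders:

```shell
# Collect the unique hosts running someuser's jobs:
#   squeue -h                 suppress the header
#   -o %N                     print each job's (compressed) nodelist
#   scontrol show hostnames   expand nodelists like "c[1-3]" to one host per line
nodes=$(squeue -h -u someuser -t RUNNING -o %N \
        | xargs -r -n1 scontrol show hostnames | sort -u | paste -sd,)
clush -w "$nodes" uptime
```

The same comma-separated host list also works with pdsh's -w option.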
Bjørn, that is a different t
Hi Tina,
Is it the same OS version for 17.02 and 17.11, or are you upgrading the
OS (and possibly the MySQL/MariaDB) at the same time? I assume you're
testing the Slurm upgrade on a test server and not the production cluster?
Did you check the steps mentioned in the thread "slurmdbd:
mysql/
Regarding the Slurm User Group Meeting 2018 coming up in Madrid, Spain
in two weeks from now: Has anyone heard information about hotels and
the schedule? The official page
https://slurm.schedmd.com/slurm_ug_agenda.html was last updated on May 30...
/Ole
hotel?
Thanks,
Ole
On 10-09-2018 17:33, Jacob Jenson wrote:
Ole,
You can find hotels close to CIEMAT here
https://drive.google.com/open?id=1eEKgnlBXeYNO426QS7nPuDS4nm8aUpnH&usp=sharing
Jacob
On Mon, Sep 10, 2018 at 1:23 AM, Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
On 12-09-2018 18:21, Andre Torres wrote:
I’m new to slurm and I’m confused regarding user creation. I have an
installation with 1 login node and 5 compute nodes. If I create a user
across all the nodes with the same uid and gid I can execute jobs but I
can’t understand the difference between us
On 09/27/2018 10:33 AM, Bjørn-Helge Mevik wrote:
Baker D.J. writes:
I guess that the question that comes to mind is.. Is it a really big deal if
the slurmctld process is down whilst the slurmdbd is being upgraded?
I tend to always stop slurmctld before upgrading slurmdbd, and have
never noti
On 09/27/2018 05:12 PM, Christopher Benjamin Coffey wrote:
2. Purge/archive unneeded jobs/steps before the upgrade, to make the
upgrade as quick as possible:
slurmdbd.conf:
ArchiveDir=/common/adm/slurmdb_archive
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveSteps=no
ArchiveResvs=no
ArchiveSuspend=
On 06-10-2018 04:15, 崔灏 (CUI Hao) wrote:
$ scontrol reconfigure
slurm_reconfigure error: SelectType change requires restart of the
slurmctld daemon to take effect
I'm afraid that restarting slurmctld will interrupt current tasks, so
I'm still waiting for them to finish.
There should be no prob
On 14-10-2018 06:30, Steven Dick wrote:
I've found that when creating a new cluster, slurmdbd does not
function correctly right away. It may be necessary to restart
slurmdbd at several points during the slurm installation process to
get everything working correctly.
Also, slurmctld will buffer
, and
what difference it makes when slurmdbd is restarted repeatedly. Are you
up for this task?
/Ole
On Sun, Oct 14, 2018 at 4:12 AM Ole Holm Nielsen
wrote:
Correct, and this is documented in the Slurm accounting setup page:
https://slurm.schedmd.com/accounting.html#database-configuration
Hi Roland,
That website is improperly configured. My Firefox browser says:
qlustar.com uses an invalid security certificate. The certificate is
only valid for the following names: docs.qlustar.com, www.qlustar.com
Error code: SSL_ERROR_BAD_CERT_DOMAIN
/Ole
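Such a certificate/hostname mismatch can also be inspected from the command line. A sketch, assuming OpenSSL 1.1.1+ and network access:

```shell
# Show the subject and Subject Alternative Names of the certificate
# presented for a given server name (SNI)
echo | openssl s_client -connect qlustar.com:443 -servername qlustar.com 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName
```

If qlustar.com is absent from the SAN list, browsers will reject the certificate exactly as described.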
On 10/16/2018 02:27 PM, Roland F
On 17-10-2018 20:13, Aravindh Sampathkumar wrote:
I built a SLURM cluster and am able to successfully run jobs as root.
However, when I try to submit jobs as a regular user, I hit permission
problems.
username@console:[~] > srun -N1 /bin/hostname
slurmstepd: error: couldn't chdir to `/usr/home
On 10/25/2018 07:00 AM, Christopher Samuel wrote:
On 25/10/18 2:29 pm, Christopher Samuel wrote:
Could explain why this isn't something we see consistently, and why
we're both seeing it currently.
This seems to be a handy way to find any processes that are not properly
constrained by Slurm c
nted after each jobid/user
-C: Color output is forced ON
-c: Color output is forced OFF
-h: Print this help information
-V: Version information
My monitoring of jobs is usually done simply with "pestat -F", and also
with "pestat -s mix".
/Ole
Hi Yalei,
On 21-11-2018 18:51, 宋亚磊 wrote:
How do I check the percent CPU of a job in Slurm? I tried sacct, sstat, and squeue,
but I can't find how to check it.
Can someone help me?
I would recommend my "pestat" tool, which was also announced on the list
today. The CPUload is one of the many sta
On 21-11-2018 19:41, Ryan Novosielski wrote:
Ole’s “pestat” script does allow you to get similar information, but I’m
interested to see if indeed there’s a better answer. I’ve used his script for
more or less the same reason, to see if the jobs are using the resources
they’re allocated. They s
On 11/22/2018 12:10 AM, Christopher Samuel wrote:
I've just had a quick play with pestat and it reveals that Slurm
18.08.3 seems to have some odd ideas about load on nodes, for instance
one of our KNL nodes that is offline is reported with a CPUload of
2.70, but I can see nothing running on it an
On 29-11-2018 19:27, Christopher Benjamin Coffey wrote:
We've been noticing an issue with nodes from time to time that become "wedged",
or unusable. This is a state where ps and w hang. We've been looking into this for a
while when we get time and finally put some more effort into it yesterday
FWIW, I have made some scripts to automate the creation of Slurm
accounts from the passwd database (not LDAP), see
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmaccounts
I hope this helps you get started with Slurm.
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department o
d001
d002
d003
***
*** 595,600
--- 600,606
i048
i049
i050
+ i051
x001
x002
x003
Comments and suggestions are most welcome!
FYI: My Slurm Wiki contains available information about adding/removing
nodes: https://wiki.fysik.dtu.dk/niflheim/SLURM#add-and-remove-nodes
--
Ole Holm N
Hi Bjørn-Helge,
Thanks:
On 1/21/19 12:37 PM, Bjørn-Helge Mevik wrote:
Two more details/enhancements:
1) Sites which use node names like c[1-20]-[1-36], would benefit from
"sort -V" instead of just sort -- otherwise c10-12 will be listed before
c2-12, for instance. (For sites that use names li
On 1/21/19 12:18 PM, Bjørn-Helge Mevik wrote:
Ole Holm Nielsen writes:
Comments and suggestions are most welcome!
Splendid tool! I immediately found that I'd forgotten to take a few
nodes out of the topology definition. :)
Me too :-)
One thing: The script doesn't work i
ailable in this
page: https://c4science.ch/source/slurm-accounts/
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
Hi Nathalie,
Which Slurm version and which OS version are you using?
FYI: My Slurm Wiki contains all the details of setting up Slurm on
CentOS 7: https://wiki.fysik.dtu.dk/niflheim/SLURM
Best regards,
Ole
On 2/13/19 2:58 PM, Nathalie Gocht wrote:
Hey,
I am building up a one node cluster. M
On 2/26/19 9:07 AM, Marcus Wagner wrote:
Does anyone know why, by default, the number of array elements is
limited to 1000?
We have one user, who would like to have 100k array elements!
What is more difficult for the scheduler, one array job with 100k
elements or 100k non-array jobs?
Where
cluster.
Upgrading slurmctld and slurmd is another topic, and this is discussed
in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm.
I'd appreciate comments and suggestions about my procedure.
/Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Phys
ctld
but with the reorg I'm taking a downtime for the dbd upgrade. That's not too
bad though as we pause all our jobs out of paranoia for upgrades.
My strategy is to avoid any downtime at all = lost productivity.
/Ole
On 3/1/19 8:10 AM, Ole Holm Nielsen wrote:
We're one of t
On 3/4/19 2:26 PM, Loris Bennett wrote:
Ole Holm Nielsen writes:
We're one of the many Slurm sites which run the slurmdbd database daemon on the
same server as the slurmctld daemon. This works without problems at our site
given our modest load, however, SchedMD recommends to run the da
On 04-03-2019 16:30, Loris Bennett wrote:
On 3/4/19 2:26 PM, Loris Bennett wrote:
Ole Holm Nielsen writes:
We're one of the many Slurm sites which run the slurmdbd database daemon on the
same server as the slurmctld daemon. This works without problems at our site
given our modest
On 3/8/19 1:59 PM, Frava wrote:
I'm replying to the "[slurm-users] Available gpus ?" post. Some time ago
I did a BASHv4 script for listing the available CPU/RAM/GPU on the
nodes. It parses the output of the "scontrol -o -d show node" command
and displays what I think is needed to launch GPU
On 3/21/19 6:56 PM, Ryan Novosielski wrote:
On Mar 21, 2019, at 12:21 PM, Loris Bennett wrote:
Our last cluster only hit around 2.5 million jobs after
around 6 years, so database conversion was never an issue. For sites
with a higher-throughput things may be different, but I would hope that
On 3/22/19 4:15 PM, José A. wrote:
Dear all,
I would like to create two partitions, A and B, in which node1 had a
certain weight in partition A and a different one in partition B. Does
anyone know how to implement it?
Some pointers to documentation of this and a practical example is in my
W
filling
node2.
Can I accomplish this behavior through weighting the nodes? With your example
I’m afraid to say it’s still not clear to me how.
Thanks a lot for your help.
José
On 22. Mar 2019, at 16:29, Ole Holm Nielsen wrote:
On 3/22/19 4:15 PM, José A. wrote:
Dear all,
I would like to
Hi José,
On 23-03-2019 19:59, Jose A wrote:
You got my point. I want a way in which a partition influences the priority
with which a node takes new jobs.
Any tip will be really appreciated. Thanks a lot.
Would PriorityWeightPartition as defined with the Multifactor Priority
Plugin (https://slurm.
Hi Ahmet,
On 3/27/19 10:51 AM, mercan wrote:
Except for the sjstat script, Slurm does not contain a command to show
user-oriented partition info. I wrote a command. I hope you will find it
useful.
https://github.com/mercanca/spart
Thanks for a very useful new Slurm command!
/Ole
Hi Lech,
IMHO, the Slurm user community would benefit the most from your
interesting work on MySQL/MariaDB performance, if your patch could be
made against the current 18.08 and the coming 19.05 releases. This
would ensure that your work is carried forward.
Would you be able to make patches
t offset.
Kind regards,
Lech
Am 02.04.2019 um 15:18 schrieb Ole Holm Nielsen :
Hi Lech,
IMHO, the Slurm user community would benefit the most from your interesting
work on MySQL/MariaDB performance, if
https://bugs.schedmd.com/show_bug.cgi?id=6796 your patch could be made against
the curren
, Lech Nieroda wrote:
Hi Ole,
Am 03.04.2019 um 12:53 schrieb Ole Holm Nielsen :
SchedMD already decided that they won't fix the problem:
Yes, I guess it’s a bit late in the release lifecycles. Nevertheless it’s a
pity, as there are certainly a lot of users around who’d rather not upgrade
Hi Lech,
I've tried to summarize your work on the Slurm database upgrade patch in
my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older
Could you kindly check if my notes are correct and complete? Hopefully
this Wiki will also h
s the default for RHEL6 but it’s the default
for RHEL7, isn’t it? Assuming that you use RHEL7/CentOS7 with mysql 5.5, have
you checked how long your upgrade would take with the patch?
Kind regards,
Lech
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical Un
Hi Julien,
Did you optimize the MySQL database, in particular InnoDB?
I have collected some documentation in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#mysql-configuration
and I also discuss database purging.
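For completeness, purging is controlled by the Purge* parameters in slurmdbd.conf. A minimal sketch; the retention periods are only examples and must match site policy:

```ini
# slurmdbd.conf (illustrative retention periods)
PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeResvAfter=2months
PurgeStepAfter=2months
PurgeSuspendAfter=2months
```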
Please note that we run Slurm 17.11 (and recently 18.08) on Cent
On 4/5/19 4:28 PM, Julien Rey wrote:
The failure occurs after a few minutes (~10).
And we are running out of space on the slurm controller. The mysql
daemon is at 100% CPU usage all the time. This issue is becoming critical.
...
Our slurm accounting database is growing bigger and bigger (more
On 09-04-2019 07:37, sudhagar s wrote:
Hi, I am a newbie in Slurm, trying to set up a cluster for ML training
purposes. I created a control node and a compute node; both are up and running.
When I enter "srun -N 1 hostname" it says
"srun error: memory specification can not be satisfied"
"unable to allo
On 09-04-2019 08:25, sudhagar s wrote:
Thanks For the response.
here is my node and partition information:
Well, 1 MB of real memory in the node is not a lot :-) This reminds me
of the very old days where PCs had 640 kB RAM...
On Tue, Apr 9, 2019 at 11:53 AM Ole Holm Nielsen
onfigured slurm.conf incorrectly.
On Tue, Apr 9, 2019 at 11:53 AM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
On 09-04-2019 07:37, sudhagar s wrote:
> Hi, I am a newbie in Slurm, trying to set up a cluster for ML training
> purposes. I created a control node and c
"). The default value is 1.
On 4/9/19 8:47 AM, sudhagar s wrote:
Attaching my slurm.conf file. can you please help me to find the issue.
On Tue, Apr 9, 2019 at 12:08 PM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
On 09-04-2019 08:33, sudhagar s wrote:
>
Hi Jean-Mathieu,
On 4/30/19 2:47 PM, Jean-mathieu CHANTREIN wrote:
Do you know a command to get a summary of the use of compute nodes
and/or partitions of a cluster in real time? Something with an output
like this:
$ sutilization
Partition/Node_Name CPU_Use CPU_Total %Use
standard 236
On 30-04-2019 17:47, Jean-mathieu CHANTREIN wrote:
Hello.
That's exactly what I need. Thank you very much for your work.
It surprises me that slurm does not provide an official solution for that ...
Is there a page listing the tools (such as this one) that are being developed
by the community?
On 15-05-2019 09:34, Barbara Krašovec wrote:
It could be a problem with ARP cache.
If the number of devices approaches 512, there is a kernel limitation in
dynamic ARP-cache size and it can result in the loss of connectivity
between nodes.
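The kernel limitation mentioned here involves the neighbour-table (ARP cache) garbage-collection thresholds. A sketch of raising them via sysctl; the values are illustrative and should be sized to the subnet:

```ini
# /etc/sysctl.d/10-arp-cache.conf (illustrative values)
net.ipv4.neigh.default.gc_thresh1 = 2048
net.ipv4.neigh.default.gc_thresh2 = 4096
net.ipv4.neigh.default.gc_thresh3 = 8192
```

gc_thresh3 is the hard maximum; connectivity problems typically appear when the number of neighbours approaches it.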
This is something every cluster owner should be awar
Hi Alexander,
The error "can't find address for host cn7" would indicate a DNS
problem. What is the output of "host cn7" from the srun host li1?
How many network devices are in your subnet? It may be that the Linux
kernel is doing "ARP cache thrashing" if the number of devices approaches
51
On 6/26/19 12:23 PM, John Marshall wrote:
I have had $SQUEUE_FORMAT set in my environment for a long time, but have only
today learnt that sacct will also listen to an environment variable to set a
default output format. Previously I had only looked for it in the Environment
Variables section
On 6/26/19 1:14 PM, John Marshall wrote:
On 26 Jun 2019, at 11:51, Ole Holm Nielsen wrote:
You should open a case with SchedMD containing your patch:
https://bugs.schedmd.com/
Yes, I considered creating a Bugzilla account at SchedMD so that I could send
them a three-line patch. To be
On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
The nodes are now communicating however when I run the command
srun -w compute02 /bin/ls
it remains stuck and there is no out
On 6/28/19 9:57 AM, Valerio Bellizzomi wrote:
On Fri, 2019-06-28 at 09:39 +0200, Ole Holm Nielsen wrote:
On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
The nodes are now
On 01-07-2019 21:47, HELLMERS Joe wrote:
I’m having trouble installing Slurm 18.08.7 on Red Hat 7.3.
I installed munge from source.
It may be easier for you to install Slurm with RPMs. A complete guide
is in my Slurm Wiki pages:
https://wiki.fysik.dtu.dk/niflheim/SLURM
https://wiki.fysik.d
On 7/2/19 10:48 PM, Tina Fora wrote:
We run mysql on a dedicated machine with slurmctld and slurmdbd running on
another machine. Now I want to add another machine running slurmctld and
slurmdbd and this machine will be on CentOS 7. The existing one is CentOS 6.
Is this possible? Can I run two seperat
Hi Edward,
Besides my Slurm Wiki page https://wiki.fysik.dtu.dk/niflheim/SLURM, I
have written a number of tools which we use for monitoring our cluster,
see https://github.com/OleHolmNielsen/Slurm_tools. I recommend in
particular these tools:
* pestat Prints a Slurm cluster nodes status wi
Hi Edward,
The squeue command tells you about job status. You can get extra
information using format options (see the squeue man-page). I like to
set this environment variable for squeue:
export SQUEUE_FORMAT="%.18i %.9P %.6q %.8j %.8u %.8a %.10T %.9Q %.10M
%.10V %.9l %.6D %.6C %m %R"
Wh
On 7/9/19 9:04 AM, Priya Mishra wrote:
Hi,
I am using the slurmibtopology tool to generate the topology.conf file
from the cluster at my institute which gives me a file with around 400
nodes. I need a topology file with a larger no of nodes for further use.
Is there anyway of generating a synt
On 7/9/19 10:14 AM, Priya Mishra wrote:
Hi Ole,
I am using slurm emulator and would soon start working with the slurm
simulator. I need these larger topology files for the purpose of a
project and not actual job scheduling. If there are any suitable
resources for me to use, please let me know.
s been really supportive
in testing showpartitions during development and comparing the output to
spart.
Thorsten Deilmann from University of Wuppertal has offered a number of
useful suggestions, including the colored output.
Best regards,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
Andreas made a good suggestion of looking at the user's TRESRunMin from
sshare in order to answer Jeff's question about AssocGrpCPUMinutesLimit
for a job. However, getting at this information is in practice really
complicated, and I don't think any ordinary user will bother to look it up.
Due
il.
Best regards,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
Hi Guillaume,
The performance of the slurmctld server depends strongly on the server
hardware on which it is running! This should be taken into account when
considering your question.
SchedMD recommends that the slurmctld server should have only a few, but
very fast CPU cores, in order to e
And how can users specify the minimum
*Available* disk space required by their jobs submitted by "sbatch"?
If this is not feasible, are there other techniques that achieve the
same goal? We're currently still at Slurm 18.08.
Thanks,
Ole
--
Ole Holm Nielsen
PhD, Senior
the idea is to
make the prolog set up a "project" disk quota for the job on the
localtmp file system, and the epilog to remove it again.
I'm not 100% sure we will make it work, but I'm hopeful. Fingers
crossed! :)
On 9/2/19 8:02 PM, Ole Holm Nielsen wrote:
> We have some u
You should be able to assign node weights to accommodate your
prioritization wishes. I've summarized this setting in my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-weight
I hope this helps.
/Ole
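For illustration, node weights are plain attributes on the NodeName lines in slurm.conf. A sketch with placeholder node names and hardware; nodes with lower weights are allocated first, so scarce large-memory nodes can be kept free:

```ini
# slurm.conf (placeholder names/values): prefer the small nodes
NodeName=node[001-020] CPUs=24 RealMemory=64000  Weight=10
NodeName=fat[01-04]    CPUs=48 RealMemory=512000 Weight=100
```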
On 9/5/19 5:48 PM, Douglas Duckworth wrote:
Hello
We added some
have an
NHC check "check_fs_used /scratch 90%").
Best regards,
Ole
On 10-09-2019 20:41, Michael Jennings wrote:
On Monday, 02 September 2019, at 20:02:57 (+0200),
Ole Holm Nielsen wrote:
We have some users requesting that a certain minimum size of the
*Available* (i.e., free) T
sacctmgr delete user XXX
I would also like to mention my Slurm account and user updating tools:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmaccounts
/Ole
On 10/10/19 1:41 PM, Mahmood Naderan wrote:
Hi
I had created multiple test users, and then removed them.
However, I see t
On 18-10-2019 19:56, Tom Wurgler wrote:
I need to know how many cores a given job is using per node.
Say my nodes have 24 cores each and I run a 36-way job.
It takes a node and a half.
scontrol show job id
shows me 36 cores, and the 2 nodes it is running on.
But I want to know how it split the job
FWIW, you may be interested in my Wiki on upgrading Slurm:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
You should also read the pages on Upgrading in the presentation
Technical: Field Notes From A MadMan, Tim Wickberg, SchedMD from last
month's Slurm User Group meeting
--
*From:* slurm-users on behalf of
Ole Holm Nielsen
*Sent:* Friday, October 18, 2019 2:15 PM
*To:* slurm-users@lists.schedmd.com
*Subject:* [EXT] Re: [slurm-users] How to find core count per job per node
Hi,
Maybe my Slurm Wiki can help you build Slurm on CentOS/RHEL 7? See
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms
Note in particular:
Important: Install the MariaDB (a replacement for MySQL) packages before you
build Slurm RPMs (otherwise some libraries will be mis
iaDB-shared on every server that will run slurmd, i.e. all compute
nodes. I expect that if I looked harder at the build options there may be a
way to do this, perhaps with linker flags.
For now, I can progress.
Thanks
William
-Original Message-
From: slurm-users On Behalf Of Ole Ho
's required by any of the mariadb packages, it'll get pulled
automatically. If not, you don't need it on the build system.
On 11/11/19 10:56 PM, Ole Holm Nielsen wrote:
Hi William,
Interesting experiences with MariaDB 10.4! I tried to collect the
instructions from the MariaDB p
On 11/12/19 8:10 AM, Nguyen Dai Quy wrote:
I have the same issue by compiling RPM. Just add "--with mysql" at
rpmbuild option and the error gone :-)
HTH,
That's an interesting observation! Do you know what the "--with mysql"
option actually does?
IMHO, the Slurm .spec file should include all requ
Hi Daniel,
Thanks for sharing your insights! I have updated my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-mariadb-database
now.
/Ole
On 11/12/19 8:52 AM, Daniel Letai wrote:
On 11/12/19 9:34 AM, Ole Holm Nielsen wrote:
On 11/11/19 10:14 PM, Daniel Letai
On 13-11-2019 18:04, Bas van der Vlies wrote:
We have currently version 18.08.7 installed on our cluster and want to
upgrade to 19.03.3. So I wanted to start small and installed it on one of
our compute nodes. But if I start the 'slurmd' then our slurmctld will
complain that:
{{{
2019-11-13T17:49
On 11/28/19 10:35 AM, Nguyen Dai Quy wrote:
Hi list,
I can not submit my job:
> sbatch submit.sh
sbatch: error: Batch job submission failed: Invalid account or
account/partition combination specified
After checking slurmdbd.log, I see:
[2019-11-28T10:21:07.578] Accounting storage MYSQL plugi
On 11/28/19 11:47 AM, Nguyen Dai Quy wrote:
On Thu, Nov 28, 2019 at 11:20 AM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
On 11/28/19 10:35 AM, Nguyen Dai Quy wrote:
> Hi list,
> I can not submit my job:
> > sbatch submit.sh
> sba
Hi Dean,
You may want to look at the links in my Slurm Wiki page. Both the
official Slurm documentation and other resources are listed. I think
most of your requirements and questions are described in these pages.
My Wiki gives detailed deployment information for a CentOS 7 cluster,
but mu
Forgot the link to the Wiki: https://wiki.fysik.dtu.dk/niflheim/SLURM
On 12/8/19 9:18 PM, Ole Holm Nielsen wrote:
Hi Dean,
You may want to look at the links in my Slurm Wiki page. Both the
official Slurm documentation and other resources are listed. I think most
of your requirements and
Hi Mike,
My showuserlimits tool prints nicely user limits from the Slurm database:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
Maybe this can give you further insights into the source of problems.
/Ole
On 16-12-2019 17:27, Renfro, Michael wrote:
Hey, folks. I’ve j
Some examples are here:
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting#quality-of-service-qos
/Ole
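One common pattern from such examples is a short-job QOS attached to its own partition. A sketch; the names, nodes and limits are placeholders:

```ini
# Create and limit the QOS with sacctmgr (run once):
#   sacctmgr add qos short
#   sacctmgr modify qos short set MaxWall=01:00:00
# Then enforce it in slurm.conf:
PartitionName=short Nodes=node[001-020] MaxTime=01:00:00 QOS=short
```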
On 19-12-2019 19:30, Prentice Bisbal wrote:
On 12/19/19 10:44 AM, Ransom, Geoffrey M. wrote:
The simplest is probably to just have a separate partition that will
only allow job times of 1
When we have created a new Slurm user with "sacctmgr create user
name=xxx", I would like to inquire at a later date about the timestamp for
the user creation. As far as I can tell, the sacctmgr command cannot
show such timestamps.
I assume that the Slurm database contains the desired timestamp(?
Hi Jürgen,
On 1/19/20 2:38 PM, Juergen Salk wrote:
* Ole Holm Nielsen [200118 12:06]:
When we have created a new Slurm user with "sacctmgr create user name=xxx",
I would like to inquire at a later date about the timestamp for the user
creation. As far as I can tell, the sacctmgr comm
and suggestions for improvement are welcome!
/Ole
On 1/18/20 12:06 PM, Ole Holm Nielsen wrote:
When we have created a new Slurm user with "sacctmgr create user
name=xxx", I would like to inquire at a later date about the timestamp for
the user creation. As far as I can tell, the sacctm
te -d "-45 days"
+%m/%d/%y`
I think this pretty nicely gives us the flexibility for listing
transactions during some period into the past.
/Ole
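The relative-date trick above is plain GNU date. A sketch, assuming GNU coreutils, producing the MM/DD/YY form that sacctmgr's Start= condition accepts:

```shell
# Compute the date 45 days ago in MM/DD/YY format
start=$(date -d "-45 days" +%m/%d/%y)
echo "$start"
# then e.g.: sacctmgr list transactions Start=$start
```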
On 1/20/20 11:29 AM, Ole Holm Nielsen wrote:
Hi Jürgen,
On 1/19/20 2:38 PM, Juergen Salk wrote:
* Ole Holm Nielsen [200118 12:06]:
When
On 24-01-2020 20:22, Dean Schulze wrote:
Since there isn't a list for slurm development I'll ask here. Does the
slurm code include a library for making REST calls? I'm writing a
plugin that will make REST calls and if slurm already has one I'll use
that, otherwise I'll find one with an approp
On 27-01-2020 20:35, Mahmood Naderan wrote:
Hi
Is there any command to print current cgroup parameters or
configurations that are used by Slurm?
This works for me:
# scontrol show config | tail -22
Cgroup Support Configuration:
AllowedDevicesFile = /etc/slurm/cgroup_allowed_devices_file
On 06-02-2020 22:40, Dean Schulze wrote:
I've moved two nodes to a different controller. The nodes are wired and
the controller is networked via wifi. I had to open up ports 6817 and
6818 between the wired and wireless sides of our network to get any
connectivity.
Now when I do
srun -N2 ho
they work again.
-Original Message-
From: slurm-users On Behalf Of Ole Holm
Nielsen
Sent: Friday, February 7, 2020 2:34 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Which ports does slurm use?
On 06-02-2020 22:40, Dean Schulze wrote:
I've moved two nodes to a differ
On 2/17/20 11:16 AM, navin srivastava wrote:
I have an issue with the Slurm job limit. I applied the MaxJobs limit on a
user using
sacctmgr modify user navin1 set maxjobs=3
but I still see this is not getting applied. I am still able to submit
more jobs.
Slurm version is 17.11.x
Let me know
tava wrote:
Hi,
Thanks for your script.
With this I am able to show the limit that I set, but the limit is
not working.
MaxJobs = 3, current value = 0
Regards
Navin.
On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
On 2/17/20
limit is set it should allow only 3 jobs at any point of time.
Regards
Navin.
On Mon, Feb 17, 2020 at 4:48 PM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
Hi Navin,
Why do you think the limit is not working? The MaxJobs limits the number
of running jobs to
On 2/17/20 1:19 PM, Parag Khuraswar wrote:
Hi Team,
Does Slurm provide cluster usage reports like those mentioned below?
Detailed reports about cluster usage statistics.
Reports of every user and jobs including their
monthly usage, node usage, percentage of
utilization, History tracking, number of