On Thursday, 30 November 2017 2:21:36 AM AEDT Christian Anthon wrote:
> The nodes are fully allocated in terms of memory, but not all cpu
> resources are consumed
I suspect that's your problem: the job wants 16 cores on a single node and
32GB of RAM free. If you've got no RAM free it's not goi
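(A minimal sketch of a request of that shape, as hypothetical batch-script
directives:
    #SBATCH --nodes=1
    #SBATCH --ntasks=16
    #SBATCH --mem=32G
With --mem, the whole 32GB must be free on the chosen node before the job
can start.)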
On Thursday, 30 November 2017 5:28:26 PM AEDT Chris Samuel wrote:
> Are you starting it with systemctl? If so it might be taking too long for
> systemd's liking to upgrade the tables and it might kill it.
Ignore that - I skimmed your logs too quickly!
[2017-11-29T16:15:22.086] slurmdbd version
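(If a slow table conversion really were hitting a systemd start timeout, one
hedged workaround, assuming a stock slurmdbd.service unit, is a drop-in
override:
    # systemctl edit slurmdbd
    [Service]
    TimeoutStartSec=infinity
Alternatively, run slurmdbd -D in the foreground for the upgrade so systemd
is not involved at all.)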
On Thursday, 30 November 2017 3:26:25 AM AEDT Bruno Santos wrote:
> Managed to make some more progress on this. The problem seems to be that
> the service was somehow still linking to an older version of slurmdbd I had
> installed with apt. I have now hopefully fully cleaned out the old version but
>
We've just installed 17.11.0 on our 100+ node x86_64 cluster running
CentOS 7.4 this afternoon, and periodically see a single node (perhaps
the first node in an allocation?) get drained with the message "batch
job complete failure".
On one node in question, slurmd.log reports
pam_unix(slur
Dear friends and colleagues
On behalf of Mellanox HPC R&D I would like to highlight a feature that we
introduced in Slurm 17.11 that has been shown [1] to significantly improve
the speed and scalability of Slurm job start.
Starting from this release, the PMIx plugin supports:
(a) Direct point-to-point c
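(A minimal way to exercise the plugin, assuming Slurm was built against PMIx:
    $ srun --mpi=list          # confirm pmix appears among the MPI plugins
    $ srun --mpi=pmix -n 64 ./mpi_app
Here ./mpi_app is a placeholder for any PMIx-aware MPI binary.)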
On 30/11/17 8:57 am, Jacob Chappell wrote:
Using "scontrol show jobid X" I can see info about running jobs,
including the command used to launch the job, the user's working
directory, values of stdout, stdin, stderr, etc.
Note that the announcement for 17.11.0 mentions that the job script
wil
All,
Using "scontrol show jobid X" I can see info about running jobs, including
the command used to launch the job, the user's working directory, values of
stdout, stdin, stderr, etc. With Slurm accounting configured, sacct seems
to show *some* of this information about jobs that have completed. H
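(A hedged illustration of the sacct side; which fields are populated varies
with the Slurm version and the accounting storage configuration:
    $ sacct -j 1234 --format=JobID,JobName,User,State,ExitCode,WorkDir%40
Run sacct -e for the full list of fields your version supports.)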
Hi Kevin,
Based on my understanding and a discussion with the SLURM dev team on that
subject, here is some information about the new support for X11 in
slurm-17.11:
- slurm's native support of X11 forwarding is based on libssh2
- slurm's native support of X11 can be disabled at configure/compila
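(On the submission side, assuming X11 support was compiled in, usage is then
simply:
    $ srun --x11 --pty xterm
where xterm stands in for any X11 application.)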
Thanks,
I believe the user must have resubmitted the job, hence the updated id.
Cheers, Christian
JobId=6986 JobName=Morgens
UserId=ferro(2166) GroupId=ferro(22166) MCS_label=N/A
Priority=1031 Nice=0 Account=rth QOS=normal
JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:
Depen
Here is some more data:
Changed slurm.conf to have
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
Then restarted
sudo systemctl restart slurmctld.service
The log on the host said:
[2017-11-29T12:23:56.384] error: we don't have select plugin type 101
[2017-11-29T12:23:56.384] erro
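(A guess, not a confirmed diagnosis: that unpack error often means the
daemons disagree about the select plugin. Worth checking that the plugin file
exists and that slurmd on every compute node was restarted as well:
    ls /usr/lib64/slurm/select_cons_res.so   # path hypothetical; see PluginDir in slurm.conf
    sudo systemctl restart slurmd.service    # on each compute node, not just the controller
)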
Hello SLURM users,
I was reviewing the X11 documentation
https://slurm.schedmd.com/faq.html#terminal
https://slurm.schedmd.com/faq.html#x11
15. Can tasks be launched with a remote terminal?
In Slurm version 1.3 or higher, use srun's --pty option. Until then, you can
accomplish this by starting
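(The --pty form from that FAQ entry is just:
    $ srun --pty bash -i
which gives an interactive shell on the allocated node.)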
We do have hyperthreading enabled. Here are some log extracts from various
attempts to get it working.
[2017-11-28T15:52:30.466] error: we don't have select plugin type 101
[2017-11-28T15:52:30.466] error: select_g_select_jobinfo_unpack: unpack error
[2017-11-28T15:52:30.466] error: Malformed
Hello David,
So linuxcluster is the head node and also a compute node?
Is slurmd running?
What does /var/log/slurm/slurmd.log say?
Regards,
Pierre-Marie Le Biot
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
david vilanova
Sent: Wednesday, November 29, 2017
On 11/29/17 4:32 PM, david vilanova wrote:
Hi,
I have updated the slurm.conf as follows:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=linuxcluster CPUs=2
PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP
Still getting the testq node in down status? Any
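(Not from this thread, but two commands that usually show why a node is down
and bring it back once the config is consistent:
    $ sinfo -R
    $ scontrol update NodeName=linuxcluster State=RESUME
)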
Managed to make some more progress on this. The problem seems to be that the
service was somehow still linking to an older version of slurmdbd I had
installed with apt. I have now hopefully fully cleaned out the old version, but
when I try to start the service it is getting killed somehow. Any
suggestio
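(A hedged way to confirm which binary the unit actually launches:
    $ which slurmdbd
    $ systemctl cat slurmdbd.service   # inspect the ExecStart= path
    $ slurmdbd -V                      # report the version that path resolves to
)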
damn autocorrect - I meant:
# scontrol show job 6982
--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom
> On 29 Nov 2017, at 16:08, Merlin Hartley
> wrote:
>
> Can you give us the output of
> # control show job 6982
>
> Could be an issue wi
Can you give us the output of
# control show job 6982
Could be an issue with requesting too many CPUs or something…
Merlin
--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom
> On 29 Nov 2017, at 15:21, Christian Anthon wrote:
>
> Hi,
>
> I ha
Step back from Slurm and confirm that MariaDB is up and responsive.
# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.2.9-MariaDB MariaDB Server
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and
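(Beyond the server being up, a hedged next check; the user and database names
below are the conventional ones and may differ on your setup:
    MariaDB [(none)]> SHOW DATABASES LIKE 'slurm_acct_db';
    MariaDB [(none)]> SHOW GRANTS FOR 'slurm'@'localhost';
)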
Hi,
I have updated the slurm.conf as follows:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=linuxcluster CPUs=2
PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP
Still getting the testq node in down status? Any ideas?
Below log from db and controller:
==>
Hi,
I have a problem with a newly set up slurm-17.02.7-1.el6.x86_64 where jobs
seem to be stuck in ReqNodeNotAvail:
6982  panic  Morgens  ferro  PD  0:00  1  (ReqNodeNotAvail, UnavailableNodes:)
6981  panic  SPEC     ferro  PD  0:00  1  (ReqNodeNotAva
Hi Barbara,
This is a fresh install. I have installed slurm from source on Debian
stretch and now trying to set it up correctly.
MariaDB is running, but I am confused about the database configuration.
I followed a tutorial (I can no longer find it) that showed me how to
create the database and
Hi David,
On Wed, 2017-11-29 at 14:45:06 +, david vilanova wrote:
> Hello,
> I have installed the latest 17.11 release and my node is shown as down.
> I have a single physical server with 12 cores, so I am not sure the conf
> below is correct. Can you help?
>
> In slurm.conf the node is configured as
Did you upgrade SLURM or is it a fresh install?
Are there any associations set? For instance, did you create the cluster with
sacctmgr?
sacctmgr add cluster <clustername>
Is mariadb/mysql server running, is slurmdbd running? Is it working? Try a
simple test, such as:
sacctmgr show user -s
If it was an upgra
Hello,
I have installed the latest 17.11 release and my node is shown as down.
I have a single physical server with 12 cores, so I am not sure the conf below
is correct. Can you help?
In slurm.conf the node is configured as follows:
NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1
Thank you Barbara,
Unfortunately, it does not seem to be a munge problem. Munge can
successfully authenticate with the nodes.
I have increased the verbosity level and restarted the slurmctld and now I
am getting more information about this:
> Nov 29 14:08:16 plantae slurmctld[30340]: Registering
Hello,
Does munge work?
Test whether decode works locally:
munge -n | unmunge
Test whether decode works remotely:
munge -n | ssh <remote-host> unmunge
It seems the munge keys do not match...
See comments inline...
> On 29 Nov 2017, at 14:40, Bruno Santos wrote:
>
> I actually just managed to figure that one out.
>
> T
I was struggling like crazy with this one a while ago.
Then I saw this in the slurm.conf man page:
AccountingStoragePass
The password used to gain access to the database to store the accounting
data. Only used for database type storage plugins, ignored otherwise. In the
case of
I actually just managed to figure that one out.
The problem was that I had set AccountingStoragePass=magic in the
slurm.conf file, but after re-reading the documentation it seems this is
only needed if I have a different munge instance controlling the logins to
the database, which I don't.
So c
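(For reference, in the common single-munge setup the database password lives
in slurmdbd.conf rather than slurm.conf; a minimal sketch with placeholder
values:
    # slurmdbd.conf
    StorageType=accounting_storage/mysql
    StorageHost=localhost
    StorageUser=slurm
    StoragePass=magic
slurm.conf then needs no AccountingStoragePass at all.)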
It looks like you don't have the munged daemon running.
On 11/29/2017 08:01 AM, Bruno Santos wrote:
Hi everyone,
I have set up slurm to use slurm_db and all was working fine. However
I had to change the slurm.conf to play with user priority, and upon
restarting, slurmctld fails with the f
Hi everyone,
I have set up slurm to use slurm_db and all was working fine. However I had
to change the slurm.conf to play with user priority, and upon restarting,
slurmctld fails with the following messages below. It seems that somehow it
is trying to use the MySQL password as a munge socket?
Any