Is there a way to diagnose if the I/O to the
/cm/shared/apps/slurm/var/cm/statesave directory (used for job status) on the
NFS storage is the cause of the socket errors?
What values/threshold from the nfsiostat command would signal the NFS storage
as the bottleneck?
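One site-specific way to get a baseline, rather than a universal nfsiostat threshold, is to time small synchronous writes in a scratch directory on the same NFS export and compare the result with a local disk. The sketch below is only an illustration: the function names, sample count, and 64 KiB payload are assumptions, slurmctld's real write pattern may differ, and it should be pointed at a scratch directory, not at the live statesave directory.

```python
import os
import statistics
import sys
import tempfile
import time


def fsync_write_latencies(directory, samples=20, size=64 * 1024):
    """Time `samples` small write+fsync cycles in `directory` (seconds each)."""
    payload = b"\0" * size  # hypothetical payload size, not taken from Slurm
    latencies = []
    for _ in range(samples):
        fd, path = tempfile.mkstemp(dir=directory, prefix="latency_probe_")
        try:
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # push the data to storage, not just the page cache
            latencies.append(time.perf_counter() - start)
        finally:
            os.close(fd)
            os.unlink(path)  # leave no probe files behind
    return latencies


def summarize(latencies):
    """Reduce a list of latencies (seconds) to median and maximum in ms."""
    return {
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


if __name__ == "__main__":
    # Usage: python probe.py /path/on/nfs/scratch /path/on/local/scratch
    for target in sys.argv[1:]:
        print(target, summarize(fsync_write_latencies(target)))
```

If the NFS directory shows a much higher median or maximum than the local one while the socket timeouts are occurring, that points toward the storage; similar numbers point elsewhere.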
From: Buckley, Ronan
Sent
mail/slurm-users/2019-June/003534.html
My take is that there is no single answer to the question; each site is different.
Best Regards
mg.
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Buckley, Ronan
Sent: Tuesday, 25 June 2019 11:17
To: 'slurm-users@lists.schedmd.com
Hi,
Since configuring a backup slurm controller (including moving the
StateSaveLocation from a local disk to a NFS share), we are seeing these errors
in the slurmctld logs on a regular basis:
Socket timed out on send/recv operation
It sometimes occurs when a job array is started and squeue wil
Hi,
Does restarting the slurmctld daemon on a slurm head node affect running slurm
jobs on the compute nodes in any way?
Rgds
Hi,
I want to increase the MaxJobCount in the slurm.conf file from its default
value of 10,000. I want to increase it to 250,000.
The online documentation says:
MaxJobCount
The maximum number of jobs Slurm can have in its active database at one time.
Set the values of MaxJobCount and MinJobAge
Hi,
I want to increase the MaxArraySize in the slurm.conf file from its default
value of 1001. I want to increase it to 1.
Is it a case of just adding "MaxArraySize=1" to the slurm.conf file and
then running "scontrol reconfigure" to update slurm.conf ?
Will this update affect running j
Disabling the firewall service on the centos client allows the ‘srun hostname’
command to run.
From: Buckley, Ronan
Sent: Tuesday, July 17, 2018 12:00 PM
To: 'Slurm User Community List'
Subject: RE: [slurm-users] 'srun hostname' hangs on the command line
Hi Carlos, Is ther
n see this means that you cannot launch a job.
What state are the compute nodes in when you run sinfo?
On 17 July 2018 at 10:08, Buckley, Ronan <ronan.buck...@dell.com> wrote:
Yes, srun just hangs. Commands like sinfo and squeue run fine.
I also have no slurm logs in /var/log
h you have already run an ssh into a node and run the
hostname command manually.
On 17 July 2018 at 09:50, Buckley, Ronan <ronan.buck...@dell.com> wrote:
Yes I do.
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com]
oblem as a non-root user?
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Buckley, Ronan
Sent: Tuesday, 17 July 2018 12:53 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] 'srun hostname' hangs on the
Hi All,
Verbose mode doesn't show much.
I hashed out the hostnames.
Any ideas/suggestions?
# srun hostname
^Csrun: interrupt (one more within 1 sec to abort)
srun: task 0: unknown
^Z
[1]+ Stopped srun hostname
#
# srun -v hostname
srun: defined options for program `srun'
srun: -
ilto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Buckley, Ronan
Sent: Friday, June 15, 2018 9:31 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] sreport reports blank information
Hi all,
Slurm accounting commands like sstat and sacct report information but sreport
always
Hi all,
Slurm accounting commands like sstat and sacct report information but sreport
always reports no information, even though by default it works on my VM.
What am I missing?
Rgds
Ronan
Hi All,
Commands like sacct and sreport provide blank information:
# sreport cluster utilization
Cluster Utilization 2018-06-04T00:00:00 - 2018-06-04T23:59:59
Use reported in TRES Minutes
Hi All,
I need to restart the slurmctld service.
1. Is the command to do this "service slurmctld restart" ?
2. Can this be done at any time? i.e. does it affect active running SLURM
jobs?
Ronan
nf file, as well. You will have to create the accounts and
users using sacctmgr, and possibly QOSs, depending on what you'd like to do.
It's not difficult, but there are a number of small steps.
There's a document online that walks you through the process.
Paul.
> On Ma
Hi All,
I need to enable SLURM accounting so that I can use commands like sacct,
sstat, sreport, etc. It looks like SLURM accounting was not enabled by default.
From reading the online documentation, all I have to do is uncomment the
following lines in /etc/slurm/slurm.conf:
#JobAcctGather
Hi All,
I am unable to get confirmation from the SLURM documentation that there is no
impact on active SLURM jobs when "scontrol reconfigure" is run to apply a new
SLURM configuration from /etc/slurm/slurm.conf.
Can anybody confirm that there is no impact?
Rgds
Ronan
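One way to check this empirically on a given cluster is to record the set of running job IDs before the change and compare it with the set afterwards. The sketch below assumes squeue is on the PATH; the helper names are hypothetical, and it does not by itself show whether a job that disappeared was killed or simply finished.

```python
import subprocess


def parse_job_ids(squeue_output):
    """Turn squeue output (one job ID per line) into a set of ID strings."""
    return {line.strip() for line in squeue_output.splitlines() if line.strip()}


def running_job_ids():
    """Return the job IDs that squeue currently reports as RUNNING."""
    result = subprocess.run(
        # -h: no header, -t: filter by state, -o %i: print only the job ID
        ["squeue", "-h", "-t", "RUNNING", "-o", "%i"],
        capture_output=True, text=True, check=True,
    )
    return parse_job_ids(result.stdout)


def missing_after(before, after):
    """Job IDs that were running before the change but are not running now."""
    return sorted(before - after)
```

A typical use would be to call running_job_ids() once, run "scontrol reconfigure" (or restart slurmctld), call it again, and pass both sets to missing_after(). Any ID it returns should then be looked up with sacct, since the job may have completed normally in the meantime.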
Has anyone any experience of setting up users that can cancel jobs?
From: Buckley, Ronan
Sent: Wednesday, April 18, 2018 9:06 AM
To: 'slurm-users@lists.schedmd.com'
Subject: RE: SLURM Operator Role (to cancel SLURM Jobs)
According to the online documentation:
"When using the Slur
admin root ethor,mpern,svcappu+
root default root account root
#
Must I configure something else to allow the above 4 operator users to kill
jobs?
From: Buckley, Ronan
Sent: Tuesday, April 17, 2018 4:21 PM
To: 'slurm-users@lists.schedmd.com'
Subject:
Hi,
I have given 4 users the operator role and they are all part of the coordinator
accounts. However, when I su to the users in question, they get a permission
denied error when trying to cancel a job.
What am I missing?
Ronan
22 matches