Re: [slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-25 Thread Buckley, Ronan
Is there a way to diagnose whether the I/O to the /cm/shared/apps/slurm/var/cm/statesave directory (used for job status) on the NFS storage is the cause of the socket errors? What values/thresholds from the nfsiostat command would signal the NFS storage as the bottleneck? From: Buckley, Ronan Sent

Re: [slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-25 Thread Buckley, Ronan
mail/slurm-users/2019-June/003534.html My take is that there is no answer to the question, each site is different. Best Regards mg. From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Buckley, Ronan Sent: Tuesday, 25 June 2019 11:17 To: 'slurm-users@lists.schedmd.com

[slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-25 Thread Buckley, Ronan
Hi, Since configuring a backup slurm controller (including moving the StateSaveLocation from a local disk to an NFS share), we are seeing these errors in the slurmctld logs on a regular basis: Socket timed out on send/recv operation It sometimes occurs when a job array is started and squeue wil

[slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-24 Thread Buckley, Ronan
Hi, Since configuring a backup slurm controller (including moving the StateSaveLocation from a local disk to an NFS share), we are seeing these errors in the slurmctld logs on a regular basis: Socket timed out on send/recv operation It sometimes occurs when a job array is started and squeue wil

[slurm-users] service slurmctld restart

2019-01-31 Thread Buckley, Ronan
Hi, Does restarting the slurmctld daemon on a slurm head node affect running slurm jobs on the compute nodes in any way? Rgds

[slurm-users] Increase MaxJobCount in slurm.conf

2019-01-31 Thread Buckley, Ronan
Hi, I want to increase the MaxJobCount in the slurm.conf file from its default value of 10,000. I want to increase it to 250,000. The online documentation says: MaxJobCount The maximum number of jobs Slurm can have in its active database at one time. Set the values of MaxJobCount and MinJobAge

[slurm-users] Increase MaxArraySize in slurm.conf

2019-01-29 Thread Buckley, Ronan
Hi, I want to increase the MaxArraySize in the slurm.conf file from its default value of 1001. I want to increase it to 1. Is it a case of just adding "MaxArraySize=1" to the slurm.conf file and then running "scontrol reconfigure" to update slurm.conf? Will this update affect running j

Re: [slurm-users] 'srun hostname' hangs on the command line

2018-07-17 Thread Buckley, Ronan
Disabling the firewall service on the CentOS client allows the ‘srun hostname’ command to run. From: Buckley, Ronan Sent: Tuesday, July 17, 2018 12:00 PM To: 'Slurm User Community List' Subject: RE: [slurm-users] 'srun hostname' hangs on the command line Hi Carlos, Is ther

Re: [slurm-users] 'srun hostname' hangs on the command line

2018-07-17 Thread Buckley, Ronan
n see this means that you cannot launch a job. What state are the compute nodes in when you run sinfo? On 17 July 2018 at 10:08, Buckley, Ronan <ronan.buck...@dell.com> wrote: Yes, srun just hangs. Commands like sinfo and squeue run fine. I also have no slurm logs in /var/log

Re: [slurm-users] 'srun hostname' hangs on the command line

2018-07-17 Thread Buckley, Ronan
h you have already run an ssh into a node and run the hostname command manually. On 17 July 2018 at 09:50, Buckley, Ronan <ronan.buck...@dell.com> wrote: Yes I do. From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com]

Re: [slurm-users] 'srun hostname' hangs on the command line

2018-07-17 Thread Buckley, Ronan
oblem as a non-root user? From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Buckley, Ronan Sent: Tuesday, 17 July 2018 12:53 AM To: slurm-users@lists.schedmd.com Subject: [slurm-users] 'srun hostname' hangs on the

[slurm-users] 'srun hostname' hangs on the command line

2018-07-16 Thread Buckley, Ronan
Hi All, Verbose mode doesn't show much. I hashed out the hostnames. Any ideas/suggestions? # srun hostname ^Csrun: interrupt (one more within 1 sec to abort) srun: task 0: unknown ^Z [1]+ Stopped srun hostname # # srun -v hostname srun: defined options for program `srun' srun: -

Re: [slurm-users] sreport reports blank information

2018-06-15 Thread Buckley, Ronan
ilto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Buckley, Ronan Sent: Friday, June 15, 2018 9:31 AM To: slurm-users@lists.schedmd.com Subject: [slurm-users] sreport reports blank information Hi all, Slurm accounting commands like sstat and sacct report information but sreport always

[slurm-users] sreport reports blank information

2018-06-15 Thread Buckley, Ronan
Hi all, Slurm accounting commands like sstat and sacct report information but sreport always reports no information, even though by default it works on my VM. What am I missing? Rgds Ronan

[slurm-users] cluster not registered

2018-06-05 Thread Buckley, Ronan
Hi All, Commands like sacct and sreport provide blank information: # sreport cluster utilization Cluster Utilization 2018-06-04T00:00:00 - 2018-06-04T23:59:59 Use reported in TRES Minutes

[slurm-users] Restart slurmctld

2018-06-05 Thread Buckley, Ronan
Hi All, I need to restart the slurmctld service. 1. Is the command to do this "service slurmctld restart"? 2. Can this be done at any time? i.e. does it affect active running SLURM jobs? Ronan

Re: [slurm-users] Enable SLURM Accounting

2018-05-28 Thread Buckley, Ronan
nf file, as well. You will have to create the accounts and users using sacctmgr, and possibly QOSs, depending on what you'd like to do. It's not difficult, but there are a number of small steps. There's a document online that walks you through the process. Paul. > On Ma

[slurm-users] Enable SLURM Accounting

2018-05-28 Thread Buckley, Ronan
Hi All, I need to enable SLURM accounting so that I can use commands like sacct, sstat, sreport, etc. It looks like SLURM accounting was not enabled by default. From reading the online documentation, all I have to do is uncomment the following lines in /etc/slurm/slurm.conf: #JobAcctGather

[slurm-users] Running 'scontrol reconfigure" while jobs are running

2018-05-28 Thread Buckley, Ronan
Hi All, I am unable to get confirmation from the SLURM documentation that there is no impact on active SLURM jobs when "scontrol reconfigure" is run to apply a new SLURM configuration from /etc/slurm/slurm.conf. Can anybody confirm that there is no impact? Rgds Ronan

Re: [slurm-users] SLURM Operator Role (to cancel SLURM Jobs)

2018-04-20 Thread Buckley, Ronan
Has anyone any experience of setting up users that can cancel jobs? From: Buckley, Ronan Sent: Wednesday, April 18, 2018 9:06 AM To: 'slurm-users@lists.schedmd.com' Subject: RE: SLURM Operator Role (to cancel SLURM Jobs) According to the online documentation: "When using the Slur

Re: [slurm-users] SLURM Operator Role (to cancel SLURM Jobs)

2018-04-18 Thread Buckley, Ronan
admin root ethor,mpern,svcappu+ root default root account root # Must I configure something else to allow the above 4 operator users to kill jobs? From: Buckley, Ronan Sent: Tuesday, April 17, 2018 4:21 PM To: 'slurm-users@lists.schedmd.com' Subject:

[slurm-users] SLURM Operator Role (to cancel SLURM Jobs)

2018-04-17 Thread Buckley, Ronan
Hi, I have given 4 users the operator role and they are all part of the coordinator accounts. However, when I su to the users in question, they get a permission denied error when trying to cancel a job. What am I missing? Ronan