[slurm-users] Fwd: confirm e3c10e8d4f2f35ab689c7a4a88e5e2b57931da79

2019-01-29 Thread Lachlan Musicman
I got this email to my gmail account? I don't understand why my gmail account would register any bounces at all? Am I still unsubscribed? cheers L. -- Forwarded message - From: Date: Wed, 30 Jan 2019 at 02:44 Subject: confirm e3c10e8d4f2f35ab689c7a4a88e5e2b57931da79 To: Your m

Re: [slurm-users] Can't find an address

2018-10-24 Thread Lachlan Musicman
On Wed, 24 Oct 2018 at 22:56, Zohar Roe MLM wrote: > Hello, > > I have a node that for some reason changes state to "Down" every few > minutes. > > When I change it with scontrol to "resume" it's ok until it goes Down again. > > In the slurm server log I can see the error: > > "agent/is_node_resp: node:myName

Re: [slurm-users] Documentation for creating a login node for a SLURM cluster

2018-10-15 Thread Lachlan Musicman
On Mon, 15 Oct 2018 at 17:59, Bjørn-Helge Mevik wrote: > Lachlan Musicman writes: > > > There's one thing that no one seems to have mentioned - I think you will > > need to list it as an AllocNode in the Partition that you want it to be > > able to allocate jobs to.

Re: [slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-14 Thread Lachlan Musicman
On Fri, 12 Oct 2018 at 17:02, Aravindh Sampathkumar wrote: > @Chris and @Lachlan, > Thanks for your responses. > > I resolved the issue based on a hint from Jeffrey in an earlier email. I > tweaked the location of PID files in slurm config files, but missed > changing them in the systemd service defi

Re: [slurm-users] Documentation for creating a login node for a SLURM cluster

2018-10-14 Thread Lachlan Musicman
There's one thing that no one seems to have mentioned - I think you will need to list it as an AllocNode in the Partition that you want it to be able to allocate jobs to. https://slurm.schedmd.com/slurm.conf.html#OPT_AllocNodes Eg in my conf we have one partition that looks like PartitionName=re
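A hedged sketch of the kind of partition line being described (the conf excerpt above is truncated, so all names here are invented, not taken from the thread):

```conf
# Hypothetical slurm.conf fragment. AllocNodes restricts which hosts may
# submit/allocate jobs to the partition; a login node must be listed here,
# or its submissions to this partition will be rejected.
NodeName=compute[01-04] CPUs=16 State=UNKNOWN
PartitionName=restricted Nodes=compute[01-04] AllocNodes=login01 State=UP
```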

Re: [slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-11 Thread Lachlan Musicman
1. After `systemctl restart slurmdbd`, what does journalctl -xe say? 2. Your email is very hard to read. This is because it was posted in HTML, with terminal colours etc. Could you send the next email in plain text please? Cheers L. On Fri, 12 Oct 2018 at 08:02, Aravindh Sampathkumar wrote: > Hello. > > I'

Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Thread Lachlan Musicman
On 29 July 2018 at 04:32, Felix Wolfheimer wrote: > I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm > facing the following situation: Let's say, SLURM requests that a compute > instance is started. The ResumeProgram tries to create the instance, but > doesn't succeed because

Re: [slurm-users] srun: error: Unable to allocate resources: Invalid partition name specified

2018-07-26 Thread Lachlan Musicman
On 27 July 2018 at 03:13, Michael Robbert wrote: > The line that you list from your slurm.conf shows the "course" partition > being set as the default partition, but on our system the sinfo command > shows our default partition with a * at the end and your output doesn't > show that so I'm wonder

Re: [slurm-users] Upgrade woes

2018-05-31 Thread Lachlan Musicman
On 31 May 2018 at 17:00, Ole Holm Nielsen wrote: > Hi Lachlan, > > Slurm upgrades on CentOS 7.5 should run without problems. It seems to me > that your problems are unrelated to the Slurm RPMs. FWIW, I documented the > Munge and Slurm installation as well as upgrade process in my Wiki page > ht

[slurm-users] Upgrade woes

2018-05-30 Thread Lachlan Musicman
After last night's announcement, I decided to start the upgrade process. Build went fine - once I worked out where munge went - and installation also seemed fine. slurmctld won't restart though. In the logs I'm seeing: [2018-05-31T15:20:50.810] debug: Munge encode failed: Failed to access "xxx
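The truncated "Munge encode failed: Failed to access ..." line usually points at the munge key or socket being unreadable after an upgrade. A minimal, self-contained sketch of the permission check (a temporary file stands in for the real /etc/munge/munge.key, so this can be run anywhere):

```shell
# Sketch: munged requires the key file to be mode 0400 (or 0600) and owned
# by the munge user. A temp file stands in for /etc/munge/munge.key here.
keyfile=$(mktemp)
chmod 400 "$keyfile"
perms=$(stat -c '%a' "$keyfile")
if [ "$perms" = "400" ]; then
  echo "key permissions OK"
else
  echo "key permissions wrong: $perms"
fi
rm -f "$keyfile"
```

On a real node the quick end-to-end check is `munge -n | unmunge`, which should report STATUS: Success.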

Re: [slurm-users] slurm munge

2018-05-30 Thread Lachlan Musicman
On 31 May 2018 at 10:23, Lachlan Musicman wrote: > Hola > > According to the documentation, the slurm munge rpm will be built if the > munge libraries are installed. > > In CentOS 7.4 I have munge, munge-devel and munge-libs install, via yum. > The libs are in /usr/lib64,

[slurm-users] slurm munge

2018-05-30 Thread Lachlan Musicman
Hola According to the documentation, the slurm munge rpm will be built if the munge libraries are installed. In CentOS 7.4 I have munge, munge-devel and munge-libs installed, via yum. The libs are in /usr/lib64, the bins are in /usr/bin, the daemon is in /usr/sbin. Neither machine on which I run `

Re: [slurm-users] Running 'scontrol reconfigure" while jobs are running

2018-05-28 Thread Lachlan Musicman
On 28 May 2018 at 20:24, Loris Bennett wrote: > Hi Ronan, > > "Buckley, Ronan" writes: > > > Hi All, > > > > I am unable to get confirmation from the SLURM documentation that > > there is no impact to active SLURM jobs when “scontrol reconfigure” is > > run to enforce new SLURM configuration fro

Re: [slurm-users] Handling idle sessions

2018-05-27 Thread Lachlan Musicman
ch for those terms here https://slurm.schedmd.com/slurm.conf.html Accounting system is using FairShare/Fair Tree https://slurm.schedmd.com/fair_tree.html PDF of presentation -> https://slurm.schedmd.com/SC14/BYU_Fair_Tree.pdf Cheers L. > Thanks, Nadav > > > On 27/05/2018 11:34,

Re: [slurm-users] Handling idle sessions

2018-05-27 Thread Lachlan Musicman
On 27 May 2018 at 18:23, Nadav Toledo wrote: > Hello forum, > > I have been trying to deal with idle sessions for some time, and haven't found a > solution I am happy with. > The scenario is as follows: users use srun for jupyter-lab (which is fine > and even encouraged by me) on an image processing cluster

Re: [slurm-users] How to get the path to original sbatch script

2018-05-26 Thread Lachlan Musicman
On 26 May 2018 at 12:19, 程迪 wrote: > Hi, everyone > > I just found that sbatch will copy the original sbatch script to a new > place, and I cannot get the path to the original sbatch script. Is there any > way to solve this? > > I am using the path to copy related files. I need to populate a scratch >

Re: [slurm-users] How to find user limit in SLURM

2018-05-20 Thread Lachlan Musicman
On 21 May 2018 at 11:36, Lachlan Musicman wrote: > On 21 May 2018 at 11:29, 程迪 wrote: > >> Hi everyone, >> >> I am using SLURM as a normal user. I want to find the usage limit of my >> user. I can access the slurm's config via `scontrol show config`. But it

Re: [slurm-users] How to find user limit in SLURM

2018-05-20 Thread Lachlan Musicman
On 21 May 2018 at 11:29, 程迪 wrote: > Hi everyone, > > I am using SLURM as a normal user. I want to find the usage limit of my > user. I can access Slurm's config via `scontrol show config`, but that is not > my user's limit. > > I can find the account of my user by `sacctmgr show user di`. But I ca

Re: [slurm-users] Python and R installation in a SLURM cluster

2018-05-10 Thread Lachlan Musicman
On 11 May 2018 at 01:35, Eric F. Alemany wrote: > Hi All, > > I know this might sound like a very basic question: where in the cluster > should I install Python and R? > Headnode? > Execute nodes? > > And is there a particular directory (path) in which I need to install Python and R? > > Background: > SLU

Re: [slurm-users] Slurm setup question

2018-04-11 Thread Lachlan Musicman
On 12 April 2018 at 01:22, Matt Hohmeister wrote: > > Thanks; I just set StateSaveLocation=/var/spool/slurm.state, and that > went away. Of course, another error popped up: > > > > Apr 11 11:19:24 psy-slurm slurmctld[1772]: fatal: Invalid node names in > partition slurm > > > > Here’s the relevan
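"Invalid node names in partition slurm" generally means the Nodes= list on the PartitionName line references hosts with no matching NodeName definition. A hedged sketch with invented host names (only the partition name "slurm" is taken from the error above), just to show the pairing that has to line up:

```conf
# Every host in the partition's Nodes= list must appear in a NodeName line.
NodeName=psy-node[01-02] CPUs=8 State=UNKNOWN
PartitionName=slurm Nodes=psy-node[01-02] Default=YES State=UP
```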

Re: [slurm-users] How to set default partition?

2018-03-13 Thread Lachlan Musicman
On 14 March 2018 at 14:53, Christopher Samuel wrote: > On 14/03/18 14:50, Lachlan Musicman wrote: > > As per subject, recently I've been shuffling nodes around into new >> partitions. In that time somehow the default partition switched from prod >> to dev. Not the end o

[slurm-users] How to set default partition?

2018-03-13 Thread Lachlan Musicman
As per subject, recently I've been shuffling nodes around into new partitions. In that time somehow the default partition switched from prod to dev. Not the end of the world - desirable in fact. But I'd like to know what happened to cause it? cheers L. -- "The antidote to apocalypticism is *

Re: [slurm-users] Nagios or Other Monitoring Plugins

2018-01-18 Thread Lachlan Musicman
On 19 January 2018 at 07:29, Ryan Novosielski wrote: > Hi all, > > Looked back at the mailing list to see if there was a question about this > already. There was some mention of /using/ Nagios, but no real mention of > specifics. What do people monitor with Nagios? We monitor, so far, > slurmctld

[slurm-users] ntpd or chrony?

2018-01-14 Thread Lachlan Musicman
Hi all, As part of both Munge and SLURM, time synchronised servers are necessary. I keep finding chrony installed and running and ntpd stopped. I turn chrony off and restart/enable ntpd but every CentOS point update it seems to flip. From what I've read ntpd is better for always-on devices, and

Re: [slurm-users] Mixed x86 and ARM cluster

2018-01-07 Thread Lachlan Musicman
I'd imagine so. As long as the slurmd is running on the arm nodes, I can't see why not. Should be transparent to the underlying hardware. L.

[slurm-users] Slurm From email address configuration

2018-01-04 Thread Lachlan Musicman
Hola, Apparently (I was on holiday - of course) we experienced a mini email server meltdown because of a confluence of two events. Triggered by a user making a spelling mistake in their own email address, this was compounded by the fact that slurm creates a from address for the outgoing email in the

Re: [slurm-users] Slurm fair share priority not being applied

2017-12-07 Thread Lachlan Musicman
On 8 December 2017 at 18:07, Loris Bennett wrote: > Lachlan Musicman writes: > >> > >> Running sshare -l only shows the root user: > >> Account User RawShares NormShares RawUsage NormUsage EffectvUsage > FairShar

Re: [slurm-users] Slurm fair share priority not being applied

2017-12-07 Thread Lachlan Musicman
On 1 December 2017 at 20:48, Bruno Santos wrote: > Loris, I think you hit the nail on the head. > > Running sshare -l only shows the root user: > Account User RawShares NormShares RawUsage > NormUsage EffectvUsage FairShare LevelFS > GrpTRESMins TR

Re: [slurm-users] Can't start slurmdbd

2017-11-20 Thread Lachlan Musicman
b64/slurm/accounting_storage_* > /usr/lib64/slurm/accounting_storage_filetxt.so > /usr/lib64/slurm/accounting_storage_none.so /usr/lib64/slurm/accounting_ > storage_slurmdbd.so > > However, I did install the slurm-sql rpm package. > Any idea about what's failing? > > Thanks > On 20/1

Re: [slurm-users] Can't start slurmdbd

2017-11-20 Thread Lachlan Musicman
On 20 November 2017 at 20:50, Juan A. Cordero Varelaq < bioinformatica-i...@us.es> wrote: > $ systemctl start slurmdbd > Job for slurmdbd.service failed because the control process exited > with error code. See "systemctl status slurmdbd.service" and "journalctl > -xe" for details. > $

Re: [slurm-users] Simple script to identify inefficient

2017-11-19 Thread Lachlan Musicman
Works fine on CentOS 7.4 Some of my users are getting > 100% efficiency? That seems weird, tbh, but I've not done a thorough analysis of their work/sbatch files. L.
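For context, the efficiency such scripts usually report is total CPU time divided by (elapsed walltime × allocated CPUs), so a job spawning more threads than the CPUs it requested can legitimately exceed 100%. A toy sketch with made-up numbers:

```shell
# efficiency = 100 * TotalCPU / (Elapsed * AllocCPUs), all in seconds.
# 7200 CPU-seconds burned over 3600 wall-seconds on 1 allocated CPU
# (i.e. two busy threads) gives 200%.
total_cpu=7200; elapsed=3600; ncpus=1
eff=$(( 100 * total_cpu / (elapsed * ncpus) ))
echo "${eff}%"
```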

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 10:54, Elisabetta Falivene wrote: > I am the admin and I have no documentation :D I'll try the third option. > Thank you very much > Ah. Yes. Well, you will need some sort of drive shared between all the nodes so that they can read and write from a common space. Also, I re

[slurm-users] Quick hold on all partitions, all jobs

2017-11-08 Thread Lachlan Musicman
The IT team sent an email saying "complete network wide network outage tomorrow night from 10pm across the whole institute". Our plan is to put all queued jobs on hold, suspend all running jobs, and turn off the login node. I've just discovered that the partitions have a state, and it can be s
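The hold step can be scripted by feeding squeue's pending-job IDs to scontrol. A sketch, with the squeue output faked by a fixed list so the pipeline itself can be exercised off-cluster; on a real system the job list would come from `squeue -h -t PENDING -o %i`:

```shell
# Generate one 'scontrol hold' command per queued job ID.
# 'jobids' is a stand-in for: squeue -h -t PENDING -o %i
jobids="101
102
103"
cmds=$(printf '%s\n' "$jobids" | while read -r id; do
  echo "scontrol hold $id"
done)
printf '%s\n' "$cmds"
```

Running jobs would get `scontrol suspend` the same way.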

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
> On Wednesday 8 November 2017, Lachlan Musicman wrote: > >> On 9 November 2017 at 09:19, Elisabetta Falivene >

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 09:19, Elisabetta Falivene wrote: > I'm getting this message anytime I try to execute any job on my cluster. > (node01 is the name of my first of eight nodes and is up and running) > > Trying a simple Python script: > root@mycluster:/tmp# srun python test.py > slurmd[nod

Re: [slurm-users] Get list of nodes and their status, one node per line, no duplicates

2017-11-08 Thread Lachlan Musicman
I use alias sn='sinfo -Nle -o "%.20n %.15C %.8O %.7t" | uniq' and then it's just [root@machine]# sn cheers L.
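The reason the `uniq` helps: `sinfo -N` prints one line per node per partition, so nodes belonging to several partitions show up more than once. A self-contained sketch of the dedup step, with sinfo's output faked by a fixed string (node names and values invented):

```shell
# Stand-in for the output of: sinfo -Nle -o "%.20n %.15C %.8O %.7t"
# node01 appears twice because it is in two partitions.
fake_sinfo="node01 0/16/0/16 0.01 idle
node01 0/16/0/16 0.01 idle
node02 16/0/0/16 8.05 alloc"
deduped=$(printf '%s\n' "$fake_sinfo" | uniq)
printf '%s\n' "$deduped"
```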

Re: [slurm-users] slurm-dev Mailing list changes this weekend, slurm-dev will become slurm-users

2017-11-05 Thread Lachlan Musicman
Likewise - cheers! L.