Re: [slurm-users] working with SLURM_EXPORT_ENV

2018-03-21 Thread Christopher Samuel
On 21/03/18 19:09, Daniel Grimwood wrote:
> Hi Chris,
Hiya!
> Thanks for that. I had overlooked SBATCH_EXPORT as I interpreted the man page literally, as "Same as --export". It's actually "Same as --export without setting SLURM_EXPORT_ENV=NONE". That's great!
My pleasure. :-)
> Now all we n
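A minimal sketch of the difference discussed above (job.sh is a placeholder script name):

    # --export=NONE also sets SLURM_EXPORT_ENV=NONE inside the job,
    # so srun steps in the script start with a clean environment too:
    sbatch --export=NONE job.sh

    # SBATCH_EXPORT=NONE cleans the batch script's environment but does
    # NOT force SLURM_EXPORT_ENV=NONE, so srun inside the script still
    # exports the script's environment to its tasks:
    export SBATCH_EXPORT=NONE
    sbatch job.sh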

Re: [slurm-users] srun not allowed in a partition

2018-03-21 Thread Christopher Samuel
On 22/03/18 01:43, sysadmin.caos wrote: I'm trying to compile SLURM-17.02.7 with "lua" support by executing "./configure && make && make contribs && make install", but make does nothing in src/plugins/job_submit/lua and I don't know why... How should I compile that plugin? The rest of the pl

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Christopher Samuel
On 22/03/18 00:09, Ole Holm Nielsen wrote:
> Chris, I don't understand what you refer to as "that"? Someone must have
> created /etc/pam.d/slurm.* files, and it doesn't seem to be the Slurm RPMs.
Sorry Ole, just meant that PAM automates reading those files for you if you create them (and the code

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Michael Jennings
On Wednesday, 21 March 2018, at 20:14:22 (+0100), Ole Holm Nielsen wrote:
> Thanks for your friendly advice! I keep forgetting about Systemd
> details, and your suggestions are really detailed and useful for
> others! Do you mind if I add your advice to my Slurm Wiki page?
Of course not! Especi

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Ole Holm Nielsen
Hi Michael, Thanks for your friendly advice! I keep forgetting about Systemd details, and your suggestions are really detailed and useful for others! Do you mind if I add your advice to my Slurm Wiki page? /Ole On 21-03-2018 16:29, Michael Jennings wrote: On Wednesday, 21 March 2018, at 1

Re: [slurm-users] UsageFactor in combination with GrpTRESRunMins

2018-03-21 Thread Henkel, Andreas
PS: we're using Slurm 17.11.5. On 21.03.2018 at 16:18, Henkel, Andreas <hen...@uni-mainz.de> wrote: Hi, recently, while trying a new configuration, I came across a problem. In principle, we have one big partition containing all nodes with PriorityTier=2. Each account got GrpTRESRunMin=

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Michael Jennings
On Wednesday, 21 March 2018, at 12:08:00 (+0100), Ole Holm Nielsen wrote:
> One working solution is to modify the slurmd Systemd service file
> /usr/lib/systemd/system/slurmd.service to add a line:
> LimitCORE=0
This is a bit off-topic, but I see this a lot, so I thought I'd provide a friendly
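Edits to files under /usr/lib/systemd are overwritten on package updates; the usual alternative is a drop-in override. A sketch (the file name core.conf is arbitrary):

    # /etc/systemd/system/slurmd.service.d/core.conf
    [Service]
    LimitCORE=0

    # then reload unit files and restart the daemon:
    systemctl daemon-reload
    systemctl restart slurmd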

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Michael Jennings
On Wednesday, 21 March 2018, at 08:40:32 (-0600), Ryan Cox wrote: > UsePAM has to do with how jobs are launched when controlled by > Slurm.  Basically, it sends jobs launched under Slurm through the > PAM stack.  UsePAM is not required by pam_slurm_adopt because it is > *sshd* and not *slurmd or s

[slurm-users] UsageFactor in combination with GrpTRESRunMins

2018-03-21 Thread Henkel, Andreas
Hi, recently, while trying a new configuration, I came across a problem. In principle, we have one big partition containing all nodes with PriorityTier=2. Each account got GrpTRESRunMin=cpu=<#somelimit> set. Every now and then we have the situation that some of the nodes are idle. For this we
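For context, such limits are typically set per account with sacctmgr; a sketch (account name and limit value are placeholders; sacctmgr spells the option GrpTRESRunMins):

    sacctmgr modify account where name=myaccount set GrpTRESRunMins=cpu=1000000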

Re: [slurm-users] fast way for a node to determine its own state?

2018-03-21 Thread Michael Jennings
On Wednesday, 21 March 2018, at 12:05:49 (+0100), Alexis Huxley wrote: > > >Depending on the load on the scheduler, this can be slow. Is there > > >faster way? Perhaps one that doesn't involve communicating with > > >the scheduler node? Thanks! > > Thanks for the suggestion Ole, but we have somet

Re: [slurm-users] srun not allowed in a partition

2018-03-21 Thread sysadmin.caos
I'm trying to compile SLURM-17.02.7 with "lua" support by executing "./configure && make && make contribs && make install", but make does nothing in src/plugins/job_submit/lua and I don't know why... How should I compile that plugin? The rest of the plugins compile with no problems (defaults,
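The job_submit/lua plugin is only built when configure finds the Lua development files, so an empty build in src/plugins/job_submit/lua usually means they were missing at configure time. A sketch of what to check (package names vary by distro):

    yum install lua-devel            # RHEL/CentOS; Debian/Ubuntu: liblua5.x-dev
    ./configure 2>&1 | grep -i lua   # confirm configure detected Lua
    make && make contribs && make install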

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Ryan Cox
Ole, UsePAM has to do with how jobs are launched when controlled by Slurm.  Basically, it sends jobs launched under Slurm through the PAM stack.  UsePAM is not required by pam_slurm_adopt because it is *sshd* and not *slurmd or slurmstepd* that is involved with pam_slurm_adopt.  That's what I
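For reference, pam_slurm_adopt hooks into sshd's PAM stack rather than Slurm's; a minimal sketch of the relevant line (placement within the file matters and varies by distro):

    # /etc/pam.d/sshd
    account    required     pam_slurm_adopt.so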

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Ole Holm Nielsen
On 03/21/2018 02:03 PM, Bill Barth wrote: I don’t think we had to do anything special since we have UsePAM = 1 in our slurm.conf. I didn’t do the install personally, but our pam.d/slurm* files are written by us and installed by our configuration management system. Not sure which one UsePAM loo

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Ole Holm Nielsen
On 03/21/2018 01:57 PM, Chris Samuel wrote: On Wednesday, 21 March 2018 11:49:53 PM AEDT Ole Holm Nielsen wrote: However, there are no /etc/pam.d/slurm.* files on our system (running Slurm 17.02). Did TACC create a special Slurm PAM configuration file, and is this documented in the public doma

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Bill Barth
Ole, I don’t think we had to do anything special since we have UsePAM = 1 in our slurm.conf. I didn’t do the install personally, but our pam.d/slurm* files are written by us and installed by our configuration management system. Not sure which one UsePAM looks for, but here are ours: c501-101[s

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Chris Samuel
On Wednesday, 21 March 2018 11:49:53 PM AEDT Ole Holm Nielsen wrote:
> However, there are no /etc/pam.d/slurm.* files on our system (running
> Slurm 17.02). Did TACC create a special Slurm PAM configuration file,
> and is this documented in the public domain?
I think that's just how PAM works.

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Ole Holm Nielsen
On 03/21/2018 01:08 PM, Bill Barth wrote:
> You could set /etc/security/limits.conf on every node to contain something like (check my syntax):
> * soft core 0
> * hard core 0
Nice suggestion; however, processes spawned by slurmd don't read the /etc/security/limits.conf file.
> And make sure th

Re: [slurm-users] Array Job Node Allocation

2018-03-21 Thread Emyr James
Hi Gareth, Thanks for the suggestion. This does seem like a good way forward. I will look into it. Regards, Emyr On 21/03/2018 13:34, gareth.willi...@csiro.au wrote: Hi Emyr, Perhaps you could be more explicit about the i/o boundedness and have jobs request an io gres as well as compute

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Bill Barth
You could set /etc/security/limits.conf on every node to contain something like (check my syntax):
* soft core 0
* hard core 0
And make sure that /etc/pam.d/slurm.* and /etc/pam.d/system-auth* contain:
session required pam_limits.so
…so that li

Re: [slurm-users] fast way for a node to determine its own state?

2018-03-21 Thread Alexis Huxley
> */2 * * * * scontrol --oneliner show node > /cluster/var/node-info.new 2>/dev/null && mv -f /cluster/var/node-info.new /cluster/var/node-info 2>/dev/null
> So, every 2 minutes, /cluster/var/node-info is updated (if the scontrol command succeeds), and the nodes simply grep in that f
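A node can then look itself up in the cached file without touching the controller; a sketch (assumes the scontrol --oneliner format of one "NodeName=... State=..." record per line):

    grep "^NodeName=$(uname -n) " /cluster/var/node-info | grep -o 'State=[^ ]*'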

Re: [slurm-users] fast way for a node to determine its own state?

2018-03-21 Thread Bjørn-Helge Mevik
Alexis Huxley writes: >> >Depending on the load on the scheduler, this can be slow. Is there >> >faster way? Perhaps one that doesn't involve communicating with >> >the scheduler node? Thanks! > > Thanks for the suggestion Ole, but we have something in place that > we don't want to change at this

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Chris Samuel
On Wednesday, 21 March 2018 10:08:00 PM AEDT Ole Holm Nielsen wrote:
> Thanks for sharing any experiences.
Would: echo "ulimit -c 0" in the taskprolog work? I guess it assumes a Bourne shell... -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

[slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Ole Holm Nielsen
We experience problems with MPI jobs dumping lots (one per MPI task) of multi-GB core dump files, causing problems for file servers and compute nodes. The user has "ulimit -c 0" in his .bashrc file, but that's ignored when slurmd starts the job, and the slurmd process limits are employed instead
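A quick way to confirm which core-file limit job processes actually inherit (assumes a partition you can submit a one-task job to):

    srun -n1 bash -c 'ulimit -c'    # prints the core limit as seen by tasks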

Re: [slurm-users] fast way for a node to determine its own state?

2018-03-21 Thread Alexis Huxley
> >Depending on the load on the scheduler, this can be slow. Is there > >faster way? Perhaps one that doesn't involve communicating with > >the scheduler node? Thanks! Thanks for the suggestion Ole, but we have something in place that we don't want to change at this time. We just need a faster way

Re: [slurm-users] srun not allowed in a partition

2018-03-21 Thread Chris Samuel
On Wednesday, 21 March 2018 9:07:08 PM AEDT sysadmin.caos wrote:
> What I want to get is a batch partition that doesn't allow "srun" commands
> from the command line and an interactive partition only for "srun" commands.
You might well be able to do this with a lua submit filter, testing for the
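A sketch of such a filter (partition name taken from this thread; treating "no batch script" as the marker for interactive/srun submissions is an assumption worth testing on your Slurm version):

    -- job_submit.lua
    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- srun/salloc submissions arrive without a batch script
       if job_desc.partition == "batch.q" and job_desc.script == nil then
          slurm.log_user("interactive (srun) jobs are not allowed in batch.q")
          return slurm.ERROR
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end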

Re: [slurm-users] fast way for a node to determine its own state?

2018-03-21 Thread Ole Holm Nielsen
On 03/21/2018 11:18 AM, Alexis Huxley wrote: I'm running a node health script that needs to know the state of the node on which it is running. Currently, I'm getting the state with this: sinfo -N ... | grep `uname -n` Depending on the load on the scheduler, this can be slow. Is there fa

Re: [slurm-users] Array Job Node Allocation

2018-03-21 Thread Gareth.Williams
Hi Emyr, Perhaps you could be more explicit about the i/o boundedness and have jobs request an io gres as well as compute and memory resources. You could then set the amount of io resource per node (and maybe globally - possibly separate iolocal and ioglobal). Then you could avoid io contention
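A sketch of what that might look like (the "io" GRES name and the counts are invented for illustration; a count-only GRES needs no device files, though some versions also want it listed in gres.conf on each node):

    # slurm.conf
    GresTypes=io
    NodeName=node[01-16] Gres=io:8

    # gres.conf on each node (may be required depending on version)
    Name=io Count=8

    # an i/o-heavy array task would then request, e.g.:
    sbatch --gres=io:2 --array=1-100 job.sh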

[slurm-users] fast way for a node to determine its own state?

2018-03-21 Thread Alexis Huxley
I'm running a node health script that needs to know the state of the node on which it is running. Currently, I'm getting the state with this: sinfo -N ... | grep `uname -n` Depending on the load on the scheduler, this can be slow. Is there a faster way? Perhaps one that doesn't involve com
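One small speedup is to let sinfo filter server-side instead of grepping the whole node list (this still queries the controller, though):

    sinfo -h -N -n "$(uname -n)" -o '%T'   # prints just this node's state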

[slurm-users] srun not allowed in a partition

2018-03-21 Thread sysadmin.caos
Hello, I would like to configure SLURM with two partitions: one called "batch.q", only for batch jobs, and one called "interactive.q", only for interactive jobs. What I want to get is a batch partition that doesn't allow "srun" commands from the command line and

Re: [slurm-users] working with SLURM_EXPORT_ENV

2018-03-21 Thread Daniel Grimwood
Hi Chris, Thanks for that. I had overlooked SBATCH_EXPORT as I interpreted the man page literally, as "Same as --export". It's actually "Same as --export without setting SLURM_EXPORT_ENV=NONE". That's great! Now all we need for completeness is a SRUN_EXPORT that works the same, although SBATCH