Re: [slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-11 Thread Aravindh Sampathkumar
@Chris and @Lachlan, thanks for your responses. I resolved the issue based on a hint from Jeffrey in an earlier email. I had changed the location of the PID files in the Slurm config files, but forgot to change them in the systemd service definition files. Making both point at the same PID files did the trick.
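
The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the original thread: it assumes the Slurm side uses a `PidFile=` line (as in slurmdbd.conf) and the systemd unit uses `PIDFile=` in its `[Service]` section. The helper names and the example paths are hypothetical and only illustrate the comparison.

```python
def slurm_conf_value(text, key):
    """Return the value of `key` from Slurm-style 'Key=Value' text, or None.

    Keys are compared case-insensitively; '#' starts a comment.
    """
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip().lower() == key.lower():
            return value.strip()
    return None


def unit_value(text, section, key):
    """Return the last value of `key` inside `[section]` of a unit file, or None."""
    current = None
    found = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current == section and "=" in line:
            name, value = line.split("=", 1)
            if name.strip() == key:
                found = value.strip()
    return found


def pidfiles_match(conf_text, unit_text):
    """True only if both files name a PID file and the two paths are identical."""
    conf_pid = slurm_conf_value(conf_text, "PidFile")
    unit_pid = unit_value(unit_text, "Service", "PIDFile")
    return conf_pid is not None and conf_pid == unit_pid


if __name__ == "__main__":
    # Hypothetical file contents: the config was edited, the unit file was not.
    conf = "DbdHost=localhost\nPidFile=/var/run/slurm/slurmdbd.pid  # moved\n"
    unit = "[Unit]\nDescription=example\n[Service]\nType=forking\nPIDFile=/var/run/slurmdbd.pid\n"
    print("PID files match:", pidfiles_match(conf, unit))
```

In practice the two strings would be read from the actual slurmdbd.conf and the slurmdbd unit file on the host; their locations vary by installation, so they are left out here.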

[slurm-users] Limit shared memory usage

2018-10-11 Thread Sam Hawarden
Hi there, I run a small SLURM cluster that serves both bigmem applications as well as parallelizing data reduction pipelines. I'm interested in allowing users to write interim files to shared memory in /dev/shm/$USER/$SLURM_JOBID but I don't know how to tell slurm to include those files in t

Re: [slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-11 Thread Chris Samuel
On 12/10/18 07:58, Aravindh Sampathkumar wrote: I'm trying to set up a SLURM cluster in a virtual environment before actually deploying it for serious work. I hit a snag where Slurmdbd fails soon after starting because of trouble connecting to MariaDB. I don't see any errors there, just that s

Re: [slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-11 Thread Lachlan Musicman
1. After systemctl restart slurmdbd, what does journalctl -xe say? 2. Your email is very hard to read. This is because it was posted in HTML, with terminal colours etc. Could you send the next email in plain text, please? Cheers L. On Fri, 12 Oct 2018 at 08:02, Aravindh Sampathkumar wrote: > Hello. > > I'

[slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-11 Thread Aravindh Sampathkumar
Hello. I'm trying to set up a SLURM cluster in a virtual environment before actually deploying it for serious work. I hit a snag where Slurmdbd fails soon after starting because of trouble connecting to MariaDB. SlurmDBD service status: [root@slmaster ~]# systemctl status slurmdbd ● slurmdbd.se

Re: [slurm-users] how to easily to obtain jobid for array jobs?

2018-10-11 Thread Michael Gutteridge
There is also the SQUEUE_FORMAT environment variable. Set that in the appropriate place (/etc/profile and such) to '%i %A (and whatever other output you like)' and you should be good to go. - Michael On Thu, Oct 11, 2018 at 4:00 AM Loris Bennett wrote: > Hi Daan, > > Daan van Rossum writes:

Re: [slurm-users] Tuning the backfill scheduler

2018-10-11 Thread Michael Gutteridge
Hi, We've run into similar problems with backfill (though apparently not on the scale you've got). We have a number of users who will drop 5,000+ jobs at once; as you've indicated, this can play havoc with backfill. One of the newer* parameters for the backfill scheduler that's been a real help f

Re: [slurm-users] how to easily to obtain jobid for array jobs?

2018-10-11 Thread Loris Bennett
Hi Daan, Daan van Rossum writes: > Dear slurm-users, > > My users complain about not being able to cancel array jobs (on slurm > 17.11.3). > > The problem is that squeue by default outputs the %i (job step id) for array > jobs, which cannot be used in scancel. What /can/ be used is %A (job id

[slurm-users] how to easily to obtain jobid for array jobs?

2018-10-11 Thread Daan van Rossum
Dear slurm-users, My users complain about not being able to cancel array jobs (on slurm 17.11.3). The problem is that squeue by default outputs the %i (job step id) for array jobs, which cannot be used in scancel. What /can/ be used is %A (job id). How can I tell users to easily find that num