@Chris and @Lachlan,
Thanks for your responses.
I resolved the issue based on a hint from Jeffrey in an earlier email. I had
changed the location of the PID files in the slurm config files, but forgot to
change them in the systemd service definition files.
Making them watch the same PID files did the trick.
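
A minimal sketch of checking for the mismatch described above. It assumes
the daemon's config uses a "PidFile=" style key (as slurmdbd.conf does; slurm.conf
uses SlurmctldPidFile and SlurmdPidFile) and that the systemd unit uses "PIDFile=".
The function names and the example paths are hypothetical and will vary by install.

```shell
#!/bin/sh
# Print the value of the first "KEY=value" line in FILE.
# The key match is case-insensitive; trailing "# comments" are dropped.
# Usage: pidfile_value KEY FILE
pidfile_value() {
    grep -i "^[[:space:]]*$1[[:space:]]*=" "$2" \
        | head -n 1 | cut -d= -f2- | sed 's/#.*//' | tr -d '[:space:]'
}

# Compare the PID file the daemon writes with the one systemd watches.
# Returns 0 when they agree, 1 otherwise.
# Usage: check_pidfiles CONF_KEY CONF_FILE UNIT_FILE
check_pidfiles() {
    conf_pid=$(pidfile_value "$1" "$2")
    unit_pid=$(pidfile_value PIDFile "$3")
    if [ "$conf_pid" = "$unit_pid" ]; then
        echo "match: $conf_pid"
    else
        echo "mismatch: config=$conf_pid unit=$unit_pid"
        return 1
    fi
}

# Example call (paths are assumptions; adjust to your install):
# check_pidfiles PidFile /etc/slurm/slurmdbd.conf /usr/lib/systemd/system/slurmdbd.service
```

If the two values differ, systemd may consider the service failed even though
the daemon itself started, which matches the symptom in this thread.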
Hi there,
I run a small SLURM cluster that serves both bigmem applications and
parallelized data reduction pipelines.
I'm interested in allowing users to write interim files to shared memory in
/dev/shm/$USER/$SLURM_JOBID but I don't know how to tell slurm to include those
files in t
On 12/10/18 07:58, Aravindh Sampathkumar wrote:
I'm trying to set up a SLURM cluster in a virtual environment before
actually deploying it for serious work. I hit a snag where Slurmdbd
fails soon after starting because of trouble connecting to MariaDB.
I don't see any errors there, just that s
1. After systemctl restart slurmdbd , what does journalctl -xe say?
2. Your email is very hard to read because it was posted in HTML, with
terminal colours etc. Could you send the next email in plain text, please?
Cheers
L.
On Fri, 12 Oct 2018 at 08:02, Aravindh Sampathkumar
wrote:
> Hello.
>
> I'
Hello.
I'm trying to set up a SLURM cluster in a virtual environment before
actually deploying it for serious work. I hit a snag where Slurmdbd
fails soon after starting because of trouble connecting to MariaDB.
SlurmDBD service status:
[root@slmaster ~]# systemctl status slurmdbd
● slurmdbd.se
There is also the SQUEUE_FORMAT environment variable. Set that in the
appropriate place (/etc/profile and such) to '%i %A (and whatever other
output you like)' and you should be good to go.
- Michael
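
As a sketch of the suggestion above, the variable could be exported from a
shell profile. The field list is an assumption chosen for illustration: %i and
%A come from this thread, while %j (job name), %u (user) and %T (state) are
other squeue format fields.

```shell
# Default squeue columns for login shells, e.g. placed in /etc/profile.
# squeue uses SQUEUE_FORMAT only when -o/--format is not given on the command line.
SQUEUE_FORMAT='%i %A %j %u %T'
export SQUEUE_FORMAT
```

Users can still override this per invocation with squeue -o.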
On Thu, Oct 11, 2018 at 4:00 AM Loris Bennett
wrote:
> Hi Daan,
>
> Daan van Rossum writes:
Hi
We've run into similar problems with backfill (though apparently not on the
scale you've got). We have a number of users who will drop 5,000+ jobs at
once; as you've indicated, this can play havoc with backfill.
One of the newer* parameters for the backfill scheduler that's been a real
help f
Hi Daan,
Daan van Rossum writes:
> Dear slurm-users,
>
> My users complain about not being able to cancel array jobs (on slurm
> 17.11.3).
>
> The problem is that squeue by default outputs the %i (job step id) for array
> jobs, which cannot be used in scancel. What /can/ be used is %A (job id
Dear slurm-users,
My users complain about not being able to cancel array jobs (on slurm 17.11.3).
The problem is that squeue by default outputs the %i (job step id) for array
jobs, which cannot be used in scancel. What /can/ be used is %A (job id). How
can I tell users to easily find that num