The Slurm scheduler only locks out user requests when specific data
structures are locked for modification, or potential modification.
So the most effective technique is to limit the time window in which
that happens, through a combination of efficient traversal of the main
scheduling loop (whe
We use a scenario analogous to yours, using features. Features are
defined in slurm.conf and are associated with the nodes from which a
job may be submitted, as an administratively configuration-managed
authoritative source. (NodeName=xx-login State=FUTURE
AvailableFeatures=) (i.e.
={green,blue,
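As a rough sketch of what such a definition might look like (the node
names and feature labels below are hypothetical placeholders, not taken
from the original message), a slurm.conf fragment of this shape:

```
# Hypothetical slurm.conf fragment: tagging submit/login nodes with features.
# "green" and "blue" are made-up feature names; xx-login0x are placeholder hosts.
NodeName=xx-login01 State=FUTURE AvailableFeatures=green
NodeName=xx-login02 State=FUTURE AvailableFeatures=blue
```

Jobs can then be steered toward nodes carrying a given feature with the
standard constraint mechanism, e.g. sbatch --constraint=green.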
Also consider the --no-kill ("-k") option to sbatch (and srun).
The following is from the sbatch man page:

-k, --no-kill [=off]
       Do not automatically terminate a job if one of the nodes it has
       been allocated fails. The user will assume the responsibilities
       for fault-tolerance should a node fail.
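As an illustration of using the option in practice (the script name,
node count, and application below are hypothetical), --no-kill can be
set directly in the batch script:

```
#!/bin/bash
# Hypothetical batch script: keep the allocation alive even if a node fails.
#SBATCH --job-name=resilient-job
#SBATCH --nodes=4
#SBATCH --no-kill
srun ./my_app   # my_app is a placeholder; it must tolerate node loss itself
```

Note the trade-off the man page describes: with --no-kill, recovering
from a failed node becomes the application's responsibility.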
What is in /var/log/munge/munged.log?
Munge is, appropriately, quite strict about permissions throughout its
hierarchy of control and configuration files.
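As a sketch of the kind of check involved (the helper function is mine,
not part of munge; the expected modes in the comments follow munge's
commonly documented defaults of 0700 on /etc/munge and 0400 on the key
file, which you should verify against your own installation):

```python
import os
import stat

def check_mode(path, expected_mode):
    """Return True if `path` has exactly the expected permission bits.

    A small helper for spotting the permission problems that munged.log
    complains about; munge refuses to start when its files or
    directories are too permissive.
    """
    actual = stat.S_IMODE(os.stat(path).st_mode)
    return actual == expected_mode

# Typical munge expectations (assumption based on documented defaults):
#   check_mode("/etc/munge", 0o700)
#   check_mode("/etc/munge/munge.key", 0o400)
```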
On Thu, May 28, 2020 at 11:01 AM Rodrigo Santibáñez
wrote:
>
> Hello,
>
> You could find the solution here
> https://wiki.fysik.dtu.dk/niflheim/S
When upgrading to 18.08 it is prudent to add the following lines to
your /etc/my.cnf, as per
https://slurm.schedmd.com/accounting.html
https://slurm.schedmd.com/SLUG19/High_Throughput_Computing.pdf (slide #6)
[mysqld]
innodb_buffer_pool_size=1G
innodb_log_file_size=64M
innodb_lock_wait_timeout=90
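After restarting mysqld, the settings can be confirmed with standard
MySQL SHOW VARIABLES queries (note that innodb_buffer_pool_size is
reported in bytes):

```
-- Verify the InnoDB settings after restarting mysqld.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
```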