Some time ago we were using the slurmctld prolog for this.
2017-10-16 16:36 GMT+02:00 Ryan Richholt:
> Thanks, that sounds like a good idea. A prolog script could also handle
> this, right? That way, if the node crashes while the job is running, it
> would still be saved.
>
> On Mon, Oct 16, 2017
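For the archives, a minimal sketch of what that can look like (the script
path and log file here are made up for illustration; SLURM_JOB_ID and
SLURM_JOB_USER are set by slurmd when it runs the prolog):

# slurm.conf: run this on the allocated node(s) before each job starts
Prolog=/etc/slurm/prolog.sh

# /etc/slurm/prolog.sh (mode 0755)
#!/bin/bash
# Record job metadata up front, so it survives even if the node dies later.
echo "$(date -Is) job=$SLURM_JOB_ID user=$SLURM_JOB_USER node=$(hostname)" \
    >> /var/log/slurm/job_prolog.log
exit 0  # a non-zero exit here would drain the node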
Sorry, bad phone typos
On 18 Oct 2017 08:07, "Benjamin LIPERE" wrote:
Wellington, about security: first, a wrong starting assumption. HPC is not
secure, unless you have a 10-person team. I hope that at least you put the
cluster behind a router firewall in a demilitarized zone. If you did not
Hello,
I have a noob question regarding the accounting in SLURM. In particular,
I'm trying to figure out how memory TRES accounting is done in SLURM.
Concrete case:
A user has submitted 2 short jobs under a certain account. Now I want to
see what happened in that account with sreport. While c
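Not an answer, but in case a concrete command helps: memory shows up as
the "mem" TRES, so something like the following reports it per account and
user (dates and the account name are placeholders; --tres needs a Slurm
with TRES reporting, i.e. 15.08 or newer):

sreport cluster AccountUtilizationByUser Accounts=myaccount \
    Start=2017-10-01 End=2017-10-17 --tres=mem -t Hours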
Is there any way, after a job starts, to determine why the scheduler
chose the series of nodes it did?
For some reason, on an empty cluster, when I spin up a large job it
staggers the allocation across a seemingly random set of nodes.
We're using backfill/cons_res + gres, and all the nodes
Thank you in advance.
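One hedged way to dig into this (DebugFlags vary a bit by version): turn
on select-plugin logging while you resubmit, then check the slurmctld log
and the allocation itself. Node Weight and topology both feed into the
choice. The job script name and jobid below are placeholders:

scontrol setdebugflags +SelectType   # log the select plugin's decisions
sbatch big_job.sh                    # resubmit the large job
scontrol -d show job <jobid>         # inspect the resulting allocation
scontrol setdebugflags -SelectType   # turn the extra logging back off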
[2017-07-02T00:00:08.700] Warning: Note very large processing time from
daily_rollup for slurmhpc: usec=7346971 began=00:00:01.353
[2017-07-03T00:00:08.130] Warning: Note very large processing time from
daily_rollup for slurmhpc: usec=7368223 began=00:00:00.762
[2017-07-04T
Hello,
I am running a small cluster, and recently we wanted to enable
the OverSubscribe option for the default partition in order to allow jobs
to share a node, as described here:
https://slurm.schedmd.com/cons_res_share.html
However, when I try to enable the option, I get an error message:
sco
Hello Christian,
On 18.10.2017 at 21:26, Christian Leitold wrote:
> I am running a small cluster, and recently we wanted to enable
> the OverSubscribe option for the default partition in order to allow
> jobs to share a node, as described here:
OverSubscribe is the option formerly known as "Shared".
> SelectType=select/lin
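For reference, a minimal sketch of the combination that page describes
(node and partition names are placeholders):

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PartitionName=main Nodes=node[01-04] Default=YES OverSubscribe=YES State=UP

Note that changing SelectType requires restarting slurmctld (and running
or pending job state may be lost); scontrol reconfigure is not enough for
that particular parameter.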
On 19/10/17 05:24, Douglas Meyer wrote:
> We have job_table purge set for 61 days and step_table for 11. Seems
> to have no impact.
So you have this in slurmdbd.conf?
PurgeJobAfter=61days
PurgeStepAfter=11days
Anything in the logs when you start up slurmdbd?
What does this say?
sacctmgr lis
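Also, a hedged suggestion: you can confirm what the running slurmdbd
actually loaded with

sacctmgr show configuration | grep -i purge

Keep in mind the purge values default to months when no unit is given, so
"61" and "61days" mean very different things, and as far as I know purging
only runs on the hourly/daily/monthly rollover matching the unit you
chose, so it won't show an effect immediately.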
Re: [slurm-dev] Re: Qos limits associations and AD auth
Hey Benjamin,
I am sorry, English is not my mother tongue, so I barely understand
what you wrote. Can you explain when you have more time?
Thanks, Nadav
On 18/10/2017 17:59, Benjamin LIPERE wrote:
Sorry, bad phone typos
On 18/10/17 16:27, Nadav Toledo wrote:
> about B: The reason is I don't want to manually add each user to
> the slurm database (sacctmgr create user...)
I'm afraid you don't really have an option there: if you want to use the
slurmdbd limits, then you're going to need to add the users to the
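If it comes to scripting it, here is a hedged sketch for bulk-adding
everyone in a POSIX group (the group and account names are placeholders;
-i makes sacctmgr skip the interactive confirmation):

for u in $(getent group research | cut -d: -f4 | tr ',' ' '); do
    sacctmgr -i add user "$u" account=research
done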
Hey Chris,
The problem is that even after adding the user with
sacctmgr create user domain_name\\user_name account=research
restarting slurmctld, and running a job as that user with srun bash, I
still get:
slurmctld: error: User 243309139 not found
slurmctld: _job_create: invalid acc
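That error usually means slurmctld mapped the job's UID to a name (or
failed to) and found no matching association in the database. A quick
sanity check, assuming sssd/winbind does the AD mapping on the slurmctld
host:

getent passwd 243309139        # which name does this UID map back to?
id 'domain_name\user_name'     # which UID does the submitted name map to?

The user name stored via sacctmgr has to match what getent returns for
that UID exactly, including any domain prefix and case.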