On 6/8/21 12:27 AM, Sid Young wrote:
> Is there a tool that will extract the job counts in JSON format, such as
> #running, #pending, #onhold, etc.?
> I am trying to build some custom dashboards for our new cluster, and
> this would be a really useful set of metrics to gather and display.
We have
* David Chaffin [210607 14:44]:
>
> we get a lot of small sub-node jobs that we want to pack together. Maui
> does this pretty well with the smallest node that will hold the job,
> NODEALLOCATIONPOLICY MINRESOURCE
> I can't figure out the slurm equivalent. Default backfill isn't working
> well.
Hi All
Can anyone advise why I might encounter the error message below when
submitting a job?
sbatch: error: memory allocation failure
The same script used to work perfectly fine until I included
#SBATCH --nodelist=(compute[015-046]); once that line is removed, it works as it should.
The issu
G'Day all,
Is there a tool that will extract the job counts in JSON format, such as
#running, #pending, #onhold, etc.?
I am trying to build some custom dashboards for our new cluster, and
this would be a really useful set of metrics to gather and display.
Sid Young
W: https://off-grid-engine
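In the absence of a dedicated tool, one possible approach is to summarise `squeue` output yourself. The sketch below assumes `squeue` is on PATH and uses `-h` (no header) with the format string `%T %r` (long job state, then reason). It also assumes that held jobs appear as PENDING with a reason starting with `JobHeld` (e.g. JobHeldUser, JobHeldAdmin) and counts those separately under a made-up `HELD` key. Treat this as an untested starting point, not a finished exporter.

```python
import json
import shutil
import subprocess
import sys
from collections import Counter


def count_job_states(squeue_output: str) -> dict:
    """Count jobs per state from `squeue -h -o "%T %r"` output."""
    counts = Counter()
    for line in squeue_output.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        state = parts[0]
        reason = parts[1] if len(parts) > 1 else ""
        # Assumption: held jobs are PENDING with a JobHeld* reason; bucket them as "HELD".
        if state == "PENDING" and reason.startswith("JobHeld"):
            state = "HELD"
        counts[state] += 1
    return dict(counts)


def main() -> None:
    # -h drops the header line; %T is the job state, %r the reason for that state.
    result = subprocess.run(
        ["squeue", "-h", "-o", "%T %r"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(json.dumps(count_job_states(result.stdout), sort_keys=True))


if __name__ == "__main__":
    if shutil.which("squeue"):
        main()
    else:
        print("squeue not found on PATH; run this on a Slurm login node", file=sys.stderr)
```

The JSON printed by `main()` could then be scraped periodically by whatever dashboard backend you use.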
Hi all,
we get a lot of small sub-node jobs that we want to pack together. Maui
does this pretty well with the smallest node that will hold the job,
NODEALLOCATIONPOLICY MINRESOURCE
I can't figure out the slurm equivalent. Default backfill isn't working
well. Anyone know of one?
Thanks,
David
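One way to approximate Maui's MINRESOURCE ("smallest configured node that will hold the job") is the node `Weight` parameter in slurm.conf: all else being equal, Slurm allocates the lowest-weight nodes that satisfy a job's request, so giving smaller nodes lower weights should steer small jobs toward them. The sketch below generates such weights. The node names and sizes are hypothetical, "size" is assumed to mean (CPUs, then memory), and the emitted lines only show the relevant fields. Merge `Weight=` into your real NodeName definitions rather than replacing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpus: int
    real_memory_mb: int


def weighted_node_lines(nodes: list[Node]) -> list[str]:
    """Return NodeName lines whose Weight rises with node size (smallest = 1)."""
    # Rank distinct sizes lexicographically: CPUs first, then memory.
    sizes = sorted({(n.cpus, n.real_memory_mb) for n in nodes})
    weight_of = {size: rank + 1 for rank, size in enumerate(sizes)}
    lines = []
    for n in sorted(nodes, key=lambda n: (n.cpus, n.real_memory_mb, n.name)):
        weight = weight_of[(n.cpus, n.real_memory_mb)]
        lines.append(
            f"NodeName={n.name} CPUs={n.cpus} "
            f"RealMemory={n.real_memory_mb} Weight={weight}"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical nodes, for illustration only.
    cluster = [
        Node("big01", 64, 512000),
        Node("small01", 16, 64000),
        Node("small02", 16, 64000),
    ]
    for line in weighted_node_lines(cluster):
        print(line)
```

Nodes of identical size share a weight, so Slurm can still choose freely among them. Because weights are static, this captures "smallest configured node", not "node with the least free resources right now".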
Hi,
Is there a way to use task affinity on a per-partition basis? We
couldn't find anything in the docs that describes doing this, and our
attempts to specify it per partition failed.
Thanks,
Herc
Hi,
This doesn't solve your problem, but it might be an option:
In similar cases, we instruct our users to create `n` Jobs of `m` Steps. Some
experimentation may be required to determine the number of Steps to maximize
Job run time without hitting your limits. Our max limit is 14 days, so this
p
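The "`n` Jobs of `m` Steps" arithmetic can be sketched as follows, assuming each Step has a roughly known duration and reusing the 14-day limit mentioned above. The 10% safety margin and the function name `plan_jobs` are hypothetical choices for illustration; in practice the step duration is what the experimentation mentioned above would pin down.

```python
import math


def plan_jobs(total_steps: int, step_hours: float,
              max_job_hours: float = 14 * 24, safety: float = 0.9) -> tuple[int, int]:
    """Return (n_jobs, steps_per_job) so each Job stays under the time limit."""
    if total_steps < 1:
        raise ValueError("need at least one step")
    usable_hours = max_job_hours * safety  # leave headroom below the hard limit
    if step_hours > usable_hours:
        raise ValueError("a single step does not fit within the time limit")
    # m: as many Steps as fit in one Job, but no more than there are Steps.
    steps_per_job = min(total_steps, math.floor(usable_hours / step_hours))
    # n: enough Jobs to cover every Step.
    n_jobs = math.ceil(total_steps / steps_per_job)
    return n_jobs, steps_per_job


if __name__ == "__main__":
    # e.g. 100 Steps of ~10 h each under a 14-day limit with 10% headroom
    print(plan_jobs(100, 10))
```

With those assumed numbers, 336 h × 0.9 ≈ 302 usable hours gives 30 Steps per Job and therefore 4 Jobs.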
Hi,
On 7/06/2021 04:33, David Schanzenbach wrote:
> In our .rpmmacros file we use, the following option is set:
> %_with_slurmrestd 1
You also need libjwt: https://bugs.schedmd.com/show_bug.cgi?id=4
Ward