On Fri, Aug 30, 2019 at 2:58 PM Guillaume Perrault Archambault wrote:
> My problem with that though is: what if each script (the 9 scripts in my
> earlier example) has different requirements? For example, running on a
> different partition, or setting a different time limit? My understanding is ...
Thank you, Paul. If the admin does agree to create various QOS job limits or
GPU limits (e.g. 5, 10, 15, 20, ...), that could be a powerful solution. This
would allow me to use job arrays.
I still prefer a user-side solution if possible, because I'd like my script
to be as cluster-agnostic as possible.
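(For reference, I believe the admin side would look something like the
following sacctmgr sketch; the QoS names and limits here are made up:)

    # admin-side sketch: one QoS per GPU cap (names hypothetical)
    sacctmgr add qos gpu5  set MaxTRESPerUser=gres/gpu=5
    sacctmgr add qos gpu10 set MaxTRESPerUser=gres/gpu=10
    # users would then submit with: sbatch --qos=gpu5 ...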
Yes, QoS's are dynamic.
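For instance, for a pending job something like this should work (the job ID
and QoS name here are placeholders):

    scontrol update JobId=12345 QOS=long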
-Paul Edmon-
Hi Paul,
Thanks for your pointers.
I'll look into QOS and MCS after my paper deadline (Sept 5). Re QOS, as
mentioned to Peter in the reply I just sent, I wonder if the QOS of a job
can be changed while it's pending (submitted but not yet running).
Regards,
Guillaume.
On Fri, Aug 30, 2019, ...
Hi Steven,
Those both sound like potentially good solutions.
So basically, you're saying that if I script it properly, I can launch
multiple scripts from a single job array via a master sbatch script.
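Something like this, I imagine (a sketch; the script names are made up):

    #!/bin/bash
    #SBATCH --array=0-8
    # master script: map each array task ID to one of the nine real scripts
    scripts=(exp0.sh exp1.sh exp2.sh exp3.sh exp4.sh exp5.sh exp6.sh exp7.sh exp8.sh)
    bash "${scripts[$SLURM_ARRAY_TASK_ID]}"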
My problem with that though is: what if each script (the 9 scripts in my
earlier example) has different requirements? For example, running on a
different partition, or setting a different time limit? My understanding is ...
After you restart slurmctld, do "scontrol reconfigure".
Brian Andrus
On 8/30/2019 6:57 AM, Robert Kudyba wrote:
I had set RealMemory to a really high number as I misinterpreted the
recommendation.
NodeName=node[001-003] CoresPerSocket=12 RealMemory=196489092 Sockets=2 Gres=gpu:1
But now I set it to RealMemory=191000 ...
A QoS is probably your best bet. Another variant might be MCS, which
you can use to help reduce resource fragmentation. For limits, though,
QoS is the better fit.
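(MCS is enabled in slurm.conf; a minimal sketch, assuming you want jobs
grouped by Unix group, with groupA/groupB as placeholders:)

    # slurm.conf sketch
    MCSPlugin=mcs/group
    MCSParameters=enforced,select:groupA|groupB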
-Paul Edmon-
On 8/30/19 7:33 AM, Steven Dick wrote:
It would still be possible to use job arrays in this situation; it's
just slightly messy.
I had set RealMemory to a really high number as I misinterpreted the
recommendation.
NodeName=node[001-003] CoresPerSocket=12 RealMemory=196489092 Sockets=2
Gres=gpu:1
But now I set it to:
RealMemory=191000
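(As a sanity check, running slurmd -C on a compute node prints the hardware
Slurm detects, including a RealMemory value you can paste into slurm.conf:)

    # run on the node itself; prints a NodeName=... line
    slurmd -C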
I restarted slurmctld. And according to the Bright Cluster support team:
"Unless it ha...
It would still be possible to use job arrays in this situation, it's
just slightly messy.
So the way a job array works is that you submit a single script, and
that script is provided an integer for each subjob. The integer is in
a range, with a possible step (default=1).
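For example (a sketch):

    #!/bin/bash
    #SBATCH --array=0-20:5   # step of 5: SLURM_ARRAY_TASK_ID takes 0,5,10,15,20
    echo "running subjob $SLURM_ARRAY_TASK_ID"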
To handle your situation, you ...
Hi,
we have some compute nodes paid for by different project owners: 10% are owned
by project A and 90% are owned by project B.
We want to implement the following policy, such that in every given time period
(e.g. two weeks):
- Project A doesn't use more than 10% of the cluster in this time period
- ...
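(One possible direction, sketched with made-up numbers and assuming a
400-CPU cluster: cap each project's accounted CPU-minutes per period with
GrpTRESMins, and reset raw usage at the period boundary in slurm.conf. As
far as I know there is no built-in two-week reset period, so this is only
an approximation:)

    # admin-side sketch: 10% of 400 CPUs for two weeks
    # = 40 cores * 20160 minutes = 806400 CPU-minutes
    sacctmgr modify account projA set GrpTRESMins=cpu=806400

    # slurm.conf sketch: enforce the limit on raw usage and reset it
    # PriorityDecayHalfLife=0
    # PriorityUsageResetPeriod=WEEKLY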