On Fri, Aug 30, 2019 at 2:58 PM Guillaume Perrault Archambault wrote:
> My problem with that though is: what if each script (the 9 scripts in my
> earlier example) has different requirements? For example, running on a
> different partition, or setting a different time limit? My understanding is ...
Thank you, Paul. If the admin does agree to create various QOS job limits or
GPU limits (e.g. 5, 10, 15, 20, ...), that could be a powerful solution. This
would allow me to use job arrays.
I still prefer a user-side solution if possible, because I'd like my script
to be as cluster-agnostic as possible.
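(For reference, I believe the admin side would look something like the
following sacctmgr sketch; the QoS names and limits here are made up:)

    # admin-side sketch: one QoS per GPU cap (names hypothetical)
    sacctmgr add qos gpu5  set MaxTRESPerUser=gres/gpu=5
    sacctmgr add qos gpu10 set MaxTRESPerUser=gres/gpu=10
    # users would then submit with: sbatch --qos=gpu5 ...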
Yes, QoS's are dynamic.
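For instance, for a pending job something like this should work (the job ID
and QoS name here are placeholders):

    scontrol update JobId=12345 QOS=long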
-Paul Edmon-
Hi Paul,
Thanks for your pointers.
I'll look into QOS and MCS after my paper deadline (Sept 5). Re QOS, as
mentioned to Peter in the reply I just sent, I wonder if the QOS of a job
can be changed while it's pending (submitted but not yet running).
Regards,
Guillaume.
On Fri, Aug 30, 2019, ...
Hi Steven,
Those both sound like potentially good solutions.
So basically, you're saying that if I script it properly, I can launch
multiple scripts from a single job array via a master sbatch script.
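Something like this, I imagine (a sketch; the script names are made up):

    #!/bin/bash
    #SBATCH --array=0-8
    # master script: map each array task ID to one of the nine real scripts
    scripts=(exp0.sh exp1.sh exp2.sh exp3.sh exp4.sh exp5.sh exp6.sh exp7.sh exp8.sh)
    bash "${scripts[$SLURM_ARRAY_TASK_ID]}"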
My problem with that though is: what if each script (the 9 scripts in my
earlier example) has different requirements? For example, running on a
different partition, or setting a different time limit? My understanding is ...
After you restart slurmctld, do "scontrol reconfigure".
Brian Andrus
On 8/30/2019 6:57 AM, Robert Kudyba wrote:
I had set RealMemory to a really high number as I misinterpreted the
recommendation.
NodeName=node[001-003] CoresPerSocket=12 RealMemory=196489092 Sockets=2 Gres=gpu:1
But now I set it to RealMemory=191000 ...
A QoS is probably your best bet. Another variant might be MCS, which
you can use to help reduce resource fragmentation. For limits, though,
QoS is the better fit.
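(MCS is enabled in slurm.conf; a minimal sketch, assuming you want jobs
grouped by Unix group, with groupA/groupB as placeholders:)

    # slurm.conf sketch
    MCSPlugin=mcs/group
    MCSParameters=enforced,select:groupA|groupB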
-Paul Edmon-
On 8/30/19 7:33 AM, Steven Dick wrote:
It would still be possible to use job arrays in this situation; it's
just slightly messy.
I had set RealMemory to a really high number as I misinterpreted the
recommendation.
NodeName=node[001-003] CoresPerSocket=12 RealMemory=196489092 Sockets=2
Gres=gpu:1
But now I set it to:
RealMemory=191000
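(As a sanity check, running slurmd -C on a compute node prints the hardware
Slurm detects, including a RealMemory value you can paste into slurm.conf:)

    # run on the node itself; prints a NodeName=... line
    slurmd -C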
I restarted slurmctld. And according to the Bright Cluster support team:
"Unless it ha...
It would still be possible to use job arrays in this situation, it's
just slightly messy.
So the way a job array works is that you submit a single script, and
that script is provided an integer for each subjob. The integer is in
a range, with a possible step (default=1).
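For example (a sketch):

    #!/bin/bash
    #SBATCH --array=0-20:5   # step of 5: SLURM_ARRAY_TASK_ID takes 0,5,10,15,20
    echo "running subjob $SLURM_ARRAY_TASK_ID"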
To handle your situation, you ...
Hi,
we have some compute nodes paid for by different project owners: 10% are owned
by project A and 90% are owned by project B.
We want to implement the following policy, such that in every given time period
(e.g. two weeks):
- Project A doesn't use more than 10% of the cluster in this time period
- ...
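(One possible direction, sketched with made-up numbers and assuming a
400-CPU cluster: cap each project's accounted CPU-minutes per period with
GrpTRESMins, and reset raw usage at the period boundary in slurm.conf. As
far as I know there is no built-in two-week reset period, so this is only
an approximation:)

    # admin-side sketch: 10% of 400 CPUs for two weeks
    # = 40 cores * 20160 minutes = 806400 CPU-minutes
    sacctmgr modify account projA set GrpTRESMins=cpu=806400

    # slurm.conf sketch: enforce the limit on raw usage and reset it
    # PriorityDecayHalfLife=0
    # PriorityUsageResetPeriod=WEEKLY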