[slurm-users] Enforce gpu usage limits (with GRES?)

2023-02-01 Thread Analabha Roy
…perhaps by setting CUDA_VISIBLE_DEVICES), or is there any additional config for that? Do I really need that extra "GPU" partition that the vendor put in for any of this, or is there a way to bind GRES resources to a particular partition in such a way that simply launching jobs in that partition…
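A minimal sketch of the usual way to wire this up, not taken from the thread and with the node name "node01" as an assumption: GPUs attach to nodes through GRES, a partition can be restricted to the GPU nodes, and cgroup device constraints keep jobs that did not request the GPU from touching it.

    # gres.conf -- one NVIDIA GPU on the node (device path is an assumption)
    Name=gpu File=/dev/nvidia0

    # slurm.conf
    GresTypes=gpu
    TaskPlugin=task/cgroup
    NodeName=node01 CPUs=64 Gres=gpu:1 State=UNKNOWN
    PartitionName=GPU Nodes=node01 State=UP

    # cgroup.conf -- jobs only see GPUs they requested with --gres=gpu:1
    ConstrainDevices=yes

Slurm sets CUDA_VISIBLE_DEVICES for jobs that request a GPU, and the cgroup constraint enforces the boundary at the device level, so a job submitted without --gres=gpu:1 cannot use the card even if it tries.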

Re: [slurm-users] [ext] Enforce gpu usage limits (with GRES?)

2023-02-02 Thread Analabha Roy

Re: [slurm-users] Enforce gpu usage limits (with GRES?)

2023-02-04 Thread Analabha Roy
…likely some others I miss. > > Kind regards, > Markus Kötter

[slurm-users] Hibernating a whole cluster

2023-02-06 Thread Analabha Roy
Hi, I've just finished setting up a single-node "cluster" with Slurm on Ubuntu 20.04. Infrastructural limitations prevent me from running it 24/7, and it's only powered on during business hours. Currently, I have a cron job running that hibernates that sole node before closing time. The hibernation…
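A minimal sketch of such a cron-driven shutdown, assuming systemd and a node named "node01"; the script and node name are placeholders, not the poster's actual setup.

    #!/bin/bash
    # Drain the node so no new jobs start, wait for running jobs to finish,
    # then hibernate the machine.
    scontrol update NodeName=node01 State=DRAIN Reason="overnight shutdown"
    while squeue --noheader --states=RUNNING | grep -q . ; do
        sleep 60
    done
    systemctl hibernate

On power-up, a boot-time task would undo the drain with scontrol update NodeName=node01 State=RESUME.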

Re: [slurm-users] [External] Hibernating a whole cluster

2023-02-07 Thread Analabha Roy
…the nodes when needed. > > Cheers, > Florian
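The truncated reply appears to point at Slurm's built-in power saving, which suspends idle nodes and resumes them when jobs arrive; a minimal slurm.conf sketch, with the script paths as placeholders:

    SuspendTime=1800                                # suspend nodes idle for 30 minutes
    SuspendProgram=/usr/local/sbin/node_suspend.sh  # site-provided suspend script
    ResumeProgram=/usr/local/sbin/node_resume.sh    # site-provided resume script
    ResumeTimeout=300                               # seconds a node may take to come back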

Re: [slurm-users] [External] Hibernating a whole cluster

2023-02-07 Thread Analabha Roy
…> cronjob that hibernates/shuts it down will do so when there are no jobs > running. At least in theory. > > Hope that helps. > > Sean

Re: [slurm-users] [External] Hibernating a whole cluster

2023-02-07 Thread Analabha Roy
…if needed. > > On 07/02/2023 13:14, Analabha Roy wrote: > > Hi Sean, > > Thanks for your awesome suggestion! I'm going through the reservation > > docs now. At first glance, it seems like a daily reservation would turn > > down jobs that are too…
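The daily reservation under discussion might look like the following sketch, assuming an 18:00 close and an 08:00 reopening; the times and the reservation name are illustrative:

    scontrol create reservation ReservationName=overnight \
        StartTime=18:00:00 Duration=14:00:00 Flags=DAILY \
        Users=root Nodes=ALL

The scheduler then refuses to start any job whose time limit would run into the reservation, which matches the "turn down jobs that are too long" behaviour being described.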

Re: [slurm-users] [External] Hibernating a whole cluster

2023-02-07 Thread Analabha Roy

[slurm-users] I just had a "conversation" with ChatGPT about working DMTCP, OpenMPI and SLURM. Here are the results

2023-02-10 Thread Analabha Roy
…comment on the veracity and reliability of the AI's response. AR

Re: [slurm-users] I just had a "conversation" with ChatGPT about working DMTCP, OpenMPI and SLURM. Here are the results

2023-02-18 Thread Analabha Roy
…'ll have to fix afterwards), it does not directly impact users: their > jobs will run and complete/fail regardless of slurmctld state. At most > the users won't receive a completion mail and they will be billed less > than expected. > > Diego

Re: [slurm-users] I just had a "conversation" with ChatGPT about working DMTCP, OpenMPI and SLURM. Here are the results

2023-02-19 Thread Analabha Roy
Hi, Thanks for the advice. I already tried out MANA, but at present it only works with MPICH, not Open MPI, which is what I've set up via Ubuntu. AR On Sun, 19 Feb 2023, 02:10 Christopher Samuel wrote: > On 2/10/23 11:06 am, Analabha Roy wrote: > > I'm having…

[slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Analabha Roy
…set is for the GPU: https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/gres.conf Thanks for your attention, Regards, AR
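For packing many small jobs onto one node, the relevant scheduler settings are the consumable-resource ones; a sketch with illustrative values, not taken from the linked repository:

    # slurm.conf -- schedule by cores and memory rather than whole nodes
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory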

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Analabha Roy
…I started with Slurm I built the sbatch one small step at a time: > nodes, cores, memory, partition, mail, etc. > > It sounds like your config is very close, but your problem may be in the > submit script. > > Best of luck and welcome to Slurm. It is very powerful with a huge > community…
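A minimal sbatch skeleton of the step-by-step kind being described, with every name a placeholder:

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --partition=CPU          # partition name is an assumption
    #SBATCH --nodes=1
    #SBATCH --ntasks=4
    #SBATCH --mem=4G
    #SBATCH --time=01:00:00
    #SBATCH --mail-type=END,FAIL
    srun ./my_program                # placeholder executable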

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-25 Thread Analabha Roy
…Once you are comfortable I would urge you to use the > NodeName/Partition descriptor format above and encourage your users to > declare oversubscription in their jobs. It is a little more work up front > but far easier than correcting scripts later. > > Doug
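The oversubscription being recommended has two halves: a partition that permits it, and jobs that declare it. A sketch, with node and partition names assumed:

    # slurm.conf
    PartitionName=CPU Nodes=node01 OverSubscribe=YES MaxTime=INFINITE State=UP

    # in each job script that opts in
    #SBATCH --oversubscribe

With OverSubscribe=YES the partition only shares resources among jobs that request --oversubscribe; OverSubscribe=FORCE would share unconditionally, without users declaring anything.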

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-26 Thread Analabha Roy
…default. Without an > estimated runtime to work with, the backfill scheduler is crippled. In an > environment mixing single-thread and MPI jobs of various sizes, it is > critical that jobs are honest in their requirements, providing Slurm the > information needed to correctly assign resources…
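One way to implement that advice is a partition-level default, so even jobs that forget --time leave the backfill scheduler something to work with; a sketch with illustrative values:

    PartitionName=CPU Nodes=node01 DefaultTime=01:00:00 MaxTime=7-00:00:00 State=UP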

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-26 Thread Analabha Roy
…100 CPUs running, according to this, but 60 according to scontrol on the node?? The submission scripts are on Pastebin: https://pastebin.com/s21yXFH2 https://pastebin.com/C0uW0Aut AR
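One way to cross-check the two CPU counts being compared, with the node name as an assumption:

    # CPUs requested by all running jobs, summed
    squeue --states=RUNNING --noheader -o '%C' | awk '{s+=$1} END {print s}'
    # CPUs Slurm has actually allocated on the node
    scontrol show node node01 | grep -o 'CPUAlloc=[0-9]*'

A gap between the two usually means jobs requested more (or fewer) CPUs than they actually use, which is what honest resource declarations are meant to prevent.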

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-03-02 Thread Analabha Roy
…Thanks for all the help. It was awesome. AR