Re: [slurm-users] DBD Reset

2022-06-15 Thread Ryan Novosielski
It very much rang a bell! I think there is also an scontrol command that you can use to show the actual running config (probably “show config”), which will include the defaults for anything you don’t have specified in the config file. Sent from my iPhone On Jun 15, 2022, at
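The scontrol subcommand mentioned above can be sketched as follows (the grep pattern is only illustrative; this requires a running Slurm cluster):

```shell
# Dump the effective running configuration, including compiled-in
# defaults for any keys not set in slurm.conf.
scontrol show config

# Narrow the output to the accounting-related keys, e.g. to check
# which port slurmctld uses to reach slurmdbd.
scontrol show config | grep -i accountingstorage
```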

Re: [slurm-users] DBD Reset

2022-06-15 Thread Reed Dier
Well, you nailed it. Honestly a little surprised it was working to begin with. In the DBD conf:
> -#DbdPort=7031
> +DbdPort=7031
And then in the slurm.conf:
> -#AccountingStoragePort=3306
> +AccountingStoragePort=7031
I’m not sure how my slurm.conf showed the 3306 mysql port commented out. I did
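A minimal sketch of the matching port settings described in the fix above — the point being that AccountingStoragePort must name slurmdbd's port, not MySQL's 3306 (comments are mine, not from the thread):

```
# slurmdbd.conf — the port slurmdbd listens on
DbdPort=7031

# slurm.conf — must match DbdPort above; 3306 is the MySQL port,
# which slurmctld should never talk to directly
AccountingStoragePort=7031
```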

Re: [slurm-users] DBD Reset

2022-06-15 Thread Ryan Novosielski
Apologies for not having more concrete information available when I’m replying to you, but I figured maybe having a fast hint might be better. Have a look at how the various daemons communicate with one another. This sounds to me like a firewall thing between maybe the SlurmCtld and where the S
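The firewall hypothesis above can be checked with a couple of generic commands, assuming slurmdbd runs on a separate host (6819 is Slurm's default DbdPort; `dbd-host` is a placeholder for the actual hostname):

```shell
# On the slurmdbd host: confirm the daemon is actually listening.
ss -tlnp | grep 6819

# From the slurmctld host: probe TCP reachability through any
# firewall in between (nc is illustrative; any TCP probe works).
nc -zv dbd-host 6819
```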

[slurm-users] DBD Reset

2022-06-15 Thread Reed Dier
Hoping this is an easy answer. My mysql instance somehow corrupted itself, and I’m having to purge and start over. This is ok, because the data in there isn’t too valuable, and we aren’t making use of associations or anything like that yet (no AccountingStorageEnforce). That said, I’ve decided

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Williams, Gareth (IM&T, Black Mountain)
I think the problem might be that you are not requesting memory, so by default, all memory on a node is allocated to the job and "cons_res" will not allocate a second job to any node. That comes up quite often. Gareth -Original Message- From: slurm-users On Behalf Of Guillaume De Naye
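A sketch of the fix suggested above — adding an explicit memory request so "cons_res" can place a second job on the same node (the script body and `./my_program` are placeholders, not from the thread):

```shell
#!/bin/bash
#SBATCH --ntasks=12
#SBATCH --mem-per-cpu=2G   # explicit memory request; without one, the
                           # job may be allocated all of a node's memory,
                           # blocking any co-scheduled job
srun ./my_program
```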

[slurm-users] Fwd: [HTCondor-users] Save the dates! HTCondor European Workshop 2022 Oct 11 - Oct 14

2022-06-15 Thread Matthew T West
Good afternoon All, In the spirit of scheduler ecumenicalism, I would like to invite folks on this list to the (hybrid) European HTCondor workshop this October. While the bulk of presentations will focus on dHTC workflows, the HTCondor community does have a number of ongoing collaborations wit

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Guillaume De Nayer
On 06/15/2022 05:25 PM, Ward Poelmans wrote: > Hi Guillaume, > > On 15/06/2022 16:59, Guillaume De Nayer wrote: >> >> Perhaps I misunderstand the Slurm documentation... >> >> I thought that the --exclusive option used in combination with sbatch >> will reserve the whole node (40 cores) for the j

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Ward Poelmans
Hi Guillaume, On 15/06/2022 16:59, Guillaume De Nayer wrote: Perhaps I misunderstand the Slurm documentation... I thought that the --exclusive option used in combination with sbatch will reserve the whole node (40 cores) for the job (submitted with sbatch). This part is working fine. I can c

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Guillaume De Nayer
On 06/15/2022 03:53 PM, Frank Lenaerts wrote: > On Wed, Jun 15, 2022 at 02:20:56PM +0200, Guillaume De Nayer wrote: >> One colleague has to run 20,000 jobs on this machine. Every job starts >> his program with mpirun on 12 cores. The standard Slurm behavior means >> that the node which runs this jo

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Frank Lenaerts
On Wed, Jun 15, 2022 at 02:20:56PM +0200, Guillaume De Nayer wrote: > One colleague has to run 20,000 jobs on this machine. Every job starts > his program with mpirun on 12 cores. The standard Slurm behavior means > that the node which runs this job is blocked (and 28 cores are idle). > The small c

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Frank Lenaerts
On Wed, Jun 15, 2022 at 02:20:56PM +0200, Guillaume De Nayer wrote:
> In order to solve this problem I'm trying to start some subtasks with
> srun inside a batch job (without mpirun for now):
>
> #!/bin/bash
> #SBATCH --job-name=test_multi_prog_srun
> #SBATCH --nodes=1
> #SBATCH --partition=short
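The pattern being attempted above can be sketched as a job script that launches concurrent srun steps and waits for them all. This is a sketch, not the poster's actual script: `./solver` and its inputs are placeholders, and it assumes a Slurm recent enough to have `--exact` (older releases used `srun --exclusive` for the same purpose):

```shell
#!/bin/bash
#SBATCH --job-name=test_multi_prog_srun
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=12

# Each step gets its own 12-core slice of the allocation; --exact
# keeps the two steps from being scheduled onto the same CPUs.
srun --exact -n1 -c12 ./solver input1 &
srun --exact -n1 -c12 ./solver input2 &
wait   # block until both background steps finish
```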

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Guillaume De Nayer
On 06/15/2022 02:48 PM, Tina Friedrich wrote: > Hi Guillaume, > Hi Tina, > in that example you wouldn't need the 'srun' to run more than one task, > I think. > You are correct. To start a program like sleep I could simply run: sleep 20s & sleep 30s & wait However, my objective is to use mpiru
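The plain-shell pattern mentioned above (background jobs plus `wait`, no srun involved) looks like this:

```shell
#!/bin/bash
# Run two commands concurrently in the background, then block until
# both have finished before continuing.
sleep 1 &
sleep 2 &
wait
echo "both finished"
```

`wait` with no arguments blocks on every background child of the current shell, which is why no PIDs need to be tracked here.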

Re: [slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Tina Friedrich
Hi Guillaume, in that example you wouldn't need the 'srun' to run more than one task, I think. I'm not 100% sure, but to me it sounds like you're currently assigning whole nodes to jobs rather than cores (i.e have 'SelectType=select/linear' and no OverSubscribe) and find that to be wastefu
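The switch suggested above, from whole-node to per-core allocation, is a slurm.conf change. A minimal sketch for the Slurm 21.08 series mentioned in this thread (newer releases favor select/cons_tres; the parameter choice here is an assumption, not from the thread):

```
# slurm.conf — allocate individual cores and memory rather than
# whole nodes, so several jobs can share one node
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
```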

[slurm-users] Multiple Program Runs using srun in one Slurm batch Job on one node

2022-06-15 Thread Guillaume De Nayer
Dear all, I'm new to this list. I am responsible for several small clusters at our chair. I set up slurm 21.08.8-2 on a small cluster (CentOS 7) with 8 nodes: NodeName=node0[1-8] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 One colleague has to run 20,000 jobs on this mac