[slurm-users] Re: Bug? sbatch not respecting MaxMemPerNode setting

2024-09-05 Thread Angel de Vicente via slurm-users
Hello,

Brian Andrus via slurm-users writes:

> Unless you are using cgroups and constraints, there is no limit
> imposed.

[...]

> So your request did not exceed what slurm sees as available (1 cpu
> using 4GB), so it is happy to let your script run. I suspect if you
> look at the usage, you will see that 1 cpu spiked high while the
> others did nothing.

Thanks for the input.

I'm aware that without cgroups and constraints there is no real limit
imposed, but what I don't understand is why the first three submissions
below do get stopped by sbatch while the last one happily goes through?

>> ,----
>> | $ sbatch -N 1 -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
>> | sbatch: error: Batch job submission failed: Memory required by task is not available
>> |
>> | $ sbatch -N 1 -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
>> | sbatch: error: Batch job submission failed: Memory required by task is not available
>> |
>> | $ sbatch -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
>> | sbatch: error: Batch job submission failed: Memory required by task is not available
>> `----

>> ,----
>> | $ sbatch -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
>> | Submitted batch job 133982
>> `----

Cheers,
-- 
Ángel de Vicente  
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Bug? sbatch not respecting MaxMemPerNode setting

2024-09-05 Thread Angel de Vicente via slurm-users
Hello again,

Angel de Vicente via slurm-users writes:

> [...] I don't understand is why the first three submissions
> below do get stopped by sbatch while the last one happily goes through?
>
>>> ,----
>>> | $ sbatch -N 1 -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
>>> | sbatch: error: Batch job submission failed: Memory required by task is not available
>>> |
>>> | $ sbatch -N 1 -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
>>> | sbatch: error: Batch job submission failed: Memory required by task is not available
>>> |
>>> | $ sbatch -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
>>> | sbatch: error: Batch job submission failed: Memory required by task is not available
>>> `----
>
>>> ,----
>>> | $ sbatch -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
>>> | Submitted batch job 133982
>>> `----


Ah, I think perhaps I understand now...

In the first three cases Slurm knows that everything is going to run
inside a single node (either because I explicitly set "-N 1" or because
I'm submitting a single task that uses 76 CPUs, "-n 1 -c 76"), and thus
it knows that the required memory (76 x 4000M = 304000M) exceeds the
MaxMemPerNode configuration, so it blocks the submission.
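
For the record, the limit that sbatch checks against can be inspected
with scontrol (a small illustration; placeholders stand in for our real
values):

  $ scontrol show config | grep MaxMemPerNode
  MaxMemPerNode           = <site-wide limit, in MB>
  $ scontrol show partition short | grep MaxMemPerNode
  ... MaxMemPerNode=<per-partition limit, if one is set> ...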

In the last case my job will have 76 single-CPU tasks, but I'm not
explicitly asking for a number of nodes, so in theory the job could be
split across several nodes and MaxMemPerNode would not be exceeded, so
Slurm lets the job go through.

[In my case I guess the confusion comes from the fact that there is only
one node in this "cluster", so I see the four cases as basically
identical with regard to memory limits.]

In any case, even if the fourth job above gets past the submission
phase, I would think it more reasonable that it never actually runs (on
my system), because allocating it to a single node is effectively the
same as submission #2.

Cheers,
-- 
Ángel de Vicente  
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Make a job pending in the plugin

2024-09-05 Thread Benjamin Jin via slurm-users
Hello all,

I am trying to build a custom plugin to force some jobs to remain pending.

According to the official documentation, the `ESLURM*` error codes are only valid for `job_submit_lua`.

I tried returning `ESLURM_JOB_PENDING`, but that only rejects the job submission.

Does anyone know how to make a job pending from a job_submit plugin?

Thanks.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Make a job pending in the plugin

2024-09-05 Thread Ole Holm Nielsen via slurm-users

On 9/5/24 11:13, Benjamin Jin via slurm-users wrote:

I am trying to build a custom plugin to force some jobs to remain pending.

According to the official documentation, the `ESLURM*` error codes are only valid for `job_submit_lua`.

I tried returning `ESLURM_JOB_PENDING`, but that only rejects the job submission.

Does anyone know how to make a job pending from a job_submit plugin?


I believe that a job state of "Pending" is only possible *after* the job 
has been submitted successfully to slurmctld, and after the job_submit 
plugin has completed for the job.  If slurmctld cannot allocate nodes to 
the job, it will be assigned a state of "Pending", i.e., the job is 
waiting for resources.


The job_submit plugin is documented in 
https://slurm.schedmd.com/job_submit_plugins.html and it's not really 
obvious what you can manipulate in the plugin.  My guess is that the 
future job state, after slurmctld has accepted the job, cannot be changed 
in the job_submit plugin.  I don't know if a state of ESLURM_JOB_HELD can 
be set?


May I suggest that you try to set up user limits in Slurm so that jobs 
requesting too many resources will remain "Pending"?
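
For example, something along these lines with sacctmgr (an untested
sketch with made-up account and QOS names; memory is given in MB):

  # Cap what a single job under account "students" may request:
  $ sacctmgr modify account students set MaxTRESPerJob=cpu=64,mem=256000
  # Or cap the total a user may have allocated under the "normal" QOS:
  $ sacctmgr modify qos normal set MaxTRESPerUser=cpu=128

As far as I remember, unless the QOS carries the DenyOnLimit flag, jobs
exceeding such limits are still accepted at submission and simply stay
"Pending" with a corresponding reason shown in squeue.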


I hope this helps,
Ole

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Configuration for nodes with different TmpFs locations and TmpDisk sizes

2024-09-05 Thread Jake Longo via slurm-users
Hi all,

We have a number of machines in our compute cluster that have larger disks
available for local data. I would like to add them to the same partition as
the rest of the nodes but assign them a larger TmpDisk value, which would
allow users to request more temporary space and land on those machines.

The main hurdle is that (for reasons beyond my control) the larger local
disks are on a special mount point, /largertmp, whereas the rest of the
compute cluster uses the vanilla /tmp. I can't see an obvious way to make
this work, as the TmpFS value appears to be global only, and setting
TmpDisk to a value larger than what TmpFS can actually hold on those nodes
puts the machines into an invalid state.
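
To make that concrete, this is roughly the combination I am after (a
sketch only; node names and sizes are invented, TmpDisk is in MB):

  $ grep -E '^(TmpFS|NodeName)' /etc/slurm/slurm.conf
  TmpFS=/tmp
  NodeName=node[001-100] CPUs=64 RealMemory=256000 TmpDisk=100000
  NodeName=big[01-04]    CPUs=64 RealMemory=256000 TmpDisk=3800000

The first line applies to every node, so as far as I can tell slurmd on
the "big" nodes measures the space under /tmp, reports less than the
configured 3800000 MB, and the nodes get flagged as invalid; there is no
per-node way to say "for these, TmpFS is /largertmp".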

I couldn't see any similar support tickets or anything in the mail archive,
but I wouldn't have thought it would be that unusual to want to do this.

Thanks in advance!
Jake

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Configuration for nodes with different TmpFs locations and TmpDisk sizes

2024-09-05 Thread Cutts, Tim via slurm-users
I’ve always had local storage mounted in the same place, in /tmp.  In LSF 
clusters, I just let LSF’s lim get on with autodetecting how big /tmp was and 
setting the tmp resource automatically.  I presume SLURM can do the same thing, 
but I’ve never checked.

Tim

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca

Find out more about R&D IT Data, Analytics & AI and how we can support you by 
visiting our Service Catalogue.


From: Jake Longo via slurm-users 
Date: Thursday, 5 September 2024 at 11:13 AM
To: slurm-us...@schedmd.com 
Subject: [slurm-users] Configuration for nodes with different TmpFs locations 
and TmpDisk sizes
Hi all,

We have a number of machines in our compute cluster that have larger disks 
available for local data. I would like to add them to the same partition as the 
rest of the nodes but assign them a larger TmpDisk value which would allow 
users to request a larger tmp and land on those machines.

The main hurdle is that (for reasons beyond my control) the larger local disks 
are on a special mount point /largertmp whereas the rest of the compute cluster 
uses the vanilla /tmp. I can't see an obvious way to make this work as the 
TmpFs value appears to be global only and attempting to set TmpDisk to a value 
larger than TmpFs for those nodes will put the machine into an invalid state.

I couldn't see any similar support tickets or anything in the mail archive but 
I wouldn't have thought it would be that unusual to do this.

Thanks in advance!
Jake


AstraZeneca UK Limited is a company incorporated in England and Wales with 
registered number:03674842 and its registered office at 1 Francis Crick Avenue, 
Cambridge Biomedical Campus, Cambridge, CB2 0AA.

This e-mail and its attachments are intended for the above named recipient only 
and may contain confidential and privileged information. If they have come to 
you in error, you must not copy or show them to anyone; instead, please reply 
to this e-mail, highlighting the error to the sender and then immediately 
delete the message. For information about how AstraZeneca UK Limited and its 
affiliates may process information, personal data and monitor communications, 
please see our privacy notice at 
www.astrazeneca.com

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Loris Bennett via slurm-users
Hi,

With

  $ salloc --version
  slurm 23.11.10

and 

  $ grep LaunchParameters /etc/slurm/slurm.conf 
  LaunchParameters=use_interactive_step

the following 

  $ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 
--qos=standard
  salloc: Granted job allocation 18928869
  salloc: Nodes c001 are ready for job

creates a job

  $ squeue --me
        JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
     18928779 interacti interact    loris  R   1:05      1 c001

but causes the terminal to block.

From a second terminal I can log into the compute node:

  $ ssh c001
  [13:39:36] loris@c001 (1000) ~

Is that the expected behaviour or should salloc return a shell directly
on the compute node (like srun --pty /bin/bash -l used to do)?

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Jason Simms via slurm-users
I know this doesn't particularly help you, but for me on 23.11.6 it works
as expected and immediately drops me onto the allocated node. In answer to
your question, yes, as I understand it the default/expected behavior is to
return the shell directly.

Jason

On Thu, Sep 5, 2024 at 8:18 AM Loris Bennett via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> With
>
>   $ salloc --version
>   slurm 23.11.10
>
> and
>
>   $ grep LaunchParameters /etc/slurm/slurm.conf
>   LaunchParameters=use_interactive_step
>
> the following
>
>   $ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000
> --qos=standard
>   salloc: Granted job allocation 18928869
>   salloc: Nodes c001 are ready for job
>
> creates a job
>
>   $ squeue --me
>JOBID PARTITION NAME USER ST   TIME  NODES
> NODELIST(REASON)
> 18928779 interacti interactloris  R   1:05  1 c001
>
> but causes the terminal to block.
>
> From a second terminal I can log into the compute node:
>
>   $ ssh c001
>   [13:39:36] loris@c001 (1000) ~
>
> Is that the expected behaviour or should salloc return a shell directly
> on the compute node (like srun --pty /bin/bash -l used to do)?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Herr/Mr)
> FUB-IT, Freie Universität Berlin
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>


-- 
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Carsten Beyer via slurm-users

Hi Loris,

we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our config 
contains a second parameter InteractiveStepOptions in slurm.conf:


InteractiveStepOptions="--interactive --preserve-env --pty $SHELL -l"
LaunchParameters=enable_nss_slurm,use_interactive_step

That works fine for us:

[k202068@levantetest ~]$ salloc -N1 -A k20200 -p compute
salloc: Pending job allocation 857
salloc: job 857 queued and waiting for resources
salloc: job 857 has been allocated resources
salloc: Granted job allocation 857
salloc: Waiting for resource configuration
salloc: Nodes lt1 are ready for job
[k202068@lt1 ~]$

Best Regards,
Carsten


Am 05.09.24 um 14:17 schrieb Loris Bennett via slurm-users:

Hi,

With

   $ salloc --version
   slurm 23.11.10

and

   $ grep LaunchParameters /etc/slurm/slurm.conf
   LaunchParameters=use_interactive_step

the following

   $ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 
--qos=standard
   salloc: Granted job allocation 18928869
   salloc: Nodes c001 are ready for job

creates a job

   $ squeue --me
JOBID PARTITION NAME USER ST   TIME  NODES 
NODELIST(REASON)
 18928779 interacti interactloris  R   1:05  1 c001

but causes the terminal to block.

 From a second terminal I can log into the compute node:

   $ ssh c001
   [13:39:36] loris@c001 (1000) ~

Is that the expected behaviour or should salloc return a shell directly
on the compute node (like srun --pty /bin/bash -l used to do)?

Cheers,

Loris


--
Carsten Beyer
Abteilung Systeme

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany

Phone:  +49 40 460094-221
Fax:+49 40 460094-270
Email:  be...@dkrz.de
URL:http://www.dkrz.de

Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Jason Simms via slurm-users
Ours works fine, however, without the InteractiveStepOptions parameter.

JLS

On Thu, Sep 5, 2024 at 9:53 AM Carsten Beyer via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi Loris,
>
> we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our config
> contains a second parameter InteractiveStepOptions in slurm.conf:
>
> InteractiveStepOptions="--interactive --preserve-env --pty $SHELL -l"
> LaunchParameters=enable_nss_slurm,use_interactive_step
>
> That works fine for us:
>
> [k202068@levantetest ~]$ salloc -N1 -A k20200 -p compute
> salloc: Pending job allocation 857
> salloc: job 857 queued and waiting for resources
> salloc: job 857 has been allocated resources
> salloc: Granted job allocation 857
> salloc: Waiting for resource configuration
> salloc: Nodes lt1 are ready for job
> [k202068@lt1 ~]$
>
> Best Regards,
> Carsten
>
>
> Am 05.09.24 um 14:17 schrieb Loris Bennett via slurm-users:
> > Hi,
> >
> > With
> >
> >$ salloc --version
> >slurm 23.11.10
> >
> > and
> >
> >$ grep LaunchParameters /etc/slurm/slurm.conf
> >LaunchParameters=use_interactive_step
> >
> > the following
> >
> >$ salloc  --partition=interactive --ntasks=1 --time=00:03:00
> --mem=1000 --qos=standard
> >salloc: Granted job allocation 18928869
> >salloc: Nodes c001 are ready for job
> >
> > creates a job
> >
> >$ squeue --me
> > JOBID PARTITION NAME USER ST   TIME  NODES
> NODELIST(REASON)
> >  18928779 interacti interactloris  R   1:05  1
> c001
> >
> > but causes the terminal to block.
> >
> >  From a second terminal I can log into the compute node:
> >
> >$ ssh c001
> >[13:39:36] loris@c001 (1000) ~
> >
> > Is that the expected behaviour or should salloc return a shell directly
> > on the compute node (like srun --pty /bin/bash -l used to do)?
> >
> > Cheers,
> >
> > Loris
> >
> --
> Carsten Beyer
> Abteilung Systeme
>
> Deutsches Klimarechenzentrum GmbH (DKRZ)
> Bundesstraße 45a * D-20146 Hamburg * Germany
>
> Phone:  +49 40 460094-221
> Fax:+49 40 460094-270
> Email:  be...@dkrz.de
> URL:http://www.dkrz.de
>
> Geschäftsführer: Prof. Dr. Thomas Ludwig
> Sitz der Gesellschaft: Hamburg
> Amtsgericht Hamburg HRB 39784
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>


-- 
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Carsten Beyer via slurm-users
Thanks, Jason, for the hint. It looks like the parameter was kept in 
slurm.conf from previous SLURM versions at our site. It also works without 
setting InteractiveStepOptions in slurm.conf.


Best Regards,
Carsten


Am 05.09.24 um 15:55 schrieb Jason Simms via slurm-users:

Ours works fine, however, without the InteractiveStepOptions parameter.

JLS

On Thu, Sep 5, 2024 at 9:53 AM Carsten Beyer via slurm-users 
 wrote:


Hi Loris,

we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our
config
contains a second parameter InteractiveStepOptions in slurm.conf:

InteractiveStepOptions="--interactive --preserve-env --pty $SHELL -l"
LaunchParameters=enable_nss_slurm,use_interactive_step

That works fine for us:

[k202068@levantetest ~]$ salloc -N1 -A k20200 -p compute
salloc: Pending job allocation 857
salloc: job 857 queued and waiting for resources
salloc: job 857 has been allocated resources
salloc: Granted job allocation 857
salloc: Waiting for resource configuration
salloc: Nodes lt1 are ready for job
[k202068@lt1 ~]$

Best Regards,
Carsten


Am 05.09.24 um 14:17 schrieb Loris Bennett via slurm-users:
> Hi,
>
> With
>
>    $ salloc --version
>    slurm 23.11.10
>
> and
>
>    $ grep LaunchParameters /etc/slurm/slurm.conf
>    LaunchParameters=use_interactive_step
>
> the following
>
>    $ salloc  --partition=interactive --ntasks=1 --time=00:03:00
--mem=1000 --qos=standard
>    salloc: Granted job allocation 18928869
>    salloc: Nodes c001 are ready for job
>
> creates a job
>
>    $ squeue --me
>                 JOBID PARTITION     NAME     USER ST  TIME 
NODES NODELIST(REASON)
>              18928779 interacti interact    loris  R  1:05     
1 c001
>
> but causes the terminal to block.
>
>  From a second terminal I can log into the compute node:
>
>    $ ssh c001
>    [13:39:36] loris@c001 (1000) ~
>
> Is that the expected behaviour or should salloc return a shell
directly
> on the compute node (like srun --pty /bin/bash -l used to do)?
>
> Cheers,
>
> Loris
>
-- 
Carsten Beyer

Abteilung Systeme

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany

Phone:  +49 40 460094-221
Fax:    +49 40 460094-270
Email: be...@dkrz.de
URL: http://www.dkrz.de

Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com



--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms


--
Carsten Beyer
Abteilung Systeme

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany

Phone:  +49 40 460094-221
Fax:+49 40 460094-270
Email:be...@dkrz.de
URL:http://www.dkrz.de

Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Loris Bennett via slurm-users
Jason Simms via slurm-users  writes:

> Ours works fine, however, without the InteractiveStepOptions parameter.

My assumption is also that the default value should be OK.

It would be nice if someone could confirm that 23.11.10 is working for
them.  However, we'll probably be upgrading to 24.05 fairly soon, and so
we shall see whether the issue persists.

Cheers,

Loris 

> JLS
>
> On Thu, Sep 5, 2024 at 9:53 AM Carsten Beyer via slurm-users 
>  wrote:
>
>  Hi Loris,
>
>  we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our config 
>  contains a second parameter InteractiveStepOptions in slurm.conf:
>
>  InteractiveStepOptions="--interactive --preserve-env --pty $SHELL -l"
>  LaunchParameters=enable_nss_slurm,use_interactive_step
>
>  That works fine for us:
>
>  [k202068@levantetest ~]$ salloc -N1 -A k20200 -p compute
>  salloc: Pending job allocation 857
>  salloc: job 857 queued and waiting for resources
>  salloc: job 857 has been allocated resources
>  salloc: Granted job allocation 857
>  salloc: Waiting for resource configuration
>  salloc: Nodes lt1 are ready for job
>  [k202068@lt1 ~]$
>
>  Best Regards,
>  Carsten
>
>  Am 05.09.24 um 14:17 schrieb Loris Bennett via slurm-users:
>  > Hi,
>  >
>  > With
>  >
>  >$ salloc --version
>  >slurm 23.11.10
>  >
>  > and
>  >
>  >$ grep LaunchParameters /etc/slurm/slurm.conf
>  >LaunchParameters=use_interactive_step
>  >
>  > the following
>  >
>  >$ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 
> --qos=standard
>  >salloc: Granted job allocation 18928869
>  >salloc: Nodes c001 are ready for job
>  >
>  > creates a job
>  >
>  >$ squeue --me
>  > JOBID PARTITION NAME USER ST   TIME  NODES 
> NODELIST(REASON)
>  >  18928779 interacti interactloris  R   1:05  1 c001
>  >
>  > but causes the terminal to block.
>  >
>  >  From a second terminal I can log into the compute node:
>  >
>  >$ ssh c001
>  >[13:39:36] loris@c001 (1000) ~
>  >
>  > Is that the expected behaviour or should salloc return a shell directly
>  > on the compute node (like srun --pty /bin/bash -l used to do)?
>  >
>  > Cheers,
>  >
>  > Loris
>  >
>  -- 
>  Carsten Beyer
>  Abteilung Systeme
>
>  Deutsches Klimarechenzentrum GmbH (DKRZ)
>  Bundesstraße 45a * D-20146 Hamburg * Germany
>
>  Phone:  +49 40 460094-221
>  Fax:+49 40 460094-270
>  Email:  be...@dkrz.de
>  URL:http://www.dkrz.de
>
>  Geschäftsführer: Prof. Dr. Thomas Ludwig
>  Sitz der Gesellschaft: Hamburg
>  Amtsgericht Hamburg HRB 39784
>
>  -- 
>  slurm-users mailing list -- slurm-users@lists.schedmd.com
>  To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
> -- 
> Jason L. Simms, Ph.D., M.P.H.
> Manager of Research Computing
> Swarthmore College
> Information Technology Services
> (610) 328-8102
> Schedule a meeting: https://calendly.com/jlsimms
-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Paul Edmon via slurm-users

It's definitely working for 23.11.8, which is what we are using.

-Paul Edmon-

On 9/5/24 10:22 AM, Loris Bennett via slurm-users wrote:

Jason Simms via slurm-users  writes:


Ours works fine, however, without the InteractiveStepOptions parameter.

My assumption is also that default value should be OK.

It would be nice if some one could confirm that 23.11.10 was working for
them.  However, we'll probably be upgrading to 24.5 fairly soon, and so
we shall see whether the issue persists.

Cheers,

Loris


JLS

On Thu, Sep 5, 2024 at 9:53 AM Carsten Beyer via slurm-users 
 wrote:

  Hi Loris,

  we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our config
  contains a second parameter InteractiveStepOptions in slurm.conf:

  InteractiveStepOptions="--interactive --preserve-env --pty $SHELL -l"
  LaunchParameters=enable_nss_slurm,use_interactive_step

  That works fine for us:

  [k202068@levantetest ~]$ salloc -N1 -A k20200 -p compute
  salloc: Pending job allocation 857
  salloc: job 857 queued and waiting for resources
  salloc: job 857 has been allocated resources
  salloc: Granted job allocation 857
  salloc: Waiting for resource configuration
  salloc: Nodes lt1 are ready for job
  [k202068@lt1 ~]$

  Best Regards,
  Carsten

  Am 05.09.24 um 14:17 schrieb Loris Bennett via slurm-users:
  > Hi,
  >
  > With
  >
  >$ salloc --version
  >slurm 23.11.10
  >
  > and
  >
  >$ grep LaunchParameters /etc/slurm/slurm.conf
  >LaunchParameters=use_interactive_step
  >
  > the following
  >
  >$ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 
--qos=standard
  >salloc: Granted job allocation 18928869
  >salloc: Nodes c001 are ready for job
  >
  > creates a job
  >
  >$ squeue --me
  > JOBID PARTITION NAME USER ST   TIME  NODES 
NODELIST(REASON)
  >  18928779 interacti interactloris  R   1:05  1 c001
  >
  > but causes the terminal to block.
  >
  >  From a second terminal I can log into the compute node:
  >
  >$ ssh c001
  >[13:39:36] loris@c001 (1000) ~
  >
  > Is that the expected behaviour or should salloc return a shell directly
  > on the compute node (like srun --pty /bin/bash -l used to do)?
  >
  > Cheers,
  >
  > Loris
  >
  --
  Carsten Beyer
  Abteilung Systeme

  Deutsches Klimarechenzentrum GmbH (DKRZ)
  Bundesstraße 45a * D-20146 Hamburg * Germany

  Phone:  +49 40 460094-221
  Fax:+49 40 460094-270
  Email:  be...@dkrz.de
  URL:http://www.dkrz.de

  Geschäftsführer: Prof. Dr. Thomas Ludwig
  Sitz der Gesellschaft: Hamburg
  Amtsgericht Hamburg HRB 39784

  --
  slurm-users mailing list -- slurm-users@lists.schedmd.com
  To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Nodelist syntax and semantics

2024-09-05 Thread Jackson, Gary L. via slurm-users
Is there a description of the “nodelist” syntax and semantics somewhere other 
than the source code? By “nodelist” I mean expressions like “name[000,099-100]” 
and how this one, for example, expands to “name000, name099, name100”.

 

-- 

Gary




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Nodelist syntax and semantics

2024-09-05 Thread Paul Edmon via slurm-users
I think this might be the closest to one: 
https://slurm.schedmd.com/slurm.conf.html#SECTION_NODE-CONFIGURATION 
From the third paragraph:


"Multiple node names may be comma separated (e.g. "alpha,beta,gamma") 
and/or a simple node range expression may optionally be used to specify 
numeric ranges of nodes to avoid building a configuration file with 
large numbers of entries. The node range expression can contain one pair 
of square brackets with a sequence of comma-separated numbers and/or 
ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or 
"lx[15,18,32-33]"). Note that the numeric ranges can include one or more 
leading zeros to indicate the numeric portion has a fixed number of 
digits (e.g. "linux[0000-1023]"). Multiple numeric ranges can be 
included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or 
more numeric expressions are included, one of them must be at the end of 
the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can 
always be used in a comma-separated list."
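
In practice, scontrol can also expand and collapse these expressions,
which is a handy way to check how a given pattern will be interpreted (a
small illustration using the names from the original question; the exact
output below is written from memory):

  $ scontrol show hostnames "name[000,099-100]"
  name000
  name099
  name100

  $ scontrol show hostlist "name000,name099,name100"
  name[000,099-100]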


-Paul Edmon-

On 9/5/24 3:24 PM, Jackson, Gary L. via slurm-users wrote:


Is there a description of the “nodelist” syntax and semantics 
somewhere other than the source code? By “nodelist” I mean expressions 
like “name[000,099-100]” and how this one, for example, expands to 
“name000, name099, name100”.


--

Gary


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com