[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-06 Thread Loris Bennett via slurm-users
Paul Edmon via slurm-users  writes:

> It's definitely working for 23.11.8, which is what we are using.

It turns out we had unintentionally started firewalld on the login node.
Now this has been turned off, 'salloc' drops into a shell on a compute
node as desired.
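
For anyone hitting the same symptom, a quick check on the login node could
look like this (just a sketch, assuming firewalld is managed by systemd;
adjust for your distribution):

   # Is firewalld running on the login node?
   $ systemctl is-active firewalld

   # If it is active and not wanted, stop and disable it
   $ sudo systemctl disable --now firewalld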

Thanks for all the data points.

Cheers,

Loris

> -Paul Edmon-
>
> On 9/5/24 10:22 AM, Loris Bennett via slurm-users wrote:
>> Jason Simms via slurm-users  writes:
>>
>>> Ours works fine, however, without the InteractiveStepOptions parameter.
>> My assumption is also that the default value should be OK.
>>
>> It would be nice if someone could confirm that 23.11.10 was working for
>> them.  However, we'll probably be upgrading to 24.05 fairly soon, and so
>> we shall see whether the issue persists.
>>
>> Cheers,
>>
>> Loris
>>
>>> JLS
>>>
>>> On Thu, Sep 5, 2024 at 9:53 AM Carsten Beyer via slurm-users 
>>>  wrote:
>>>
>>>   Hi Loris,
>>>
>>>   we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our config
>>>   contains a second parameter InteractiveStepOptions in slurm.conf:
>>>
>>>   InteractiveStepOptions="--interactive --preserve-env --pty $SHELL -l"
>>>   LaunchParameters=enable_nss_slurm,use_interactive_step
>>>
>>>   That works fine for us:
>>>
>>>   [k202068@levantetest ~]$ salloc -N1 -A k20200 -p compute
>>>   salloc: Pending job allocation 857
>>>   salloc: job 857 queued and waiting for resources
>>>   salloc: job 857 has been allocated resources
>>>   salloc: Granted job allocation 857
>>>   salloc: Waiting for resource configuration
>>>   salloc: Nodes lt1 are ready for job
>>>   [k202068@lt1 ~]$
>>>
>>>   Best Regards,
>>>   Carsten
>>>
>>>   Am 05.09.24 um 14:17 schrieb Loris Bennett via slurm-users:
>>>   > Hi,
>>>   >
>>>   > With
>>>   >
>>>   >$ salloc --version
>>>   >slurm 23.11.10
>>>   >
>>>   > and
>>>   >
>>>   >$ grep LaunchParameters /etc/slurm/slurm.conf
>>>   >LaunchParameters=use_interactive_step
>>>   >
>>>   > the following
>>>   >
>>>   >$ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 --qos=standard
>>>   >salloc: Granted job allocation 18928869
>>>   >salloc: Nodes c001 are ready for job
>>>   >
>>>   > creates a job
>>>   >
>>>   >$ squeue --me
>>>   > JOBID PARTITION NAME USER ST   TIME  NODES NODELIST(REASON)
>>>   >  18928779 interacti interact loris  R   1:05  1 c001
>>>   >
>>>   > but causes the terminal to block.
>>>   >
>>>   >  From a second terminal I can log into the compute node:
>>>   >
>>>   >$ ssh c001
>>>   >[13:39:36] loris@c001 (1000) ~
>>>   >
>>>   > Is that the expected behaviour or should salloc return a shell directly
>>>   > on the compute node (like srun --pty /bin/bash -l used to do)?
>>>   >
>>>   > Cheers,
>>>   >
>>>   > Loris
>>>   >
>>>   --
>>>   Carsten Beyer
>>>   Abteilung Systeme
>>>
>>>   Deutsches Klimarechenzentrum GmbH (DKRZ)
>>>   Bundesstraße 45a * D-20146 Hamburg * Germany
>>>
>>>   Phone:  +49 40 460094-221
>>>   Fax:+49 40 460094-270
>>>   Email:  be...@dkrz.de
>>>   URL:http://www.dkrz.de
>>>
>>>   Geschäftsführer: Prof. Dr. Thomas Ludwig
>>>   Sitz der Gesellschaft: Hamburg
>>>   Amtsgericht Hamburg HRB 39784
>>>
>>>   --
>>>   slurm-users mailing list -- slurm-users@lists.schedmd.com
>>>   To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>>
>>> -- Jason L. Simms, Ph.D., M.P.H.
>>> Manager of Research Computing
>>> Swarthmore College
>>> Information Technology Services
>>> (610) 328-8102
>>> Schedule a meeting: https://calendly.com/jlsimms
-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-06 Thread Brian Andrus via slurm-users
Folks have addressed the obvious config settings, but also check your
prolog/epilog scripts and settings, as well as .bashrc/.bash_profile and
anything in /etc/profile.d/.

That may be hanging it up.
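
A couple of generic starting points (just a sketch; paths and output will
differ per site):

   # See which prolog/epilog scripts slurmctld/slurmd are configured to run
   $ scontrol show config | grep -iE 'prolog|epilog'

   # And check the shell start-up files for anything slow or interactive
   $ ls /etc/profile.d/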

Brian Andrus

On 9/5/2024 5:17 AM, Loris Bennett via slurm-users wrote:

Hi,

With

   $ salloc --version
   slurm 23.11.10

and

   $ grep LaunchParameters /etc/slurm/slurm.conf
   LaunchParameters=use_interactive_step

the following

   $ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 --qos=standard
   salloc: Granted job allocation 18928869
   salloc: Nodes c001 are ready for job

creates a job

   $ squeue --me
    JOBID PARTITION NAME USER ST   TIME  NODES NODELIST(REASON)
  18928779 interacti interact loris  R   1:05  1 c001

but causes the terminal to block.

 From a second terminal I can log into the compute node:

   $ ssh c001
   [13:39:36] loris@c001 (1000) ~

Is that the expected behaviour or should salloc return a shell directly
on the compute node (like srun --pty /bin/bash -l used to do)?

Cheers,

Loris



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Configuration for nodes with different TmpFs locations and TmpDisk sizes

2024-09-06 Thread simpsond4--- via slurm-users
Hi,

This may help.


job_container.conf

# All nodes have /localscratch but for some_nodes2 it is mounted as NVME.
AutoBasePath=true
BasePath=/localscratch
Shared=true
# Some nodes have /localscratch1 configured, as /localscratch is actually taken by a valid local device setup
NodeName=some_nodes[9995-] AutoBasePath=true BasePath=/localscratch1 Shared=true
# some_nodes2, where we want to use local NVMe mounted at /localscratch. If this is NVIDIA kit, we may not want /dev/shm, so explicitly /tmp
NodeName=some_nodes2[7770-] Dirs="/tmp" AutoBasePath=true BasePath=/localscratch Shared=true
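
On the slurm.conf side, TmpFs is global, but TmpDisk is a per-node parameter,
so the nodes with the bigger local disk can advertise more temporary space and
jobs can target them with --tmp. A minimal, untested sketch (the sizes below
are placeholders, not from our setup):

# slurm.conf (sketch): TmpDisk is per node and given in MB; TmpFs stays global
NodeName=some_nodes[9995-]  TmpDisk=800000    # placeholder size
NodeName=some_nodes2[7770-] TmpDisk=7000000   # placeholder size

# Jobs then request the larger scratch with, e.g.:  sbatch --tmp=500G job.sh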




David

--
David Simpson - Senior Systems Engineer
ARCCA, Redwood Building,
King Edward VII Avenue,
Cardiff, CF10 3NB

David Simpson - peiriannydd uwch systemau
ARCCA, Adeilad Redwood,
King Edward VII Avenue,
Caerdydd, CF10 3NB


From: Jake Longo via slurm-users 
Date: Wednesday, 4 September 2024 at 16:19
To: slurm-us...@schedmd.com 
Subject: [slurm-users] Configuration for nodes with different TmpFs locations 
and TmpDisk sizes

Hi,

We have a number of machines in our compute cluster that have larger disks 
available for local data. I would like to add them to the same partition as the 
rest of the nodes but assign them a larger TmpDisk value which would allow 
users to request a larger tmp and land on those machines.

The main hurdle is that (for reasons beyond my control) the larger local disks 
are on a special mount point /largertmp whereas the rest of the compute cluster 
uses the vanilla /tmp. I can't see an obvious way to make this work as the 
TmpFs value appears to be global only and attempting to set TmpDisk to a value 
larger than TmpFs for those nodes will put the machine into an invalid state.

I couldn't see any similar support tickets or anything in the mail archive but 
I wouldn't have thought it would be that unusual to do this.

Thanks in advance!
Jake

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com