[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step
Paul Edmon via slurm-users writes:

> It's definitely working for 23.11.8, which is what we are using.

It turns out we had unintentionally started firewalld on the login node.
Now that this has been turned off, 'salloc' drops into a shell on a
compute node as desired.

Thanks for all the data points.

Cheers,

Loris

> -Paul Edmon-
>
> On 9/5/24 10:22 AM, Loris Bennett via slurm-users wrote:
>> Jason Simms via slurm-users writes:
>>
>>> Ours works fine, however, without the InteractiveStepOptions parameter.
>>
>> My assumption is also that the default value should be OK.
>>
>> It would be nice if someone could confirm that 23.11.10 was working for
>> them. However, we'll probably be upgrading to 24.5 fairly soon, and so
>> we shall see whether the issue persists.
>>
>> Cheers,
>>
>> Loris
>>
>>> JLS
>>>
>>> On Thu, Sep 5, 2024 at 9:53 AM Carsten Beyer via slurm-users wrote:
>>>
>>>   Hi Loris,
>>>
>>>   we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our config
>>>   contains a second parameter, InteractiveStepOptions, in slurm.conf:
>>>
>>>     InteractiveStepOptions="--interactive --preserve-env --pty $SHELL -l"
>>>     LaunchParameters=enable_nss_slurm,use_interactive_step
>>>
>>>   That works fine for us:
>>>
>>>     [k202068@levantetest ~]$ salloc -N1 -A k20200 -p compute
>>>     salloc: Pending job allocation 857
>>>     salloc: job 857 queued and waiting for resources
>>>     salloc: job 857 has been allocated resources
>>>     salloc: Granted job allocation 857
>>>     salloc: Waiting for resource configuration
>>>     salloc: Nodes lt1 are ready for job
>>>     [k202068@lt1 ~]$
>>>
>>>   Best Regards,
>>>   Carsten
>>>
>>>   On 05.09.24 at 14:17, Loris Bennett via slurm-users wrote:
>>>   > Hi,
>>>   >
>>>   > With
>>>   >
>>>   >   $ salloc --version
>>>   >   slurm 23.11.10
>>>   >
>>>   > and
>>>   >
>>>   >   $ grep LaunchParameters /etc/slurm/slurm.conf
>>>   >   LaunchParameters=use_interactive_step
>>>   >
>>>   > the following
>>>   >
>>>   >   $ salloc --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 --qos=standard
>>>   >   salloc: Granted job allocation 18928869
>>>   >   salloc: Nodes c001 are ready for job
>>>   >
>>>   > creates a job
>>>   >
>>>   >   $ squeue --me
>>>   >        JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
>>>   >     18928779 interacti interact loris  R  1:05     1 c001
>>>   >
>>>   > but causes the terminal to block.
>>>   >
>>>   > From a second terminal I can log into the compute node:
>>>   >
>>>   >   $ ssh c001
>>>   >   [13:39:36] loris@c001 (1000) ~
>>>   >
>>>   > Is that the expected behaviour, or should salloc return a shell directly
>>>   > on the compute node (like 'srun --pty /bin/bash -l' used to do)?
>>>   >
>>>   > Cheers,
>>>   >
>>>   > Loris
>>>
>>>   --
>>>   Carsten Beyer
>>>   Systems Department (Abteilung Systeme)
>>>
>>>   Deutsches Klimarechenzentrum GmbH (DKRZ)
>>>   Bundesstraße 45a * D-20146 Hamburg * Germany
>>>
>>>   Phone: +49 40 460094-221
>>>   Fax:   +49 40 460094-270
>>>   Email: be...@dkrz.de
>>>   URL:   http://www.dkrz.de
>>>
>>>   Managing Director (Geschäftsführer): Prof. Dr. Thomas Ludwig
>>>   Registered office (Sitz der Gesellschaft): Hamburg
>>>   Hamburg District Court (Amtsgericht Hamburg) HRB 39784
>>>
>>> --
>>> Jason L. Simms, Ph.D., M.P.H.
>>> Manager of Research Computing
>>> Swarthmore College
>>> Information Technology Services
>>> (610) 328-8102
>>> Schedule a meeting: https://calendly.com/jlsimms

--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
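For anyone hitting the same symptom, a minimal sketch of the kind of check
that catches this on the login node (assumes a systemd-based system; the
port range shown is only an example and should match whatever SrunPortRange,
if any, is set in your own slurm.conf):

  # Is firewalld running on the login node?
  $ systemctl is-active firewalld

  # Either stop and disable it ...
  $ sudo systemctl disable --now firewalld

  # ... or leave it running and open the ports the interactive step uses,
  # e.g. if slurm.conf sets SrunPortRange=60001-63000:
  $ sudo firewall-cmd --permanent --add-port=60001-63000/tcp
  $ sudo firewall-cmd --reload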
[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step
Folks have addressed the obvious config settings, but also check your
prolog/epilog scripts/settings, as well as .bashrc/.bash_profile and
anything in /etc/profile.d/. Any of those may be hanging it up.

Brian Andrus

On 9/5/2024 5:17 AM, Loris Bennett via slurm-users wrote:
> Hi,
>
> With
>
>   $ salloc --version
>   slurm 23.11.10
>
> and
>
>   $ grep LaunchParameters /etc/slurm/slurm.conf
>   LaunchParameters=use_interactive_step
>
> the following
>
>   $ salloc --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 --qos=standard
>   salloc: Granted job allocation 18928869
>   salloc: Nodes c001 are ready for job
>
> creates a job
>
>   $ squeue --me
>        JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
>     18928779 interacti interact loris  R  1:05     1 c001
>
> but causes the terminal to block.
>
> From a second terminal I can log into the compute node:
>
>   $ ssh c001
>   [13:39:36] loris@c001 (1000) ~
>
> Is that the expected behaviour, or should salloc return a shell directly
> on the compute node (like 'srun --pty /bin/bash -l' used to do)?
>
> Cheers,
>
> Loris
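A minimal sketch of that kind of check (the slurm.conf location and the
start-up files listed are the usual defaults and may differ on your site):

  # Which prolog/epilog scripts are configured?
  $ scontrol show config | grep -iE 'prolog|epilog'

  # Anything in the shell start-up files that prompts, execs or blocks?
  $ grep -nE 'read |exec |tmux|screen' ~/.bashrc ~/.bash_profile /etc/profile.d/*.sh

  # Re-test with the start-up files taken out of the picture; if this gives
  # a prompt on the compute node while plain salloc hangs, the start-up
  # files are the likely culprit:
  $ srun --partition=interactive --ntasks=1 --time=00:03:00 --pty /bin/bash --noprofile --norc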
[slurm-users] Re: Configuration for nodes with different TmpFs locations and TmpDisk sizes
Hi,

This may help.

job_container.conf:

  # All nodes have /localscratch, but on some_nodes2 it is mounted as NVMe.
  AutoBasePath=true
  BasePath=/localscratch
  Shared=true

  # Some nodes have /localscratch1 configured, as /localscratch is already
  # taken by a valid local device setup.
  NodeName=some_nodes[9995-] AutoBasePath=true BasePath=/localscratch1 Shared=true

  # some_nodes2, where we want to use the local NVMe mounted at /localscratch.
  # If this is NVIDIA kit, we may not want /dev/shm, so explicitly /tmp only.
  NodeName=some_nodes2[7770-] Dirs="/tmp" AutoBasePath=true BasePath=/localscratch Shared=true

David

--
David Simpson - Senior Systems Engineer
ARCCA, Redwood Building, King Edward VII Avenue, Cardiff, CF10 3NB

David Simpson - peiriannydd uwch systemau
ARCCA, Adeilad Redwood, King Edward VII Avenue, Caerdydd, CF10 3NB

From: Jake Longo via slurm-users
Date: Wednesday, 4 September 2024 at 16:19
To: slurm-us...@schedmd.com
Subject: [slurm-users] Configuration for nodes with different TmpFs locations and TmpDisk sizes

Hi,

We have a number of machines in our compute cluster that have larger disks
available for local data. I would like to add them to the same partition as
the rest of the nodes, but assign them a larger TmpDisk value, which would
allow users to request a larger tmp and land on those machines.

The main hurdle is that (for reasons beyond my control) the larger local
disks are on a special mount point, /largertmp, whereas the rest of the
compute cluster uses the vanilla /tmp. I can't see an obvious way to make
this work, as the TmpFs value appears to be global only, and attempting to
set TmpDisk to a value larger than TmpFs for those nodes will put the
machine into an invalid state.

I couldn't see any similar support tickets or anything in the mail archive,
but I wouldn't have thought it would be that unusual to do this.

Thanks in advance!

Jake
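For the original /largertmp case, a minimal sketch of how the two files
might fit together when using the job_container/tmpfs plugin (node names,
sizes and other values are made up for illustration; TmpDisk and --tmp are
in MB by default, and whether slurmd accepts a TmpDisk larger than the free
space it sees at the global TmpFs should be verified on your Slurm version):

  # slurm.conf: TmpFs stays global, but TmpDisk can be set per node
  TmpFs=/tmp
  NodeName=node[001-100]   CPUs=64 RealMemory=256000 TmpDisk=400000
  NodeName=bigtmp[001-004] CPUs=64 RealMemory=256000 TmpDisk=1800000

  # job_container.conf: on the big-disk nodes, bind the job's /tmp onto the
  # larger mount point (other nodes would need whatever BasePath arrangement
  # suits their local disk, as in the example above)
  NodeName=bigtmp[001-004] Dirs="/tmp" AutoBasePath=true BasePath=/largertmp Shared=true

  # Users then request the larger temporary space (MB), so the scheduler
  # only places such jobs on nodes advertising enough TmpDisk:
  $ sbatch --tmp=800000 job.sh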