Hi! I'd like to set different resource limits for different steps of my job. A sample script might look like this (e.g. job.sh):

    #!/bin/bash
    srun --cpus-per-task=1 --mem=1 echo "Starting..."
    srun --cpus-per-task=4 --mem=250 --exclusive <do something complicated>
    srun --cpus-per-task=1 --mem=1 echo "Finished."

Then I would run the script from the command line using the following command:

    sbatch --ntasks=1 job.sh

I have observed that none of the steps appear to have limited memory (which I'm pretty sure has to do with my proctrack plugin type). More importantly, although the second step runs and scontrol show step <id>.1 shows the step as having been allocated 4 CPUs, in reality the step is only able to use 1.

I have also observed the opposite. Running the following command, I can see that the job step is able to use all CPUs allocated to the job, rather than the one it was allocated itself:

    sbatch --ntasks=1 --cpus-per-task=2 << EOF
    #!/bin/bash
    srun --cpus-per-task=1 <do something complicated>
    EOF

My goal here is to be able to run a single job with 3 steps where the first and last steps are always executed, even if the second cannot run because it requested too many resources.
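
For anyone who wants to reproduce the CPU observations, here is a minimal sketch of one way to check from inside a step which CPUs it is actually allowed to run on. It reads the affinity mask the Linux kernel reports in /proc/self/status, which is where a binding applied by the task plugin should show up. The script name check_cpus.sh and the helper count_cpus_in_list are just names made up for this illustration, not part of Slurm.

```shell
#!/bin/bash
# check_cpus.sh (hypothetical name): report the CPUs this step may run on.

# Count the CPUs named in a Linux CPU list such as "0-3,8".
count_cpus_in_list() {
    local list="$1" total=0 part start end
    local IFS=','                      # split the list on commas
    for part in $list; do
        if [[ "$part" == *-* ]]; then
            start="${part%-*}"         # text before the dash
            end="${part#*-}"           # text after the dash
            total=$(( total + end - start + 1 ))
        elif [[ -n "$part" ]]; then
            total=$(( total + 1 ))     # a single CPU id
        fi
    done
    echo "$total"
}

# Only Linux exposes the affinity mask here; skip quietly elsewhere.
if [[ -r /proc/self/status ]]; then
    allowed="$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)"
    echo "step ${SLURM_STEP_ID:-?}: allowed CPUs = ${allowed}" \
         "($(count_cpus_in_list "$allowed") total)"
fi
```

Running it as a step, for example srun --cpus-per-task=4 ./check_cpus.sh, and comparing the count against what scontrol show step reports should show whether the step is bound to fewer (or more) CPUs than it was allocated.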
Here is my slurm.conf, with commented-out lines removed (this is just a small test cluster with a single node on the same machine as the controller):

    SlurmctldHost=ubuntu
    CredType=cred/munge
    AuthType=auth/munge
    EnforcePartLimits=ALL
    MpiDefault=none
    ProctrackType=proctrack/linuxproc
    ReturnToService=2
    SlurmctldPidFile=/var/spool/slurm/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/spool/slurm/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurm/slurmd
    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurm
    SwitchType=switch/none
    TaskPlugin=task/affinity
    TaskPluginParam=Sched
    InactiveLimit=0
    KillWait=30
    MinJobAge=3600
    SlurmctldTimeout=120
    SlurmdTimeout=300
    Waittime=0
    FastSchedule=1
    SchedulerType=sched/backfill
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStoreJobComment=YES
    ClusterName=cluster
    JobCompHost=localhost
    JobCompLoc=slurm_db
    JobCompPort=3306
    JobCompType=jobcomp/mysql
    JobCompUser=slurm
    JobAcctGatherFrequency=30
    JobAcctGatherType=jobacct_gather/linux
    SlurmctldDebug=info
    SlurmctldLogFile=/var/spool/slurm/slurmctld.log
    SlurmdDebug=info
    SlurmdLogFile=/var/spool/slurm/slurmd/slurmd.log
    NodeName=ubuntu CPUs=4 RealMemory=500 State=UNKNOWN
    PartitionName=main Nodes=ubuntu Default=YES MaxTime=INFINITE State=UP AllowGroups=maria

Any advice would be greatly appreciated! Thanks in advance!

--
Thanks,
Maria