The situation you are describing is an excellent use for array jobs. You
can create a list of the non-numerical values you want to use and then
select the one each task needs using the built-in job array environment
variables (SLURM_ARRAY_TASK_ID holds the index of the current task). The
list can be a bash array in your submission script or a separate input
file, whichever is easier to produce.

Example: Assume I have 4 different input values I want to run the same
script with. I would create a submission script like this:

#!/bin/bash
#SBATCH --array=0-3
inputs=('in1' 'in2' 'in3' 'in4')
# Pick this task's input by its array index (0..3)
python script.py "${inputs[$SLURM_ARRAY_TASK_ID]}"

This submits one array job with 4 tasks, each running your Python
script with a different input.
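Before submitting, it can be worth checking which input each task index maps to. Here is a minimal dry-run sketch, assuming bash and the four placeholder inputs above; the loop stands in for Slurm by supplying the indices that --array=0-3 would, and nothing is submitted:

```shell
#!/bin/bash
# Dry run of the bash-array lookup: no job is submitted. The loop plays
# the role of Slurm by supplying the indices --array=0-3 would give.
inputs=('in1' 'in2' 'in3' 'in4')

# Same lookup the batch script does: index into the bash array
input_for_task() {
    echo "${inputs[$1]}"
}

for id in 0 1 2 3; do
    echo "task ${id} -> python script.py $(input_for_task "$id")"
done
```

Note that #SBATCH lines are not expanded by the shell, so if the list length changes you have to update --array there or pass it on the command line instead, e.g. sbatch --array=0-3 job.sh (job.sh is a placeholder name for your submission script; command-line options override #SBATCH directives).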

If you have 500 individual jobs to run, putting all your inputs in the
submission script is probably a lot of unnecessary work and would make
changing them difficult. In that case you can make a file with each
input on its own line and use a submission script similar to:

#!/bin/bash
#SBATCH --array=1-4
file=inputs.txt  # placeholder name: one input per line
input=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$file")  # line N of $file
python script.py "$input"

Be aware that bash arrays are indexed from 0, so with a bash array your
job array would start at 0, while sed numbers lines from 1, so with an
input file you want to start at 1. You can also set the array indices to
any range you want; if you only want to run lines 100-200, you can set
--array=100-200.
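The file-based variant can be checked the same way. This is a minimal dry-run sketch, assuming bash; a temporary file stands in for your real input list, job.sh is a placeholder name, and the sbatch command is only printed, not run:

```shell
#!/bin/bash
# Dry run of the file-based lookup; nothing is submitted.
# A throwaway file stands in for your real input list (one value per line).
file=$(mktemp)
printf '%s\n' foo bar baz > "$file"

# sed numbers lines from 1, so N lines map to --array=1-N
n=$(( $(wc -l < "$file") ))
echo "sbatch --array=1-${n} job.sh"

# Same lookup the batch script does: print only line number $1 of file $2
line_for_task() {
    sed -n "${1}p" "$2"
}

for id in $(seq 1 "$n"); do
    echo "task ${id} -> python script.py $(line_for_task "$id" "$file")"
done
```

wc -l counts newline characters, so make sure the input file ends with a newline, otherwise its last line is not counted.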


On 7/15/20 1:45 PM, Renfro, Michael wrote:
> If the 500 parameters happened to be filenames, you could adapt something
> like this (appropriated from somewhere else, but I can't find the reference quickly):
> =====
> #!/bin/bash
> # get count of files in this directory
> NUMFILES=$(ls -1 *.inp 2>/dev/null | wc -l)
> # subtract 1 as we have to use zero-based indexing (first element is 0)
> ZBNUMFILES=$((NUMFILES - 1))
> # submit array of jobs to SLURM
> if [ $ZBNUMFILES -ge 0 ]; then
>   sbatch --array=0-$ZBNUMFILES array_job.sh
> else
>   echo "No jobs to submit, since no input files in this directory."
> fi
> =====
> with:
> =====
> #!/bin/bash
> #SBATCH --nodes=1  --ntasks-per-node=1 --cpus-per-task=1
> #SBATCH --time=00:01:00
> #SBATCH --job-name array_demo_2
> echo "All jobs in this array have:"
> echo "This job in the array has:"
> # grab our filename from a directory listing
> FILES=($(ls -1 *.inp))
> # pick this task's file by its zero-based array index
> FILENAME=${FILES[$SLURM_ARRAY_TASK_ID]}
> echo "My input file is ${FILENAME}"
> # make new directory, change into it, and run
> mkdir "${FILENAME}_out"
> cd "${FILENAME}_out" || exit 1
> echo "First 10 lines of ../${FILENAME} are:" > "${FILENAME}_results.out"
> head "../${FILENAME}" >> "${FILENAME}_results.out"
> =====
> If the 500 parameters were lines in a file, the same logic would apply:
> - subtract 1 from the number of lines in the file to determine the array limit
> - add 1 to ${SLURM_ARRAY_TASK_ID} to get the line number of a specific
>   parameter
> - something like sed -n "${TASK_ID_PLUS_ONE}p" filename to retrieve that
>   parameter (double quotes, so the shell expands the variable)
> - run the Python script with that value
>> I'm trying to run an embarrassingly parallel experiment, with 500+ tasks 
>> that all differ in one parameter.  e.g.:
>> job 1 - script.py foo
>> job 2 - script.py bar
>> job 3 - script.py baz
>> and so on.
>> This seems like a case where having a Slurm array hold all of these jobs
>> would help, so I could just submit one job to my cluster instead of 500
>> individual jobs.  It seems like Slurm arrays are only set up for varying an
>> integer index parameter.  How would I do this for non-numeric values (say,
>> if the parameter I'm varying is a string in a given list)?
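The line-based approach from the quoted bullet list (zero-based task IDs, plus 1 to get a sed line number) can be sketched as follows. This assumes bash; array_job.sh and script.py are placeholder names, and the demo uses a temporary file and only prints the commands rather than calling sbatch:

```shell
#!/bin/bash
# Sketch of the zero-based, line-per-parameter variant. Nothing is submitted.

# Submission side: a file with N lines gives --array=0-(N-1)
array_limit() {
    local nlines
    nlines=$(( $(wc -l < "$1") ))
    echo $(( nlines - 1 ))
}

# Job side: zero-based task ID k -> line k+1 of the file
param_for_task() {
    local task_id=$1 file=$2
    local task_id_plus_one=$(( task_id + 1 ))
    sed -n "${task_id_plus_one}p" "$file"
}

# Local demonstration with a throwaway parameter file
params=$(mktemp)
printf '%s\n' foo bar baz > "$params"
limit=$(array_limit "$params")
echo "sbatch --array=0-${limit} array_job.sh"
for id in $(seq 0 "$limit"); do
    echo "task ${id}: python script.py $(param_for_task "$id" "$params")"
done
```

Inside the real array_job.sh you would drop the demo loop and run something like python script.py "$(param_for_task "$SLURM_ARRAY_TASK_ID" params.txt)", where params.txt is a placeholder for your parameter file.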

