Hello all,

I am submitting a job array, made up of many small jobs, to a SLURM scheduler.
For example, here's a script that simply prints the date and hostname of the compute node from within a heredoc:

-------------------
#!/bin/bash
...(variables)...
sbatch --parsable \
  --partition=${jobPartition} \
  --array=1-${jobArrayCount} \
  --job-name=${jobName}.%a \
  --output=${jobName}.stdout.%a.%j \
  --error=${jobName}.stderr.%a.%j \
  --mem-per-cpu=${jobMem} \
  --export=ALL <<"EOF"
#!/bin/bash
stamp=$(date && hostname)
echo -e "Child array job [${SLURM_ARRAY_TASK_ID}]:\n${stamp}"
EOF
exit 0
-------------------

The filenames of the output and error logs from this job contain the correct array task ID (1 through ${jobArrayCount}, via the %a pattern) and job ID (via the %j pattern, which in an array is each task's own job ID; %A would give the parent array job ID). However, in the job name (${jobName}.%a) only the ${jobName} variable is expanded; %a is kept as a literal string instead of being replaced with the array task ID. For example, if "jobName=foo", then --job-name=${jobName}.%a makes the scheduler use the job name "foo.%a" instead of "foo.1", "foo.2", and so on, up to the number of child jobs in the array.

Since the output and error log names can use the %a array task ID pattern, is there a way to get the job name assignment to use it as well?

Another thing I tried was to move the job name assignment into the heredoc block:

-------------------
#!/bin/bash
...(variables)...
sbatch --parsable \
  --partition=${jobPartition} \
  --array=1-${jobArrayCount} \
  --output=${jobName}.stdout.%a.%j \
  --error=${jobName}.stderr.%a.%j \
  --mem-per-cpu=${jobMem} \
  --export=ALL <<"EOF"
#!/bin/bash
#SBATCH --job-name="${jobName}.${SLURM_ARRAY_TASK_ID}"
stamp=$(date && hostname)
echo -e "Child array job [${SLURM_ARRAY_TASK_ID}]:\n${stamp}"
EOF
exit 0
-------------------

In this case, the job name is set to the literal string "${jobName}.${SLURM_ARRAY_TASK_ID}".
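A possible workaround, sketched here rather than tested on a cluster: #SBATCH lines are read by sbatch without any shell expansion, which is why the literal string appears, but ordinary commands in the job body run with SLURM_ARRAY_TASK_ID and SLURM_JOB_ID already set. Each task could therefore rename itself with `scontrol update` as its first step, using its own job ID instead of parsing scontrol output from outside the job. This assumes that jobName is exported in the submitting shell (with the quoted <<"EOF" delimiter, ${jobName} is only expanded when the job runs, and --export=ALL propagates sbatch's environment, which only holds exported variables) and that the site lets users update the names of their own jobs. The helper name task_job_name is hypothetical.

```shell
#!/bin/bash
# Sketch: job body in which each array task renames itself at start-up.
# ASSUMPTIONS: jobName was exported in the submitting shell (so
# --export=ALL carries it into the job), and this site allows users
# to run `scontrol update` on their own jobs.

# Hypothetical helper: print "<base>.<task>" (e.g. "foo.3" for task 3).
# Returns non-zero if the base name or the array task ID is empty.
task_job_name() {
  local base="$1"
  local task="${SLURM_ARRAY_TASK_ID:-}"
  if [ -z "${base}" ] || [ -z "${task}" ]; then
    return 1
  fi
  printf '%s.%s\n' "${base}" "${task}"
}

# SLURM_JOB_ID is this task's own job ID, so no scontrol output has to be
# parsed. Outside an array job, or where scontrol is missing, nothing runs.
if new_name=$(task_job_name "${jobName:-}") \
    && [ -n "${SLURM_JOB_ID:-}" ] \
    && command -v scontrol >/dev/null 2>&1; then
  scontrol update JobId="${SLURM_JOB_ID}" JobName="${new_name}"
fi

# ...rest of the job body (e.g. the date/hostname lines) goes here...
```

To use it with the original script, `export jobName` before calling sbatch and place these lines at the top of the heredoc body, after its shebang. A task is still only renamed once it starts running, so pending tasks keep the base name, and it is not verified here whether `sacct` reports the new name after a task finishes.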
A third thing that I tried was to change the job name via `scontrol` after the fact, which works, but only once the job is in the scheduler and only while it is running:

-------------------
$ scontrol update JobId=${arrayJobId} JobName=${jobName}.${jobArrayTaskId}
-------------------

The `sacct` program does not seem to have format fields that expose the array job and task IDs, e.g.:

-------------------
$ sacct -j ${arrayJobId} --format=ArrayJobId,ArrayTaskId --noheader --parsable2
sacct: error: Invalid field requested: "ArrayJobId"
-------------------

(The available fields are listed here: https://slurm.schedmd.com/sacct.html)

However, it looks like I can use `scontrol` to get the array job and task IDs, though it is a bit of a hack:

-------------------
$ scontrol show job ${arrayJobId} | grep ArrayTaskId | awk '{i=split($0,a," "); j=split(a[3],b,"="); k=split(a[4],c,"="); print c[2]"."b[2]; }'
testArrayChild.1
-------------------

There are a few problems with this approach:

1. I can't rename the array of jobs until they are in the scheduler.
2. My method for getting the array task ID is a hack that seems fragile.
3. I can't rename a job after it has finished.

These issues make this approach difficult to implement reliably.

My question, ultimately, is: is there an easier way to have an array job's name include the array task ID?

Regards,
Alex
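If the IDs still have to be looked up from outside the job, the positional `awk` one-liner above could be made somewhat less fragile by asking `scontrol` for one record per line (its `-o`/`--oneliner` option) and selecting fields by key name rather than by column position. This is a sketch under the assumption that no field value contains spaces (a JobName with spaces would be split); the helper name name_with_task is hypothetical.

```shell
#!/bin/bash
# Sketch: pull named fields out of `scontrol -o show job` output by key
# rather than by column position. Reads one record per line on stdin and
# prints "<JobName>.<ArrayTaskId>" for every record that has both keys.
# ASSUMPTION: no field value contains spaces (awk splits on whitespace).
name_with_task() {
  awk '
    {
      name = ""; task = ""
      for (i = 1; i <= NF; i++) {
        # Split only on the first "=" so values containing "=" survive.
        eq = index($i, "=")
        if (eq == 0) continue
        key = substr($i, 1, eq - 1)
        val = substr($i, eq + 1)
        if (key == "JobName") name = val
        else if (key == "ArrayTaskId") task = val
      }
      if (name != "" && task != "") print name "." task
    }'
}

# Example usage on a cluster:
#   scontrol -o show job "${arrayJobId}" | name_with_task
```

For tasks that have not started yet, the ArrayTaskId value may be a range (such as `2-10`) rather than a single index. `squeue` also has format specifiers for these values (`%F` for the array job ID, `%K` for the array task ID), but, like `scontrol`, it only sees jobs the controller still knows about.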