My users have discovered the beauty of job arrays, and they use them every now and then.

Sometimes the human factor steps in: something is wrong in the job array specification, and the cluster "works" through one failed array job after another.

Isn't there any way to automatically stop (scancel, or something similar) a job array after, let's say, 20 failed array jobs in a row?

So far my experience is that if the first ~20 array jobs go right, there is no catastrophic failure in the sbatch file. If they fail, it is usually bad, and there is no sense in crunching the remaining thousands of array jobs.
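A minimal sketch of the kind of external watchdog this would take, assuming Python 3.7+ on a node where `sacct` and `scancel` are available. "In a row" is taken to mean consecutive task indices, and the set of states counted as failures is also an assumption. The function names `parse_sacct`, `longest_failed_run` and `cancel_if_broken` are hypothetical helpers, not Slurm features.

```python
import subprocess

# Task states treated as failures (an assumption; adjust to taste).
FAILED_STATES = {"FAILED", "NODE_FAIL", "OUT_OF_MEMORY", "TIMEOUT", "BOOT_FAIL"}


def parse_sacct(text):
    """Turn `sacct -P` lines of the form 'JOBID_TASK|STATE' into {task_index: state}."""
    tasks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, state = line.split("|", 1)
        _, _, task = job_id.partition("_")
        if not task.isdigit():  # skip pending ranges such as 123_[8-1000]
            continue
        tasks[int(task)] = state.split()[0]  # "CANCELLED by 0" -> "CANCELLED"
    return tasks


def longest_failed_run(tasks):
    """Length of the longest run of failed tasks with consecutive indices."""
    best = run = 0
    prev = None
    for idx in sorted(tasks):
        if tasks[idx] in FAILED_STATES:
            # Extend the run only if this task directly follows the previous one.
            run = run + 1 if prev == idx - 1 else 1
        else:
            run = 0
        prev = idx
        best = max(best, run)
    return best


def cancel_if_broken(job_id, limit=20):
    """Cancel the whole array if `limit` consecutive tasks have failed (hypothetical helper)."""
    out = subprocess.run(
        ["sacct", "-j", str(job_id), "-X", "-n", "-P", "-o", "JobID,State"],
        capture_output=True, text=True, check=True,
    ).stdout
    if longest_failed_run(parse_sacct(out)) >= limit:
        subprocess.run(["scancel", str(job_id)], check=True)
        return True
    return False
```

Something like `cancel_if_broken(12345)` could then be run periodically (from cron, or a loop with `time.sleep`). It only sees tasks that `sacct` already reports individually, so pending ranges are ignored until they start.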

OT: what is the correct terminology for one item in a job array... sub-job? job-array-job? :)

cheers

josef
