We're running S[o]GE 8.1.6 and I'm looking for suggestions on managing
per-node local scratch space as a consumable resource.
Currently that temporary space is on a local disk on each node, with
directories named:
/scratch/${USER}
but this would be changed to something like:
/scratch/${USER}/${JOBID}${JOB_ARRAY_INDEX}
My aim is to have SGE manage scratch space as a resource, similar to an
h_vmem resource request (a rough configuration sketch follows the list):
1. ensure the node has enough scratch space before running the job:
   the available space reported by df -h /scratch must be greater than
   $USER_SCRATCH_REQUEST
2. internally decrement the 'available' scratch space by the requested
   amount, even if no bytes have been written to disk yet
3. if the job exceeds the requested scratch space, kill the job
4. clean up the per-job scratch space when the job finishes:
   rm -rf /scratch/${USER}/${JOBID}
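For concreteness, here's the rough configuration sketch I mentioned
(untested; the complex name "scratch" and the 500G capacity are just
placeholders):

    # qconf -mc -- add a requestable, consumable complex:
    #name     shortcut  type    relop  requestable  consumable  default  urgency
    scratch   scr       MEMORY  <=     YES          YES         0        0

    # qconf -me <hostname> -- advertise each node's local scratch capacity:
    complex_values        scratch=500G

    # job submission -- request 20G of local scratch:
    qsub -l scratch=20G myjob.sh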
I understand that #1 will require a custom load sensor (essentially df -h /scratch).
Feature #2 will require defining scratch space as a consumable complex,
with the amount of scratch space defined per-node -- that's not
hard. I'm a bit concerned about the overhead of SGE running
du -sh /scratch/${USER}/${JOBID}${JOB_ARRAY_INDEX}
for each job on a node (up to 40 of them), crawling deep directory trees
on a single hard drive every $schedule_interval.
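For reference, the kind of load sensor I'm imagining for the whole-disk
number is roughly this (untested sketch; "scratch" is the placeholder
complex name from above, and note that it only sees total free space on
/scratch, not per-job usage):

    #!/bin/sh
    # Sketch of a load sensor: report free space on /scratch for this host.
    # sge_execd writes a line to the sensor's stdin each load report
    # interval; the sensor replies with a begin/end-delimited block of
    # host:complex:value lines, and exits when it reads "quit".
    myhost=`hostname`
    while :; do
        read input || exit 1
        if [ "$input" = "quit" ]; then
            exit 0
        fi
        # free space on /scratch in kilobytes (POSIX df, skip the header line)
        free_kb=`df -Pk /scratch | awk 'NR==2 {print $4}'`
        echo "begin"
        echo "$myhost:scratch:${free_kb}K"
        echo "end"
    done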
I believe that the presence of the complex will automatically cause
SGE to kill the job (#3) if the requested limit is exceeded, but I'm
not sure how the load sensor would report per-job scratch directory
consumption, as opposed to the space used (or available) on the entire
scratch disk.
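To make that concern concrete, the per-job accounting I can imagine
bolting into the load sensor loop is something like the following, which
is exactly the repeated deep crawl I'd like to avoid, and I still don't
see which complex it would report against:

    # hypothetical per-job scan inside the load sensor loop (the part I'd
    # rather not run every $schedule_interval):
    for dir in /scratch/*/*; do
        [ -d "$dir" ] || continue
        used_kb=`du -sk "$dir" | awk '{print $1}'`
        # $dir is /scratch/<user>/<jobid><taskid>; how to report this, and
        # against which complex, is the open question.
        echo "per-job usage: $dir ${used_kb}K"
    done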
I'd also appreciate suggestions on how to ensure that sge_execd cleans
up the per-job scratch directory at the conclusion of a job. It would
be great if there were a flag to suppress this behavior, in case the
scratch files need to be examined for debugging.
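For #4, the sort of epilog I'm picturing is roughly this (untested;
KEEP_SCRATCH is just a convention I'm inventing here, settable via
qsub -v KEEP_SCRATCH=1, and I'm assuming the usual $JOB_ID / $SGE_TASK_ID
variables are present in the epilog environment):

    #!/bin/sh
    # Sketch of a queue epilog: remove the per-job scratch directory unless
    # the user asked to keep it for debugging.
    scratch_dir="/scratch/${USER}/${JOB_ID}"
    if [ -n "$SGE_TASK_ID" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
        scratch_dir="${scratch_dir}${SGE_TASK_ID}"
    fi
    if [ -z "$KEEP_SCRATCH" ] && [ -d "$scratch_dir" ]; then
        rm -rf "$scratch_dir"
    fi

That could be attached per queue via the epilog attribute (qconf -mq),
but I don't know whether there's a cleaner built-in way, hence the
question.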
Thanks,
Mark