On Wed, 5 Oct 2016, William Hay wrote:
...
Our prolog and epilog (parallel) ssh into the slave nodes and do the equivalent of run-parts on directories full of scripts, some of which check whether they are running on the head node of the job before doing anything. If we did want the epilog to save TMPDIRs from slave nodes, we'd just have to decide how to name them, I guess.
...

Presumably this would work for you capture-wise because you're creating your own TMPDIRs rather than using the ones provided by the execd. (As Reuti pointed out, the execd TMPDIRs on slave nodes are ephemeral.)
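
On the naming question, a minimal sketch of one possible scheme: combine the job ID, the array task ID (if any) and the short hostname, so that each node's saved TMPDIR gets its own name. The function name saved_tmpdir_name is hypothetical, and the sketch assumes the epilog sees the usual JOB_ID and SGE_TASK_ID environment variables.

```python
import os
import socket


def saved_tmpdir_name(job_id, task_id, host):
    """Build a directory name that differs per job, array task and host.

    Hypothetical helper for illustration; not part of Grid Engine.
    """
    # SGE_TASK_ID is "undefined" for non-array jobs; treat that as no task.
    task = "" if task_id in (None, "", "undefined") else f".{task_id}"
    # Keep only the short hostname so the names stay readable.
    short_host = host.split(".")[0]
    return f"{job_id}{task}.{short_host}"


if __name__ == "__main__":
    # Assumption: the epilog runs with the job's environment variables set.
    print(saved_tmpdir_name(
        os.environ.get("JOB_ID", "0"),
        os.environ.get("SGE_TASK_ID"),
        socket.gethostname(),
    ))
```

Two nodes whose short hostnames coincide across domains would still collide, so a site with such names would want to keep the full hostname instead.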

It'd be a pity to switch to doing it that way: the execd TMPDIR can be paired with an xfs project quota scheme, which is nice and tidy. I imagine that deleting TMPDIRs via an epilog has a greater number of failure modes, such as intermittent network problems, and not all of them can be avoided by purging old directories at boot. How has that worked for you in practice?
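
For reference, the sort of boot-time purge I have in mind is roughly the sketch below. It is only an illustration: purge_stale_dirs, the scratch root and the age threshold are all hypothetical, and it does nothing for directories left behind between reboots, which is the failure mode I'm worried about.

```python
import shutil
import time
from pathlib import Path


def purge_stale_dirs(root, max_age_seconds, now=None):
    """Remove subdirectories of root not modified within max_age_seconds.

    Returns the sorted names of the directories selected for removal.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in Path(root).iterdir():
        # Skip symlinks and plain files; only real directories are purged.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if now - entry.stat().st_mtime > max_age_seconds:
            # ignore_errors: a directory that fails to delete is left for the next run.
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return sorted(removed)
```

It would be called with the site's scratch root and a threshold longer than the longest permitted job, so that nothing belonging to a running job is touched if the same script is also run outside of boot.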

Also, avoiding passwordless ssh between compute nodes has been useful. It's not only a matter of the principle of least privilege: it also helps identify applications that aren't tightly integrated.

Maybe our users can live with just the master node's TMPDIR.

Cheers,

Mark
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
