On 05/03/18 12:12, Dan Jordan wrote:
> What is the /correct/ way to clean up processes across the nodes given to my program by SLURM_JOB_NODELIST?
I'd strongly suggest using cgroups in your Slurm config to ensure that processes are corralled and tracked correctly. You can use pam_slurm_adopt from the contribs directory to capture inbound SSH sessions into a running job on the node (and deny access to people who don't have a job running there). Then Slurm should clean up everything for you when the job ends, without needing an epilog.

Hope this helps!
Chris
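
For reference, a minimal sketch of the setup described above. The file locations and the two cgroup.conf constraints are assumptions for illustration and will differ between sites and distributions; only the use of cgroups and pam_slurm_adopt comes from the advice itself. Check the documentation for your Slurm version before copying any of it.

```
# slurm.conf (path assumed: /etc/slurm/slurm.conf)
ProctrackType=proctrack/cgroup   # track job processes via cgroups
TaskPlugin=task/cgroup           # confine tasks to their cgroup
PrologFlags=contain              # needed so pam_slurm_adopt has a job container to adopt into

# cgroup.conf (same directory as slurm.conf; constraints shown are examples)
ConstrainCores=yes
ConstrainRAMSpace=yes

# /etc/pam.d/sshd on the compute nodes (position in the PAM stack is site-specific)
account    required    pam_slurm_adopt.so
```

With something like this in place, an SSH session started on a node from inside a job should be placed in that job's cgroup, so it is killed along with the rest of the job when the job ends.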