Davide DelVento <davide.quan...@gmail.com> writes:

>> I'm curious: What kind of disruption did it cause for your production
>> jobs?
>
> All jobs failed and went in pending/held with "launch failed requeued
> held" status, all nodes where the jobs were scheduled went draining.
>
> The logs only said "error: validate_node_specs: Prolog or job env
> setup failure on node xxxx, draining the node". I guess if they said
> "-bash: /path/to/prolog: Permission denied" I would have caught the
> problem myself.

But that is not a problem caused by having things like

    exec &> /root/prolog_slurmd.$$

in the script, as you indicated. It is a problem caused by the prolog
script file not being executable.

> In hindsight it is obvious, but I don't think even the documentation
> mentions that, does it? After all you can execute a non-executable
> file with "sh filename", so I made the incorrect assumption that
> slurm would have invoked the prolog that way.

Slurm prologs can be written in any language - we used to have Perl
prolog scripts. :)

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
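
To illustrate the point about the execute bit, here is a minimal sketch of a check one could run before pointing Slurm at a prolog. The function name check_prolog and the throwaway file are made up for this example and are not part of Slurm. It only shows that "sh filename" succeeds on a file without the execute bit, while a direct-execution test on the same file fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): report whether a prolog
# file exists and has the execute bit needed to run it directly.
check_prolog() {
    if [ ! -f "$1" ]; then
        echo "missing: $1"
        return 2
    fi
    if [ ! -x "$1" ]; then
        echo "not executable: $1"
        return 1
    fi
    echo "ok: $1"
}

# Demonstration on a throwaway file standing in for a prolog.
tmp=$(mktemp)
printf '#!/bin/sh\nexit 0\n' > "$tmp"

chmod 644 "$tmp"
sh "$tmp"                       # runs fine: sh only needs to read the file
check_prolog "$tmp" || true     # reports that the execute bit is missing

chmod 755 "$tmp"
check_prolog "$tmp"             # passes once the execute bit is set
rm -f "$tmp"
```

Running something like check_prolog /path/to/prolog on a compute node, with the real path substituted, would have pointed at the permission problem directly.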