This morning several of our jobs failed and were left in the "launch failed
requeued held" state. We traced this to a failed prolog. We fixed the
underlying problem, but the jobs remained in this state.

Is there a way to configure Slurm so that it automatically releases a job
from the held state so that it can run? There were plenty of healthy nodes
available for these jobs, so I'd prefer that they not remain held indefinitely.

Thanks!
-Roger

Roger Moye
HPC Engineer
713.425.6236 Office
713.898.0021 Mobile

QUANTLAB Financial, LLC
3 Greenway Plaza
Suite 200
Houston, Texas 77046
https://www.quantlab.com/
