On 02/05/18 10:15, R. Paul Wiegand wrote:
> Yes, I am sure they are all the same. Typically, I just scontrol reconfig; however, I have also tried restarting all daemons.

Understood. Any diagnostics in the slurmd logs when trying to start a GPU job on the node?

> We are moving to 7.4 in a few weeks during our downtime. We had a QDR -> OFED version constraint -> Lustre client version constraint issue that delayed our upgrade.

I feel your pain. BTW, RHEL 7.5 is out now, so you'll need that if you need current security fixes.

> Should I just wait and test after the upgrade?

Well, 17.11.6 will be out by then, and it will include a fix for a deadlock that some sites hit occasionally, so that will be worth throwing into the mix too. Do read the RELEASE_NOTES carefully though, especially if you're using slurmdbd!

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC