On 3/20/19 9:09 AM, Peter Steinbach wrote:
> Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf file, everything stops working again. :/ Should I send around scontrol outputs? And yes, I made sure to set the --mem flag for the job submission this time.
Well, there you've said that the GPUs are only accessible from those numbered cores, and the first non-GRES job on the node will likely have taken them already.
You probably want to include all the cores on the appropriate sockets for the GPUs.
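As a rough sketch of what that might look like, assuming a hypothetical node with two 8-core sockets and one GPU attached to each (the device paths and core ranges below are made up for illustration and are not taken from your setup, so check the real topology with something like lstopo or nvidia-smi topo -m first):

```
# gres.conf (illustrative only; core ranges and device files are assumptions)
# GPU 0 is assumed to sit on socket 0, which is assumed to hold cores 0-7
Name=gpu File=/dev/nvidia0 Cores=0-7
# GPU 1 is assumed to sit on socket 1, which is assumed to hold cores 8-15
Name=gpu File=/dev/nvidia1 Cores=8-15
```

With whole sockets listed, a non-GRES job that grabs a couple of cores should no longer leave a GPU without any usable cores.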
All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA