On Monday, 10 September 2018 4:42:00 PM AEST Janne Blomqvist wrote:

> One workaround is to reboot the node whenever this happens. Another is
> to set ConstrainKmemSpace=no in cgroup.conf (but AFAICS this option was
> added in slurm 17.02 and is not present in 16.05 that you're using).
Phew, we had to set ConstrainKmemSpace=no to avoid breaking Intel Omni-Path, so it looks like we dodged a bullet there.

Nice work tracking it down!

All the best,
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC