Hello,
I am in a situation where estimating the precise memory consumption of jobs beforehand is quite challenging. I would therefore like to set up a “trust” system: the memory requested for a job is taken into account for scheduling, but no action is taken if the job actually exceeds that limit once it is running on the node.

I tried NoOverMemoryKill, but it seems to work only for sbatch, not srun. So I ended up declaring memory as a non-consumable resource in the slurm.conf on the nodes, but not on the master. This seems to work, but it looks rather hackish (and Slurm complains about the configuration discrepancy).

Is this a supported practice? Can it bite me later on? Is there a cleaner solution?