Hi Rik,

> > Sounds like a job for memory limits (ulimit?), not for OOM
> > notification, right?
> 
> I suspect one problem could be that an HPC job scheduling program
> does not know exactly how much memory each job can take, so it can
> sometimes end up making a mistake and overcommitting the memory on
> one HPC node.
> 
> In that case the user is better off having that job killed and
> restarted elsewhere, than having all of the jobs on that node
> crawl to a halt due to swapping.
> 
> Paul, is this guess correct? :)

Yes.
The Fujitsu HPC middleware watches the total memory consumption of the
job and, if over-consumption occurs, kills the processes and removes
the job from the schedule.

I think that is a common HPC requirement,
but we watch a user-defined memory limit, not swap.
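
For illustration only (this is a minimal user-space sketch, not the
actual Fujitsu middleware; the PID list and limit here are hypothetical
inputs): a watcher could sum VmRSS from /proc/<pid>/status over the
job's processes and SIGKILL the whole job once the sum exceeds the
user-defined limit, something like:

	#include <signal.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	/* Resident set size of a pid in kB, or 0 if unreadable. */
	static long rss_kb(pid_t pid)
	{
		char path[64], line[256];
		long kb = 0;
		FILE *f;

		snprintf(path, sizeof(path), "/proc/%d/status", pid);
		f = fopen(path, "r");
		if (!f)
			return 0;
		while (fgets(line, sizeof(line), f))
			if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
				break;
		fclose(f);
		return kb;
	}

	int main(int argc, char **argv)
	{
		long limit_kb;
		int i;

		if (argc < 3) {
			fprintf(stderr, "usage: %s <limit_kb> <pid>...\n",
				argv[0]);
			return 1;
		}
		limit_kb = atol(argv[1]);

		for (;;) {
			long total = 0;

			/* Sum memory consumption over the job's processes. */
			for (i = 2; i < argc; i++)
				total += rss_kb((pid_t)atoi(argv[i]));

			if (total > limit_kb) {
				/* Over-consumption: kill the whole job. */
				for (i = 2; i < argc; i++)
					kill((pid_t)atoi(argv[i]), SIGKILL);
				return 0;
			}
			sleep(1);
		}
	}

Note this polls RSS rather than swap, matching the user-defined-limit
behaviour described above; a real scheduler would also have to handle
processes forked after the job started.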

Thanks.

