Gus Correa <g...@ldeo.columbia.edu> writes:

> On 03/27/2014 05:05 AM, Andreas Schäfer wrote:
>>> >Queue systems won't allow resources to be oversubscribed.

[Maybe that meant that resource managers can, and typically do, prevent
resources being oversubscribed.]

>> I'm fairly confident that you can configure Slurm to oversubscribe
>> nodes: just specify more cores for a node than are actually present.
>>
>
> That is true.
> If you lie to the queue system about your resources,
> it will believe you and oversubscribe.

For what it's worth, oversubscription might be overall or limited.  We
just had a user running some crazy Java program, which he refuses to
explain, submitted as a serial job running ~150 threads.  The
oversubscription was confined to the core it used, and the effect on the
127 others was mostly due to the small overhead of the node daemon
reading the crazy /proc smaps file to track the memory usage.  The other
cores were normally subscribed.
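
To illustrate how you might spot that sort of job, here is a minimal
sketch that compares a process's thread count with the number of cores
it is allowed to run on.  It assumes Linux, where /proc/<pid>/status
has a "Threads:" field and os.sched_getaffinity() reports the affinity
mask.  The function names are made up for this example, and it is not
what our node daemon actually does.

```python
import os


def thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def oversubscription_ratio(n_threads, n_allowed_cores):
    """Threads per allowed core; well above 1 suggests oversubscription."""
    if n_allowed_cores < 1:
        raise ValueError("need at least one allowed core")
    return n_threads / n_allowed_cores


def report(pid):
    """Hypothetical helper: summarize one process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        n_threads = thread_count(f.read())
    # Cores the process may run on, e.g. one core for a bound serial job.
    n_cores = len(os.sched_getaffinity(pid))
    return n_threads, n_cores, oversubscription_ratio(n_threads, n_cores)
```

Calling report(pid) on a job like the one above, if it is bound to a
single core, would give a ratio of roughly 150.  A thread count alone
does not show that the threads are all busy, so treat the ratio as a
hint, not a measurement.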

Ob-OMPI:  the other jobs may have been OMPI ones!

> Torque has this same feature.
> I don't know about SGE.
> You may choose to set some or all nodes with more cores than they
> actually have, if that is a good choice for the codes you run.
> However, for our applications oversubscribing is bad, hence my mindset.

Right.  I don't think there's any question that it's a bad idea on a
general purpose cluster running some OMPI jobs, for instance.

