On 28.06.2012 at 02:00, Ray Spence wrote:

> On Wed, Jun 27, 2012 at 4:52 PM, Reuti <[email protected]> wrote:
> On 28.06.2012 at 01:43, Ray Spence wrote:
>
> > On Wed, Jun 27, 2012 at 4:39 PM, Reuti <[email protected]> wrote:
> > On 28.06.2012 at 01:29, Ray Spence wrote:
> >
> > > > I think I'm coming to understand how SGE must be configured to restrict
> > > > job memory usage. Our goal is to have one common queue with no
> > > > memory/slots limits and one higher-priority queue with memory and
> > > > slots limits (h_vmem=128G, slots=32). My understanding is that the
> > > > only way to do this is to make h_vmem and slots global (?)
> > >
> > > On the same exechosts? How should SGE know which jobs are allowed to run,
> > > if only the ones running in the high-priority queue are requesting h_vmem
> > > and others could use any memory they want? The management of resources is
> > > the goal of SGE.
> > >
> > > I'm using slotwise subordination for SGE to give the h.q priority over
> > > the l.q.
> >
> > This won't free any resources besides slots. Memory and/or disk space (e.g.
> > in $TMPDIR) is still used up by suspended tasks.
> >
> > Oh yes, I know. And there is no apparent way around this unless I find a
> > way to tell SGE to checkpoint jobs instead of suspending them in these cases.
>
> Correct.
>
> > Any advice on that idea? Not worth it?
>
> You can only define a large swap space (maybe up to the size of the built-in
> memory); once the suspended processes are swapped out, the real memory is
> available for the running ones.
>
> I've wondered if I can tell SGE to checkpoint to a faster storage space like
> a large USB device.
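The bookkeeping argument in the quoted exchange can be illustrated with a toy model. This is a simplified sketch, not SGE's scheduler: the `Host` and `Job` classes are hypothetical, h_vmem and slots are treated as per-host consumables, and, following Reuti's statement, a suspended job gives back its slots but keeps its memory. `suspend_queue` suspends every job of a queue, which is cruder than real slotwise subordination. A job that requests no h_vmem books 0 GB here, so the model has no way to protect other jobs from its real memory use, which is Reuti's first point.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str
    queue: str            # "h.q" (high priority) or "l.q" (low priority)
    slots: int
    h_vmem_gb: int        # requested memory; 0 means the job requested none
    suspended: bool = False


@dataclass
class Host:
    slots: int            # hypothetical per-host slot capacity
    h_vmem_gb: int        # hypothetical per-host h_vmem capacity
    jobs: List[Job] = field(default_factory=list)

    def free_slots(self) -> int:
        # Simplification: a suspended job hands its slots back.
        return self.slots - sum(j.slots for j in self.jobs if not j.suspended)

    def free_h_vmem_gb(self) -> int:
        # Booked memory counts for every job, suspended or not,
        # because a suspended process still holds its memory.
        return self.h_vmem_gb - sum(j.h_vmem_gb for j in self.jobs)

    def can_start(self, job: Job) -> bool:
        return (job.slots <= self.free_slots()
                and job.h_vmem_gb <= self.free_h_vmem_gb())

    def suspend_queue(self, queue: str) -> None:
        # Crude stand-in for subordination: suspend all jobs of the queue.
        for j in self.jobs:
            if j.queue == queue:
                j.suspended = True


if __name__ == "__main__":
    host = Host(slots=32, h_vmem_gb=128)
    host.jobs.append(Job("low", "l.q", slots=32, h_vmem_gb=100))
    high = Job("high", "h.q", slots=32, h_vmem_gb=64)
    print(host.can_start(high))                      # False: no free slots
    host.suspend_queue("l.q")
    print(host.free_slots(), host.free_h_vmem_gb())  # 32 28
    print(host.can_start(high))                      # False: only 28 GB left
```

Suspending the l.q job frees its 32 slots, but the 100 GB it booked stays booked, so in this model the h.q job still cannot start.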
SGE won't checkpoint anything itself. It can only trigger a checkpointing facility that is already available outside of SGE, be it kernel-level checkpointing, application-level checkpointing via some supplied scripts, or the application itself acting on certain signals. Such a facility can then write to faster media if it is configured to do so, but that storage needs to be available globally: the rescheduled job may start on another node.

-- Reuti

> But that's just idle speculation.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
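The last variant Reuti mentions, an application that checkpoints itself when it receives a signal, can be sketched as follows. This is a minimal illustration under stated assumptions, not an SGE interface: SIGUSR1 as the trigger signal, the `CKPT_DIR` environment variable, the file name and the state layout are all hypothetical. `CKPT_DIR` stands for storage that every exec host can reach, since the rescheduled job may restart on another node.

```python
import json
import os
import signal
import tempfile

# Hypothetical: CKPT_DIR should point at storage every exec host can reach,
# since a rescheduled job may restart on another node.
CKPT_DIR = os.environ.get("CKPT_DIR", tempfile.gettempdir())
CKPT_FILE = os.path.join(CKPT_DIR, "myjob.ckpt.json")  # illustrative name

checkpoint_requested = False


def on_signal(signum, frame):
    # Only set a flag here; the main loop saves at a consistent point.
    global checkpoint_requested
    checkpoint_requested = True


def save_state(state: dict, path: str = CKPT_FILE) -> None:
    # Write to a temporary file and rename it, so an interruption during
    # the write never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_state(path: str = CKPT_FILE) -> dict:
    # Resume from the last checkpoint if one exists, else start fresh.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(n_steps: int, path: str = CKPT_FILE) -> dict:
    global checkpoint_requested
    state = load_state(path)
    while state["step"] < n_steps:
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        if checkpoint_requested:
            save_state(state, path)
            checkpoint_requested = False
    save_state(state, path)  # final state, so a rerun is a no-op
    return state


if __name__ == "__main__":
    # SIGUSR1 is an assumption: use whatever signal your setup delivers.
    signal.signal(signal.SIGUSR1, on_signal)
    print(run(1_000_000))
```

In the arrangement Reuti describes, SGE would only trigger the facility that sends the signal; when the job is rescheduled, possibly on another node, it continues from the last saved state through `load_state`.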
