On Wed, Jun 27, 2012 at 4:39 PM, Reuti <[email protected]> wrote:

> Am 28.06.2012 um 01:29 schrieb Ray Spence:
>
> > > I think I'm coming to understand how SGE must be configured to
> restrict job memory
> > > usage. Our goal is to have one common queue with no memory/slots
> limits and one
> > > higher priority queue with memory and slots (h_vmem=128G, slots=32)
> limits. My understanding is that the only way to do this is to make h_vmem
> and slots globally (?)
> >
> > On the same exechosts? How should SGE know which jobs are allowed to
> run, if only the ones running in the high priority queue are requesting
> h_vmem and others could use any memory they want? The management of
> resources is the goal of SGE.
> >
> > I'm using slotwise subordination for SGE to give the h.q priority over
> the l.q.
>
> This won't free any resources besides slots. Memory and/or disk space like
> in $TMPDIR is still used up for suspended tasks.
>

Oh yes, I know. And there is no apparent way around this unless I find a
way to tell SGE to
checkpoint jobs instead of suspending them in these cases. Any advice on
that idea? Not worth it?
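
For what it's worth, SGE does have a checkpointing interface that can be
tied to suspension: a checkpoint environment (man checkpoint, created with
qconf -ackpt) whose "when" field contains the letter "s" runs its
ckpt_command when the job would otherwise be suspended. A rough sketch,
with placeholder script paths and an assumed environment name, not a
tested setup:

   ckpt_name          ckpt_on_suspend
   interface          application-level
   ckpt_command       /path/to/checkpoint_job.sh $job_id $job_pid
   migr_command       none
   restart_command    none
   clean_command      none
   ckpt_dir           /tmp
   signal             none
   when               s

Jobs would then need to be submitted with "qsub -ckpt ckpt_on_suspend ...",
and the application itself has to be checkpointable, so this only helps
with a cooperative user base.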

>
>
> > Also - my user base will not have the sophistication to request h_vmem,
> etc.. I am
> > really only looking for mechanisms to stop run-away jobs from consuming
> either
> > ram or cores.
> >
> >
> > > consumable via qconf -mc. Once I do that I must set a default limit in
> the next column.
> > > (I found that if I left h_vmem at 0 all jobs got killed..)
> >
> > Well, any memory actually used by a job will be larger than zero, so
> every job exceeds the limit and gets killed.
> >
> >
> > > So, how can I override that global (?) default on my higher priority
> queue? Can I do this
> > > in the queue config via qconf -mq ? I've tried this by setting the
> second column in the
> > > h_vmem line to "2G". Doesn't seem to work..
> >
> > This won't override the default; just the tighter limit will be used.
> Its purpose is either to limit h_vmem in case a user doesn't request
> it, or to define an upper limit up to which a user may request h_vmem per
> job (better: per slot). I.e. in the complex definition you define 2GB as
> the default, the queue limit is set to 16GB, and this way the user can
> request an amount between 2GB and 16GB if 2GB isn't sufficient.
> >
> > I've resorted to telling SGE that h_vmem is not consumable. After this,
> the h.q h_vmem
> > limit is obeyed and the l.q is not limited.
>
> If it's only to limit certain jobs to consume a complete node's memory,
> this is the way to go.
>

Yes!
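
To make that concrete, a rough sketch of the non-consumable setup being
described (the 128G figure is from this thread; the queue name high.q and
the exact column layout are illustrative):

   # complex definition (qconf -mc): h_vmem stays requestable but not consumable
   #name     shortcut   type    relop  requestable  consumable  default  urgency
   h_vmem    h_vmem     MEMORY  <=     YES          NO          0        0

   # high-priority queue (qconf -mq high.q): per-job hard limit
   h_vmem                128G

   # the low-priority queue keeps h_vmem at INFINITY, so its jobs stay unlimited

With consumable set to NO, SGE no longer books memory against the host,
but an h_vmem value set in the queue definition is still enforced per job
by the execd (man queue_conf, section RESOURCE LIMITS).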


>
> -- Reuti
>
>
> > I've given up on trying to do this per user across a given queue. It
> just doesn't seem possible.
> >
> >
> > -- Reuti
> >
> >
> > > Thanks again,
> > > Ray
> > >
> > >
> > > On Tue, Jun 26, 2012 at 1:59 PM, Reuti <[email protected]>
> wrote:
> > > Hi,
> > >
> > > Am 26.06.2012 um 20:57 schrieb Ray Spence:
> > >
> > > > Back on the list. Please see below -
> > > >
> > > > On Tue, Jun 26, 2012 at 11:41 AM, Reuti <[email protected]>
> wrote:
> > > > Am 26.06.2012 um 20:30 schrieb Ray Spence:
> > > >
> > > > > Reuti,
> > > > >
> > > > > Thank you so much for making RQS/h_vmem clear.
> > > > >
> > > > > I hope I'm not taking advantage of you here - I apologize if so.
> > > >
> > > > No, but please ask on the list or register. That's why I didn't
> forward your last posting, as it was sent from an unknown address.
> > > >
> > > >
> > > > > I have another question regarding slots. Our cluster has 4 nodes
> each with 32 cores. My
> > > > > assumption is that SGE should be able to run 128 total jobs at any
> time. I see only
> > > > > 1 job running per node with many jobs in qw. I think I need to
> change the "slots" value
> > > > > in the queue config? Here is what I have (still early on in
> learning SGE config..)
> > > > > from qconf -sq <queue>
> > > > >
> > > > > slots                1,[scf-sm00.Stat.Berkeley.EDU=32], \
> > > > >                       [scf-sm01.Stat.Berkeley.EDU=32], \
> > > > >                       [scf-sm02.Stat.Berkeley.EDU=32], \
> > > > >                       [scf-sm03.Stat.Berkeley.EDU=32]
> > > >
> > > > It's a matter of taste: the above is correct. If you have identical
> nodes, you can even shorten it to:
> > > >
> > > > slots      32
> > > >
> > > > Great, we do - I'll try that!
> > > >
> > > >
> > > > It's the number of slots per queue instance.
> > > >
> > > >
> > > > > should the "1" be 32? 128? Or, where is it that I tell SGE to use
> all 32 cores?
> > > >
> > > > As the default memory consumption is 248g, only one job can run at a
> time, I would say.
> > > >
> > > > Ok, this is not what we want at all. It is more important to use all
> 32 cores/node than attempting
> > > > any ram usage control. I'm going to back out the h_vmem complex
> setting in order to run 128 jobs at a time. Should I reset the h_vmem
> complex back to not consumable (NO)
> > >
> > > If you want to control the usage of memory and avoid oversubscription
> of it, it needs to stay being consumable.
> > >
> > >
> > > > or keep it consumable but set its default to, say, 1G? If I do that
> won't users have to request
> > > > a higher h_vmem amount upon job submission?
> > >
> > > Sure, if they want to use more than 1G they have to request more. They
> have to predict what they need. There is no crystal ball inside SGE which
> could look ahead and predict the necessary memory for a job.
> > >
> > > -- Reuti
> > >
> > >
> > > >
> > > > Try to submit jobs with "sleep 120" or so for which you requested
> less memory on the command line. The actual used up memory can be checked:
> > > >
> > > > qhost -F h_vmem
> > > >
> > > >
> > > > -- Reuti
> > > >
> > > >
> > > > > Thank you again,
> > > > > Ray
> > > > >
> > > > >
> > > > > On Tue, Jun 26, 2012 at 11:14 AM, Reuti <
> [email protected]> wrote:
> > > > > Am 26.06.2012 um 19:42 schrieb Ray Spence:
> > > > >
> > > > > > Hi Reuti,
> > > > > >
> > > > > > I'll respond in-line:
> > > > > >
> > > > > > On Mon, Jun 25, 2012 at 4:21 PM, Reuti <
> [email protected]> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Am 26.06.2012 um 00:57 schrieb Ray Spence:
> > > > > >
> > > > > > > I apologize for more questions but I'm not getting to where
> our group wants our new
> > > > > > > cluster to be. In order to limit all of a given user's jobs in
> a specified queue to a total
> > > > > > > amount of physical ram (h_vmem) I see no other solution than
> an RQS. Is this true?
> > > > > >
> > > > > > Correct. h_vmem is a hard limit, while others prefer virtual_free
> as guidance for SGE; the latter is not enforced:
> > > > > >
> > > > > >
> http://www.gridengine.info/2009/12/01/adding-memory-requirement-awareness-to-the-scheduler/
> > > > > >
> > > > > >
> > > > > > I've read this info from you. When you say "Use the one you
> defined in your qsub command by requesting it with the -l option..." I take
> you to mean that once I've made a given memory complex (h_vmem,
> virtual_free, etc.) consumable (qconf -mc), in order to enforce any limit on
> that complex users must request a value for that complex at job
> submission. I think
> > > > > > I'm repeating myself here. Your info here is what led me to
> pose my question in the first place.
> > > > > >
> > > > > >
> > > > > > > Using qconf -mq <queue> will limit each job in <queue> but not
> each user's total
> > > > > > > memory footprint across all his jobs, correct?
> > > > > >
> > > > > > Correct.
> > > > > >
> > > > > >
> > > > > > > The node-level limit does not do what
> > > > > > > we want here..
> > > > > >
> > > > > > Correct, it's the memory usage across all queues, i.e. across all
> jobs on a node.
> > > > > >
> > > > > >
> > > > > > > I have this RQS in place:
> > > > > > >
> > > > > > > {
> > > > > > >    name         high.q-h_vmem
> > > > > > >    description  "high.q h_vmem limited to 128G"
> > > > > >
> > > > > > The quotation marks are not necessary.
> > > > > >
> > > > > >
> > > > > > >    enabled      TRUE
> > > > > > >    limit        users {*} queues high.q to h_vmem=128g
> > > > > > > }
> > > > > >
> > > > > > You made h_vmem consumable and attached a value per exechost?
> > > > > >
> > > > > > Yes - via qconf -mc, here is what the h_vmem line looks like:
> > > > > >
> > > > > > h_vmem              h_vmem     MEMORY      <=    YES         YES
>        248g     0
> > > > >
> > > > > It should be set to a default you expect a typical job to consume. We
> set it to 2g here, and users can increase the per-job limit up to the one set
> in the queue definition.
> > > > >
> > > > >
> > > > > > (should the "default" value here be different than 248? see
> below.. Must it be 0? Must it NOT be 0?)
> > > > > >
> > > > > > and via qconf -me I've set h_vmem to be a little less (248G)
> than the installed ram (256G)
> > > > > > on each of the cluster's 4 nodes:
> > > > > >
> > > > > > qconf -se <cluster_node>
> > > > > > hostname              <>
> > > > > > load_scaling          NONE
> > > > > > complex_values        slots=32,h_vmem=248G
> > > > > > .....
> > > > > >
> > > > > >
> > > > > >
> > > > > > > which would seem to accomplish our goal. However, jobs
> submitted to high.q against this
> > > > > > > RQS without stating h_vmem needs at submission but which are
> written to exceed the memory limit do exceed the memory limit.
> > > > > >
> > > > > > Correct, the RQS will check the job's h_vmem request, but
> there is no relation back, i.e. the RQS does not limit the job's actual
> memory usage. Specifying only an overall limit per user would even make it
> hard for the RQS to decide what per-job limit to set at all. Or, if the
> overall limit is exceeded: which job should be killed?
> > > > > >
> > > > > >
> > > > > > > Worse, jobs submitted to high.q with an h_vmem need set below
> the RQS limit but which are written to exceed the limit successfully gobble
> up a
> > > > > > > forbidden amount of ram.
> > > > > >
> > > > > > I don't get this sentence. Can you make an example?
> > > > > >
> > > > > > I have a simple shell script that runs the linux tool stress
> which asks the system for some amount of ram, here is that line:
> > > > > >
> > > > > > /usr/bin/stress -v --cpu 1 --io 2 --vm 1 --vm-bytes 150G
> --vm-hang 0
> > > > > >
> > > > > > which ramps up to occupy 150GB by reading and dirtying ram. The
> --vm-hang 0 part tells
> > > > > > stress to simply stop and hang around indefinitely once stress
> has occupied 150GB. This
> script succeeds if I do not state an h_vmem request at job
> submission
> > > > >
> > > > > ...as the default is 248g
> > > > >
> > > > > > or if I ask for h_vmem under
> > > > > > the RQS limit of 128G.
> > > > >
> > > > > NB: g = base 1000, G = base 1024 (man sge_types)
> > > > >
> > > > >
> > > > > > It seems if RQS is satisfied upon job submission
> > > > >
> > > > > No, at job start.
> > > > >
> > > > >
> > > > > > then it does not
> > > > > > monitor ram usage once a job is running
> > > > >
> > > > > RQS will never monitor running jobs.
> > > > >
> > > > >
> > > > > > - you say as much in this response.
> > > > >
> > > > > You can check with:
> > > > >
> > > > > $ ulimit -aH
> > > > > $ ulimit -aS
> > > > >
> > > > > what was set by SGE for the limits. In addition SGE's execd (not
> the RQS) will monitor the usage against what was requested by -l h_vmem=...
> or set by the default in the complex definition (man queue_conf, section
> RESOURCE LIMITS). This will be done by the execd, which doesn't know
> anything about other users' jobs on other nodes.
> > > > >
> > > > >
> > > > > > > Regarding ram usage: I have tested and read enough on RQS and
> the various ways to configure SGE to conclude that RQS doesn't actually
> monitor ram usage once a job has been submitted?
> > > > > >
> > > > > > It will check the requested RAM to decide whether any
> submitted job is eligible to start. The requests of all running jobs,
> added up, should never exceed h_vmem.
> > > > > >
> > > > > > But here (second sentence) you imply that with an h_vmem value
> in an RQS that SGE does indeed monitor a user's running jobs to see if the
> cumulative ram usage exceeds the RQS
> > > > >
> > > > > Not the usage. It's a consumable, so it will just add up all
> h_vmem requests at the time the job starts and allow it to run
> or not.
> > > > >
> > > > > -- Reuti
> > > > >
> > > > >
> > > > > > h_vmem limit? Is it also true that SGE will not kill any
> job to get a user's ram footprint down below the RQS? Is the monitoring
> used only to determine whether a submitted job may run, i.e. whether that
> job's h_vmem request and that user's current ram usage are together below
> the RQS limit?
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
