On Wed, Jun 27, 2012 at 3:50 PM, Reuti <[email protected]> wrote:
> Hi,
>
> On 27.06.2012, at 23:46, Ray Spence wrote:
>
> > I think I'm coming to understand how SGE must be configured to restrict
> > job memory usage. Our goal is to have one common queue with no
> > memory/slots limits and one higher priority queue with memory and slots
> > limits (h_vmem=128G, slots=32). My understanding is that the only way
> > to do this is to make h_vmem and slots globally (?)
>
> On the same exechosts? How should SGE know which jobs are allowed to run
> if only the ones running in the high priority queue are requesting h_vmem
> and the others could use any memory they want? The management of resources
> is the goal of SGE.

I'm using slotwise subordination to have SGE give the h.q priority over the
l.q (see the sketch below). Also, my user base will not have the
sophistication to request h_vmem, etc. I am really only looking for
mechanisms to stop runaway jobs from consuming either ram or cores.

> > consumable via qconf -mc. Once I do that I must set a default limit in
> > the next column. (I found that if I left h_vmem at 0, all jobs got
> > killed..)
>
> Well, any used memory of a job will be larger than zero, and hence it
> gets killed.
>
> > So, how can I override that global (?) default on my higher priority
> > queue? Can I do this in the queue config via qconf -mq? I've tried this
> > by setting the second column in the h_vmem line to "2G". Doesn't seem
> > to work..
>
> This won't override the default; just the tighter limit will be used. Its
> purpose is either to limit h_vmem in case a user doesn't request it, or
> to define an upper limit up to which a user may request h_vmem per job
> (better: per slot). I.e., in the complex definition you define 2GB as the
> default, the queue limit is set to 16GB, and this way the user can
> request an amount between 2GB and 16GB if 2 isn't sufficient.

I've resorted to telling SGE that h_vmem is not consumable. After this, the
h.q h_vmem limit is obeyed and the l.q is not limited. I've given up on
trying to do this per user across a given queue. It just doesn't seem
possible.
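For reference, slotwise subordination is configured through the
subordinate_list attribute of the superordinate queue (man queue_conf,
available since SGE 6.2u5). A minimal sketch, assuming the queues really
are named h.q and l.q and that each host offers 32 slots:

  # qconf -mq h.q (only the relevant line shown)
  subordinate_list      slots=32(l.q:1:sr)

With this, as soon as the slots in use by h.q and l.q together exceed 32
on a host, a task in l.q is suspended (sr selects the shortest running
one) instead of oversubscribing the cores.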
> -- Reuti
>
> > Thanks again,
> > Ray
> >
> > On Tue, Jun 26, 2012 at 1:59 PM, Reuti <[email protected]> wrote:
> > > Hi,
> > >
> > > On 26.06.2012, at 20:57, Ray Spence wrote:
> > >
> > > > Back on the list. Please see below -
> > > >
> > > > On Tue, Jun 26, 2012 at 11:41 AM, Reuti <[email protected]> wrote:
> > > > > On 26.06.2012, at 20:30, Ray Spence wrote:
> > > > >
> > > > > > Reuti,
> > > > > >
> > > > > > Thank you so much for making RQS/h_vmem clear.
> > > > > >
> > > > > > I hope I'm not taking advantage of you here - I apologize if so.
> > > > >
> > > > > No, but please ask on the list or register. Therefore I didn't
> > > > > forward your last posting, as it was sent from an unknown address.
> > > > >
> > > > > > I have another question regarding slots. Our cluster has 4
> > > > > > nodes, each with 32 cores. My assumption is that SGE should be
> > > > > > able to run 128 total jobs at any time. I see only 1 job running
> > > > > > per node, with many jobs in qw. I think I need to change the
> > > > > > "slots" value in the queue config? Here is what I have (still
> > > > > > early on in learning SGE config..) from qconf -sq <queue>:
> > > > > >
> > > > > > slots  1,[scf-sm00.Stat.Berkeley.EDU=32], \
> > > > > >          [scf-sm01.Stat.Berkeley.EDU=32], \
> > > > > >          [scf-sm02.Stat.Berkeley.EDU=32], \
> > > > > >          [scf-sm03.Stat.Berkeley.EDU=32]
> > > > >
> > > > > It's a matter of taste: the above is correct. If you have
> > > > > identical nodes, you can even shorten it to:
> > > > >
> > > > > slots 32
> > > >
> > > > Great, we do - I'll try that!
> > > >
> > > > > It's the number of slots per queue instance.
> > > > >
> > > > > > should the "1" be 32? 128? Or, where is it that I tell SGE to
> > > > > > use all 32 cores?
> > > > >
> > > > > As the default memory consumption is 248g, only one job can run
> > > > > at a time, I would say.
> > > >
> > > > Ok, this is not what we want at all. It is more important to use
> > > > all 32 cores/node than to attempt any ram usage control. I'm going
> > > > to back out the h_vmem complex setting in order to run 128 jobs at
> > > > a time. Should I reset the h_vmem complex back to not consumable
> > > > (NO)
> > >
> > > If you want to control the usage of memory and avoid oversubscription
> > > of it, it needs to stay consumable.
> > >
> > > > or keep it consumable but set its default to, say, 1G? If I do
> > > > that, won't users have to request a higher h_vmem amount upon job
> > > > submission?
> > >
> > > Sure, if they want to use more than 1G they have to request more.
> > > They have to predict what they need. There is no crystal ball inside
> > > SGE which could look ahead to predict the necessary memory for a job.
> > >
> > > -- Reuti
> > >
> > > > > Try to submit jobs with "sleep 120" or so for which you requested
> > > > > less memory on the command line. The actually used memory can be
> > > > > checked with:
> > > > >
> > > > > qhost -F h_vmem
> > > > >
> > > > > -- Reuti
> > > >
> > > > Thank you again,
> > > > Ray
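To make that test concrete, a hypothetical session (the 1G figure is
arbitrary; -b y submits the command directly as a binary):

  $ qsub -b y -l h_vmem=1G sleep 120
  $ qhost -F h_vmem

With h_vmem consumable, each running job books its requested 1G against
the host's total, so the h_vmem value reported by qhost should shrink by
1G per running job and recover once the jobs end.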
> > > > > > On Tue, Jun 26, 2012 at 11:14 AM, Reuti <[email protected]> wrote:
> > > > > > > On 26.06.2012, at 19:42, Ray Spence wrote:
> > > > > > >
> > > > > > > > Hi Reuti,
> > > > > > > >
> > > > > > > > I'll respond in-line:
> > > > > > > >
> > > > > > > > On Mon, Jun 25, 2012 at 4:21 PM, Reuti <[email protected]> wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On 26.06.2012, at 00:57, Ray Spence wrote:
> > > > > > > > >
> > > > > > > > > > I apologize for more questions, but I'm not getting to
> > > > > > > > > > where our group wants our new cluster to be. In order to
> > > > > > > > > > limit all of a given user's jobs in a specified queue to
> > > > > > > > > > a total amount of physical ram (h_vmem), I see no other
> > > > > > > > > > solution than an RQS. Is this true?
> > > > > > > > >
> > > > > > > > > Correct. h_vmem is a hard limit, while others prefer
> > > > > > > > > virtual_free as a guidance for SGE, which is not enforced:
> > > > > > > > >
> > > > > > > > > http://www.gridengine.info/2009/12/01/adding-memory-requirement-awareness-to-the-scheduler/
> > > > > > > >
> > > > > > > > I've read this info from you. When you say "Use the one you
> > > > > > > > defined in your qsub command by requesting it with the -l
> > > > > > > > option..." I take you to mean that once I've made a given
> > > > > > > > memory complex (h_vmem, virtual_free, etc.) consumable
> > > > > > > > (qconf -mc), then in order to enforce any limit on that
> > > > > > > > complex, users must request a number value for it at job
> > > > > > > > submission. I think I'm repeating myself here.. Your info
> > > > > > > > here is what led me to pose my question in the first place.
> > > > > > > >
> > > > > > > > > > Using qconf -mq <queue> will limit each job in <queue>
> > > > > > > > > > but not each user's total memory footprint across all
> > > > > > > > > > his jobs, correct?
> > > > > > > > >
> > > > > > > > > Correct.
> > > > > > > > >
> > > > > > > > > > The node-level limit does not do what we want here..
> > > > > > > > >
> > > > > > > > > Correct, it's the memory usage across all queues and resp.
> > > > > > > > > all jobs on a node.
> > > > > > > > >
> > > > > > > > > > I have this RQS in place:
> > > > > > > > > >
> > > > > > > > > > {
> > > > > > > > > >    name         high.q-h_vmem
> > > > > > > > > >    description  "high.q h_vmem limited to 128G"
> > > > > > > > >
> > > > > > > > > The quotation marks are not necessary.
> > > > > > > > >
> > > > > > > > > >    enabled      TRUE
> > > > > > > > > >    limit        users {*} queues high.q to h_vmem=128g
> > > > > > > > > > }
> > > > > > > > >
> > > > > > > > > You made h_vmem consumable and attached a value per
> > > > > > > > > exechost?
> > > > > > > >
> > > > > > > > Yes - via qconf -mc; here is what the h_vmem line looks like:
> > > > > > > >
> > > > > > > > h_vmem  h_vmem  MEMORY  <=  YES  YES  248g  0
> > > > > > >
> > > > > > > It should be set to a default you expect to be taken by a job.
> > > > > > > We set it to 2g here, and users can increase the per-job limit
> > > > > > > up to the one set in the queue definition.
> > > > > > >
> > > > > > > > (should the "default" value here be different than 248? see
> > > > > > > > below.. Must it be 0? Must it NOT be 0?)
> > > > > > > >
> > > > > > > > and via qconf -me I've set h_vmem to a little less (248G)
> > > > > > > > than the installed ram (256G) on each of the cluster's 4
> > > > > > > > nodes:
> > > > > > > >
> > > > > > > > qconf -se <cluster_node>
> > > > > > > > hostname        <>
> > > > > > > > load_scaling    NONE
> > > > > > > > complex_values  slots=32,h_vmem=248G
> > > > > > > > .....
> > > > > > > >
> > > > > > > > > > which would seem to accomplish our goal. However, jobs
> > > > > > > > > > submitted to high.q against this RQS without stating
> > > > > > > > > > h_vmem needs at submission, but which are written to
> > > > > > > > > > exceed the memory limit, do exceed the memory limit.
> > > > > > > > >
> > > > > > > > > Correct, the RQS will check the job request for h_vmem,
> > > > > > > > > but there is no relation back, i.e. the RQS won't limit
> > > > > > > > > the job's memory. Specifying only an overall limit per
> > > > > > > > > user would even make it hard for the RQS to decide what
> > > > > > > > > limit to set per job at all. Or, if the overall limit is
> > > > > > > > > passed: which job should be killed?
> > > > > > > > >
> > > > > > > > > > Worse, jobs submitted to high.q with an h_vmem need set
> > > > > > > > > > below the RQS limit, but which are written to exceed the
> > > > > > > > > > limit, successfully gobble up a forbidden amount of ram.
> > > > > > > > >
> > > > > > > > > I don't get this sentence. Can you make an example?
> > > > > > > >
> > > > > > > > I have a simple shell script that runs the linux tool
> > > > > > > > stress, which asks the system for some amount of ram; here
> > > > > > > > is that line:
> > > > > > > >
> > > > > > > > /usr/bin/stress -v --cpu 1 --io 2 --vm 1 --vm-bytes 150G --vm-hang 0
> > > > > > > >
> > > > > > > > which ramps up to occupy 150GB by reading and dirtying ram.
> > > > > > > > The --vm-hang 0 part tells stress to simply stop and hang
> > > > > > > > around indefinitely once it has occupied 150GB. This script
> > > > > > > > succeeds if I do not state an h_vmem request at job
> > > > > > > > submission
> > > > > > >
> > > > > > > ...as the default is 248g
> > > > > > >
> > > > > > > > or if I ask for h_vmem under the RQS limit of 128G.
> > > > > > >
> > > > > > > NB: g = base 1000, G = base 1024 (man sge_types)
> > > > > > >
> > > > > > > > It seems if RQS is satisfied upon job submission
> > > > > > >
> > > > > > > No, at job start.
> > > > > > >
> > > > > > > > then it does not monitor ram usage once a job is running
> > > > > > >
> > > > > > > RQS will never monitor running jobs.
> > > > > > >
> > > > > > > > - you say as much in this response.
> > > > > > >
> > > > > > > You can check with:
> > > > > > >
> > > > > > > $ ulimit -aH
> > > > > > > $ ulimit -aS
> > > > > > >
> > > > > > > what was set by SGE for the limits. In addition, SGE's execd
> > > > > > > (not the RQS) will monitor the usage which was requested by
> > > > > > > -l h_vmem=... or set by the default in the complex definition
> > > > > > > (man queue_conf, section RESOURCE LIMITS). This is done by the
> > > > > > > execd, which doesn't know anything about other users' jobs on
> > > > > > > other nodes.
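One way to see those limits from inside a job is a throwaway script; a
sketch (the file name check_limits.sh is made up, and 2G is an arbitrary
request):

  $ cat check_limits.sh
  #!/bin/sh
  # print the hard and soft limits the execd set up for this job
  ulimit -aH
  ulimit -aS

  $ qsub -j y -o limits.out -l h_vmem=2G check_limits.sh

The virtual memory entry in limits.out should then come out at roughly the
requested 2G rather than unlimited.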
> > > > > > > > > > Regarding ram usage: I have tested and read enough on
> > > > > > > > > > RQS and the various ways to configure SGE to conclude
> > > > > > > > > > that RQS doesn't actually monitor ram usage once a job
> > > > > > > > > > has been submitted?
> > > > > > > > >
> > > > > > > > > It will monitor the requested RAM to decide whether any
> > > > > > > > > submitted job is eligible to start. All running ones
> > > > > > > > > should never pass h_vmem if added up.
> > > > > > > >
> > > > > > > > But here (second sentence) you imply that with an h_vmem
> > > > > > > > value in an RQS, SGE does indeed monitor a user's running
> > > > > > > > jobs to see if the cumulative ram usage exceeds the RQS
> > > > > > >
> > > > > > > Not the usage. It's a consumable, so it will just add up all
> > > > > > > requested h_vmem at the time a job starts and allow it to run
> > > > > > > or not.
> > > > > > >
> > > > > > > -- Reuti
> > > > > > >
> > > > > > > > h_vmem limit? Is this true, but also that SGE will not kill
> > > > > > > > any job to get a user's ram footprint down below the RQS?
> > > > > > > > Is the monitoring used only to determine whether a submitted
> > > > > > > > job may run, i.e. whether that job's h_vmem request and that
> > > > > > > > user's current ram usage are together below the RQS limit?
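For completeness: the booked (not measured) amounts behind that decision
can be inspected with qquota, which lists each resource quota rule with
its current usage. The output is roughly like the following, where the
layout and the 80G figure are only illustrative:

  $ qquota -u '*'
  resource quota rule   limit                 filter
  ----------------------------------------------------------------------
  high.q-h_vmem/1       h_vmem=80G/128G       users ray queues high.q

A further job starts only if its own h_vmem request fits into the
remaining 48G; otherwise it waits in qw. Nothing already running is
suspended or killed on account of the quota.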
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
