[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

Nikanth Karthikesan Thu, 29 Jan 2009 07:57:34 -0800

On Wednesday 28 January 2009 06:30:42 Paul Menage wrote:
> Hi Nikanth,
>
> On Fri, Jan 23, 2009 at 6:56 AM, Nikanth Karthikesan <knika...@suse.de> wrote:
> > From: Nikanth Karthikesan <knika...@suse.de>
> >
> > Cgroup based OOM killer controller
> >
> > Signed-off-by: Nikanth Karthikesan <knika...@suse.de>
> >
> > ---
> >
> > This is a container group based approach to override the oom killer
> > selection without losing all the benefits of the current oom killer
> > heuristics and oom_adj interface.
>
> The basic functionality looks useful.
>


Thanks.

> But before we add an OOM subsystem and commit to an API that has to be
> supported forever, I think it would be good to have an overall design
> for what kinds of things we want to be able to do regarding cgroups
> and OOM killing.
>
> Specifying a per-cgroup priority is part of the solution, and is
> useful for simple cases. Some kind of userspace notification is also
> useful.
>

Yes, very much.

> The notification system that David/Ying posted has worked pretty well
> for us at Google - it's allowed us to use cpusets and fake numa to
> provide hard memory controls and guarantees for jobs, while avoiding
> having jobs getting killed when they expand faster than we expect. But
> we also acknowledge that it's a bit of a hack, and it would be nice to
> come up with something more generally acceptable for a real
> submission.
>
> > It adds a tunable oom.victim to the oom cgroup. The oom killer will kill
> > the process using the usual badness value but only within the cgroup with
> > the maximum value for oom.victim before killing any process from a cgroup
> > with a lesser oom.victim number. Oom killing could be disabled by setting
> > oom.victim=0.
>
> "priority" might be a better term than "victim".
>

Agreed.

> > CPUSET constrained OOM:
> > Also the tunable oom.cpuset_constrained when enabled, would disable the
> > ordering imposed by this controller for cpuset constrained OOMs.
> >
> > diff --git a/Documentation/cgroups/oom.txt b/Documentation/cgroups/oom.txt
> > new file mode 100644
> > index 0000000..772fb41
> > --- /dev/null
> > +++ b/Documentation/cgroups/oom.txt
> > @@ -0,0 +1,34 @@
> > +OOM Killer controller
> > +--- ------ ----------
> > +
> > +The OOM killer kills the process based on a set of heuristics such that only
>
> Might be worth adding "theoretically" in this sentence :-)
>
> >        do_posix_clock_monotonic_gettime(&uptime);
> > @@ -257,10 +262,30 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
> >                        continue;
> >
> >                points = badness(p, uptime.tv_sec);
> > +#ifdef CONFIG_CGROUP_OOM
> > +               taskvictim =
> > +                       (container_of(p->cgroups->subsys[oom_subsys_id],
> > +                                       struct oom_cgroup, css))->victim;
>
> Firstly, this ought to be using the task_subsys_state() function to
> ensure the appropriate rcu_dereference() calls.
>

Ok.

> Secondly, is it safe? I'm not sure if we're in an RCU section in this
> case, and we certainly haven't called task_lock(p) or cgroup_lock().
> You should surround this with rcu_read_lock()/rcu_read_unlock().
>

Ok.
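A minimal userspace sketch of the read pattern requested here, assuming hypothetical stand-ins: struct task_model and struct oom_cgroup_model replace struct task_struct and struct oom_cgroup, and model_rcu_read_lock()/model_rcu_read_unlock() only count nesting depth so the pairing can be checked. The value is copied into a local inside the read-side section and returned only after the section ends. Note that the task_oom_priority() in the revised patch below returns atomic_read(...) directly from inside the section, so its rcu_read_unlock() is never reached.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct oom_cgroup and struct task_struct. */
struct oom_cgroup_model {
	uint64_t effective_priority;
};

struct task_model {
	struct oom_cgroup_model *oom_css;  /* what task_subsys_state() would yield */
};

/* Models rcu_read_lock() nesting; a balanced reader leaves it at 0. */
static int model_rcu_depth;

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { model_rcu_depth--; }

static uint64_t task_oom_priority_model(const struct task_model *p)
{
	uint64_t priority;

	model_rcu_read_lock();
	/* Dereference the task's css only inside the read-side section. */
	priority = p->oom_css->effective_priority;
	model_rcu_read_unlock();

	/* Return after unlocking, so the unlock is always reached. */
	return priority;
}
```

In kernel code the same shape would put the atomic_read() of effective_priority, reached through task_subsys_state(p, oom_subsys_id), between rcu_read_lock() and rcu_read_unlock(), and return the saved value afterwards.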

> And thirdly, it would be better to move the #ifdef to the header file,
> and provide dummy functions that return 0 for the kill priority if
> CONFIG_CGROUP_OOM isn't defined.
>

Ok. Since this patch uses 0 to disable OOM killing completely, the dummy function 
should return 1 instead of 0. This should be documented more clearly.
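A sketch of the header layout discussed here, written so it also compiles as userspace C: u64 is a local typedef and struct task_struct is left opaque. A static inline stub, rather than a #define, keeps type checking of the argument when CONFIG_CGROUP_OOM is off, and it returns 1 because 0 means "never kill". oom_honour_cpuset_constraint() is a hypothetical accessor, not part of the patch; using it avoids defining a static variable in the header, as the revised patch below does in its #else branch.

```c
#include <stdint.h>

typedef uint64_t u64;        /* local stand-in for the kernel's u64 */
struct task_struct;          /* opaque: only pointers are passed around */

#ifdef CONFIG_CGROUP_OOM
u64 task_oom_priority(struct task_struct *p);
int oom_honour_cpuset_constraint(void);     /* hypothetical accessor */
#else
/*
 * Without the controller every task looks as if its cgroup had priority 1.
 * Returning 0 here would instead mean "never OOM-kill this task".
 */
static inline u64 task_oom_priority(struct task_struct *p)
{
	(void)p;
	return 1;
}

/* Without the controller the cpuset ordering override is never enabled. */
static inline int oom_honour_cpuset_constraint(void)
{
	return 0;
}
#endif
```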

> > +               honour_cpuset_constraint = *(container_of(p->cgroups->subsys[oom_subsys_id],
> > +                                                struct oom_cgroup, css))->cpuset_constraint;
>
> I think that putting this kind of inter-subsystem dependency in is a
> bad idea. If you want to control whether the OOM killer treats cpusets
> specially, perhaps that flag should be put in cpusets?
>

But wouldn't that add a variable to cpusets that exists only for the OOM controller?

> > +
> > +               if (taskvictim > chosenvictim ||
> > +                       (((taskvictim == chosenvictim) ||
> > +                               (cpuset_constrained && honour_cpuset_constraint))
> > +                                && points > *ppoints) ||
> > +                       (taskvictim && !chosen)) {
>
> This could do with more comments or maybe breaking up into simpler
> conditions.
>

Ok.
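A userspace model of the selection test, split into the three cases the compound condition covers. The types and function names are hypothetical; oom_pick_victim() mimics the loop in select_bad_process(), including its starting state (chosen priority 1, zero points, no task chosen). The early returns are equivalent to the original || expression, so each case can be read and checked on its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one candidate task. */
struct oom_task {
	uint64_t priority;      /* oom priority of the task's cgroup */
	unsigned long points;   /* badness() score */
};

/* Selection state, initialised as in the patch: priority 1, 0 points. */
struct oom_choice {
	bool have_chosen;
	uint64_t priority;
	unsigned long points;
};

static bool oom_should_select(const struct oom_task *t,
			      const struct oom_choice *c,
			      bool cpuset_constrained,
			      bool honour_cpuset_constraint)
{
	/* Case 1: a strictly higher priority always wins. */
	if (t->priority > c->priority)
		return true;

	/* Case 2: nothing chosen yet and this task is killable (priority != 0). */
	if (!c->have_chosen && t->priority != 0)
		return true;

	/* Case 3: equal priority, or ordering switched off for a
	 * cpuset-constrained OOM: fall back to the badness score. */
	if (t->priority == c->priority ||
	    (cpuset_constrained && honour_cpuset_constraint))
		return t->points > c->points;

	return false;
}

/* Returns the index of the task that would be chosen, or -1 if none. */
static long oom_pick_victim(const struct oom_task *tasks, size_t n,
			    bool cpuset_constrained, bool honour)
{
	struct oom_choice c = { .have_chosen = false, .priority = 1, .points = 0 };
	long victim = -1;

	for (size_t i = 0; i < n; i++) {
		if (oom_should_select(&tasks[i], &c, cpuset_constrained, honour)) {
			victim = (long)i;
			c.have_chosen = true;
			c.priority = tasks[i].priority;
			c.points = tasks[i].points;
		}
	}
	return victim;
}
```

Splitting the condition this way also makes one behaviour easy to see: when both cpuset_constrained and honour_cpuset_constraint are true, the badness comparison ignores priority entirely, so oom_pick_victim() given a single task with priority 0 returns that task, whereas with either flag false it returns -1.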

> > +       if (cont->parent == NULL) {
> > +               oom_css->victim = 1;
>
> Any reason to default to 1 rather than 0?
>

0 disables oom killing completely.

> > +               oom_css->cpuset_constraint =
> > +                       kzalloc(sizeof(*oom_css->cpuset_constraint), GFP_KERNEL);
> > +               *oom_css->cpuset_constraint = false;
> > +       } else {
> > +               parent = oom_css_from_cgroup(cont->parent);
> > +               oom_css->victim = parent->victim;
> > +               oom_css->cpuset_constraint = parent->cpuset_constraint;
> > +       }
>
> So there's a single cpuset_constraint shared by all cgroups? Isn't
> that just a global variable then?
>

Yes, it should be a global variable.

> > +
> > +static int oom_victim_write(struct cgroup *cgrp, struct cftype *cft,
> > +                                       u64 val)
> > +{
> > +
> > +        cgroup_lock();
>
> This isn't really doing much, since you don't synchronize on the read
> side (either the file handler or in the OOM killer itself). It might
> be better to just make the value an atomic_t and avoid taking
> cgroup_lock() here.
>

Yes.
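A userspace sketch of the lock-free approach suggested here, using C11 atomics; struct oom_cgroup_model and the handler names are hypothetical. Since the kernel's atomic_t wraps an int while a write_u64 handler receives a u64, the model uses atomic_int and rejects values that do not fit. The revised patch below stores the u64 with atomic_set() without such a check.

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct oom_cgroup; atomic_int mirrors atomic_t. */
struct oom_cgroup_model {
	atomic_int priority;
};

/* Model of a write_u64 handler: validate, then one atomic store, no lock. */
static int oom_priority_write_model(struct oom_cgroup_model *oc, uint64_t val)
{
	if (val > INT_MAX)	/* would be truncated by an int-sized atomic */
		return -EINVAL;
	atomic_store(&oc->priority, (int)val);
	return 0;
}

/* Model of a read_u64 handler: one atomic load, no lock. */
static uint64_t oom_priority_read_model(struct oom_cgroup_model *oc)
{
	return (uint64_t)atomic_load(&oc->priority);
}
```

A single atomic load or store cannot be torn, which is all an individual reader needs. It does not make a multi-step update, such as propagating effective priorities through the hierarchy, atomic as a whole.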

> Should we enforce any constraint that a cgroup can never have a lower
> kill priority than its parent? Or a separate "min child priority"
> value, or just make the cgroup's priority be the max of any in its
> path to the root? That would allow you to safely delegate OOM priority
> control to sub cgroups while still controlling relative priorities for
> each subtree.
>

Making the effective priority the maximum of any priority along its path seems 
best to me. It should make it easier to manage a group of cgroups as a unit.
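A userspace model of the "maximum along the path" rule, with hypothetical names. Rather than separate increase and decrease walks, it recomputes the written cgroup and then its whole subtree, which treats both directions the same way. It is single-threaded and recursive; in the kernel the walk would also need hierarchy locking (the revised patch takes oom_subsys.hierarchy_mutex), and a non-recursive traversal may be preferable.

```c
#include <stdint.h>

/* Hypothetical model of one cgroup in the hierarchy. */
struct oom_node {
	struct oom_node *parent;
	struct oom_node *first_child;
	struct oom_node *next_sibling;
	uint64_t priority;		/* oom.priority, written by the user */
	uint64_t effective_priority;	/* max oom.priority from the root down */
};

/* Recompute effective_priority for @node and then for all its descendants. */
static void oom_update_effective(struct oom_node *node)
{
	uint64_t eff = node->priority;

	if (node->parent && node->parent->effective_priority > eff)
		eff = node->parent->effective_priority;
	node->effective_priority = eff;

	for (struct oom_node *c = node->first_child; c; c = c->next_sibling)
		oom_update_effective(c);
}

/* Model of an oom.priority write: store the value, then propagate. */
static void oom_set_priority(struct oom_node *node, uint64_t val)
{
	node->priority = val;
	oom_update_effective(node);
}
```

One consequence of this rule: a cgroup's effective priority is 0 only if oom.priority is 0 on every cgroup from the root down to it. Since the revised patch creates the root with priority 1, exempting a subtree from OOM killing would also require setting the root, and every intermediate cgroup, to 0.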

> > +static int oom_cpuset_write(struct cgroup *cont, struct cftype *cft,
> > +                            const char *buffer)
> > +{
> > +       if (buffer[0] == '1' && buffer[1] == 0)
> > +               *(oom_css_from_cgroup(cont))->cpuset_constraint = true;
> > +       else if (buffer[0] == '0' && buffer[1] == 0)
> > +               *(oom_css_from_cgroup(cont))->cpuset_constraint = false;
> > +       else
> > +               return -EINVAL;
> > +       return 0;
> > +}
>
> This can be a u64 write handler that just complains if its input isn't 0 or
> 1.
>

Yes, that would be cleaner.
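A userspace model of the u64 handler suggested here, with hypothetical names; a static atomic_int stands in for the global flag. The cgroup core parses the written text into a number before it calls a write_u64 handler, so the handler itself only needs a range check.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the global honour_cpuset_constraint flag. */
static atomic_int honour_cpuset_constraint_model;

/* Model of a write_u64 handler for a boolean file: accept only 0 or 1. */
static int oom_cpuset_write_model(uint64_t val)
{
	if (val > 1)
		return -EINVAL;
	atomic_store(&honour_cpuset_constraint_model, (int)val);
	return 0;
}
```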

> > +static struct cftype oom_cgroup_files[] = {
> > +       {
> > +               .name = "victim",
> > +               .read_u64 = oom_victim_read,
> > +               .write_u64 = oom_victim_write,
> > +       },
> > +};
> > +
> > +static struct cftype oom_cgroup_root_files[] = {
> > +       {
> > +               .name = "victim",
> > +               .read_u64 = oom_victim_read,
> > +               .write_u64 = oom_victim_write,
> > +       },
>
> Don't duplicate here - just have disjoint sets of files, and call
> cgroup_add_files(oom_cgroup_root_files) in addition to the regular
> files if it's the root. (Although as I mentioned above, I don't really
> think this is the right place for the cpuset_constraint file)
>

Ok.

Thanks for the detailed review. I have attached a patch that incorporates your 
comments. It also adds a read-only oom.effective_priority file, computed as the 
maximum oom.priority along the cgroup's path from the root.

Thanks
Nikanth

From: Nikanth Karthikesan <knika...@suse.de>

Cgroup based OOM killer controller

Signed-off-by: Nikanth Karthikesan <knika...@suse.de>

---

This is a container group based approach to overriding the oom killer's selection 
without losing the benefits of the current oom killer heuristics and the oom_adj 
interface. This controller helps in specifying a strict order in which tasks 
can be killed during an OOM.

It adds a tunable oom.priority to the oom cgroup. The oom killer still selects the 
process by the usual badness value, but only within the cgroup with the highest 
oom.effective_priority, before killing any process from a cgroup with a lower 
oom.effective_priority. oom.effective_priority is the maximum oom.priority along 
the cgroup's path from the root. OOM killing is disabled for a cgroup when its 
oom.effective_priority is 0, i.e. when oom.priority is 0 for that cgroup and all 
of its ancestors.

diff --git a/Documentation/cgroups/oom.txt b/Documentation/cgroups/oom.txt
new file mode 100644
index 0000000..5ef34db
--- /dev/null
+++ b/Documentation/cgroups/oom.txt
@@ -0,0 +1,36 @@
+OOM Killer controller
+--- ------ ----------
+
+The OOM killer selects the process to kill using a set of heuristics such
+that the minimum amount of work is lost, a large amount of memory is
+recovered and the minimum number of processes is killed.
+
+The user can adjust the score used to select the processes to be killed using
+/proc/<pid>/oom_adj. Giving it a high score will increase the likelihood of 
+this process being killed by the oom-killer.  Valid values are in the range 
+-16 to +15, plus the special value -17, which disables oom-killing altogether
+for that process.
+
+However, it is very difficult to specify an order in which tasks should be
+killed in an out-of-memory situation. The OOM killer controller helps here.
+
+USAGE
+-----
+
+Mount the oom controller by passing 'oom' when mounting cgroups. Echo
+a value into the oom.priority file to change the order. The read-only
+oom.effective_priority is the highest oom.priority along the cgroup's path.
+The oom killer kills all the processes in a cgroup with a higher
+oom.effective_priority before killing a process in a cgroup with a lower
+oom.effective_priority value. Among tasks with the same oom.effective_priority
+value, the usual badness heuristics are applied, and /proc/<pid>/oom_adj still
+adjusts the oom killer score. An oom.effective_priority of 0 disables oom
+killing for the tasks in that cgroup.
+
+Note: If this is used without proper consideration, innocent processes may
+get killed unnecessarily.
+
+CPUSET constrained OOM:
+Setting oom.cpuset_constraint=1 would disable the ordering during a cpuset
+constrained oom. Setting oom.cpuset_constraint=0 would not distinguish
+between a cpuset constrained oom and system wide oom.
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 9c8d31b..6944f99 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -59,4 +59,8 @@ SUBSYS(freezer)
 SUBSYS(net_cls)
 #endif
 
+#ifdef CONFIG_CGROUP_OOM
+SUBSYS(oom)
+#endif
+
 /* */
diff --git a/include/linux/oomcontrol.h b/include/linux/oomcontrol.h
new file mode 100644
index 0000000..8072d7a
--- /dev/null
+++ b/include/linux/oomcontrol.h
@@ -0,0 +1,35 @@
+#ifndef _LINUX_OOMCONTROL_H
+#define _LINUX_OOMCONTROL_H
+
+#ifdef CONFIG_CGROUP_OOM
+
+struct oom_cgroup { 
+       struct cgroup_subsys_state css;
+
+       /*
+        * the order to be victimized for this group
+        */  
+       atomic_t priority;
+
+       /*
+        * the maximum priority along the path from root
+        */  
+       atomic_t effective_priority;
+
+};
+
+/*
+ * disable during cpuset constrained oom
+ */
+extern atomic_t honour_cpuset_constraint;
+
+u64 task_oom_priority(struct task_struct *p);
+
+#else
+
+#define task_oom_priority(p) (1)
+
+static atomic_t honour_cpuset_constraint; /* always 0 */
+
+#endif
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index 2af8382..99ed0de 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -354,6 +354,15 @@ config CGROUP_DEBUG
 
          Say N if unsure.
 
+config CGROUP_OOM
+       bool "Oom cgroup subsystem"
+       depends on CGROUPS
+       help
+         This provides a cgroup subsystem which helps control
+         the order in which tasks should be killed during
+         out of memory situations.
+       
+
 config CGROUP_NS
        bool "Namespace cgroup subsystem"
        depends on CGROUPS
diff --git a/mm/Makefile b/mm/Makefile
index 72255be..a5d7222 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -33,3 +33,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
+obj-$(CONFIG_CGROUP_OOM) += oomcontrol.o 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 40ba050..6851da3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -26,6 +26,7 @@
 #include <linux/module.h>
 #include <linux/notifier.h>
 #include <linux/memcontrol.h>
+#include <linux/oomcontrol.h>
 #include <linux/security.h>
 
 int sysctl_panic_on_oom;
@@ -200,11 +201,13 @@ static inline enum oom_constraint constrained_alloc(struct zonelist *zonelist,
  * (not docbooked, we don't want this one cluttering up the manual)
  */
 static struct task_struct *select_bad_process(unsigned long *ppoints,
-                                               struct mem_cgroup *mem)
+                       struct mem_cgroup *mem, int cpuset_constrained)
 {
        struct task_struct *g, *p;
        struct task_struct *chosen = NULL;
        struct timespec uptime;
+       u64 chosenpriority = 1, taskpriority;
+
        *ppoints = 0;
 
        do_posix_clock_monotonic_gettime(&uptime);
@@ -257,10 +260,35 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
                        continue;
 
                points = badness(p, uptime.tv_sec);
-               if (points > *ppoints || !chosen) {
+
+               taskpriority = task_oom_priority(p);
+
+               /*
+                * select this task if
+                * 1. It has higher oom.priority than the previously selected
+                * task, or
+                * 2. It has the same priority as previously selected task but
+                * higher badness score, or
+                * 3. No task has been selected yet and this one is not
+                * protected from the oom killer by a priority of zero, or
+                * 4. This is a cpuset constrained oom, honour_cpuset_constraint
+                * is set and it has a higher badness score than the chosen task
+                */
+               if (taskpriority > chosenpriority ||
+
+                       (((taskpriority == chosenpriority) ||
+                         (cpuset_constrained &&
+                               atomic_read(&honour_cpuset_constraint)))
+                        && points > *ppoints) ||
+
+                       (taskpriority && !chosen)) {
+
                        chosen = p;
                        *ppoints = points;
+                       chosenpriority = taskpriority;
+
                }
+               
        } while_each_thread(g, p);
 
        return chosen;
@@ -431,7 +459,7 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
 
        read_lock(&tasklist_lock);
 retry:
-       p = select_bad_process(&points, mem);
+       p = select_bad_process(&points, mem, 0); /* not cpuset constrained */
        if (PTR_ERR(p) == -1UL)
                goto out;
 
@@ -513,7 +541,7 @@ void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_mask)
 /*
  * Must be called with tasklist_lock held for read.
  */
-static void __out_of_memory(gfp_t gfp_mask, int order)
+static void __out_of_memory(gfp_t gfp_mask, int order, int cpuset_constrained)
 {
        if (sysctl_oom_kill_allocating_task) {
                oom_kill_process(current, gfp_mask, order, 0, NULL,
@@ -528,7 +556,7 @@ retry:
                 * Rambo mode: Shoot down a process and hope it solves whatever
                 * issues we may have.
                 */
-               p = select_bad_process(&points, NULL);
+               p = select_bad_process(&points, NULL, cpuset_constrained);
 
                if (PTR_ERR(p) == -1UL)
                        return;
@@ -569,7 +597,8 @@ void pagefault_out_of_memory(void)
               panic("out of memory from page fault. panic_on_oom is selected.\n");
 
        read_lock(&tasklist_lock);
-       __out_of_memory(0, 0); /* unknown gfp_mask and order */
+       /* unknown gfp_mask and order and not cpuset constrained */
+       __out_of_memory(0, 0, 0); 
        read_unlock(&tasklist_lock);
 
        /*
@@ -623,7 +652,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order)
                        panic("out of memory. panic_on_oom is selected\n");
                /* Fall-through */
        case CONSTRAINT_CPUSET:
-               __out_of_memory(gfp_mask, order);
+               __out_of_memory(gfp_mask, order, 1);
                break;
        }
 
diff --git a/mm/oomcontrol.c b/mm/oomcontrol.c
new file mode 100644
index 0000000..d572b1f
--- /dev/null
+++ b/mm/oomcontrol.c
@@ -0,0 +1,294 @@
+/*
+ * mm/oomcontrol.c - oom handler cgroup.
+ */
+
+#include <linux/cgroup.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/oomcontrol.h>
+#include <asm/atomic.h>
+
+atomic_t honour_cpuset_constraint;
+
+/*
+ * Helper to retrieve oom controller data from cgroup
+ */
+static struct oom_cgroup *oom_css_from_cgroup(struct cgroup *cgrp)
+{
+        return container_of(cgroup_subsys_state(cgrp,
+                                oom_subsys_id), struct oom_cgroup,
+                                css);
+}
+
+u64 task_oom_priority(struct task_struct *p)
+{
+       rcu_read_lock();
+       return atomic_read(&(container_of(task_subsys_state(p,oom_subsys_id),
+                               struct oom_cgroup, css))->effective_priority);
+       rcu_read_unlock();
+}
+
+static struct cgroup_subsys_state *oom_create(struct cgroup_subsys *ss,
+                                                  struct cgroup *cont)
+{
+       struct oom_cgroup *oom_css = kzalloc(sizeof(*oom_css), GFP_KERNEL);
+       struct oom_cgroup *parent;
+       u64 parent_priority, parent_effective_priority;
+
+       if (!oom_css)
+               return ERR_PTR(-ENOMEM);
+
+       /*
+        * if root last/only group to be victimized
+        * else inherit parents value
+        */
+       if (cont->parent == NULL) {
+               atomic_set(&oom_css->priority, 1);
+               atomic_set(&oom_css->effective_priority, 1);
+               atomic_set(&honour_cpuset_constraint, 0);
+       } else {
+               parent = oom_css_from_cgroup(cont->parent);
+               parent_priority = atomic_read(&parent->priority);
+               parent_effective_priority = 
+                       atomic_read(&parent->effective_priority);
+               atomic_set(&oom_css->priority, parent_priority);
+               atomic_set(&oom_css->effective_priority,
+                                       parent_effective_priority);
+       }
+
+       return &oom_css->css;
+}
+
+static void oom_destroy(struct cgroup_subsys *ss, struct cgroup *cont)
+{
+       kfree(cont->subsys[oom_subsys_id]);
+}
+
+static void increase_effective_priority(struct cgroup *cgrp, u64 val)
+{
+       struct cgroup *curr;
+       struct oom_cgroup *oom_css;
+
+       atomic_set( &(oom_css_from_cgroup(cgrp))->effective_priority, val);
+
+       mutex_lock(&oom_subsys.hierarchy_mutex);
+
+       /*
+        * DFS
+        */
+       if (!list_empty(&cgrp->children))
+               curr = list_first_entry(&cgrp->children,
+                                       struct cgroup, sibling);
+       else
+               goto out;
+
+visit_children:
+       oom_css = oom_css_from_cgroup(curr);
+       if (atomic_read(&oom_css->effective_priority) < val)
+               atomic_set(&oom_css->effective_priority, val);
+
+       if (!list_empty(&curr->children)) {
+               curr = list_first_entry(&curr->children,
+                                       struct cgroup, sibling);
+               goto visit_children;
+       } else {
+visit_siblings:
+               if (curr == 0 || cgrp == curr) goto out;
+
+               if (curr->sibling.next != &curr->parent->children) {
+                       curr = list_entry(curr->sibling.next,
+                                               struct cgroup, sibling);
+                       goto visit_children;
+               } else {
+                       curr = curr->parent;
+                       goto visit_siblings;
+               }
+       }
+out:
+       mutex_unlock(&oom_subsys.hierarchy_mutex);
+
+}
+
+static void decrease_effective_priority(struct cgroup *cgrp, u64 val)
+{
+       struct cgroup *curr;
+       u64 priority, effective_priority;
+
+
+       effective_priority = val;
+
+       atomic_set(&oom_css_from_cgroup(cgrp)->effective_priority,
+                                                       effective_priority);
+
+       mutex_lock(&oom_subsys.hierarchy_mutex);
+
+       /*
+        * DFS
+        */
+       if (!list_empty(&cgrp->children))
+               curr = list_first_entry(&cgrp->children,
+                                       struct cgroup, sibling);
+       else
+               goto out;
+
+visit_children:
+       priority = atomic_read(&oom_css_from_cgroup(curr)->priority);
+
+       if (priority > effective_priority) {
+               atomic_set(&oom_css_from_cgroup(curr)->
+                                       effective_priority, priority);
+               effective_priority = priority;
+       } else 
+               atomic_set(&oom_css_from_cgroup(curr)->
+                               effective_priority,effective_priority);
+
+       if (!list_empty(&curr->children)) {
+               curr = list_first_entry(&curr->children,
+                                               struct cgroup, sibling);
+               goto visit_children;
+       } else {
+visit_siblings:
+               if (curr == 0 || cgrp == curr)
+                       goto out;
+
+               if (curr->parent)
+                               effective_priority =
+                         atomic_read(&oom_css_from_cgroup(
+                          curr->parent)->effective_priority);
+               else
+                       effective_priority = val;
+
+               if (curr->sibling.next != &curr->parent->children) {
+                       curr = list_entry(curr->sibling.next,
+                                               struct cgroup, sibling);
+                       goto visit_children;
+               } else {
+                       curr = curr->parent;
+                       goto visit_siblings;
+               }
+       }
+out:
+                               
+               mutex_unlock(&oom_subsys.hierarchy_mutex);
+
+}
+
+static int oom_priority_write(struct cgroup *cgrp, struct cftype *cft,
+                                       u64 val)
+{
+       u64 effective_priority;
+       u64 old_priority;
+       u64 parent_effective_priority = 0;
+
+       old_priority = atomic_read(&(oom_css_from_cgroup(cgrp))->priority);
+       atomic_set(&(oom_css_from_cgroup(cgrp))->priority, val);
+
+       effective_priority = atomic_read(
+                       &(oom_css_from_cgroup(cgrp))->effective_priority);
+
+       /*
+        * propagate new effective_priority to sub cgroups
+        */
+       if (val > effective_priority)
+               increase_effective_priority(cgrp, val);
+       else if (effective_priority == old_priority &&
+                                               val < effective_priority) {
+               struct oom_cgroup *oom_css = NULL;
+               if (cgrp->parent)
+                       oom_css = oom_css_from_cgroup(cgrp->parent);
+               else
+                       oom_css = oom_css_from_cgroup(cgrp);
+
+               if (cgrp->parent)
+                       parent_effective_priority =
+                               atomic_read(&oom_css->effective_priority);
+                       
+               if (cgrp->parent == NULL || 
+                               parent_effective_priority < effective_priority) {
+                       /*
+                        * set effective_priority to max of parent's effective and
+                        * new priority
+                        */
+                       if (cgrp->parent == NULL || effective_priority < val
+                                       || parent_effective_priority < val)
+                               effective_priority = val;
+                       else
+                               effective_priority = parent_effective_priority;
+
+                       decrease_effective_priority(cgrp, effective_priority);
+
+               } 
+       }
+        return 0;
+}
+
+static u64 oom_effective_priority_read(struct cgroup *cgrp, struct cftype *cft)
+{
+        u64 priority = atomic_read(&(oom_css_from_cgroup(cgrp))->effective_priority);
+
+        return priority;
+}
+
+static u64 oom_priority_read(struct cgroup *cgrp, struct cftype *cft)
+{
+        u64 priority = atomic_read(&(oom_css_from_cgroup(cgrp))->priority);
+
+        return priority;
+}
+
+static int oom_cpuset_write(struct cgroup *cgrp, struct cftype *cft,
+                                       u64 val)
+{
+       if (val > 1)
+               return -EINVAL;
+       atomic_set(&honour_cpuset_constraint, val);
+       return 0;
+}
+
+static u64 oom_cpuset_read(struct cgroup *cgrp, struct cftype *cft)
+{
+        return atomic_read(&honour_cpuset_constraint);
+}
+
+static struct cftype oom_cgroup_files[] = {
+       {
+               .name = "priority",
+               .read_u64 = oom_priority_read,
+               .write_u64 = oom_priority_write,
+       },
+       {
+               .name = "effective_priority",
+               .read_u64 = oom_effective_priority_read,
+       },
+};
+
+static struct cftype oom_cgroup_root_only_files[] = {
+       {
+               .name = "cpuset_constraint",
+               .read_u64 = oom_cpuset_read,
+               .write_u64 = oom_cpuset_write,
+       },
+};
+
+static int oom_populate(struct cgroup_subsys *ss,
+                                struct cgroup *cont)
+{
+       int ret;
+
+       ret = cgroup_add_files(cont, ss, oom_cgroup_files,
+                               ARRAY_SIZE(oom_cgroup_files));
+       if (!ret && cont->parent == NULL) {
+               ret = cgroup_add_files(cont, ss, oom_cgroup_root_only_files,
+                               ARRAY_SIZE(oom_cgroup_root_only_files));
+       }
+
+       return ret;
+}
+
+struct cgroup_subsys oom_subsys = {
+       .name = "oom",
+       .subsys_id = oom_subsys_id,
+       .create = oom_create,
+       .destroy = oom_destroy,
+       .populate = oom_populate,
+};

_______________________________________________
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

_______________________________________________
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel
