On Wed, 6 Jun 2007 17:28:13 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:
> Hi, Andrew/Paul, > > Here's the latest version of containerstats ported to v10. Could you > please consider it for inclusion > > Changelog > > 1. Instead of parsing long container path's use the dentry to match the > container for which stats are required. The user space application > opens the container directory and passes the file descriptor, which > is used to determine the container for which stats are required. > This approach was suggested by Paul Menage > > This patch is inspired by the discussion at http://lkml.org/lkml/2007/4/11/187 > and implements per container statistics as suggested by Andrew Morton > in http://lkml.org/lkml/2007/4/11/263. The patch is on top of 2.6.21-mm1 > with Paul's containers v9 patches (forward ported) > > This patch implements per container statistics infrastructure and re-uses > code from the taskstats interface. A new set of container operations are > registered with commands and attributes. It should be very easy to > *extend* per container statistics, by adding members to the containerstats > structure. > > The current model for containerstats is a pull, a push model (to post > statistics on interesting events), should be very easy to add. Currently > user space requests for statistics by passing the container file descriptor. > Statistics about the state of all the tasks in the container is returned to > user space. > > TODO's/NOTE: > > This patch provides an infrastructure for implementing container statistics. > Based on the needs of each controller, we can incrementally add more > statistics, > event based support for notification of statistics, accumulation of taskstats > into container statistics in the future. > > Sample output > > # ./containerstats -C /container/a > sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0 > > # ./containerstats -C /container/ > sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0 > > If the approach looks good, I'll enhance and post the user space utility for > the same > > Feedback, comments, test results are always welcome! > > > > Signed-off-by: Balbir Singh <[EMAIL PROTECTED]> > --- > > Documentation/accounting/containerstats.txt | 27 ++++++++++ > include/linux/Kbuild | 1 > include/linux/container.h | 8 +++ > include/linux/containerstats.h | 70 > ++++++++++++++++++++++++++++ > include/linux/delayacct.h | 11 ++++ > kernel/container.c | 63 +++++++++++++++++++++++++ > kernel/sched.c | 4 + > kernel/taskstats.c | 66 ++++++++++++++++++++++++++ > 8 files changed, 250 insertions(+) I'd have hoped to see containerstats.c in here. > diff -puN /dev/null include/linux/containerstats.h > --- /dev/null 2007-06-01 20:42:04.000000000 +0530 > +++ linux-2.6.22-rc2-mm1-balbir/include/linux/containerstats.h > 2007-06-05 17:23:56.000000000 +0530 > @@ -0,0 +1,70 @@ > +/* containerstats.h - exporting per-container statistics > + * > + * __ Copyright IBM Corporation, 2007 > + * Author Balbir Singh <[EMAIL PROTECTED]> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of version 2.1 of the GNU Lesser General Public License > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it would be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > + */ > + > +#ifndef _LINUX_CONTAINERSTATS_H > +#define _LINUX_CONTAINERSTATS_H > + > +#include <linux/taskstats.h> I don't understand the relationship between containerstats and taskstats. afacit it's using the same genetlink channel? > +/* > + * Data shared between user space and kernel space on a per container > + * basis. This data is shared using taskstats. > + * > + * Most of these states are derived by looking at the task->state value > + * For the nr_io_wait state, a flag in the delay accounting structure > + * indicates that the task is waiting on IO > + * > + * Each member is aligned to a 8 byte boundary. > + */ > +struct containerstats { > + __u64 nr_sleeping; /* Number of tasks sleeping */ > + __u64 nr_running; /* Number of tasks running */ > + __u64 nr_stopped; /* Number of tasks in stopped state */ > + __u64 nr_uninterruptible; /* Number of tasks in uninterruptible */ > + /* state */ > + __u64 nr_io_wait; /* Number of tasks waiting on IO */ > +}; > + > +/* > + * Commands sent from userspace > + * Not versioned. New commands should only be inserted at the enum's end > + * prior to __CONTAINERSTATS_CMD_MAX > + */ > + > +enum { > + CONTAINERSTATS_CMD_UNSPEC = __TASKSTATS_CMD_MAX, /* Reserved */ This seems to mean that the containerstats commands all get renumbered if we add new taskstats commands. That would be bad? > + */ > +int containerstats_build(struct containerstats *stats, struct dentry *dentry) > +{ > + int ret = -EINVAL; > + struct task_struct *g, *p; > + struct container *cont, *root_cont; > + struct container *src_cont; > + int subsys_id; > + struct containerfs_root *root; > + > + /* > + * Validate dentry by checking the superblock operations > + */ > + if (dentry->d_sb->s_op != &container_ops) > + goto err; > + > + ret = 0; > + src_cont = (struct container *)dentry->d_fsdata; Unneeded cast. > + rcu_read_lock(); > + > + for_each_root(root) { > + if (!root->subsys_bits) > + continue; > + root_cont = &root->top_container; > + get_first_subsys(root_cont, NULL, &subsys_id); > + do_each_thread(g, p) { this needs tasklist_lock? > + cont = task_container(p, subsys_id); > + if (cont == src_cont) { > + switch (p->state) { > + case TASK_RUNNING: > + stats->nr_running++; > + break; > + case TASK_INTERRUPTIBLE: > + stats->nr_sleeping++; > + break; > + case TASK_UNINTERRUPTIBLE: > + stats->nr_uninterruptible++; > + break; > + case TASK_STOPPED: > + stats->nr_stopped++; > + break; > + default: > + if (delayacct_is_task_waiting_on_io(p)) > + stats->nr_io_wait++; > + break; > + } > + } > + } while_each_thread(g, p); > + } > + rcu_read_unlock(); > +err: > + return ret; > +} > + > static int cmppid(const void *a, const void *b) > { > return *(pid_t *)a - *(pid_t *)b; > diff -puN kernel/sched.c~containers-taskstats kernel/sched.c > --- linux-2.6.22-rc2-mm1/kernel/sched.c~containers-taskstats 2007-06-05 > 17:21:57.000000000 +0530 > +++ linux-2.6.22-rc2-mm1-balbir/kernel/sched.c 2007-06-05 > 17:21:57.000000000 +0530 > @@ -4280,11 +4280,13 @@ void __sched io_schedule(void) > { > struct rq *rq = &__raw_get_cpu_var(runqueues); > > + delayacct_set_flag(DELAYACCT_PF_BLKIO); > delayacct_blkio_start(); Would it be suitable and appropriate to embed the delayacct_set_flag() call inside delayacct_blkio_start()? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/