Shailabh Nagar wrote:
So here's the sequence of pids being used/hashed etc. Please let
me know if my assumptions are correct?
1. Same listener thread opens 2 sockets
On sockfd1, does a bind() using
sockaddr_nl.nl_pid = my_pid1
On sockfd2, does a bind() using
sockaddr_nl.nl_pid = my_pid2
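A minimal user-space sketch of that sequence, assuming NETLINK_GENERIC and treating my_pid1/my_pid2 as caller-chosen ids (they only need to be unique among the process's netlink sockets; they are not real process ids):

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

static int open_bound_sock(unsigned int nl_pid)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = nl_pid;	/* caller-chosen; bind fails with EADDRINUSE on collision */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

int main(void)
{
	/* my_pid1 = the thread's real pid; my_pid2 = any non-colliding value */
	int sockfd1 = open_bound_sock(getpid());
	int sockfd2 = open_bound_sock(getpid() + (1 << 16));

	return (sockfd1 < 0 || sockfd2 < 0);
}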
Shailabh Nagar wrote:
> Yes. If no one registers to listen on a particular CPU, data from tasks
> exiting on that cpu is not sent out at all.
Shailabh also wrote:
> During task exit, kernel goes through each registered listener (small
> list) and decides which
> one needs to get this exit data
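A sketch of the exit-time dispatch described there: a small per-cpu list of registered listeners, walked when a task exits on that cpu, with one unicast per listener. The names (struct listener, listener_list) and the pre-namespace genlmsg_unicast() signature follow the thread's vocabulary but are illustrative, not the final code; locking is elided:

struct listener {
	struct list_head list;
	pid_t pid;	/* nl_pid the listener registered with */
};

static DEFINE_PER_CPU(struct list_head, listener_list);

/* Called from the exit path: send this task's exit data to every
 * listener registered for the cpu it exited on. */
static void send_cpu_listeners(struct sk_buff *skb, unsigned int cpu)
{
	struct listener *s;
	struct sk_buff *skb_cur;

	list_for_each_entry(s, &per_cpu(listener_list, cpu), list) {
		skb_cur = skb_clone(skb, GFP_KERNEL);
		if (skb_cur)
			genlmsg_unicast(skb_cur, s->pid);
	}
}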
pj wrote:
> writes the code gets to
Never mind that last incomplete post - I hit Send
when I meant to hit Cancel.
Andrew wrote:
> OK, so we're passing in an ASCII string. Fair enough, I think. Paul would
> know better.
Not sure if I know better - just got stronger opinions.
I like the ASCII here - but this is one of those "he who
writes the code gets to
Shailabh wrote:
> Perhaps I should use the other ascii format for specifying cpumasks
> since it's more amenable
> to specifying an upper bound for the length of the ascii string and is
> more compact?
Eh - basically - I don't have a strong opinion either way.
I have a slight esthetic prefe
Shailabh,
On Tue, 2006-04-07 at 12:37 -0400, Shailabh Nagar wrote:
[..]
> Here's a strawman for the problem we're trying to solve: get
> notification of the close of a NETLINK_GENERIC socket that had
> been used to register interest for some cpus within taskstats.
>
> From looking at the netlink
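One way to get that close notification, sketched with the kernel's netlink notifier chain (NETLINK_URELEASE fires when a netlink socket is released). unregister_listener() is a hypothetical cleanup helper, and whether this mechanism was actually adopted is not settled in this thread:

#include <linux/netlink.h>
#include <linux/notifier.h>

static int taskstats_nl_event(struct notifier_block *nb,
			      unsigned long event, void *ptr)
{
	struct netlink_notify *n = ptr;

	if (event == NETLINK_URELEASE && n->protocol == NETLINK_GENERIC)
		unregister_listener(n->pid);	/* hypothetical helper */
	return NOTIFY_DONE;
}

static struct notifier_block taskstats_nl_notifier = {
	.notifier_call = taskstats_nl_event,
};

/* at init time: netlink_register_notifier(&taskstats_nl_notifier); */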
On Mon, 03 Jul 2006 20:54:37 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
> > What happens when a listener exits without doing deregistration
> > (or if the listener attempts to register another cpumask while a current
> > registration is still active).
> >
> ( Jamal, your thoughts on this prob
On Mon, 03 Jul 2006 20:13:36 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
> >>+	if (!s)
> >>+		return -ENOMEM;
> >>+	s->pid = pid;
> >>+	INIT_LIST_HEAD(&s->list);
> >>+
> >>+	down_write(sem);
> >>+
Andrew Morton wrote:
On Mon, 03 Jul 2006 17:11:59 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
static inline void taskstats_exit_alloc(struct taskstats **ptidstats)
{
	*ptidstats = NULL;
-	if (taskstats_has_listeners())
+	if (!list_empty(&get_cpu_var(listener_list)))
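Worth noting with the open-coded check: get_cpu_var() implies preempt_disable(), so it needs a matching put_cpu_var(), and a GFP_KERNEL allocation may sleep, so it must happen only after preemption is re-enabled. A sketch of the balanced form; the allocation call and cache name are assumptions based on the thread's patches:

static inline void taskstats_exit_alloc(struct taskstats **ptidstats)
{
	/* get_cpu_var() disables preemption ... */
	int has_listeners = !list_empty(&get_cpu_var(listener_list));

	/* ... so drop it before a possibly-sleeping allocation */
	put_cpu_var(listener_list);
	*ptidstats = NULL;
	if (has_listeners)
		*ptidstats = kmem_cache_zalloc(taskstats_cache, GFP_KERNEL);
}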
On Mon, 03 Jul 2006 17:11:59 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
> >>So the strawman is:
> >>Listener bind()s to genetlink using its real pid.
> >>Sends a separate "registration" message with cpumask to listen to.
> >>Kernel stores (real) pid and cpumask.
> >>During task exit, kernel
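A user-space sketch of the registration step in that strawman: a genetlink request whose payload is the cpumask carried as a string attribute. The command and attribute names (TASKSTATS_CMD_GET, TASKSTATS_CMD_ATTR_REGISTER_CPUMASK) match what eventually appeared in linux/taskstats.h, but treat them as assumptions in the context of this thread:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <linux/taskstats.h>

struct msgtemplate {
	struct nlmsghdr n;
	struct genlmsghdr g;
	char buf[256];
};

/* family_id: the genetlink family id for "TASKSTATS", resolved
 * beforehand via CTRL_CMD_GETFAMILY (elided here). */
static int send_register(int fd, __u16 family_id, const char *cpulist)
{
	struct msgtemplate msg;
	struct sockaddr_nl dst;
	struct nlattr *na;
	int len = strlen(cpulist) + 1;	/* include the NUL */

	memset(&msg, 0, sizeof(msg));
	msg.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
	msg.n.nlmsg_type = family_id;
	msg.n.nlmsg_flags = NLM_F_REQUEST;
	msg.n.nlmsg_pid = getpid();	/* the "real pid" the kernel stores */
	msg.g.cmd = TASKSTATS_CMD_GET;

	na = (struct nlattr *)((char *)&msg + msg.n.nlmsg_len);
	na->nla_type = TASKSTATS_CMD_ATTR_REGISTER_CPUMASK;
	na->nla_len = NLA_HDRLEN + len;
	memcpy((char *)na + NLA_HDRLEN, cpulist, len);
	msg.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);

	memset(&dst, 0, sizeof(dst));
	dst.nl_family = AF_NETLINK;	/* nl_pid 0 addresses the kernel */
	return sendto(fd, &msg, msg.n.nlmsg_len, 0,
		      (struct sockaddr *)&dst, sizeof(dst));
}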
Shailabh wrote:
> I don't know if there are buffer overflow
> issues in passing a string
I don't know if this comment applies to "the standard netlink way of
passing it up using NLA_STRING", but the way I deal with buffer length
issues in the cpuset code is to insist that the user code express th
Shailabh wrote:
> Yes. If no one registers to listen on a particular CPU, data from tasks
> exiting on that cpu is not sent out at all.
Excellent.
> So I chose to use the "cpulist" ascii format that has been helpfully
> provided in include/linux/cpumask.h (by whom I wonder :-)
Excellent.
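For reference, the "cpulist" format is a comma-separated list of decimal CPU ranges, more compact than the hex-word cpumask format; a few examples (the parser note is an assumption about version-dependent details):

/* "cpulist" examples (format from include/linux/cpumask.h):
 *   "0"            - CPU 0 only
 *   "0-3"          - CPUs 0 through 3
 *   "0-3,8,10-11"  - CPUs 0..3, 8, 10 and 11
 * The kernel parses these with cpulist_parse(); that helper's exact
 * signature has varied across kernel versions.
 */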
Shailabh wrote:
> Sends a separate "registration" message with cpumask to listen to.
> Kernel stores (real) pid and cpumask.
Question:
=========
Ah - good.
So this means that I could configure a system with a fork/exit
intensive, performance critical job on some dedicated CPUs, and be able
to c
On Fri, 30 Jun 2006 23:37:10 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
> >Set aside the implementation details and ask "what is a good design"?
> >
> >A kernel-wide constant, whether determined at build-time or by a /proc poke
> >isn't a nice design.
> >
> >Can we permit userspace to send in
On Fri, 30 Jun 2006 22:20:23 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
> >If we're going to abuse nl_pid then how about we design things so that
> >nl_pid is treated as two 16-bit words - one word is the start CPU and the
> >other word is the end cpu?
> >
> >Or, if a 65536-CPU limit is too s
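To make that proposal concrete, a sketch of the packing; purely illustrative, since this encoding was floated here but never presented as final:

/* Pack a [start_cpu, end_cpu] range into the 32-bit nl_pid field:
 * low 16 bits = first CPU, high 16 bits = last CPU. */
static inline __u32 nl_pid_encode(unsigned int start_cpu, unsigned int end_cpu)
{
	return ((end_cpu & 0xffff) << 16) | (start_cpu & 0xffff);
}

static inline void nl_pid_decode(__u32 nl_pid,
				 unsigned int *start_cpu,
				 unsigned int *end_cpu)
{
	*start_cpu = nl_pid & 0xffff;
	*end_cpu = nl_pid >> 16;
}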
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
>
> +/*
> + * Per-task exit data sent from the kernel to user space
> + * is tagged by an id based on grouping of cpus.
> + *
> + * If userspace specifies a non-zero P as the nl_pid field of
> + * the sockaddr_nl structure while binding to a netlink socket,
On Fri, 2006-30-06 at 15:10 -0400, Shailabh Nagar wrote:
>
> Also to get feedback on this kind of usage of the nl_pid field, the
> approach etc.
>
It does not look unreasonable. I think you may have issues when you have
multiple such sockets opened within a single process. But
do some testing
Shailabh Nagar wrote:
> Shailabh Nagar wrote:
>
>
> Index: linux-2.6.17-mm3equiv/kernel/taskstats.c
> ===================================================================
> --- linux-2.6.17-mm3equiv.orig/kernel/taskstats.c	2006-06-30 11:57:14.000000000 -0400
> +++ linux-2.6.17-mm3equiv/kernel/taskstats.c
Shailabh Nagar wrote:
> Andrew,
>
> Based on previous discussions, the above solutions can be expanded/modified
> to:
>
> a) allow userspace to listen to a group of cpus instead of all. Multiple
> collection daemons can distribute the load as you pointed out. Doing
> collection
> by cpu groups r
On Thu, 2006-29-06 at 23:01 -0400, Shailabh Nagar wrote:
> jamal wrote:
> >
> >
> >>As long as the user is willing to pay the price in terms of memory,
> >>
> >>
> >
> >You may wanna draw a line to the upper limit - maybe even allocate slab
> >space.
> >
> >
> Didn't quite understand...cou
Andrew wrote:
> Nah. Stick it in the same cacheline as tasklist_lock (I'm amazed that
> we've continued to get away with a global lock for that).
Yes - a bit amazing. But no sense compounding the problem now.
We shouldn't be adding global locks/modifiable data in the
fork/exit code path if we c
Andrew wrote:
> Like, a single message which says "20,000 sub-millisecond-runtime tasks
> exited in the past second" or something.
System wide accumulation of such data in the exit() code path still
risks being a bottleneck, just a bit later on.
I'm more inclined now to look for ways to disable c
On Thu, 2006-29-06 at 21:11 -0400, Shailabh Nagar wrote:
> Andrew Morton wrote:
>
> >Shailabh Nagar <[EMAIL PROTECTED]> wrote:
[..]
> >So if we can detect the silly sustained-high-exit-rate scenario then it
> >seems to me quite legitimate to do some aggressive data reduction on that.
> >Like, a s
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
>
> The rates (or upper bounds) that are being discussed here, as of now,
> are 1000 exits/sec/CPU for
> 1024 CPU systems. That would be roughly 1M exits/system *
> 248Bytes/message = 248 MB/sec.
I think it's worth differentiating between burst rates an
On Thu, 2006-29-06 at 18:13 -0400, Shailabh Nagar wrote:
>
> And now I remember why I didn't go down that path earlier. Relayfs is one-way
> kernel->user and lacks the ability to send query commands from user space
> that we need. Either we would need to send commands up through a separate
> int
On Thu, 2006-29-06 at 16:01 -0400, Shailabh Nagar wrote:
>
> Jamal,
> any thoughts on the flow control capabilities of netlink that apply here
> ? Usage of the connection is to supply statistics data to userspace.
>
if you want reliable delivery, then you can't just depend on async events
from
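Concretely, the user-space symptom of dropped netlink messages is recv() failing with ENOBUFS once the socket buffer overflows; a listener that needs reliability has to detect that and requery. A sketch, where resync_from_kernel() is a hypothetical pull-mode helper:

#include <errno.h>
#include <sys/socket.h>

extern void resync_from_kernel(int fd);	/* hypothetical: requery state */

static void listen_loop(int fd, char *buf, size_t buflen)
{
	for (;;) {
		ssize_t r = recv(fd, buf, buflen, 0);

		if (r < 0 && errno == ENOBUFS) {
			/* kernel dropped events: fall back to a query */
			resync_from_kernel(fd);
			continue;
		}
		if (r <= 0)
			break;
		/* ... parse the netlink messages in buf[0..r) ... */
	}
}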
Andrew Morton wrote:
>>Yup...the per-cpu, high speed requirements are up relayfs' alley, unless
>>Jamal or netlink folks
>>are planning something (or can shed light on) how large flows can be
>>managed over netlink. I suspect
>>this discussion has happened before :-)
>
>
> yeah.
And now I rem
Shailabh wrote:
> How much memory do these 1024 CPU machines have
From:
http://www.hpcwire.com/hpc/653963.html (May 12, 2006)
SGI has already shipped more than a dozen SGI systems with
over a terabyte of memory and about a hundred systems of half
a terabyte or larger. But the n
On Thu, 29 Jun 2006 15:43:41 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
> >Could be so. But we need to understand how significant the impact of this
> >will be in practice.
> >
> >We could find, once this is deployed is real production environments on
> >large machines that the data loss is
On Thu, 29 Jun 2006 15:10:31 -0400
Shailabh Nagar <[EMAIL PROTECTED]> wrote:
> >I agree, and I'm viewing this as blocking the taskstats merge. Because if
> >this _is_ a problem then it's a big one because fixing it will be
> >intrusive, and might well involve userspace-visible changes.
> >
> >
Shailabh wrote:
> First off, just a reminder that this is inherently a netlink flow
> control issue...which was being exacerbated earlier by taskstats
> decision to send per-tgid data (no longer the case).
>
> But I'd like to know what's our target here? How many messages
> per second do we want t
Andrew Morton wrote:
On Thu, 29 Jun 2006 09:44:08 -0700
Paul Jackson <[EMAIL PROTECTED]> wrote:
You're probably correct on that model. However, it all depends on the actual
workload. Are people who actually have large-CPU (>256) systems actually
running fork()-heavy things like webservers o