* Rik van Riel <r...@redhat.com> wrote:

> > If, on the other hand, you're just going to remotely sample the 
> > in-memory context, that sounds good.
> 
> It's the latter.
> 
> If you look at /proc/<pid>/{stack,syscall,wchan} and other files, 
> you will see we already have ways to determine, from in memory 
> content, where a program is running at a certain point in time.
> 
> In fact, the timer interrupt based accounting does a similar thing. 
> It has a task examine its own in-memory state to figure out what it 
> was doing before the timer interrupt happened.
> 
> The kernel side stack pointer is probably enough to tell us whether 
> a task is active in kernel space, on an irq stack, or (maybe) in 
> user space. Not convinced about the latter, we may need to look at 
> the same state the RCU code keeps track of to see what mode a task 
> is in...
> 
> I am looking at the code to see what locks we need to grab.
> 
> I suspect the runqueue lock may be enough, to ensure that the task 
> struct, and stack do not go away while we are looking at them.

That will be enough, especially if you get to the task reference via 
rq->curr.

> We cannot take the lock_trace(task) from irq context, and we 
> probably do not need to anyway, since we do not care about a precise 
> stack trace for the task.

So one worry with this and similar approaches of statistically 
detecting user mode would be the fact that on the way out to 
user-space we don't really destroy the previous call trace - we just 
pop off the stack (non-destructively), restore RIPs and are gone.

We'll need that percpu flag I suspect.

And once we have the flag, we can get rid of the per syscall RCU 
callback as well, relatively easily: with CMPXCHG (in 
synchronize_rcu()!) we can reliably sample whether a CPU is in user 
mode right now, while the syscall entry/exit path does not need any 
atomics and can just use a simple MOV.

Once we observe 'user mode', we have observed a quiescent state and 
synchronize_rcu() can continue. If we've observed kernel mode we can 
frob the remote task's TIF_ flags to make it go into a quiescent-state 
publishing routine on syscall-return.
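
To make the flag scheme concrete, here is a rough user-space sketch in C11 
atomics. It is not kernel code: every name in it (cpu_state, enter_kernel, 
exit_to_user, sample_remote_cpu, qs_request) is hypothetical, the 
qs_request field merely stands in for a TIF_ flag, and the CMPXCHG is 
assumed to be a USER -> USER compare-and-exchange used as a fully ordered 
read. It only shows the control flow and ignores the real ordering and 
race questions (for example a request that lands just after the exit path 
has checked for one).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of these exist in the kernel. */
enum ctx_mode { CTX_USER = 0, CTX_KERNEL = 1 };

struct cpu_state {
    _Atomic int mode;        /* stand-in for the per-CPU mode flag */
    atomic_bool qs_request;  /* stand-in for a TIF_ flag on the current task */
    int qs_published;        /* quiescent states reported by this CPU */
};

static void cpu_state_init(struct cpu_state *c)
{
    atomic_init(&c->mode, CTX_USER);
    atomic_init(&c->qs_request, false);
    c->qs_published = 0;
}

/* Entry path (syscall, irq, trap, NMI): a plain store, no atomic RMW.
 * On x86 a release store compiles to an ordinary MOV. */
static void enter_kernel(struct cpu_state *c)
{
    atomic_store_explicit(&c->mode, CTX_KERNEL, memory_order_release);
}

/* Quiescent-state publishing routine, run on the way out if requested. */
static void publish_quiescent_state(struct cpu_state *c)
{
    assert(atomic_load(&c->mode) == CTX_KERNEL); /* still in kernel context */
    c->qs_published++;
}

/* Exit path: honour a pending request, then flip the flag back.
 * Only loads and stores are used here, no read-modify-write. */
static void exit_to_user(struct cpu_state *c)
{
    if (atomic_load_explicit(&c->qs_request, memory_order_acquire)) {
        atomic_store(&c->qs_request, false);
        publish_quiescent_state(c);
    }
    atomic_store_explicit(&c->mode, CTX_USER, memory_order_release);
}

/* Remote side, roughly what synchronize_rcu() would do per CPU.
 * Returns true if user mode (a quiescent state) was observed directly,
 * false if the CPU was in kernel mode and a report was requested. */
static bool sample_remote_cpu(struct cpu_state *c)
{
    int expected = CTX_USER;

    /* CMPXCHG USER -> USER: leaves the flag unchanged, only samples it. */
    if (atomic_compare_exchange_strong(&c->mode, &expected, CTX_USER))
        return true;

    atomic_store(&c->qs_request, true);
    return false;
}
```

The asymmetry is the point: the expensive atomic sits on the rare 
synchronize_rcu() side, while the hot entry/exit path only pays for 
ordinary loads and stores.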

The only hard requirement of this scheme from the RCU synchronization 
POV is that all kernel contexts that may touch RCU state need to flip 
this flag reliably to 'kernel mode': i.e. all irq handlers, traps, 
NMIs and all syscall variants need to do this.

But once it's there, it's really neat.

Thanks,

        Ingo