dtrace/systemtap options

Andrei Barbu Fri, 23 May 2008 19:32:52 -0700

Hi,


I've been thinking about the various ways to get either DTrace or
SystemTap working. I've come up with four options:

1
DTrace would need to sit entirely outside the kernel. That means about
3k lines of simple code to rewrite, and the FSF lawyers are OK with
this option. Since it must sit in userland, and it has a VM as well as
several supporting processes, it needs the RPC mechanism. We can
interrupt the kernel at the probe, suspend that task, switch into a
different task (we'll call it the dtrace kernel task), have that one
head to userland, and give the CPU over to the dtrace process. We
shouldn't let other processes run while dtrace does, so that they
don't mess up the system state; priority inversion handles the task
switching between dtrace tasks, so there is no need for the scheduler.
We also can't let it do everything, like call arbitrary processes,
because it's so special to the scheduler and rather sensitive in terms
of kernel state. There would be some places in the kernel that we
simply can't switch into userland from immediately, even if we
implement this with a dedicated kernel thread for dtrace. Finding all
of those places may be tricky. This option basically adds a special
type of thread to Mach and is rather invasive, although it allows
modification of the kernel state, and that may well be a desired
feature. This is the only way to support dtrace that I can see. Since
this is R/W it may well be good for other purposes as well,
potentially moving other things, like the Linux drivers, into
userland.
2
This one is ugly, and I'd rather not do it. It involves adding module
loading support to do the same thing that SystemTap does on Linux.
Compile code, add a prelude so that it becomes a regular module, load
it in and use kprobes to call it.
3
It would be far nicer if SystemTap-generated code could sit in
userland. But perhaps we don't want something as invasive as the
dtrace solution, since SystemTap is strictly for R/O observation. At a
probe point we could execute code that copies (using copy on write for
efficiency) the data that the module will need and the relevant kernel
state. Then, when we get to a point in the kernel that is amenable to
servicing these probe points, we can switch into userland, the
SystemTap code can run as any regular process, and then return when
ready. There's a nice solution, although a strange one, to telling the
kernel which memory we need a copy of: DWARF. It's designed to let
debuggers know exactly this, and we could just dump something like
libelf into SystemTap, plus some extra ELF support in the kernel,
thereby getting this mechanism mostly for free. This seems to be the
least invasive solution. It also seems the most flexible and the most
hurd-ish, and it doesn't add any major mechanisms to Mach. On the
downside this is inherently R/O, although that does increase security
significantly.
4
We could do option 1 with SystemTap instead of dtrace, thereby giving
it R/W access. The downside is that we sacrifice a lot of security,
although we do gain a lot of power while doing it. I think if the
trouble for option 1 is worth it, and it may well be, we should do
dtrace instead of SystemTap.
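
To make option 3 a bit more concrete, here is a rough sketch of the
flow it describes: copy what the probe needs at the probe point, then
service it later from a safe point. This is only a userland
illustration in C, not Mach code. Every name in it (probe_region,
probe_fire, probe_service, record_first_int, the buffer sizes) is made
up for the example, a plain memcpy stands in for copy on write, and
struct probe_region is just a guess at what a DWARF-derived
description might reduce to.

```c
/* Hypothetical sketch of option 3: snapshot at the probe, service later.
 * None of these names exist in Mach, DTrace, or SystemTap. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SNAP_BYTES 64   /* assumed max bytes copied per probe hit */
#define RING_SLOTS 8    /* assumed number of pending snapshots */

/* What a DWARF-derived description might boil down to: where, how much. */
struct probe_region {
    const void *addr;
    size_t      len;
};

struct snapshot {
    int           probe_id;
    size_t        len;
    unsigned char data[SNAP_BYTES];
};

struct snap_ring {
    struct snapshot slot[RING_SLOTS];
    unsigned head;      /* next slot to write */
    unsigned tail;      /* next slot to read */
    unsigned dropped;   /* probe hits lost because the ring was full */
};

/* Runs at the probe point: copy only, never block, never call out.
 * Returns 0 on success, -1 if the hit had to be dropped. */
int probe_fire(struct snap_ring *r, int probe_id,
               const struct probe_region *reg)
{
    assert(r != NULL && reg != NULL && reg->addr != NULL);
    if (r->head - r->tail == RING_SLOTS || reg->len > SNAP_BYTES) {
        r->dropped++;
        return -1;
    }
    struct snapshot *s = &r->slot[r->head % RING_SLOTS];
    s->probe_id = probe_id;
    s->len = reg->len;
    memcpy(s->data, reg->addr, reg->len);  /* stand-in for copy on write */
    r->head++;
    return 0;
}

typedef void (*snap_handler)(const struct snapshot *s, void *ctx);

/* Runs at a point where handing off to userland is safe.
 * The handler only ever sees the copy, so it is R/O by construction.
 * Returns the number of snapshots drained. */
unsigned probe_service(struct snap_ring *r, snap_handler fn, void *ctx)
{
    unsigned n = 0;
    while (r->tail != r->head) {
        fn(&r->slot[r->tail % RING_SLOTS], ctx);
        r->tail++;
        n++;
    }
    return n;
}

/* Example handler: copy the first int of the snapshot into *ctx. */
void record_first_int(const struct snapshot *s, void *ctx)
{
    if (s->len >= sizeof(int))
        memcpy(ctx, s->data, sizeof(int));
}
```

The point of the split is that the handler sees the value as it was
when the probe fired, even if the original changes before servicing,
and it has no way to write back. That is the R/O property, and also
the limitation, mentioned above.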

I prefer #3. It all depends: do we want userspace instrumentation to
have the possibility of R/W access?

As things progress more information will be here:
http://csclub.uwaterloo.ca/~abarbu/hurd/


Andrei
