On Jun 19, 2006, at 6:41 PM, Robert Lor wrote:
Theo Schlossnagle wrote:
Heh. Syscall probes and FBT probes in Dtrace have zero
overhead. User-space probes do have overhead, but it is only a
few instructions (two I think). Basically, the probe points are
replaced by illegal instructions and the kernel infrastructure
for Dtrace will fasttrap the ops and then act. So, it is tiny
tiny overhead. Little enough that it isn't unreasonable to
instrument things like s_lock which are tiny.
Theo, you're a genius. FBT (function boundary tracing) probes have
zero overhead (section 4.1) and user-space probes have two
instructions of overhead (section 4.2). I was incorrect in making
a general zero-overhead statement. But it's so close to zero :-)
http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf
The reason that Robert proposes user-space probes (I assume) is
that tracing C functions can be too granular and not conveniently
expose the "right" information to make tracing useful.
Yes, I'm proposing user-space probes (aka User Statically-Defined
Tracing - USDT). USDT provides a high-level abstraction so the
application can expose well defined probes without the user having
to know the detailed implementation. For example, instead of
having to know the function LWLockAcquire(), a well documented
probe called lwlock_acquire with the appropriate args is much more
usable.
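
To make the lwlock_acquire example concrete, here is a minimal sketch in C
of what a statically defined probe could look like at the call site. The
names are assumptions for illustration only: ENABLE_DTRACE, the
TRACE_LWLOCK_ACQUIRE wrapper, the "postgresql" provider name, and
demo_lwlock_acquire() are hypothetical and are not PostgreSQL's actual
code. DTRACE_PROBE2 is the generic two-argument macro from <sys/sdt.h>;
on Solaris a real build would also need a provider definition and a
dtrace -G step at link time, which this sketch leaves out.

```c
#include <assert.h>

/*
 * Hypothetical build switch (an assumption, not an existing PostgreSQL
 * option): with ENABLE_DTRACE defined, the wrapper expands to a real USDT
 * probe from <sys/sdt.h>; without it, the probe compiles away entirely.
 */
#ifdef ENABLE_DTRACE
#include <sys/sdt.h>
#define TRACE_LWLOCK_ACQUIRE(lockid, mode) \
    DTRACE_PROBE2(postgresql, lwlock_acquire, (lockid), (mode))
#else
#define TRACE_LWLOCK_ACQUIRE(lockid, mode) ((void) 0)
#endif

/* Illustrative lock modes; the real ones may differ. */
typedef enum { LOCK_SHARED = 0, LOCK_EXCLUSIVE = 1 } lock_mode;

/* Counts acquisitions so the sketch has something observable to return. */
static int acquire_count = 0;

/*
 * Stand-in for a lock-acquire routine. The probe exposes a documented
 * name and arguments (lock id, mode), so someone tracing does not need
 * to know which internal function does the work.
 */
int demo_lwlock_acquire(int lockid, lock_mode mode)
{
    assert(lockid >= 0);

    /* ... the actual locking would happen here ... */
    acquire_count++;

    TRACE_LWLOCK_ACQUIRE(lockid, (int) mode);
    return acquire_count;
}
```

The point of the wrapper macro is that the probe name and arguments become
a stable interface: the function underneath can be renamed or inlined
without breaking anyone's tracing scripts.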
I am giving a talk at OSCON this year about PostgreSQL on "big
systems". Big is all relative, but I will be talking about dtrace a
bit and the advantages of running PostgreSQL on Solaris which is what
we ended up doing after some extremely disturbing experiences on
Linux. I was able to track a very acute memory "leak" in pl/perl
(which Neil so kindly fixed) within a few moments -- and this is
without explicit user-space trace points. If there were good user-
space points, I likely wouldn't have had to dig in the source as a
precursor to my dtrace efforts.
The things you might be able to do with user-specific trace points:
o better understand the block scatter (distance of block-level
reads) for a specific query.
o understand lock contention in vastly multiprocessor systems
using plockstat (my hunch is that heavy-weight locks might be better).
o our current box is a 4-way Opteron, but we have a 16-way T2000
as well.
o report on queries including turn-around time, block-accesses,
lock acquisitions grouped by query for specific time windows.
The nice thing about dtrace is that it requires no "prep" to look at
a problem. When something is acting odd in production, you don't
want to attempt to repeat it in a test environment first. You want
to observe it. Dtrace allows you to dig in "really deep" in
production with an acceptable performance penalty and ask questions
that couldn't be asked before. It is exceptionally clever stuff. Of
all the new "neat stuff" in Solaris 10, it has my vote for coolest
and most useful. I've nailed several production problems (outside
of Postgres) using dtrace with accuracy and efficiency. When Solaris
10u2 is released, we'll be trying Postgres on ZFS, so my rankings may
change :-)
Having intelligently placed dtrace probes in Postgres
would allow us to deal with postgres as a "first class" app on
Solaris 10 with respect to troubleshooting obscure production
problems. That, to me, is exciting stuff.
Best regards,
Theo
// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
// Ecelerity: Run with it.