On Jun 19, 2006, at 6:41 PM, Robert Lor wrote:

Theo Schlossnagle wrote:


Heh. Syscall probes and FBT probes in Dtrace have zero overhead. User-space probes do have overhead, but it is only a few instructions (two, I think). Basically, the probe points are replaced by illegal instructions, and the kernel infrastructure for Dtrace will fasttrap the ops and then act. So it is tiny, tiny overhead: little enough that it isn't unreasonable to instrument things like s_lock, which are tiny.

Theo, you're a genius. FBT (function boundary tracing) probes have zero overhead (section 4.1) and user-space probes have two instructions of overhead (section 4.2). I was incorrect about making a general zero overhead statement. But it's so close to zero :-)

http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf


The reason that Robert proposes user-space probes (I assume) is that tracing C functions can be too granular and not conveniently expose the "right" information to make tracing useful.

Yes, I'm proposing user-space probes (aka User Statically-Defined Tracing - USDT). USDT provides a high-level abstraction so the application can expose well-defined probes without the user having to know the detailed implementation. For example, instead of having to know the function LWLockAcquire(), a well-documented probe called lwlock_acquire with the appropriate args is much more usable.

I am giving a talk at OSCON this year about PostgreSQL on "big systems". Big is all relative, but I will be talking about dtrace a bit and the advantages of running PostgreSQL on Solaris, which is what we ended up doing after some extremely disturbing experiences on Linux. I was able to track a very acute memory "leak" in pl/perl (which Neil so kindly fixed) within a few moments -- and this is without explicit user-space trace points. If there were good user-space points, I likely wouldn't have had to dig in the source as a precursor to my dtrace efforts.

The things you might be able to do with user-specific trace points:
o better understand the block scatter (distance of block-level reads) for a specific query.
o understand lock contention in vastly multiprocessor systems using plockstat (my hunch is that heavy-weight locks might be better).
o our current box is a 4-way Opteron, but we have a 16-way T2000 as well.
o report on queries including turn-around time, block accesses, lock acquisitions grouped by query for specific time windows.
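
As a rough illustration of the first and last items, here is a minimal Python sketch of the kind of post-processing one might do on probe output. It is not DTrace code, and the record layout (query id, timestamp in microseconds, block number) is purely an assumption made for the example; real probes would define their own arguments.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace records: (query_id, timestamp_us, block_number).
# The layout and the values are assumptions for illustration only.
events = [
    ("q1", 100, 10),
    ("q1", 140, 11),
    ("q2", 150, 500),
    ("q1", 200, 90),
    ("q2", 260, 20),
]


def block_scatter(events):
    """Mean absolute distance between consecutive block reads, per query."""
    blocks = defaultdict(list)
    # Order by timestamp so "consecutive" means consecutive in time.
    for query_id, _ts, block in sorted(events, key=lambda e: e[1]):
        blocks[query_id].append(block)
    scatter = {}
    for query_id, seq in blocks.items():
        gaps = [abs(b - a) for a, b in zip(seq, seq[1:])]
        scatter[query_id] = mean(gaps) if gaps else 0
    return scatter


def block_accesses(events, start_us, end_us):
    """Count block accesses per query inside the window [start_us, end_us)."""
    counts = defaultdict(int)
    for query_id, ts, _block in events:
        if start_us <= ts < end_us:
            counts[query_id] += 1
    return dict(counts)


print(block_scatter(events))
print(block_accesses(events, 100, 210))
```

The same grouping idea would extend to lock acquisitions or turn-around time, given probes that report them.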

The nice thing about dtrace is that it requires no "prep" to look at a problem. When something is acting odd in production, you don't want to attempt to repeat it in a test environment first. You want to observe it. Dtrace allows you to dig in "really deep" in production with an acceptable performance penalty and ask questions that couldn't be asked before. It is exceptionally clever stuff. Of all the new "neat stuff" in Solaris 10, it has my vote for coolest and most useful. I've nailed several production problems (outside of Postgres) using dtrace with accuracy and efficiency. When Solaris 10u2 is released, we'll be trying Postgres on ZFS, so my rankings may change :-)

Having intelligently placed dtrace probes in Postgres would allow us to deal with postgres as a "first class" app on Solaris 10 with respect to troubleshooting obtuse production problems. That, to me, is exciting stuff.

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
// Ecelerity: Run with it.



