Hi Adam,
CPU_DTRACE_ENTRY: that would explain the difference between the two use
cases...
Not generally possible: I guess this is technically true, thanks to
trampolines, tail calls, and JMPL whose source registers have probably
been clobbered. However, we should at least be able to detect vanilla
CALL reliably by decoding %i7 (I've never heard of a compiler clobbering
it). The attached scripts do this. The dtrace script reads in the
instruction at %i7, decodes it and computes the target when possible,
and emits the result at the head of each stack trace. The awk script
(which takes the dtrace script's output as input) then checks whether
i7's target points to the same function as the leaf's caller and decide
whether to filter it out.
Running the two scripts on a process, I got 13.5k total samples, with
with 214 of those unable to read %i7 for some reason; of the valid
samples, 8.2k had %i7's target point to a different function than the leaf.
I've checked source code for a smattering of the samples, and every time
the i7-enhanced stack trace is the correct one.
Maybe this won't ever get baked into dtrace, but at least there's a
workaround now if other folks need it.
Also, it might be good to update the wiki with this gotcha. Missing tail
calls is one thing... silently missing legitimate callers who made a
stack frame is pretty annoying (and makes profiling significantly less
useful if you're hunting for the caller of that expensive function).
Regards,
Ryan
On 7/19/2010 7:55 PM, Adam Leventhal wrote:
Hey Ryan,
We have special logic in pid provider entry and return probes to account for
the fact that they execute in leaf call context. Unfortunately, this is not
generally possible.
Search for uses of CPU_DTRACE_ENTRY on cvs.opensolaris.org.
Adam
On Jul 14, 2010, at 3:05 AM, Ryan Johnson wrote:
Hi all,
There was a discussion about this a while back, but now some new information
has come to light:
Consider the situation where a thread calls a leaf function which does not
create a stack frame (such as atomic_*). The following toy program demonstrates
this scenario:
--- begin backtrace.c ------------
#include<atomic.h>
unsigned int x;
int foo() {
int y = 1;
while(y != 100) {
y = atomic_swap_32(&x, y);
}
return x;
}
int main() {
return 10 + foo();
}
--- end backtrace.c ------------
If we grab stack frames by instrumenting the function directly --
"pid$target::atomic_swap_32:entry {...@counts[ustack()]=count();}" -- then we
get:
libc.so.1`atomic_swap_32
a.out`foo+0x2c
a.out`main+0x4
a.out`_start+0x108
1054138
If, on the other hand, we profile the process -- "profile-7777us/pid==$target/
{...@counts[ustack()]=count(); }" -- and grab stack frames again, we get (among
other far rarer variants):
libc.so.1`atomic_swap_32
a.out`main+0x4
a.out`_start+0x108
1167
Note that this is *not* related to tail calls (there are none here). Since in
both cases $pc is the same, the problem must be due to either the profile
probe's way of saving process context or else the stack walking code doing
something wrong under profiling...
Ideas?
Ryan
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
--
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
profile-7777us /pid==$target/{
self->i7 = uregs[R_I7];
self->i7value = *(unsigned int*)copyin(self->i7, 4);
}
/* attempt to decode a CALL insn and compute its target (see sparcv9 manual) */
profile-7777us /(self->i7value >>30) == 0x1/ {
this->simm = self->i7value*4;
this->simm = (this->simm & 0x80000000)? (this->simm | 0xffffffff00000000l)
: (this->simm & 0xffffffff);
self->true_caller = self->i7 + this->simm;
self->i7 = 0;
self->i7value = 0;
}
profile-7777us /pid==$target && !self->true_caller/ {
@profile[ustack()] = count();
}
profile-7777us /self->true_caller/ {
@profilei7[uaddr(self->true_caller), ustack()] = count();
self->true_caller = 0;
}
END {
printa("%...@d\n--i7-- %A%k\n",@profilei7);
printa("%...@d%k\n",@profile);
}
$1 ~ /^[0-9]+$/ {
# found a number!
print $1
need = 1;
i7 = 0;
next;
}
$1 == "--I7--" {
i7 = $2;
next;
}
need == 1 {
first = $1;
need = 2;
next;
}
need == 2 {
need = 0;
second = $1;
if(i7 == 0) {
# %i7 was useless... skip
print first; print second;
}
else {
split(i7, a, "+0x");
split(first, b, "+0x");
print first
# only print i7's target if not the leaf
if(a[1] != b[1]) { print i7 }
print second
}
next;
}
# default...
{ print }
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org