Hi Adam,

CPU_DTRACE_ENTRY: that would explain the difference between the two use cases...

Not generally possible: I guess this is technically true, thanks to trampolines, tail calls, and JMPL whose source registers have probably been clobbered. However, we should at least be able to detect vanilla CALL reliably by decoding %i7 (I've never heard of a compiler clobbering it). The attached scripts do this. The dtrace script reads in the instruction at %i7, decodes it and computes the target when possible, and emits the result at the head of each stack trace. The awk script (which takes the dtrace script's output as input) then checks whether i7's target points to the same function as the leaf's caller and decide whether to filter it out.

Running the two scripts on a process, I got 13.5k total samples, with with 214 of those unable to read %i7 for some reason; of the valid samples, 8.2k had %i7's target point to a different function than the leaf.

I've checked source code for a smattering of the samples, and every time the i7-enhanced stack trace is the correct one.

Maybe this won't ever get baked into dtrace, but at least there's a workaround now if other folks need it.

Also, it might be good to update the wiki with this gotcha. Missing tail calls is one thing... silently missing legitimate callers who made a stack frame is pretty annoying (and makes profiling significantly less useful if you're hunting for the caller of that expensive function).

Regards,
Ryan

On 7/19/2010 7:55 PM, Adam Leventhal wrote:
Hey Ryan,

We have special logic in pid provider entry and return probes to account for 
the fact that they execute in leaf call context. Unfortunately, this is not 
generally possible.

Search for uses of CPU_DTRACE_ENTRY on cvs.opensolaris.org.

Adam

On Jul 14, 2010, at 3:05 AM, Ryan Johnson wrote:

Hi all,

There was a discussion about this a while back, but now some new information 
has come to light:

Consider the situation where a thread calls a leaf function which does not 
create a stack frame (such as atomic_*). The following toy program demonstrates 
this scenario:

--- begin backtrace.c ------------
#include<atomic.h>
unsigned int x;
int foo() {
    int y = 1;
    while(y != 100) {
        y = atomic_swap_32(&x, y);
    }
    return x;
}
int main() {
    return 10 + foo();
}
--- end backtrace.c ------------

If we grab stack frames by instrumenting the function directly -- 
"pid$target::atomic_swap_32:entry {...@counts[ustack()]=count();}" -- then we 
get:

              libc.so.1`atomic_swap_32
              a.out`foo+0x2c
              a.out`main+0x4
              a.out`_start+0x108
          1054138

If, on the other hand, we profile the process -- "profile-7777us/pid==$target/ 
{...@counts[ustack()]=count(); }" -- and grab stack frames again, we get (among 
other far rarer variants):

              libc.so.1`atomic_swap_32
              a.out`main+0x4
              a.out`_start+0x108
             1167

Note that this is *not* related to tail calls (there are none here). Since in 
both cases $pc is the same, the problem must be due to either the profile 
probe's way of saving process context or else the stack walking code doing 
something wrong under profiling...

Ideas?
Ryan

_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl


profile-7777us /pid==$target/{
    self->i7 = uregs[R_I7];
    self->i7value = *(unsigned int*)copyin(self->i7, 4);
}
/* attempt to decode a CALL insn and compute its target (see sparcv9 manual) */
profile-7777us /(self->i7value >>30) == 0x1/ {
    this->simm = self->i7value*4;
    this->simm = (this->simm & 0x80000000)? (this->simm | 0xffffffff00000000l) 
: (this->simm & 0xffffffff);
    self->true_caller = self->i7 + this->simm;
    self->i7 = 0;
    self->i7value = 0;
}
profile-7777us /pid==$target && !self->true_caller/ {
    @profile[ustack()] = count();
}
profile-7777us /self->true_caller/ {
    @profilei7[uaddr(self->true_caller), ustack()] = count();
    self->true_caller = 0;
}
END {
    printa("%...@d\n--i7-- %A%k\n",@profilei7);
    printa("%...@d%k\n",@profile);
}
$1 ~ /^[0-9]+$/ {
    # found a number!
    print $1
    need = 1;
    i7 = 0;
    next;
}

$1 == "--I7--" {
    i7 = $2;
    next;
}

need == 1 {
    first = $1;
    need = 2;
    next;
}

need == 2 {
    need = 0;
    second = $1;
    if(i7 == 0) {
        # %i7 was useless... skip
        print first; print second;
    }
    else {
        split(i7, a, "+0x");
        split(first, b, "+0x");
        print first
        # only print i7's target if not the leaf
        if(a[1] != b[1]) { print i7 }
        print second
    }
    next;
}

# default...
{ print }
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org

Reply via email to