Hi Alexei,

On Thu, Dec 5, 2013 at 12:40 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>> On Tue, Dec 3, 2013 at 4:01 PM, Andi Kleen <a...@firstfloor.org> wrote:
>>>
>>> Can you do some performance comparison compared to e.g. ktap?
>>> How much faster is it?
>
> Did simple ktap test with 1M alloc_skb/kfree_skb toy test from earlier email:
> trace skb:kfree_skb {
>         if (arg2 == 0x100) {
>                 printf("%x %x\n", arg1, arg2)
>         }
> }
> 1M skb alloc/free 350315 (usecs)
>
> baseline without any tracing:
> 1M skb alloc/free 145400 (usecs)
>
> then equivalent bpf test:
> void filter(struct bpf_context *ctx)
> {
>         void *loc = (void *)ctx->regs.dx;
>         if (loc == 0x100) {
>                 struct sk_buff *skb = (struct sk_buff *)ctx->regs.si;
>                 char fmt[] = "skb %p loc %p\n";
>                 bpf_trace_printk(fmt, sizeof(fmt), (long)skb, (long)loc, 0);
>         }
> }
> 1M skb alloc/free 183214 (usecs)
>
> so with one 'if' condition the difference ktap vs bpf is 350-145 vs 183-145
>
> obviously ktap is an interpreter, so it's not really fair.
>
> To make it really unfair I did:
> trace skb:kfree_skb {
>         if (arg2 == 0x100 || arg2 == 0x200 || arg2 == 0x300 || arg2 == 0x400 ||
>             arg2 == 0x500 || arg2 == 0x600 || arg2 == 0x700 || arg2 == 0x800 ||
>             arg2 == 0x900 || arg2 == 0x1000) {
>                 printf("%x %x\n", arg1, arg2)
>         }
> }
> 1M skb alloc/free 484280 (usecs)
>
> and corresponding bpf:
> void filter(struct bpf_context *ctx)
> {
>         void *loc = (void *)ctx->regs.dx;
>         if (loc == 0x100 || loc == 0x200 || loc == 0x300 || loc == 0x400 ||
>             loc == 0x500 || loc == 0x600 || loc == 0x700 || loc == 0x800 ||
>             loc == 0x900 || loc == 0x1000) {
>                 struct sk_buff *skb = (struct sk_buff *)ctx->regs.si;
>                 char fmt[] = "skb %p loc %p\n";
>                 bpf_trace_printk(fmt, sizeof(fmt), (long)skb, (long)loc, 0);
>         }
> }
> 1M skb alloc/free 185660 (usecs)
>
> the difference is bigger now: 484-145 vs 185-145

There is a big difference between fetching arg2 in ktap and reading the register directly (ctx->regs.dx) in the bpf filter.
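To put your numbers in per-event terms (my arithmetic, derived only from the totals above over 1M events):

  ktap,  1 compare:  (350315 - 145400) / 1000000 ~= 0.205 usec/event overhead
  bpf,   1 compare:  (183214 - 145400) / 1000000 ~= 0.038 usec/event overhead
  ktap, 10 compares: (484280 - 145400) / 1000000 ~= 0.339 usec/event overhead
  bpf,  10 compares: (185660 - 145400) / 1000000 ~= 0.040 usec/event overhead

At least in this toy test, the bpf cost barely moves when the predicate grows, while the ktap cost grows noticeably.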
The current argument fetching implementation in ktap (arg2 in the testcase above) is very inefficient; see ktap/interpreter/lib_kdebug.c:kp_event_getarg. The only way to speed it up is to change the kernel tracing code so that an external tracing module can access event fields without going through a list lookup. That work has not been started yet. :)

Of course, I'm not saying this argument-fetching issue is the root cause of the performance gap with bpf and SystemTap; interpreted bytecode simply can't compete with raw machine code. (There is a plan to use a JIT in the ktap core, like the luajit project, but it needs some time to work on.)
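To illustrate what I mean by "not through a list lookup", here is a rough sketch of the two access patterns. The structure and function names below are made up for illustration; this is not the actual ktap or kernel code:

	#include <string.h>

	/* Illustrative only -- not the real ktap/kernel definitions. */
	struct field_desc {
		struct field_desc *next;
		const char *name;
		int offset;
	};

	/* ktap-style: resolve "arg2" by name on every event hit, walking
	 * the event's field list, then read the value from the raw entry. */
	static long get_arg_by_name(struct field_desc *fields, void *entry,
				    const char *name)
	{
		struct field_desc *f;

		for (f = fields; f; f = f->next)  /* O(#fields) walk per event */
			if (strcmp(f->name, name) == 0)
				return *(long *)((char *)entry + f->offset);
		return 0;
	}

	/* bpf-style: the filter is handed the saved registers and reads
	 * the argument directly -- a single load, no lookup at all. */
	struct fake_regs { long si, dx; };

	static long get_arg_direct(struct fake_regs *regs)
	{
		return regs->dx;
	}

One possible direction (my speculation, not a committed design) is to resolve the field name to an offset once at script load time and cache it, so the per-event path becomes a plain offset read.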