* Jiri Olsa <jo...@kernel.org> wrote:

> hi,
> this patchset adds support to optimize usdt probes on top of 5-byte
> nop instruction.
>
> The generic approach (optimize all uprobes) is hard due to emulating
> possible multiple original instructions and its related issues. The
> usdt case, which stores 5-byte nop seems much easier, so starting
> with that.
>
> The basic idea is to replace breakpoint exception with syscall which
> is faster on x86_64. For more details please see changelog of patch 8.
>
> The run_bench_uprobes.sh benchmark triggers uprobe (on top of different
> original instructions) in a loop and counts how many of those happened
> per second (the unit below is million loops).
>
> There's big speed up if you consider current usdt implementation
> (uprobe-nop) compared to proposed usdt (uprobe-nop5):
>
>   # ./benchs/run_bench_uprobes.sh
>
>       usermode-count :  818.386 ± 1.886M/s
>       syscall-count  :    8.923 ± 0.003M/s
>   --> uprobe-nop     :    3.086 ± 0.005M/s
>       uprobe-push    :    2.751 ± 0.001M/s
>       uprobe-ret     :    1.481 ± 0.000M/s
>   --> uprobe-nop5    :    4.016 ± 0.002M/s
>       uretprobe-nop  :    1.712 ± 0.008M/s
>       uretprobe-push :    1.616 ± 0.001M/s
>       uretprobe-ret  :    1.052 ± 0.000M/s
>       uretprobe-nop5 :    2.015 ± 0.000M/s

So I had to dig into patch #12 to see the magnitude of the speedup:

  # current:
  #     usermode-count :  818.836 ± 2.842M/s
  #     syscall-count  :    8.917 ± 0.003M/s
  #     uprobe-nop     :    3.056 ± 0.013M/s
  #     uprobe-push    :    2.903 ± 0.002M/s
  #     uprobe-ret     :    1.533 ± 0.001M/s
  # --> uprobe-nop5    :    1.492 ± 0.000M/s
  #     uretprobe-nop  :    1.783 ± 0.000M/s
  #     uretprobe-push :    1.672 ± 0.001M/s
  #     uretprobe-ret  :    1.067 ± 0.002M/s
  # --> uretprobe-nop5 :    1.052 ± 0.000M/s
  #
  # after the change:
  #
  #     usermode-count :  818.386 ± 1.886M/s
  #     syscall-count  :    8.923 ± 0.003M/s
  #     uprobe-nop     :    3.086 ± 0.005M/s
  #     uprobe-push    :    2.751 ± 0.001M/s
  #     uprobe-ret     :    1.481 ± 0.000M/s
  # --> uprobe-nop5    :    4.016 ± 0.002M/s
  #     uretprobe-nop  :    1.712 ± 0.008M/s
  #     uretprobe-push :    1.616 ± 0.001M/s
  #     uretprobe-ret  :    1.052 ± 0.000M/s
  # --> uretprobe-nop5 :    2.015 ± 0.000M/s

That's a +169% and a +91% speedup - pretty darn impressive!

Thanks,

	Ingo
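
For reference, the two percentages can be re-derived from the nop5 rows of the "current" and "after the change" tables. A minimal sketch in Python, assuming the speedup is the relative throughput gain (new / old - 1); the helper name speedup_percent is hypothetical and not part of the patchset or of run_bench_uprobes.sh:

```python
# Throughput in million loops per second, copied from the nop5 rows
# of the "current" and "after the change" benchmark tables.
before = {"uprobe-nop5": 1.492, "uretprobe-nop5": 1.052}
after = {"uprobe-nop5": 4.016, "uretprobe-nop5": 2.015}


def speedup_percent(old: float, new: float) -> float:
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for name in before:
    gain = speedup_percent(before[name], after[name])
    print(f"{name}: +{gain:.1f}%")
```

This gives roughly +169% for uprobe-nop5 and a bit over +91% for uretprobe-nop5, matching the figures quoted above.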