Hi Arnaldo, Jiri,

A few weeks ago, you asked if I had more requests for the perf tool. I have put together the following list to improve the usability of the perf tool, at least for our usage. Nothing is very big, just small improvements here and there.

1/ perf stat interval printing

Today, the timestamp printed via perf stat -I is relative to the start of the measurements. It would be beneficial to also support a mode where it uses a time source that can be synchronized with other traces or profiles, for instance gettimeofday() or clock_gettime(CLOCK_MONOTONIC).

2/ perf report event grouping

If you do:

  $ perf record -e '{ cycles, instructions, branches }' ....
  $ perf report

It will show the 3 profiles together, which is VERY useful. However, the output is confusing because it is hard to tell which % corresponds to which event. I know it is cmdline order. But it would be good to have a header in the columns to point to the events, instead of guessing. A few times, I had to resort to perf report --header-only to figure out the event order. I discovered the 'i' key on the function profile. But it is still hard to find the events, especially if you passed many of them.

3/ annotate output of loops

  Percent│401f00:   xor    %eax,%eax
         │401f02:   test   %edi,%edi
         │401f04: ↓ jle    401f2b <triad+0x2b>
         │401f06:   nopw   %cs:0x0(%rax,%rax,1)
   34.20 │401f1┌─→  movsd  (%rcx,%rax,8),%xmm1
   14.60 │401f1│:   mulsd  %xmm0,%xmm1
   33.24 │401f1│:   addsd  (%rdx,%rax,8),%xmm1
    9.98 │401f1│:   movsd  %xmm1,(%rsi,%rax,8)
    0.10 │401f2│:   add    $0x1,%rax
    0.03 │401f2├──  cmp    %eax,%edi
    7.84 │401f2└──↑ jg     401f10 <triad+0x10>
         │401f2b:   mov    $0x18,%eax
         │401f30: ← retq

The loop arrows cut through the code addresses. That is annoying!

4/ sorting and event groups

If I do:

  $ perf record -e '{cycles,instructions}'
  $ perf report

It will sort the samples based on the first event (the leader) of the group. Yet here all events are sampling events. You could just as well sort on the second event. But I don't think perf report supports sort order on multiple events. Both are from the same category: syms (or ip).
Right now, I would have to collect another profile:

  $ perf record -e '{instructions,cycles}'
  $ perf report

5/ cgroups

Today, to measure multiple events in the same cgroup, you need to do:

  $ perf stat -e cycles,branches,instructions -G foo,foo,foo .....

You need to specify the cgroup N times for N events. It would be good to support a mode where you only have to specify each cgroup once:

  $ perf stat -e cycles,branches,instructions --cgroup-all foo,bar

This would measure cycles,branches,instructions for both cgroups foo and bar.

6/ perf script ip vs. callchain

I already submitted this request separately. It is about providing a way to generate the callchain separately from the ip in perf script. Right now, they are lumped together, which is not always useful. Also, right now the callchain is a multi-line output, which is hard to post-process. perf script should stick with one line per sample, at least when symbolization is off. We have examples of that with brstack.

I may have more requests but I wanted to start with these for now.

Thanks for your efforts.
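
P.S. To make request 5 concrete, here is a minimal sketch of the expansion that has to be done by hand today and that a --cgroup-all style option (the name proposed above, not an existing flag) would do internally. The helper name build_perf_stat_args is hypothetical, and the sketch only builds the -e and -G lists; any other perf stat options are left out, just as in the command lines above.

```python
def build_perf_stat_args(events, cgroups):
    """Repeat every event once per cgroup, pairing the Nth event with the Nth cgroup.

    This mirrors what -G requires today: one cgroup entry per event entry.
    """
    expanded_events = []
    expanded_cgroups = []
    for cgroup in cgroups:
        for event in events:
            expanded_events.append(event)
            expanded_cgroups.append(cgroup)
    return [
        "perf", "stat",
        "-e", ",".join(expanded_events),
        "-G", ",".join(expanded_cgroups),
    ]


# Events and cgroup names taken from the example in item 5.
args = build_perf_stat_args(["cycles", "branches", "instructions"], ["foo", "bar"])
print(" ".join(args))
```

With 3 events and 2 cgroups, both lists grow to 6 entries, which is the typing I would like to avoid.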