On 2015/11/18 13:41, Namhyung Kim wrote:
On Wed, Nov 18, 2015 at 12:13:08PM +0800, Wangnan (F) wrote:

On 2015/11/17 23:05, Jiri Olsa wrote:
From: Jiri Olsa <jo...@redhat.com>

As reported by Milian, currently for DWARF unwind (both libdw
and libunwind) we display callchain in callee order only.

Adding the support to follow callchain order setup to libunwind
DWARF unwinder, so we could get following output for report:

   $ perf record --call-graph dwarf ls
   ...
   $ perf report --no-children --stdio

     39.26%  ls       libc-2.21.so      [.] __strcoll_l
                  |
                  ---__strcoll_l
                     mpsort_with_tmp
                     mpsort_with_tmp
                     sort_files
                     main
                     __libc_start_main
                     _start
                     0

   $ perf report -g caller --no-children --stdio
     ...
     39.26%  ls       libc-2.21.so      [.] __strcoll_l
                  |
                  ---0
                     _start
                     __libc_start_main
                     main
                     sort_files
                     mpsort_with_tmp
                     mpsort_with_tmp
                     __strcoll_l

Reported-by: Milian Wolff <milian.wo...@kdab.com>
Based-on-patch-by: Milian Wolff <milian.wo...@kdab.com>
Link: http://lkml.kernel.org/n/tip-lmtbeqm403f3luw4jkjev...@git.kernel.org
Signed-off-by: Jiri Olsa <jo...@kernel.org>
---
  tools/perf/util/unwind-libunwind.c | 47 ++++++++++++++++++++++++--------------
  1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/tools/perf/util/unwind-libunwind.c 
b/tools/perf/util/unwind-libunwind.c
index 0ae8844fe7a6..705e1c19f1ea 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
[SNIP]

-               unw_get_reg(&c, UNW_REG_IP, &ip);
-               ret = ip ? entry(ip, ui->thread, cb, arg) : 0;
In original code if ip == 0 entry() won't be called.

+               if (callchain_param.order == ORDER_CALLER)
+                       j = max_stack - i - 1;
+               ret = entry(ips[j], ui->thread, cb, arg);
But in new code event if ips[j] == 0 an entry will be built, which causes
a behavior changes user noticable:

Before this patch:


# perf report --no-children --stdio --call-graph=callee
...
      3.38%  a.out    a.out             [.] funcc
               |
               ---funcc
                  |
                   --2.70%-- funcb
                             funca
                             main
                             __libc_start_main
                             _start

After this patch:

# perf report --no-children --stdio --call-graph=callee
...
      3.38%  a.out    a.out             [.] funcc
               |
               ---funcc
                  |
                  |--2.70%-- funcb
                  |          funca
                  |          main
                  |          __libc_start_main
                  |          _start
                  |
                   --0.68%-- 0


I'm not sure whether we can regard this behavior changing as a bugfix? I
think
there may be some reason the original code explicitly avoid creating an '0'
entry.
I think callchain value being 0 is an error or marker for the end of
callchain.  So it'd be better avoiding 0 entry.

But unfortunately, we have many 0 entries (and broken callchain after
them) with fp recording on optimized binaries.  I think we should omit
those callchains.

Maybe something like this?


diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 5ef90be2a249..22642c5719ab 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1850,6 +1850,15 @@ static int thread__resolve_callchain_sample(struct 
thread *thread,
  #endif
                ip = chain->ips[j];
+ /* callchain value inside zero page means it's broken, stop */
+               if (ip < 4096) {
+                       if (callchain_param.order == ORDER_CALLER) {
+                               callchain_cursor_reset(&callchain_cursor);
+                               continue;
+                       } else
+                               break;
+               }
+
                err = add_callchain_ip(thread, parent, root_al, &cpumode, ip);
if (err)

Then we totally get rid of 0 entries, but how can we explain
the sum of overhead of different branches?

Is it possible to explicitly tell user the place where perf
failed to unwind call stack? For example:

     3.38%  a.out    a.out             [.] funcc
              |
              ---funcc
                 |
                 |--2.70%-- funcb
                 |          funca
                 |          main
                 |          __libc_start_main
                 |          _start
                 |
                  --0.68%-- (unwind failure)


Thank you.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to