Re: [PATCH v6 4/7] perf annotate: Do not ignore call instruction with indirect target
Hi Arnaldo,

On Monday 19 September 2016 09:14 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Aug 19, 2016 at 06:29:35PM +0530, Ravi Bangoria escreveu:
>> Do not ignore call instruction with indirect target when it's already
>> identified as a call. This is an extension of commit e8ea1561952b
>> ("perf annotate: Use raw form for register indirect call instructions")
>> to generalize annotation for all instructions with indirect calls.
>>
>> This is needed for certain powerpc call instructions that use address
>> in a register (such as bctrl, btarl, ...).
>>
>> Apart from that, when kcore is used to disassemble function, all call
>> instructions were ignored. This patch will fix it as a side effect by
>> not ignoring them. For example,
>>
>> Before (with kcore):
>>      mov    %r13,%rdi
>>      callq  0x811a7e70
>>    ^ jmpq   64
>>      mov    %gs:0x7ef41a6e(%rip),%al
>>
>> After (with kcore):
>>      mov    %r13,%rdi
>>    → callq  0x811a7e70
>>    ^ jmpq   64
>>      mov    %gs:0x7ef41a6e(%rip),%al
>
> Ok, makes sense, but then now I have the -> and can't press enter to go
> to that function, in fact for the case I'm using as a test, the
> vsnprintf kernel function, I get:
>
> │ 56:   test   %al,%al
> │    ↓  je     81
> │       lea    -0x38(%rbp),%rsi
> │       mov    %r15,%rdi
> │    →  callq  0x993e3230
>
> That 0x993e3230 should've been resolved to:
>
> [root@jouet ~]# grep 993e3230 /proc/kallsyms
> 993e3230 t format_decode
>
> Trying to investigate why it doesn't...

I investigated this. If this example is with kcore, then it's expected:
perf annotate does not inspect kallsyms when it can't find the symbol
name in the disassembly itself.
For example, disassembly of finish_task_switch, with kcore:

  810cf1b0:  mov    $0x1,%esi
  810cf1b5:  mov    $0x4,%edi
  810cf1ba:  callq  0x811aced0
  810cf1bf:  andb   $0xfb,0x4c4(%rbx)
  810cf1c6:  jmpq   0x810cf0e9
  810cf1cb:  mov    %rbx,%rsi
  810cf1ce:  mov    %r13,%rdi
  810cf1d1:  callq  0x811a7e70
  810cf1d6:  jmpq   0x810cf0e4

with debuginfo:

  810cf1b0:  mov    $0x1,%esi
  810cf1b5:  mov    $0x4,%edi
  810cf1ba:  callq  811aced0 <___perf_sw_event>
  810cf1bf:  andb   $0xfb,0x4c4(%rbx)
  810cf1c6:  jmpq   810cf0e9
  810cf1cb:  mov    %rbx,%rsi
  810cf1ce:  mov    %r13,%rdi
  810cf1d1:  callq  811a7e70 <__perf_event_task_sched_in>
  810cf1d6:  jmpq   810cf0e4

call__parse tries to find the symbol name inside angle brackets, which
is not present in the kcore case.

-Ravi

> - Arnaldo
>
>> Suggested-by: Michael Ellerman
>> [Suggested about 'bctrl' instruction]
>> Signed-off-by: Ravi Bangoria
>> ---
>> Changes in v6:
>>   - No change
>>
>>  tools/perf/util/annotate.c | 8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index ea07588..a05423b 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -81,16 +81,12 @@ static int call__parse(struct ins_operands *ops, const char *norm_arch)
>>  	return ops->target.name == NULL ? -1 : 0;
>>
>>  indirect_call:
>> -	tok = strchr(endptr, '(');
>> -	if (tok != NULL) {
>> +	tok = strchr(endptr, '*');
>> +	if (tok == NULL) {
>>  		ops->target.addr = 0;
>>  		return 0;
>>  	}
>>
>> -	tok = strchr(endptr, '*');
>> -	if (tok == NULL)
>> -		return -1;
>> -
>>  	ops->target.addr = strtoull(tok + 1, NULL, 16);
>>  	return 0;
>>  }
>> --
>> 2.5.5
[PATCH v7 0/6] perf annotate: Cross arch support + few fixes
Currently perf annotate supports code navigation (branches and calls)
only when run on the same architecture where perf.data was recorded.
For example, recording on a powerpc server and annotating on a client's
x86 desktop is not supported. This patchset adds support for that.

Example:

Record on powerpc:
  $ ./perf record -a

Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v7:
  - Use strings for normalized arch names instead of macros (i.e.
    removed patch 1/7 of v6).
  - In patch 1/6, make norm_arch a global variable instead of passing
    it to each parser.
  - In patches 1/6 and 6/6, slight change in initializing the
    instruction list.
  - Patch 4/7 of v6 is already accepted; removed it in v7.
  - Address other review comments.
  - Added more examples in patch descriptions.

v6 link: https://lkml.org/lkml/2016/8/19/411

Kim, I don't have an arm test machine. Can you please help me test this
on arm?

Kim Phillips (1):
  perf annotate: cross arch annotate support fixes for ARM

Naveen N. Rao (1):
  perf annotate: Add support for powerpc

Ravi Bangoria (4):
  perf annotate: Add cross arch annotate support
  perf annotate: Show raw form for jump instruction with indirect target
  perf annotate: Support jump instruction with target as second operand
  perf annotate: Fix jump target outside of function address range

 tools/perf/builtin-top.c          |   2 +-
 tools/perf/ui/browsers/annotate.c |   8 +-
 tools/perf/ui/gtk/annotate.c      |   2 +-
 tools/perf/util/annotate.c        | 259 ++++++++++++++++++++++++-----
 tools/perf/util/annotate.h        |   8 +-
 5 files changed, 232 insertions(+), 47 deletions(-)

--
2.5.5
[PATCH v7 1/6] perf annotate: Add cross arch annotate support
Change current data structures and functions to enable cross arch
annotate.

The current perf implementation does not support cross arch annotate.
To make it truly cross arch, the instruction tables of all archs should
be present in the perf binary, and the appropriate table should be used
based on the arch where perf.data was recorded.

Record on arm:
  $ ./perf record -a

Report -> Annotate on x86:
  $ ./perf report -i perf.data.arm --vmlinux vmlinux.arm

Signed-off-by: Ravi Bangoria
---
Changes in v7:
  - Make norm_arch a global variable instead of passing it to each
    parser.
  - Address other review comments.

 tools/perf/builtin-top.c          |   2 +-
 tools/perf/ui/browsers/annotate.c |   3 +-
 tools/perf/ui/gtk/annotate.c      |   2 +-
 tools/perf/util/annotate.c        | 151 ++++++++++++++++++++++++++----
 tools/perf/util/annotate.h        |   3 +-
 5 files changed, 134 insertions(+), 27 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 4007857..41ecdd6 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -129,7 +129,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
 		return err;
 	}

-	err = symbol__disassemble(sym, map, 0);
+	err = symbol__disassemble(sym, map, 0, NULL);
 	if (err == 0) {
 out_assign:
 		top->sym_filter_entry = he;

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 4c18271..214a14a 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
 			(nr_pcnt - 1);
 	}

-	err = symbol__disassemble(sym, map, sizeof_bdl);
+	err = symbol__disassemble(sym, map, sizeof_bdl,
+				  perf_evsel__env_arch(evsel));
 	if (err) {
 		char msg[BUFSIZ];
 		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));

diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 42d3199..c127aba 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -167,7 +167,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map,
 	if (map->dso->annotate_warned)
 		return -1;

-	err = symbol__disassemble(sym, map, 0);
+	err = symbol__disassemble(sym, map, 0, perf_evsel__env_arch(evsel));
 	if (err) {
 		char msg[BUFSIZ];
 		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index aeb5a44..816aa2c 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -21,10 +21,13 @@
 #include
 #include
 #include
+#include
+#include "../arch/common.h"

 const char	*disassembler_style;
 const char	*objdump_path;
 static regex_t	 file_lineno;
+static char	*norm_arch;

 static struct ins *ins__find(const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
@@ -66,10 +69,8 @@ static int call__parse(struct ins_operands *ops, struct map *map)
 		name++;

-#ifdef __arm__
-	if (strchr(name, '+'))
+	if (!strcmp(norm_arch, "arm") && strchr(name, '+'))
 		return -1;
-#endif

 	tok = strchr(name, '>');
 	if (tok == NULL)
@@ -252,16 +253,12 @@ static int mov__parse(struct ins_operands *ops, struct map *map __maybe_unused)
 		return -1;

 	target = ++s;
-#ifdef __arm__
+
 	comment = strchr(s, ';');
-#else
-	comment = strchr(s, '#');
-#endif
+	if (comment == NULL)
+		comment = strchr(s, '#');

-	if (comment != NULL)
-		s = comment - 1;
-	else
-		s = strchr(s, '\0') - 1;
+	s = (comment != NULL) ?
+		comment - 1 : strchr(s, '\0') - 1;

 	while (s > target && isspace(s[0]))
 		--s;
@@ -364,14 +361,92 @@ bool ins__is_ret(const struct ins *ins)
 	return ins->ops == &ret_ops;
 }

-static struct ins instructions[] = {
+static struct ins instructions_x86[] = {
 	{ .name = "add",   .ops = &mov_ops,  },
 	{ .name = "addl",  .ops = &mov_ops,  },
 	{ .name = "addq",  .ops = &mov_ops,  },
 	{ .name = "addw",  .ops = &mov_ops,  },
 	{ .name = "and",   .ops = &mov_ops,  },
-#ifdef __arm__
-	{ .name = "b",     .ops = &jump_ops, }, // might also be a call
+	{ .name = "bts",   .ops = &mov_ops,  },
+	{ .name = "call",  .ops = &call_ops, },
+	{ .name = "callq", .ops = &call_ops, },
+	{ .name = "cmp",   .ops = &mov_ops,  },
+
[PATCH v7 4/6] perf annotate: Support jump instruction with target as second operand
Current perf is not able to parse a jump instruction when the second
operand contains the target address. Arches like powerpc have such
instructions, for example 'bne cr7,0xc00f6154'.

objdump output:

  c00f6140:  ld     r9,1032(r31)
  c00f6144:  cmpdi  cr7,r9,0
  c00f6148:  bne    cr7,0xc00f6154
  c00f614c:  ld     r9,2312(r30)
  c00f6150:  std    r9,1032(r31)
  c00f6154:  ld     r9,88(r31)

Before patch:

         ld     r9,1032(r31)
         cmpdi  cr7,r9,0
      v  bne    3ff09f2c
         ld     r9,2312(r30)
         std    r9,1032(r31)
  74:    ld     r9,88(r31)

After patch:

         ld     r9,1032(r31)
         cmpdi  cr7,r9,0
      v  bne    74
         ld     r9,2312(r30)
         std    r9,1032(r31)
  74:    ld     r9,88(r31)

Signed-off-by: Ravi Bangoria
---
Changes in v7:
  - Added example in description

 tools/perf/util/annotate.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1ccf26a..a9dbac1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -122,8 +122,12 @@ bool ins__is_call(const struct ins *ins)
 static int jump__parse(struct ins_operands *ops, struct map *map __maybe_unused)
 {
 	const char *s = strchr(ops->raw, '+');
+	const char *c = strchr(ops->raw, ',');

-	ops->target.addr = strtoull(ops->raw, NULL, 16);
+	if (c++ != NULL)
+		ops->target.addr = strtoull(c, NULL, 16);
+	else
+		ops->target.addr = strtoull(ops->raw, NULL, 16);

 	if (s++ != NULL)
 		ops->target.offset = strtoull(s, NULL, 16);
--
2.5.5
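The parsing rule the patch adds can be sketched in isolation: if the operand string contains a comma, the target address follows it; otherwise the operand itself is the target. This is a simplified model of the patched `jump__parse()` (names here are illustrative, not perf's exact code):

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned long long u64;

/* Sketch of the comma rule above: for powerpc forms such as
 * "bne cr7,0xc00f6154" the address follows the comma; for x86 forms
 * such as "jmpq 34ac0" the whole operand is the address. */
static u64 jump_target(const char *raw)
{
	const char *c = strchr(raw, ',');

	if (c != NULL)
		return strtoull(c + 1, NULL, 16);	/* second operand */

	return strtoull(raw, NULL, 16);			/* first operand */
}
```

Note that `strtoull(..., 16)` accepts both the bare `c00f6154` and the `0x`-prefixed form, so the same path handles either spelling.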
[PATCH v7 2/6] perf annotate: Add support for powerpc
From: "Naveen N. Rao"

Current perf can disassemble an annotated function, but it does not
have parsing logic for powerpc instructions, so none of the navigation
options are available for powerpc.

Apart from that, powerpc has a long list of branch instructions, and
hardcoding them in a table appears error-prone. So, add a function to
find the instruction instead of creating a table. This function
dynamically creates a table (a list of 'struct ins'), and instead of
creating an object every time, first checks whether the list already
contains an object for that instruction.

Signed-off-by: Naveen N. Rao
Signed-off-by: Ravi Bangoria
---
Changes in v7:
  - Slight change in initializing the instruction list.

 tools/perf/util/annotate.c | 112 +++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 816aa2c..5aa72d9 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -531,6 +531,11 @@ static struct ins instructions_arm[] = {
 	{ .name = "retq",  .ops = &ret_ops,  },
 };

+struct instructions_powerpc {
+	struct ins *ins;
+	struct list_head list;
+};
+
 static int ins__key_cmp(const void *name, const void *insp)
 {
 	const struct ins *ins = insp;
@@ -546,6 +551,111 @@ static int ins__cmp(const void *a, const void *b)
 	return strcmp(ia->name, ib->name);
 }

+static struct ins *list_add__ins_powerpc(struct instructions_powerpc *head,
+					 const char *name, struct ins_ops *ops)
+{
+	struct instructions_powerpc *ins_powerpc;
+	struct ins *ins;
+
+	ins = zalloc(sizeof(struct ins));
+	if (!ins)
+		return NULL;
+
+	ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
+	if (!ins_powerpc)
+		goto out_free_ins;
+
+	ins->name = strdup(name);
+	if (!ins->name)
+		goto out_free_ins_power;
+
+	ins->ops = ops;
+	ins_powerpc->ins = ins;
+	list_add_tail(&(ins_powerpc->list), &(head->list));
+
+	return ins;
+
+out_free_ins_power:
+	zfree(&ins_powerpc);
+out_free_ins:
+	zfree(&ins);
+	return NULL;
+}
+
+static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head,
+					    const char *name)
+{
+	struct instructions_powerpc *pos;
+
+	list_for_each_entry(pos, &head->list, list) {
+		if (!strcmp(pos->ins->name, name))
+			return pos->ins;
+	}
+	return NULL;
+}
+
+static struct ins *ins__find_powerpc(const char *name)
+{
+	int i;
+	struct ins *ins;
+	struct ins_ops *ops;
+	static struct instructions_powerpc head = {
+		.list = LIST_HEAD_INIT(head.list),
+	};
+
+	/*
+	 * - Interested only if instruction starts with 'b'.
+	 * - Few start with 'b', but aren't branch instructions.
+	 */
+	if (name[0] != 'b' ||
+	    !strncmp(name, "bcd", 3) ||
+	    !strncmp(name, "brinc", 5) ||
+	    !strncmp(name, "bper", 4))
+		return NULL;
+
+	/*
+	 * Return if we already have an object of 'struct ins' for this
+	 * instruction
+	 */
+	ins = list_search__ins_powerpc(&head, name);
+	if (ins)
+		return ins;
+
+	ops = &jump_ops;
+
+	i = strlen(name) - 1;
+	if (i < 0)
+		return NULL;
+
+	/* ignore optional hints at the end of the instructions */
+	if (name[i] == '+' || name[i] == '-')
+		i--;
+
+	if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
+		/*
+		 * if the instruction ends up with 'l' or 'la', then
+		 * those are considered 'calls' since they update LR.
+		 * ... except for 'bnl' which is branch if not less than
+		 * and the absolute form of the same.
+		 */
+		if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
+		    strcmp(name, "bnl-") && strcmp(name, "bnla") &&
+		    strcmp(name, "bnla+") && strcmp(name, "bnla-"))
+			ops = &call_ops;
+	}
+	if (name[i] == 'r' && name[i-1] == 'l')
+		/*
+		 * instructions ending with 'lr' are considered to be
+		 * return instructions
+		 */
+		ops = &ret_ops;
+
+	/*
+	 * Add instruction to list so next time no need to
+	 * allocate memory for i
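The suffix rules in `ins__find_powerpc()` above can be distilled into a small standalone classifier. This is a sketch under the same rules the patch describes (trailing `l`/`la` means call because it updates LR, trailing `lr` means return, the `bnl` family stays a plain branch, and `+`/`-` hints are ignored); it is illustrative, not perf's actual code:

```c
#include <string.h>

enum ins_kind { INS_JUMP, INS_CALL, INS_RET, INS_OTHER };

/* Classify a powerpc branch mnemonic by its suffix, mirroring the
 * rules described in the patch above (illustrative sketch). */
static enum ins_kind powerpc_branch_kind(const char *name)
{
	int i = strlen(name) - 1;

	if (i < 0 || name[0] != 'b')
		return INS_OTHER;

	/* ignore optional branch-prediction hints '+' / '-' */
	if (name[i] == '+' || name[i] == '-')
		i--;

	/* "...lr" (e.g. blr, bctrl would not match): return via LR */
	if (i > 0 && name[i] == 'r' && name[i - 1] == 'l')
		return INS_RET;

	/* "...l" or "...la": link forms update LR, i.e. calls --
	 * except "bnl" (branch if not less than) and its variants */
	if (name[i] == 'l' || (i > 0 && name[i] == 'a' && name[i - 1] == 'l')) {
		if (!strncmp(name, "bnl", 3))
			return INS_JUMP;
		return INS_CALL;
	}

	return INS_JUMP;
}
```

So `bctrl` classifies as a call, `blr` as a return, and `bne`/`bnl` as plain jumps, matching the behaviour the changelog describes.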
[PATCH v7 5/6] perf annotate: Fix jump target outside of function address range
If a jump target is outside of the function range, perf is not handling
it correctly. Especially when the target address is less than the
function start address, the target offset becomes negative. But the
target offset is declared unsigned, so the negative number is converted
to its 2's complement. See the example below: the target of the 'jmpq'
instruction at 34cf8 is 34ac0, which is less than the function start
address (34cf0).

  34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0

Objdump output:

  0000000000034cf0 <__sigaction>:
  __GI___sigaction():
     34cf0: lea    -0x20(%rdi),%eax
     34cf3: cmp    $0x1,%eax
     34cf6: jbe    34d00 <__sigaction+0x10>
     34cf8: jmpq   34ac0 <__GI___libc_sigaction>
     34cfd: nopl   (%rax)
     34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
     34d07: movl   $0x16,%fs:(%rax)
     34d0e: mov    $0xffffffff,%eax
     34d13: retq

perf annotate before applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
        lea    -0x20(%rdi),%eax
        cmp    $0x1,%eax
     v  jbe    10
     v  jmpq   fffffffffffffdd0
        nop
  10:   mov    _DYNAMIC+0x2e8,%rax
        movl   $0x16,%fs:(%rax)
        mov    $0xffffffff,%eax
        retq

perf annotate after applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
        lea    -0x20(%rdi),%eax
        cmp    $0x1,%eax
     v  jbe    10
     ^  jmpq   34ac0 <__GI___libc_sigaction>
        nop
  10:   mov    _DYNAMIC+0x2e8,%rax
        movl   $0x16,%fs:(%rax)
        mov    $0xffffffff,%eax
        retq

Signed-off-by: Ravi Bangoria
---
Changes in v7:
  - No changes

 tools/perf/ui/browsers/annotate.c |  5 +++--
 tools/perf/util/annotate.c        | 14 +++++++++-----
 tools/perf/util/annotate.h        |  5 +++--
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 214a14a..2d04bdf 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -215,7 +215,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 		ui_browser__set_color(browser, color);
 	if (dl->ins && dl->ins->ops->scnprintf) {
 		if (ins__is_jump(dl->ins)) {
-			bool fwd = dl->ops.target.offset > (u64)dl->offset;
+			bool fwd = dl->ops.target.offset > dl->offset;

 			ui_browser__write_graph(browser, fwd ?
						SLSMG_DARROW_CHAR :
						SLSMG_UARROW_CHAR);
@@ -245,7 +245,8 @@ static bool disasm_line__is_valid_jump(struct disasm_line *dl, struct symbol *sym)
 {
 	if (!dl || !dl->ins || !ins__is_jump(dl->ins)
 	    || !disasm_line__has_offset(dl)
-	    || dl->ops.target.offset >= symbol__size(sym))
+	    || dl->ops.target.offset < 0
+	    || dl->ops.target.offset >= (s64)symbol__size(sym))
 		return false;

 	return true;

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index a9dbac1..fc44dd1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -129,10 +129,12 @@ static int jump__parse(struct ins_operands *ops, struct map *map __maybe_unused)
 	else
 		ops->target.addr = strtoull(ops->raw, NULL, 16);

-	if (s++ != NULL)
+	if (s++ != NULL) {
 		ops->target.offset = strtoull(s, NULL, 16);
-	else
-		ops->target.offset = UINT64_MAX;
+		ops->target.offset_avail = true;
+	} else {
+		ops->target.offset_avail = false;
+	}

 	return 0;
 }
@@ -140,7 +142,7 @@ static int jump__parse(struct ins_operands *ops, struct map *map __maybe_unused)
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
			   struct ins_operands *ops)
 {
-	if (!ops->target.addr)
+	if (!ops->target.addr || ops->target.offset < 0)
 		return ins__raw_scnprintf(ins, bf, size, ops);

 	return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, ops->target.offset);
@@ -1373,9 +1375,11 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	if (dl == NULL)
 		return -1;

-	if (dl->ops.target.offset == UINT64_MAX)
+	if (!disasm_line__has_offset(dl)) {
 		dl->ops.target.offset = dl->ops.target.addr -
					map__rip_2objdump(map, sym->start);
+		dl->ops.target.offset_avail = true;
+	}

 	/* kcore has no symbols, so add the call target name */
 	if (dl->ins && ins__is_call(dl->ins) && !dl->ops.target.n
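The wraparound the changelog computes by hand (34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0) and the signed range check the patch introduces can be demonstrated in a few lines. This is a simplified, hypothetical model of the fixed `disasm_line__is_valid_jump()` logic, not perf's actual code:

```c
#include <stdint.h>

/* A target below the symbol start produces a negative offset; stored
 * unsigned it wraps to a huge value (e.g. 0xfffffffffffffdd0) and
 * defeats a plain "offset >= size" check.  Treating the offset as
 * signed, as the patch does, rejects it cleanly. */
static int jump_offset_in_range(uint64_t target, uint64_t sym_start,
				uint64_t sym_size)
{
	int64_t offset = (int64_t)(target - sym_start);	/* may be < 0 */

	return offset >= 0 && offset < (int64_t)sym_size;
}
```

With the addresses from the changelog, the backward jump at 34cf8 to 34ac0 is now reported as out of range rather than as a bogus forward jump to fffffffffffffdd0.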
[PATCH v7 3/6] perf annotate: Show raw form for jump instruction with indirect target
For jump instructions that do not include the target address as a
direct operand, use the raw form instead. This is needed for certain
powerpc jump instructions that use a target address in a register (such
as bctr, btar, ...).

Before:
         ld     r12,32088(r12)
         mtctr  r12
      v  bctr   ca2c
         std    r2,24(r1)
         addis  r12,r2,-1

After:
         ld     r12,32088(r12)
         mtctr  r12
      v  bctr
         std    r2,24(r1)
         addis  r12,r2,-1

Suggested-by: Michael Ellerman
Signed-off-by: Ravi Bangoria
---
Changes in v7:
  - Added example in description

 tools/perf/util/annotate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 5aa72d9..1ccf26a 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -136,6 +136,9 @@ static int jump__parse(struct ins_operands *ops, struct map *map __maybe_unused)
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
			   struct ins_operands *ops)
 {
+	if (!ops->target.addr)
+		return ins__raw_scnprintf(ins, bf, size, ops);
+
 	return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, ops->target.offset);
 }
--
2.5.5
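The fallback the patch adds is a two-branch decision: no numeric target parsed (e.g. `bctr`, which jumps through CTR) means print the raw operands; otherwise print the computed offset. A minimal stand-in sketch, with a hypothetical struct in place of perf's `ins_operands`:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the relevant ins_operands fields. */
struct ops {
	unsigned long long target_addr;		/* 0 => indirect target */
	unsigned long long target_offset;
	const char *raw;			/* raw operand text */
};

/* Sketch of the patched jump__scnprintf(): fall back to the raw form
 * for register-indirect jumps instead of printing a bogus offset. */
static int jump_snprintf(char *bf, size_t size, const char *name,
			 const struct ops *ops)
{
	if (!ops->target_addr)
		return snprintf(bf, size, "%-6.6s %s", name, ops->raw);

	return snprintf(bf, size, "%-6.6s %llx", name, ops->target_offset);
}
```

This is why `bctr ca2c` in the "Before" listing (a leftover offset) becomes plain `bctr` after the patch.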
[PATCH v7 6/6] perf annotate: cross arch annotate support fixes for ARM
From: Kim Phillips

For ARM we remove the list that contains non-arm insns, and instead add
more maintainable branch instruction regex logic.

Signed-off-by: Kim Phillips
Signed-off-by: Ravi Bangoria
---
Changes in v7:
  - Slight change in initializing the instruction list.

 tools/perf/util/annotate.c | 177 ++++++++++++-------------------------
 1 file changed, 65 insertions(+), 112 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index fc44dd1..83d5ac8 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -28,6 +28,7 @@ const char	*disassembler_style;
 const char	*objdump_path;
 static regex_t	 file_lineno;
 static char	*norm_arch;
+static regex_t	 arm_call_insn, arm_jump_insn;

 static struct ins *ins__find(const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
@@ -449,98 +450,7 @@ static struct ins instructions_x86[] = {
 	{ .name = "retq",  .ops = &ret_ops,  },
 };

-static struct ins instructions_arm[] = {
-	{ .name = "add",    .ops = &mov_ops,  },
-	{ .name = "addl",   .ops = &mov_ops,  },
-	{ .name = "addq",   .ops = &mov_ops,  },
-	{ .name = "addw",   .ops = &mov_ops,  },
-	{ .name = "and",    .ops = &mov_ops,  },
-	{ .name = "b",      .ops = &jump_ops, }, /* might also be a call */
-	{ .name = "bcc",    .ops = &jump_ops, },
-	{ .name = "bcs",    .ops = &jump_ops, },
-	{ .name = "beq",    .ops = &jump_ops, },
-	{ .name = "bge",    .ops = &jump_ops, },
-	{ .name = "bgt",    .ops = &jump_ops, },
-	{ .name = "bhi",    .ops = &jump_ops, },
-	{ .name = "bl",     .ops = &call_ops, },
-	{ .name = "bls",    .ops = &jump_ops, },
-	{ .name = "blt",    .ops = &jump_ops, },
-	{ .name = "blx",    .ops = &call_ops, },
-	{ .name = "bne",    .ops = &jump_ops, },
-	{ .name = "bts",    .ops = &mov_ops,  },
-	{ .name = "call",   .ops = &call_ops, },
-	{ .name = "callq",  .ops = &call_ops, },
-	{ .name = "cmp",    .ops = &mov_ops,  },
-	{ .name = "cmpb",   .ops = &mov_ops,  },
-	{ .name = "cmpl",   .ops = &mov_ops,  },
-	{ .name = "cmpq",   .ops = &mov_ops,  },
-	{ .name = "cmpw",   .ops = &mov_ops,  },
-	{ .name = "cmpxch", .ops = &mov_ops,  },
-	{ .name = "dec",    .ops = &dec_ops,  },
-	{ .name = "decl",   .ops = &dec_ops,  },
-	{ .name = "imul",   .ops = &mov_ops,  },
-	{ .name = "inc",    .ops = &dec_ops,  },
-	{ .name = "incl",   .ops = &dec_ops,  },
-	{ .name = "ja",     .ops = &jump_ops, },
-	{ .name = "jae",    .ops = &jump_ops, },
-	{ .name = "jb",     .ops = &jump_ops, },
-	{ .name = "jbe",    .ops = &jump_ops, },
-	{ .name = "jc",     .ops = &jump_ops, },
-	{ .name = "jcxz",   .ops = &jump_ops, },
-	{ .name = "je",     .ops = &jump_ops, },
-	{ .name = "jecxz",  .ops = &jump_ops, },
-	{ .name = "jg",     .ops = &jump_ops, },
-	{ .name = "jge",    .ops = &jump_ops, },
-	{ .name = "jl",     .ops = &jump_ops, },
-	{ .name = "jle",    .ops = &jump_ops, },
-	{ .name = "jmp",    .ops = &jump_ops, },
-	{ .name = "jmpq",   .ops = &jump_ops, },
-	{ .name = "jna",    .ops = &jump_ops, },
-	{ .name = "jnae",   .ops = &jump_ops, },
-	{ .name = "jnb",    .ops = &jump_ops, },
-	{ .name = "jnbe",   .ops = &jump_ops, },
-	{ .name = "jnc",    .ops = &jump_ops, },
-	{ .name = "jne",    .ops = &jump_ops, },
-	{ .name = "jng",    .ops = &jump_ops, },
-	{ .name = "jnge",   .ops = &jump_ops, },
-	{ .name = "jnl",    .ops = &jump_ops, },
-	{ .name = "jnle",   .ops = &jump_ops, },
-	{ .name = "jno",    .ops = &jump_ops, },
-	{ .name = "jnp",    .ops = &jump_ops, },
-	{ .name = "jns",    .ops = &jump_ops, },
-	{ .name = "jnz",    .ops = &jump_ops, },
-	{ .name = "jo",     .ops = &jump_ops, },
-	{ .name = "jp",     .ops = &jump_ops, },
-	{ .name = "jpe",    .ops = &jump_ops, },
-	{ .name = "
Re: [PATCH v7 0/6] perf annotate: Cross arch support + few fixes
On Thursday 22 September 2016 01:04 AM, Kim Phillips wrote:
> On Wed, 21 Sep 2016 21:17:50 +0530
> Ravi Bangoria wrote:
>
>> Kim, I don't have arm test machine. Can you please help me to test
>> this on arm.
>
> This works for me: hitting return on return instructions yields
> "Invalid jump offset", but I'll get that later.

Thanks Kim.

Hmm.. so ins__find_arm does not contain logic for return instructions.
Navigation with return instructions is working fine for x86 and powerpc.

-Ravi

> Thanks,
>
> Kim
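The regex approach the ARM patch switches to (the exact patterns are truncated out of this excerpt, so the expression below is an assumption for illustration) replaces the per-mnemonic table with compiled POSIX regexes matched against the instruction name:

```c
#include <regex.h>
#include <string.h>

/* Illustrative only -- not the patch's actual regex.  Matches "b"
 * optionally followed by a link/exchange modifier (l, lx, x) and an
 * ARM condition-code suffix, the way the patch classifies branches
 * with regexec() instead of a table lookup. */
static int is_arm_branch(const char *name)
{
	regex_t re;
	int hit;

	if (regcomp(&re,
		    "^b(l|lx|x)?(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)?$",
		    REG_EXTENDED | REG_NOSUB))
		return 0;	/* treat a bad pattern as "not a branch" */

	hit = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

In the patch the regexes are compiled once into the static `arm_call_insn`/`arm_jump_insn` variables rather than per call; compiling inline here just keeps the sketch self-contained. Note this sketch, like the patch Kim tested, has no rule for return instructions (`bx lr` is written as two tokens), which matches the "Invalid jump offset" behaviour reported above.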
Re: [PATCH v7 0/6] perf annotate: Cross arch support + few fixes
Hello,

Any updates? Arnaldo, if the patches look good to you, can you please
pick them up?

-Ravi

On Wednesday 21 September 2016 09:17 PM, Ravi Bangoria wrote:
> Currently perf annotate supports code navigation (branches and calls)
> only when run on the same architecture where perf.data was recorded.
> But, for example, record on powerpc server and annotate on client's
> x86 desktop is not supported.
>
> This patchset adds support for that.
>
> Example:
>
> Record on powerpc:
> $ ./perf record -a
>
> Report -> Annotate on x86:
> $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc
>
> Changes in v7:
>   - Using string for normalized arch names instead of macros (i.e.
>     removed patch 1/7 of v6)
>   - In patch 1/6, make norm_arch as global var instead of passing them
>     to each parser.
>   - In patch 1/6 and 6/6, little bit change in initializing instruction
>     list.
>   - patch 4/7 of v6 is already accepted. Removed that in v7.
>   - Address other review comments.
>   - Added more examples in patch descriptions.
>
> v6 link:
> https://lkml.org/lkml/2016/8/19/411
>
> Kim, I don't have arm test machine. Can you please help me to test
> this on arm.
>
>
> Kim Phillips (1):
>   perf annotate: cross arch annotate support fixes for ARM
>
> Naveen N. Rao (1):
>   perf annotate: Add support for powerpc
>
> Ravi Bangoria (4):
>   perf annotate: Add cross arch annotate support
>   perf annotate: Show raw form for jump instruction with indirect target
>   perf annotate: Support jump instruction with target as second operand
>   perf annotate: Fix jump target outside of function address range
>
>  tools/perf/builtin-top.c          |   2 +-
>  tools/perf/ui/browsers/annotate.c |   8 +-
>  tools/perf/ui/gtk/annotate.c      |   2 +-
>  tools/perf/util/annotate.c        | 259 ++++++++++++++++++++++++-----
>  tools/perf/util/annotate.h        |   8 +-
>  5 files changed, 232 insertions(+), 47 deletions(-)
>
[RFC] perf uprobe: Skip prologue if program compiled without optimization
Function prologue prepares stack and registers before executing
function logic. When the target program is compiled without
optimization, function parameter information is only valid after the
prologue. When we probe the entrypc of the function and try to record
a function parameter, it contains a garbage value. For example,

  $ vim test.c

  #include <stdio.h>

  void foo(int i)
  {
          printf("i: %d\n", i);
  }

  int main()
  {
          foo(42);
          return 0;
  }

  $ gcc -g test.c -o test
  $ objdump -dl test | less

  foo():
  /home/ravi/test.c:4
    400536:  55                    push   %rbp
    400537:  48 89 e5              mov    %rsp,%rbp
    40053a:  48 83 ec 10           sub    $0x10,%rsp
    40053e:  89 7d fc              mov    %edi,-0x4(%rbp)
  /home/ravi/test.c:5
    400541:  8b 45 fc              mov    -0x4(%rbp),%eax
    ...
    ...
  main():
  /home/ravi/test.c:9
    400558:  55                    push   %rbp
    400559:  48 89 e5              mov    %rsp,%rbp
  /home/ravi/test.c:10
    40055c:  bf 2a 00 00 00        mov    $0x2a,%edi
    400561:  e8 d0 ff ff ff        callq  400536
  /home/ravi/test.c:11

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
  p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
  test  5778 [001]  4918.562027: probe_test:foo: (400536) i=0

Here variable 'i' is passed via the stack, and is pushed onto the stack
at 0x40053e. But we are probing at 0x400536. To resolve this issue, we
need to probe on the next instruction after the prologue. gdb and
systemtap also do the same thing. I've implemented this patch based on
the approach systemtap uses.

After applying the patch:

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
  p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
  test  6300 [001]  5877.879327: probe_test:foo: (400541) i=42

No need to skip the prologue for the optimized case, since debug info
is correct for each instruction with -O2 -g.

For more details please visit:
https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6

Signed-off-by: Ravi Bangoria
---
 tools/perf/util/probe-finder.c | 156 +++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index f2d9ff0..a788b9c2 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -892,6 +892,161 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
 	return die_walk_lines(sp_die, probe_point_lazy_walker, pf);
 }

+static bool var_has_loclist(Dwarf_Die *die)
+{
+	Dwarf_Attribute loc;
+	int tag = dwarf_tag(die);
+
+	if (tag != DW_TAG_formal_parameter &&
+	    tag != DW_TAG_variable)
+		return false;
+
+	return (dwarf_attr_integrate(die, DW_AT_location, &loc) &&
+		dwarf_whatform(&loc) == DW_FORM_sec_offset);
+}
+
+/*
+ * For any object in given CU whose DW_AT_location is a location list,
+ * target program is compiled with optimization.
+ */
+static bool optimized_target(Dwarf_Die *die)
+{
+	Dwarf_Die tmp_die;
+
+	if (var_has_loclist(die))
+		return true;
+
+	if (!dwarf_child(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	if (!dwarf_siblingof(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	return false;
+}
+
+static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines,
+			    Dwarf_Addr pf_addr, unsigned long *entrypc_idx)
+{
+	unsigned long i;
+	Dwarf_Addr addr;
+
+	for (i = 0; i < nr_lines; i++) {
+		if (dwarf_lineaddr(dwarf_onesrcline(lines, i), &addr))
+			return false;
+
+		if (addr == pf_addr) {
+			*entrypc_idx = i;
+			return true;
+		}
+	}
+	return false;
+}
+
+static bool get_postprologue_addr(unsigned long entrypc_idx,
+				  Dwarf_Lines *lines,
+				  unsigned long nr_lines,
+				  Dwarf_Addr highpc,
+				  Dwarf_Addr *postprologue_addr)
+{
+	unsigned long i;
+	int entrypc_lno, lno;
+	Dwarf_Line *line;
+	Dwarf_Addr addr;
+	bool p_end;
+
+	/* entrypc_lno is actual source line number */
+	line = dwarf_onesrcline(lines, entrypc_idx);
+	if (dwarf_lineno(line, &entrypc_lno))
+		return false;
+
+	for (i = entrypc_idx; i < n
[PATCH] perf uprobe: Skip prologue if program compiled without optimization
Function prologue prepares stack and registers before executing function logic. When target program is compiled without optimization, function parameter information is only valid after prologue. When we probe entrypc of the function, and try to record function parameter, it contains garbage value. For example, $ vim test.c #include void foo(int i) { printf("i: %d\n", i); } int main() { foo(42); return 0; } $ gcc -g test.c -o test $ objdump -dl test | less foo(): /home/ravi/test.c:4 400536: 55 push %rbp 400537: 48 89 e5mov%rsp,%rbp 40053a: 48 83 ec 10 sub-bashx10,%rsp 40053e: 89 7d fcmov%edi,-0x4(%rbp) /home/ravi/test.c:5 400541: 8b 45 fcmov-0x4(%rbp),%eax ... ... main(): /home/ravi/test.c:9 400558: 55 push %rbp 400559: 48 89 e5mov%rsp,%rbp /home/ravi/test.c:10 40055c: bf 2a 00 00 00 mov-bashx2a,%edi 400561: e8 d0 ff ff ff callq 400536 /home/ravi/test.c:11 $ ./perf probe -x ./test 'foo i' $ cat /sys/kernel/debug/tracing/uprobe_events p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32 $ ./perf record -e probe_test:foo ./test $ ./perf script test 5778 [001] 4918.562027: probe_test:foo: (400536) i=0 Here variable 'i' is passed via stack which is pushed on stack at 0x40053e. But we are probing at 0x400536. To resolve this issues, we need to probe on next instruction after prologue. gdb and systemtap also does same thing. I've implemented this patch based on approach systemtap has used. After applying patch: $ ./perf probe -x ./test 'foo i' $ cat /sys/kernel/debug/tracing/uprobe_events p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32 $ ./perf record -e probe_test:foo ./test $ ./perf script test 6300 [001] 5877.879327: probe_test:foo: (400541) i=42 No need to skip prologue for optimized case since debug info is correct for each instructions for -O2 -g. 
For more details please visit: https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6 Signed-off-by: Ravi Bangoria --- Changes wrt RFC: - Skip prologue only when function parameter is specified - Notify user about skipping prologue tools/perf/util/probe-finder.c | 164 + 1 file changed, 164 insertions(+) diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c index f2d9ff0..8efa7f2 100644 --- a/tools/perf/util/probe-finder.c +++ b/tools/perf/util/probe-finder.c @@ -892,6 +892,169 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf) return die_walk_lines(sp_die, probe_point_lazy_walker, pf); } +static bool var_has_loclist(Dwarf_Die *die) +{ + Dwarf_Attribute loc; + int tag = dwarf_tag(die); + + if (tag != DW_TAG_formal_parameter && + tag != DW_TAG_variable) + return false; + + return (dwarf_attr_integrate(die, DW_AT_location, &loc) && + dwarf_whatform(&loc) == DW_FORM_sec_offset); +} + +/* + * For any object in given CU whose DW_AT_location is a location list, + * target program is compiled with optimization. 
+ */
+static bool optimized_target(Dwarf_Die *die)
+{
+	Dwarf_Die tmp_die;
+
+	if (var_has_loclist(die))
+		return true;
+
+	if (!dwarf_child(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	if (!dwarf_siblingof(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	return false;
+}
+
+static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines,
+			    Dwarf_Addr pf_addr, unsigned long *entrypc_idx)
+{
+	unsigned long i;
+	Dwarf_Addr addr;
+
+	for (i = 0; i < nr_lines; i++) {
+		if (dwarf_lineaddr(dwarf_onesrcline(lines, i), &addr))
+			return false;
+
+		if (addr == pf_addr) {
+			*entrypc_idx = i;
+			return true;
+		}
+	}
+	return false;
+}
+
+static bool get_postprologue_addr(unsigned long entrypc_idx,
+				  Dwarf_Lines *lines,
+				  unsigned long nr_lines,
+				  Dwarf_Addr highpc,
+				  Dwarf_Addr *postprologue_addr)
+{
+	unsigned long i;
+	int entrypc_lno, lno;
+	Dwarf_Line *line;
+	Dwarf_Addr addr;
+	bool p_end;
+
+	/* entrypc_lno is actual source line number */
+	line = dwarf_onesrcline(lines, entrypc_idx);
+	if (dwa
Re: [RFC] perf uprobe: Skip prologue if program compiled without optimization
On Saturday 30 July 2016 08:34 AM, Masami Hiramatsu wrote:

  On Thu, 28 Jul 2016 20:01:51 +0530
  Ravi Bangoria wrote:

    Function prologue prepares stack and registers before executing
    function logic. When target program is compiled without optimization,
    function parameter information is only valid after prologue. When we
    probe entrypc of the function, and try to record function parameter,
    it contains garbage value.

  Right! :)

Thanks Masami for the review. I've sent a patch with the changes you
suggested. Please review it.

-Ravi
Re: [PATCH] perf/powerpc: Fix build-test failure
Hi Arnaldo,

Can you please pick this up?

-Ravi

On Wednesday 31 August 2016 01:33 PM, Ravi Bangoria wrote:
> 'make -C tools/perf build-test' is failing with the below log for powerpc.
>
>   In file included from
>   /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf.h:15:0,
>                    from util/cpumap.h:8,
>                    from util/env.c:1:
>   /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf-sys.h:23:56:
>   fatal error: ../../arch/powerpc/include/uapi/asm/unistd.h: No such file
>   or directory
>   compilation terminated.
>
> I bisected it and found it started failing from commit ad430729ae00
> ("Remove: kernel unistd*h files from perf's MANIFEST, not used").
>
> Header file '../../arch/powerpc/include/uapi/asm/unistd.h' is included
> only for powerpc in tools/perf/perf-sys.h.
>
> Looking closely at the commit history, I found something a little odd:
>
> Commit f2d9cae9ea9e ("perf powerpc: Use uapi/unistd.h to fix build
> error") replaced 'asm/unistd.h' with 'uapi/asm/unistd.h'.
>
> Commit d2709c7ce4c5 ("perf: Make perf build for x86 with UAPI
> disintegration applied") removed the arch-specific 'uapi/asm/unistd.h'
> includes for all archs and added a generic one.
>
> Commit f0b9abfb0446 ("Merge branch 'linus' into perf/core") again
> includes 'uapi/asm/unistd.h' for powerpc. I don't know how exactly this
> happened, as this change is not part of the commit either.
>
> Signed-off-by: Ravi Bangoria
> ---
>  tools/perf/perf-sys.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/tools/perf/perf-sys.h b/tools/perf/perf-sys.h
> index 7ed72a4..e4b717e 100644
> --- a/tools/perf/perf-sys.h
> +++ b/tools/perf/perf-sys.h
> @@ -20,7 +20,6 @@
>  #endif
>
>  #ifdef __powerpc__
> -#include "../../arch/powerpc/include/uapi/asm/unistd.h"
>  #define CPUINFO_PROC	{"cpu"}
>  #endif
>
Re: [PATCH v6 0/7] perf: Cross arch annotate + few miscellaneous fixes
Hello,

Any update on this?

-Ravi

On Friday 19 August 2016 06:29 PM, Ravi Bangoria wrote:
> Currently perf annotate supports code navigation (branches and calls)
> only when run on the same architecture where perf.data was recorded.
> So, for example, recording on a powerpc server and annotating on a
> client's x86 desktop is not supported.
>
> This patchset enables cross-arch annotate. Currently I've used the x86
> and arm instructions which are already available, and added support
> for powerpc.
>
> Additionally, this patch series also contains a few other related fixes.
>
> Patches are prepared on top of acme/perf/core and tested with x86
> and powerpc only.
>
> Note for arm:
> I don't have an arm test machine. As suggested by Russell in one of the
> review comments, I've copied all instructions from the default table to
> the arm table. This way it won't break the tool on arm, but cleanup is
> needed for the x86-specific instructions added to the arm table.
>
> Example:
>
> Record on powerpc:
> $ ./perf record -a
>
> Report -> Annotate on x86:
> $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc
>
> Changes in v6:
>   - Instead of adding only those instructions defined in #ifdef __arm__,
>     add all instructions from the default table to the arm table.
>
> v5 link:
> https://lkml.org/lkml/2016/8/19/35
>
> Naveen N. Rao (1):
>   perf annotate: Add support for powerpc
>
> Ravi Bangoria (6):
>   perf: Define macro for normalized arch names
>   perf annotate: Add cross arch annotate support
>   perf annotate: Do not ignore call instruction with indirect target
>   perf annotate: Show raw form for jump instruction with indirect target
>   perf annotate: Support jump instruction with target as second operand
>   perf annotate: Fix jump target outside of function address range
>
>  tools/perf/arch/common.c           |  36 ++--
>  tools/perf/arch/common.h           |  11 ++
>  tools/perf/builtin-top.c           |   2 +-
>  tools/perf/ui/browsers/annotate.c  |   8 +-
>  tools/perf/ui/gtk/annotate.c       |   2 +-
>  tools/perf/util/annotate.c         | 330 +++--
>  tools/perf/util/annotate.h         |  10 +-
>  tools/perf/util/unwind-libunwind.c |   4 +-
>  8 files changed, 327 insertions(+), 76 deletions(-)
>
[PATCH] perf/powerpc: Fix build-test failure
'make -C tools/perf build-test' is failing with the below log for powerpc.

  In file included from
  /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf.h:15:0,
                   from util/cpumap.h:8,
                   from util/env.c:1:
  /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf-sys.h:23:56:
  fatal error: ../../arch/powerpc/include/uapi/asm/unistd.h: No such file
  or directory
  compilation terminated.

I bisected it and found it started failing from commit ad430729ae00
("Remove: kernel unistd*h files from perf's MANIFEST, not used").

Header file '../../arch/powerpc/include/uapi/asm/unistd.h' is included
only for powerpc in tools/perf/perf-sys.h.

Looking closely at the commit history, I found something a little odd:

Commit f2d9cae9ea9e ("perf powerpc: Use uapi/unistd.h to fix build
error") replaced 'asm/unistd.h' with 'uapi/asm/unistd.h'.

Commit d2709c7ce4c5 ("perf: Make perf build for x86 with UAPI
disintegration applied") removed the arch-specific 'uapi/asm/unistd.h'
includes for all archs and added a generic one.

Commit f0b9abfb0446 ("Merge branch 'linus' into perf/core") again
includes 'uapi/asm/unistd.h' for powerpc. I don't know how exactly this
happened, as this change is not part of the commit either.

Signed-off-by: Ravi Bangoria
---
 tools/perf/perf-sys.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/perf/perf-sys.h b/tools/perf/perf-sys.h
index 7ed72a4..e4b717e 100644
--- a/tools/perf/perf-sys.h
+++ b/tools/perf/perf-sys.h
@@ -20,7 +20,6 @@
 #endif
 
 #ifdef __powerpc__
-#include "../../arch/powerpc/include/uapi/asm/unistd.h"
 #define CPUINFO_PROC	{"cpu"}
 #endif
-- 
2.4.11
[PATCH 2/2] perf ppc64le: Fix probe location when using DWARF
Powerpc has a Global Entry Point (GEP) and a Local Entry Point (LEP) for
functions. The LEP catches calls made via both the GEP and the LEP. The
ELF symbol table contains the GEP and an offset from which we can
calculate the LEP, but debuginfo does not have LEP info.

Currently, perf prioritizes the symbol table over DWARF in order to
probe on the LEP for ppc64le. But when the user tries to probe with a
function parameter, we fall back to using DWARF (i.e. the GEP), and when
the function is called via the LEP, the probe will never be hit. For
example:

  $ objdump -d vmlinux
  ...
  do_sys_open():
  c02eb4a0:  e8 00 4c 3c  addis   r2,r12,232
  c02eb4a4:  60 00 42 38  addi    r2,r2,96
  c02eb4a8:  a6 02 08 7c  mflr    r0
  c02eb4ac:  d0 ff 41 fb  std     r26,-48(r1)

  $ sudo ./perf probe do_sys_open
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/do_sys_open _text+3060904

  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string

In the second case, perf probed on the GEP. So when the function is
called via the LEP, the probe won't be hit:

  $ sudo ./perf record -a -e probe:do_sys_open ls
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.195 MB perf.data ]

To resolve this issue, let's not prioritize the symbol table; let perf
decide what it wants to use. Perf already converts the GEP to the LEP
when it uses the symbol table. When perf uses debuginfo, let it find the
LEP offset from the symbol table. This way we fall back to probing on
the LEP in all cases.
After the patch:

  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string

  $ sudo ./perf record -a -e probe:do_sys_open ls
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ]

Signed-off-by: Ravi Bangoria
---
 tools/perf/arch/powerpc/util/sym-handling.c | 27 +
 tools/perf/util/probe-event.c               | 37 -
 tools/perf/util/probe-event.h               |  6 -
 3 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index c6d0f91..8d4dc97 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -54,10 +54,6 @@ int arch__compare_symbol_names(const char *namea, const char *nameb)
 #endif
 
 #if defined(_CALL_ELF) && _CALL_ELF == 2
-bool arch__prefers_symtab(void)
-{
-	return true;
-}
 
 #ifdef HAVE_LIBELF_SUPPORT
 void arch__sym_update(struct symbol *s, GElf_Sym *sym)
@@ -100,4 +96,27 @@ void arch__fix_tev_from_maps(struct perf_probe_event *pev,
 		tev->point.offset += lep_offset;
 	}
 }
+
+void arch__post_process_probe_trace_events(struct perf_probe_event *pev,
+					   int ntevs)
+{
+	struct probe_trace_event *tev;
+	struct map *map;
+	struct symbol *sym = NULL;
+	struct rb_node *tmp;
+	int i = 0;
+
+	map = get_target_map(pev->target, pev->uprobes);
+	if (!map || map__load(map, NULL) < 0)
+		return;
+
+	for (i = 0; i < ntevs; i++) {
+		tev = &pev->tevs[i];
+		map__for_each_symbol(map, sym, tmp) {
+			if (map->unmap_ip(map, sym->start) == tev->point.address)
+				arch__fix_tev_from_maps(pev, tev, map, sym);
+		}
+	}
+}
+
 #endif
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 4e215e7..5efa535 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -178,7 +178,7 @@ static struct map *kernel_get_module_map(const char *module)
 	return NULL;
 }
 
-static struct map *get_target_map(const char *target, bool user)
+struct map *get_target_map(const char *target, bool user)
 {
 	/* Init maps of given executable or kernel */
 	if (user)
@@ -703,19 +703,32 @@ post_process_kernel_probe_trace_events(struct probe_trace_event *tevs,
 	return skipped;
 }
 
+void __weak
+arch__post_process_probe_trace_events(struct perf_probe_event *pev __maybe_unused,
+				      int ntevs __maybe_unused)
+{
+}
+
 /* Post processing the probe events */
-static int post_process_probe_trace_events(struct probe_trace_event *tevs,
+static int post_process_probe_trace_events(struct perf_probe_event *pev,
+					   struct probe_trace_event *tevs,
 					   int ntevs, const char *module,
 					   bool uprobe)
 {
-	if (uprobe)
-		return add_exec_to_probe_tra
[PATCH 1/2] perf: Add function to post process kernel trace events
Instead of inline code, introduce a function to post process kernel
probe trace events.

Signed-off-by: Ravi Bangoria
---
 tools/perf/util/probe-event.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 953dc1a..4e215e7 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -664,22 +664,14 @@ static int add_module_to_probe_trace_events(struct probe_trace_event *tevs,
 	return ret;
 }
 
-/* Post processing the probe events */
-static int post_process_probe_trace_events(struct probe_trace_event *tevs,
-					   int ntevs, const char *module,
-					   bool uprobe)
+static int
+post_process_kernel_probe_trace_events(struct probe_trace_event *tevs,
+				       int ntevs)
 {
 	struct ref_reloc_sym *reloc_sym;
 	char *tmp;
 	int i, skipped = 0;
 
-	if (uprobe)
-		return add_exec_to_probe_trace_events(tevs, ntevs, module);
-
-	/* Note that currently ref_reloc_sym based probe is not for drivers */
-	if (module)
-		return add_module_to_probe_trace_events(tevs, ntevs, module);
-
 	reloc_sym = kernel_get_ref_reloc_sym();
 	if (!reloc_sym) {
 		pr_warning("Relocated base symbol is not found!\n");
@@ -711,6 +703,21 @@ static int post_process_probe_trace_events(struct probe_trace_event *tevs,
 	return skipped;
 }
 
+/* Post processing the probe events */
+static int post_process_probe_trace_events(struct probe_trace_event *tevs,
+					   int ntevs, const char *module,
+					   bool uprobe)
+{
+	if (uprobe)
+		return add_exec_to_probe_trace_events(tevs, ntevs, module);
+
+	if (module)
+		/* Currently ref_reloc_sym based probe is not for drivers */
+		return add_module_to_probe_trace_events(tevs, ntevs, module);
+
+	return post_process_kernel_probe_trace_events(tevs, ntevs);
+}
+
 /* Try to find perf_probe_event with debuginfo */
 static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
 					  struct probe_trace_event **tevs)
-- 
2.7.4
Re: [PATCH v6 2/7] perf annotate: Add cross arch annotate support
Hi Kim,

I've tested your patch on x86 and powerpc and it looks fine to me.
Can you please add your Signed-off-by? Please add
Acked-by: Ravi Bangoria as well.

Regards,
-Ravi

On Wednesday 24 August 2016 02:06 AM, Kim Phillips wrote:
> On Tue, 23 Aug 2016 11:17:16 +0900
> Namhyung Kim wrote:
>
>> On Tue, Aug 23, 2016 at 8:01 AM, Kim Phillips wrote:
>>> On Fri, 19 Aug 2016 18:29:33 +0530
>>> Ravi Bangoria wrote:
>>>
>>>> Changes in v6:
>>>>   - Instead of adding only those instructions defined in #ifdef __arm__,
>>>>     add all instructions from default table to arm table.
>>> Thanks, I've gone through the list and removed all not-ARM
>>> instructions, and added some missing ARM branch instructions:
>> Can we use regex patterns instead?
> Yes, that helps prevent mistakes updating instruction lists - how does
> this look?:
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index b2c6cf3..52316f3 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -26,6 +26,7 @@
>  const char *disassembler_style;
>  const char *objdump_path;
>  static regex_t	file_lineno;
> +static regex_t	arm_call_insn, arm_jump_insn;
>  
>  static struct ins *ins__find(const char *name, const char *norm_arch);
>  static int disasm_line__parse(char *line, char **namep, char **rawp);
> @@ -449,98 +450,7 @@ static struct ins instructions_x86[] = {
>  	{ .name = "retq",   .ops = &ret_ops, },
>  };
>  
> -static struct ins instructions_arm[] = {
> -	{ .name = "add",    .ops = &mov_ops, },
> -	{ .name = "addl",   .ops = &mov_ops, },
> -	{ .name = "addq",   .ops = &mov_ops, },
> -	{ .name = "addw",   .ops = &mov_ops, },
> -	{ .name = "and",    .ops = &mov_ops, },
> -	{ .name = "b",      .ops = &jump_ops, }, /* might also be a call */
> -	{ .name = "bcc",    .ops = &jump_ops, },
> -	{ .name = "bcs",    .ops = &jump_ops, },
> -	{ .name = "beq",    .ops = &jump_ops, },
> -	{ .name = "bge",    .ops = &jump_ops, },
> -	{ .name = "bgt",    .ops = &jump_ops, },
> -	{ .name = "bhi",    .ops = &jump_ops, },
> -	{ .name = "bl",     .ops = &call_ops, },
> -	{ .name = "bls",    .ops = &jump_ops, },
> -	{ .name = "blt",    .ops = &jump_ops, },
> -	{ .name = "blx",    .ops = &call_ops, },
> -	{ .name = "bne",    .ops = &jump_ops, },
> -	{ .name = "bts",    .ops = &mov_ops, },
> -	{ .name = "call",   .ops = &call_ops, },
> -	{ .name = "callq",  .ops = &call_ops, },
> -	{ .name = "cmp",    .ops = &mov_ops, },
> -	{ .name = "cmpb",   .ops = &mov_ops, },
> -	{ .name = "cmpl",   .ops = &mov_ops, },
> -	{ .name = "cmpq",   .ops = &mov_ops, },
> -	{ .name = "cmpw",   .ops = &mov_ops, },
> -	{ .name = "cmpxch", .ops = &mov_ops, },
> -	{ .name = "dec",    .ops = &dec_ops, },
> -	{ .name = "decl",   .ops = &dec_ops, },
> -	{ .name = "imul",   .ops = &mov_ops, },
> -	{ .name = "inc",    .ops = &dec_ops, },
> -	{ .name = "incl",   .ops = &dec_ops, },
> -	{ .name = "ja",     .ops = &jump_ops, },
> -	{ .name = "jae",    .ops = &jump_ops, },
> -	{ .name = "jb",     .ops = &jump_ops, },
> -	{ .name = "jbe",    .ops = &jump_ops, },
> -	{ .name = "jc",     .ops = &jump_ops, },
> -	{ .name = "jcxz",   .ops = &jump_ops, },
> -	{ .name = "je",     .ops = &jump_ops, },
> -	{ .name = "jecxz",  .ops = &jump_ops, },
> -	{ .name = "jg",     .ops = &jump_ops, },
> -	{ .name = "jge",    .ops = &jump_ops, },
> -	{ .name = "jl",     .ops = &jump_ops, },
> -	{ .name = "jle",    .ops = &jump_ops, },
> -	{ .name = "jmp",    .ops = &jump_ops, },
> -	{ .name = "jmpq",   .ops = &jump_ops, },
> -	{ .name = "jna",    .ops = &jump_ops, },
> -	{ .name = "jnae",   .ops = &jump_ops, },
> -	{ .name = "jnb",    .ops = &jump_ops, },
> -	{ .name = "jnbe",   .ops = &jump_ops, },
> -	{ .name = "jnc",    .ops = &a
[RFC] perf tool: Fix memory corruption because of zero length symbols
Perf top often crashes at very random locations on powerpc. After
investigating, I found the crash only happens when a sample is from a
zero-length symbol. The powerpc kernel has many such symbols whose
length details are not present in the vmlinux binary, and thus the
start and end addresses of such symbols are the same.

The structure

  struct sym_hist {
	u64		nr_samples;
	u64		period;
	struct sym_hist_entry addr[0];
  };

has a last member 'addr[]' of size zero. 'addr[]' is an array of
addresses that belong to one symbol (function). If a function consists
of 100 instructions, 'addr' points to an array of 100 'struct
sym_hist_entry' elements. For a zero-length symbol, it points to an
*empty* array, i.e. there are no members in the array, and thus even
offset 0 is invalid for such an array.

  static int __symbol__inc_addr_samples(...)
  {
	...
	offset = addr - sym->start;
	h = annotation__histogram(notes, evidx);
	h->nr_samples++;
	h->addr[offset].nr_samples++;
	h->period += sample->period;
	h->addr[offset].period += sample->period;
	...
  }

Here, when 'addr' is the same as 'sym->start', 'offset' becomes 0,
which is valid for normal symbols but *invalid* for zero-length
symbols, and thus updating h->addr[offset] causes memory corruption.

Fix this by adding one dummy element for zero-length symbols.

Fixes: edee44be5919 ("perf annotate: Don't throw error for zero length symbols")
Signed-off-by: Ravi Bangoria
---
 tools/perf/util/annotate.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4397a8b..15db525 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -606,9 +606,19 @@ static struct arch *arch__find(const char *name)
 int symbol__alloc_hist(struct symbol *sym)
 {
 	struct annotation *notes = symbol__annotation(sym);
-	const size_t size = symbol__size(sym);
+	size_t size = symbol__size(sym);
 	size_t sizeof_sym_hist;
 
+	/*
+	 * Add buffer of one element for zero length symbol.
+	 * When sample is taken from first instruction of
+	 * zero length symbol, perf still resolves it and
+	 * shows symbol name in perf report and allows to
+	 * annotate it.
+	 */
+	if (size == 0)
+		size = 1;
+
 	/* Check for overflow when calculating sizeof_sym_hist */
 	if (size > (SIZE_MAX - sizeof(struct sym_hist)) / sizeof(struct sym_hist_entry))
 		return -1;
-- 
1.8.3.1
[PATCH] perf probe: Restrict user to use only one executable with -V
If multiple executables (-x) are specified with 'perf probe', listing
variables (-V) only considers the first executable. For example:

  $ perf probe -V do_sys_open -x ./test -V fun
  Failed to find debug information for address 40
  Debuginfo analysis failed.
    Error: Failed to show vars.
  Available variables at do_sys_open
          @
                  char*	filename
                  int	dfd
                  int	flags
                  struct open_flags	op
                  umode_t	mode

Here, the first executable is the kernel and the second is 'test'.
Instead of finding 'fun()' in 'test', perf is looking for 'fun()' in
the kernel.

Fix this by restricting the user to only one executable with -V:

  $ perf probe -V do_sys_open -x ./test -V fun
    Error: Multiple executables are not allowed.

  $ perf probe -V do_sys_open
  Available variables at do_sys_open
          @
                  char*	filename
                  int	dfd
                  int	flags
                  struct open_flags	op
                  umode_t	mode

  $ perf probe -x ./test -V fun
  Available variables at fun
          @
                  int	arg

Signed-off-by: Ravi Bangoria
---
 tools/perf/util/probe-event.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index b7aaf9b..0e04015 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1128,6 +1128,11 @@ int show_available_vars(struct perf_probe_event *pevs, int npevs,
 	int i, ret = 0;
 	struct debuginfo *dinfo;
 
+	if (npevs > 1) {
+		pr_err("Error: Multiple executables are not allowed.\n");
+		return 0;
+	}
+
 	ret = init_probe_symbol_maps(pevs->uprobes);
 	if (ret < 0)
 		return ret;
-- 
1.8.3.1
[PATCH] perf tool: Fix build failure when NO_AUXTRACE=1
The perf tool fails with the following build failure when AUXTRACE is
not set:

  $ make NO_AUXTRACE=1
    builtin-script.c: In function 'perf_script__process_auxtrace_info':
    util/auxtrace.h:608:44: error: called object is not a function or function pointer
     #define perf_event__process_auxtrace_info	0
                                                ^

Fix it by guarding the function under HAVE_AUXTRACE_SUPPORT.

Fixes: 47e5a26a916b ("perf script: Fix --per-event-dump for auxtrace synth evsels")
Signed-off-by: Ravi Bangoria
---
 tools/perf/builtin-script.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ad6404dcf91c..9b43bda45a41 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2848,6 +2848,7 @@ int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
 	return set_maps(script);
 }
 
+#ifdef HAVE_AUXTRACE_SUPPORT
 static int perf_script__process_auxtrace_info(struct perf_tool *tool,
 					      union perf_event *event,
 					      struct perf_session *session)
@@ -2862,6 +2863,9 @@ static int perf_script__process_auxtrace_info(struct perf_tool *tool,
 
 	return ret;
 }
+#else
+#define perf_script__process_auxtrace_info 0
+#endif
 
 int cmd_script(int argc, const char **argv)
 {
-- 
2.13.6
Re: [PATCH 16/35] perf annotate: Add samples into struct annotation_line
Hi Jiri,

This patch seems to be causing a segfault with "perf top --stdio".

Steps to reproduce:
  1. Start "perf top --stdio" in one terminal.
  2. Run some simple workload in another terminal, and let it finish.
  3. Annotate a function from the previous workload in perf top
     (press 'a' followed by 's').

Perf will crash with:

  perf: Segmentation fault
  Obtained 8 stack frames.
  ./perf(sighandler_dump_stack+0x3e) [0x4f1b6e]
  /lib64/libc.so.6(+0x36a7f) [0x7ff3aa7e4a7f]
  ./perf() [0x4a27fd]
  ./perf(symbol__annotate+0x199) [0x4a4439]
  ./perf() [0x44e32d]
  ./perf() [0x44f098]
  /lib64/libpthread.so.0(+0x736c) [0x7ff3acee836c]
  /lib64/libc.so.6(clone+0x3e) [0x7ff3aa8bee1e]

Can you please check?

Thanks,
Ravi

On 10/11/2017 08:31 PM, Jiri Olsa wrote:

Adding samples array into struct annotation_line to hold the
annotation data. The data are populated in the following patches.

Link: http://lkml.kernel.org/n/tip-97yja5m7z9brrcuf2gwr5...@git.kernel.org
Signed-off-by: Jiri Olsa
---
 tools/perf/util/annotate.c |  8 
 tools/perf/util/annotate.h | 17 -
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 8e1e88aab45a..1b5c7d0a53e8 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -901,7 +901,14 @@ static struct annotation_line*
 annotation_line__new(struct annotate_args *args, size_t privsize)
 {
 	struct annotation_line *al;
+	struct perf_evsel *evsel = args->evsel;
 	size_t size = privsize + sizeof(*al);
+	int nr = 1;
+
+	if (perf_evsel__is_group_event(evsel))
+		nr = evsel->nr_members;
+
+	size += sizeof(al->samples[0]) * nr;
 
 	al = zalloc(size);
 	if (al) {
@@ -910,6 +917,7 @@ annotation_line__new(struct annotate_args *args, size_t privsize)
 		al->offset     = args->offset;
 		al->line       = strdup(args->line);
 		al->line_nr    = args->line_nr;
+		al->samples_nr = nr;
 	}
 
 	return al;
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index a02a2bf4f2ab..9c722a7e5f6d 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -58,6 +58,16 @@ bool ins__is_fused(struct arch *arch, const char *ins1, const char *ins2);
 
 struct annotation;
 
+struct sym_hist_entry {
+	u64		nr_samples;
+	u64		period;
+};
+
+struct annotation_data {
+	double			 percent;
+	struct sym_hist_entry	 he;
+};
+
 struct annotation_line {
 	struct list_head	 node;
 	struct rb_node		 rb_node;
@@ -67,6 +77,8 @@ struct annotation_line {
 	float			 ipc;
 	u64			 cycles;
 	size_t			 privsize;
+	int			 samples_nr;
+	struct annotation_data	 samples[0];
 };
 
 struct disasm_line {
@@ -87,11 +99,6 @@ static inline bool disasm_line__has_offset(const struct disasm_line *dl)
 	return dl->ops.target.offset_avail;
 }
 
-struct sym_hist_entry {
-	u64		nr_samples;
-	u64		period;
-};
-
 void disasm_line__free(struct disasm_line *dl);
 struct annotation_line *
 annotation_line__next(struct annotation_line *pos, struct list_head *head);
[PATCH] perf annotate: Remove precision for mnemonics
There are many instructions, especially on powerpc, whose mnemonics are
longer than 6 characters. Using a precision limit causes truncation of
such mnemonics. Fix this by removing the precision limit. Note that
'width' is still 6, so alignment isn't affected for lengths <= 6.

Before:
  li     r11,-1
  xscvdp vs1,vs1
  add.   r10,r10,r11

After:
  li     r11,-1
  xscvdpsxds vs1,vs1
  add.   r10,r10,r11

Reported-by: Donald Stence
Signed-off-by: Ravi Bangoria
---
 tools/perf/util/annotate.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 54321b947de8..6462a7423beb 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -165,7 +165,7 @@ static void ins__delete(struct ins_operands *ops)
 static int ins__raw_scnprintf(struct ins *ins, char *bf, size_t size,
 			      struct ins_operands *ops)
 {
-	return scnprintf(bf, size, "%-6.6s %s", ins->name, ops->raw);
+	return scnprintf(bf, size, "%-6s %s", ins->name, ops->raw);
 }
 
 int ins__scnprintf(struct ins *ins, char *bf, size_t size,
@@ -230,12 +230,12 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops)
 {
 	if (ops->target.name)
-		return scnprintf(bf, size, "%-6.6s %s", ins->name, ops->target.name);
+		return scnprintf(bf, size, "%-6s %s", ins->name, ops->target.name);
 
 	if (ops->target.addr == 0)
 		return ins__raw_scnprintf(ins, bf, size, ops);
 
-	return scnprintf(bf, size, "%-6.6s *%" PRIx64, ins->name, ops->target.addr);
+	return scnprintf(bf, size, "%-6s *%" PRIx64, ins->name, ops->target.addr);
 }
 
 static struct ins_ops call_ops = {
@@ -299,7 +299,7 @@ static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 		c++;
 	}
 
-	return scnprintf(bf, size, "%-6.6s %.*s%" PRIx64,
+	return scnprintf(bf, size, "%-6s %.*s%" PRIx64,
 			 ins->name, c ? c - ops->raw : 0, ops->raw,
 			 ops->target.offset);
 }
@@ -372,7 +372,7 @@ static int lock__scnprintf(struct ins *ins, char *bf, size_t size,
 	if (ops->locked.ins.ops == NULL)
 		return ins__raw_scnprintf(ins, bf, size, ops);
 
-	printed = scnprintf(bf, size, "%-6.6s ", ins->name);
+	printed = scnprintf(bf, size, "%-6s ", ins->name);
 	return printed + ins__scnprintf(&ops->locked.ins, bf + printed,
 					size - printed, ops->locked.ops);
 }
@@ -448,7 +448,7 @@ static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map *m
 static int mov__scnprintf(struct ins *ins, char *bf, size_t size,
 			  struct ins_operands *ops)
 {
-	return scnprintf(bf, size, "%-6.6s %s,%s", ins->name,
+	return scnprintf(bf, size, "%-6s %s,%s", ins->name,
 			 ops->source.name ?: ops->source.raw,
 			 ops->target.name ?: ops->target.raw);
 }
@@ -488,7 +488,7 @@ static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops
 static int dec__scnprintf(struct ins *ins, char *bf, size_t size,
 			  struct ins_operands *ops)
 {
-	return scnprintf(bf, size, "%-6.6s %s", ins->name,
+	return scnprintf(bf, size, "%-6s %s", ins->name,
 			 ops->target.name ?: ops->target.raw);
 }
 
@@ -500,7 +500,7 @@ static struct ins_ops dec_ops = {
 static int nop__scnprintf(struct ins *ins __maybe_unused, char *bf, size_t size,
 			  struct ins_operands *ops __maybe_unused)
 {
-	return scnprintf(bf, size, "%-6.6s", "nop");
+	return scnprintf(bf, size, "%-6s", "nop");
 }
 
 static struct ins_ops nop_ops = {
@@ -990,7 +990,7 @@ void disasm_line__free(struct disasm_line *dl)
 int disasm_line__scnprintf(struct disasm_line *dl, char *bf, size_t size, bool raw)
 {
 	if (raw || !dl->ins.ops)
-		return scnprintf(bf, size, "%-6.6s %s", dl->ins.name, dl->ops.raw);
+		return scnprintf(bf, size, "%-6s %s", dl->ins.name, dl->ops.raw);
 
 	return ins__scnprintf(&dl->ins, bf, size, &dl->ops);
 }
-- 
2.13.6
Re: [PATCH 16/35] perf annotate: Add samples into struct annotation_line
Hi Jiri,

On 11/14/2017 03:01 PM, Jiri Olsa wrote:
> On Mon, Nov 13, 2017 at 09:14:38PM +0100, Jiri Olsa wrote:
>> On Mon, Nov 13, 2017 at 09:16:20PM +0530, Ravi Bangoria wrote:
>>> Hi Jiri,
>>>
>>> This patch seems to be causing segfault with "perf top --stdio".
>>>
>>> Steps to reproduce:
>>> 1. start "perf top --stdio" in one terminal
>>> 2. run some simple workload in another terminal, let it get finished.
>>> 3. annotate function from previous workload in perf top
>>>    (press 'a' followed by 's')
>>>
>>> Perf will crash with:
>>>
>>> perf: Segmentation fault
>>> Obtained 8 stack frames.
>>> ./perf(sighandler_dump_stack+0x3e) [0x4f1b6e]
>>> /lib64/libc.so.6(+0x36a7f) [0x7ff3aa7e4a7f]
>>> ./perf() [0x4a27fd]
>>> ./perf(symbol__annotate+0x199) [0x4a4439]
>>> ./perf() [0x44e32d]
>>> ./perf() [0x44f098]
>>> /lib64/libpthread.so.0(+0x736c) [0x7ff3acee836c]
>>> /lib64/libc.so.6(clone+0x3e) [0x7ff3aa8bee1e]
>>>
>>> Can you please check.
>>
>> hum, I'm getting following crash after resizing the terminal window:
>>
>> perf: Floating point exception
>> Obtained 8 stack frames.
>> ./perf(dump_stack+0x2e) [0x510c89]
>> ./perf(sighandler_dump_stack+0x2e) [0x510d69]
>> /lib64/libc.so.6(+0x36a80) [0x7f9419588a80]
>> ./perf(perf_top__header_snprintf+0x208) [0x4f42c1]
>> ./perf() [0x453c09]
>> ./perf() [0x454ddb]
>> /lib64/libpthread.so.0(+0x736d) [0x7f941bc8c36d]
>> /lib64/libc.so.6(clone+0x3f) [0x7f9419662e1f]
>> Floating point exception (core dumped)
>>
>> working on fix
>
> so my crash is caused by bogus resize code, I have it working with
> fix for memory corruption happening in SIGWINCH signal handler
> (attached)
>
> could you please check if that fixes the code for you?

Yes, this fixes the crash caused by the resize. But the original crash
I reported is still there. The issue seems to be that evsel is NULL and
we are trying to dereference it somewhere inside annotation_line__new().

Will try to spend more time on it.

-Ravi
[PATCH] perf record: Fix tool crash with xyarray
I see 'perf record -p <pid>' crash with the following log:

  *** Error in `./perf': free(): invalid next size (normal): 0x0298b340 ***
  === Backtrace: =
  /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7fd85c87e5]
  /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f7fd85d137a]
  /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7fd85d553c]
  ./perf(perf_evsel__close+0xb4)[0x4b7614]
  ./perf(perf_evlist__delete+0x100)[0x4ab180]
  ./perf(cmd_record+0x1d9)[0x43a5a9]
  ./perf[0x49aa2f]
  ./perf(main+0x631)[0x427841]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7fd8571830]
  ./perf(_start+0x29)[0x427a59]

This is because the functions xyarray__max_x() and xyarray__max_y()
return incorrect values, causing a crash while accessing the xyarray.

Fixes: d74be4767367 ("perf xyarray: Save max_x, max_y")
Signed-off-by: Ravi Bangoria
---
 tools/perf/util/xyarray.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/xyarray.h b/tools/perf/util/xyarray.h
index 4ba726c..d6858ed 100644
--- a/tools/perf/util/xyarray.h
+++ b/tools/perf/util/xyarray.h
@@ -21,12 +21,12 @@ static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
 	return &xy->contents[x * xy->row_size + y * xy->entry_size];
 }
 
-static inline int xyarray__max_y(struct xyarray *xy)
+static inline int xyarray__max_x(struct xyarray *xy)
 {
 	return xy->max_x;
 }
 
-static inline int xyarray__max_x(struct xyarray *xy)
+static inline int xyarray__max_y(struct xyarray *xy)
 {
 	return xy->max_y;
 }
-- 
2.7.4
Re: [PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, Sorry for bit late reply. On 08/03/2018 04:54 PM, Oleg Nesterov wrote: > Hi Ravi, > > I was going to give up and ack this series, but it seems I noticed > a bug... > > On 07/31, Ravi Bangoria wrote: >> >> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) >> +{ >> +struct delayed_uprobe *du; >> + >> +if (delayed_uprobe_check(uprobe, mm)) >> +return 0; >> + >> +du = kzalloc(sizeof(*du), GFP_KERNEL); >> +if (!du) >> +return -ENOMEM; >> + >> +du->uprobe = uprobe; >> +du->mm = mm; > > I am surprised I didn't notice this before... > > So > du->mm = mm; > > is fine, mm can't go away, uprobe_clear_state() does > delayed_uprobe_remove(NULL,mm). > > But > du->uprobe = uprobe; > > doesn't look right, uprobe can go away and it can be freed, its memory can be > reused. > We can't rely on remove_breakpoint(), I'm sorry. I didn't get this. How can uprobe go away without calling uprobe_unregister() -> rergister_for_each_vma() -> remove_breakpoint() And remove_breakpoint() will get called only when last uprobe consumer is going away _for that mm_. So, we can rely on remove_breakpoint() to remove {uprobe,mm} from delayed_uprobe_list. Am I missing anything? Or it would be even better if you can tell me some example scenario. > the application can unmap the probed page/vma. > Yes we do not care about the application in this case, say, the next > uprobe_mmap() can > wrongly increment the counter, we do not care although this can lead to > hard-to-debug > problems. But, if nothing else, the kernel can crash if the freed memory is > unmapped. > So I think put_uprobe() should do delayed_uprobe_remove(uprobe, NULL) before > kfree() > and delayed_uprobe_remove() should be updated to handle the mm==NULL case. > > Also. delayed_uprobe_add() should check the list and avoid duplicates. > Otherwise the > trivial > > for (;;) > munmap(mmap(uprobed_file)); > > will eat the memory until uprobe is unregistered. 
I'm already calling delayed_uprobe_check(uprobe, mm) from delayed_uprobe_add(). So, we don't add same {uprobe,mm} multiple time in delayed_uprobe_list. Is it the same check you are asking me to add or something else. > > >> +static bool valid_ref_ctr_vma(struct uprobe *uprobe, >> + struct vm_area_struct *vma) >> +{ >> +unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset); >> + >> +return uprobe->ref_ctr_offset && >> +vma->vm_file && >> +file_inode(vma->vm_file) == uprobe->inode && >> +vma->vm_flags & VM_WRITE && >> +!(vma->vm_flags & VM_SHARED) && > > vma->vm_flags & (VM_WRITE|VM_SHARED) == VM_WRITE && > > looks a bit better to me, but I won't insist. Sure. I'll change it. > >> +static int delayed_uprobe_install(struct vm_area_struct *vma) >> +{ >> +struct list_head *pos, *q; >> +struct delayed_uprobe *du; >> +unsigned long vaddr; >> +int ret = 0, err = 0; >> + >> +mutex_lock(&delayed_uprobe_lock); >> +list_for_each_safe(pos, q, &delayed_uprobe_list) { >> +du = list_entry(pos, struct delayed_uprobe, list); >> + >> +if (!valid_ref_ctr_vma(du->uprobe, vma)) >> +continue; >> + >> +vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset); >> +ret = __update_ref_ctr(vma->vm_mm, vaddr, 1); >> +/* Record an error and continue. */ >> +err = ret & !err ? ret : err; > > I try to avoid the cosmetic nits, but I simply can't look at this line ;) > > if (ret && !err) > err = ret; This is neat. Will replace it. > >> @@ -1072,7 +1281,14 @@ int uprobe_mmap(struct vm_area_struct *vma) >> struct uprobe *uprobe, *u; >> struct inode *inode; >> >> -if (no_uprobe_events() || !valid_vma(vma, true)) >> +if (no_uprobe_events()) >> +return 0; >> + >> +if (vma->vm_flags & VM_WRITE && >> +test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags)) >> +delayed_uprobe_install(vma); > > OK, so you also added the VM_WRITE check and I agree. But then I think we > should also check VM_SHARED, just like valid_ref_ctr_vma() does? Right. I'll add a check here. Thanks for reviewing :)
Re: [PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, >> I'm sorry. I didn't get this. How can uprobe go away without calling >> uprobe_unregister() >> -> rergister_for_each_vma() >>-> remove_breakpoint() >> And remove_breakpoint() will get called > > assuming that _unregister() will find the same vma with the probed insn. But > as I said, the application can munmap the probed page/vma. Got it now :) Thanks.
[PATCH 3/3] perf script: Fix crash because of missing feat_op[] entry
perf_event__process_feature() tries to access feat_ops[feat].process,
which is not defined for feat = HEADER_LAST_FEATURE, and thus perf is
crashing. Add a dummy entry for HEADER_LAST_FEATURE in feat_ops.

Before:
  # ./perf record -o - ls | ./perf script
  Segmentation fault (core dumped)

After:
  # ./perf record -o - ls | ./perf script
  ls 7031 4392.099856: 25 cpu-clock:uhH: 7f5e0ce7cd60
  ls 7031 4392.100355: 25 cpu-clock:uhH: 7f5e0c706ef7

Signed-off-by: Ravi Bangoria
Fixes: 57b5de463925 ("perf report: Support forced leader feature in pipe mode")
---
 tools/perf/util/header.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 540cd2dcd3e7..de8e3e29d870 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2555,14 +2555,18 @@ struct feature_ops {
 }
 
 /* feature_ops not implemented: */
+#define write_last_feature NULL
+
 #define print_tracing_data NULL
 #define print_build_id NULL
+#define print_last_feature NULL
 
 #define process_branch_stack NULL
 #define process_stat NULL
+#define process_last_feature NULL
 
-static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
+static const struct feature_ops feat_ops[HEADER_LAST_FEATURE + 1] = {
 	FEAT_OPN(TRACING_DATA,	tracing_data,	false),
 	FEAT_OPN(BUILD_ID,	build_id,	false),
 	FEAT_OPR(HOSTNAME,	hostname,	false),
@@ -2585,6 +2589,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPN(CACHE,		cache,		true),
 	FEAT_OPR(SAMPLE_TIME,	sample_time,	false),
 	FEAT_OPR(MEM_TOPOLOGY,	mem_topology,	true),
+	FEAT_OPN(LAST_FEATURE,	last_feature,	false),
 };
 
 struct header_print_data {
-- 
2.14.4
[PATCH 2/3] perf script: Fix crash because of missing evsel->priv
perf script in piped mode is crashing because evsel->priv is not
set properly. Fix it.

Before:
  # ./perf record -o - -- ls | ./perf script
  Segmentation fault (core dumped)

After:
  # ./perf record -o - -- ls | ./perf script
  ls 2282 1031.731974: 25 cpu-clock:uhH: 7effe4b3d29e
  ls 2282 1031.73: 25 cpu-clock:uhH: 7effe4b3a650

Signed-off-by: Ravi Bangoria
Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
---
 tools/perf/builtin-script.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4d1cee68cfd2..acee05562f5e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1822,6 +1822,7 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	struct perf_evlist *evlist;
 	struct perf_evsel *evsel, *pos;
 	int err;
+	static struct perf_evsel_script *es;
 
 	err = perf_event__process_attr(tool, event, pevlist);
 	if (err)
@@ -1830,6 +1831,19 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	evlist = *pevlist;
 	evsel = perf_evlist__last(*pevlist);
 
+	if (!evsel->priv) {
+		if (scr->per_event_dump) {
+			evsel->priv = perf_evsel_script__new(evsel,
+						scr->session->data);
+		} else {
+			es = zalloc(sizeof(*es));
+			if (!es)
+				return -ENOMEM;
+			es->fp = stdout;
+			evsel->priv = es;
+		}
+	}
+
 	if (evsel->attr.type >= PERF_TYPE_MAX &&
 	    evsel->attr.type != PERF_TYPE_SYNTH)
 		return 0;
-- 
2.14.4
Re: [PATCH 2/3] perf script: Fix crash because of missing evsel->priv
Hi Arnaldo,

On 06/20/2018 07:22 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 20, 2018 at 07:00:29PM +0530, Ravi Bangoria escreveu:
>> perf script in pipped mode is crashing because evsel->priv is not
>> set properly. Fix it.
>>
>> Before:
>>   # ./perf record -o - -- ls | ./perf script
>>   Segmentation fault (core dumped)
>>
>> After:
>>   # ./perf record -o - -- ls | ./perf script
>>   ls 2282 1031.731974: 25 cpu-clock:uhH: 7effe4b3d29e
>>   ls 2282 1031.73: 25 cpu-clock:uhH: 7effe4b3a650
>>
>> Signed-off-by: Ravi Bangoria
>> Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
>
> Humm, this cset doesn't set evsel->priv to a 'struct perf_evsel_script'
> object, will check which one does to continue review.

Possible. Actually, git bisect led me to this commit.

Ravi
Re: [PATCH 3/3] perf script: Fix crash because of missing feat_op[] entry
Hi Arnaldo,

On 06/20/2018 07:19 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 20, 2018 at 07:00:30PM +0530, Ravi Bangoria escreveu:
>> perf_event__process_feature() tries to access feat_ops[feat].process
>> which is not defined for feat = HEADER_LAST_FEATURE and thus perf is
>> crashing. Add dummy entry for HEADER_LAST_FEATURE in the feat_ops.
>
> Humm, first impression is that we should check for HEADER_LAST_FEATURE
> and not try to access that array slot, as it is just a marker, not an
> actual feature.

Sure. Let me try to handle it inside perf_event__process_feature()
itself. Maybe something like below:

---
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 540cd2dcd3e7..4dc251d3b372 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3461,6 +3461,10 @@ int perf_event__process_feature(struct perf_tool *tool,
 		return -1;
 	}
 
+	/* HEADER_LAST_FEATURE is just a marker. Ignore it. */
+	if (feat == HEADER_LAST_FEATURE)
+		return 0;
+
 	if (!feat_ops[feat].process)
 		return 0;
---

Thanks,
Ravi
Re: [PATCH 2/3] perf script: Fix crash because of missing evsel->priv
Hi Arnaldo,

On 06/20/2018 07:22 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 20, 2018 at 07:00:29PM +0530, Ravi Bangoria escreveu:
>> perf script in pipped mode is crashing because evsel->priv is not
>> set properly. Fix it.
>>
>> Before:
>>   # ./perf record -o - -- ls | ./perf script
>>   Segmentation fault (core dumped)
>>
>> After:
>>   # ./perf record -o - -- ls | ./perf script
>>   ls 2282 1031.731974: 25 cpu-clock:uhH: 7effe4b3d29e
>>   ls 2282 1031.73: 25 cpu-clock:uhH: 7effe4b3a650
>>
>> Signed-off-by: Ravi Bangoria
>> Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
>
> Humm, this cset doesn't set evsel->priv to a 'struct perf_evsel_script'
> object, will check which one does to continue review.

So, it's not about setting evsel->priv to a 'struct perf_evsel_script',
but about setting the proper output stream, either a file or stdout.
Before 'struct perf_evsel_script' was introduced, we were setting
evsel->priv to the output stream. So I think this commit missed setting
evsel->priv properly in the piped case. E.g., the below patch applied
_directly_ on top of a14390fde64e fixes the issue:

--
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index fb5e49b3bc44..66cc4b29bf4d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1645,6 +1645,9 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	evlist = *pevlist;
 	evsel = perf_evlist__last(*pevlist);
 
+	if (!evsel->priv)
+		evsel->priv = stdout;
+
 	if (evsel->attr.type >= PERF_TYPE_MAX &&
 	    evsel->attr.type != PERF_TYPE_SYNTH)
 		return 0;
--

To me, this commit seems to be the bad one. Please let me know if that
is not the case. I'll change the last patch as you suggested and post
v2 soon.

Thanks,
Ravi
[PATCH v2 2/3] perf script: Fix crash because of missing evsel->priv
perf script in piped mode is crashing because evsel->priv is not
set properly. Fix it.

Before:
  # ./perf record -o - -- ls | ./perf script
  Segmentation fault (core dumped)

After:
  # ./perf record -o - -- ls | ./perf script
  ls 2282 1031.731974: 25 cpu-clock:uhH: 7effe4b3d29e
  ls 2282 1031.73: 25 cpu-clock:uhH: 7effe4b3a650

Signed-off-by: Ravi Bangoria
Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
---
 tools/perf/builtin-script.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index f3fefbcc4503..ad2ac1300420 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1834,6 +1834,7 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	struct perf_evlist *evlist;
 	struct perf_evsel *evsel, *pos;
 	int err;
+	static struct perf_evsel_script *es;
 
 	err = perf_event__process_attr(tool, event, pevlist);
 	if (err)
@@ -1842,6 +1843,19 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	evlist = *pevlist;
 	evsel = perf_evlist__last(*pevlist);
 
+	if (!evsel->priv) {
+		if (scr->per_event_dump) {
+			evsel->priv = perf_evsel_script__new(evsel,
+						scr->session->data);
+		} else {
+			es = zalloc(sizeof(*es));
+			if (!es)
+				return -ENOMEM;
+			es->fp = stdout;
+			evsel->priv = es;
+		}
+	}
+
 	if (evsel->attr.type >= PERF_TYPE_MAX &&
 	    evsel->attr.type != PERF_TYPE_SYNTH)
 		return 0;
-- 
2.14.4
[PATCH v2 3/3] perf script/annotate: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]
perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE] which is not defined and thus perf is crashing. HEADER_LAST_FEATURE is used as an end marker for the perf report but it's unused for perf script/annotate. Ignore HEADER_LAST_FEATURE for perf script/ annotate. Before: # ./perf record -o - ls | ./perf script Segmentation fault (core dumped) After: # ./perf record -o - ls | ./perf script ls 7031 4392.099856: 25 cpu-clock:uhH: 7f5e0ce7cd60 ls 7031 4392.100355: 25 cpu-clock:uhH: 7f5e0c706ef7 Signed-off-by: Ravi Bangoria Fixes: 57b5de463925 ("perf report: Support forced leader feature in pipe mode") --- tools/perf/builtin-annotate.c | 11 ++- tools/perf/builtin-report.c | 3 ++- tools/perf/builtin-script.c | 11 ++- tools/perf/util/header.c | 2 +- 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 5eb22cc56363..8180319285af 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -283,6 +283,15 @@ static int process_sample_event(struct perf_tool *tool, return ret; } +static int process_feature_event(struct perf_tool *tool, +union perf_event *event, +struct perf_session *session) +{ + if (event->feat.feat_id < HEADER_LAST_FEATURE) + return perf_event__process_feature(tool, event, session); + return 0; +} + static int hist_entry__tty_annotate(struct hist_entry *he, struct perf_evsel *evsel, struct perf_annotate *ann) @@ -471,7 +480,7 @@ int cmd_annotate(int argc, const char **argv) .attr = perf_event__process_attr, .build_id = perf_event__process_build_id, .tracing_data = perf_event__process_tracing_data, - .feature= perf_event__process_feature, + .feature= process_feature_event, .ordered_events = true, .ordering_requires_timestamps = true, }, diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index cdb5b6949832..c04dc7b53797 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -217,7 +217,8 @@ static int 
process_feature_event(struct perf_tool *tool, } /* -* All features are received, we can force the +* (feat_id = HEADER_LAST_FEATURE) is the end marker which +* means all features are received, now we can force the * group if needed. */ setup_forced_leader(rep, session->evlist); diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index ad2ac1300420..568ddfac3213 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -3044,6 +3044,15 @@ int process_cpu_map_event(struct perf_tool *tool __maybe_unused, return set_maps(script); } +static int process_feature_event(struct perf_tool *tool, +union perf_event *event, +struct perf_session *session) +{ + if (event->feat.feat_id < HEADER_LAST_FEATURE) + return perf_event__process_feature(tool, event, session); + return 0; +} + #ifdef HAVE_AUXTRACE_SUPPORT static int perf_script__process_auxtrace_info(struct perf_tool *tool, union perf_event *event, @@ -3088,7 +3097,7 @@ int cmd_script(int argc, const char **argv) .attr= process_attr, .event_update = perf_event__process_event_update, .tracing_data= perf_event__process_tracing_data, - .feature = perf_event__process_feature, + .feature = process_feature_event, .build_id= perf_event__process_build_id, .id_index= perf_event__process_id_index, .auxtrace_info = perf_script__process_auxtrace_info, diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index 59fcc790c865..653ff65aa2c3 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -3464,7 +3464,7 @@ int perf_event__process_feature(struct perf_tool *tool, pr_warning("invalid record type %d in pipe-mode\n", type); return 0; } - if (feat == HEADER_RESERVED || feat > HEADER_LAST_FEATURE) { + if (feat == HEADER_RESERVED || feat >= HEADER_LAST_FEATURE) { pr_warning("invalid record type %d in pipe-mode\n", type); return -1; } -- 2.14.4
[PATCH v2 1/3] perf script: Add missing output fields in a hint
A few fields are missing in the 'perf script -F' hint. Add them.

Signed-off-by: Ravi Bangoria
---
 tools/perf/builtin-script.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index a31d7082188e..f3fefbcc4503 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3125,8 +3125,9 @@ int cmd_script(int argc, const char **argv)
 		     "+field to add and -field to remove."
 		     "Valid types: hw,sw,trace,raw,synth. "
 		     "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
-		     "addr,symoff,period,iregs,uregs,brstack,brstacksym,flags,"
-		     "bpf-output,callindent,insn,insnlen,brstackinsn,synth,phys_addr",
+		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
+		     "brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
+		     "callindent,insn,insnlen,synth,phys_addr,metric,misc",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
-- 
2.14.4
[PATCH v2 0/3] perf script: Few trivial fixes
The first patch fixes the perf output field hint. The second and third
fix crashes when perf is used in piped mode.

v2 changes:
  - [PATCH 3/3] Changed as suggested by Arnaldo

v1: https://lkml.org/lkml/2018/6/20/538

Ravi Bangoria (3):
  perf script: Add missing output fields in a hint
  perf script: Fix crash because of missing evsel->priv
  perf script/annotate: Fix crash caused by accessing
    feat_ops[HEADER_LAST_FEATURE]

 tools/perf/builtin-annotate.c | 11 ++-
 tools/perf/builtin-report.c   |  3 ++-
 tools/perf/builtin-script.c   | 30 +++---
 tools/perf/util/header.c      |  2 +-
 4 files changed, 40 insertions(+), 6 deletions(-)

-- 
2.14.4
[PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe. Don't
allow users to register multiple uprobes having the same inode+offset
but different reference counter.

Signed-off-by: Ravi Bangoria
---
 kernel/events/uprobes.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 61b0481ef417..492a0e005b4d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -689,6 +689,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
+		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+			pr_warn("Reference counter offset mismatch.\n");
+			put_uprobe(cur_uprobe);
+			kfree(uprobe);
+			return ERR_PTR(-EINVAL);
+		}
 		kfree(uprobe);
 		uprobe = cur_uprobe;
 	}
@@ -1103,6 +1109,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
+
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
 	 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4
[PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. When that overhead matters, executing them
can be omitted with a runtime if() condition while no one is tracing
the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.

Implement the reference counter logic in the core uprobe code. Users
will be able to use it from trace_uprobe as well as from kernel
modules. The new trace_uprobe definition with a reference counter will
now be:

    :[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel modules, a new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

There is no new variant of uprobe_unregister() because it's assumed to
have only one reference counter for one uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: The 'reference counter' is called a 'semaphore' in the original
Dtrace (and Systemtap, bcc and even ELF) documentation and code. But
the term 'semaphore' is misleading in this context. It is just a
counter used to hold the number of tracers tracing a marker; it is not
really used for any synchronization. So we refer to it as a 'reference
counter' in kernel / perf code.
Signed-off-by: Ravi Bangoria Reviewed-by: Masami Hiramatsu [Only trace_uprobe.c] --- include/linux/uprobes.h | 5 + kernel/events/uprobes.c | 246 ++-- kernel/trace/trace.c| 2 +- kernel/trace/trace_uprobe.c | 38 ++- 4 files changed, 280 insertions(+), 11 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index bb9d2084af03..103a48a48872 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { return -ENOSYS; } +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) +{ + return -ENOSYS; +} static inline int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index c0418ba52ba8..61b0481ef417 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -73,6 +73,7 @@ struct uprobe { struct uprobe_consumer *consumers; struct inode*inode; /* Also hold a ref to inode */ loff_t offset; + loff_t ref_ctr_offset; unsigned long flags; /* @@ -88,6 +89,15 @@ struct uprobe { struct arch_uprobe arch; }; +struct delayed_uprobe { + struct list_head list; + 
struct uprobe *uprobe; + struct mm_struct *mm; +}; + +static DEFINE_MUTEX(delayed_uprobe_lock); +static LIST_HEAD(delayed_uprobe_list); + /* * Execute out of line area: anonymous executable mapping installed * by the probed task to execute the copy of the original instruction @@ -282,6 +292,157 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t return 1; } +static struct delayed_uprobe * +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + list_for_each_entry(du, &delayed_uprobe_list, list) + if (du->uprobe == uprobe && du->mm == mm) + return du; + return NULL; +} + +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct
[PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
Simplify uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to fix build failures. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 69 ++--- 1 file changed, 36 insertions(+), 33 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ccc579a7d32e..471eac896635 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) return err; } -static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc) -{ - consumer_add(uprobe, uc); - return register_for_each_vma(uprobe, uc); -} - -static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void +__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { int err; @@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u } /* - * uprobe_register - register a probe + * uprobe_unregister - unregister a already registered probe. + * @inode: the file in which the probe has to be removed. + * @offset: offset from the start of the file. + * @uc: identify which probe if multiple probes are colocated. + */ +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +{ + struct uprobe *uprobe; + + uprobe = find_uprobe(inode, offset); + if (WARN_ON(!uprobe)) + return; + + down_write(&uprobe->register_rwsem); + __uprobe_unregister(uprobe, uc); + up_write(&uprobe->register_rwsem); + put_uprobe(uprobe); +} +EXPORT_SYMBOL_GPL(uprobe_unregister); + +/* + * __uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @uc: information on howto handle the probe.. 
* - * Apart from the access refcount, uprobe_register() takes a creation + * Apart from the access refcount, __uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @uc for the @uprobe - * unregisters. Caller of uprobe_register() is required to keep @inode + * unregisters. Caller of __uprobe_register() is required to keep @inode * (and the containing mount) referenced. * * Return errno if it cannot successully install probes * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +static int __uprobe_register(struct inode *inode, loff_t offset, +struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; @@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * down_write(&uprobe->register_rwsem); ret = -EAGAIN; if (likely(uprobe_is_active(uprobe))) { - ret = __uprobe_register(uprobe, uc); + consumer_add(uprobe, uc); + ret = register_for_each_vma(uprobe, uc); if (ret) __uprobe_unregister(uprobe, uc); } @@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * goto retry; return ret; } + +int uprobe_register(struct inode *inode, loff_t offset, + struct uprobe_consumer *uc) +{ + return __uprobe_register(inode, offset, uc); +} EXPORT_SYMBOL_GPL(uprobe_register); /* @@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset, return ret; } -/* - * uprobe_unregister - unregister a already registered probe. - * @inode: the file in which the probe has to be removed. - * @offset: offset from the start of the file. - * @uc: identify which probe if multiple probes are colocated. 
- */ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) -{ - struct uprobe *uprobe; - - uprobe = find_uprobe(inode, offset); - if (WARN_ON(!uprobe)) - return; - - down_write(&uprobe->register_rwsem); - __uprobe_unregister(uprobe, uc); - up_write(&uprobe->register_rwsem); - put_uprobe(uprobe); -} -EXPORT_SYMBOL_GPL(uprobe_unregister); - static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) { struct vm_area_struct *vma; -- 2.14.4
[PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe. Don't allow user to add multiple trace_uprobe entries having same inode+offset but different reference counter. Ex, # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events bash: echo: write error: Invalid argument # dmesg trace_kprobe: Reference counter offset mismatch. There is one exception though: When user is trying to replace the old entry with the new one, we allow this if the new entry does not conflict with any other existing entries. Signed-off-by: Ravi Bangoria --- kernel/trace/trace_uprobe.c | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index bf2be098eb08..be64d943d7ea 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu) return 0; } +/* + * Uprobe with multiple reference counter is not allowed. i.e. + * If inode and offset matches, reference counter offset *must* + * match as well. Though, there is one exception: If user is + * replacing old trace_uprobe with new one(same group/event), + * then we allow same uprobe with new reference counter as far + * as the new one does not conflict with any other existing + * ones. + */ +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new) +{ + struct trace_uprobe *tmp, *old = NULL; + struct inode *new_inode = d_real_inode(new->path.dentry); + + old = find_probe_event(trace_event_name(&new->tp.call), + new->tp.call.class->system); + + list_for_each_entry(tmp, &uprobe_list, list) { + if ((old ? 
old != tmp : true) && + new_inode == d_real_inode(tmp->path.dentry) && + new->offset == tmp->offset && + new->ref_ctr_offset != tmp->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch."); + return ERR_PTR(-EINVAL); + } + } + return old; +} + /* Register a trace_uprobe and probe_event */ static int register_trace_uprobe(struct trace_uprobe *tu) { @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu) mutex_lock(&uprobe_lock); /* register as an event */ - old_tu = find_probe_event(trace_event_name(&tu->tp.call), - tu->tp.call.class->system); + old_tu = find_old_trace_uprobe(tu); + if (IS_ERR(old_tu)) { + ret = PTR_ERR(old_tu); + goto end; + } + if (old_tu) { /* delete old event */ ret = unregister_trace_uprobe(old_tu); -- 2.14.4
[PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add an additional argument, 'arch_uprobe', to uprobe_write_opcode(). We
need this in a later set of patches.

Signed-off-by: Ravi Bangoria
---
 arch/arm/probes/uprobes/core.c | 2 +-
 arch/mips/kernel/uprobes.c     | 2 +-
 include/linux/uprobes.h        | 2 +-
 kernel/events/uprobes.c        | 9 +
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..bf992264060e 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
	     unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr,
+	return uprobe_write_opcode(auprobe, mm, vaddr,
		   __opcode_to_mem_arm(auprobe->bpinsn));
 }

diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index f7a0645ccb82..4aaff3b3175c 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
	unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }

 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..bb9d2084af03 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 471eac896635..c0418ba52ba8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * Called with mm->mmap_sem held for write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
-			uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
+			unsigned long vaddr, uprobe_opcode_t opcode)
 {
	struct page *old_page, *new_page;
	struct vm_area_struct *vma;
@@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
  */
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }

 /**
@@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 int __weak set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+	return uprobe_write_opcode(auprobe, mm, vaddr,
+				   *(uprobe_opcode_t *)&auprobe->insn);
 }

 static struct uprobe *get_uprobe(struct uprobe *uprobe)
--
2.14.4
[PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with a reference
counter in the probe cache, and perf probe will be able to probe
markers having a reference counter. E.g.:

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ...
    Semaphore: 0x10020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
    Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria
Acked-by: Masami Hiramatsu
Acked-by: Srikar Dronamraju
---
 tools/perf/util/probe-event.c | 39
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++--
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ---
 tools/perf/util/symbol.h      |  7 +++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
		tp->offset = strtoul(fmt2_str, NULL, 10);
	}

+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
	tev->nargs = argc - 2;
	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
	if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
	return err;
 }

+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+	int err;
+
+	err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+	if (err >= 0 && tp->ref_ctr_offset) {
+		if (!uprobe_ref_ctr_is_supported())
+			return -1;
+		err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+	}
+	return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
	}

	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		err = synthesize_uprobe_trace_def(tev, &buf);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
		/* Absolute address. See try_to_find_absolute_address() */
		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
				  tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
	if (err)
		goto error;
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
 {
	int i;
	char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+	if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+		pr_warning("A semaphore is associated with %s:%s and "
+			   "seems your kernel doesn't support it.\n",
+			   tev->group, tev->event);
+	}

	/* Old uprobe event doesn't support memory dereference */
	if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
	char		*symbol;	/* Base symbol */
	char		*module;	/* Module name */
	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
	unsigned long	address;	/* Actual address of the trace point */
	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/pr
[PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore)
v8 changes:
 - Call delayed_uprobe_remove(uprobe, NULL) from put_uprobe() as well.
 - Other minor optimizations.

v7: https://lkml.org/lkml/2018/7/31/20

Description:

Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. When that overhead matters, the runtime
can skip it with an if() condition when no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing on a marker and
decrement it when done with the tracing.

Currently, the perf tool has limited support for SDT markers: it
cannot trace markers surrounded by a reference counter. Also, it's not
easy to add reference counter logic in a userspace tool like perf, so
the basic idea of this patchset is to add the reference counter logic
in the uprobe infrastructure. E.g.,[2]

  # cat tick.c
    ...
    for (i = 0; i < 100; i++) {
        DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i);
        }
        printf("hi: %d\n", i);
        sleep(1);
    }
    ...

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is surrounded by a reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
    Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop1
                 0      sdt_tick:loop2
       2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
    Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c     |   2 +-
 include/linux/uprobes.h        |   7 +-
 kernel/events/uprobes.c        | 329 +++--
 kernel/trace/trace.c           |   2 +-
 kernel/trace/trace_uprobe.c    |  75 +-
 tools/perf/util/probe-event.c  |  39 -
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 -
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 --
 tools/perf/util/symbol.h       |   7 +
 12 files changed, 472 insertions(+), 73 deletions(-)

--
2.14.4
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song,

On 08/11/2018 01:27 PM, Song Liu wrote:
>> +
>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>> +{
>> +	if (!du)
>> +		return;
> Do we really need this check?

Not strictly necessary, but I would still like to keep it for safety.

>
>> +	list_del(&du->list);
>> +	kfree(du);
>> +}
>> +
>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +	struct list_head *pos, *q;
>> +	struct delayed_uprobe *du;
>> +
>> +	if (!uprobe && !mm)
>> +		return;
> And do we really need this check?

Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If
I remove this check, the code below (or, more accurately, the code
suggested by Oleg) will remove all entries from delayed_uprobe_list. So
I will keep this check but put a comment above the function.

[...]

>> +
>> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
>> +			FOLL_WRITE, &page, &vma, NULL);
>> +	if (unlikely(ret <= 0)) {
>> +		/*
>> +		 * We are asking for 1 page. If get_user_pages_remote() fails,
>> +		 * it may return 0, in that case we have to return error.
>> +		 */
>> +		ret = (ret == 0) ? -EBUSY : ret;
>> +		pr_warn("Failed to %s ref_ctr. (%d)\n",
>> +			d > 0 ? "increment" : "decrement", ret);
> This warning is not really useful. Seems this function has little information
> about which uprobe is failing here. Maybe we only need warning in the caller
> (or caller of caller).

Sure, I can move this warning to the caller of this function, but which
exact fields would you like to print with the warning? Something like
this is fine?

    pr_warn("ref_ctr %s failed for 0x%lx, 0x%lx, 0x%lx, 0x%p",
            d > 0 ? "increment" : "decrement", inode->i_ino,
            offset, ref_ctr_offset, mm);

More importantly, the reason I didn't print more info is that dmesg is
accessible to unprivileged users in many distros, but uprobes are not.
So printing this information may be a security violation. No?

>
>> +		return ret;
>> +	}
>> +
>> +	kaddr = kmap_atomic(page);
>> +	ptr = kaddr + (vaddr & ~PAGE_MASK);
>> +
>> +	if (unlikely(*ptr + d < 0)) {
>> +		pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
>> +			"curr val: %d, delta: %d\n", vaddr, *ptr, d);
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	*ptr += d;
>> +	ret = 0;
>> +out:
>> +	kunmap_atomic(kaddr);
>> +	put_page(page);
>> +	return ret;
>> +}
>> +
>> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>> +			  bool is_register)
> What's the reason of bool is_register here vs. short d in __update_ref_ctr()?
> Can we use short for both?

Yes, I can use short as well.

>
>> +{
>> +	struct vm_area_struct *rc_vma;
>> +	unsigned long rc_vaddr;
>> +	int ret = 0;
>> +
>> +	rc_vma = find_ref_ctr_vma(uprobe, mm);
>> +
>> +	if (rc_vma) {
>> +		rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>> +		ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
>> +
>> +		if (is_register)
>> +			return ret;
>> +	}
> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
> function is a little confusing (at least for me). How about we always use
> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?

No. delayed_uprobe_add() is needed for the uprobe_register() case to
handle the race between uprobe_register() and process creation.

[...]

>>
>> +static int delayed_uprobe_install(struct vm_area_struct *vma)
> This function name is confusing. How about we call it delayed_ref_ctr_incr() or
> something similar? Also, we should add comments to highlight this is vma is not
> the vma containing the uprobe, but the vma containing the ref_ctr.

Sure, I'll do that.

>
>> +{
>> +	struct list_head *pos, *q;
>> +	struct delayed_uprobe *du;
>> +	unsigned long vaddr;
>> +	int ret = 0, err = 0;
>> +
>> +	mutex_lock(&delayed_uprobe_lock);
>> +	list_for_each_safe(pos, q, &delayed_uprobe_list) {
>> +		du = list_entry(pos, struct delayed_uprobe, list);
>> +
>> +		if (!valid_ref_ctr_vma(du->uprobe, vma))
>> +			continue;
>> +
>> +		vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
>> +		ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
>> +		/* Record an error and continue. */
>> +		if (ret && !err)
>> +			err = ret;
> I think this is a good place (when ret != 0) to call pr_warn(). I guess we can
> print which mm get error for which uprobe (inode+offset).

__update_ref_ctr() is already printing a warning
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song,

On 08/13/2018 11:17 AM, Ravi Bangoria wrote:
>>> +
>>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
>>> +{
>>> +	struct list_head *pos, *q;
>>> +	struct delayed_uprobe *du;
>>> +
>>> +	if (!uprobe && !mm)
>>> +		return;
>> And do we really need this check?
>
> Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If I remove
> this check, code below (or more accurately code suggested by Oleg) will remove
> all entries from delayed_uprobe_list. So I will keep this check but put a comment
> above function.

Sorry, my bad. Please ignore the above comment.

Even though it saves us an unnecessary walk over the entire
delayed_uprobe_list when both uprobe and mm are NULL, that case is not
possible with the current code. Also, I'm not dereferencing either of
them. So, IMHO, it's fine to remove this check.

Thanks,
Ravi
Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
Hi Song,

On 08/11/2018 01:42 PM, Song Liu wrote:
> Do we really need this given we already have PATCH 4/6?
> uprobe_register() can be called
> out of trace_uprobe, this patch won't catch all conflicts anyway.

Right, but it at least catches all conflicts happening via
trace_uprobe. I don't mind removing this patch, but I would like to get
an opinion from Oleg/Srikar/Steven/Masami.

Thanks,
Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg,

On 08/13/2018 05:20 PM, Oleg Nesterov wrote:
> On 08/13, Ravi Bangoria wrote:
>>
>> On 08/11/2018 01:27 PM, Song Liu wrote:
>>>> +
>>>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>>>> +{
>>>> +	if (!du)
>>>> +		return;
>>> Do we really need this check?
>>
>> Not necessary though, but I would still like to keep it for a safety.
>
> Heh. I tried to ignore all minor problems in this version, but now that Song
> mentioned this unnecessary check...
>
> Personally I really dislike the checks like this one.
>
> - It can confuse the reader who will try to understand the purpose
>
> - it can hide a bug if delayed_uprobe_delete(du) is actually called
>   with du == NULL.
>
> IMO, you should either remove it and let the kernel crash (to notice the
> problem), or turn it into
>
>	if (WARN_ON(!du))
>		return;
>
> which is self-documented and reports the problem without kernel crash.

Ok. I'll remove that check.

>
>>>> +	rc_vma = find_ref_ctr_vma(uprobe, mm);
>>>> +
>>>> +	if (rc_vma) {
>>>> +		rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>>>> +		ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
>>>> +
>>>> +		if (is_register)
>>>> +			return ret;
>>>> +	}
>>> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
>>> function is a little confusing (at least for me). How about we always use
>>> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?
>>
>> No. delayed_uprobe_add() is needed for uprobe_register() case to handle race
>> between uprobe_register() and process creation.
>
> Yes.
>
> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
> hook and avoid delayed_uprobe_install() in uprobe_mmap().

I'm sorry, I didn't get this.

>
> Afaics, the really problematic case is dlopen() which can race with _register()
> too, right?

dlopen() should internally use mmap(), right? So what is the problem
here? Can you please elaborate.

Thanks,
Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
On 08/13/2018 06:47 PM, Oleg Nesterov wrote:
> On 08/13, Ravi Bangoria wrote:
>>
>>> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
>>> hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>
>> I'm sorry. I didn't get this.
>
> Sorry for confusion...
>
> I meant, if only exec*() could race with _register(), we could add another uprobe
> hook which updates all (delayed) counters before return to user-mode.

Ok.

>
>>> Afaics, the really problematic case is dlopen() which can race with _register()
>>> too, right?
>>
>> dlopen() should internally use mmap() right? So what is the problem here? Can
>> you please elaborate.
>
> What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
> because dlopen() can race with _register() too, just like exec.

Right :)

Thanks,
Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song,

On 08/13/2018 10:42 PM, Song Liu wrote:
> On Mon, Aug 13, 2018 at 6:17 AM, Oleg Nesterov wrote:
>> On 08/13, Ravi Bangoria wrote:
>>>
>>>> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
>>>> hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>>
>>> I'm sorry. I didn't get this.
>>
>> Sorry for confusion...
>>
>> I meant, if only exec*() could race with _register(), we could add another uprobe
>> hook which updates all (delayed) counters before return to user-mode.
>>
>>>> Afaics, the really problematic case is dlopen() which can race with _register()
>>>> too, right?
>>>
>>> dlopen() should internally use mmap() right? So what is the problem here? Can
>>> you please elaborate.
>>
>> What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
>> because dlopen() can race with _register() too, just like exec.
>>
>> Oleg.
>>
> How about we do delayed_uprobe_install() per file? Say we keep a list
> of delayed_uprobe in load_elf_binary(). Then we can install delayed_uprobe
> after loading all sections of the file.

I'm not sure I totally understood the idea, but how would this approach
solve the dlopen() race with _register()?

Rather, making delayed_uprobe_list an mm field seems like a simple and
effective idea to me. The only overhead would be the
list_empty(mm->delayed_list) check.

Please let me know if I misunderstood you.

Thanks,
Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
> +static int delayed_uprobe_install(struct vm_area_struct *vma)
> +{
> +	struct list_head *pos, *q;
> +	struct delayed_uprobe *du;
> +	unsigned long vaddr;
> +	int ret = 0, err = 0;
> +
> +	mutex_lock(&delayed_uprobe_lock);
> +	list_for_each_safe(pos, q, &delayed_uprobe_list) {
> +		du = list_entry(pos, struct delayed_uprobe, list);
> +
> +		if (!valid_ref_ctr_vma(du->uprobe, vma))
> +			continue;

I think we should compare mm here, i.e.:

	if (du->mm != vma->vm_mm || !valid_ref_ctr_vma(du->uprobe, vma))
		continue;

Otherwise things can mess up.

> +
> +		vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
> +		ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
> +		/* Record an error and continue. */
> +		if (ret && !err)
> +			err = ret;
> +		delayed_uprobe_delete(du);
> +	}
> +	mutex_unlock(&delayed_uprobe_lock);
> +	return err;
> +}
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song,

> root@virt-test:~# ~/a.out
> 11
> semaphore 0
> semaphore 0
> semaphore 2 <<< when the uprobe is enabled

Yes, this happens when multiple vmas point to the same file portion.
Can you check /proc/`pgrep a.out`/maps?

The logic is simple: if we are going to patch an instruction, increment
the reference counter; if we are going to unpatch an instruction,
decrement the reference counter. In this case, we patched the
instruction twice and thus incremented the reference counter twice as
well.

Ravi
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song,

> However, if I start a.out AFTER enabling the uprobe, there is something wrong:
>
> root@virt-test:~# ~/a.out
> 11
> semaphore 0 <<< this should be non-zero, as the uprobe is already enabled

Ok. I'm able to reproduce this. Digging deeper.

Ravi
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song,

On 08/21/2018 10:53 AM, Ravi Bangoria wrote:
> Hi Song,
>
>> However, if I start a.out AFTER enabling the uprobe, there is something wrong:
>>
>> root@virt-test:~# ~/a.out
>> 11
>> semaphore 0    <<< this should be non-zero, as the uprobe is already enabled

In this testcase, the semaphore variable is stored in .bss:

  $ nm test | grep semaphore
  10010c5e B semaphore

  $ readelf -SW ./test | grep "data\|bss"
    [22] .data  PROGBITS  10010c58 000c58 04 00  WA  0  0  1
    [23] .bss   NOBITS    10010c5c 000c5c 04 00  WA  0  0  2

I'm not so sure, but I guess .bss data initialization happens after
calling uprobe_mmap(), and thus you are seeing the semaphore as 0.

To verify this, if I force the semaphore into the data section by
assigning a non-zero value to it:

  volatile short semaphore = 1;

  $ nm test | grep semaphore
  10010c5c D semaphore

  $ readelf -SW ./test | grep "data\|bss"
    [22] .data  PROGBITS  10010c58 000c58 06 00  WA  0  0  2
    [23] .bss   NOBITS    10010c5e 000c5e 02 00  WA  0  0  1

increment/decrement works fine.

Ravi
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
On 08/21/2018 01:04 PM, Naveen N. Rao wrote:
> Song Liu wrote:
>> I am testing the patch set with the following code:
>>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> volatile short semaphore = 0;
>>
>> int for_uprobe(int c)
>> {
>>	printf("%d\n", c + 10);
>>	return c + 1;
>> }
>>
>> int main(int argc, char *argv[])
>> {
>>	for_uprobe(argc);
>>	while (1) {
>>		sleep(1);
>>		printf("semaphore %d\n", semaphore);
>>	}
>> }
>>
>> I created a uprobe on function for_uprobe(), that uses semaphore as
>> reference counter:
>>
>> echo "p:uprobe_1 /root/a.out:0x49a(0x1036)" >> uprobe_events
>
> Is that even valid? That _looks_ like a semaphore, but I'm not quite sure
> that it qualifies as an _SDT_ semaphore. Do you see this issue if you instead
> use the macros provided by <sys/sdt.h> to create SDT markers?

Right. By default, SDT reference counters (semaphores) go into the
.probes section:

  [25] .probes  PROGBITS  0060102c 00102c 04 00  WA  0  0  2

which has PROGBITS set, so this works fine. And there are many other
things which are coded into <sys/sdt.h>. So the official way to use SDT
markers should be through that header.

Ravi
Re: [PATCH] trace_uprobe: support reference count in fd-based uprobe
On 08/22/2018 03:53 AM, Song Liu wrote:
> This patch applies on top of Ravi Bangoria's work that enables reference
> count for uprobe:
>
> https://lkml.org/lkml/2018/8/20/37
>
> After Ravi's work, the effort to enable it in fd-based uprobe is
> straightforward. The highest 40 bits of perf_event_attr.config are used
> to store the offset of the reference count (semaphore).
>
> Format information in /sys/bus/event_source/devices/uprobe/format/ is
> updated to reflect this new feature.

LGTM.

Reviewed-and-tested-by: Ravi Bangoria

>
> Signed-off-by: Song Liu
> Cc: Ravi Bangoria
> Cc: Masami Hiramatsu
> Cc: Oleg Nesterov
> Cc: Srikar Dronamraju
> Cc: Naveen N. Rao
> Cc: Steven Rostedt (VMware)
> Cc: Peter Zijlstra
> Cc: Ingo Molnar
> Cc: Teng Qin
> ---
>  include/linux/trace_events.h    |  3 +-
>  kernel/events/core.c            | 49 ++---
>  kernel/trace/trace_event_perf.c |  7 +++--
>  kernel/trace/trace_probe.h      |  3 +-
>  kernel/trace/trace_uprobe.c     |  4 ++-
>  5 files changed, 50 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 78a010e19ed4..4130a5497d40 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -575,7 +575,8 @@ extern int bpf_get_kprobe_info(const struct perf_event *event,
>				bool perf_type_tracepoint);
>  #endif
>  #ifdef CONFIG_UPROBE_EVENTS
> -extern int perf_uprobe_init(struct perf_event *event, bool is_retprobe);
> +extern int perf_uprobe_init(struct perf_event *event,
> +			    unsigned long ref_ctr_offset, bool is_retprobe);
>  extern void perf_uprobe_destroy(struct perf_event *event);
>  extern int bpf_get_uprobe_info(const struct perf_event *event,
>				u32 *fd_type, const char **filename,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8f0434a9951a..75a0219be420 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8363,30 +8363,39 @@ static struct pmu perf_tracepoint = {
>   *
>   * PERF_PROBE_CONFIG_IS_RETPROBE	if set, create kretprobe/uretprobe
>   *					if not set, create kprobe/uprobe
> + *
> + * The following values specify a reference counter (or semaphore in the
> + * terminology of tools like dtrace, systemtap, etc.) Userspace Statically
> + * Defined Tracepoints (USDT). Currently, we use 40 bit for the offset.
> + *
> + * PERF_UPROBE_REF_CTR_OFFSET_BITS	# of bits in config as th offset
> + * PERF_UPROBE_REF_CTR_OFFSET_SHIFT	# of bits to shift left
>   */
>  enum perf_probe_config {
>	PERF_PROBE_CONFIG_IS_RETPROBE = 1U << 0,  /* [k,u]retprobe */
> +	PERF_UPROBE_REF_CTR_OFFSET_BITS = 40,
> +	PERF_UPROBE_REF_CTR_OFFSET_SHIFT = 64 - PERF_UPROBE_REF_CTR_OFFSET_BITS,
>  };
>
>  PMU_FORMAT_ATTR(retprobe, "config:0");
> +#endif
>
> -static struct attribute *probe_attrs[] = {
> +#ifdef CONFIG_KPROBE_EVENTS
> +static struct attribute *kprobe_attrs[] = {
>	&format_attr_retprobe.attr,
>	NULL,
>  };
>
> -static struct attribute_group probe_format_group = {
> +static struct attribute_group kprobe_format_group = {
>	.name = "format",
> -	.attrs = probe_attrs,
> +	.attrs = kprobe_attrs,
>  };
>
> -static const struct attribute_group *probe_attr_groups[] = {
> -	&probe_format_group,
> +static const struct attribute_group *kprobe_attr_groups[] = {
> +	&kprobe_format_group,
>	NULL,
>  };
> -#endif
>
> -#ifdef CONFIG_KPROBE_EVENTS
>  static int perf_kprobe_event_init(struct perf_event *event);
>  static struct pmu perf_kprobe = {
>	.task_ctx_nr	= perf_sw_context,
> @@ -8396,7 +8405,7 @@ static struct pmu perf_kprobe = {
>	.start		= perf_swevent_start,
>	.stop		= perf_swevent_stop,
>	.read		= perf_swevent_read,
> -	.attr_groups	= probe_attr_groups,
> +	.attr_groups	= kprobe_attr_groups,
>  };
>
>  static int perf_kprobe_event_init(struct perf_event *event)
> @@ -8428,6 +8437,24 @@ static int perf_kprobe_event_init(struct perf_event *event)
>  #endif /* CONFIG_KPROBE_EVENTS */
>
>  #ifdef CONFIG_UPROBE_EVENTS
> +PMU_FORMAT_ATTR(ref_ctr_offset, "config:63-24");
> +
> +static struct attribute *uprobe_attrs[] = {
> +	&format_attr_retprobe.attr,
> +	&format_attr_ref_ctr_offset.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group uprobe_format_group = {
> +	.name = "format",
> +	.attrs = uprobe_attrs
Re: [PATCH v6 5/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
Hi Oleg,

On 07/25/2018 04:38 PM, Oleg Nesterov wrote:
> No, I can't understand this patch...
>
> On 07/16, Ravi Bangoria wrote:
>>
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -63,6 +63,8 @@ static struct percpu_rw_semaphore dup_mmap_sem;
>>
>>  /* Have a copy of original instruction */
>>  #define UPROBE_COPY_INSN	0
>> +/* Reference counter offset is reloaded with non-zero value. */
>> +#define REF_CTR_OFF_RELOADED	1
>>
>>  struct uprobe {
>>	struct rb_node		rb_node;	/* node in the rb tree */
>> @@ -476,9 +478,23 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>		return ret;
>>
>>	ret = verify_opcode(old_page, vaddr, &opcode);
>> -	if (ret <= 0)
>> +	if (ret < 0)
>>		goto put_old;
>
> I agree, "ret <= 0" wasn't nice even before this change, but "ret < 0" looks
> worse because this is simply not possible.

Ok. Any better idea? I think if we don't track all mms patched by the
uprobe, we have to rely on the current instruction.

>
>> +	/*
>> +	 * If instruction is already patched but reference counter offset
>> +	 * has been reloaded to non-zero value, increment the reference
>> +	 * counter and return.
>> +	 */
>> +	if (ret == 0) {
>> +		if (is_register &&
>> +		    test_bit(REF_CTR_OFF_RELOADED, &uprobe->flags)) {
>> +			WARN_ON(!uprobe->ref_ctr_offset);
>> +			ret = update_ref_ctr(uprobe, mm, true);
>> +		}
>> +		goto put_old;
>> +	}
>
> So we need to force update_ref_ctr(true) in case when uprobe_register_refctr()
> detects the already registered uprobe with ref_ctr_offset == 0, and then it calls
> register_for_each_vma().
>
> Why this can't race with uprobe_mmap() ?
>
> uprobe_mmap() can do install_breakpoint() right after REF_CTR_OFF_RELOADED was set,
> then register_for_each_vma() will find this vma and do install_breakpoint() too.
> If ref_ctr_vma was already mmaped, the counter will be incremented twice, no?

Hmm, right. Probably I can fix this race by using some lock, but I
don't know if it's good to do that inside uprobe_mmap().

>
>> @@ -971,6 +1011,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>>	bool is_register = !!new;
>>	struct map_info *info;
>>	int err = 0;
>> +	bool installed = false;
>>
>>	percpu_down_write(&dup_mmap_sem);
>>	info = build_map_info(uprobe->inode->i_mapping,
>> @@ -1000,8 +1041,10 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>>		if (is_register) {
>>			/* consult only the "caller", new consumer. */
>>			if (consumer_filter(new,
>> -					UPROBE_FILTER_REGISTER, mm))
>> +					UPROBE_FILTER_REGISTER, mm)) {
>>				err = install_breakpoint(uprobe, mm, vma,
>>						info->vaddr);
>> +				installed = true;
>> +			}
>>		} else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) {
>>			if (!filter_chain(uprobe,
>>					UPROBE_FILTER_UNREGISTER, mm))
>> @@ -1016,6 +1059,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>>	}
>>  out:
>>	percpu_up_write(&dup_mmap_sem);
>> +	if (installed)
>> +		clear_bit(REF_CTR_OFF_RELOADED, &uprobe->flags);
>
> I simply can't understand this "bool installed"

That boolean is needed because consumer_filter() returns false when
this function gets called the first time from uprobe_register(), and
consumer_filter() returns true when it gets called by uprobe_apply().
If I make it unconditional, there is no effect of setting the
REF_CTR_OFF_RELOADED bit. So this boolean is needed.

Though, there is one more issue I found. Let's say there are two
processes running, and a user probes both of them via
uprobe_register(), using, say, SystemTap. Now, some other guy does
uprobe_register_refctr() using 'perf -p PID' on the same uprobe, but he
is interested in only one process. Here, we will increment the
reference counter only in the "PID" process, and we will clear the
REF_CTR_OFF_RELOADED flag. Later, some other guy does 'perf -a' which will ca
Re: [PATCH v6 0/6] Uprobes: Support SDT markers having reference count (semaphore)
Oleg, Srikar, Masami, Song, Any feedback please? :) Thanks, Ravi On 07/16/2018 02:17 PM, Ravi Bangoria wrote: > Userspace Statically Defined Tracepoints[1] are dtrace style markers > inside userspace applications. Applications like PostgreSQL, MySQL, > Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc > have these markers embedded in them. These markers are added by developer > at important places in the code. Each marker source expands to a single > nop instruction in the compiled code but there may be additional > overhead for computing the marker arguments which expands to couple of > instructions. In case the overhead is more, execution of it can be > omitted by runtime if() condition when no one is tracing on the marker: > > if (reference_counter > 0) { > Execute marker instructions; > } > > Default value of reference counter is 0. Tracer has to increment the > reference counter before tracing on a marker and decrement it when > done with the tracing. > > Currently, perf tool has limited supports for SDT markers. I.e. it > can not trace markers surrounded by reference counter. Also, it's > not easy to add reference counter logic in userspace tool like perf, > so basic idea for this patchset is to add reference counter logic in > the a uprobe infrastructure. Ex,[2] > > # cat tick.c > ... > for (i = 0; i < 100; i++) { > DTRACE_PROBE1(tick, loop1, i); > if (TICK_LOOP2_ENABLED()) { > DTRACE_PROBE1(tick, loop2, i); > } > printf("hi: %d\n", i); > sleep(1); > } > ... > > Here tick:loop1 is marker without reference counter where as tick:loop2 > is surrounded by reference counter condition. > > # perf buildid-cache --add /tmp/tick > # perf probe sdt_tick:loop1 > # perf probe sdt_tick:loop2 > > # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick > hi: 0 > hi: 1 > hi: 2 > ^C > Performance counter stats for '/tmp/tick': > 3 sdt_tick:loop1 > 0 sdt_tick:loop2 > 2.747086086 seconds time elapsed > > Perf failed to record data for tick:loop2. 
> Same experiment with this patch series:
>
>     # ./perf buildid-cache --add /tmp/tick
>     # ./perf probe sdt_tick:loop2
>     # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C
>     Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
>
> v6 changes:
>  - Do not export struct uprobe outside of uprobe.c. Instead, use
>    container_of(arch_uprobe) to regain the uprobe in uprobe_write_opcode().
>  - Allow 0 as a special value for the reference counter _offset_. I.e.
>    two uprobes, one having ref_ctr_offset=0 and the other having a
>    non-zero ref_ctr_offset, can coexist.
>  - If the vma holding the reference counter is not present while
>    patching an instruction, we add that uprobe to delayed_uprobe_list.
>    When the appropriate mapping is created, we increment the reference
>    counter and remove the uprobe from delayed_uprobe_list. While doing
>    all this, v5 was searching for all such uprobes in uprobe_tree,
>    which is not required. Also, uprobes are stored in the rbtree with
>    inode+offset as the key, and v5 was searching for the uprobe based
>    on inode+ref_ctr_offset, which is wrong too. Fix this by directly
>    looking at delayed_uprobe_list.
>  - Consider a VM_SHARED vma as invalid for the reference counter.
>    Return false from valid_ref_ctr_vma() if vma->vm_flags has
>    VM_SHARED set.
>  - No need to use FOLL_FORCE in get_user_pages_remote() while getting
>    a page to update the reference counter. Remove it.
>  - Do not mention usage of the reference counter in Documentation/.
>
> v5 can be found at: https://lkml.org/lkml/2018/6/28/51
>
> Ravi Bangoria (6):
>   Uprobes: Simplify uprobe_register() body
>   Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
>   Uprobes: Support SDT markers having reference count (semaphore)
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   perf probe: Support SDT markers having reference counter (semaphore)
>
>  arch/arm/probes/uprobes/core.c |   2 +-
>  arch/mips/kernel/uprobes.c     |   2 +-
>  include/linux/uprobes.h        |   7 +-
>  kernel/events/uprobes.c        | 358 ++++++++++++++++++++++++++++++-
>  kernel/trace/trace.c           |   2 +-
>  kernel/trace/trace_uprobe.c    |  78 +++++++-
>  tools/perf/util/probe-event.c  |  39 +++-
>  tools/perf/util/probe-event.h  |   1 +
>  tools/perf/util/probe-file.c   |  34 +++-
>  tools/perf/util/probe-file.h   |   1 +
>  tools/perf/util/symbol-elf.c   |  46 ++++-
>  tools/perf/util/symbol.h       |   7 +
>  12 files changed, 503 insertions(+), 74 deletions(-)
[PATCH v7 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume there is only one reference counter per uprobe. Don't allow
the user to register multiple uprobes having the same inode+offset but
different reference counters.

Signed-off-by: Ravi Bangoria
---
 kernel/events/uprobes.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ad92fed11526..c27546929ae7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -679,6 +679,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
+		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+			pr_warn("Reference counter offset mismatch.\n");
+			put_uprobe(cur_uprobe);
+			kfree(uprobe);
+			return ERR_PTR(-EINVAL);
+		}
 		kfree(uprobe);
 		uprobe = cur_uprobe;
 	}
@@ -1093,6 +1099,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
+
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
 	 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4
[PATCH v7 6/6] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference counter in probe cache. Perf probe will be able to probe markers having reference counter. Ex, # readelf -n /tmp/tick | grep -A1 loop2 Name: loop2 ... Semaphore: 0x10020036 # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed Signed-off-by: Ravi Bangoria Acked-by: Masami Hiramatsu Acked-by: Srikar Dronamraju --- tools/perf/util/probe-event.c | 39 tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 ++-- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 --- tools/perf/util/symbol.h | 7 +++ 6 files changed, 106 insertions(+), 22 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index f119eb628dbb..e86f8be89157 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev) tp->offset = strtoul(fmt2_str, NULL, 10); } + if (tev->uprobes) { + fmt2_str = strchr(p, '('); + if (fmt2_str) + tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0); + } + tev->nargs = argc - 2; tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs); if (tev->args == NULL) { @@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg, return err; } +static int +synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf) +{ + struct probe_trace_point *tp = &tev->point; + int err; + + err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address); + + if (err >= 0 && tp->ref_ctr_offset) { + if (!uprobe_ref_ctr_is_supported()) + return -1; + err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset); + } + return err >= 0 ? 
0 : -1; +} + char *synthesize_probe_trace_command(struct probe_trace_event *tev) { struct probe_trace_point *tp = &tev->point; @@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev) } /* Use the tp->address for uprobes */ - if (tev->uprobes) - err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address); - else if (!strncmp(tp->symbol, "0x", 2)) + if (tev->uprobes) { + err = synthesize_uprobe_trace_def(tev, &buf); + } else if (!strncmp(tp->symbol, "0x", 2)) { /* Absolute address. See try_to_find_absolute_address() */ err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "", tp->module ? ":" : "", tp->address); - else + } else { err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "", tp->module ? ":" : "", tp->symbol, tp->offset); + } + if (err) goto error; @@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev) { int i; char *buf = synthesize_probe_trace_command(tev); + struct probe_trace_point *tp = &tev->point; + + if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) { + pr_warning("A semaphore is associated with %s:%s and " + "seems your kernel doesn't support it.\n", + tev->group, tev->event); + } /* Old uprobe event doesn't support memory dereference */ if (!tev->uprobes || tev->nargs == 0 || !buf) diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index 45b14f020558..15a98c3a2a2f 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -27,6 +27,7 @@ struct probe_trace_point { char*symbol;/* Base symbol */ char*module;/* Module name */ unsigned long offset; /* Offset from symbol */ + unsigned long ref_ctr_offset; /* SDT reference counter offset */ unsigned long address;/* Actual address of the trace point */ boolretprobe; /* Return probe flag */ }; diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/pr
[PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by the
developer at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If that overhead matters, execution of it
can be omitted by a runtime if() condition when no one is tracing the
marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing the marker and
decrement it when done with the tracing.

Implement the reference counter logic in core uprobes. Users will be
able to use it from trace_uprobe as well as from kernel modules. The
new trace_uprobe definition with a reference counter will now be:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel modules, a new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

There is no new variant of uprobe_unregister(), because we assume
there is only one reference counter per uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: the 'reference counter' is called a 'semaphore' in the original
DTrace (and SystemTap, bcc and even ELF) documentation and code. But
the term 'semaphore' is misleading in this context: it is just a
counter holding the number of tracers tracing a marker, and is not
used for any synchronization. So we refer to it as a 'reference
counter' in kernel / perf code.
Signed-off-by: Ravi Bangoria Reviewed-by: Masami Hiramatsu [Only trace_uprobe.c] --- include/linux/uprobes.h | 5 + kernel/events/uprobes.c | 232 ++-- kernel/trace/trace.c| 2 +- kernel/trace/trace_uprobe.c | 38 +++- 4 files changed, 267 insertions(+), 10 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index bb9d2084af03..103a48a48872 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { return -ENOSYS; } +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) +{ + return -ENOSYS; +} static inline int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index c0418ba52ba8..ad92fed11526 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -73,6 +73,7 @@ struct uprobe { struct uprobe_consumer *consumers; struct inode*inode; /* Also hold a ref to inode */ loff_t offset; + loff_t ref_ctr_offset; unsigned long flags; /* @@ -88,6 +89,15 @@ struct uprobe { struct arch_uprobe arch; }; +struct delayed_uprobe { + struct list_head list; 
+ struct uprobe *uprobe; + struct mm_struct *mm; +}; + +static DEFINE_MUTEX(delayed_uprobe_lock); +static LIST_HEAD(delayed_uprobe_list); + /* * Execute out of line area: anonymous executable mapping installed * by the probed task to execute the copy of the original instruction @@ -282,6 +292,154 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t return 1; } +static struct delayed_uprobe * +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + list_for_each_entry(du, &delayed_uprobe_list, list) + if (du->uprobe == uprobe && du->mm == mm) + return du; + return NULL; +} + +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct
[PATCH v7 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add addition argument 'arch_uprobe' to uprobe_write_opcode(). We need this in later set of patches. Signed-off-by: Ravi Bangoria --- arch/arm/probes/uprobes/core.c | 2 +- arch/mips/kernel/uprobes.c | 2 +- include/linux/uprobes.h| 2 +- kernel/events/uprobes.c| 9 + 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c index d1329f1ba4e4..bf992264060e 100644 --- a/arch/arm/probes/uprobes/core.c +++ b/arch/arm/probes/uprobes/core.c @@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn) int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, + return uprobe_write_opcode(auprobe, mm, vaddr, __opcode_to_mem_arm(auprobe->bpinsn)); } diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c index f7a0645ccb82..4aaff3b3175c 100644 --- a/arch/mips/kernel/uprobes.c +++ b/arch/mips/kernel/uprobes.c @@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr( int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 0a294e950df8..bb9d2084af03 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn); extern bool is_trap_insn(uprobe_opcode_t *insn); extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); -extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t 
offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 471eac896635..c0418ba52ba8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t * Called with mm->mmap_sem held for write. * Return 0 (success) or a negative errno. */ -int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, - uprobe_opcode_t opcode) +int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, + unsigned long vaddr, uprobe_opcode_t opcode) { struct page *old_page, *new_page; struct vm_area_struct *vma; @@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, */ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } /** @@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned int __weak set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn); + return uprobe_write_opcode(auprobe, mm, vaddr, + *(uprobe_opcode_t *)&auprobe->insn); } static struct uprobe *get_uprobe(struct uprobe *uprobe) -- 2.14.4
[PATCH v7 0/6] Uprobes: Support SDT markers having reference count (semaphore)
v7 changes:
 - Don't allow both zero and non-zero reference counter offsets at the
   same time. It's painful and error prone.
 - Don't call delayed_uprobe_install() if vma->vm_mm does not have any
   breakpoint installed.

v6: https://lkml.org/lkml/2018/7/16/353

Description:

Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by the
developer at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If that overhead matters, execution of it
can be omitted by a runtime if() condition when no one is tracing the
marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing the marker and
decrement it when done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it
cannot trace markers surrounded by a reference counter. Also, it's not
easy to add reference counter logic in a userspace tool like perf, so
the basic idea of this patchset is to add reference counter logic in
the uprobe infrastructure. Ex,[2]

    # cat tick.c
    ...
    for (i = 0; i < 100; i++) {
        DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i);
        }
        printf("hi: %d\n", i);
        sleep(1);
    }
    ...

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is surrounded by a reference counter condition.

    # perf buildid-cache --add /tmp/tick
    # perf probe sdt_tick:loop1
    # perf probe sdt_tick:loop2

    # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
    Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop1
                 0      sdt_tick:loop2
       2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

    # ./perf buildid-cache --add /tmp/tick
    # ./perf probe sdt_tick:loop2
    # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
    Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c     |   2 +-
 include/linux/uprobes.h        |   7 +-
 kernel/events/uprobes.c        | 315 +++++++++++++++++++++++++++++---
 kernel/trace/trace.c           |   2 +-
 kernel/trace/trace_uprobe.c    |  75 +++++++-
 tools/perf/util/probe-event.c  |  39 +++-
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 +++-
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 ++++-
 tools/perf/util/symbol.h       |   7 +
 12 files changed, 459 insertions(+), 72 deletions(-)

-- 
2.14.4
[PATCH v7 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe. Don't allow user to add multiple trace_uprobe entries having same inode+offset but different reference counter. Ex, # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events bash: echo: write error: Invalid argument # dmesg trace_kprobe: Reference counter offset mismatch. There is one exception though: When user is trying to replace the old entry with the new one, we allow this if the new entry does not conflict with any other existing entries. Signed-off-by: Ravi Bangoria --- kernel/trace/trace_uprobe.c | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index bf2be098eb08..be64d943d7ea 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu) return 0; } +/* + * Uprobe with multiple reference counter is not allowed. i.e. + * If inode and offset matches, reference counter offset *must* + * match as well. Though, there is one exception: If user is + * replacing old trace_uprobe with new one(same group/event), + * then we allow same uprobe with new reference counter as far + * as the new one does not conflict with any other existing + * ones. + */ +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new) +{ + struct trace_uprobe *tmp, *old = NULL; + struct inode *new_inode = d_real_inode(new->path.dentry); + + old = find_probe_event(trace_event_name(&new->tp.call), + new->tp.call.class->system); + + list_for_each_entry(tmp, &uprobe_list, list) { + if ((old ? 
old != tmp : true) && + new_inode == d_real_inode(tmp->path.dentry) && + new->offset == tmp->offset && + new->ref_ctr_offset != tmp->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch."); + return ERR_PTR(-EINVAL); + } + } + return old; +} + /* Register a trace_uprobe and probe_event */ static int register_trace_uprobe(struct trace_uprobe *tu) { @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu) mutex_lock(&uprobe_lock); /* register as an event */ - old_tu = find_probe_event(trace_event_name(&tu->tp.call), - tu->tp.call.class->system); + old_tu = find_old_trace_uprobe(tu); + if (IS_ERR(old_tu)) { + ret = PTR_ERR(old_tu); + goto end; + } + if (old_tu) { /* delete old event */ ret = unregister_trace_uprobe(old_tu); -- 2.14.4
[PATCH v7 1/6] Uprobes: Simplify uprobe_register() body
Simplify uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to fix build failures. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 69 ++--- 1 file changed, 36 insertions(+), 33 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ccc579a7d32e..471eac896635 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) return err; } -static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc) -{ - consumer_add(uprobe, uc); - return register_for_each_vma(uprobe, uc); -} - -static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void +__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { int err; @@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u } /* - * uprobe_register - register a probe + * uprobe_unregister - unregister a already registered probe. + * @inode: the file in which the probe has to be removed. + * @offset: offset from the start of the file. + * @uc: identify which probe if multiple probes are colocated. + */ +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +{ + struct uprobe *uprobe; + + uprobe = find_uprobe(inode, offset); + if (WARN_ON(!uprobe)) + return; + + down_write(&uprobe->register_rwsem); + __uprobe_unregister(uprobe, uc); + up_write(&uprobe->register_rwsem); + put_uprobe(uprobe); +} +EXPORT_SYMBOL_GPL(uprobe_unregister); + +/* + * __uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @uc: information on howto handle the probe.. 
* - * Apart from the access refcount, uprobe_register() takes a creation + * Apart from the access refcount, __uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @uc for the @uprobe - * unregisters. Caller of uprobe_register() is required to keep @inode + * unregisters. Caller of __uprobe_register() is required to keep @inode * (and the containing mount) referenced. * * Return errno if it cannot successully install probes * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +static int __uprobe_register(struct inode *inode, loff_t offset, +struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; @@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * down_write(&uprobe->register_rwsem); ret = -EAGAIN; if (likely(uprobe_is_active(uprobe))) { - ret = __uprobe_register(uprobe, uc); + consumer_add(uprobe, uc); + ret = register_for_each_vma(uprobe, uc); if (ret) __uprobe_unregister(uprobe, uc); } @@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * goto retry; return ret; } + +int uprobe_register(struct inode *inode, loff_t offset, + struct uprobe_consumer *uc) +{ + return __uprobe_register(inode, offset, uc); +} EXPORT_SYMBOL_GPL(uprobe_register); /* @@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset, return ret; } -/* - * uprobe_unregister - unregister a already registered probe. - * @inode: the file in which the probe has to be removed. - * @offset: offset from the start of the file. - * @uc: identify which probe if multiple probes are colocated. 
- */ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) -{ - struct uprobe *uprobe; - - uprobe = find_uprobe(inode, offset); - if (WARN_ON(!uprobe)) - return; - - down_write(&uprobe->register_rwsem); - __uprobe_unregister(uprobe, uc); - up_write(&uprobe->register_rwsem); - put_uprobe(uprobe); -} -EXPORT_SYMBOL_GPL(uprobe_unregister); - static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) { struct vm_area_struct *vma; -- 2.14.4
Re: [PATCH v6 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, On 07/23/2018 09:56 PM, Oleg Nesterov wrote: > I have a mixed feeling about this series... I'll try to summarise my thinking > tomorrow, but I do not see any obvious problem so far. Although I have some > concerns about 5/6, I need to re-read it after sleep. Sure. > > > On 07/16, Ravi Bangoria wrote: >> >> +static int delayed_uprobe_install(struct vm_area_struct *vma) >> +{ >> +struct list_head *pos, *q; >> +struct delayed_uprobe *du; >> +unsigned long vaddr; >> +int ret = 0, err = 0; >> + >> +mutex_lock(&delayed_uprobe_lock); >> +list_for_each_safe(pos, q, &delayed_uprobe_list) { >> +du = list_entry(pos, struct delayed_uprobe, list); >> + >> +if (!du->uprobe->ref_ctr_offset || > > Is it possible to see ->ref_ctr_offset == 0 in delayed_uprobe_list? I'll remove this check. > >> @@ -1072,7 +1282,13 @@ int uprobe_mmap(struct vm_area_struct *vma) >> struct uprobe *uprobe, *u; >> struct inode *inode; >> >> -if (no_uprobe_events() || !valid_vma(vma, true)) >> +if (no_uprobe_events()) >> +return 0; >> + >> +if (vma->vm_flags & VM_WRITE) >> +delayed_uprobe_install(vma); > > Obviously not nice performance-wise... OK, I do not know if it will actually > hurt in practice and probably we can use the better data structures if > necessary. > But can't we check MMF_HAS_UPROBES at least? I mean, > > if (vma->vm_flags & VM_WRITE && test_bit(MMF_HAS_UPROBES, > &vma->vm_mm->flags)) > delayed_uprobe_install(vma); > > ? Yes. > > > Perhaps we can even add another flag later, MMF_HAS_DELAYED_UPROBES set by > delayed_uprobe_add(). Yes, good idea. Thanks for the review, Ravi
[PATCH 0/2] perf trace: Two trivial fixes
Two independent fixes:

The first adds the 'generated' directory to .gitignore.
The second fixes call-graph output with perf trace.

Ravi Bangoria (2):
  perf tools: Add trace/beauty/generated/ into .gitignore
  perf trace: Fix call-graph output

 tools/perf/.gitignore      | 1 +
 tools/perf/builtin-trace.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
1.8.3.1
[PATCH 1/2] perf tools: Add trace/beauty/generated/ into .gitignore
No functionality changes.

Signed-off-by: Ravi Bangoria
---
 tools/perf/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
index 643cc4ba..3e5135d 100644
--- a/tools/perf/.gitignore
+++ b/tools/perf/.gitignore
@@ -31,5 +31,6 @@ config.mak.autogen
 .config-detected
 util/intel-pt-decoder/inat-tables.c
 arch/*/include/generated/
+trace/beauty/generated/
 pmu-events/pmu-events.c
 pmu-events/jevents
-- 
1.8.3.1
[PATCH 2/2] perf trace: Fix call-graph output
Recently, Arnaldo fixed global vs event-specific --max-stack usage
with commit bd3dda9ab0fb ("perf trace: Allow overriding global
--max-stack per event"). That commit introduced a regression when we
don't use --max-stack at all with perf trace. Ex,

  $ ./perf trace record -g ls
  $ ./perf trace -i perf.data
     0.076 ( 0.002 ms): ls/9109 brk(
     0.196 ( 0.008 ms): ls/9109 access(filename: 0x9f998b70, mode: R
     0.209 ( 0.031 ms): ls/9109 open(filename: 0x9f998978, flags: CLOEXEC

This is missing call-traces. After the patch:

  $ ./perf trace -i perf.data
     0.076 ( 0.002 ms): ls/9109 brk(
           do_syscall_trace_leave ([kernel.kallsyms])
           [0] ([unknown])
           syscall_exit_work ([kernel.kallsyms])
           brk (/usr/lib64/ld-2.17.so)
           _dl_sysdep_start (/usr/lib64/ld-2.17.so)
           _dl_start_final (/usr/lib64/ld-2.17.so)
           _dl_start (/usr/lib64/ld-2.17.so)
           _start (/usr/lib64/ld-2.17.so)
     0.196 ( 0.008 ms): ls/9109 access(filename: 0x9f998b70, mode: R
           do_syscall_trace_leave ([kernel.kallsyms])
           [0] ([unknown])

Fixes: bd3dda9ab0fb ("perf trace: Allow overriding global --max-stack per event")
Signed-off-by: Ravi Bangoria
---
 tools/perf/builtin-trace.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 17d11de..e90f5c3 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1661,9 +1661,12 @@ static int trace__resolve_callchain(struct trace *trace, struct perf_evsel *evse
 				    struct callchain_cursor *cursor)
 {
 	struct addr_location al;
+	int max_stack = evsel->attr.sample_max_stack ?
+			evsel->attr.sample_max_stack :
+			trace->max_stack;
 
 	if (machine__resolve(trace->host, &al, sample) < 0 ||
-	    thread__resolve_callchain(al.thread, cursor, evsel, sample, NULL, NULL, evsel->attr.sample_max_stack))
+	    thread__resolve_callchain(al.thread, cursor, evsel, sample, NULL, NULL, max_stack))
 		return -1;
 
 	return 0;
-- 
1.8.3.1
[PATCH] trace_uprobe: Display correct offset in uprobe_events
Recently, how pointers are printed with %p was changed by commit
ad67b74d2469 ("printk: hash addresses printed with %p"). This causes
a regression in how the offset is shown in the uprobe_events file.
Instead of %p, use %px to display the offset.

Before the patch:

  # perf probe -vv -x /tmp/a.out main
  Opening /sys/kernel/debug/tracing//uprobe_events write=1
  Writing event: p:probe_a/main /tmp/a.out:0x58c

  # cat /sys/kernel/debug/tracing/uprobe_events
  p:probe_a/main /tmp/a.out:0x49a0f352

After the patch:

  # cat /sys/kernel/debug/tracing/uprobe_events
  p:probe_a/main /tmp/a.out:0x058c

Signed-off-by: Ravi Bangoria
---
 kernel/trace/trace_uprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 40592e7b3568..268029ae1be6 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -608,7 +608,7 @@ static int probes_seq_show(struct seq_file *m, void *v)
 
 	/* Don't print "0x  (null)" when offset is 0 */
 	if (tu->offset) {
-		seq_printf(m, "0x%p", (void *)tu->offset);
+		seq_printf(m, "0x%px", (void *)tu->offset);
 	} else {
 		switch (sizeof(void *)) {
 		case 4:
-- 
2.13.6
Re: [PATCH] trace_uprobe: Display correct offset in uprobe_events
On 01/08/2018 10:49 AM, Srikar Dronamraju wrote:
> * Ravi Bangoria [2018-01-06 11:12:46]:
>
>> Recently, how the pointers being printed with %p has been changed
>> by commit ad67b74d2469 ("printk: hash addresses printed with %p").
>> This is causing a regression while showing offset in the
>> uprobe_events file. Instead of %p, use %px to display offset.
>>
>> Signed-off-by: Ravi Bangoria
>> ---
>>  kernel/trace/trace_uprobe.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index 40592e7b3568..268029ae1be6 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -608,7 +608,7 @@ static int probes_seq_show(struct seq_file *m, void *v)
>>
>>  	/* Don't print "0x  (null)" when offset is 0 */
>>  	if (tu->offset) {
>> -		seq_printf(m, "0x%p", (void *)tu->offset);
>> +		seq_printf(m, "0x%px", (void *)tu->offset);
>>  	} else {
>>  		switch (sizeof(void *)) {
>>  		case 4:
>
> Looks good to me. Did you consider %pK instead of %px?

Thanks Srikar,

Checked %pK. But I see the same issue with that:

perf probe:
  Opening /sys/kernel/debug/tracing//uprobe_events write=1
  Writing event: p:probe_a/main /tmp/a.out:0x58c

cat /sys/kernel/debug/tracing/uprobe_events:
  p:probe_a/main /tmp/a.out:0x14fd571e

> Acked-by: Srikar Dronamraju
[PATCH 0/3] perf trace powerpc: Remove libaudit dependency for syscalls
This is almost identical set of patches recently done for s390. With this, user can run perf trace without libaudit on powerpc as well. Ex, $ make ... libaudit: [ OFF ] $ ./perf trace ls 0.221 ( 0.005 ms): ls/43330 open(filename: 0xac1e2778, flags: CLOEXEC ) = 3 0.227 ( 0.003 ms): ls/43330 read(fd: 3, buf: 0x39c4d678, count: 832 ) = 832 0.233 ( 0.002 ms): ls/43330 fstat(fd: 3, statbuf: 0x39c4d4b0) = 0 ... $ ./perf trace -e "open*" ls 0.000 ( 0.014 ms): ls/43342 open(filename: 0x793d8978, flags: CLOEXEC ) = 3 0.038 ( 0.006 ms): ls/43342 open(filename: 0x793f2778, flags: CLOEXEC ) = 3 ... Ravi Bangoria (3): tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h perf powerpc: Generate system call table from asm/unistd.h perf trace powerpc: Use generated syscall table tools/arch/powerpc/include/uapi/asm/unistd.h | 399 + tools/perf/Makefile.config | 2 + tools/perf/arch/powerpc/Makefile | 21 ++ .../perf/arch/powerpc/entry/syscalls/mksyscalltbl | 35 ++ tools/perf/check-headers.sh| 1 + tools/perf/util/syscalltbl.c | 4 + 6 files changed, 462 insertions(+) create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl -- 1.8.3.1
[PATCH 2/3] perf powerpc: Generate system call table from asm/unistd.h
This should speed up accessing new system calls introduced with the
kernel rather than waiting for libaudit updates to include them.

Signed-off-by: Ravi Bangoria
---
 tools/perf/arch/powerpc/Makefile                   | 21 ++++++++++++
 .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  | 35 ++++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 42dab7c..c93e8f4 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -6,3 +6,24 @@ endif
 HAVE_KVM_STAT_SUPPORT := 1
 PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
 PERF_HAVE_JITDUMP := 1
+
+#
+# Syscall table generation for perf
+#
+
+out    := $(OUTPUT)arch/powerpc/include/generated/asm
+header := $(out)/syscalls_64.c
+sysdef := $(srctree)/tools/arch/powerpc/include/uapi/asm/unistd.h
+sysprf := $(srctree)/tools/perf/arch/powerpc/entry/syscalls/
+systbl := $(sysprf)/mksyscalltbl
+
+# Create output directory if not already present
+_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
+
+$(header): $(sysdef) $(systbl)
+	$(Q)$(SHELL) '$(systbl)' '$(CC)' $(sysdef) > $@
+
+clean::
+	$(call QUIET_CLEAN, powerpc) $(RM) $(header)
+
+archheaders: $(header)
diff --git a/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
new file mode 100755
index 000..975947c
--- /dev/null
+++ b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
@@ -0,0 +1,35 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Generate system call table for perf. Derived from
+# s390 script.
+#
+# Copyright IBM Corp. 2017
+# Author(s): Hendrik Brueckner
+# Changed by: Ravi Bangoria
+
+gcc=$1
+input=$2
+
+if ! test -r $input; then
+	echo "Could not read input file" >&2
+	exit 1
+fi
+
+create_table()
+{
+	local max_nr
+
+	echo 'static const char *syscalltbl_powerpc_64[] = {'
+	while read sc nr; do
+		printf '\t[%d] = "%s",\n' $nr $sc
+		max_nr=$nr
+	done
+	echo '};'
+	echo "#define SYSCALLTBL_POWERPC_64_MAX_ID $max_nr"
+}
+
+$gcc -m64 -E -dM -x c $input \
+	|sed -ne 's/^#define __NR_//p' \
+	|sort -t' ' -k2 -nu \
+	|create_table
--
1.8.3.1
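The pipeline in mksyscalltbl is easy to sanity-check without a powerpc
toolchain: feed the same sed/sort/read stages a hand-written sample of
`gcc -E -dM` output (the sample defines below are stand-ins, not real
build inputs) and they emit the table entries:

```shell
# Hypothetical sample of 'gcc -E -dM' output; only __NR_* lines survive
# the sed, and sort orders them numerically by syscall number.
printf '#define __NR_write 4\n#define __NR_read 3\n#define _LOCAL_GUARD 1\n' \
	| sed -ne 's/^#define __NR_//p' \
	| sort -t' ' -k2 -nu \
	| while read sc nr; do
		printf '\t[%d] = "%s",\n' "$nr" "$sc"
	  done
```

This prints the two array initializers, `[3] = "read",` and
`[4] = "write",` in numeric order, exactly as create_table would embed
them in the generated C file.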
[PATCH 1/3] tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
Will be used for generating the syscall id/string translation table. Signed-off-by: Ravi Bangoria --- tools/arch/powerpc/include/uapi/asm/unistd.h | 399 +++ tools/perf/check-headers.sh | 1 + 2 files changed, 400 insertions(+) create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h diff --git a/tools/arch/powerpc/include/uapi/asm/unistd.h b/tools/arch/powerpc/include/uapi/asm/unistd.h new file mode 100644 index 000..df8684f3 --- /dev/null +++ b/tools/arch/powerpc/include/uapi/asm/unistd.h @@ -0,0 +1,399 @@ +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ +/* + * This file contains the system call numbers. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#ifndef _UAPI_ASM_POWERPC_UNISTD_H_ +#define _UAPI_ASM_POWERPC_UNISTD_H_ + + +#define __NR_restart_syscall 0 +#define __NR_exit1 +#define __NR_fork2 +#define __NR_read3 +#define __NR_write 4 +#define __NR_open5 +#define __NR_close 6 +#define __NR_waitpid 7 +#define __NR_creat 8 +#define __NR_link9 +#define __NR_unlink 10 +#define __NR_execve 11 +#define __NR_chdir 12 +#define __NR_time 13 +#define __NR_mknod 14 +#define __NR_chmod 15 +#define __NR_lchown 16 +#define __NR_break 17 +#define __NR_oldstat18 +#define __NR_lseek 19 +#define __NR_getpid 20 +#define __NR_mount 21 +#define __NR_umount 22 +#define __NR_setuid 23 +#define __NR_getuid 24 +#define __NR_stime 25 +#define __NR_ptrace 26 +#define __NR_alarm 27 +#define __NR_oldfstat 28 +#define __NR_pause 29 +#define __NR_utime 30 +#define __NR_stty 31 +#define __NR_gtty 32 +#define __NR_access 33 +#define __NR_nice 34 +#define __NR_ftime 35 +#define __NR_sync 36 +#define __NR_kill 37 +#define __NR_rename 38 +#define __NR_mkdir 39 +#define __NR_rmdir 40 +#define __NR_dup41 +#define __NR_pipe 42 +#define __NR_times 43 +#define 
__NR_prof 44 +#define __NR_brk45 +#define __NR_setgid 46 +#define __NR_getgid 47 +#define __NR_signal 48 +#define __NR_geteuid49 +#define __NR_getegid50 +#define __NR_acct 51 +#define __NR_umount252 +#define __NR_lock 53 +#define __NR_ioctl 54 +#define __NR_fcntl 55 +#define __NR_mpx56 +#define __NR_setpgid57 +#define __NR_ulimit 58 +#define __NR_oldolduname59 +#define __NR_umask 60 +#define __NR_chroot 61 +#define __NR_ustat 62 +#define __NR_dup2 63 +#define __NR_getppid64 +#define __NR_getpgrp65 +#define __NR_setsid 66 +#define __NR_sigaction 67 +#define __NR_sgetmask 68 +#define __NR_ssetmask 69 +#define __NR_setreuid 70 +#define __NR_setregid 71 +#define __NR_sigsuspend 72 +#define __NR_sigpending 73 +#define __NR_sethostname74 +#define __NR_setrlimit 75 +#define __NR_getrlimit 76 +#define __NR_getrusage 77 +#define __NR_gettimeofday 78 +#define __NR_settimeofday 79 +#define __NR_getgroups 80 +#define __NR_setgroups 81 +#define __NR_select 82 +#define __NR_symlink83 +#define __NR_oldlstat 84 +#define __NR_readlink 85 +#define __NR_uselib 86 +#define __NR_swapon 87 +#define __NR_reboot 88 +#define __NR_readdir89 +#define __NR_mmap 90 +#define __NR_munmap 91 +#define __NR_truncate 92 +#define __NR_ftruncate 93 +#define __NR_fchmod 94 +#define __NR_fchown 95 +#define __NR_getpriority96 +#define __NR_setpriority97 +#define __NR_profil 98 +#define __NR_statfs 99 +#define __NR_fstatfs 100 +#define __NR_ioperm101 +#define __NR_socketcall102 +#define __NR_syslog103 +#define __NR_setitimer 104 +#define __NR_getitimer 105 +#define __NR_stat 106 +#define __NR_lstat 107 +#define __NR_fstat 108 +#define
[PATCH 3/3] perf trace powerpc: Use generated syscall table
This should speed up accessing new system calls introduced with the
kernel rather than waiting for libaudit updates to include them. It also
enables users to specify wildcards, for example, perf trace -e 'open*',
just like was already possible on x86 and s390.

Signed-off-by: Ravi Bangoria
---
 tools/perf/Makefile.config   | 2 ++
 tools/perf/util/syscalltbl.c | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 0dfdaa9..577a5d2 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -27,6 +27,8 @@ NO_SYSCALL_TABLE := 1
 # Additional ARCH settings for ppc
 ifeq ($(SRCARCH),powerpc)
   NO_PERF_REGS := 0
+  NO_SYSCALL_TABLE := 0
+  CFLAGS += -I$(OUTPUT)arch/powerpc/include/generated
   LIBUNWIND_LIBS := -lunwind -lunwind-ppc64
 endif

diff --git a/tools/perf/util/syscalltbl.c b/tools/perf/util/syscalltbl.c
index 303bdb8..b12c5f5 100644
--- a/tools/perf/util/syscalltbl.c
+++ b/tools/perf/util/syscalltbl.c
@@ -30,6 +30,10 @@
 #include
 const int syscalltbl_native_max_id = SYSCALLTBL_S390_64_MAX_ID;
 static const char **syscalltbl_native = syscalltbl_s390_64;
+#elif defined(__powerpc64__)
+#include
+const int syscalltbl_native_max_id = SYSCALLTBL_POWERPC_64_MAX_ID;
+static const char **syscalltbl_native = syscalltbl_powerpc_64;
 #endif

 struct syscall {
--
1.8.3.1
Re: [tip:perf/core] perf trace: Allow overriding global --max-stack per event
Hi Arnaldo,

This commit seems to be causing a regression:

  $ ./perf trace record -g ls

Perf compiled from acme/perf/core:

  $ ./perf trace -i perf.data
     0.200 ( 0.016 ms): ls/19722 brk(
     0.367 ( 0.024 ms): ls/19722 access(filename: 0xa1438b70, mode: R
     0.401 ( 0.019 ms): ls/19722 open(filename: 0xa1438978, flags: CLOEXEC

  Missing calltraces ^

Distro perf:

  $ perf trace -i perf.data
     0.200 ( 0.016 ms): ls/19722 brk(
		do_syscall_trace_leave ([kernel.kallsyms])
		[0] ([unknown])
		syscall_exit_work ([kernel.kallsyms])
		brk (/usr/lib64/ld-2.17.so)
		_dl_sysdep_start (/usr/lib64/ld-2.17.so)
		_dl_start_final (/usr/lib64/ld-2.17.so)
		_dl_start (/usr/lib64/ld-2.17.so)
		_start (/usr/lib64/ld-2.17.so)
     0.367 ( 0.024 ms): ls/19722 access(filename: 0xa1438b70, mode: R
		do_syscall_trace_leave ([kernel.kallsyms])
		[0] ([unknown])
		syscall_exit_work ([kernel.kallsyms])
		access (/usr/lib64/ld-2.17.so)
		dl_main (/usr/lib64/ld-2.17.so)
		_dl_sysdep_start (/usr/lib64/ld-2.17.so)
		_dl_start_final (/usr/lib64/ld-2.17.so)
		_dl_start (/usr/lib64/ld-2.17.so)
		_start (/usr/lib64/ld-2.17.so)
     0.401 ( 0.019 ms): ls/19722 open(filename: 0xa1438978, flags: CLOEXEC
		do_syscall_trace_leave ([kernel.kallsyms])
		[0] ([unknown])
		syscall_exit_work ([kernel.kallsyms])
		open64 (/usr/lib64/ld-2.17.so)

A patch something like the below should fix the issue??

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 531d43b..d0ace22 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1642,9 +1642,12 @@ static int trace__resolve_callchain(struct trace *trace, struct perf_evsel *evsel,
 				    struct callchain_cursor *cursor)
 {
 	struct addr_location al;
+	int max_stack = evsel->attr.sample_max_stack ?
+			evsel->attr.sample_max_stack :
+			trace->max_stack;

 	if (machine__resolve(trace->host, &al, sample) < 0 ||
-	    thread__resolve_callchain(al.thread, cursor, evsel, sample, NULL, NULL, evsel->attr.sample_max_stack))
+	    thread__resolve_callchain(al.thread, cursor, evsel, sample, NULL, NULL, max_stack))
 		return -1;

 	return 0;

Thanks,
Ravi

On 01/17/2018 10:04 PM, tip-bot for Arnaldo Carvalho de Melo wrote:
> Commit-ID:  bd3dda9ab0fbdb8a91a2e869d93a0c9692b8444f
> Gitweb:     https://git.kernel.org/tip/bd3dda9ab0fbdb8a91a2e869d93a0c9692b8444f
> Author:     Arnaldo Carvalho de Melo
> AuthorDate: Mon, 15 Jan 2018 11:33:53 -0300
> Committer:  Arnaldo Carvalho de Melo
> CommitDate: Wed, 17 Jan 2018 10:23:33 -0300
>
> perf trace: Allow overriding global --max-stack per event
>
> The per-event max-stack setting wasn't overriding the global --max-stack
> setting:
>
>   # perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf,max-stack=2/ ping -6 -c 1 ::1
>   PING ::1(::1) 56 data bytes
>   64 bytes from ::1: icmp_seq=1 ttl=64 time=0.072 ms
>
>   --- ::1 ping statistics ---
>   1 packets transmitted, 1 received, 0% packet loss, time 0ms
>   rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
>      0.000 probe_libc:inet_pton:(7feb7a998350))
>                 __inet_pton (inlined)
>                 gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
>                 __GI_getaddrinfo (inlined)
>                 [0xaa39b6108f3f] (/usr/bin/ping)
>   #
>
> Fix it:
>
>   # perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf,max-stack=2/ ping -6 -c 1 ::1
>   PING ::1(::1) 56 data bytes
>   64 bytes from ::1: icmp_seq=1 ttl=64 time=0.073 ms
>
>   --- ::1 ping statistics ---
>   1 packets transmitted, 1 received, 0% packet loss, time 0ms
>   rtt min/avg/max/mdev = 0.073/0.073/0.073/0.000 ms
>      0.000 probe_libc:inet_pton:(7f1083221350))
>                 __inet_pton (inlined)
Re: [PATCH 2/2] trace_uprobe: Simplify probes_seq_show()
On 02/08/2018 09:13 AM, Ravi Bangoria wrote: > > On 02/08/2018 08:59 AM, Kees Cook wrote: >> On Tue, Feb 6, 2018 at 8:34 PM, Ravi Bangoria >> wrote: >>> Simplify probes_seq_show() function. We are using %lx to display >>> the offset and we don't prepend unnecessary 0s in the offset. >>> >>> Signed-off-by: Ravi Bangoria >>> --- >>> kernel/trace/trace_uprobe.c | 21 +++-- >>> 1 file changed, 3 insertions(+), 18 deletions(-) >>> >>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c >>> index c2c965398893..c12a3957e1ee 100644 >>> --- a/kernel/trace/trace_uprobe.c >>> +++ b/kernel/trace/trace_uprobe.c >>> @@ -602,24 +602,9 @@ static int probes_seq_show(struct seq_file *m, void *v) >>> char c = is_ret_probe(tu) ? 'r' : 'p'; >>> int i; >>> >>> - seq_printf(m, "%c:%s/%s", c, tu->tp.call.class->system, >>> - trace_event_name(&tu->tp.call)); >>> - seq_printf(m, " %s:", tu->filename); >>> - >>> - /* Don't print "0x (null)" when offset is 0 */ >>> - if (tu->offset) { >>> - seq_printf(m, "0x%lx", tu->offset); >>> - } else { >>> - switch (sizeof(void *)) { >>> - case 4: >>> - seq_printf(m, "0x"); >>> - break; >>> - case 8: >>> - default: >>> - seq_printf(m, "0x"); >>> - break; >>> - } >>> - } >>> + seq_printf(m, "%c:%s/%s %s:0x%lx", c, tu->tp.call.class->system, >>> + trace_event_name(&tu->tp.call), tu->filename, >>> + tu->offset); >> To keep the prepended zeros (and avoid the redundant 0x prefix): >> >> "...%#0*lx...", ... sizeof(void *) * 2, tu->offset); >> >> As in: >> >> + seq_printf(m, "%c:%s/%s %s:%#0*lx", c, tu->tp.call.class->system, >> + trace_event_name(&tu->tp.call), tu->filename, >> + sizeof(void *) * 2, tu->offset); > This is useful, thanks Kees. > > @Wang, Do we really need those 0s? Won't just 0x0 should > suffice? Here is the sample output... 
>
> # echo "p:probe_a/main /tmp/a.out:0x0" > uprobe_events
>
> Before patch:
> # cat uprobe_events
> p:probe_a/main /tmp/a.out:0x
>
> After patch:
> # cat uprobe_events
> p:probe_a/main /tmp/a.out:0x0

Wang, ping :)

Kees, I haven't heard back from Wang and no one has reported any issues
with the patches yet. Can I have your Acked-by?

Thanks,
Ravi
Re: [PATCH 2/2] trace_uprobe: Simplify probes_seq_show()
On 03/07/2018 03:34 AM, Kees Cook wrote:
> On Tue, Mar 6, 2018 at 12:12 AM, Ravi Bangoria wrote:
>>
>> On 02/08/2018 09:13 AM, Ravi Bangoria wrote:
>>> Wang, ping :)
>>>
>>> Kees, I haven't heard back from Wang and no one has reported any
>>> issues with the patches yet. Can I have your Acked-by?
> I didn't see a v2 of these patches with the output fixed?

Kees, I didn't send a v2 because those 0s seem unnecessary to me.
Please let me know if you still want them shown. Will re-spin :)

Thanks,
Ravi

PS: Changing Masami's email address in cc.
Re: [RFC 3/4] trace_uprobe: Support SDT markers having semaphore
On 03/06/2018 05:29 PM, Peter Zijlstra wrote:
> On Wed, Feb 28, 2018 at 01:23:44PM +0530, Ravi Bangoria wrote:
>> Userspace Statically Defined Tracepoints[1] are dtrace-style markers
>> inside userspace applications. These markers are added by the developer
>> at important places in the code. Each marker source expands to a single
>> nop instruction in the compiled code, but there may be additional
>> overhead for computing the marker arguments, which expands to a couple
>> of instructions. If this computation is significant, it can be omitted
>> at runtime by an if() condition when no one is tracing the marker:
>>
>> if (semaphore > 0) {
>>     Execute marker instructions;
>> }
>>
>> The default value of the semaphore is 0. A tracer has to increment the
>> semaphore before recording on a marker and decrement it at the end of
>> tracing.
>>
>> Implement the semaphore flip logic in trace_uprobe, leaving the core
>> uprobe infrastructure as is, except for one new callback from
>> uprobe_mmap() to trace_uprobe.
> W.T.H. is that called a semaphore? afaict its just a usage-counter.

I totally agree with you. But it's not me who named it semaphore :)
Please refer to the "Semaphore Handling" section at:
https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

We can surely name it differently in the kernel code and document it
properly in Documentation/tracing/.

> There is no blocking, no releasing, nothing that would make it an actual
> semaphore.
>
> So please, remove all mention of semaphore from this code, because it,
> most emphatically, is not one.
>
> Also, would it not be much better to do userspace jump-labels for this?
> That completely avoids the dynamic branch at the SDT site.

Userspace jump-label is a good idea, but the semaphore logic has already
become a de facto ABI. Tools like bcc, gdb, systemtap etc. flip the
semaphore while probing the marker.

Thanks,
Ravi
Re: [PATCH v6 21/21] perf-probe: Add array argument support
Hi Masami :) On 03/22/2018 03:53 PM, Masami Hiramatsu wrote: > On Mon, 19 Mar 2018 13:29:59 +0530 > Ravi Bangoria wrote: > >> >> Is it okay to allow user to specify array size with type field? > Fro this patch, yes. So IIUC, perf _tool_ will allow user to record array either with "name[range]" or by "name:type[length]". Please correct me if that's wrong. And If perf tool will allow array length along with TYPE field, I guess we should document that in man perf-probe? Otherwise, Acked-by: Ravi Bangoria Thanks, Ravi > The availability of type is checked only when > it is automatically generated. > IMO, it should be done in another patch, something like > "Validate user specified type casting" patch. Would you need it? > > Thank you, >
Re: [PATCH v2 9/9] perf probe: Support SDT markers having reference counter (semaphore)
Hi Masami, On 04/09/2018 12:58 PM, Masami Hiramatsu wrote: > Hi Ravi, > > On Wed, 4 Apr 2018 14:01:10 +0530 > Ravi Bangoria wrote: > >> @@ -2054,15 +2060,21 @@ char *synthesize_probe_trace_command(struct >> probe_trace_event *tev) >> } >> >> /* Use the tp->address for uprobes */ >> -if (tev->uprobes) >> +if (tev->uprobes) { >> err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address); >> -else if (!strncmp(tp->symbol, "0x", 2)) >> +if (uprobe_ref_ctr_is_supported() && >> +tp->ref_ctr_offset && >> +err >= 0) >> +err = strbuf_addf(&buf, "(0x%lx)", tp->ref_ctr_offset); > If the kernel doesn't support uprobe_ref_ctr but the event requires > to increment uprobe_ref_ctr, I think we should (at least) warn user here. pr_debug("A semaphore is associated with %s:%s and seems your kernel doesn't support it.\n" tev->group, tev->event); Looks good? >> @@ -776,14 +784,21 @@ static char *synthesize_sdt_probe_command(struct >> sdt_note *note, >> { >> struct strbuf buf; >> char *ret = NULL, **args; >> -int i, args_count; >> +int i, args_count, err; >> +unsigned long long ref_ctr_offset; >> >> if (strbuf_init(&buf, 32) < 0) >> return NULL; >> >> -if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx", >> -sdtgrp, note->name, pathname, >> -sdt_note__get_addr(note)) < 0) >> +err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx", >> +sdtgrp, note->name, pathname, >> +sdt_note__get_addr(note)); >> + >> +ref_ctr_offset = sdt_note__get_ref_ctr_offset(note); >> +if (uprobe_ref_ctr_is_supported() && ref_ctr_offset && err >= 0) >> +err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset); > We don't have to care about uprobe_ref_ctr support here, because > this information will be just cached, not directly written to > uprobe_events. Sure, will remove the check. Thanks for the review :). Ravi
[PATCH 0/3] perf/buildid-cache: Add --list and --purge-all options
First patch is a trivial error message fix. Second and third adds new options --list and --purge-all to 'buildid-cache' subcommand. Ravi Bangoria (3): tools/parse-options: Add '\n' at the end of error messages perf/buildid-cache: Support --list option perf/buildid-cache: Support --purge-all option tools/lib/subcmd/parse-options.c| 6 +- tools/perf/Documentation/perf-buildid-cache.txt | 7 ++- tools/perf/builtin-buildid-cache.c | 75 - 3 files changed, 81 insertions(+), 7 deletions(-) -- 2.14.3
[PATCH 1/3] tools/parse-options: Add '\n' at the end of error messages
A few error messages do not have '\n' at the end, so the next shell
prompt gets printed on the same line. Ex,

  linux~$ perf buildid-cache -verbose --add ./a.out
  Error: did you mean `--verbose` (with two dashes ?)linux~$

Fix it.

Signed-off-by: Ravi Bangoria
---
 tools/lib/subcmd/parse-options.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/lib/subcmd/parse-options.c b/tools/lib/subcmd/parse-options.c
index f6a1babcbac4..cb7154eccbdc 100644
--- a/tools/lib/subcmd/parse-options.c
+++ b/tools/lib/subcmd/parse-options.c
@@ -433,7 +433,7 @@ static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,

 	if (ambiguous_option) {
 		fprintf(stderr,
-			" Error: Ambiguous option: %s (could be --%s%s or --%s%s)",
+			" Error: Ambiguous option: %s (could be --%s%s or --%s%s)\n",
 			arg,
 			(ambiguous_flags & OPT_UNSET) ?  "no-" : "",
 			ambiguous_option->long_name,
@@ -458,7 +458,7 @@ static void check_typos(const char *arg, const struct option *options)
 		return;

 	if (strstarts(arg, "no-")) {
-		fprintf(stderr, " Error: did you mean `--%s` (with two dashes ?)", arg);
+		fprintf(stderr, " Error: did you mean `--%s` (with two dashes ?)\n", arg);
 		exit(129);
 	}

@@ -466,7 +466,7 @@ static void check_typos(const char *arg, const struct option *options)
 		if (!options->long_name)
 			continue;
 		if (strstarts(options->long_name, arg)) {
-			fprintf(stderr, " Error: did you mean `--%s` (with two dashes ?)", arg);
+			fprintf(stderr, " Error: did you mean `--%s` (with two dashes ?)\n", arg);
 			exit(129);
 		}
 	}
--
2.14.3
[PATCH 2/3] perf/buildid-cache: Support --list option
perf buildid-cache allows adding and removing files to/from the cache,
but there is no option to list all cached files. Add a --list option to
list all _valid_ cached files. Ex,

  # perf buildid-cache --add /tmp/a.out
  # perf buildid-cache -l
  /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c)

Signed-off-by: Ravi Bangoria
---
 tools/perf/Documentation/perf-buildid-cache.txt |  4 ++-
 tools/perf/builtin-buildid-cache.c              | 41 +++++++++++++++++--
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-buildid-cache.txt b/tools/perf/Documentation/perf-buildid-cache.txt
index 73c2650bd0db..3f285ba6e1f9 100644
--- a/tools/perf/Documentation/perf-buildid-cache.txt
+++ b/tools/perf/Documentation/perf-buildid-cache.txt
@@ -59,7 +59,9 @@ OPTIONS
 	exactly same build-id, that is replaced by new one. It can be used
 	to update kallsyms and kernel dso to vmlinux in order to support
 	annotation.
-
+-l::
+--list::
+	List all valid binaries from cache.
 -v::
 --verbose::
 	Be more verbose.
diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c
index 41db2cba77eb..50db05bd0cc6 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -25,6 +25,7 @@
 #include "util/session.h"
 #include "util/symbol.h"
 #include "util/time-utils.h"
+#include "util/probe-file.h"

 static int build_id_cache__kcore_buildid(const char *proc_dir, char *sbuildid)
 {
@@ -297,6 +298,25 @@ static int build_id_cache__update_file(const char *filename, struct nsinfo *nsi)
 	return err;
 }

+static void build_id_cache__show_all(void)
+{
+	struct strlist *bidlist;
+	struct str_node *nd;
+	char *buf;
+
+	bidlist = build_id_cache__list_all(true);
+	if (!bidlist) {
+		pr_debug("Failed to get buildids: -%d\n", errno);
+		return;
+	}
+	strlist__for_each_entry(nd, bidlist) {
+		buf = build_id_cache__origname(nd->s);
+		printf("%s (%s)\n", buf, nd->s);
+		free(buf);
+	}
+	strlist__delete(bidlist);
+}
+
 int cmd_buildid_cache(int argc, const char **argv)
 {
 	struct strlist *list;
@@ -304,6 +324,8 @@ int cmd_buildid_cache(int argc, const char **argv)
 	int ret = 0;
 	int ns_id = -1;
 	bool force = false;
+	bool list_files = false;
+	bool opts_flag = false;
 	char const *add_name_list_str = NULL,
 		   *remove_name_list_str = NULL,
 		   *purge_name_list_str = NULL,
@@ -327,6 +349,7 @@ int cmd_buildid_cache(int argc, const char **argv)
 		   "file(s) to remove"),
 	OPT_STRING('p', "purge", &purge_name_list_str, "file list",
 		   "file(s) to remove (remove old caches too)"),
+	OPT_BOOLEAN('l', "list", &list_files, "list all cached files"),
 	OPT_STRING('M', "missing", &missing_filename, "file",
 		   "to find missing build ids in the cache"),
 	OPT_BOOLEAN('f', "force", &force, "don't complain, do it"),
@@ -344,11 +367,18 @@ int cmd_buildid_cache(int argc, const char **argv)
 	argc = parse_options(argc, argv, buildid_cache_options,
 			     buildid_cache_usage, 0);

-	if (argc || (!add_name_list_str && !kcore_filename &&
-		     !remove_name_list_str && !purge_name_list_str &&
-		     !missing_filename && !update_name_list_str))
+	opts_flag = add_name_list_str || kcore_filename ||
+		remove_name_list_str || purge_name_list_str ||
+		missing_filename || update_name_list_str;
+
+	if (argc || !(list_files || opts_flag))
 		usage_with_options(buildid_cache_usage, buildid_cache_options);

+	/* -l is exclusive. It can not be used with other options. */
+	if (list_files && opts_flag)
+		usage_with_options_msg(buildid_cache_usage,
+			buildid_cache_options, "-l is exclusive.\n");
+
 	if (ns_id > 0)
 		nsi = nsinfo__new(ns_id);

@@ -366,6 +396,11 @@ int cmd_buildid_cache(int argc, const char **argv)

 	setup_pager();

+	if (list_files) {
+		build_id_cache__show_all();
+		goto out;
+	}
+
 	if (add_name_list_str) {
 		list = strlist__new(add_name_list_str, NULL);
 		if (list) {
--
2.14.3
[PATCH 3/3] perf/buildid-cache: Support --purge-all option
User can remove files from cache using --remove/--purge options but both needs list of files as an argument. It's not convenient when you want to flush out entire cache. Add an option to purge all files from cache. Ex, # perf buildid-cache -l /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c) /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36) # perf buildid-cache -P -v Removing /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c): Ok Removing /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36): Ok Purged all: Ok Signed-off-by: Ravi Bangoria --- tools/perf/Documentation/perf-buildid-cache.txt | 3 +++ tools/perf/builtin-buildid-cache.c | 36 - 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-buildid-cache.txt b/tools/perf/Documentation/perf-buildid-cache.txt index 3f285ba6e1f9..f6de0952ff3c 100644 --- a/tools/perf/Documentation/perf-buildid-cache.txt +++ b/tools/perf/Documentation/perf-buildid-cache.txt @@ -48,6 +48,9 @@ OPTIONS --purge=:: Purge all cached binaries including older caches which have specified path from the cache. +-P:: +--purge-all:: + Purge all cached binaries. This will flush out entire cache. -M:: --missing=:: List missing build ids in the cache for the specified file. 
diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c
index 50db05bd0cc6..3b4373cabd49 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -240,6 +240,34 @@ static int build_id_cache__purge_path(const char *pathname, struct nsinfo *nsi)
 	return err;
 }

+static int build_id_cache__purge_all(void)
+{
+	struct strlist *list;
+	struct str_node *pos;
+	int err;
+	char *buf;
+
+	list = build_id_cache__list_all(false);
+	if (!list) {
+		pr_debug("Failed to get buildids: -%d\n", errno);
+		return -EINVAL;
+	}
+
+	strlist__for_each_entry(pos, list) {
+		buf = build_id_cache__origname(pos->s);
+		err = build_id_cache__remove_s(pos->s);
+		pr_debug("Removing %s (%s): %s\n", buf, pos->s,
+			 err ? "FAIL" : "Ok");
+		free(buf);
+		if (err)
+			break;
+	}
+	strlist__delete(list);
+
+	pr_debug("Purged all: %s\n", err ? "FAIL" : "Ok");
+	return err;
+}
+
 static bool dso__missing_buildid_cache(struct dso *dso, int parm __maybe_unused)
 {
 	char filename[PATH_MAX];
@@ -326,6 +354,7 @@ int cmd_buildid_cache(int argc, const char **argv)
 	bool force = false;
 	bool list_files = false;
 	bool opts_flag = false;
+	bool purge_all = false;
 	char const *add_name_list_str = NULL,
 		   *remove_name_list_str = NULL,
 		   *purge_name_list_str = NULL,
@@ -349,6 +378,7 @@ int cmd_buildid_cache(int argc, const char **argv)
 		   "file(s) to remove"),
 	OPT_STRING('p', "purge", &purge_name_list_str, "file list",
 		   "file(s) to remove (remove old caches too)"),
+	OPT_BOOLEAN('P', "purge-all", &purge_all, "purge all cached files"),
 	OPT_BOOLEAN('l', "list", &list_files, "list all cached files"),
 	OPT_STRING('M', "missing", &missing_filename, "file",
 		   "to find missing build ids in the cache"),
@@ -369,7 +399,8 @@ int cmd_buildid_cache(int argc, const char **argv)

 	opts_flag = add_name_list_str || kcore_filename ||
 		remove_name_list_str || purge_name_list_str ||
-		missing_filename || update_name_list_str;
+		missing_filename || update_name_list_str ||
+		purge_all;

 	if (argc || !(list_files || opts_flag))
 		usage_with_options(buildid_cache_usage, buildid_cache_options);
@@ -455,6 +486,9 @@ int cmd_buildid_cache(int argc, const char **argv)
 		}
 	}

+	if (purge_all)
+		ret = build_id_cache__purge_all();
+
 	if (missing_filename)
 		ret = build_id_cache__fprintf_missing(session, stdout);

--
2.14.3
Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
Hi Oleg, On 04/09/2018 06:47 PM, Oleg Nesterov wrote: > On 04/04, Ravi Bangoria wrote: >> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) >> +{ >> +struct mmu_notifier *mn; >> +struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL); >> + >> +if (!sml) >> +return; >> +sml->mm = mm; >> +list_add(&(sml->list), &(tu->sml.list)); >> + >> +/* Register mmu_notifier for this mm. */ >> +mn = kzalloc(sizeof(*mn), GFP_KERNEL); >> +if (!mn) >> +return; >> + >> +mn->ops = &sdt_mmu_notifier_ops; >> +__mmu_notifier_register(mn, mm); >> +} > I didn't read this version yet, just one question... > > So now it depends on CONFIG_MMU_NOTIFIER, yes? I do not see any changes in > Kconfig > files, this doesn't look right... Yes, you are write. I'll make CONFIG_UPROBE_EVENTS dependent on CONFIG_MMU_NOTIFIER. Thanks, Ravi
Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
On 04/09/2018 07:02 PM, Ravi Bangoria wrote: > Hi Oleg, > > On 04/09/2018 06:47 PM, Oleg Nesterov wrote: >> I didn't read this version yet, just one question... >> >> So now it depends on CONFIG_MMU_NOTIFIER, yes? I do not see any changes in >> Kconfig >> files, this doesn't look right... > Yes, you are write. s/write/right.
Re: [PATCH v5 0/6] enable creating [k,u]probe with perf_event_open
Hi Song, On 12/07/2017 04:15 AM, Song Liu wrote: > With current kernel, user space tools can only create/destroy [k,u]probes > with a text-based API (kprobe_events and uprobe_events in tracefs). This > approach relies on user space to clean up the [k,u]probe after using them. > However, this is not easy for user space to clean up properly. > > To solve this problem, we introduce a file descriptor based API. > Specifically, we extended perf_event_open to create [k,u]probe, and attach > this [k,u]probe to the file descriptor created by perf_event_open. These > [k,u]probe are associated with this file descriptor, so they are not > available in tracefs. Sorry for being late. One simple question.. Will it be good to support k/uprobe arguments with perf_event_open()? Do you have any plans about that? Thanks, Ravi
Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
Hi Oleg, On 04/09/2018 06:59 PM, Oleg Nesterov wrote: > On 04/04, Ravi Bangoria wrote: >> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) >> +{ >> +struct mmu_notifier *mn; >> +struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL); >> + >> +if (!sml) >> +return; >> +sml->mm = mm; >> +list_add(&(sml->list), &(tu->sml.list)); >> + >> +/* Register mmu_notifier for this mm. */ >> +mn = kzalloc(sizeof(*mn), GFP_KERNEL); >> +if (!mn) >> +return; >> + >> +mn->ops = &sdt_mmu_notifier_ops; >> +__mmu_notifier_register(mn, mm); >> +} > and what if __mmu_notifier_register() fails simply because signal_pending() > == T? > see mm_take_all_locks(). > > at first glance this all look suspicious and sub-optimal, Yes. I should have added checks for failure cases. Will fix them in v3. Thanks for the review, Ravi > but let me repeat that > I didn't read this version yet. > > Oleg. >
Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
Hi Oleg, On 04/10/2018 04:36 PM, Oleg Nesterov wrote: > Hi Ravi, > > On 04/10, Ravi Bangoria wrote: >>> and what if __mmu_notifier_register() fails simply because signal_pending() >>> == T? >>> see mm_take_all_locks(). >>> >>> at first glance this all look suspicious and sub-optimal, >> Yes. I should have added checks for failure cases. >> Will fix them in v3. > And what can you do if it fails? Nothing except report the problem. But > signal_pending() is not the unlikely or error condition, it should not > cause the tracing errors. ... > Plus mm_take_all_locks() is very heavy... BTW, uprobe_mmap_callback() is > called unconditionally. Whatever it does, can we at least move it after > the no_uprobe_events() check? Can't we also check MMF_HAS_UPROBES? Sure, I'll move it after these conditions. > Either way, I do not feel that mmu_notifier is the right tool... Did you > consider the uprobe_clear_state() hook we already have? Ah! This is really a good idea. We don't need mmu_notifier then. Thanks for suggestion, Ravi
Re: [PATCH v2 1/2] trace_uprobe: Use %lx to display offset
Hi Steve, Can you please pull these patches. Thanks, Ravi On 03/15/2018 01:57 PM, Ravi Bangoria wrote: > tu->offset is unsigned long, not a pointer, thus %lx should > be used to print it, not the %px. > > Fixes: 0e4d819d0893 ("trace_uprobe: Display correct offset in uprobe_events") > Suggested-by: Kees Cook > Signed-off-by: Ravi Bangoria > --- > v2 changelog: > - Keep prefixed 0s as is. > > kernel/trace/trace_uprobe.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c > index 2014f4351ae0..0298bd15be83 100644 > --- a/kernel/trace/trace_uprobe.c > +++ b/kernel/trace/trace_uprobe.c > @@ -608,7 +608,7 @@ static int probes_seq_show(struct seq_file *m, void *v) > > /* Don't print "0x (null)" when offset is 0 */ > if (tu->offset) { > - seq_printf(m, "0x%px", (void *)tu->offset); > + seq_printf(m, "0x%0*lx", (int)(sizeof(void *) * 2), tu->offset); > } else { > switch (sizeof(void *)) { > case 4:
[PATCH 2/8] mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
No functionality changes. Signed-off-by: Ravi Bangoria --- include/linux/mm.h | 4 ++-- kernel/events/uprobes.c | 14 +++--- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 95909f2..d7ee526 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2275,13 +2275,13 @@ struct vm_unmapped_area_info { } static inline unsigned long -offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) +vma_offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) { return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT); } static inline loff_t -vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) +vma_vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) { return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start); } diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index bd6f230..535fd39 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -748,7 +748,7 @@ static inline struct map_info *free_map_info(struct map_info *info) curr = info; info->mm = vma->vm_mm; - info->vaddr = offset_to_vaddr(vma, offset); + info->vaddr = vma_offset_to_vaddr(vma, offset); } i_mmap_unlock_read(mapping); @@ -807,7 +807,7 @@ static inline struct map_info *free_map_info(struct map_info *info) goto unlock; if (vma->vm_start > info->vaddr || - vaddr_to_offset(vma, info->vaddr) != uprobe->offset) + vma_vaddr_to_offset(vma, info->vaddr) != uprobe->offset) goto unlock; if (is_register) { @@ -977,7 +977,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) uprobe->offset >= offset + vma->vm_end - vma->vm_start) continue; - vaddr = offset_to_vaddr(vma, uprobe->offset); + vaddr = vma_offset_to_vaddr(vma, uprobe->offset); err |= remove_breakpoint(uprobe, mm, vaddr); } up_read(&mm->mmap_sem); @@ -1023,7 +1023,7 @@ static void build_probe_list(struct inode *inode, struct uprobe *u; INIT_LIST_HEAD(head); - min = vaddr_to_offset(vma, start); + min = 
vma_vaddr_to_offset(vma, start); max = min + (end - start) - 1; spin_lock(&uprobes_treelock); @@ -1076,7 +1076,7 @@ int uprobe_mmap(struct vm_area_struct *vma) list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) { if (!fatal_signal_pending(current) && filter_chain(uprobe, UPROBE_FILTER_MMAP, vma->vm_mm)) { - unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset); + unsigned long vaddr = vma_offset_to_vaddr(vma, uprobe->offset); install_breakpoint(uprobe, vma->vm_mm, vma, vaddr); } put_uprobe(uprobe); @@ -1095,7 +1095,7 @@ int uprobe_mmap(struct vm_area_struct *vma) inode = file_inode(vma->vm_file); - min = vaddr_to_offset(vma, start); + min = vma_vaddr_to_offset(vma, start); max = min + (end - start) - 1; spin_lock(&uprobes_treelock); @@ -1730,7 +1730,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp) if (vma && vma->vm_start <= bp_vaddr) { if (valid_vma(vma, false)) { struct inode *inode = file_inode(vma->vm_file); - loff_t offset = vaddr_to_offset(vma, bp_vaddr); + loff_t offset = vma_vaddr_to_offset(vma, bp_vaddr); uprobe = find_uprobe(inode, offset); } -- 1.8.3.1
[PATCH 1/8] Uprobe: Export vaddr <-> offset conversion functions
No functionality changes. Signed-off-by: Ravi Bangoria --- include/linux/mm.h | 12 kernel/events/uprobes.c | 10 -- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ad06d42..95909f2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2274,6 +2274,18 @@ struct vm_unmapped_area_info { return unmapped_area(info); } +static inline unsigned long +offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) +{ + return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT); +} + +static inline loff_t +vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) +{ + return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start); +} + /* truncate.c */ extern void truncate_inode_pages(struct address_space *, loff_t); extern void truncate_inode_pages_range(struct address_space *, diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ce6848e..bd6f230 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -130,16 +130,6 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register) return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC; } -static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) -{ - return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT); -} - -static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) -{ - return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start); -} - /** * __replace_page - replace page in vma by new page. * based on replace_page in mm/ksm.c -- 1.8.3.1
[PATCH 0/8] trace_uprobe: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside userspace applications. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. If the overhead is significant, execution can be omitted by a runtime if() condition when no one is tracing on the marker:

  if (reference_counter > 0) {
	Execute marker instructions;
  }

The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it cannot trace markers surrounded by a reference counter. Also, it's not easy to add reference counter logic in a userspace tool like perf, so the basic idea of this patchset is to add the reference counter logic in the trace_uprobe infrastructure. For example[2],

  # cat tick.c
  ...
  for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
	if (TICK_LOOP2_ENABLED()) {
		DTRACE_PROBE1(tick, loop2, i);
	}
	printf("hi: %d\n", i);
	sleep(1);
  }
  ...

Here tick:loop1 is a marker without a reference counter, whereas tick:loop2 is surrounded by a reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2
  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
	3	sdt_tick:loop1
	0	sdt_tick:loop2
	2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2.
Same experiment with this patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
	3	sdt_tick:loop2
	2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
[3] https://lkml.org/lkml/2017/12/6/976

Note: The 'reference counter' is called a 'semaphore' in the original DTrace (and SystemTap, bcc, and even ELF) documentation and code. But the term 'semaphore' is misleading in this context. It is just a counter used to hold the number of tracers tracing on a marker; it is not really used for any synchronization. So we refer to it as a 'reference counter' in kernel / perf code.

RFC series can be found at: https://lkml.org/lkml/2018/2/28/76

Ravi Bangoria (8):
  Uprobe: Export vaddr <-> offset conversion functions
  mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
  Uprobe: Rename map_info to uprobe_map_info
  Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()
  trace_uprobe: Support SDT markers having reference count (semaphore)
  trace_uprobe/sdt: Fix multiple update of same reference counter
  perf probe: Support SDT markers having reference counter (semaphore)
  trace_uprobe/sdt: Document about reference counter

 Documentation/trace/uprobetracer.txt |  16 +-
 include/linux/mm.h                   |  12 ++
 include/linux/uprobes.h              |  11 ++
 kernel/events/uprobes.c              |  62
 kernel/trace/trace.c                 |   2 +-
 kernel/trace/trace_uprobe.c          | 273 ++-
 tools/perf/util/probe-event.c        |  21 ++-
 tools/perf/util/probe-event.h        |   1 +
 tools/perf/util/probe-file.c         |  22 ++-
 tools/perf/util/symbol-elf.c         |  10 ++
 tools/perf/util/symbol.h             |   1 +
 11 files changed, 382 insertions(+), 49 deletions(-)

-- 
1.8.3.1
[PATCH 4/8] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()
These exported data structure and functions will be used by other files in later patches. No functionality changes. Signed-off-by: Ravi Bangoria --- include/linux/uprobes.h | 9 + kernel/events/uprobes.c | 14 +++--- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 0a294e9..7bd2760 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -109,12 +109,19 @@ enum rp_check { RP_CHECK_RET, }; +struct address_space; struct xol_area; struct uprobes_state { struct xol_area *xol_area; }; +struct uprobe_map_info { + struct uprobe_map_info *next; + struct mm_struct *mm; + unsigned long vaddr; +}; + extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern bool is_swbp_insn(uprobe_opcode_t *insn); @@ -149,6 +156,8 @@ struct uprobes_state { extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs); extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, void *src, unsigned long len); +extern struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info); +extern struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping, loff_t offset, bool is_register); #else /* !CONFIG_UPROBES */ struct uprobes_state { }; diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 081b88c1..e7830b8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -695,23 +695,15 @@ static void delete_uprobe(struct uprobe *uprobe) put_uprobe(uprobe); } -struct uprobe_map_info { - struct uprobe_map_info *next; - struct mm_struct *mm; - unsigned long vaddr; -}; - -static inline struct uprobe_map_info * -uprobe_free_map_info(struct uprobe_map_info *info) +struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info) { struct uprobe_map_info *next = info->next; kfree(info); return next; } 
-static struct uprobe_map_info * -uprobe_build_map_info(struct address_space *mapping, loff_t offset, - bool is_register) +struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping, + loff_t offset, bool is_register) { unsigned long pgoff = offset >> PAGE_SHIFT; struct vm_area_struct *vma; -- 1.8.3.1
[PATCH 3/8] Uprobe: Rename map_info to uprobe_map_info
map_info is very generic name, rename it to uprobe_map_info. Renaming will help to export this structure outside of the file. Also rename free_map_info() to uprobe_free_map_info() and build_map_info() to uprobe_build_map_info(). No functionality changes. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 32 +--- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 535fd39..081b88c1 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -695,27 +695,29 @@ static void delete_uprobe(struct uprobe *uprobe) put_uprobe(uprobe); } -struct map_info { - struct map_info *next; +struct uprobe_map_info { + struct uprobe_map_info *next; struct mm_struct *mm; unsigned long vaddr; }; -static inline struct map_info *free_map_info(struct map_info *info) +static inline struct uprobe_map_info * +uprobe_free_map_info(struct uprobe_map_info *info) { - struct map_info *next = info->next; + struct uprobe_map_info *next = info->next; kfree(info); return next; } -static struct map_info * -build_map_info(struct address_space *mapping, loff_t offset, bool is_register) +static struct uprobe_map_info * +uprobe_build_map_info(struct address_space *mapping, loff_t offset, + bool is_register) { unsigned long pgoff = offset >> PAGE_SHIFT; struct vm_area_struct *vma; - struct map_info *curr = NULL; - struct map_info *prev = NULL; - struct map_info *info; + struct uprobe_map_info *curr = NULL; + struct uprobe_map_info *prev = NULL; + struct uprobe_map_info *info; int more = 0; again: @@ -729,7 +731,7 @@ static inline struct map_info *free_map_info(struct map_info *info) * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through * reclaim. This is optimistic, no harm done if it fails. 
*/ - prev = kmalloc(sizeof(struct map_info), + prev = kmalloc(sizeof(struct uprobe_map_info), GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); if (prev) prev->next = NULL; @@ -762,7 +764,7 @@ static inline struct map_info *free_map_info(struct map_info *info) } do { - info = kmalloc(sizeof(struct map_info), GFP_KERNEL); + info = kmalloc(sizeof(struct uprobe_map_info), GFP_KERNEL); if (!info) { curr = ERR_PTR(-ENOMEM); goto out; @@ -774,7 +776,7 @@ static inline struct map_info *free_map_info(struct map_info *info) goto again; out: while (prev) - prev = free_map_info(prev); + prev = uprobe_free_map_info(prev); return curr; } @@ -782,11 +784,11 @@ static inline struct map_info *free_map_info(struct map_info *info) register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) { bool is_register = !!new; - struct map_info *info; + struct uprobe_map_info *info; int err = 0; percpu_down_write(&dup_mmap_sem); - info = build_map_info(uprobe->inode->i_mapping, + info = uprobe_build_map_info(uprobe->inode->i_mapping, uprobe->offset, is_register); if (IS_ERR(info)) { err = PTR_ERR(info); @@ -825,7 +827,7 @@ static inline struct map_info *free_map_info(struct map_info *info) up_write(&mm->mmap_sem); free: mmput(mm); - info = free_map_info(info); + info = uprobe_free_map_info(info); } out: percpu_up_write(&dup_mmap_sem); -- 1.8.3.1
[PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside userspace applications. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. If the overhead is significant, execution can be omitted by a runtime if() condition when no one is tracing on the marker:

  if (reference_counter > 0) {
	Execute marker instructions;
  }

The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing.

Implement the reference counter logic in trace_uprobe, leaving the core uprobe infrastructure as is, except for one new callback from uprobe_mmap() to trace_uprobe. A trace_uprobe definition with a reference counter will now be:

  <path>:<offset>[(ref_ctr_offset)]

There are two different cases to handle while enabling the marker:

1. Tracing an existing process. In this case, find all suitable processes and increment the reference counter in them.

2. Enabling the trace before running the target binary. In this case, all mmaps will get notified to trace_uprobe, and trace_uprobe will increment the reference counter if the corresponding uprobe is enabled.

At the time of disabling probes, decrement the reference counter in all existing target processes.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: The 'reference counter' is called a 'semaphore' in the original DTrace (and SystemTap, bcc, and even ELF) documentation and code. But the term 'semaphore' is misleading in this context. It is just a counter used to hold the number of tracers tracing on a marker; it is not really used for any synchronization. So we refer to it as a 'reference counter' in kernel / perf code.
Signed-off-by: Ravi Bangoria Signed-off-by: Fengguang Wu [Fengguang reported/fixed build failure in RFC patch] --- include/linux/uprobes.h | 2 + kernel/events/uprobes.c | 6 ++ kernel/trace/trace_uprobe.c | 172 +++- 3 files changed, 178 insertions(+), 2 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 7bd2760..2d4df65 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -122,6 +122,8 @@ struct uprobe_map_info { unsigned long vaddr; }; +extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma); + extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern bool is_swbp_insn(uprobe_opcode_t *insn); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index e7830b8..06821bb 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1041,6 +1041,9 @@ static void build_probe_list(struct inode *inode, spin_unlock(&uprobes_treelock); } +/* Rightnow the only user of this is trace_uprobe. */ +void (*uprobe_mmap_callback)(struct vm_area_struct *vma); + /* * Called from mmap_region/vma_adjust with mm->mmap_sem acquired. 
* @@ -1053,6 +1056,9 @@ int uprobe_mmap(struct vm_area_struct *vma) struct uprobe *uprobe, *u; struct inode *inode; + if (uprobe_mmap_callback) + uprobe_mmap_callback(vma); + if (no_uprobe_events() || !valid_vma(vma, true)) return 0; diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index 2014f43..b6c9b48 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -25,6 +25,8 @@ #include #include #include +#include +#include #include "trace_probe.h" @@ -58,6 +60,7 @@ struct trace_uprobe { struct inode*inode; char*filename; unsigned long offset; + unsigned long ref_ctr_offset; unsigned long nhit; struct trace_probe tp; }; @@ -362,10 +365,10 @@ static int create_trace_uprobe(int argc, char **argv) { struct trace_uprobe *tu; struct inode *inode; - char *arg, *event, *group, *filename; + char *arg, *event, *group, *filename, *rctr, *rctr_end; char buf[MAX_EVENT_NAME_LEN]; struct path path; - unsigned long offset; + unsigned long offset, ref_ctr_offset; bool is_delete, is_return; int i, ret; @@ -375,6 +378,7 @@ static int create_trace_uprobe(int argc, char **argv) is_return = false; event = NULL; group = NULL; + ref_ctr_offset = 0; /* argc must be >= 1 */ if (argv[0][0] == '-') @@ -454,6 +458,26 @@ static int create_trace_uprobe(int argc, char **argv) goto fail_address_
[PATCH 7/8] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference counter in probe cache. Perf probe will be able to probe markers having reference counter. Ex, # readelf -n /tmp/tick | grep -A1 loop2 Name: loop2 ... Semaphore: 0x10020036 # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed Signed-off-by: Ravi Bangoria --- tools/perf/util/probe-event.c | 21 + tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 22 +++--- tools/perf/util/symbol-elf.c | 10 ++ tools/perf/util/symbol.h | 1 + 5 files changed, 48 insertions(+), 7 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index e1dbc98..2cbe68a 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1832,6 +1832,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev) tp->offset = strtoul(fmt2_str, NULL, 10); } + if (tev->uprobes) { + fmt2_str = strchr(p, '('); + if (fmt2_str) + tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0); + } + tev->nargs = argc - 2; tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs); if (tev->args == NULL) { @@ -2054,15 +2060,22 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev) } /* Use the tp->address for uprobes */ - if (tev->uprobes) - err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address); - else if (!strncmp(tp->symbol, "0x", 2)) + if (tev->uprobes) { + if (tp->ref_ctr_offset) + err = strbuf_addf(&buf, "%s:0x%lx(0x%lx)", tp->module, + tp->address, tp->ref_ctr_offset); + else + err = strbuf_addf(&buf, "%s:0x%lx", tp->module, + tp->address); + } else if (!strncmp(tp->symbol, "0x", 2)) { /* Absolute address. See try_to_find_absolute_address() */ err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "", tp->module ? 
":" : "", tp->address); - else + } else { err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "", tp->module ? ":" : "", tp->symbol, tp->offset); + } + if (err) goto error; diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index 45b14f0..15a98c3 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -27,6 +27,7 @@ struct probe_trace_point { char*symbol;/* Base symbol */ char*module;/* Module name */ unsigned long offset; /* Offset from symbol */ + unsigned long ref_ctr_offset; /* SDT reference counter offset */ unsigned long address;/* Actual address of the trace point */ boolretprobe; /* Return probe flag */ }; diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index 4ae1123..08ba3a6 100644 --- a/tools/perf/util/probe-file.c +++ b/tools/perf/util/probe-file.c @@ -701,6 +701,12 @@ static unsigned long long sdt_note__get_addr(struct sdt_note *note) : (unsigned long long)note->addr.a64[0]; } +static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note) +{ + return note->bit32 ? (unsigned long long)note->addr.a32[2] + : (unsigned long long)note->addr.a64[2]; +} + static const char * const type_to_suffix[] = { ":s64", "", "", "", ":s32", "", ":s16", ":s8", "", ":u8", ":u16", "", ":u32", "", "", "", ":u64" @@ -776,14 +782,24 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note, { struct strbuf buf; char *ret = NULL, **args; - int i, args_count; + int i, args_count, err; + unsigned long long ref_ctr_offset; if (strbuf_init(&buf, 32) < 0) return NULL; - if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx", + ref_ctr_offset = sdt_note__get_ref_ctr_offset(note); + + if (ref_ctr_offset) +
[PATCH 6/8] trace_uprobe/sdt: Fix multiple update of same reference counter
For tiny binaries/libraries, different mmap regions point to the same file portion. In such cases, we may increment the reference counter multiple times. But on de-registration, the reference counter will get decremented only once, leaving the reference counter > 0 even if no one is tracing on that marker.

Ensure increment and decrement happen in sync by keeping a list of mms in trace_uprobe. Increment the reference counter only if the mm is not present in the list and decrement it only if the mm is present in the list.

Example:

  # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events

Before patch:

  # perf stat -a -e sdt_tick:loop2
  # /tmp/tick
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
  000: 02 .
  # pkill perf
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
  000: 01 .

After patch:

  # perf stat -a -e sdt_tick:loop2
  # /tmp/tick
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
  000: 01 .
  # pkill perf
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
  000: 00 .
Signed-off-by: Ravi Bangoria --- kernel/trace/trace_uprobe.c | 105 +++- 1 file changed, 103 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index b6c9b48..9bf3f7a 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -50,6 +50,11 @@ struct trace_uprobe_filter { struct list_headperf_events; }; +struct sdt_mm_list { + struct mm_struct *mm; + struct sdt_mm_list *next; +}; + /* * uprobe event core functions */ @@ -61,6 +66,8 @@ struct trace_uprobe { char*filename; unsigned long offset; unsigned long ref_ctr_offset; + struct sdt_mm_list *sml; + struct rw_semaphore sml_rw_sem; unsigned long nhit; struct trace_probe tp; }; @@ -274,6 +281,7 @@ static inline bool is_ret_probe(struct trace_uprobe *tu) if (is_ret) tu->consumer.ret_handler = uretprobe_dispatcher; init_trace_uprobe_filter(&tu->filter); + init_rwsem(&tu->sml_rw_sem); return tu; error: @@ -921,6 +929,74 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func, return trace_handle_return(s); } +static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) +{ + struct sdt_mm_list *tmp = tu->sml; + + if (!tu->sml || !mm) + return false; + + while (tmp) { + if (tmp->mm == mm) + return true; + tmp = tmp->next; + } + + return false; +} + +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) +{ + struct sdt_mm_list *tmp; + + tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); + if (!tmp) + return; + + tmp->mm = mm; + tmp->next = tu->sml; + tu->sml = tmp; +} + +static void sdt_del_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) +{ + struct sdt_mm_list *prev, *curr; + + if (!tu->sml) + return; + + if (tu->sml->mm == mm) { + curr = tu->sml; + tu->sml = tu->sml->next; + kfree(curr); + return; + } + + prev = tu->sml; + curr = tu->sml->next; + while (curr) { + if (curr->mm == mm) { + prev->next = curr->next; + kfree(curr); + return; + } + prev = curr; + curr = curr->next; + } +} + +static void 
sdt_flush_mm_list(struct trace_uprobe *tu) +{ + struct sdt_mm_list *next, *curr = tu->sml; + + while (curr) { + next = curr->next; + kfree(curr); + curr = next; + } + tu->sml = NULL; +} + static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma) { unsigned long vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset); @@ -989,17 +1065,25 @@ static void sdt_increment_ref_ctr(struct trace_uprobe *tu) if (IS_ERR(info)) goto out; + down_write(&tu->sml_rw_sem); while (info) { + if (sdt_check_mm_list(tu, info->mm)) + goto cont; + down_write(&info->mm->mmap_sem); vma = sdt_find_vma(info->mm, tu); vaddr = vma_offset_to_
[PATCH 8/8] trace_uprobe/sdt: Document about reference counter
No functionality changes. Signed-off-by: Ravi Bangoria --- Documentation/trace/uprobetracer.txt | 16 +--- kernel/trace/trace.c | 2 +- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt index bf526a7c..8fb13b0 100644 --- a/Documentation/trace/uprobetracer.txt +++ b/Documentation/trace/uprobetracer.txt @@ -19,15 +19,25 @@ user to calculate the offset of the probepoint in the object. Synopsis of uprobe_tracer - - p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe - r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe) - -:[GRP/]EVENT : Clear uprobe or uretprobe event + p[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS] + r[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS] + -:[GRP/]EVENT + + p : Set a uprobe + r : Set a return uprobe (uretprobe) + - : Clear uprobe or uretprobe event GRP : Group name. If omitted, "uprobes" is the default value. EVENT : Event name. If omitted, the event name is generated based on PATH+OFFSET. PATH : Path to an executable or a library. OFFSET: Offset where the probe is inserted. + REF_CTR_OFFSET: Reference counter offset. Optional field. The reference + count gates the invocation of the probe. If present, the + reference count is 0 by default. The kernel needs to increment + it before tracing the probe and decrement it when done. This + is identical to the semaphore in Userspace Statically Defined + Tracepoints (USDT). FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 20a2300..2104d03 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4604,7 +4604,7 @@ static int tracing_trace_options_open(struct inode *inode, struct file *file) "place (kretprobe): [:][+]|\n" #endif #ifdef CONFIG_UPROBE_EVENTS - "\tplace: :\n" + " place (uprobe): :[(ref_ctr_offset)]\n" #endif "\t args: =fetcharg[:type]\n" "\t fetcharg: %, @, @[+|-],\n" -- 1.8.3.1