from:"Robert Richter"

Re: [PATCH 4/5] perf, amd: Enable northbridge performance counters on AMD family 15h

2012-11-27 Thread Robert Richter

One minor comment:

On 26.11.12 16:48:30, Jacob Shin wrote:
>  __init int amd_pmu_init(void)
>  {
>   /* Performance-monitoring supported from K7 and later: */
> @@ -666,6 +749,10 @@ __init int amd_pmu_init(void)
>   setup_event_constraints();
>   setup_perfctr_core();
>  
> + num_core_counters = x86_pmu.num_counters;

I would better move this to setup_perfctr_nb().

> +
> + setup_perfctr_nb();
> +
>   /* Events are common for all AMDs */
>   memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
>  sizeof(hw_cache_event_ids));

Otherwise the whole patch set looks good.

Acked-by: Robert Richter 

Thanks Jacob.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: i386, v3.6-rc3: Kernel panic - not syncing: Fatal exception in interrupt

2012-08-28 Thread Robert Richter

On 27.08.12 17:14:27, Robert Richter wrote:
> Steven and Ingo,
> 
> while running profiling tests on 32 bit I got a
> 
>  Kernel panic - not syncing: Fatal exception in interrupt

Seems not to be a regression, I could trigger it with v3.5 too.

The problem seems to be oprofile related, full trace below.

Thanks,

-Robert

BUG: unable to handle kernel NULL pointer dereference at 0040
IP: [] print_context_stack+0x6e/0x8d
*pde =  
Oops:  [#1] SMP 
Modules linked in:

Pid: 15531, comm: perl Not tainted 3.5.0-oprofile-i386-standard-g28a33cb #2 
Hewlett-Packard HP xw9400 Workstation/0A1Ch
EIP: 0060:[] EFLAGS: 00010097 CPU: 3
EIP is at print_context_stack+0x6e/0x8d
EAX: e000 EBX: 0040 ECX: f4bd1f94 EDX: 0040
ESI: f4bd1f94 EDI: f4bd1f94 EBP: f5517ec0 ESP: f5517ea0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 0040 CR3: 34403000 CR4: 07d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
Process perl (pid: 15531, ti=f5516000 task=f4f6bbc0 task.ti=f4bd)
Stack:
 03e8 e000 1ffc f4b8e380  0040 f4bd1f94 c1541178
 f5517ef0 c1003717 c1541178 f5517f04  f5517edc  
 f5517ee8 f4bd1f94 f5517fc4 0001 f5517f1c c12d3ba4  c1541178
Call Trace:
 [] dump_trace+0x7b/0xa1
 [] x86_backtrace+0x40/0x88
 [] ? oprofile_add_sample+0x56/0x84
 [] oprofile_add_sample+0x75/0x84
 [] op_amd_check_ctrs+0x46/0x260
 [] ? wake_up_worker+0x19/0x1b
 [] ? insert_work+0x58/0x5c
 [] profile_exceptions_notify+0x23/0x4c
 [] nmi_handle+0x31/0x4a
 [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
 [] do_nmi+0xa9/0x304
 [] ? run_timer_softirq+0x2a/0x1f5
 [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
 [] nmi_stack_correct+0x28/0x2d
 [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
 [] ? do_softirq+0x4b/0x7f
  
 [] irq_exit+0x35/0x5b
 [] smp_apic_timer_interrupt+0x6c/0x7a
 [] apic_timer_interrupt+0x2a/0x30
Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 
73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 
e8 9d 16 03 00 85 c0 8b 55 e0 75 a6 eb cc 
EIP: [] print_context_stack+0x6e/0x8d SS:ESP 0068:f5517ea0
CR2: 0040
---[ end trace 0cce5d2b7aa480ce ]---
Kernel panic - not syncing: Fatal exception in interrupt


-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] x86/oprofile: Fix the calltrace upon profiling some specified events with oprofile

2012-08-28 Thread Robert Richter

On 27.08.12 09:32:13, wei.y...@windriver.com wrote:
> From: Wei Yang 
> 
> Upon enabling the call-graph functionality of oprofile, A few minutes
> later the following calltrace will always occur.
> 
> BUG: unable to handle kernel paging request at 656d6153

This is probably the same I found to yesterday. Will test your fix.

-Robert

> IP: [] print_context_stack+0x55/0x110
> *pde = 
> Oops:  [#1] PREEMPT SMP
> Modules linked in:
> Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc3-WR5.0+snapshot-20120820_standard
> EIP: 0060:[] EFLAGS: 00010093 CPU: 0
> EIP is at print_context_stack+0x55/0x110
> EAX: 656d7ffc EBX: 656d6153 ECX: c1837ee0 EDX: 656d6153
> ESI:  EDI: e000 EBP: f600deac ESP: f600de88
> DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> CR0: 8005003b CR2: 656d6153 CR3: 01934000 CR4: 07d0
> DR0:  DR1:  DR2:  DR3: 
> DR6: 0ff0 DR7: 0400
> Process swapper/0 (pid: 0, ti=f600c000 task=c18411a0 task.ti=c1836000)
> Stack:
> 1a7f76ea 656d7ffc 656d6000 c1837ee0 e000 c1837ee0 656d6153 c188e27c
> 656d6000 f600dedc c10040f8 c188e27c f600def0  f600dec8 c1837ee0
>  f600ded4 c1837ee0 f600dfc4 001f f600df08 c1564d22 
> Call Trace:
> [] dump_trace+0x68/0xf0
> [] x86_backtrace+0xb2/0xc0
> [] oprofile_add_sample+0xa2/0xc0
> [] ? do_softirq+0x6f/0xa0
> [] ppro_check_ctrs+0x79/0x100
> [] ? ppro_shutdown+0x60/0x60
> [] profile_exceptions_notify+0x8f/0xb0
> [] nmi_handle.isra.0+0x48/0x70
> [] do_nmi+0xd3/0x3c0
> [] ? __local_bh_enable+0x29/0x70
> [] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
> [] nmi_stack_correct+0x28/0x2d
> [] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
> [] ? do_softirq+0x6f/0xa0
> 
> [] irq_exit+0x65/0x70
> [] smp_apic_timer_interrupt+0x59/0x89
> [] apic_timer_interrupt+0x2a/0x30
> [] ? acpi_idle_enter_bm+0x245/0x273
> [] cpuidle_enter+0x15/0x20
> [] cpuidle_idle_call+0xa0/0x320
> [] cpu_idle+0x55/0xb0
> [] rest_init+0x6c/0x74
> [] start_kernel+0x2ec/0x2f3
> [] ? repair_env_string+0x51/0x51
> [] i386_start_kernel+0x78/0x7d
> Code: e0 ff ff 89 7d ec 74 5a 8d b4 26 00 00 00 00 8d bc 27 00 00
> 00 00 39 f3 72 0c 8b 45 f0 8d 64 24 18 5b 5e 5f 5d c3 3b 5d ec 72
> ef <8b> 3b 89 f8 89 7d dc e8 ef 40 04 00 85 c0 74 20 8b 40
> EIP: [] print_context_stack+0x55/0x110 SS:ESP 0068:f600de88
> CR2: 656d6153
> ---[ end trace 751c9b47c6ff5e29 ]---
> 
> Let's assume a scenario that do_softirq() switches the stack to a soft irq
> stack, and the soft irq stack is totally empty. At this moment, a nmi 
> interrupt
> occurs, subsequently, CPU does not automatically save SS and SP registers
> and switch any stack, but instead only stores EFLAGS, CS and IP to the soft 
> irq
> stack. since the CPU is in kernel mode when the NMI exception occurs.
> the layout of the current soft irq stack is this:
> 
>   +--+<-the top of soft irq
>   |   EFLAGS |
>   +--+
>   |CS|
>   +--+
>   |IP|
>   +--+
>   |   SAVE_ALL   |
>   +--+
> 
> but the return value of the function kernel_stack_pointer() is'®s->sp'
> (for x86_32 CPU), which is invoked by the x86_backtrace function. Since
> the type of regs pointer is a pt_regs structure, the return value is not
> in the range of the soft irq stack, as the SP register is not save in the
> soft irq stack. Therefore, we need to check if the return value of the 
> function
> resides in valid range. Additionally, the changes has no impact on the normal
> NMI exception.
> 
> Signed-off-by: Yang Wei 
> ---
>  arch/x86/oprofile/backtrace.c |   10 ++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
> index d6aa6e8..a5fca0b 100644
> --- a/arch/x86/oprofile/backtrace.c
> +++ b/arch/x86/oprofile/backtrace.c
> @@ -17,6 +17,11 @@
>  #include 
>  #include 
>  
> +static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
> +{
> + void *t = tinfo;
> + return p > t + sizeof(struct thread_info) && p < t + THREAD_SIZE;
> +}
>  static int backtrace_stack(void *data, char *name)
>  {
>   /* Yes, we want all stacks */
> @@ -110,9 +115,14 @@ void
>  x86_backtrace(struct pt_regs * const regs, unsigned int depth)
>  {
>   struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
> + struct thread_info *context;
>  
>   if (!user_mode_vm(regs)) {
>   unsigned long stack = kernel_stack_pointer(regs);
> + context = (struct thread_info *)
> + (stack & (~(THREAD_SIZE - 1)));
> + if (!valid_stack_ptr(context, (void *)stack))
> + return;
>   if (depth)
>   dump_trace(NULL, regs, (unsigned long *)stack, 0,
>  &backtrace_ops, &depth);
> -- 
> 1.7.0.2
> 
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this

Re: [PATCH 1/1] x86/oprofile: Fix the calltrace upon profiling some specified events with oprofile

2012-09-04 Thread Robert Richter

; (~(THREAD_SIZE - 1)));

You derive the context from a potential wrong stack here.

> + if (!valid_stack_ptr(context, (void *)stack))

Thus, you basically test this:

if (t & (THREAD_SIZE - 1) < sizeof(struct thread_info))
...

> + return;
>   if (depth)
>   dump_trace(NULL, regs, (unsigned long *)stack, 0,
>  &backtrace_ops, &depth);
> -- 
> 1.7.0.2
> 
> 

Though this patch prevents access to an invalid stack (NULL pointer
access or page fault), I don't think this is the proper fix since it
does not fix the root cause that is an invalid stack pointer deliverd
by kernel_stack_pointer(). Also, a fix only in oprofile code does not
solve other uses of dump_trace()/kernel_stack_pointer().

So the proper fix I see is to fix kernel_stack_pointer() to return a
valid stack in case of an empty stack while in softirq. Something like
the patch below. Maybe it must be optimized a bit. I tested the patch
over night with no issues found. Please test it too.

Thanks,

-Robert


>From 8e7c16913b1fcfc63f7b24337551aacc7153c334 Mon Sep 17 00:00:00 2001
From: Robert Richter 
Date: Mon, 3 Sep 2012 20:54:48 +0200
Subject: [PATCH] x86, 32-bit: Fix invalid stack address while in softirq

In 32 bit the stack address provided by kernel_stack_pointer() may
point to an invalid range causing NULL pointer access or page faults
while in NMI (see trace below). This happens if called in softirq
context and if the stack is empty. The address at ®s->sp is then
out of range.

Fixing this by checking if regs and ®s->sp are in the same stack
context. Otherwise return the previous stack pointer stored in struct
thread_info.

 BUG: unable to handle kernel NULL pointer dereference at 000a
 IP: [] print_context_stack+0x6e/0x8d
 *pde = 
 Oops:  [#1] SMP
 Modules linked in:
 Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 
Hewlett-Packard HP xw9400 Workstation/0A1Ch
 EIP: 0060:[] EFLAGS: 00010093 CPU: 0
 EIP is at print_context_stack+0x6e/0x8d
 EAX: e000 EBX: 000a ECX: f4435f94 EDX: 000a
 ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 CR0: 8005003b CR2: 000a CR3: 34ac9000 CR4: 07d0
 DR0:  DR1:  DR2:  DR3: 
 DR6: 0ff0 DR7: 0400
 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000)
 Stack:
  03e8 e000 1ffc f4e39b00  000a f4435f94 c155198c
  f5409ef0 c1003723 c155198c f5409f04  f5409edc  
  f5409ee8 f4435f94 f5409fc4 0001 f5409f1c c12dce1c  c155198c
 Call Trace:
  [] dump_trace+0x7b/0xa1
  [] x86_backtrace+0x40/0x88
  [] ? oprofile_add_sample+0x56/0x84
  [] oprofile_add_sample+0x75/0x84
  [] op_amd_check_ctrs+0x46/0x260
  [] profile_exceptions_notify+0x23/0x4c
  [] nmi_handle+0x31/0x4a
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] do_nmi+0xa0/0x2ff
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] nmi_stack_correct+0x28/0x2d
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] ? do_softirq+0x4b/0x7f
  
  [] irq_exit+0x35/0x5b
  [] smp_apic_timer_interrupt+0x6c/0x7a
  [] apic_timer_interrupt+0x2a/0x30
 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 
73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 
e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc
 EIP: [] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0
 CR2: 000a
 ---[ end trace 62afee3481b00012 ]---
 Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Yang Wei 
Cc: sta...@vger.kernel.org
Signed-off-by: Robert Richter 
---
 arch/x86/include/asm/ptrace.h |   15 ---
 arch/x86/kernel/ptrace.c  |   21 +
 arch/x86/oprofile/backtrace.c |2 +-
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index dcfde52..19f16eb 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -205,21 +205,14 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 }
 #endif
 
-/*
- * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
- * when it traps.  The previous stack will be directly underneath the saved
- * registers, and 'sp/ss' won't even have been saved. Thus the '®s->sp'.
- *
- * This is valid only for kernel mode traps.
- */
-static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
-{
 #ifdef CONFIG_X86_32
-   return (unsigned long)(®s->sp);
+extern unsigned long kernel_stack_pointer(struct pt_regs *regs);
 #else
+static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
+{
return regs->sp;
-#endif
 }
+#endif
 
 #define GET_IP(regs) ((regs)->ip)
 #define GET_FP(regs) ((regs)->bp

[GIT PULL] oprofile: fixes and updates

2012-09-04 Thread Robert Richter

Ingo,

one patch each for perf/urgent and perf/core, please pull from:

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git urgent

and

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

Thanks,

-Robert


The following changes since commit fea7a08acb13524b47711625eebea40a0ede69a0:

  Linux 3.6-rc3 (2012-08-22 13:29:06 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

Robert Richter (2):
  oprofile, s390: Fix uninitialized memory access when writing to oprofilefs
  oprofile: Remove 'WQ on CPUx, prefer CPUy' warning

 arch/s390/oprofile/init.c |   10 +-
 drivers/oprofile/cpu_buffer.c |   11 +++
 2 files changed, 8 insertions(+), 13 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: i386, v3.6-rc3: Kernel panic - not syncing: Fatal exception in interrupt

2012-09-04 Thread Robert Richter

On 04.09.12 11:16:16, Steven Rostedt wrote:
> On Tue, 2012-08-28 at 10:12 +0200, Robert Richter wrote:

> > BUG: unable to handle kernel NULL pointer dereference at 0040
> > IP: [] print_context_stack+0x6e/0x8d
> > *pde =  
> > Oops:  [#1] SMP 
> > Modules linked in:
> > 
> > Pid: 15531, comm: perl Not tainted 3.5.0-oprofile-i386-standard-g28a33cb #2 
> > Hewlett-Packard HP xw9400 Workstation/0A1Ch
> > EIP: 0060:[] EFLAGS: 00010097 CPU: 3
> > EIP is at print_context_stack+0x6e/0x8d
> > EAX: e000 EBX: 0040 ECX: f4bd1f94 EDX: 0040
> > ESI: f4bd1f94 EDI: f4bd1f94 EBP: f5517ec0 ESP: f5517ea0
> >  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > CR0: 8005003b CR2: 0040 CR3: 34403000 CR4: 07d0
> > DR0:  DR1:  DR2:  DR3: 
> > DR6: 0ff0 DR7: 0400
> > Process perl (pid: 15531, ti=f5516000 task=f4f6bbc0 task.ti=f4bd)
> > Stack:
> >  03e8 e000 1ffc f4b8e380  0040 f4bd1f94 c1541178
> >  f5517ef0 c1003717 c1541178 f5517f04  f5517edc  
> >  f5517ee8 f4bd1f94 f5517fc4 0001 f5517f1c c12d3ba4  c1541178
> > Call Trace:
> >  [] dump_trace+0x7b/0xa1
> >  [] x86_backtrace+0x40/0x88
> >  [] ? oprofile_add_sample+0x56/0x84
> >  [] oprofile_add_sample+0x75/0x84
> >  [] op_amd_check_ctrs+0x46/0x260
> >  [] ? wake_up_worker+0x19/0x1b
> >  [] ? insert_work+0x58/0x5c
> >  [] profile_exceptions_notify+0x23/0x4c
> >  [] nmi_handle+0x31/0x4a
> >  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
> >  [] do_nmi+0xa9/0x304
> >  [] ? run_timer_softirq+0x2a/0x1f5
> >  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
> >  [] nmi_stack_correct+0x28/0x2d
> >  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
> >  [] ? do_softirq+0x4b/0x7f
> >   
> >  [] irq_exit+0x35/0x5b
> >  [] smp_apic_timer_interrupt+0x6c/0x7a
> >  [] apic_timer_interrupt+0x2a/0x30
> > Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 
> > 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 
> > 89 55 e0 e8 9d 16 03 00 85 c0 8b 55 e0 75 a6 eb cc 
> > EIP: [] print_context_stack+0x6e/0x8d SS:ESP 0068:f5517ea0
> > CR2: 0040
> > ---[ end trace 0cce5d2b7aa480ce ]---
> > Kernel panic - not syncing: Fatal exception in interrupt
> 
> Could you send me the config file and post the entire dmesg somewhere.

Wei found the root cause of this that is a wrong calcuation in
kernel_stack_pointer() in softirq context for i386. I sent a fix, see
my today's mail with subject:

 Re: [PATCH 1/1] x86/oprofile: Fix the calltrace upon profiling some specified 
events with oprofile.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/3] perf tool: give user better message if precise is not supported

2012-09-11 Thread Robert Richter

On 10.09.12 10:40:16, David Ahern wrote:
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -294,6 +294,11 @@ try_again:
> perf_evsel__name(pos));
>   rc = -err;
>   goto out;
> + } else if ((err == ENOTSUP) && (attr->precise_ip)) {

It is EOPNOTSUPP, did you test this?

> + ui__error("\'precise\' request not supported. "
> +   "Try removing 'p' modifier\n");

I would better print "... request may not be supported.", since you
don't know for sure if this is the real cause.

> + rc = -err;
> + goto out;
>   }
>  
>   printf("\n");
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 0513aaa..0d3653b 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -975,6 +975,10 @@ try_again:
>   ui__error("Too many events are opened.\n"
>   "Try again after reducing the 
> number of events\n");
>   goto out_err;
> + } else if ((err == ENOTSUP) && (attr->precise_ip)) {
> + ui__error("\'precise\' request not supported. "
> +   "Try removing 'p' modifier\n");

Same here.

> + goto out_err;

To avoid adding more duplicate code, maybe we should start to unify
the code by implementing this in a shared helper function.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/3] perf tool: give user better message if precise is not supported

2012-09-11 Thread Robert Richter

On 11.09.12 07:22:32, David Ahern wrote:
> On 9/11/12 3:20 AM, Robert Richter wrote:
> > On 10.09.12 10:40:16, David Ahern wrote:
> >> --- a/tools/perf/builtin-record.c
> >> +++ b/tools/perf/builtin-record.c
> >> @@ -294,6 +294,11 @@ try_again:
> >>  perf_evsel__name(pos));
> >>rc = -err;
> >>goto out;
> >> +  } else if ((err == ENOTSUP) && (attr->precise_ip)) {
> >
> > It is EOPNOTSUPP, did you test this?

Ok, wrong question. Better would have been: Did you run it on a
non-pebs Intel machine of an non-ibs AMD machine?

> I do not post patches without testing them. This particular patch was 
> verified in a Virtual Machine (no PEBS) and using :pG modifier.
> 
> 'egrep -r ENOTSUP tools/perf' shows hits in 3 other files, so I am not 
> the only one using the shortcut. I'll change it in the follow up with 
> better commit messages to make it consistent with patch 2.

For VM this might be valid. Don't know where ENOTSUP comes from. It is
neither in kernel/events/ nor arch/x86/kernel/cpu/perf.

If you run this bare-metal on older machines which do not support pebs
or ibs, the syscall returns EOPNOTSUPP. You can trigger the same
behaviour on newer systems with:

 # perf record -e cycles:ppp -c 2097120 -R -a sleep 1
 
   Error: sys_perf_event_open() syscall returned with 95 (Operation not 
supported).  /bin/dmesg may provide additional information.
 ...

It should work in this case too.

> > To avoid adding more duplicate code, maybe we should start to unify
> > the code by implementing this in a shared helper function.
> 
> Doing that requires additional modifications to not break perl and 
> python scripts. Adding it in both commands here is consistent with all 
> other open counter failures. Consolidation of those loops into a common 
> base is known to do item.

I am fine with that too.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/3] perf tool: give user better message if precise is not supported

2012-09-11 Thread Robert Richter

On 11.09.12 08:32:55, David Ahern wrote:
> My guess would be /usr/include/bits/errno.h:
> 
> /* Linux has no ENOTSUP error code.  */
> # define ENOTSUP EOPNOTSUPP

Ok, so ENOTSUP is actually the same as EOPNOTSUPP. Since the syscall
returns a EOPNOTSUPP, I prefer this when checking perf_event_open()
return codes. ENOTSUP is not used in the kernel. Was there a reason
for choosing ENOTSUP?

> > If you run this bare-metal on older machines which do not support pebs
> > or ibs, the syscall returns EOPNOTSUPP. You can trigger the same
> > behaviour on newer systems with:
> >
> >   # perf record -e cycles:ppp -c 2097120 -R -a sleep 1
> >
> > Error: sys_perf_event_open() syscall returned with 95 (Operation not 
> > supported).  /bin/dmesg may provide additional information.
> >   ...
> >
> > It should work in this case too.
> 
> The commit message was a copy and paste from the failure of both :p in a 
> VM (PEBS is not supported in a VM). I also ran the bare metal case with 
> :pG which per the second patch in this series generates the not 
> supported message.

Since the error codes are the same, your code should work also on
bare-metal. Can you test on a host using :ppp? This should trigger the
same error message as in a vm.

Thanks,

-Robert


-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] perf, ibs: Add sysfs support

2012-09-12 Thread Robert Richter

Add sysfs format entries for ibs pmus:

 # find /sys/bus/event_source/devices/ibs_*/format
 /sys/bus/event_source/devices/ibs_fetch/format
 /sys/bus/event_source/devices/ibs_fetch/format/rand_en
 /sys/bus/event_source/devices/ibs_op/format
 /sys/bus/event_source/devices/ibs_op/format/cnt_ctl

This allows to specify following ibs options:

 $ perf record -e ibs_fetch/rand_en=1/GH ...
 $ perf record -e ibs_op/cnt_ctl=1/GH ...

Option cnt_ctl is only enabled if the IBS_CAPS_OPCNT bit is set in ibs
cpuid feature flags (family 10h RevC and above).

Signed-off-by: Robert Richter 
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   38 +-
 1 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c 
b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index eebd5ff..4af8275 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -51,6 +51,11 @@ struct perf_ibs {
unsigned long   offset_mask[1];
int offset_max;
struct cpu_perf_ibs __percpu *pcpu;
+
+   struct attribute **format_attrs;
+   struct attribute_group format_group;
+   const struct attribute_group *attr_groups[2];
+
u64 (*get_count)(u64 config);
 };
 
@@ -446,6 +451,19 @@ static void perf_ibs_del(struct perf_event *event, int 
flags)
 
 static void perf_ibs_read(struct perf_event *event) { }
 
+PMU_FORMAT_ATTR(rand_en,   "config:57");
+PMU_FORMAT_ATTR(cnt_ctl,   "config:19");
+
+static struct attribute *ibs_fetch_format_attrs[] = {
+   &format_attr_rand_en.attr,
+   NULL,
+};
+
+static struct attribute *ibs_op_format_attrs[] = {
+   NULL,   /* &format_attr_cnt_ctl.attr if IBS_CAPS_OPCNT */
+   NULL,
+};
+
 static struct perf_ibs perf_ibs_fetch = {
.pmu = {
.task_ctx_nr= perf_invalid_context,
@@ -465,6 +483,7 @@ static struct perf_ibs perf_ibs_fetch = {
.max_period = IBS_FETCH_MAX_CNT << 4,
.offset_mask= { MSR_AMD64_IBSFETCH_REG_MASK },
.offset_max = MSR_AMD64_IBSFETCH_REG_COUNT,
+   .format_attrs   = ibs_fetch_format_attrs,
 
.get_count  = get_ibs_fetch_count,
 };
@@ -488,6 +507,7 @@ static struct perf_ibs perf_ibs_op = {
.max_period = IBS_OP_MAX_CNT << 4,
.offset_mask= { MSR_AMD64_IBSOP_REG_MASK },
.offset_max = MSR_AMD64_IBSOP_REG_COUNT,
+   .format_attrs   = ibs_op_format_attrs,
 
.get_count  = get_ibs_op_count,
 };
@@ -597,6 +617,16 @@ static __init int perf_ibs_pmu_init(struct perf_ibs 
*perf_ibs, char *name)
 
perf_ibs->pcpu = pcpu;
 
+   /* register attributes */
+   if (perf_ibs->format_attrs[0]) {
+   memset(&perf_ibs->format_group, 0, 
sizeof(perf_ibs->format_group));
+   perf_ibs->format_group.name = "format";
+   perf_ibs->format_group.attrs = perf_ibs->format_attrs;
+   memset(&perf_ibs->attr_groups, 0, 
sizeof(perf_ibs->attr_groups));
+   perf_ibs->attr_groups[0] = &perf_ibs->format_group;
+   perf_ibs->pmu.attr_groups = perf_ibs->attr_groups;
+   }
+
ret = perf_pmu_register(&perf_ibs->pmu, name, -1);
if (ret) {
perf_ibs->pcpu = NULL;
@@ -608,13 +638,19 @@ static __init int perf_ibs_pmu_init(struct perf_ibs 
*perf_ibs, char *name)
 
 static __init int perf_event_ibs_init(void)
 {
+   struct attribute **attr = ibs_op_format_attrs;
+
if (!ibs_caps)
return -ENODEV; /* ibs not supported by the cpu */
 
perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
-   if (ibs_caps & IBS_CAPS_OPCNT)
+
+   if (ibs_caps & IBS_CAPS_OPCNT) {
perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
+   *attr++ = &format_attr_cnt_ctl.attr;
+   }
perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+
register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
printk(KERN_INFO "perf: AMD IBS detected (0x%08x)\n", ibs_caps);
 
-- 
1.7.8.6


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] perf: Introduce function to report unsupported syscall attribute flags

2012-09-12 Thread Robert Richter

Often a syscall fails with

   Error: sys_perf_event_open() syscall returned with 95 (Operation not 
supported).  /bin/dmesg may provide additional information.

but no additional information.

This patch adds some macro magic and a helper function to check and
report unsupported syscall attribute flags:

 has_unsupported_flags(attr1, attr2);
 has_unsupported_flag(attr, flag);
 __has_unsupported_flags(const struct perf_event_attr *attr,
 u64 notsup);

E.g. dmesg reports the following (checking for the exclude_guest
flag):

 perf: unsupported attribute flags: 0010

or (checking precise modifier :ppp):

 perf: unsupported attribute flags: 00018000

Signed-off-by: Robert Richter 
---
 arch/x86/kernel/cpu/perf_event.c |   19 ++-
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |3 ++-
 include/linux/perf_event.h   |   23 +++
 kernel/events/core.c |2 +-
 4 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 915b876..9ae8676 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -336,7 +336,7 @@ int x86_setup_perfctr(struct perf_event *event)
return -EOPNOTSUPP;
 
/* BTS is currently only allowed for user-mode. */
-   if (!attr->exclude_kernel)
+   if (has_unsupported_flag(attr, exclude_kernel))
return -EOPNOTSUPP;
}
 
@@ -389,8 +389,11 @@ int x86_pmu_hw_config(struct perf_event *event)
precise++;
}
 
-   if (event->attr.precise_ip > precise)
+   if (event->attr.precise_ip > precise) {
+   has_unsupported_flag(&event->attr, precise_ip);
return -EOPNOTSUPP;
+   }
+
/*
 * check that PEBS LBR correction does not conflict with
 * whatever the user is asking with attr->branch_sample_type
@@ -398,13 +401,7 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip > 1) {
u64 *br_type = &event->attr.branch_sample_type;
 
-   if (has_branch_stack(event)) {
-   if (!precise_br_compat(event))
-   return -EOPNOTSUPP;
-
-   /* branch_sample_type is compatible */
-
-   } else {
+   if (!has_branch_stack(event)) {
/*
 * user did not specify  branch_sample_type
 *
@@ -419,7 +416,11 @@ int x86_pmu_hw_config(struct perf_event *event)
 
if (!event->attr.exclude_kernel)
*br_type |= PERF_SAMPLE_BRANCH_KERNEL;
+   } else if (!precise_br_compat(event)) {
+   has_unsupported_flag(&event->attr, precise_ip);
+   return -EOPNOTSUPP;
}
+   /* else: branch_sample_type is compatible */
}
}
 
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c 
b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 4af8275..fc145b0 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -186,6 +186,7 @@ static int perf_ibs_precise_event(struct perf_event *event, 
u64 *config)
case 2:
break;
default:
+   has_unsupported_flag(&event->attr, precise_ip);
return -EOPNOTSUPP;
}
 
@@ -243,7 +244,7 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
 
-   if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp))
+   if (has_unsupported_flags(&event->attr, &ibs_notsupp))
return -EINVAL;
 
if (config & ~perf_ibs->config_mask)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c36a04f..eb247a5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -305,6 +305,14 @@ struct perf_event_attr {
 };
 
 #define perf_flags(attr)   (*(&(attr)->read_format + 1))
+#define perf_flags_init(flag) ({   \
+   volatile __u64  flags = 0;  \
+   struct perf_event_attr *attr;   \
+   attr = (void *)((char *)&flags -\
+   (char *)&perf_flags((struct perf_event_attr *)(0))); \
+   if (!WARN_ON(&perf_flags(attr) != &a

Re: [PATCH] perf: Introduce function to report unsupported syscall attribute flags

2012-09-12 Thread Robert Richter

On 12.09.12 13:20:43, Peter Zijlstra wrote:
> On Wed, 2012-09-12 at 13:01 +0200, Robert Richter wrote:
> > +   if (notsup)
> > +   pr_warn("perf: unsupported attribute flags: %016llx\n", 
> > notsup); 
> 
> This is a dmesg DoS..

This could be avoided by introducing a cpuinfo like sysfs file for
each pmu:

 /sys/bus/event_source/devices/*/flags

Then, userspace should know how to correctly setup the syscall. All
supported attributes would be known to perf and messages would thrown
only if something goes unexpected wrong.

I suggested this already here:

 https://lkml.org/lkml/2012/8/3/214
 https://lkml.org/lkml/2012/9/6/472

> I'm also not sure dmesg is the right way.. could we not somehow change
> the attrs to provide better diagnostic?

I found this discussion without a solution for the problem:

 http://lwn.net/Articles/374794/

Other options could be pr_debug() or trace_printk()'s for debugging
purposes. Or an error report in sysfs:

 cat /sys/bus/event_source/devices/*/log

But the pmu type needs to be known for this.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH -v2] x86, 32-bit: Fix invalid stack address while in softirq

2012-09-12 Thread Robert Richter

Updated version below. Changes are:

* add comments to kernel_stack_pointer()
* always return a valid stack address by falling back to the address
  of regs

-Robert


>From 0114d0e2ff6ce3f6015fd991541a45261f14eab1 Mon Sep 17 00:00:00 2001
From: Robert Richter 
Date: Mon, 3 Sep 2012 20:54:48 +0200
Subject: [PATCH] x86, 32-bit: Fix invalid stack address while in softirq

In 32 bit the stack address provided by kernel_stack_pointer() may
point to an invalid range causing NULL pointer access or page faults
while in NMI (see trace below). This happens if called in softirq
context and if the stack is empty. The address at ®s->sp is then
out of range.

Fixing this by checking if regs and ®s->sp are in the same stack
context. Otherwise return the previous stack pointer stored in struct
thread_info. If that address is invalid too, return address of regs.

 BUG: unable to handle kernel NULL pointer dereference at 000a
 IP: [] print_context_stack+0x6e/0x8d
 *pde = 
 Oops:  [#1] SMP
 Modules linked in:
 Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 
Hewlett-Packard HP xw9400 Workstation/0A1Ch
 EIP: 0060:[] EFLAGS: 00010093 CPU: 0
 EIP is at print_context_stack+0x6e/0x8d
 EAX: e000 EBX: 000a ECX: f4435f94 EDX: 000a
 ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 CR0: 8005003b CR2: 000a CR3: 34ac9000 CR4: 07d0
 DR0:  DR1:  DR2:  DR3: 
 DR6: 0ff0 DR7: 0400
 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000)
 Stack:
  03e8 e000 1ffc f4e39b00  000a f4435f94 c155198c
  f5409ef0 c1003723 c155198c f5409f04  f5409edc  
  f5409ee8 f4435f94 f5409fc4 0001 f5409f1c c12dce1c  c155198c
 Call Trace:
  [] dump_trace+0x7b/0xa1
  [] x86_backtrace+0x40/0x88
  [] ? oprofile_add_sample+0x56/0x84
  [] oprofile_add_sample+0x75/0x84
  [] op_amd_check_ctrs+0x46/0x260
  [] profile_exceptions_notify+0x23/0x4c
  [] nmi_handle+0x31/0x4a
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] do_nmi+0xa0/0x2ff
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] nmi_stack_correct+0x28/0x2d
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] ? do_softirq+0x4b/0x7f
  
  [] irq_exit+0x35/0x5b
  [] smp_apic_timer_interrupt+0x6c/0x7a
  [] apic_timer_interrupt+0x2a/0x30
 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 
73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 
e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc
 EIP: [] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0
 CR2: 000a
 ---[ end trace 62afee3481b00012 ]---
 Kernel panic - not syncing: Fatal exception in interrupt

V2:
* add comments to kernel_stack_pointer()
* always return a valid stack address by falling back to the address
  of regs

Reported-by: Yang Wei 
Cc: sta...@vger.kernel.org
Signed-off-by: Robert Richter 
---
 arch/x86/include/asm/ptrace.h |   15 ---
 arch/x86/kernel/ptrace.c  |   28 
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index dcfde52..19f16eb 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -205,21 +205,14 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 }
 #endif
 
-/*
- * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
- * when it traps.  The previous stack will be directly underneath the saved
- * registers, and 'sp/ss' won't even have been saved. Thus the '®s->sp'.
- *
- * This is valid only for kernel mode traps.
- */
-static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
-{
 #ifdef CONFIG_X86_32
-   return (unsigned long)(®s->sp);
+extern unsigned long kernel_stack_pointer(struct pt_regs *regs);
 #else
+static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
+{
return regs->sp;
-#endif
 }
+#endif
 
 #define GET_IP(regs) ((regs)->ip)
 #define GET_FP(regs) ((regs)->bp)
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index c4c6a5c..947cf90 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -165,6 +165,34 @@ static inline bool invalid_selector(u16 value)
 
 #define FLAG_MASK  FLAG_MASK_32
 
+/*
+ * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
+ * when it traps.  The previous stack will be directly underneath the saved
+ * registers, and 'sp/ss' won't even have been saved. Thus the '®s->sp'.
+ *
+ * Now, if the stack is empty, '®s->sp' is out of range. In this
+ * case we try to take the previous stack. To always return a non-null
+ * stack pointer we fall back to regs as stack if no previous stack
+ * exists.

Re: [tip:perf/core] perf hists: Introduce perf_hpp for hist period printing

2012-09-12 Thread Robert Richter

On 09.09.12 01:54:39, tip-bot for Namhyung Kim wrote:
> Commit-ID:  ea251d51d2c7d7233790123227f787c477f567f5
> Gitweb: http://git.kernel.org/tip/ea251d51d2c7d7233790123227f787c477f567f5
> Author: Namhyung Kim 
> AuthorDate: Mon, 3 Sep 2012 11:53:06 +0900
> Committer:  Arnaldo Carvalho de Melo 
> CommitDate: Sat, 8 Sep 2012 13:19:44 -0300
> 
> perf hists: Introduce perf_hpp for hist period printing
> 
> Current hist print functions are messy because it has to consider many
> of command line options and the code doing that is scattered around to
> places. So when someone wants to add an option to manipulate the hist
> output it'd very easy to miss to update all of them in sync. And things
> getting worse as more options/features are added continuously.
> 
> So I'd like to refactor them using hpp formats and move common code to
> ui/hist.c in order to make it easy to maintain and to add new features.
> 
> Signed-off-by: Namhyung Kim 
> Cc: Ingo Molnar 
> Cc: Paul Mackerras 
> Cc: Peter Zijlstra 
> Link: 
> http://lkml.kernel.org/r/1346640790-17197-2-git-send-email-namhy...@kernel.org
> Signed-off-by: Arnaldo Carvalho de Melo 
> ---
>  tools/perf/Makefile|2 +
>  tools/perf/builtin-diff.c  |1 +
>  tools/perf/ui/hist.c   |  340 
> 
>  tools/perf/ui/setup.c  |8 +-
>  tools/perf/ui/stdio/hist.c |  238 ++-
>  tools/perf/util/hist.h |   37 +
>  6 files changed, 426 insertions(+), 200 deletions(-)

This patch breaks perf-record/report that the number of samples can't
be shown in pipe mode:

 # perf record -e cycles -aq sleep 1 ; perf report -n --sort comm,dso | sed 
'/%/q;d' ; \
   perf record -e cycles -aq sleep 1 | perf report -n --sort comm,dso | sed 
'/%/q;d'
 99.86%11804   swapper  [kernel.kallsyms]
 91.57%  swapper  [kernel.kallsyms]
   ^^
   number of samples missing

Moving and changing the code at the same time make the patch
unreviewable. So no clue that's the problem here.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] perf tool: precise mode requires exclude_guest

2012-09-12 Thread Robert Richter

On 12.09.12 09:16:28, David Ahern wrote:
> Summary of events per Peter:
> 
>   "Intel PEBS in VT-x context uses the DS address as a guest linear address,
>   even though its programmed by the host as a host linear address. This
>   either results in guest memory corruption and or the hardware faulting and
>   'crashing' the virtual machine.  Therefore we have to disable PEBS on VT-x
>   enter and re-enable on VT-x exit, enforcing a strict exclude_guest.
> 
>   AMB IBS does work but doesn't currently support exclude_* at all,
>   setting an exclude_* bit will make it fail."
> 
> This patch handles userspace perf command, setting the exclude_guest
> attribute if precise mode is requested, but only if a user has not
> specified a request for guest or host only profiling (G or H attribute).
> 
> Kernel side AMD currently ignores all exclude_* bits, so there is no impact
> to existing IBS code paths. Robert Richter has a patch where IBS code will
> return EINVAL if an exclude_* bit is set. When this goes in it means use
> of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
> will be set). Then the existing fallback code within perf will unset
> exclude_guest and try again. The second attempt will succeed if the CPU
> supports IBS profiling.
> 
> Signed-off-by: David Ahern 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Robert Richter 
> Cc: Gleb Natapov 
> Cc: Avi Kivity 
> Link: https://lkml.org/lkml/2012/7/9/264
> ---
>  tools/perf/util/parse-events.c |3 +++
>  1 file changed, 3 insertions(+)

Acked-by: Robert Richter 

I tested the patch set with AMD IBS and it works fine.

-Robert

> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 44afcf4..696cc7e 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -694,6 +694,9 @@ static int get_event_modifier(struct event_modifier *mod, 
> char *str,
>   eH = 0;
>   } else if (*str == 'p') {
>   precise++;
> + /* use of precise requires exclude_guest */
> + if (!exclude_GH)
> + eG = 1;
>   } else
>   break;
>  
> -- 
> 1.7.10.1
> 
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] perf tool: precise mode requires exclude_guest

2012-09-13 Thread Robert Richter

On 12.09.12 11:50:57, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 12, 2012 at 07:46:55PM +0200, Robert Richter escreveu:
> > On 12.09.12 09:16:28, David Ahern wrote:
> > > 
> > > Kernel side AMD currently ignores all exclude_* bits, so there is no 
> > > impact
> > > to existing IBS code paths. Robert Richter has a patch where IBS code will
> > > return EINVAL if an exclude_* bit is set. When this goes in it means use
> > > of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
> > > will be set). Then the existing fallback code within perf will unset
> > > exclude_guest and try again. The second attempt will succeed if the CPU
> > > supports IBS profiling.
> > > 
> > > Signed-off-by: David Ahern 
> > > Cc: Ingo Molnar 
> > > Cc: Peter Zijlstra 
> > > Cc: Robert Richter 
> > > Cc: Gleb Natapov 
> > > Cc: Avi Kivity 
> > > Link: https://lkml.org/lkml/2012/7/9/264
> > > ---
> > >  tools/perf/util/parse-events.c |3 +++
> > >  1 file changed, 3 insertions(+)
> > 
> > Acked-by: Robert Richter 
> > 
> > I tested the patch set with AMD IBS and it works fine.
> 
> Ok, I think that in this specific patchset I can add:
> 
> Tested-by: Robert Richter 
> Reviewed-by: Robert Richter 
> 
> Instead of an Acked-by, may I?

I am fine with this.

Thanks,

-Robert

> 
> - Arnaldo
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf record: Add missing perf_hpp__init for pipe-mode

2012-09-13 Thread Robert Richter

On 13.09.12 13:14:30, Namhyung Kim wrote:
> From: Namhyung Kim 
> 
> The perf_hpp__init() function was only called from setup_browser() so
> that the pipe-mode missed the initialization thus didn't respond to
> related options.  Fix it.
> 
> Reported-by: Robert Richter 
> Signed-off-by: Namhyung Kim 
> ---
>  tools/perf/builtin-report.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

This patch fixes it.

Thanks,

-Robert

> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 97b2e6300f4c..279155a47d1c 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -689,8 +689,10 @@ int cmd_report(int argc, const char **argv, const char 
> *prefix __maybe_unused)
>  
>   if (strcmp(report.input_name, "-") != 0)
>   setup_browser(true);
> - else
> + else {
>   use_browser = 0;
> + perf_hpp__init(false, false);
> + }
>  
>   /*
>* Only in the newt browser we are doing integrated annotation,
> -- 
> 1.7.11.4
> 
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] perf record: Change error message on failure

2012-09-13 Thread Robert Richter

Only report

 No CONFIG_PERF_EVENTS=y kernel support configured?

if the syscall fails with ENOSYS. In other cases CONFIG_PERF_EVENTS is
set and might confuse users. The default message is now:

 Not all events could be opened.

Signed-off-by: Robert Richter 
---
 tools/perf/builtin-record.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ae7c1c9..0290c91 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -305,19 +305,19 @@ try_again:
error("sys_perf_event_open() syscall returned with %d 
(%s).  /bin/dmesg may provide additional information.\n",
  err, strerror(err));
 
+   if (err == ENOSYS)
+   pr_err("No CONFIG_PERF_EVENTS=y kernel support 
configured?\n");
 #if defined(__i386__) || defined(__x86_64__)
-   if (attr->type == PERF_TYPE_HARDWARE &&
-   err == EOPNOTSUPP) {
+   else if (attr->type == PERF_TYPE_HARDWARE &&
+err == EOPNOTSUPP)
pr_err("No hardware sampling interrupt 
available."
   " No APIC? If so then you can boot the 
kernel"
   " with the \"lapic\" boot parameter to"
   " force-enable it.\n");
-   rc = -err;
-   goto out;
-   }
 #endif
+   else
+   pr_err("Not all events could be opened.\n");
 
-   pr_err("No CONFIG_PERF_EVENTS=y kernel support 
configured?\n");
rc = -err;
goto out;
}
-- 
1.7.8.6


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf record: Change error message on failure

2012-09-13 Thread Robert Richter

On 13.09.12 17:07:20, Ingo Molnar wrote:
> 
> * Robert Richter  wrote:
> 
> > Only report
> > 
> >  No CONFIG_PERF_EVENTS=y kernel support configured?
> > 
> > if the syscall fails with ENOSYS. In other cases CONFIG_PERF_EVENTS is
> > set and might confuse users. The default message is now:
> > 
> >  Not all events could be opened.
> 
> Indeed, and it would be nice to be even less passive-aggressive 
> and figure out and display the exact error condition?

The complete error message shows the error condition and is like the
following:

 # perf record -e cycles:ppk sleep 1
 
   Error: sys_perf_event_open() syscall returned with 22 (Invalid argument).  
/bin/dmesg may provide additional information.
 
 Not all events could be opened.
 sleep: Terminated

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch 1/2] x86: apic: Export symbols for extended interrupt LVT functions

2008-02-13 Thread Robert Richter

Signed-off-by: Robert Richter <[EMAIL PROTECTED]>
---
 arch/x86/kernel/apic_64.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index d8d03e0..2a9f4bc 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -215,12 +215,14 @@ u8 setup_APIC_eilvt_mce(u8 vector, u8 msg_type, u8 mask)
setup_APIC_eilvt(APIC_EILVT_LVTOFF_MCE, vector, msg_type, mask);
return APIC_EILVT_LVTOFF_MCE;
 }
+EXPORT_SYMBOL(setup_APIC_eilvt_mce);
 
 u8 setup_APIC_eilvt_ibs(u8 vector, u8 msg_type, u8 mask)
 {
setup_APIC_eilvt(APIC_EILVT_LVTOFF_IBS, vector, msg_type, mask);
return APIC_EILVT_LVTOFF_IBS;
 }
+EXPORT_SYMBOL(setup_APIC_eilvt_ibs);
 
 /*
  * Program the next event, relative to now
-- 
1.5.1.6

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: [EMAIL PROTECTED]



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch 0/2] x86: Add AMD Barcelona extended interrupt support for 32bit

2008-02-13 Thread Robert Richter

This patch series adds support for extended interrupts on AMD
Barcelona CPUs. Implementation is the same as it exists already for
64bit. Also EXPORT_SYMBOL macros are added to the 64bit code.

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: [EMAIL PROTECTED]



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch 2/2] x86: apic: Extended interrupt LVT support for AMD Barcelona (32bit)

2008-02-13 Thread Robert Richter

Signed-off-by: Robert Richter <[EMAIL PROTECTED]>
---
 arch/x86/kernel/apic_32.c |   31 +++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index 35a568e..a6f9d25 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -621,6 +621,37 @@ int setup_profiling_timer(unsigned int multiplier)
 }
 
 /*
+ * Setup extended LVT, AMD specific (K8, family 10h)
+ *
+ * Vector mappings are hard coded. On K8 only offset 0 (APIC500) and
+ * MCE interrupts are supported. Thus MCE offset must be set to 0.
+ */
+
+#define APIC_EILVT_LVTOFF_MCE 0
+#define APIC_EILVT_LVTOFF_IBS 1
+
+static void setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask)
+{
+   unsigned long reg = (lvt_off << 4) + APIC_EILVT0;
+   unsigned int  v   = (mask << 16) | (msg_type << 8) | vector;
+   apic_write(reg, v);
+}
+
+u8 setup_APIC_eilvt_mce(u8 vector, u8 msg_type, u8 mask)
+{
+   setup_APIC_eilvt(APIC_EILVT_LVTOFF_MCE, vector, msg_type, mask);
+   return APIC_EILVT_LVTOFF_MCE;
+}
+EXPORT_SYMBOL(setup_APIC_eilvt_mce);
+
+u8 setup_APIC_eilvt_ibs(u8 vector, u8 msg_type, u8 mask)
+{
+   setup_APIC_eilvt(APIC_EILVT_LVTOFF_IBS, vector, msg_type, mask);
+   return APIC_EILVT_LVTOFF_IBS;
+}
+EXPORT_SYMBOL(setup_APIC_eilvt_ibs);
+
+/*
  * Local APIC start and shutdown
  */
 
-- 
1.5.1.6

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: [EMAIL PROTECTED]



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/2] x86: apic: Export symbols for extended interrupt LVT functions

2008-02-14 Thread Robert Richter

On 13.02.08 14:32:56, Arjan van de Ven wrote:
> On Wed, 13 Feb 2008 16:19:36 +0100 (CET)
> "Robert Richter" <[EMAIL PROTECTED]> wrote:
> 
> > Signed-off-by: Robert Richter <[EMAIL PROTECTED]>
> > ---
> >  arch/x86/kernel/apic_64.c |2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
> > index d8d03e0..2a9f4bc 100644
> > --- a/arch/x86/kernel/apic_64.c
> > +++ b/arch/x86/kernel/apic_64.c
> > @@ -215,12 +215,14 @@ u8 setup_APIC_eilvt_mce(u8 vector, u8 msg_type,
> > u8 mask) setup_APIC_eilvt(APIC_EILVT_LVTOFF_MCE, vector, msg_type,
> > mask); return APIC_EILVT_LVTOFF_MCE;
> >  }
> > +EXPORT_SYMBOL(setup_APIC_eilvt_mce);
> >  
> >  u8 setup_APIC_eilvt_ibs(u8 vector, u8 msg_type, u8 mask)
> >  {
> > setup_APIC_eilvt(APIC_EILVT_LVTOFF_IBS, vector, msg_type,
> > mask); return APIC_EILVT_LVTOFF_IBS;
> >  }
> > +EXPORT_SYMBOL(setup_APIC_eilvt_ibs);
> 
> which modules would even consider using any of these?
> Doesn't sound like something we should export..

For IBS it is Perfmon. See here:
http://git.kernel.org/?p=linux/kernel/git/eranian/linux-2.6.git;a=commit;h=7caef3e19d17349f869884f5adf7c9823e32ade7

MCE export has been added for consistency reasons to allow modules to
enable MCE.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: [EMAIL PROTECTED]


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Sometimes, there is OOPS happened when we use oprofile.

2012-10-31 Thread Robert Richter

Jun,

On 29.10.12 02:33:54, Zhang, Jun wrote:
> Sometimes, there is OOPS happened when we use oprofile. next
> is the call stack. From call stack, we find in
> call_on_stack if there is a nmi interrupt between "xchgl
> %%ebx,%%esp" and "call *%%edi", system will OOPS.

this should be related and fixed with:

 https://lkml.org/lkml/2012/9/12/269

Ingo, HPA,

please apply the fix of kernel_stack_pointer().

Thanks,

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Sometimes, there is OOPS happened when we use oprofile.

2012-10-31 Thread Robert Richter

On 31.10.12 14:33:17, H. Peter Anvin wrote:
> I'm vaguely concerned about the following:
> 
> + * To always return a non-null
> + * stack pointer we fall back to regs as stack if no previous stack
> + * exists.
> 
> The logic being that if there is no stack pointer and the stack is
> too empty, to simply assume regs point to the top of the stack?  Is
> this possible to ever be actually seen?

I discussed this with Steven too (https://lkml.org/lkml/2012/9/6/322)
and we both had a bad feeling with returning a null pointer by
kernel_stack_pointer() (implemented in version 1 of this patch). It
could be null if tinfo->previous_esp is null (last stack). Not sure
when this may happen.

So using regs as fallback seemed to be ok as this was in for years:

 7b6c6c7 x86, 32-bit: fix kernel_trap_sp()

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 08/11] perf tool: precise mode requires exclude_guest

2012-07-26 Thread Robert Richter

On 26.07.12 10:07:37, Peter Zijlstra wrote:
> On Thu, 2012-07-26 at 08:50 +0300, Gleb Natapov wrote:
> > The commit is 144d31e6, but it introduces hook that is used on VMX only.
> > SVM does not need it to implement guest/host only counters since it
> > has HW support for that in the PMU.
> 
> Right, seems that has changed now that we support IBS.

Will look at this.

-Robert


-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 08/12] perf, persistent: Exposing persistent events using sysfs

2013-08-27 Thread Robert Richter

On 23.08.13 12:39:53, Vince Weaver wrote:
> On Fri, 23 Aug 2013, Robert Richter wrote:
> Make no assumptions when documenting.  When I as a user have to dig around
> the kernel source tree to find out what is going on then the documentation 
> is lacking.
> 
> > > Is this endian clean?  Will attr5:23 point to the same bit on a 
> > > big-endian 
> > > machine as on little-endian?
> > 
> > It is the endianness used in the syscall. Handled in the same way as
> > for the config fields. I don't see where this could be an issue.
> 
> C bitfields go opposite ways on big-endian vs little-endian systems.
> This has come up with some of the bitfields in the sample buffers.
> It doesn't matter if you just use the bitfields, but if you're trying
> to poke single bits into opaque 64-bit blobs it might be an issue.

Ok, let's improve documentation, how about the patch below?

> > The format directories in /sys/bus/event_source/devices/*/format/ are
> > already there to make it human readable. A user never has to deal
> > directly with syntax provided there and may use already the
> > abstractions for the event syntax.
> 
> The format directory for now was only for the "config" fields which
> traditionally were needed to specify an event, that is all.
> 
> Things get a lot more complex if arbitrary subsets of the the perf_attr
> structure start getting exported.

I don't see anything special with config fields, why not expand the
same functionality to other fields in perf_attr too? Your example for
precise_ip shows this could be useful.

And again, a while ago there was the decision to use sysfs to specify
events (I didn't like the approach too). Now, we must be able to setup
*any* kind of event in that way, not just that ones that can be setup
only by using config fields.

> > The problem with this is that you have to implement this in the event
> > parser of perf tools. Thus, you need to update the parser for any
> > other syntax you want to use. This is not necessary with my
> > implementation. It already provides the above. The pmu driver just
> > need to add the sysfs entry.
> 
> I see.  Are you going to update the parsers for programs that
> also try to read these values, like trinity?  
> 
> Or is the perf tool special because it is in the kernel?

First of all, the change is backward compatible for any tool out. If
not implemented, the parser throws an error and the event with the new
format can not be setup via sysfs. If other tools do not use such
events, no need to update anything.

The event parser became a bit complex already, thus it might be useful
to split the code and put it in some library e.g. liblk or so. There
are plans to do this.

-Robert



diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
index 77f47ff..d8f348f 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
@@ -1,20 +1,50 @@
-Where: /sys/bus/event_source/devices//format
+Where: /sys/bus/event_source/devices//format/
 Date:  January 2012
-Kernel Version: 3.3
+Kernel Version:3.3
+   3.xx(added attr:)
 Contact:   Jiri Olsa 
-Description:
-   Attribute group to describe the magic bits that go into
-   perf_event_attr::config[012] for a particular pmu.
-   Each attribute of this group defines the 'hardware' bitmask
-   we want to export, so that userspace can deal with sane
-   name/value pairs.
-
-   Userspace must be prepared for the possibility that attributes
+
+Description:   Define formats for bit ranges in perf_event_attr
+
+   Specify a format to describe the magic bits that go
+   into struct perf_event_attr for a particular pmu.
+   Userspace can deal then with sane name/value pairs.
+
+   Bit range may be any bit mask of an u64 value (bits 0
+   to 63). The bit range may vary depending on the
+   system's endianess if the underlying field is an u32,
+   for example the format is for bp_type:
+
+   attr7:32-63 ... little endian
+   attr7:0-31  ... big endian
+
+   Userspace must be prepared for the possibility that formats
define overlapping bit ranges. For example:
-   attr1 = 'config:0-23'
-   attr2 = 'config:0-7'
-   attr3 = 'config:12-35'
 
-   Example: 'config1:1,6-10,44'
-   Defines contents of attribute that occupies

Re: [PATCH v3 12/12] [RFC] perf, persistent: ioctl functions to control persistency

2013-08-27 Thread Robert Richter

On 23.08.13 17:08:11, Vince Weaver wrote:
> On Fri, 23 Aug 2013, Borislav Petkov wrote:
> 
> > Maybe this makes it more understandable for you but this is beside the
> > point.
> 
> Understandability doesn't matter?

I agree with Vince on this, the naming should be intuitive. Esp. since
'attach' is used in tracing in different context, we should avoid
using the term here for less confusion.

I got another idea for this, what about UNCLAIM and CLAIM? It is
exactly, what it is. A process unclaims an event telling it doesn't
care anymore.  Another process comes and claims the event, meaning the
process wants the event no longer to be shared with others and being
released after closing.

> > But I have to say the reversed thing above does sound confusing, now
> > that I'm looking at the code. Actually, at the time we discussed this,
> > my idea was to do it like this:
> > 
> > 1. we open a perf event and get its file descriptor
> > 2. ioctl ATTACH to it so that it is attached to the process.
> > 
> > ... do some tracing and collecting and fiddling...

You don't need to attach to a persistent event, you can just use the
event with perf_event_open, mmap, close.

> > 3. ioctl DETACH from it so that it is "forked in the background" so to
> > speak, very similar to a background job in the shell.

With 'detach' we move the event into the 'background' so that it is
still available after opening.

> Would it make sense to actually fork a kernel thread that "owns" the 
> event?  

There is no need for a kernel thread, there is nothing to do. We just
increase the refcount so that the event is not destroyed.

> The way it is now events can "get loose" if either the user
> forgets about them or the tool that opened them crashes, and it's
> impossible to kill these events with normal tools.  You possibly
> wouldn't even know one was running (except you'd have one fewer
> counter to work with) unless you poked around under /sys.

As boris said, there could be some sort of 'kill' tool for events.
That's what we need sysfs for, it tells which events are running.

> > 4. The rest of the code continues and deallocates the event *BUT* (and
> > this is the key thing!) the counter/tracepoint remains operational in
> > the kernel, running all the time.

The event remains operational esp. after closing it or killing/
terminating the process owning it.

> > 5. Now, after a certain point, you come back and ioctl ATTACH to this
> > already opened event and read/collect its buffers again.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 12/12] [RFC] perf, persistent: ioctl functions to control persistency

2013-08-27 Thread Robert Richter

> > On Thu, 22 Aug 2013, Robert Richter wrote:
> > > This is for Linux man-pages:

Updated description below.

-Robert

Author: Robert Richter 
Date:   Tue Aug 13 11:22:22 2013 +0200

[RFC] perf, persistent: ioctl functions to control persistency

Implementing ioctl functions to control persistent events. There are
functions to detach or attach an event to or from a process. The
PERF_EVENT_IOC_DETACH ioctl call makes an event persistent. After
closing the event's fd it runs then in the background of the system
without the need of a controlling process. The perf_event_open()
syscall can be used to reopen the event by any process. The
PERF_EVENT_IOC_ATTACH ioctl attaches the event again so that it is
removed after closing the event's fd.

This is for Linux man-pages:

type ...

PERF_TYPE_PERSISTENT (Since Linux 3.xx)

Open a persistent event that is already running in the
background of the system. There is a unique identifier for
each persistent event that needs to be specified in the
event's attribute config field.  Persistent events are
listed under:

  /sys/bus/event_source/devices/persistent/

See PERF_EVENT_IOC_DETACH how to create a persistent
event. The instance creating such an event should also be
responsible for removing it.

...
persistent : 41, /* always-on event */
...

persistent: (Since Linux 3.xx)

Put event into persistent state after opening. After closing
the event's fd the event is persistent in the system and
continues to run.

perf_event ioctl calls

PERF_EVENT_IOC_DETACH (Since Linux 3.xx)

Any event that was opened with the perf_event_open()
syscall may become a persistent event. This is done by
detaching the event from the controlling process that
holds the event's file descriptor. This ioctl can be used
for doing this. After detaching it, the event is
persistent in the system. An unique identifier for the
persistent event is returned or an error otherwise. After
closing the fd the event will continue to run. The
following allows to connect to the event again:

pe.type = PERF_TYPE_PERSISTENT;
pe.config = ;
...
fd = perf_event_open(...);

The event must be reopened on the same cpu.

PERF_EVENT_IOC_ATTACH (Since Linux 3.xx)

Attach the event specified by the file descriptor to the
current process. The event is no longer persistent in the
system and will be removed after all users disconnected
from the event. Thus, if there are no other users the
event will be closed too after closing its file
descriptor, the event then no longer exists.

    Cc: Vince Weaver 
Signed-off-by: Robert Richter 
Signed-off-by: Robert Richter 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 00/12] perf, persistent: Add persistent events

2013-08-27 Thread Robert Richter

On 24.08.13 11:38:26, Borislav Petkov wrote:
> this text is very nicely written and should be in a README somewhere.
> :-)

Yeah, will put it under tools/perf/Documentation/...

> > There are ioctl functions to control persistent events that can be
> > used to detach or attach an event to or from a process. The
> > PERF_EVENT_IOC_DETACH ioctl call makes an event persistent.
> 
> Yeah, we probably want to abstract this a step further by allowing
> to attach/detach fds to/from events. Then "persistent" is only one
> incarnation of us always detaching from the event during its lifetime.
> 
> If we close an event while it is attached, it gets destroyed - i.e.,
> current functionality, etc. See the other thread.

I don't know what you mean here exactly, please explain.

> But we probably need more input on this from people. Ingo, Peter?

Yes, that would be great.

Thanks,

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 12/12] [RFC] perf, persistent: ioctl functions to control persistency

2013-08-27 Thread Robert Richter

Ok, starting with your question at the end first since it might
explain the following better.

If the event is no longer persistent and if there are multiple users,
the event remains open and running until the all mmap's are munmap'ed
and the last fd is closed. After that it is released. This is done
with refcounts and already implemented.

On 27.08.13 14:22:42, Borislav Petkov wrote:
> On Tue, Aug 27, 2013 at 01:54:22PM +0200, Robert Richter wrote:
> > I got another idea for this, what about UNCLAIM and CLAIM? It is
> > exactly, what it is. A process unclaims an event telling it doesn't
> > care anymore.  Another process comes and claims the event, meaning the
> > process wants the event no longer to be shared with others and being
> > released after closing.
> 
> This still doesn't pan out because with claiming the event, you state
> that the event is *owned* by this process but with persistent events
> we want to be able to state that they can have multiple users and thus
> multiple buffer consumers, concurrently.

We still have multiple users after 'claiming' the event. The only
thing is that the event will be destroyed after all users went away.
A process (that claimed the event) was responsible for this.

> > > > 3. ioctl DETACH from it so that it is "forked in the background" so to
> > > > speak, very similar to a background job in the shell.
> > 
> > With 'detach' we move the event into the 'background' so that it is
> > still available after opening.
> 
> Ok, maybe ATTACH/DETACH is not the perfect naming for this after all.
> Maybe when we want to state the fact that the event is going to continue
> existing after closing the buffer consumer, we want to do ioctl(event,
> DONT_DESTROY) and when we want to actually get rid of it, one of the
> process does ioctl(event, DESTROY).

I still prefer claim/unclaim. ;)

> Which reminds me, what do we do when one process destroys the event but
> there are other consumers accessing it concurrently? Refcounting?

See above.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-12 Thread Robert Richter

On 12.07.13 10:27:56, Ingo Molnar wrote:
> 
> * Robin Holt  wrote:
> 
> > [...]
> > 
> > With this patch, we did boot a 16TiB machine.  Without the patches, the 
> > v3.10 kernel with the same configuration took 407 seconds for 
> > free_all_bootmem.  With the patches and operating on 2MiB pages instead 
> > of 1GiB, it took 26 seconds so performance was improved.  I have no feel 
> > for how the 1GiB chunk size will perform.
> 
> That's pretty impressive.
> 
> It's still a 15x speedup instead of a 512x speedup, so I'd say there's 
> something else being the current bottleneck, besides page init 
> granularity.
> 
> Can you boot with just a few gigs of RAM and stuff the rest into hotplug 
> memory, and then hot-add that memory? That would allow easy profiling of 
> remaining overhead.
> 
> Side note:
> 
> Robert Richter and Boris Petkov are working on 'persistent events' support 
> for perf, which will eventually allow boot time profiling - I'm not sure 
> if the patches and the tooling support is ready enough yet for your 
> purposes.

The latest patch set is still this:

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent-v2

It requires the perf subsystem to be initialized first which might be
too late, see perf_event_init() in start_kernel(). The patch set is
currently also limited to tracepoints only.

If this is sufficient for you, you might register persistent events
with the function perf_add_persistent_event_by_id(), see
mcheck_init_tp() how to do this. Later you can fetch all samples with:

 # perf record -e persistent// sleep 1

> Robert, Boris, the following workflow would be pretty intuitive:
> 
>  - kernel developer sets boot flag: perf=boot,freq=1khz,size=16MB
> 
>  - we'd get a single (cycles?) event running once the perf subsystem is up
>and running, with a sampling frequency of 1 KHz, sending profiling
>trace events to a sufficiently sized profiling buffer of 16 MB per
>CPU.

I am not sure about the event you want to setup here, if it is a
tracepoint the sample_period should be always 1. The buffer size
parameter looks interesting, for now it is 512kB per cpu per default
(as perf tools setup the buffer).

> 
>  - once the system reaches SYSTEM_RUNNING, profiling is stopped either
>automatically - or the user stops it via a new tooling command.
> 
>  - the profiling buffer is extracted into a regular perf.data via a
>special 'perf record' call or some other, new perf tooling 
>solution/variant.

See the perf-record command above...

> 
>[ Alternatively the kernel could attempt to construct a 'virtual'
>  perf.data from the persistent buffer, available via /sys/debug or
>  elsewhere in /sys - just like the kernel constructs a 'virtual' 
>  /proc/kcore, etc. That file could be copied or used directly. ]
> 
>  - from that point on this workflow joins the regular profiling workflow: 
>perf report, perf script et al can be used to analyze the resulting
>boot profile.

Ingo, thanks for outlining this workflow. We will look how this could
fit into the new version of persistent events we currently working on.

Thanks,

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [tip:perf/urgent] perf tools: Fix build errors with O and DESTDIR make vars set

2013-07-12 Thread Robert Richter

On 12.07.13 01:49:40, tip-bot for Robert Richter wrote:
> Commit-ID:  107de3724eff5a6fa6474a4d2aa5460b63749ebf
> Gitweb: http://git.kernel.org/tip/107de3724eff5a6fa6474a4d2aa5460b63749ebf
> Author:     Robert Richter 
> AuthorDate: Tue, 11 Jun 2013 17:22:38 +0200
> Committer:  Arnaldo Carvalho de Melo 
> CommitDate: Mon, 8 Jul 2013 17:34:00 -0300
> 
> perf tools: Fix build errors with O and DESTDIR make vars set
> 
> Fixing build errors with O and DESTDIR make vars set:
> 
>  $ make prefix=/usr/local O=$builddir DESTDIR=$destdir -C tools/ perf
>  ...
>  make[1]: Entering directory `.../.source/perf/tools/perf'
>  CC .../.build/perf/perf/util/parse-events.o
>  util/parse-events.c:14:32: fatal error: parse-events-bison.h: No such file 
> or directory
>  compilation terminated.
>  make[1]: *** [.../.build/perf/perf/util/parse-events.o] Error 1
>  ...

Boris just raised another variant of building perf tools, which
unfortunately is broken now:

 $ make tools/perf
 LINK /home/robert/cx/linux/tools/perf/perf
 gcc: error: ../linux/tools/lib/lk/liblk.a: No such file or directory

There is also

 $ make -C tools perf
 $ make -C tools/perf

Plus variants with O= and DESTDIR set.

Looking how to fix this...

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT] kbuild changes for v3.11-rc1

2013-07-12 Thread Robert Richter

On 11.07.13 10:16:18, Linus Torvalds wrote:
> On Thu, Jul 11, 2013 at 6:54 AM, Michal Marek  wrote:
> >
> > Yeah. It also reveals another bug that we rewrite the kernel.release
> > file each time.

The odd thing I have in a specific configuration is that depmod is
missing the kernelrelease when I only build install rules, though I
had built everything before and kernelrelease should be there:

 $ make 
 $ make  modules_install zinstall
 ...
   DEPMOD  
 Usage: .../scripts/depmod.sh /sbin/depmod 

Note that makeflags include the -j option, INSTALL_MOD_PATH and
INSTALL_PATH variables set, so no root perms required in this case.

I agree, the fix to add a dependency to the kernelrelease file is
wrong here since it rewrites the object tree, which should be readonly
for install rules.

I could reproduce the error above in my setup and will look at it
again, hopefully fixing it more carefully.

Appologize the trouble this made.

Thanks,

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tools: perf: Fix liblk not built when using perf_install target

2013-08-30 Thread Robert Richter

On 30.08.13 17:18:36, Sedat Dilek wrote:
> So, I tried to build perf the "official" way:
> 
>$ make -C tools/ perf_install
> 
> Unfortunately, my build breaks like this:
> ...
> make[2]: Entering directory
> `~/src/linux-kernel/linux/tools/lib/traceevent'
> make[2]: Leaving directory
> `~/src/linux-kernel/linux/tools/lib/traceevent'
> LINK perf
> gcc: error: ~/src/linux-kernel/linux/tools/lib/lk/liblk.a: No such file or 
> directory
> make[1]: *** [perf] Error 1
> make[1]: Leaving directory `~/src/linux-kernel/linux/tools/perf'
> make: *** [perf_install] Error 2
> 
> After some discussion on IRC with peterz and acme and a closer look at
> the targets in "tools/Makefile", I have noticed that the perf_install
> target misses liblk to be built beforehand.

There are no build prerequisites on purpose for install targets.
Install targets are intended to be run with root permissions. We don't
want to create/overwrite files in the build directory as root while
running install rules. Thus, you always need to run:

 $ make -C tools/ perf

or similar first.

> On the contrary the perf_clean target invokes to clean liblk when perf
> is cleaned.
> 
> [ tools/Makefile ]
> ...
> perf_clean: liblk_clean
>   $(call descend,$(@:_clean=),clean)
> ...
> 
> Fix this by adding liblk target to perf_install target.

So no, this is not the proper fix.

Before installing you *need* to build. You might want to fix the
documenation...

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] arm, kbuild: make "make install" not depend on vmlinux

2013-09-30 Thread Robert Richter

On 18.07.13 11:22:24, Michal Marek wrote:
> > So Michal (or ARM people - whoever wants to take the patch), just take
> > my ack. No objections.
> 
> I can add it to the kbuild tree if needed. Otherwise you can add
> Acked-by: Michal Marek .

This didn't make it upstream yet, can somebody at it to a tree?

Thanks,

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] x86, 32-bit: Fix invalid stack address while in softirq

2012-09-06 Thread Robert Richter

(Resending patch with [PATCH] in subject line and updated cc list.)

On 06.09.12 09:30:37, wyang1 wrote:
> Robert,
> 
> I agreed what you said, my patch more likes a workaround.
> 
> > So the proper fix I see is to fix kernel_stack_pointer() to return a
> > valid stack in case of an empty stack while in softirq. Something like
> > the patch below. Maybe it must be optimized a bit. I tested the patch
> > over night with no issues found. Please test it too.
> 
> I also tested the following patch over night. it is fine.:-)

Wei,

thanks for testing.

Ingo,

please take a look at this. Not sure if Linus want to look at this too
and if we need more optimization here.

Thanks,

-Robert


>From 8e7c16913b1fcfc63f7b24337551aacc7153c334 Mon Sep 17 00:00:00 2001
From: Robert Richter 
Date: Mon, 3 Sep 2012 20:54:48 +0200
Subject: [PATCH] x86, 32-bit: Fix invalid stack address while in softirq

In 32 bit the stack address provided by kernel_stack_pointer() may
point to an invalid range causing NULL pointer access or page faults
while in NMI (see trace below). This happens if called in softirq
context and if the stack is empty. The address at ®s->sp is then
out of range.

Fixing this by checking if regs and ®s->sp are in the same stack
context. Otherwise return the previous stack pointer stored in struct
thread_info.

 BUG: unable to handle kernel NULL pointer dereference at 000a
 IP: [] print_context_stack+0x6e/0x8d
 *pde = 
 Oops:  [#1] SMP
 Modules linked in:
 Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 
Hewlett-Packard HP xw9400 Workstation/0A1Ch
 EIP: 0060:[] EFLAGS: 00010093 CPU: 0
 EIP is at print_context_stack+0x6e/0x8d
 EAX: e000 EBX: 000a ECX: f4435f94 EDX: 000a
 ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 CR0: 8005003b CR2: 000a CR3: 34ac9000 CR4: 07d0
 DR0:  DR1:  DR2:  DR3: 
 DR6: 0ff0 DR7: 0400
 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000)
 Stack:
  03e8 e000 1ffc f4e39b00  000a f4435f94 c155198c
  f5409ef0 c1003723 c155198c f5409f04  f5409edc  
  f5409ee8 f4435f94 f5409fc4 0001 f5409f1c c12dce1c  c155198c
 Call Trace:
  [] dump_trace+0x7b/0xa1
  [] x86_backtrace+0x40/0x88
  [] ? oprofile_add_sample+0x56/0x84
  [] oprofile_add_sample+0x75/0x84
  [] op_amd_check_ctrs+0x46/0x260
  [] profile_exceptions_notify+0x23/0x4c
  [] nmi_handle+0x31/0x4a
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] do_nmi+0xa0/0x2ff
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] nmi_stack_correct+0x28/0x2d
  [] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [] ? do_softirq+0x4b/0x7f
  
  [] irq_exit+0x35/0x5b
  [] smp_apic_timer_interrupt+0x6c/0x7a
  [] apic_timer_interrupt+0x2a/0x30
 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 
73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 
e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc
 EIP: [] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0
 CR2: 000a
 ---[ end trace 62afee3481b00012 ]---
 Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Yang Wei 
Cc: sta...@vger.kernel.org
Signed-off-by: Robert Richter 
---
 arch/x86/include/asm/ptrace.h |   15 ---
 arch/x86/kernel/ptrace.c  |   21 +
 arch/x86/oprofile/backtrace.c |2 +-
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index dcfde52..19f16eb 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -205,21 +205,14 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 }
 #endif
 
-/*
- * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
- * when it traps.  The previous stack will be directly underneath the saved
- * registers, and 'sp/ss' won't even have been saved. Thus the '®s->sp'.
- *
- * This is valid only for kernel mode traps.
- */
-static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
-{
 #ifdef CONFIG_X86_32
-   return (unsigned long)(®s->sp);
+extern unsigned long kernel_stack_pointer(struct pt_regs *regs);
 #else
+static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
+{
return regs->sp;
-#endif
 }
+#endif
 
 #define GET_IP(regs) ((regs)->ip)
 #define GET_FP(regs) ((regs)->bp)
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index c4c6a5c..5a9a8c9 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -165,6 +165,27 @@ static inline bool invalid_selector(u16 value)
 
 #define FLAG_MASK  FLAG_MASK_32
 
+/*
+ * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
+ * when it

Re: [PATCH] x86, 32-bit: Fix invalid stack address while in softirq

2012-09-06 Thread Robert Richter

On 06.09.12 09:14:42, Steven Rostedt wrote:
> On Thu, 2012-09-06 at 12:04 +0200, Robert Richter wrote:
> 
> > please take a look at this. Not sure if Linus want to look at this too
> > and if we need more optimization here.
> 
> It could probably go either way. Although the function has several
> lines, it looks like the actual assembly produced wouldn't be much. I
> took a quick look at where kernel_stack_pointer() is used, and I didn't
> find any hot paths. This is why I think it can either be a called
> function or static inline without much difference.

The main reason for putting it into ptrace.c was struct thread_info
which requires the inclusion of linux/thread_info.h. I didn't want to
add this to ptrace.h.

> 
> >  
> >  #define GET_IP(regs) ((regs)->ip)
> >  #define GET_FP(regs) ((regs)->bp)
> > diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> > index c4c6a5c..5a9a8c9 100644
> > --- a/arch/x86/kernel/ptrace.c
> > +++ b/arch/x86/kernel/ptrace.c
> > @@ -165,6 +165,27 @@ static inline bool invalid_selector(u16 value)
> >  
> >  #define FLAG_MASK  FLAG_MASK_32
> >  
> > +/*
> > + * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
> > + * when it traps.  The previous stack will be directly underneath the saved
> > + * registers, and 'sp/ss' won't even have been saved. Thus the '®s->sp'.
> > + *
> > + * This is valid only for kernel mode traps.
> > + */
> > +unsigned long kernel_stack_pointer(struct pt_regs *regs)
> > +{
> > +   unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
> > +   unsigned long sp = (unsigned long)®s->sp;
> > +   struct thread_info *tinfo;
> > +
> 
> Please add some comments to why you did this. Having this info in just
> the change log is not enough. Reading it with the code makes much more
> sense.

Yes, will update the comment here.

> 
> > +   if (context == (sp & ~(THREAD_SIZE - 1)))
> > +   return sp;
> > +
> > +   tinfo = (struct thread_info *)context;
> > +
> > +   return tinfo->previous_esp;
> > +}
> > +
> >  static unsigned long *pt_regs_access(struct pt_regs *regs, unsigned long 
> > regno)
> >  {
> > BUILD_BUG_ON(offsetof(struct pt_regs, bx) != 0);
> > diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
> > index d6aa6e8..5b5741e 100644
> > --- a/arch/x86/oprofile/backtrace.c
> > +++ b/arch/x86/oprofile/backtrace.c
> > @@ -113,7 +113,7 @@ x86_backtrace(struct pt_regs * const regs, unsigned int 
> > depth)
> >  
> > if (!user_mode_vm(regs)) {
> > unsigned long stack = kernel_stack_pointer(regs);
> > -   if (depth)
> > +   if (depth & stack)
> 
> Can other users of kernel_stack_pointer() be nailed by a return of NULL?

It would be save here too, but dump_trace() falls back to the current
stack in case there is no stack address given which we don't want with
oprofile.

I was looking at all users of kernel_stack_pointer() and could not
find any direct pointer dereference of the sp. The only potential
problems I found could arise here:

 arch/x86/kernel/kprobes.c:resume_execution()
 arch/x86/kernel/time.c:profile_pc()

It is not quite clear if we really need code here that checks the
pointer. Since a NULL pointer access has the same effect as if the
stack address would be wrong which would be the case without the
patch, I rather tend not to change the code here.

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86, 32-bit: Fix invalid stack address while in softirq

2012-09-06 Thread Robert Richter

On 06.09.12 17:34:07, Robert Richter wrote:
> On 06.09.12 11:14:42, Steven Rostedt wrote:
> > On Thu, 2012-09-06 at 17:02 +0200, Robert Richter wrote:
> > 
> > > > > --- a/arch/x86/oprofile/backtrace.c
> > > > > +++ b/arch/x86/oprofile/backtrace.c
> > > > > @@ -113,7 +113,7 @@ x86_backtrace(struct pt_regs * const regs, 
> > > > > unsigned int depth)
> > > > >  
> > > > >   if (!user_mode_vm(regs)) {
> > > > >   unsigned long stack = kernel_stack_pointer(regs);
> > > > > - if (depth)
> > > > > + if (depth & stack)
> > > > 
> > > > Can other users of kernel_stack_pointer() be nailed by a return of NULL?
> > > 
> > > It would be save here too, but dump_trace() falls back to the current
> > > stack in case there is no stack address given which we don't want with
> > > oprofile.
> > > 
> > > I was looking at all users of kernel_stack_pointer() and could not
> > > find any direct pointer dereference of the sp. The only potential
> > > problems I found could arise here:
> > > 
> > >  arch/x86/kernel/kprobes.c:resume_execution()
> > >  arch/x86/kernel/time.c:profile_pc()
> > > 
> > > It is not quite clear if we really need code here that checks the
> > > pointer. Since a NULL pointer access has the same effect as if the
> > > stack address would be wrong which would be the case without the
> > > patch, I rather tend not to change the code here.
> > 
> > Then a comment should be in the oprofile code too. Something to the
> > effect that oprofile is special and can cause kernel_stack_pointer() to
> > return NULL in some cases, thus we need to check for it.
> 
> We could return always a non-null stack pointer with:
> 
> unsigned long kernel_stack_pointer(struct pt_regs *regs)
> {
>   unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
>   unsigned long sp = (unsigned long)®s->sp;
>   struct thread_info *tinfo;
> 
>   if (context == (sp & ~(THREAD_SIZE - 1)))
>   return sp;
>   
>   tinfo = (struct thread_info *)context;
>   if (tinfo->previous_esp)
>   tinfo->previous_esp;
> 
>   return (unsigned long)regs;
> }

I meant:

unsigned long kernel_stack_pointer(struct pt_regs *regs)
{
unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
unsigned long sp = (unsigned long)®s->sp;
struct thread_info *tinfo;

if (context == (sp & ~(THREAD_SIZE - 1)))
return sp;

tinfo = (struct thread_info *)context;
if (tinfo->previous_esp)
return tinfo->previous_esp;

return (unsigned long)regs;
}

-Robert

> 
> Maybe this is even better.
> 
> -Robert
> 
> -- 
> Advanced Micro Devices, Inc.
> Operating System Research Center

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86, 32-bit: Fix invalid stack address while in softirq

2012-09-06 Thread Robert Richter

On 06.09.12 11:14:42, Steven Rostedt wrote:
> On Thu, 2012-09-06 at 17:02 +0200, Robert Richter wrote:
> 
> > > > --- a/arch/x86/oprofile/backtrace.c
> > > > +++ b/arch/x86/oprofile/backtrace.c
> > > > @@ -113,7 +113,7 @@ x86_backtrace(struct pt_regs * const regs, unsigned 
> > > > int depth)
> > > >  
> > > > if (!user_mode_vm(regs)) {
> > > > unsigned long stack = kernel_stack_pointer(regs);
> > > > -   if (depth)
> > > > +   if (depth & stack)
> > > 
> > > Can other users of kernel_stack_pointer() be nailed by a return of NULL?
> > 
> > It would be save here too, but dump_trace() falls back to the current
> > stack in case there is no stack address given which we don't want with
> > oprofile.
> > 
> > I was looking at all users of kernel_stack_pointer() and could not
> > find any direct pointer dereference of the sp. The only potential
> > problems I found could arise here:
> > 
> >  arch/x86/kernel/kprobes.c:resume_execution()
> >  arch/x86/kernel/time.c:profile_pc()
> > 
> > It is not quite clear if we really need code here that checks the
> > pointer. Since a NULL pointer access has the same effect as if the
> > stack address would be wrong which would be the case without the
> > patch, I rather tend not to change the code here.
> 
> Then a comment should be in the oprofile code too. Something to the
> effect that oprofile is special and can cause kernel_stack_pointer() to
> return NULL in some cases, thus we need to check for it.

We could return always a non-null stack pointer with:

unsigned long kernel_stack_pointer(struct pt_regs *regs)
{
unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
unsigned long sp = (unsigned long)®s->sp;
struct thread_info *tinfo;

if (context == (sp & ~(THREAD_SIZE - 1)))
return sp;

tinfo = (struct thread_info *)context;
if (tinfo->previous_esp)
tinfo->previous_esp;

return (unsigned long)regs;
}

Maybe this is even better.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 08/11] perf tool: precise mode requires exclude_guest

2012-09-06 Thread Robert Richter

David,

On 06.09.12 00:07:52, Peter Zijlstra wrote:
> On Wed, 2012-09-05 at 09:43 -0600, David Ahern wrote:
> > We need to require exclude_guest when using precise attribute with perf 
> > else all running VMs on Intel-based servers will crash. I do not have an 
> > AMD based server to even attempt the preferred solution.

do you have a perf tool patch ready that enables exclude_guest per
default for precise events? I have ibs kernel code that returns EINVAL
in this case.  Current perf tool should then fall back by sending
another syscall with the exclude_guest modifier disabled. This should
work for ibs. Testing your patch wouln't be an effort for me.

To prevent Intel-based servers from crashing, pebs should be only
enabled if the exclude_guest modifier is enabled. I guess there is
already kernel code to prevent this, but didn't look at the sources.

To prevent probing the kernel with syscalls to detect pmu features, I
suggested some time ago to introduce

 /sys/bus/event_source/devices/*/flags

or so similar to /proc/cpuinfo:flags. Any opinions on that?

There is also the option to emulate exclude_guest for ibs in
software. So far I didn't have the time to look at this. We could
still add this in the future. Since we need to fix current mainline
anyway where such a patch wouldn't go in for v3.6, I will send my
current ibs kernel fixes I mentioned above next days. It would be
great to test this also with your patch.

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 08/11] perf tool: precise mode requires exclude_guest

2012-09-06 Thread Robert Richter

David,

just found your patch...

On 20.07.12 17:25:53, David Ahern wrote:
> PEBS cannot be used with guest mode. If user adds :p modifier set
> exclude_guest as well.

> @@ -653,6 +653,9 @@ int parse_events_modifier(struct list_head *list, char 
> *str)
>   eH = 0;
>   } else if (*str == 'p') {
>   precise++;
> + /* use of precise requires exclude_guest */
> + if (!exclude_GH)
> + eG = 1;

We should only enable exclude_guest if no G/H modifiers are given on
the command line. If I correctly read the code this is indicated by
exclude_GH and correctly implemented.

To summarize how I think it should work:

 -e cycles:p . Enable exclude_guest bit. If the syscall fails,
   fall back by disabling the guest bit and send
   syscall again

 -e cycles:pGH ... Do not enable exclude_guest/_host. Do not fall back
   on syscall errors.

Same for the case the p modifier is not set.

So we have the following:

 G   ... set exclude_host bit, clear exclude_guest
 H   ... set exclude_guest bit, clear exclude_host
 GH  ... clear both exclude_guest/_host
  ... default (set exclude_host bit, clear exclude_guest),
 but fall back on syscall errors to clear both
 exclude_guest/_host

The modifier parser should be correctly implemented and should work
with ibs. Will test this patch.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 08/11] perf tool: precise mode requires exclude_guest

2012-09-06 Thread Robert Richter

On 06.09.12 13:17:19, David Ahern wrote:
> On 9/6/12 1:02 PM, Robert Richter wrote:
> > just found your patch...

I tested the patch and it runs ok with my kernel patch that checks
syscall attributes for ibs. perf tools falls back and clears
exclude_guest/_host bits in attr. It also should work if the kernel
ignores those bits (without my patch).

Acked-by: Robert Richter 

> > To summarize how I think it should work:
> >
> >   -e cycles:p . Enable exclude_guest bit. If the syscall fails,
> > fall back by disabling the guest bit and send
> > syscall again
> 
> I think the fallback scenarios are getting out of hand -- too many and 
> too many combinations. For this case (:p only) I proposed setting 
> exclude_guest. It should not fail if the pebs/ibs is supported. If so, 
> give the user a helpful message. e.g., https://lkml.org/lkml/2012/7/20/440
> 
> 
> >
> >   -e cycles:pGH ... Do not enable exclude_guest/_host. Do not fall back
> > on syscall errors.
> 
> In this case give the user a message that the requested combination is 
> invalid. I have a patch similar to the one above for this.

Yes, but cases where only guest or host-only mode is supported are
rare.

> 
> >
> > Same for the case the p modifier is not set.
> >
> > So we have the following:
> >
> >   G   ... set exclude_host bit, clear exclude_guest
> >   H   ... set exclude_guest bit, clear exclude_host
> >   GH  ... clear both exclude_guest/_host
> >... default (set exclude_host bit, clear exclude_guest),
> >   but fall back on syscall errors to clear both
> >   exclude_guest/_host
> 
> you lost me on the last one. If the user specifies no attributes on the 
> events then exclude_host/exclude_guest is controlled by the 
> perf_host/perf_guest variables - see event_attr_init().

As you said above, falling back by disabling flags becomes unhandy.
Current implementation falls back without (?) notice even if the user
explicitly demanded G or H. Falling back should only be the case if no
G/H modifiers are given. This is what I meant here.

> > The modifier parser should be correctly implemented and should work
> > with ibs. Will test this patch.
> 
> Thanks!

Sorry far being late a bit with all this.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] perf, ibs: Check syscall attribute flags

2012-09-07 Thread Robert Richter

On 06.09.12 21:56:45, Robert Richter wrote:
> I tested the patch and it runs ok with my kernel patch that checks
> syscall attributes for ibs. perf tools falls back and clears
> exclude_guest/_host bits in attr.

This is the patch that checks syscall attributes for ibs. Should be
for tip:perf/urgent for ABI compatibility.

-Robert


>From 1d037614edef576da441936bd8c917d31f57b179 Mon Sep 17 00:00:00 2001
From: Robert Richter 
Date: Wed, 25 Jul 2012 19:12:45 +0200
Subject: [PATCH] perf, ibs: Check syscall attribute flags

Current implementation simply ignores attribute flags. Thus, there is
no notification to userland of unsupported features. Check syscall's
attribute flags to let userland know if a feature is supported by the
kernel. This is also needed to distinguish between future kernels what
might support a feature.

Cc:  v3.5..
Signed-off-by: Robert Richter 
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   11 +++
 include/linux/perf_event.h   |2 ++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c 
b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 7bfb5be..0456061 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -215,6 +215,14 @@ static int perf_ibs_init(struct perf_event *event)
struct perf_ibs *perf_ibs;
u64 max_cnt, config;
int ret;
+   struct perf_event_attr notsupp = {
+   .exclude_user   = 1,
+   .exclude_kernel = 1,
+   .exclude_hv = 1,
+   .exclude_idle   = 1,
+   .exclude_host   = 1,
+   .exclude_guest  = 1,
+   };
 
perf_ibs = get_ibs_pmu(event->attr.type);
if (perf_ibs) {
@@ -229,6 +237,9 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
 
+   if (perf_flags(&event->attr) & perf_flags(¬supp))
+   return -EINVAL;
+
if (config & ~perf_ibs->config_mask)
return -EINVAL;
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 28f9cee..c36a04f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -304,6 +304,8 @@ struct perf_event_attr {
__u32   __reserved_2;
 };
 
+#define perf_flags(attr)   (*(&(attr)->read_format + 1))
+
 /*
  * Ioctls that can be done on a perf event fd:
  */
-- 
1.7.8.6




-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf, ibs: Check syscall attribute flags

2012-09-07 Thread Robert Richter

On 07.09.12 10:50:35, David Ahern wrote:
> On 9/7/12 10:41 AM, Robert Richter wrote:
> > +   struct perf_event_attr notsupp = {
> > +   .exclude_user   = 1,
> > +   .exclude_kernel = 1,
> > +   .exclude_hv = 1,
> > +   .exclude_idle   = 1,
> > +   .exclude_host   = 1,
> > +   .exclude_guest  = 1,
> > +   };
> >
> > perf_ibs = get_ibs_pmu(event->attr.type);
> > if (perf_ibs) {
> > @@ -229,6 +237,9 @@ static int perf_ibs_init(struct perf_event *event)
> > if (event->pmu != &perf_ibs->pmu)
> > return -ENOENT;
> >
> > +   if (perf_flags(&event->attr) & perf_flags(¬supp))
> > +   return -EINVAL;
> > +
> 
> Am I reading this right - if exclude_guest is set then perf_ibs_init 
> returns -EINVAL?

Yes, the hardware does not support this. I will look for a solution
which emulates this is software.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf, ibs: Check syscall attribute flags

2012-09-07 Thread Robert Richter

On 07.09.12 18:56:27, Peter Zijlstra wrote:

> > +   .exclude_user   = 1,
> > +   .exclude_kernel = 1,
> > +   .exclude_hv = 1,
> > +   .exclude_idle   = 1,
> > +   .exclude_host   = 1,
> > +   .exclude_guest  = 1,
> 
> Ideally we'd grow support for those using SVM entry/exit hooks though.

Yes, I will look at this too. This patch is inteded for stable and
urgent.

> > +#define perf_flags(attr)   (*(&(attr)->read_format + 1))
> 
> 
> Another anonymous union shouldn't hurt..
> 
> ---
>  include/linux/perf_event.h | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index cc5e2cd..5df37a0 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -236,7 +236,9 @@ struct perf_event_attr {
>   __u64   sample_type;
>   __u64   read_format;
>  
> - __u64   disabled   :  1, /* off by default*/
> + union {
> + __u64   flags;
> + __u64   disabled   :  1, /* off by default*/
>   inherit:  1, /* children inherit it   */
>   pinned :  1, /* must always be on PMU */
>   exclusive  :  1, /* only group on PMU */
> @@ -272,6 +274,7 @@ struct perf_event_attr {
>   exclude_callchain_user   : 1, /* exclude user 
> callchains */
>  
>   __reserved_1   : 41;
> + };
>  
>   union {
>   __u32   wakeup_events;/* wakeup every n events */

I was thinking of this too. But this breaks existing code to compile
since static initialization of struct perf_event_attr fails, e.g.:

 builtin-test.c:469:3: error: unknown field ‘watermark’ specified in initializer

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf, ibs: Check syscall attribute flags

2012-09-07 Thread Robert Richter

On 07.09.12 11:11:56, David Ahern wrote:
> On 9/7/12 11:07 AM, Robert Richter wrote:
> >> Am I reading this right - if exclude_guest is set then perf_ibs_init
> >> returns -EINVAL?
> >
> > Yes, the hardware does not support this. I will look for a solution
> > which emulates this is software.
> 
> ugh, that's exactly we need for Intel: if precise is set, exclude_guest 
> must be set.
> 
> In perf I would like to set exclude_guest if precise is set -- what I 
> mentioned yesterday for the patch https://lkml.org/lkml/2012/7/20/437. 
> With my patch -e cycles:p on AMD will then fail EINVAL.

You may set this bit per default. On EINVAL perf falls back and clears
the bit. I tested exactly this for ibs and it works.

So, on Intel, if the guest_exclude bit is not set, the kernel simply
throws an EINVAL too, but since perf sets it per default with the
precise modifier there shouldn't be any problem here.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH -v2] perf, ibs: Check syscall attribute flags

2012-09-10 Thread Robert Richter

On 07.09.12 18:56:27, Peter Zijlstra wrote:
> > @@ -215,6 +215,14 @@ static int perf_ibs_init(struct perf_event *event)
> > struct perf_ibs *perf_ibs;
> > u64 max_cnt, config;
> > int ret;
> > +   struct perf_event_attr notsupp = {
> 
> static const ?
> 
> > +   .exclude_user   = 1,
> > +   .exclude_kernel = 1,
> > +   .exclude_hv = 1,
> > +   .exclude_idle   = 1,
> > +   .exclude_host   = 1,
> > +   .exclude_guest  = 1,

Version 2 below with static const used.

-Robert


>From faf80df002709ec95178cf8a6f3f7298200ac1a7 Mon Sep 17 00:00:00 2001
From: Robert Richter 
Date: Wed, 25 Jul 2012 19:12:45 +0200
Subject: [PATCH] perf, ibs: Check syscall attribute flags

Current implementation simply ignores attribute flags. Thus, there is
no notification to userland of unsupported features. Check syscall's
attribute flags to let userland know if a feature is supported by the
kernel. This is also needed to distinguish between future kernels what
might support a feature.

Cc:  v3.5..
Signed-off-by: Robert Richter 
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   12 
 include/linux/perf_event.h   |2 ++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c 
b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 7bfb5be..eebd5ff 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -209,6 +209,15 @@ static int perf_ibs_precise_event(struct perf_event 
*event, u64 *config)
return -EOPNOTSUPP;
 }
 
+static const struct perf_event_attr ibs_notsupp = {
+   .exclude_user   = 1,
+   .exclude_kernel = 1,
+   .exclude_hv = 1,
+   .exclude_idle   = 1,
+   .exclude_host   = 1,
+   .exclude_guest  = 1,
+};
+
 static int perf_ibs_init(struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
@@ -229,6 +238,9 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
 
+   if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp))
+   return -EINVAL;
+
if (config & ~perf_ibs->config_mask)
return -EINVAL;
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 28f9cee..c36a04f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -304,6 +304,8 @@ struct perf_event_attr {
__u32   __reserved_2;
 };
 
+#define perf_flags(attr)   (*(&(attr)->read_format + 1))
+
 /*
  * Ioctls that can be done on a perf event fd:
  */
-- 
1.7.8.6




-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf, ibs: Check syscall attribute flags

2012-09-10 Thread Robert Richter

On 07.09.12 11:20:19, David Ahern wrote:
> I see now... intel returns ENOTSUP if exclude_guest is not set, amd 
> returns EINVAL if it is set.
> 
> For the AMD case the fallback_missing_features code kicks in for 
> perf-top and perf-record; I just need to fix up the pr_debug for that case.

It is EOPNOTSUP. I preferred returning EOPNOTSUP too. But then I saw
that fallback code is implemented to work mostly with EINVAL. It seems
to be the default answer to a syscall. ;)

There are several pieces of code in perf that implement fallback code,
it's hard to find it all and this already caused some bugs in the
past. I was looking at this to unify it, but it was too much effort.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ 06/37] x86-32: Fix invalid stack address while in softirq

2012-12-04 Thread Robert Richter

On 04.12.12 11:42:58, Herton Ronaldo Krzesinski wrote:
> Hi, this makes build fail with oprofile on i386 on 3.0.54:
> ERROR: "kernel_stack_pointer" [arch/x86/oprofile/oprofile.ko] undefined!
> 
> The following commit should address this failure:
> 
> commit cb57a2b4cff7edf2a4e32c0163200e9434807e0a
> Author: H. Peter Anvin 
> Date:   Tue Nov 20 22:21:02 2012 -0800
> 
> x86-32: Export kernel_stack_pointer() for modules

Correct. I checked the stable Git trees and it is not in 3.0 and 3.4.
My former statement was wrong, got a bit confused about which patch
was added to which tree, sorry.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters

2012-11-12 Thread Robert Richter

Stephane,

On 12.11.12 13:24:38, Stephane Eranian wrote:
> Anybody from AMD or formerly @ AMD care to submit a libpfm4 patch
> to add the Fam15th NB events?
> 
> I'd like to avoid having to type them in manually.

Suravee may probably help you here.

HTH

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters

2012-11-12 Thread Robert Richter

On 11.11.12 19:17:07, Stephane Eranian wrote:
> On Sat, Nov 10, 2012 at 12:50 PM, Robert Richter  wrote:
> > Peter's main concerns were that my patch set is not in the
> > Intel-uncore style. I started reworking this but was not able to
> > finish my work. This concerns still exist.
> >
> That was my concern too. I don't recall exactly why it could not
> be totally disconnected from the core PMU. I think hardware-wise,
> it was possible. Could you refresh my memory?

Current implementation only allows the use of a single x86 pmu.
Multiple instances of x86 pmus in a system like an additional pmu for
nb counters requires a complete rework. You need to remove global
variables and extend function interfaces by arguments pointing to a
struct x86_pmu. All this without adding overhead (e.g. pointer
chasing) compared to the existing code.

This is also the reason why the intel-uncore implemenation is
basically a copy-and-paste of generic x86 pmu code plus uncore
specific changes.

Avoiding code cuplication and a complex rework were the reasons for
simply extending the family 10h implementation to also support family
15h nb counters. This seemed to me the best approach that time.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters

2012-11-12 Thread Robert Richter

Jacob,

On 11.11.12 12:44:26, Jacob Shin wrote:
> Because
> in an upcoming processor family, there is no core performance counter
> extensions, but we do have northbridge performance counters. Meaning
> the counter address base would be c001 and northbridge counters
> live in c0010240, being 0x240 apart, we could make counter masks work
> but that testng awful alot of 0's for every address offset calculation
> .

I agree with you that my imlementation of counter masks does not fit
perfectly for this reason and also since it requires too much changes
in generic code (for_each_generic_counter()) which Peter did not like
either. So it is ok to me to replace your solution with the patch that
implenents counter bitmasks.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [tip:x86/urgent] x86-32: Fix invalid stack address while in softirq

2012-11-13 Thread Robert Richter

On 13.11.12 16:17:27, Ingo Molnar wrote:
> 
> * tip-bot for Robert Richter  wrote:
> 
> > Commit-ID:  fa22b5dfbea95c14dd96da264e80cac68857ceb7
> > Gitweb: 
> > http://git.kernel.org/tip/fa22b5dfbea95c14dd96da264e80cac68857ceb7
> > Author: Robert Richter 
> > AuthorDate: Mon, 3 Sep 2012 20:54:48 +0200
> > Committer:  H. Peter Anvin 
> > CommitDate: Wed, 31 Oct 2012 14:25:18 -0700
> > 
> > x86-32: Fix invalid stack address while in softirq
> 
> Had to remove this one because it broke the 32-bit allmodconfig 
> build:
> 
> ERROR: "kernel_stack_pointer" [arch/x86/oprofile/oprofile.ko] undefined!

Argh! Will look at this.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters

2012-11-10 Thread Robert Richter

On 09.11.12 19:01:34, Jacob Shin wrote:
> The following patchset enables 4 additional performance counters in
> AMD family 15h processors that counts northbridge events -- such as
> DRAM accesses.
> 
> This patchset is based on previous work done by Robert Richter
>  :
> 
> https://lkml.org/lkml/2012/6/19/324

The original patch set of this is here (a rebased version):

http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-nb

This code was tested in detail.

> The main differences are:
> 
> - The northbridge counters are indexed contiguously right above the
>   core performance counters.
> 
> - MSR address offset calculations are moved to architecture specific
>   files.
> 
> - Interrups are set up to be delivered only to a single core.

So I rather suggest to make delta patches on top of my patches.

Peter's main concerns were that my patch set is not in the
Intel-uncore style. I started reworking this but was not able to
finish my work. This concerns still exist.

Due to the current situation I would rather prefer to create a
tip:perf/amd-nb branch that includes my patches and then add all
further necessary steps for mainline acceptance on top of it.

Thanks,

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h

2012-11-16 Thread Robert Richter

Jacob,

On 15.11.12 15:31:53, Jacob Shin wrote:
> On AMD family 15h processors, there are 4 new performance counters
> (in addition to 6 core performance counters) that can be used for
> counting northbridge events (i.e. DRAM accesses). Their bit fields are
> almost identical to the core performance counters. However, unlike the
> core performance counters, these MSRs are shared between multiple
> cores (that share the same northbridge). We will reuse the same code
> path as existing family 10h northbridge event constraints handler
> logic to enforce this sharing.
> 
> These new counters are indexed contiguously right above the existing
> core performance counters, and their indexes correspond to RDPMC ECX
> values.
> 
> Signed-off-by: Jacob Shin 

your approach looks ok to me in general, but see my comments inline.

> @@ -156,31 +161,28 @@ static inline int amd_pmu_addr_offset(int index)
>   if (offset)
>   return offset;
>  
> - if (!cpu_has_perfctr_core)
> + if (!cpu_has_perfctr_core) {
>   offset = index;
> - else
> + ncore = AMD64_NUM_COUNTERS;
> + } else {
>   offset = index << 1;
> + ncore = AMD64_NUM_COUNTERS_CORE;
> + }
> +
> + /* find offset of NB counters with respect to x86_pmu.eventsel */
> + if (cpu_has_perfctr_nb) {
> + if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))
> + offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
> +  ((index - ncore) << 1);
> + }

There is duplicate calculatoin of offset in some cases. Better we
avoid this.

> +static int __amd_nb_hw_config(struct perf_event *event)
> +{
> + if (event->attr.exclude_user || event->attr.exclude_kernel ||
> + event->attr.exclude_host || event->attr.exclude_guest)
> + return -EINVAL;
>  
> - event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> + event->hw.config &= ~ARCH_PERFMON_EVENTSEL_USR;
> + event->hw.config &= ~ARCH_PERFMON_EVENTSEL_OS;
> +
> + if (event->hw.config & ~(AMD64_EVENTSEL_EVENT|
> +  ARCH_PERFMON_EVENTSEL_UMASK |
> +  ARCH_PERFMON_EVENTSEL_INT   |
> +  AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE |
> +  AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK))
> + return -EINVAL;
>  
>   return 0;
>  }

Comments are missing and an AMD64_NB_EVENT_MASK macro should be
defined for the above. See my previous version for reference:

/*
 * AMD NB counters (MSRs 0xc0010240 etc.) do not support the following
 * flags:
 *
 *  Host/Guest Only
 *  Counter Mask
 *  Invert Comparison
 *  Edge Detect
 *  Operating-System Mode
 *  User Mode
 *
 * Try to fix the config for default settings, otherwise fail.
 */
static int amd_nb_event_config(struct perf_event *event)
{
   if (!amd_is_nb_perfctr_event(&event->hw))
   return 0;

   if (event->attr.exclude_host || event->attr.exclude_guest
   || event->attr.exclude_user || event->attr.exclude_kernel)
   goto fail;

   event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR | 
ARCH_PERFMON_EVENTSEL_OS);

   if (event->hw.config & ~(AMD64_NB_EVENT_MASK | 
ARCH_PERFMON_EVENTSEL_INT))
   goto fail;

   return 0;
fail:
   pr_debug("Invalid nb counter config value: %016Lx\n", event->hw.config);
   return -EINVAL;
}


> @@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events 
> *cpuc, struct perf_event *ev
>   if (new == -1)
>   return &emptyconstraint;
>  
> + /* set up interrupts to be delivered only to this core */
> + if (cpu_has_perfctr_nb) {
> + struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> +
> + hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> + hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> + hwc->config |= (0ULL | (c->cpu_core_id)) <<
> + AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> + }

Looks like a hack to me. The constaints handler is only supposed to
determine constraints and not to touch anything in the event's
structure. This should be done later when setting up hwc->config in
amd_nb_event_config() or so.

I also do not think that smp_processor_id() is the right thing to do
here. Since cpu_hw_events is per-cpu the cpu is already selected.

> +
>   return &nb->event_constraints[new];
>  }
>  
> @@ -520,6 +575,7 @@ static struct event_constraint amd_f15_PMC3  = 
> EVENT_CONSTRAINT(0, 0x08, 0);
>  static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 
> 0x09, 0);
>  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
>  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
> +static struct event_constraint amd_f15_NBPMC30 = EVENT_CONSTRAINT(0, 0x3C0, 
> 0);

The counter index mask de

Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h

2012-11-16 Thread Robert Richter

On 16.11.12 13:00:30, Jacob Shin wrote:
> On Fri, Nov 16, 2012 at 07:43:44PM +0100, Robert Richter wrote:
> > On 15.11.12 15:31:53, Jacob Shin wrote:
> > > @@ -156,31 +161,28 @@ static inline int amd_pmu_addr_offset(int index)
> > >   if (offset)
> > >   return offset;
> > >  
> > > - if (!cpu_has_perfctr_core)
> > > + if (!cpu_has_perfctr_core) {
> > >   offset = index;
> > > - else
> > > + ncore = AMD64_NUM_COUNTERS;
> > > + } else {

First calculation:

> > >   offset = index << 1;
> > > + ncore = AMD64_NUM_COUNTERS_CORE;
> > > + }
> > > +
> > > + /* find offset of NB counters with respect to x86_pmu.eventsel */
> > > + if (cpu_has_perfctr_nb) {
> > > + if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))

Second calculation:

> > > + offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
> > > +  ((index - ncore) << 1);
> > > + }
> > 
> > There is duplicate calculatoin of offset in some cases. Better we
> > avoid this.
> 
> Which cases? The code calculates the offset for a given index the very
> first time it is called, stores it, and uses that stored offset from
> then on. My [PATCH 3/4] sets that up.

One case above.

It looks like the paths should be defined more clearly.

> > > @@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events 
> > > *cpuc, struct perf_event *ev
> > >   if (new == -1)
> > >   return &emptyconstraint;
> > >  
> > > + /* set up interrupts to be delivered only to this core */
> > > + if (cpu_has_perfctr_nb) {
> > > + struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> > > +
> > > + hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> > > + hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> > > + hwc->config |= (0ULL | (c->cpu_core_id)) <<
> > > + AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> > > + }
> > 
> > Looks like a hack to me. The constaints handler is only supposed to
> > determine constraints and not to touch anything in the event's
> > structure. This should be done later when setting up hwc->config in
> > amd_nb_event_config() or so.
> 
> Hm.. is the hwc->config called after constraints have been set up
> already? If so, I'll change it ..

Should be, since the hw register can be setup only after the counter
is selected.

> 
> > 
> > I also do not think that smp_processor_id() is the right thing to do
> > here. Since cpu_hw_events is per-cpu the cpu is already selected.
> 
> Yeah, I could not figure out how to get the cpu number from cpuc. Is
> there a container_of kind of thing that I can do to get the cpu number
> ?

At some point event->cpu is assigned, I think.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h

2012-11-18 Thread Robert Richter

On 16.11.12 13:00:30, Jacob Shin wrote:
> > >  static int setup_event_constraints(void)
> > >  {
> > > - if (boot_cpu_data.x86 >= 0x15)
> > > + if (boot_cpu_data.x86 == 0x15)
> > >   x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
> > 
> > Since this does not cover family 16h anymore, you also need to extend
> > amd_get_event_ constraints() to setup nb counters with __amd_get_nb_
> > event_constraints() if cpu_has_perfctr_nb is set.
> 
> Yes family 16h will be covered by a separate patch at a later date.

This is fine, if functionality is added later. But If you run the
patch as it is on family 16h, you need to make sure nb counters are
not enabled at all. This is not the case since you have several checks
for cpu_has_perfctr_nb. This code must also run on family 16h as it
is, thus you need strictly disable nb counters in this case, or enable
it.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] ARM: oprofile: add A5/A7/A15 entries in op_perf_name

2012-11-20 Thread Robert Richter

Will,

On 05.11.12 11:31:03, Will Deacon wrote:
> > diff --git a/arch/arm/oprofile/common.c b/arch/arm/oprofile/common.c
> > index 99c63d4b..ec10db1 100644
> > --- a/arch/arm/oprofile/common.c
> > +++ b/arch/arm/oprofile/common.c
> > @@ -37,8 +37,11 @@ static struct op_perf_name {
> > { "xscale1","arm/xscale2"   },
> > { "v6", "arm/armv6" },
> > { "v6mpcore",   "arm/mpcore"},
> > +   { "ARMv7 Cortex-A5","arm/armv7-ca5" },
> > +   { "ARMv7 Cortex-A7","arm/armv7-ca7" },
> > { "ARMv7 Cortex-A8","arm/armv7" },
> > { "ARMv7 Cortex-A9","arm/armv7-ca9" },
> > +   { "ARMv7 Cortex-A15",   "arm/armv7-ca15" },
> >  };

> I'd rather not go down this route now that we have the operf tool as part of
> oprofile, which can use the perf syscall directly and doesn't need this
> string translation.

since this is just an update of cpu detection I would be willing to
include this into kernel code anyway.

We could further move the cpu detection to userspace if perf_event
exists. We let the kernel enable oprofile with cpu_type="unknown".
User space then could either bind mount the file (user could do this
manually) or we implement to write to cpu_type. Doing so oprofile
could use in-kernel perf_events if it exists always as fallback.

Any thoughts?

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] ARM: oprofile: add A5/A7/A15 entries in op_perf_name

2012-11-20 Thread Robert Richter

On 20.11.12 15:57:17, Will Deacon wrote:
> On Tue, Nov 20, 2012 at 12:17:47PM +0000, Robert Richter wrote:
> > Will,
> > 
> > On 05.11.12 11:31:03, Will Deacon wrote:
> > > > diff --git a/arch/arm/oprofile/common.c b/arch/arm/oprofile/common.c
> > > > index 99c63d4b..ec10db1 100644
> > > > --- a/arch/arm/oprofile/common.c
> > > > +++ b/arch/arm/oprofile/common.c
> > > > @@ -37,8 +37,11 @@ static struct op_perf_name {
> > > > { "xscale1","arm/xscale2"   },
> > > > { "v6", "arm/armv6" },
> > > > { "v6mpcore",   "arm/mpcore"},
> > > > +   { "ARMv7 Cortex-A5","arm/armv7-ca5" },
> > > > +   { "ARMv7 Cortex-A7","arm/armv7-ca7" },
> > > > { "ARMv7 Cortex-A8","arm/armv7" },
> > > > { "ARMv7 Cortex-A9","arm/armv7-ca9" },
> > > > +   { "ARMv7 Cortex-A15",   "arm/armv7-ca15" },
> > > >  };
> > 
> > > I'd rather not go down this route now that we have the operf tool as part 
> > > of
> > > oprofile, which can use the perf syscall directly and doesn't need this
> > > string translation.
> > 
> > since this is just an update of cpu detection I would be willing to
> > include this into kernel code anyway.
> 
> Perhaps, but one day we might like to remove this compatibility layer as
> tools move over to the perf system call, so adding new CPUs here is actively
> going against that.

This would help people to use oprofile as they did before with legacy
oprofile tools. There is not much effort to keep oprofile kernel
support for these tools if in-kernel perf_event support exists for new
hardware. As this is not much effort to maintain, we could keep
supporting this.  Forcing users to use operf since this is the only
way to connect to newer hardware might not be what they want.

> > We could further move the cpu detection to userspace if perf_event
> > exists. We let the kernel enable oprofile with cpu_type="unknown".
> > User space then could either bind mount the file (user could do this
> > manually) or we implement to write to cpu_type. Doing so oprofile
> > could use in-kernel perf_events if it exists always as fallback.
> 
> Not sure I follow you... operf already does the CPU detection from
> userspace, so I guess that code could simply be re-used. What does the bind
> mount involve?

I am thinking of the following:

 # cat /root/cpu_type
 arm/armv7-ca5
 # cat /dev/oprofile/cpu_type
 unknown
 # mount --bind /root/cpu_type /dev/oprofile/cpu_type
 # cat /dev/oprofile/cpu_type
 arm/armv7-ca5

>From here legacy oprofile tools work as expected using oprofilefs. (I
think. Did not test it.) We need to change the kernel for this a bit
to return 'unknown'. The mount could be done by the oprofile tools
using existing cpu detection code. This is only one way to setup
cpu_type from userland, there could be other ways too.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: perf: POWER-event translation questions

2012-11-20 Thread Robert Richter

On 09.11.12 11:26:26, Stephane Eranian wrote:
> On Thu, Nov 8, 2012 at 2:10 AM, Sukadev Bhattiprolu
>  wrote:

> > 2. Can we allow hyphens in the {name} token  (please see my change to
> >util/parse-events.l below). With this change, I can run:
> >
> The current code does not support this but Andi fixed that in his HSW patch
> and I use it for the PEBS-LL patch series as well.
>
> >   perf stat -e cpu/cmplu-stall-bru /tmp/nop
> >
> >without any changes to the user level tool (parse-events.l) I have
> >tested some common cases, not sure if it will break something :-)

But ...

> > diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> > index c87efc1..1967bb2 100644
> > --- a/tools/perf/util/parse-events.l
> > +++ b/tools/perf/util/parse-events.l
> > @@ -80,7 +80,7 @@ event [^,{}/]+
> >  num_dec[0-9]+
> >  num_hex0x[a-fA-F0-9]+
> >  num_raw_hex[a-fA-F0-9]+
> > -name   [a-zA-Z_*?][a-zA-Z0-9_*?]*
> > +name   [-a-zA-Z_*?][-a-zA-Z0-9_*?]*
 ^

... I wouldn't allow hyphens at the beginning of a name.

And, I am wondering if parsing reserved names like 'cpu-cycles' works
too, e.g. cpu/cpu-cycles-foobar/. There are many reserved words in the
lexer with hyphens in it. This might cause unexpected problems.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] ARM: oprofile: add A5/A7/A15 entries in op_perf_name

2012-11-20 Thread Robert Richter

On 20.11.12 16:55:17, Will Deacon wrote:
> On Tue, Nov 20, 2012 at 04:31:58PM +0000, Robert Richter wrote:
> > I am thinking of the following:
> > 
> >  # cat /root/cpu_type
> >  arm/armv7-ca5
> >  # cat /dev/oprofile/cpu_type
> >  unknown
> >  # mount --bind /root/cpu_type /dev/oprofile/cpu_type
> >  # cat /dev/oprofile/cpu_type
> >  arm/armv7-ca5
> > 
> > From here legacy oprofile tools work as expected using oprofilefs. (I
> > think. Did not test it.) We need to change the kernel for this a bit
> > to return 'unknown'. The mount could be done by the oprofile tools
> > using existing cpu detection code. This is only one way to setup
> > cpu_type from userland, there could be other ways too.
> 
> Ok, this is functionally equivalent to the patch that was submitted at the
> start of this thread: it solves the problem of mapping a single ARM core to
> a oprofile's CPU ID string. Technically, I don't mind doing that in the
> kernel (at least, it means you don't need to do your trick above)

The advantage of a solution where userland updates cpu_type is that we
never need to update the kernel anymore. This means, cpu detection can
be part of the tools.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 03/14] perf: Add persistent event facilities

2013-06-14 Thread Robert Richter

On 14.06.13 11:15:13, Namhyung Kim wrote:
> > +int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr 
> > *attr)
> > +{
> > +   struct pers_event_desc *desc;
> > +
> > +   if (cpu >= (unsigned)nr_cpu_ids)
> > +   return -EINVAL;  
> > +
> > +   list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
> > +   if (desc->attr->config == attr->config)
> > +   return __alloc_persistent_event_fd(desc);
> > +
>
> So it only supports tracepoint events.  Don't we need to add other types
> of event?

We have choosen tracepoints as initial implementation since they are
our first use case and its specifiers are unique already.

So this is for the initial implementation only. We can easily
implement support for other event types later. The only thing we need
to implement for this are unique identifiers for each persistent
event. Since all events are described in sysfs this change can be done
later without changing ABI or breaking userland.

The support for all event types is definitely on our list. We need
this also for a later implemention of registering all system wide
events as persistent events. This allows sharing events between
processes and is one of Ingos use cases to enable persistent events at
runtime.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] perf tools: Retry mapping buffers readonly on EACCES

2013-06-14 Thread Robert Richter

On 14.06.13 11:08:40, Namhyung Kim wrote:
> > -   if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
> > +try_again2:
> > +   if (perf_evlist__mmap(evlist, opts->mmap_pages, opts->mmap_ro) < 0) {
> > +   if (!opts->mmap_ro && errno == EACCES) {
> > +   opts->mmap_ro = true;
> > +   goto try_again2;
> > +   }
> > ui__error("Failed to mmap with %d (%s)\n",
> > errno, strerror(errno));
> > goto out_err;
> 
> 
> You will need this also:
> 
> 
> @@ -161,7 +161,8 @@ static int perf_record__mmap_read(struct perf_record *rec,
> }
>  
> md->prev = old;
> -   perf_mmap__write_tail(md, old);
> +   if (!rec->opts.mmap_ro)
> +   perf_mmap__write_tail(md, old);
>  
>  out:
> return rc;

Yes, indeed. Will add your change

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs

2013-06-14 Thread Robert Richter

On 14.06.13 11:36:00, Namhyung Kim wrote:
> > +static int pers_event_sysfs_register(struct pers_event *event)
> > +{
> > +   struct device_attribute *attr = &event->sysfs.attr;
> > +   int idx;
> > +
> > +   *attr = (struct device_attribute)__ATTR(, 0444, pers_event_sysfs_show,
> > +   NULL);
> > +   attr->attr.name = event->name;
> 
> When I added another persistent event with this API, I got an WARNING
> from lockdep like this:
> 
> [0.432506] BUG: key 88040946f140 not in .data!
> [0.432581] [ cut here ]
> [0.432656] WARNING: at /home/namhyung/project/linux/kernel/lockdep.c:2987 
> lockdep_init_map+0x53d/0x570()
> [0.432763] DEBUG_LOCKS_WARN_ON(1)
> 
> 
> I guess we need the following line here:
> 
>   sysfs_attr_init(&attr->attr);

Yes, added your change. Thanks Namhyung for reviewing and testing.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 05/16] perf, persistent: Return resonable error code

2013-05-31 Thread Robert Richter

From: Robert Richter 

Returning -ENODEV if no file descriptor is found. An error code of -1
(-EPERM) is misleading in this case.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 6612eb77..190aa75 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -165,7 +165,7 @@ int perf_get_persistent_event_fd(unsigned cpu, struct 
perf_event_attr *attr)
if (desc->attr->config == attr->config)
return __alloc_persistent_event_fd(desc);
 
-   return -1;
+   return -ENODEV;
 }
 
 
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 01/16] perf, persistent: Fix build error for no-tracepoints configs

2013-05-31 Thread Robert Richter

From: Robert Richter 

The mce_record tracepoint needs tracepoints to be enabled. Fixing
build error for no-tracepoints configs.

Signed-off-by: Robert Richter 
---
 arch/x86/kernel/cpu/mcheck/mce.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 0ecf4a2..d421937 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1987,6 +1987,8 @@ int __init mcheck_init(void)
return 0;
 }
 
+#ifdef CONFIG_TRACEPOINTS
+
 int __init mcheck_init_tp(void)
 {
if (perf_add_persistent_event_by_id(event_mce_record.event.type)) {
@@ -2001,6 +2003,8 @@ int __init mcheck_init_tp(void)
  */
 fs_initcall_sync(mcheck_init_tp);
 
+#endif /* CONFIG_TRACEPOINTS */
+
 /*
  * mce_syscore: PM support
  */
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 16/16] perf, persistent: Allow multiple users for an event

2013-05-31 Thread Robert Richter

From: Robert Richter 

Usually a fd close leads to the release of the event too. For
persistent events this is different as the events should be
permanently enabled in the system. Using reference counting to avoid
releasing an event during a fd close. This also allows it to have
multiple users (open file descriptors) for a single persistent event.

While at this, we don't need desc->fd any longer. The fd is attached
to a task and reference counting keeps the event. Removing desc->fd.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 46 --
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index a764144..4920702 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -11,7 +11,6 @@
 struct pers_event_desc {
struct perf_event *event;
struct list_head plist;
-   int fd;
 };
 
 struct pers_event {
@@ -88,6 +87,18 @@ out:
return event;
 }
 
+static void detach_persistent_event(struct pers_event_desc *desc)
+{
+   list_del(&desc->plist);
+   kfree(desc);
+}
+
+static void release_persistent_event(struct perf_event *event)
+{
+   perf_event_disable(event);
+   perf_event_release_kernel(event);
+}
+
 static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 {
struct pers_event_desc *desc;
@@ -100,12 +111,14 @@ static void del_persistent_event(int cpu, struct 
perf_event_attr *attr)
goto out;
event = desc->event;
 
-   list_del(&desc->plist);
-
-   perf_event_disable(event);
-   perf_event_release_kernel(event);
-   put_unused_fd(desc->fd);
-   kfree(desc);
+   /*
+* We primarily want to remove desc from the list. If there
+* are no open files, the refcount is 0 and we need to release
+* the event too.
+*/
+   detach_persistent_event(desc);
+   if (atomic_long_dec_and_test(&event->refcount))
+   release_persistent_event(event);
 out:
mutex_unlock(&per_cpu(pers_events_lock, cpu));
 }
@@ -182,18 +195,31 @@ fail:
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 {
struct pers_event_desc *desc;
+   struct perf_event *event;
int event_fd = -ENODEV;
 
mutex_lock(&per_cpu(pers_events_lock, cpu));
 
desc = get_persistent_event(cpu, attr);
-   if (!desc)
+
+   /* Increment refcount to keep event on put_event() */
+   if (!desc || !atomic_long_inc_not_zero(&desc->event->refcount))
goto out;
 
event_fd = anon_inode_getfd("[pers_event]", &perf_fops,
desc->event, O_RDONLY);
-   if (event_fd >= 0)
-   desc->fd = event_fd;
+
+   if (event_fd < 0) {
+   event = desc->event;
+   if (WARN_ON(atomic_long_dec_and_test(&event->refcount))) {
+   /*
+* May not happen since decrementing refcount is
+* protected by pers_events_lock.
+*/
+   detach_persistent_event(desc);
+   release_persistent_event(event);
+   }
+   }
 out:
mutex_unlock(&per_cpu(pers_events_lock, cpu));
 
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 06/16] perf, persistent: Return -EACCES if mapped buffers must be readonly

2013-05-31 Thread Robert Richter

From: Robert Richter 

mmap should return EACCES error if fd can not be opened writable.
This error code also helps userland to map buffers readonly on
failure.

Signed-off-by: Robert Richter 
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0730f36..a9b6470 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3761,7 +3761,7 @@ static int perf_mmap(struct file *file, struct 
vm_area_struct *vma)
return -EINVAL;
 
if (event->attr.persistent && (vma->vm_flags & VM_WRITE))
-   return -EINVAL;
+   return -EACCES;
 
vma_size = vma->vm_end - vma->vm_start;
nr_pages = (vma_size / PAGE_SIZE) - 1;
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 07/16] perf, persistent: Rework struct pers_event_desc

2013-05-31 Thread Robert Richter

From: Robert Richter 

Struct pers_event_desc is only used in kernel/events/persistent.c.
Moving it there. Also, removing attr member as this is a copy of
event->attr.

Signed-off-by: Robert Richter 
---
 include/linux/perf_event.h |  7 ---
 kernel/events/persistent.c | 12 
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a42b7..dc72c93 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -518,13 +518,6 @@ struct perf_output_handle {
int page;
 };
 
-struct pers_event_desc {
-   struct perf_event_attr *attr;
-   struct perf_event *event;
-   struct list_head plist;
-   int fd;
-};
-
 #ifdef CONFIG_PERF_EVENTS
 
 extern int perf_pmu_register(struct pmu *pmu, char *name, int type);
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 190aa75..22297e5 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -8,6 +8,12 @@
 /* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */
 #define CPU_BUFFER_NR_PAGES((512 * 1024) / PAGE_SIZE)
 
+struct pers_event_desc {
+   struct perf_event *event;
+   struct list_head plist;
+   int fd;
+};
+
 static DEFINE_PER_CPU(struct list_head, pers_events);
 
 static struct perf_event *
@@ -33,7 +39,6 @@ add_persistent_event_on_cpu(unsigned int cpu, struct 
perf_event_attr *attr,
rcu_assign_pointer(event->rb, buf);
 
desc->event = event;
-   desc->attr  = attr;
 
INIT_LIST_HEAD(&desc->plist);
list_add_tail(&desc->plist, &per_cpu(pers_events, cpu));
@@ -59,7 +64,7 @@ static void del_persistent_event(int cpu, struct 
perf_event_attr *attr)
struct perf_event *event = NULL;
 
list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
-   if (desc->attr->config == attr->config) {
+   if (desc->event->attr.config == attr->config) {
event = desc->event;
break;
}
@@ -78,7 +83,6 @@ static void del_persistent_event(int cpu, struct 
perf_event_attr *attr)
 
perf_event_release_kernel(event);
put_unused_fd(desc->fd);
-   kfree(desc->attr);
kfree(desc);
 }
 
@@ -162,7 +166,7 @@ int perf_get_persistent_event_fd(unsigned cpu, struct 
perf_event_attr *attr)
struct pers_event_desc *desc;
 
list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
-   if (desc->attr->config == attr->config)
+   if (desc->event->attr.config == attr->config)
return __alloc_persistent_event_fd(desc);
 
return -ENODEV;
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 15/16] perf, persistent: Exposing persistent events using sysfs

2013-05-31 Thread Robert Richter

From: Robert Richter 

Expose persistent events in the system to userland using sysfs. Perf
tools are able to read existing pmu events from sysfs. Now we use a
persistent pmu as an event container containing all registered
persistent events of the system. This patch adds dynamically
registration of persistent events to sysfs. E.g. something like this:

 
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23

Perf tools need to support the attr syntax that is added in a
separate patch set. With it we are able to run perf tool commands to
read persistent events, e.g.:

 # perf record -e persistent/mce_record/ sleep 10
 # perf top -e persistent/mce_record/

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 55 +-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index d5093a3..a764144 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -17,8 +17,10 @@ struct pers_event_desc {
 struct pers_event {
char*name;
struct perf_event_attr  attr;
+   struct perf_pmu_events_attr sysfs;
 };
 
+static struct pmu persistent_pmu;
 static DEFINE_PER_CPU(struct list_head, pers_events);
 static DEFINE_PER_CPU(struct mutex, pers_events_lock);
 
@@ -137,6 +139,8 @@ unwind:
return PTR_ERR(event);
 }
 
+static int pers_event_sysfs_register(struct pers_event *event);
+
 int perf_add_persistent_event_by_id(char* name, int id)
 {
struct pers_event   *event;
@@ -150,6 +154,8 @@ int perf_add_persistent_event_by_id(char* name, int id)
if (!event->name)
goto fail;
 
+   event->sysfs.id = id;
+
attr = &event->attr;
attr->sample_period = 1;
attr->wakeup_events = 1;
@@ -163,6 +169,8 @@ int perf_add_persistent_event_by_id(char* name, int id)
if (ret)
goto fail;
 
+   pers_event_sysfs_register(event);
+
return 0;
 fail:
kfree(event->name);
@@ -204,12 +212,57 @@ static struct attribute_group persistent_format_group = {
.attrs = persistent_format_attrs,
 };
 
+#define MAX_EVENTS 16
+
+static struct attribute *persistent_events_attr[MAX_EVENTS + 1] = { };
+
+static struct attribute_group persistent_events_group = {
+   .name = "events",
+   .attrs = persistent_events_attr,
+};
+
 static const struct attribute_group *persistent_attr_groups[] = {
&persistent_format_group,
+   NULL, /* placeholder: &persistent_events_group */
NULL,
 };
+#define EVENTS_GROUP   (persistent_attr_groups[1])
 
-static struct pmu persistent_pmu;
+static ssize_t pers_event_sysfs_show(struct device *dev,
+   struct device_attribute *__attr, char *page)
+{
+   struct perf_pmu_events_attr *attr =
+   container_of(__attr, struct perf_pmu_events_attr, attr);
+   return sprintf(page, "persistent,config=%lld",
+   (unsigned long long)attr->id);
+}
+
+static int pers_event_sysfs_register(struct pers_event *event)
+{
+   struct device_attribute *attr = &event->sysfs.attr;
+   int idx;
+
+   *attr = (struct device_attribute)__ATTR(, 0444, pers_event_sysfs_show,
+   NULL);
+   attr->attr.name = event->name;
+
+   /* add sysfs attr to events: */
+   for (idx = 0; idx < MAX_EVENTS; idx++) {
+   if (!cmpxchg(persistent_events_attr + idx, NULL, &attr->attr))
+   break;
+   }
+
+   if (idx >= MAX_EVENTS)
+   return -ENOSPC;
+   if (!idx)
+   EVENTS_GROUP = &persistent_events_group;
+   if (!persistent_pmu.dev)
+   return 0;   /* sysfs not yet initialized */
+   if (idx)
+   return sysfs_update_group(&persistent_pmu.dev->kobj,
+   EVENTS_GROUP);
+   return sysfs_create_group(&persistent_pmu.dev->kobj, EVENTS_GROUP);
+}
 
 static int persistent_pmu_init(struct perf_event *event)
 {
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 12/16] perf, persistent: Avoid adding identical events

2013-05-31 Thread Robert Richter

From: Robert Richter 

Check if an event already exists before adding it.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index ebef089..1e93b51 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -40,6 +40,12 @@ add_persistent_event_on_cpu(unsigned int cpu, struct 
perf_event_attr *attr,
 
mutex_lock(&per_cpu(pers_events_lock, cpu));
 
+   desc = get_persistent_event(cpu, attr);
+   if (desc) {
+   event = ERR_PTR(-EEXIST);
+   goto out;
+   }
+
desc = kzalloc(sizeof(*desc), GFP_KERNEL);
if (!desc) {
event = ERR_PTR(-ENOMEM);
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 13/16] perf, persistent: Implementing a persistent pmu

2013-05-31 Thread Robert Richter

From: Robert Richter 

We want to use the kernel's pmu design to later expose persistent
events via sysfs to userland. Initially implement a persistent pmu.

The format syntax is introduced allowing to set bits anywhere in
struct perf_event_attr. This is used in this case to set the
persistent flag (attr5:23). The syntax is attr where num is the
index of the u64 array in struct perf_event_attr. Otherwise syntax is
same as for config.

Patches that implement this functionality for perf tools are sent in a
separate patchset.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 1e93b51..95d6943 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -172,10 +172,45 @@ out:
return event_fd;
 }
 
+PMU_FORMAT_ATTR(persistent, "attr5:23");
+
+static struct attribute *persistent_format_attrs[] = {
+   &format_attr_persistent.attr,
+   NULL,
+};
+
+static struct attribute_group persistent_format_group = {
+   .name = "format",
+   .attrs = persistent_format_attrs,
+};
+
+static const struct attribute_group *persistent_attr_groups[] = {
+   &persistent_format_group,
+   NULL,
+};
+
+static struct pmu persistent_pmu;
+
+static int persistent_pmu_init(struct perf_event *event)
+{
+   if (persistent_pmu.type != event->attr.type)
+   return -ENOENT;
+
+   /* Not a persistent event. */
+   return -EFAULT;
+}
+
+static struct pmu persistent_pmu = {
+   .event_init = persistent_pmu_init,
+   .attr_groups= persistent_attr_groups,
+};
+
 void __init persistent_events_init(void)
 {
int cpu;
 
+   perf_pmu_register(&persistent_pmu, "persistent", -1);
+
for_each_possible_cpu(cpu) {
INIT_LIST_HEAD(&per_cpu(pers_events, cpu));
mutex_init(&per_cpu(pers_events_lock, cpu));
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 08/16] perf, persistent: Remove rb_put()

2013-05-31 Thread Robert Richter

From: Robert Richter 

rb_put() is called already in perf_event_release_kernel(), so no need
to do the same in del_persistent_event(). We also don't need it in
add_persistent_event_on_cpu() after a rework. Since there are no users
of rb_put() anymore, we can make it private again to only
events/core.c.

Signed-off-by: Robert Richter 
---
 kernel/events/core.c   |  4 +++-
 kernel/events/internal.h   |  1 -
 kernel/events/persistent.c | 29 +++--
 3 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a9b6470..8f85caa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3021,6 +3021,8 @@ static void free_event_rcu(struct rcu_head *head)
kfree(event);
 }
 
+static void rb_put(struct ring_buffer *rb);
+
 static void free_event(struct perf_event *event)
 {
irq_work_sync(&event->pending);
@@ -3680,7 +3682,7 @@ static struct ring_buffer *rb_get(struct perf_event 
*event)
return rb;
 }
 
-void rb_put(struct ring_buffer *rb)
+static void rb_put(struct ring_buffer *rb)
 {
struct perf_event *event, *n;
unsigned long flags;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 3b481be..3647289 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -38,7 +38,6 @@ struct ring_buffer {
 extern void rb_free(struct ring_buffer *rb);
 extern struct ring_buffer *
 rb_alloc(int nr_pages, long watermark, int cpu, int flags);
-extern void rb_put(struct ring_buffer *rb);
 extern void perf_event_wakeup(struct perf_event *event);
 
 extern void
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 22297e5..ff1ce3b 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -20,22 +20,22 @@ static struct perf_event *
 add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
unsigned nr_pages)
 {
-   struct perf_event *event = ERR_PTR(-ENOMEM);
+   struct perf_event *event;
struct pers_event_desc *desc;
struct ring_buffer *buf;
 
desc = kzalloc(sizeof(*desc), GFP_KERNEL);
if (!desc)
-   goto out;
-
-   buf = rb_alloc(nr_pages, 0, cpu, 0);
-   if (!buf)
-   goto err_rb;
+   return ERR_PTR(-ENOMEM);
 
event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
if (IS_ERR(event))
goto err_event;
 
+   buf = rb_alloc(nr_pages, 0, cpu, 0);
+   if (!buf)
+   goto err_rb;
+
rcu_assign_pointer(event->rb, buf);
 
desc->event = event;
@@ -47,14 +47,12 @@ add_persistent_event_on_cpu(unsigned int cpu, struct 
perf_event_attr *attr,
perf_event_enable(event);
 
goto out;
-
- err_event:
-   rb_put(buf);
-
- err_rb:
+err_rb:
+   perf_event_release_kernel(event);
+   event = ERR_PTR(-ENOMEM);
+err_event:
kfree(desc);
-
- out:
+out:
return event;
 }
 
@@ -76,11 +74,6 @@ static void del_persistent_event(int cpu, struct 
perf_event_attr *attr)
list_del(&desc->plist);
 
perf_event_disable(event);
-   if (event->rb) {
-   rb_put(event->rb);
-   rcu_assign_pointer(event->rb, NULL);
-   }
-
perf_event_release_kernel(event);
put_unused_fd(desc->fd);
kfree(desc);
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 14/16] perf, persistent: Name each persistent event

2013-05-31 Thread Robert Richter

From: Robert Richter 

For later adding persistent events to sysfs we need a name for each
event. Adding a name to each persistent event.

Signed-off-by: Robert Richter 
---
 arch/x86/kernel/cpu/mcheck/mce.c |  3 ++-
 include/linux/perf_event.h   |  4 ++--
 kernel/events/persistent.c   | 30 +-
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d421937..833eb7a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1991,7 +1991,8 @@ int __init mcheck_init(void)
 
 int __init mcheck_init_tp(void)
 {
-   if (perf_add_persistent_event_by_id(event_mce_record.event.type)) {
+   if (perf_add_persistent_event_by_id(event_mce_record.name,
+   event_mce_record.event.type)) {
pr_err("Error adding MCE persistent event.\n");
return -EINVAL;
}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index dc72c93..06b4357b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -751,7 +751,7 @@ extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
 extern int perf_add_persistent_event(struct perf_event_attr *, unsigned);
-extern int perf_add_persistent_event_by_id(int id);
+extern int perf_add_persistent_event_by_id(char *name, int id);
 #else /* !CONFIG_PERF_EVENTS */
 static inline void
 perf_event_task_sched_in(struct task_struct *prev,
@@ -794,7 +794,7 @@ static inline int __perf_event_disable(void *info)  
{ return -1; }
 static inline void perf_event_task_tick(void)  { }
 static inline int perf_add_persistent_event(struct perf_event_attr *attr,
unsigned nr_pages)  { 
return -EINVAL; }
-static inline int perf_add_persistent_event_by_id(int id)  { 
return -EINVAL; }
+static inline int perf_add_persistent_event_by_id(char *name, int id)  { 
return -EINVAL; }
 #endif /* !CONFIG_PERF_EVENTS */
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 95d6943..d5093a3 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -14,6 +14,11 @@ struct pers_event_desc {
int fd;
 };
 
+struct pers_event {
+   char*name;
+   struct perf_event_attr  attr;
+};
+
 static DEFINE_PER_CPU(struct list_head, pers_events);
 static DEFINE_PER_CPU(struct mutex, pers_events_lock);
 
@@ -132,14 +137,20 @@ unwind:
return PTR_ERR(event);
 }
 
-int perf_add_persistent_event_by_id(int id)
+int perf_add_persistent_event_by_id(char* name, int id)
 {
-   struct perf_event_attr *attr;
+   struct pers_event   *event;
+   struct perf_event_attr  *attr;
+   int ret = -ENOMEM;
 
-   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
-   if (!attr)
+   event = kzalloc(sizeof(*event), GFP_KERNEL);
+   if (!event)
return -ENOMEM;
+   event->name = kstrdup(name, GFP_KERNEL);
+   if (!event->name)
+   goto fail;
 
+   attr = &event->attr;
attr->sample_period = 1;
attr->wakeup_events = 1;
attr->sample_type   = PERF_SAMPLE_RAW;
@@ -148,7 +159,16 @@ int perf_add_persistent_event_by_id(int id)
attr->type  = PERF_TYPE_TRACEPOINT;
attr->size  = sizeof(*attr);
 
-   return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
+   ret = perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
+   if (ret)
+   goto fail;
+
+   return 0;
+fail:
+   kfree(event->name);
+   kfree(event);
+
+   return ret;
 }
 
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 11/16] perf, persistent: Protect event lists with mutex

2013-05-31 Thread Robert Richter

From: Robert Richter 

Protect esp. access to struct pers_event_desc *desc. There are race
conditions possible where the descriptor could be removed from list
while it is used.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 4ecbdab..ebef089 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -15,6 +15,7 @@ struct pers_event_desc {
 };
 
 static DEFINE_PER_CPU(struct list_head, pers_events);
+static DEFINE_PER_CPU(struct mutex, pers_events_lock);
 
 static struct pers_event_desc
 *get_persistent_event(int cpu, struct perf_event_attr *attr)
@@ -37,9 +38,13 @@ add_persistent_event_on_cpu(unsigned int cpu, struct 
perf_event_attr *attr,
struct pers_event_desc *desc;
struct ring_buffer *buf;
 
+   mutex_lock(&per_cpu(pers_events_lock, cpu));
+
desc = kzalloc(sizeof(*desc), GFP_KERNEL);
-   if (!desc)
-   return ERR_PTR(-ENOMEM);
+   if (!desc) {
+   event = ERR_PTR(-ENOMEM);
+   goto out;
+   }
 
event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
if (IS_ERR(event))
@@ -66,6 +71,7 @@ err_rb:
 err_event:
kfree(desc);
 out:
+   mutex_unlock(&per_cpu(pers_events_lock, cpu));
return event;
 }
 
@@ -74,9 +80,11 @@ static void del_persistent_event(int cpu, struct 
perf_event_attr *attr)
struct pers_event_desc *desc;
struct perf_event *event;
 
+   mutex_lock(&per_cpu(pers_events_lock, cpu));
+
desc = get_persistent_event(cpu, attr);
if (!desc)
-   return;
+   goto out;
event = desc->event;
 
list_del(&desc->plist);
@@ -85,6 +93,8 @@ static void del_persistent_event(int cpu, struct 
perf_event_attr *attr)
perf_event_release_kernel(event);
put_unused_fd(desc->fd);
kfree(desc);
+out:
+   mutex_unlock(&per_cpu(pers_events_lock, cpu));
 }
 
 /*
@@ -137,25 +147,31 @@ int perf_add_persistent_event_by_id(int id)
 
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 {
-   struct pers_event_desc *desc = get_persistent_event(cpu, attr);
-   int event_fd;
+   struct pers_event_desc *desc;
+   int event_fd = -ENODEV;
 
+   mutex_lock(&per_cpu(pers_events_lock, cpu));
+
+   desc = get_persistent_event(cpu, attr);
if (!desc)
-   return -ENODEV;
+   goto out;
 
event_fd = anon_inode_getfd("[pers_event]", &perf_fops,
desc->event, O_RDONLY);
if (event_fd >= 0)
desc->fd = event_fd;
+out:
+   mutex_unlock(&per_cpu(pers_events_lock, cpu));
 
return event_fd;
 }
 
-
 void __init persistent_events_init(void)
 {
-   int i;
+   int cpu;
 
-   for_each_possible_cpu(i)
-   INIT_LIST_HEAD(&per_cpu(pers_events, i));
+   for_each_possible_cpu(cpu) {
+   INIT_LIST_HEAD(&per_cpu(pers_events, cpu));
+   mutex_init(&per_cpu(pers_events_lock, cpu));
+   }
 }
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 03/16] perf, persistent: Setting default buffer size to 512k as in perf tools

2013-05-31 Thread Robert Richter

From: Robert Richter 

The default buffer size used to setup event buffers with perf tools is
512k. Using the same buffer size for persistent events. This also
avoids failed mmap calls due to different buffer sizes.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 9075164..1e6c03a 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -5,6 +5,9 @@
 
 #include "internal.h"
 
+/* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */
+#define CPU_BUFFER_NR_PAGES((512 * 1024) / PAGE_SIZE)
+
 static DEFINE_PER_CPU(struct list_head, pers_events);
 
 static struct perf_event *
@@ -151,7 +154,7 @@ int perf_add_persistent_event_by_id(int id)
attr->type  = PERF_TYPE_TRACEPOINT;
attr->size  = sizeof(*attr);
 
-   return perf_add_persistent_event(attr, 4);
+   return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
 }
 
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 10/16] perf, persistent: Reworking perf_get_persistent_event_fd()

2013-05-31 Thread Robert Richter

From: Robert Richter 

There is already a function anon_inode_getfd() that does already all
the work. Reworking and simplifying code.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 35 +++
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 7d1aff3..4ecbdab 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -87,33 +87,6 @@ static void del_persistent_event(int cpu, struct 
perf_event_attr *attr)
kfree(desc);
 }
 
-static int __alloc_persistent_event_fd(struct pers_event_desc *desc)
-{
-   struct file *event_file = NULL;
-   int event_fd = -1;
-
-   event_fd = get_unused_fd();
-   if (event_fd < 0)
-   goto out;
-
-   event_file = anon_inode_getfile("[pers_event]", &perf_fops,
-   desc->event, O_RDONLY);
-   if (IS_ERR(event_file))
-   goto err_event_file;
-
-   desc->fd = event_fd;
-   fd_install(event_fd, event_file);
-
-   goto out;
-
-
- err_event_file:
-   put_unused_fd(event_fd);
-
- out:
-   return event_fd;
-}
-
 /*
  * Create and enable the persistent version of the perf event described by
  * @attr.
@@ -165,11 +138,17 @@ int perf_add_persistent_event_by_id(int id)
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 {
struct pers_event_desc *desc = get_persistent_event(cpu, attr);
+   int event_fd;
 
if (!desc)
return -ENODEV;
 
-   return __alloc_persistent_event_fd(desc);
+   event_fd = anon_inode_getfd("[pers_event]", &perf_fops,
+   desc->event, O_RDONLY);
+   if (event_fd >= 0)
+   desc->fd = event_fd;
+
+   return event_fd;
 }
 
 
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 04/16] perf, persistent: Print error code on failure when adding events

2013-05-31 Thread Robert Richter

From: Robert Richter 

Improve error reporting.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 1e6c03a..6612eb77 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -123,15 +123,15 @@ int perf_add_persistent_event(struct perf_event_attr 
*attr, unsigned nr_pages)
 
for_each_possible_cpu(i) {
event = add_persistent_event_on_cpu(i, attr, nr_pages);
-   if (IS_ERR(event)) {
-   pr_err("%s: Error adding persistent event on cpu %d\n",
-   __func__, i);
+   if (IS_ERR(event))
goto unwind;
-   }
}
return 0;
 
 unwind:
+   pr_err("%s: Error adding persistent event on cpu %d: %ld\n",
+   __func__, i, PTR_ERR(event));
+
while (--i >= 0)
del_persistent_event(i, attr);
 
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 00/16] perf, persistent: Kernel updates for perf tool integration

2013-05-31 Thread Robert Richter

From: Robert Richter 

This patch set implements out of the box support of perf tools for
persistent events. For this the kernel must provide necessary
information about existing persistent events via sysfs to userland.
Persistent events are provided by the kernel with readonly event
buffers. To allow the independent usage of the event buffers by any
process without limiting other processes, multiple users of a single
event must be handled too.

The basic concept is to use a pmu as an event container for persistent
events. The pmu registers events in sysfs and provides format and
event information for the userland. The persistent event framework
requires to add events to the pmu dynamically.

With the information in sysfs userland knows about how to setup the
perf_event attribute of a persistent event. Since a persistent event
always has the persistent flag set, a way is needed to express this in
sysfs. A new syntax is used for this. With 'attr:' any bit
in the attribute structure may be set in a similar way as using
'config', but  is an index that points to the u64 value to
change within the attribute.

For persistent events the persistent flag (bit 23 of flag field in
struct perf_event_attr) needs to be set which is expressed in sysfs
with "attr5:23". E.g. the mce_record event is described in sysfs as
follows:

 
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23

Note that perf tools need to support the 'attr' syntax that is
added in a separate patch set. With it we are able to run perf tool
commands to read persistent events, e.g.:

 # perf record -e persistent/mce_record/ sleep 10
 # perf top -e persistent/mce_record/

In general the new syntax is flexible to describe with sysfs any event
to be setup by perf tools.

First patches contain also fixes and reworks made after reviewing
code. Could be applied separately.

Patches base on Boris' patches which I have rebased to latest
tip/perf/core. All patches can be found here:

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent

-Robert



Robert Richter (16):
  perf, persistent: Fix build error for no-tracepoints configs
  perf, persistent: Fix attr size
  perf, persistent: Setting default buffer size to 512k as in perf tools
  perf, persistent: Print error code on failure when adding events
  perf, persistent: Return resonable error code
  perf, persistent: Return -EACCES if mapped buffers must be readonly
  perf, persistent: Rework struct pers_event_desc
  perf, persistent: Remove rb_put()
  perf, persistent: Introduce get_persistent_event()
  perf, persistent: Reworking perf_get_persistent_event_fd()
  perf, persistent: Protect event lists with mutex
  perf, persistent: Avoid adding identical events
  perf, persistent: Implementing a persistent pmu
  perf, persistent: Name each persistent event
  perf, persistent: Exposing persistent events using sysfs
  perf, persistent: Allow multiple users for an event

 arch/x86/kernel/cpu/mcheck/mce.c |   7 +-
 include/linux/perf_event.h   |  11 +-
 kernel/events/core.c |   6 +-
 kernel/events/internal.h |   1 -
 kernel/events/persistent.c   | 292 +--
 5 files changed, 229 insertions(+), 88 deletions(-)

-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 09/16] perf, persistent: Introduce get_persistent_event()

2013-05-31 Thread Robert Richter

From: Robert Richter 

Reducing duplicate code by introducing function get_persistent_event()
to get the descriptor of an persistent event.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index ff1ce3b..7d1aff3 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -16,6 +16,19 @@ struct pers_event_desc {
 
 static DEFINE_PER_CPU(struct list_head, pers_events);
 
+static struct pers_event_desc
+*get_persistent_event(int cpu, struct perf_event_attr *attr)
+{
+   struct pers_event_desc *desc;
+
+   list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist) {
+   if (desc->event->attr.config == attr->config)
+   return desc;
+   }
+
+   return NULL;
+}
+
 static struct perf_event *
 add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
unsigned nr_pages)
@@ -58,18 +71,13 @@ out:
 
 static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 {
-   struct pers_event_desc *desc, *tmp;
-   struct perf_event *event = NULL;
-
-   list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
-   if (desc->event->attr.config == attr->config) {
-   event = desc->event;
-   break;
-   }
-   }
+   struct pers_event_desc *desc;
+   struct perf_event *event;
 
-   if (!event)
+   desc = get_persistent_event(cpu, attr);
+   if (!desc)
return;
+   event = desc->event;
 
list_del(&desc->plist);
 
@@ -156,13 +164,12 @@ int perf_add_persistent_event_by_id(int id)
 
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 {
-   struct pers_event_desc *desc;
+   struct pers_event_desc *desc = get_persistent_event(cpu, attr);
 
-   list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
-   if (desc->event->attr.config == attr->config)
-   return __alloc_persistent_event_fd(desc);
+   if (!desc)
+   return -ENODEV;
 
-   return -ENODEV;
+   return __alloc_persistent_event_fd(desc);
 }
 
 
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 02/16] perf, persistent: Fix attr size

2013-05-31 Thread Robert Richter

From: Robert Richter 

Fixing wrongly determined attribute size.

Signed-off-by: Robert Richter 
---
 kernel/events/persistent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 5335644..9075164 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -149,7 +149,7 @@ int perf_add_persistent_event_by_id(int id)
attr->persistent= 1;
attr->config= id;
attr->type  = PERF_TYPE_TRACEPOINT;
-   attr->size  = sizeof(attr);
+   attr->size  = sizeof(*attr);
 
return perf_add_persistent_event(attr, 4);
 }
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/4] perf tools: Persistent events, changes for perf tool integration

2013-05-31 Thread Robert Richter

From: Robert Richter 

This patch set contains userland changes necessary for out-of-the-box
support of persistent events. These patches are follow on patches of
the kernel patches I sent out today:

 [PATCH 00/16] perf, persistent: Kernel updates for perf tool integration

Persistent events are always enabled kernel events. Buffers are mapped
readonly and multiple users are allowed. The persistent event flag of
the event attribute must be set to specify such an event.

The following changes to perf tools are necessary to support
persistent events. A way is needed to specify sysfs entries to set
event flags. For this a new syntax 'attr' was added to the event
parser, see patch #3. We also need to change perf tools to mmap
persistent event buffers readonly.

All patches can be found here:

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent

-Robert



Robert Richter (4):
  perf tools: Rename flex conditions to avoid name conflicts
  perf tools: Modify event parser to update event attribute by index
  perf tools: Add attr syntax to event parser
  perf tools: Retry mapping buffers readonly on EACCES

 tools/perf/builtin-record.c |  7 -
 tools/perf/builtin-top.c|  8 --
 tools/perf/perf.h   |  1 +
 tools/perf/tests/parse-events.c | 12 ++---
 tools/perf/util/parse-events.c  | 59 +++--
 tools/perf/util/parse-events.h  | 12 -
 tools/perf/util/parse-events.l  | 56 +++---
 tools/perf/util/parse-events.y  | 24 ++---
 tools/perf/util/pmu.c   | 32 +-
 tools/perf/util/pmu.h   |  9 ++-
 tools/perf/util/pmu.l   |  1 +
 tools/perf/util/pmu.y   | 18 ++---
 12 files changed, 127 insertions(+), 112 deletions(-)

-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] perf tools: Rename flex conditions to avoid name conflicts

2013-05-31 Thread Robert Richter

From: Robert Richter 

These define's may cause conflicts with other definitions:

 #define INITIAL 0
 #define mem 1
 #define config 2
 #define event 3

Prefix them with cond_* to avoid this.

Signed-off-by: Robert Richter 
---
 tools/perf/util/parse-events.l | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index e9d1134..03ead35 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -69,9 +69,9 @@ static int term(yyscan_t scanner, int type)
 
 %}
 
-%x mem
-%s config
-%x event
+%x cond_mem
+%s cond_config
+%x cond_event
 
 group  [^,{}/]*[{][^}]*[}][^,{}/]*
 event_pmu  [^,{}/]+[/][^/]*[/][^,{}/]*
@@ -94,9 +94,9 @@ modifier_bp   [rwx]{1,3}
start_token = parse_events_get_extra(yyscanner);
 
if (start_token == PE_START_TERMS)
-   BEGIN(config);
+   BEGIN(cond_config);
else if (start_token == PE_START_EVENTS)
-   BEGIN(event);
+   BEGIN(cond_event);
 
if (start_token) {
parse_events_set_extra(NULL, yyscanner);
@@ -105,7 +105,7 @@ modifier_bp [rwx]{1,3}
  }
 %}
 
-{
+{
 
 {group}{
BEGIN(INITIAL); yyless(0);
@@ -160,7 +160,7 @@ speculative-read|speculative-load   |
 refs|Reference|ops|access  |
 misses|miss{ return str(yyscanner, 
PE_NAME_CACHE_OP_RESULT); }
 
-{
+{
 config { return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_CONFIG); }
 config1{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_CONFIG1); }
 config2{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_CONFIG2); }
@@ -172,23 +172,23 @@ branch_type   { return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE
 {name_minus}   { return str(yyscanner, PE_NAME); }
 }
 
-mem:   { BEGIN(mem); return PE_PREFIX_MEM; }
+mem:   { BEGIN(cond_mem); return PE_PREFIX_MEM; }
 r{num_raw_hex} { return raw(yyscanner); }
 {num_dec}  { return value(yyscanner, 10); }
 {num_hex}  { return value(yyscanner, 16); }
 
 {modifier_event}   { return str(yyscanner, PE_MODIFIER_EVENT); }
 {name} { return str(yyscanner, PE_NAME); }
-"/"{ BEGIN(config); return '/'; }
+"/"{ BEGIN(cond_config); return '/'; }
 -  { return '-'; }
-,  { BEGIN(event); return ','; }
+,  { BEGIN(cond_event); return ','; }
 :  { return ':'; }
-"{"{ BEGIN(event); return '{'; }
+"{"{ BEGIN(cond_event); return '{'; }
 "}"{ return '}'; }
 =  { return '='; }
 \n { }
 
-{
+{
 {modifier_bp}  { return str(yyscanner, PE_MODIFIER_BP); }
 :  { return ':'; }
 {num_dec}  { return value(yyscanner, 10); }
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/4] perf tools: Modify event parser to update event attribute by index

2013-05-31 Thread Robert Richter

From: Robert Richter 

In a later patch we want to introduce a syntax that allows updating
attribute fields by an index pointing to a certain u64 entry of struct
perf_event_attr. We need this to expose any event via sysfs that is
available in the system where especially flag fields need to be set.

Reworking the event parser to use an attribute index. This is done by
introducing type PARSE_EVENTS__TERM_TYPE_ATTR for all numeric values
to be added to the event attribute fields. We use an index to specify
the corresponding u64 attr value.

Signed-off-by: Robert Richter 
---
 tools/perf/tests/parse-events.c | 12 ++---
 tools/perf/util/parse-events.c  | 59 +++--
 tools/perf/util/parse-events.h  | 12 -
 tools/perf/util/parse-events.l  | 18 ++---
 tools/perf/util/parse-events.y  | 24 ++---
 5 files changed, 65 insertions(+), 60 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 0275bab..916a41a 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -471,7 +471,9 @@ static int test__checkterms_simple(struct list_head *terms)
/* config=10 */
term = list_entry(terms->next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
-   term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG);
+   term->type_term == PARSE_EVENTS__TERM_TYPE_ATTR);
+   TEST_ASSERT_VAL("wrong type idx",
+   term->idx == 1);
TEST_ASSERT_VAL("wrong type val",
term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
TEST_ASSERT_VAL("wrong val", term->val.num == 10);
@@ -480,7 +482,9 @@ static int test__checkterms_simple(struct list_head *terms)
/* config1 */
term = list_entry(term->list.next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
-   term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG1);
+   term->type_term == PARSE_EVENTS__TERM_TYPE_ATTR);
+   TEST_ASSERT_VAL("wrong type idx",
+   term->idx == 7);
TEST_ASSERT_VAL("wrong type val",
term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
TEST_ASSERT_VAL("wrong val", term->val.num == 1);
@@ -489,7 +493,9 @@ static int test__checkterms_simple(struct list_head *terms)
/* config2=3 */
term = list_entry(term->list.next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
-   term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG2);
+   term->type_term == PARSE_EVENTS__TERM_TYPE_ATTR);
+   TEST_ASSERT_VAL("wrong type idx",
+   term->idx == 8);
TEST_ASSERT_VAL("wrong type val",
term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
TEST_ASSERT_VAL("wrong val", term->val.num == 3);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6c8bb0f..3ba1450 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -533,6 +533,19 @@ int parse_events_add_breakpoint(struct list_head **list, 
int *idx,
return add_event(list, idx, &attr, NULL);
 }
 
+int parse_events__set_attr(struct perf_event_attr *__attr, u64 idx, u64 val)
+{
+   __u64 *attr = (__u64 *)__attr;
+
+   if (idx * sizeof(*attr) >= sizeof(*__attr))
+   return -EINVAL;
+
+   attr += idx;
+   *attr |= val;
+
+   return 0;
+}
+
 static int config_term(struct perf_event_attr *attr,
   struct parse_events_term *term)
 {
@@ -543,36 +556,17 @@ do {  
\
 } while (0)
 
switch (term->type_term) {
-   case PARSE_EVENTS__TERM_TYPE_CONFIG:
+   case PARSE_EVENTS__TERM_TYPE_ATTR:
CHECK_TYPE_VAL(NUM);
-   attr->config = term->val.num;
-   break;
-   case PARSE_EVENTS__TERM_TYPE_CONFIG1:
-   CHECK_TYPE_VAL(NUM);
-   attr->config1 = term->val.num;
-   break;
-   case PARSE_EVENTS__TERM_TYPE_CONFIG2:
-   CHECK_TYPE_VAL(NUM);
-   attr->config2 = term->val.num;
-   break;
-   case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
-   CHECK_TYPE_VAL(NUM);
-   attr->sample_period = term->val.num;
-   break;
-   case PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE:
-   /*
-* TODO uncomment when the field is available
-* attr->branch_sample_type = term->val.num;
-*/
-   break;
+   return parse_events__set_

[PATCH 3/4] perf tools: Add attr syntax to event parser

2013-05-31 Thread Robert Richter

From: Robert Richter 

The event parser is limited to update only a subset of all fields in
struct perf_event_attr (config*, period, branch_type). We are not able
to set other attr fields, esp. flags.

Introducing a new syntax to set any field of the event attribute by
using an index to the u64 value to be used within struct
perf_event_attr. The new syntax attr is similar to config,
but  specifies the index to be used. E.g. attr5:23 sets bit 23 of
the flag field of attr.

The persistent event implementation is a use case of the above. In
this case sysfs provides:

 
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23

Persistent events are exposed via sysfs and need to set the persistent
flag (bit 23 of the flag field). With the sysfs entry above we are
able to define the persistent flag format and then may setup a
mce_record event with that flag set.

In general we are now flexible to describe with sysfs any event to be
setup by perf tools.

Signed-off-by: Robert Richter 
---
 tools/perf/util/parse-events.l | 14 ++
 tools/perf/util/pmu.c  | 32 ++--
 tools/perf/util/pmu.h  |  9 ++---
 tools/perf/util/pmu.l  |  1 +
 tools/perf/util/pmu.y  | 18 ++
 5 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index f9397cc..b71356e 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -67,6 +67,19 @@ static int attr(yyscan_t scanner, u64 idx)
return PE_TERM_ATTR;
 }
 
+static int attr_parse(yyscan_t scanner)
+{
+   YYSTYPE *yylval = parse_events_get_lval(scanner);
+   char *text = parse_events_get_text(scanner);
+
+   errno = 0;
+   yylval->num = strtoull(text + 4, NULL, 10);
+   if (errno)
+   return PE_ERROR;
+
+   return PE_TERM_ATTR;
+}
+
 %}
 
 %x cond_mem
@@ -161,6 +174,7 @@ refs|Reference|ops|access   |
 misses|miss{ return str(yyscanner, 
PE_NAME_CACHE_OP_RESULT); }
 
 {
+attr[0-9]* { return attr_parse(yyscanner); }
 config { return attr(yyscanner, PERF_ATTR_IDX(config)); }
 config1{ return attr(yyscanner, 
PERF_ATTR_IDX(config1)); }
 config2{ return attr(yyscanner, 
PERF_ATTR_IDX(config2)); }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 4c6f9c4..b2eb9fe 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -17,8 +17,8 @@ struct perf_pmu_alias {
 };
 
 struct perf_pmu_format {
-   char *name;
-   int value;
+   char*name;
+   u64 idx;
DECLARE_BITMAP(bits, PERF_PMU_FORMAT_BITS);
struct list_head list;
 };
@@ -418,7 +418,6 @@ static int pmu_config_term(struct list_head *formats,
   struct parse_events_term *term)
 {
struct perf_pmu_format *format;
-   __u64 *vp;
 
/*
 * Support only for hardcoded and numnerial terms.
@@ -435,27 +434,8 @@ static int pmu_config_term(struct list_head *formats,
if (!format)
return -EINVAL;
 
-   switch (format->value) {
-   case PERF_PMU_FORMAT_VALUE_CONFIG:
-   vp = &attr->config;
-   break;
-   case PERF_PMU_FORMAT_VALUE_CONFIG1:
-   vp = &attr->config1;
-   break;
-   case PERF_PMU_FORMAT_VALUE_CONFIG2:
-   vp = &attr->config2;
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   /*
-* XXX If we ever decide to go with string values for
-* non-hardcoded terms, here's the place to translate
-* them into value.
-*/
-   *vp |= pmu_format_value(format->bits, term->val.num);
-   return 0;
+   return parse_events__set_attr(attr, format->idx,
+   pmu_format_value(format->bits, term->val.num));
 }
 
 int perf_pmu__config_terms(struct list_head *formats,
@@ -537,7 +517,7 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct 
list_head *head_terms)
 }
 
 int perf_pmu__new_format(struct list_head *list, char *name,
-int config, unsigned long *bits)
+   __u64 idx, unsigned long *bits)
 {
struct perf_pmu_format *format;
 
@@ -546,7 +526,7 @@ int perf_pmu__new_format(struct list_head *list, char *name,
return -ENOMEM;
 
format->name = strdup(name);
-   format->value = config;
+   format->idx = idx;
memcpy(format->bits, bits, sizeof(format->bits));
 
list_add_tail(&format->list, list);
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 32fe55b..9c4fac1 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -3,12 +3,7 @@

[PATCH 4/4] perf tools: Retry mapping buffers readonly on EACCES

2013-05-31 Thread Robert Richter

From: Robert Richter 

Persistent event buffers may only be mmapped readonly. Thus, retry
mapping it readonly if mmap returns EACCES after trying to mmap
writable.

Signed-off-by: Robert Richter 
---
 tools/perf/builtin-record.c | 7 ++-
 tools/perf/builtin-top.c| 8 ++--
 tools/perf/perf.h   | 1 +
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cdf58ec..916776d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -255,7 +255,12 @@ try_again:
goto out;
}
 
-   if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
+try_again2:
+   if (perf_evlist__mmap(evlist, opts->mmap_pages, opts->mmap_ro) < 0) {
+   if (!opts->mmap_ro && errno == EACCES) {
+   opts->mmap_ro = true;
+   goto try_again2;
+   }
if (errno == EPERM) {
pr_err("Permission error mapping pages.\n"
   "Consider increasing "
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 67bdb9f..2a5757d 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -899,8 +899,12 @@ try_again:
goto out_err;
}
}
-
-   if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
+try_again2:
+   if (perf_evlist__mmap(evlist, opts->mmap_pages, opts->mmap_ro) < 0) {
+   if (!opts->mmap_ro && errno == EACCES) {
+   opts->mmap_ro = true;
+   goto try_again2;
+   }
ui__error("Failed to mmap with %d (%s)\n",
errno, strerror(errno));
goto out_err;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 32bd102..41acea3 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -221,6 +221,7 @@ struct perf_record_opts {
bool sample_weight;
bool sample_time;
bool period;
+   bool mmap_ro;
unsigned int freq;
unsigned int mmap_pages;
unsigned int user_freq;
-- 
1.8.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/16] perf, persistent: Kernel updates for perf tool integration

2013-05-31 Thread Robert Richter

On 31.05.13 11:15:40, Borislav Petkov wrote:
> On Fri, May 31, 2013 at 10:47:20AM +0200, Robert Richter wrote:
> > Patches base on Boris' patches which I have rebased to latest
> > tip/perf/core. All patches can be found here:
> > 
> >  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent
> 
> Right, we should merge the fixes/changes to the persistent patches with
> the original ones since they are still WIP anyway.
> 
> Feel free to rework them as you see fit, or, I could do that when I get
> back.

Hmm, since the changes in the onliner patches are either hard effort
to find in reviewing/testing or more or less related to the new
implementation, I better prefer to keep authorship as well to document
the code development (don't blame me about patch count ;)). We better
should add a branch (preferable at topic branch in tip?) as soon as
possible for this.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] perf tools: Persistent events, changes for perf tool integration

2013-05-31 Thread Robert Richter

> > On Fri, May 31, 2013 at 02:07:20PM +0200, Ingo Molnar wrote:
> > > For that I'd like to create a persistent event that just keeps
> > > running, and to which I can occasionally attach to read-only to see
> > > what's going on and maybe attach to it read-write to drain the trace
> > > entries. I.e. basically a global trace buffer. How do I achieve that
> > > with this new tooling?

Actually every system-wide event that is opened with readonly buffers
could be shared between processes, which would be the same as
connecting to a persistent event. I didn't want to implement this in a
first step, but I think this could be implemented without too much
effort.

The main problem is that at least one file descriptor needs to be open
at any time. Otherwise the event would be removed. So there is no
concept (except enabling an in-kernel event) yet to keep the event
running without any process having an event file discriptor open. Even
harder will it be to release such an event, there is no distincition
beetween detaching the event from the process or permanently removing
the event.

So the easiest would be to just open a system-wide event and then put
the process into sleep until we want to remove the event. In between
other processes (maybe duplicate on fork?) could attach to the same
buffer. Only if no process is attached to the event enymore the event
will be removed. Would that fit your purpose?

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/16] perf, persistent: Kernel updates for perf tool integration

2013-06-01 Thread Robert Richter

On 31.05.13 14:21:36, Borislav Petkov wrote:
> On Fri, May 31, 2013 at 11:32:10AM +0200, Robert Richter wrote:
> > Hmm, since the changes in the onliner patches are either hard effort
> > to find in reviewing/testing or more or less related to the new
> > implementation, I better prefer to keep authorship as well to document
> > the code development (don't blame me about patch count ;)). We better
> > should add a branch (preferable at topic branch in tip?) as soon as
> > possible for this.
> 
> Yes, you should definitely keep authorship - simply state this in the
> commit message, add your SOB, etc. However, I don't want to have messy
> history for patches which haven't been reviewed yet. This complicates
> review needlessly and makes absolutely no sense for later when you stare
> at the history.

No, it's not about authorship in the sense of copyright, it's just
about keeping track of changes. My changes weren't related to a patch
review. In that case it would be totally fine to instantly merge the
changes.

Instead I reviewed the code while developing it with certain goals in
mind. Most changes I found necessary while building and running a
modified version during development. That never could be found in a
patch review. These changes are what we actually want to see in git
history.

And your argument that changes should be merged to reduce review
effort would actually mean to drop all the code you introduce which is
later removed in my patches (see below for the diff stats).

I also don't think we need to re-review your patches. Most of it has
been reviewed and should also not change much to avoid rebase
conflicts. In my point of view they are fine to be applied to a perf
topic branch. Ingo, would this be ok? There is no messy history if we
later just apply my patches on top. So no, I don't agree with you here
to merge some of my patches.

-Robert

 $ git diff tip/master...ras --stat kernel/
 kernel/events/Makefile  |   2 +-
 kernel/events/core.c|  56 +++
 kernel/events/internal.h|   4 +++
 kernel/events/persistent.c  | 175 

 kernel/events/ring_buffer.c |   7 ++---
 5 files changed, 214 insertions(+), 30 deletions(-)

 $ git diff ras...persistent --stat kernel/
 kernel/events/core.c   |   6 ++-
 kernel/events/internal.h   |   1 -
 kernel/events/persistent.c | 292 
+
 3 files changed, 221 insertions(+), 78 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf tools: Fix output directory of Documentation/

2013-06-21 Thread Robert Richter

Jiri and Arnaldo,

On 20.06.13 12:26:03, Arnaldo Carvalho de Melo wrote:
> Robert, so this one is with:
> 
>   cd tools/perf
>   make install-doc
> 
> I tested with:
> 
>   make -C tools/perf O=/tmp/build/perf install-doc
> 
> and it works, so it is just the in tree build that fails.

The patch below fixes this. Thanks for reporting.

-Robert


>From 69714303856005a1dc84c0594a4c3ec1013f8a80 Mon Sep 17 00:00:00 2001
From: Robert Richter 
Date: Fri, 21 Jun 2013 14:26:44 +0200
Subject: [PATCH] perf tools: Fixing in-tree documentation build error

Fixing build error for:

 $ cd tools/perf
 $ make install-doc

The following patch changed handling of the OUTPUT variable:

 79e10cb perf tools: Fix output directory of Documentation/

Now, OUTPUT is unset if the build directory is the current working dir
which is the usual case for kbuild. The xmlto rule could not handle
this. This patch fixes this.

Reported-by: Jiri Olsa 
Signed-off-by: Robert Richter 
Signed-off-by: Robert Richter 
---
 tools/perf/Documentation/Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Documentation/Makefile 
b/tools/perf/Documentation/Makefile
index 1368e88..5a37a7c 100644
--- a/tools/perf/Documentation/Makefile
+++ b/tools/perf/Documentation/Makefile
@@ -271,7 +271,7 @@ $(MAN_HTML): $(OUTPUT)%.html : %.txt
 
 $(OUTPUT)%.1 $(OUTPUT)%.5 $(OUTPUT)%.7 : $(OUTPUT)%.xml
$(QUIET_XMLTO)$(RM) $@ && \
-   $(XMLTO) -o $(OUTPUT) -m $(MANPAGE_XSL) $(XMLTO_EXTRA) man $<
+   $(XMLTO) -o $(OUTPUT). -m $(MANPAGE_XSL) $(XMLTO_EXTRA) man $<
 
 $(OUTPUT)%.xml : %.txt
$(QUIET_ASCIIDOC)$(RM) $@+ $@ && \
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf tools: Fix output directory of Documentation/

2013-06-21 Thread Robert Richter

On 21.06.13 10:34:37, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jun 21, 2013 at 02:50:20PM +0200, Robert Richter escreveu:
> > Jiri and Arnaldo,
> > 
> > On 20.06.13 12:26:03, Arnaldo Carvalho de Melo wrote:
> > > Robert, so this one is with:
> > > 
> > >   cd tools/perf
> > >   make install-doc
> > > 
> > > I tested with:
> > > 
> > >   make -C tools/perf O=/tmp/build/perf install-doc
> > > 
> > > and it works, so it is just the in tree build that fails.
> > 
> > The patch below fixes this. Thanks for reporting.
> 
> Ok, I'll fold that with the previous one before pushing to Ingo,

Yes, fine with me.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 03/14] perf: Add persistent event facilities

2013-06-25 Thread Robert Richter

On 24.06.13 11:44:58, Peter Zijlstra wrote:
> > +static void del_persistent_event(int cpu, struct perf_event_attr *attr)
> > +{
> > +   struct pers_event_desc *desc, *tmp;
> > +   struct perf_event *event = NULL;
> > +
> > +   list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
> > +   if (desc->attr->config == attr->config) {
> 
> attr->config might not be enough; you might also want to compare type,
> config1, config2, and possible bp_type as well; although I think we want
> to avoid having hwbp style persistent events, but then what do I know.

[ I would call this what it is: perf_add_persistent_tracepoint(), or
so :-) ]

We chosed a tracepoint-only implementation as a first try and since
this is our use case. The goal is to have support for any event type.

I also was thinking of comparing the whole attr structure. But since
attr are similar (or almost identical) only you can't compare each
member of attr and need exceptions (e.g. cpu number). These exceptions
are ugly and depending on the specific use case.

I rather tend to assign unique identifiers to each persistent event
and then use attr->config to select the event, e.g.:

if (desc->unique_id == attr->config) ...

Tracepoints have this unique id in it already, thus tp only first.

The change can be made without breaking ABI, since the event setup is
described in sysfs.

-Robert

> 
> > +   event = desc->event;
> > +   break;
> > +   }
> > +   }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 02/14] perf: Add persistent events

2013-06-25 Thread Robert Richter

On 24.06.13 21:24:04, Borislav Petkov wrote:
> On Mon, Jun 24, 2013 at 11:28:07AM +0200, Peter Zijlstra wrote:
> > I find this patch incomplete for its asymmetric; there a dec, but not an
> > implied inc.
> 
> There's one in perf_mmap_open.
> 
> Basically the idea was to pull up the ->mmap_count decrement (haha,
> sounds like excrement :-)) and not deallocate the ring buffers for a
> persistent event.

The asymmetry comes from rb_alloc() in add_persistent_event_on_cpu()
which is usually implemented in the mmap functions. We might better
improve functions to attach/detach events and buffers for more
symmetric code.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 03/14] perf: Add persistent event facilities

2013-06-25 Thread Robert Richter

On 24.06.13 11:32:04, Peter Zijlstra wrote:
> I generally disapprove of indented labels. None of the perf code has
> that and the tools are easy to 'fix'.
> 
> My quiltrc contains:
> 
> QUILT_DIFF_OPTS="-F ^[[:alpha:]\$_].*[^:]\$"
> 
> my .gitconfig contains:
> 
> [diff "default"]
> xfuncname = "^[[:alpha:]$_].*[^:]$"
> 
> Both avoid diff thinking labels are function names.

Right, we will use your coding style. ;)

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 03/14] perf: Add persistent event facilities

2013-06-25 Thread Robert Richter

On 25.06.13 09:44:01, Peter Zijlstra wrote:
> Elsewhere in this series you use 'pers' to shorten things; it reads a
> bit odd to me because 'pers' is the Dutch word for press (both meanings
> transfer) but that is just something I'll have to live with isn't it ;-)
> 
> As for tracepoint, it seems common to shorten that to tp; which always
> reminds me of toilet paper, but I suppose more people suffer from that.
> 
> Yielding: perf_add_pers_tp() 
> 
> which I read as adding pressed toilet paper.. a well :-)

Better we avoid this, don't wonat you misread the code. ;)

I also see 'pers_' not as an optimum since it could be mixed-up easily
with 'perf_'. Maybe we take 'persist_' instead?

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration

2013-06-25 Thread Robert Richter

On 24.06.13 12:08:19, Peter Zijlstra wrote:
> >  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git 
> > persistent-v2
> > 
> 
> OK, so I gave up on reading the patches :/ and went and looked at the git
> tree. There's just too much needless churn in the patches.

Ok, we will rework the patches to have a final patch set hiding all
reworks. I was hoping we could work on a public branch since this
eases developing code with multiple people working on it. This also
shows how code matures. But this seems not to fly.

> + int perf_add_persistent_event(struct perf_event_attr *attr, unsigned 
> nr_pages)
> + {
> + struct perf_event *event;
> + int i;
> + 
> + for_each_possible_cpu(i) {
> + event = add_persistent_event_on_cpu(i, attr, nr_pages);
> + if (IS_ERR(event))
> + goto unwind;
> + }
> + return 0;
> + 
> + unwind:
> + pr_err("%s: Error adding persistent event on cpu %d: %ld\n",
> + __func__, i, PTR_ERR(event));
> + 
> + while (--i >= 0)
> + del_persistent_event(i, attr);
> + 
> + return PTR_ERR(event);
> + }
> 
> This assumes cpu_possible_mask doesn't have holes in; I'm fairly sure we 
> cannot
> assume such.

Right, needs to be fixed.

> Furthermore; while the function is exposed and thus looks like a generic 
> usable
> thing; however it looks to 'forget' making a sysfs entry, making it look like
> something incomplete.
> 
> Only the ill named perf_add_persistent_event_by_id() function looks
> something that's usable outside of this.

Right, the name should be actually perf_add_persistent_tp() or so.

And, most of the function should be merged with
perf_add_persistent_event(). Maybe perf_add_persistent_event_by_id()
could be removed completly leaving perf_add_persistent_tp() as wrapper
for tracepoints.

> What's with MAX_EVENTS ?

It is the total number of sysfs entries that can currently be
registered dynamically. The problem with an unlimited event number is
that the sysfs attr list is a fixed array. We need to resize the array
which involves copying entries including locking etc. This was left
for later. ;)

Btw, another thing left for later is cpu hotplug code. I know this
needs to be implemented. But we want to see something running first.
Do you know what happens to system-wide events if a cpu is offline? It
looks like cpu_function_call() in perf_install_in_context() simply
ignores to install an event on an offline cpu. Not sure if it is later
enabled while bringing the cpu online. Quick looking at the code seems
to be not.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 03/14] perf: Add persistent event facilities

2013-06-25 Thread Robert Richter

On 25.06.13 11:37:06, Borislav Petkov wrote:
> On Tue, Jun 25, 2013 at 11:24:39AM +0200, Robert Richter wrote:
> > I also see 'pers_' not as an optimum since it could be mixed-up easily
> > with 'perf_'. Maybe we take 'persist_' instead?
> 
> Yep, although it reads wrong:
> 
> perf_add_persist_event

We could use persist_ as prefix for static functions and use the long
versions for the interface only.

But all this is a bit bikeshedding. I am sure we find something.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 03/14] perf: Add persistent event facilities

2013-06-25 Thread Robert Richter

On 25.06.13 17:29:41, Borislav Petkov wrote:
> How about
> 
> perf_add_pevent?
> 
> It is nice and short, although maybe too short but "pevent" is almost
> like a special term and the name shows that it is a special type of
> event, i.e. a p-event.

Fine with me, though it's more for internal usage. There is also use
of it as name for event pointers. And, it is used in perf tools as
event parser structure. But otherwise its ok.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration

2013-06-25 Thread Robert Richter

On 24.06.13 12:22:00, Peter Zijlstra wrote:
> On Tue, Jun 11, 2013 at 06:42:26PM +0200, Robert Richter wrote:
> > Note that perf tools need to support the 'attr' syntax that is
> > added in a separate patch set. With it we are able to run perf tool
> > commands to read persistent events, e.g.:
> 
> where is this patch? I can't find it.

I posted it with that subject:

 [PATCH 0/4] perf tools: Persistent events, changes for perf tool integration

> I also find attr: a bit weird. So far we've used
> :, so while initializing anonymous
> unions is a bit difficult with 'older' GCCs and we cannot actually do:
> 
> struct perf_event_attr {
>   ...
>   union {
>   u64 flags;
>   u64 disabled : 1,
>   ...
>   __reserved_1 : X;
>   }
>   ...
> };
> 
> We could fake it in userspace by allowing things like: flags:23. It
> would not be a much worse hack than attr: I suppose.

A goal is to be able to entirely describe the complete attr structure
with sysfs. perf tools should work out-of-the-box for every kind of
event. For this particular case a 'flags' syntax would be ok, but not
if you like to fill in other attr members.

With attr: you are flexible to put in *everything* into
attr. And for better readability there are abstractions as formats
like:

 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23

Not sure if the event parser handles recursion already, but we could
also support the 'flag' format you describe that way:

 /sys/bus/event_source/devices/persistent/format/flags:attr5:0-63
 /sys/bus/event_source/devices/persistent/format/persistent:flags:23

We could even list events for other pmus with small changes in the
kernel, e.g:

 /sys/bus/event_source/devices/persistent/format/type:attr0:0-31

/sys/bus/event_source/devices/persistent/events/some_tracepoint:type=2,config=

The example is a bit weird. But in general we can translate struct
perf_event_attr to a string and vice versa. And perf tools understand
how to set up every bit of the event attr structure.

-Robert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 962 matches

Mail list logo