[PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Alban Crequy
When a kretprobe is installed on a kernel function, there is a maximum
limit of how many calls in parallel it can catch (aka "maxactive"). A
kernel module could call register_kretprobe() and initialize maxactive
(see example in samples/kprobes/kretprobe_example.c).

But that is not exposed to userspace and it is currently not possible to
choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events

The default maxactive can be as low as 1 on single-core with a
non-preemptive kernel. This is too low and we need to increase it not
only for recursive functions, but for functions that sleep or resched.

This patch updates the format of the command that can be written to
kprobe_events so that maxactive can be optionally specified.

I need this for a bpf program attached to the kretprobe of
inet_csk_accept, which can sleep for a long time.

BugLink: https://github.com/iovisor/bcc/issues/1072
Signed-off-by: Alban Crequy 
---
 Documentation/trace/kprobetrace.txt |  4 +++-
 kernel/trace/trace_kprobe.c | 34 +-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 41ef9d8..655ca7e 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
 Synopsis of kprobe_events
 -
   p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
-  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]: Set a return probe
+  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
   -:[GRP/]EVENT: Clear a probe
 
  GRP   : Group name. If omitted, use "kprobes" for it.
@@ -32,6 +32,8 @@ Synopsis of kprobe_events
  MOD   : Module name which has given SYM.
  SYM[+offs]: Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
+ MAXACTIVE : Maximum number of instances of the specified function that
+ can be probed simultaneously, or 0 for the default.(*)
 
  FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5f688cc..807e01c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -282,6 +282,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
 void *addr,
 const char *symbol,
 unsigned long offs,
+int maxactive,
 int nargs, bool is_return)
 {
struct trace_kprobe *tk;
@@ -309,6 +310,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
else
tk->rp.kp.pre_handler = kprobe_dispatcher;
 
+   tk->rp.maxactive = maxactive;
+
if (!event || !is_good_name(event)) {
ret = -EINVAL;
goto error;
@@ -598,8 +601,10 @@ static int create_trace_kprobe(int argc, char **argv)
 {
/*
 * Argument syntax:
-*  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
-*  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
+*  - Add kprobe:
+*  p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
+*  - Add kretprobe:
+*  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
 * Fetch args:
 *  $retval : fetch return value
 *  $stack  : fetch stack address
@@ -619,6 +624,7 @@ static int create_trace_kprobe(int argc, char **argv)
int i, ret = 0;
bool is_return = false, is_delete = false;
char *symbol = NULL, *event = NULL, *group = NULL;
+   int maxactive = 0;
char *arg;
unsigned long offset = 0;
void *addr = NULL;
@@ -637,8 +643,26 @@ static int create_trace_kprobe(int argc, char **argv)
return -EINVAL;
}
 
-   if (argv[0][1] == ':') {
+   if (is_return && isdigit(argv[0][1]) && strchr(&argv[0][1], ':')) {
+   event = strchr(&argv[0][1], ':') + 1;
+   event[-1] = '\0';
+   ret = kstrtouint(&argv[0][1], 0, &maxactive);
+   if (ret) {
+   pr_info("Failed to parse maxactive.\n");
+   return ret;
+   }
+   /* kretprobes instances are iterated over via a list. The
+* maximum should stay reasonable.
+*/
+   if (maxactive > 1024) {
+   pr_info("Maxactive is too big.\n");
+   return -EINVAL;
+   }
+   } else if (argv[0][1] == ':') {
event = &argv[0][2];
+ 

Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests

2017-03-28 Thread Radim Krčmář
2017-03-27 15:34+0200, Alexander Graf:
> On 15/03/2017 22:22, Michael S. Tsirkin wrote:
>> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
>> unless explicitly provided with kernel command line argument
>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
>> without checking CPUID.
>> 
>> We currently emulate that as a NOP but on VMX we can do better: let
>> guest stop the CPU until timer, IPI or memory change.  CPU will be busy
>> but that isn't any worse than a NOP emulation.
>> 
>> Note that mwait within guests is not the same as on real hardware
>> because halt causes an exit while mwait doesn't.  For this reason it
>> might not be a good idea to use the regular MWAIT flag in CPUID to
>> signal this capability.  Add a flag in the hypervisor leaf instead.
> 
> So imagine we had proper MWAIT emulation capabilities based on page faults.
> In that case, we could do something as fancy as
> 
> Treat MWAIT as pass-through by default
> 
> Have a per-vcpu monitor timer 10 times a second in the background that
> checks which instruction we're in
> 
> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT,
> if $IP was in non-mwait within that time, reset counter.

Or we could reuse external interrupts for sampling.  Exits trigerred by
them would check for current instruction (probably would be best to
limit just to timer tick) and a sufficient ratio (> 0?) of other exits
would imply that MWAIT is not used.

> Or instead maybe just reuse the adapter hlt logic?

Emulated MWAIT is very similar to emulated HLT, so reusing the logic
makes sense.  We would just add new wakeup methods.

> Either way, with that we should be able to get super low latency IPIs
> running while still maintaining some sanity on systems which don't have
> dedicated CPUs for workloads.
> 
> And we wouldn't need guest modifications, which is a great plus. So older
> guests (and Windows?) could benefit from mwait as well.

There is no need guest modifications -- it could be exposed as standard
MWAIT feature to the guest, with responsibilities for guest/host-impact
on the user.

I think that the page-fault based MWAIT would require paravirt if it
should be enabled by default, because of performance concerns:
Enabling write protection on a page needs a VM exit on all other VCPUs
when beginning monitoring (to reload page permissions and prevent missed
writes).
We'd want to keep trapping writes to the page all the time because
toggling is slow, but this could regress performance for an OS that has
other data accessed by other VCPUs in that page.
No current interface can tell the guest that it should reserve the whole
page instead of what CPUID[5] says and that writes to the monitored page
are not "cheap", but can trigger a VM exit ...

And before we disable MWAIT exiting by default, we also have to
understand the old OS X on core 2 bug from Gabriel.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Steven Rostedt

"[PATCH v1]" I like your confidence, or lack of, that there isn't going
to be a v2 or v3 ;-)


Masami, what do you think of this?

-- Steve

On Tue, 28 Mar 2017 15:52:22 +0200
Alban Crequy  wrote:

> When a kretprobe is installed on a kernel function, there is a maximum
> limit of how many calls in parallel it can catch (aka "maxactive"). A
> kernel module could call register_kretprobe() and initialize maxactive
> (see example in samples/kprobes/kretprobe_example.c).
> 
> But that is not exposed to userspace and it is currently not possible to
> choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
> 
> The default maxactive can be as low as 1 on single-core with a
> non-preemptive kernel. This is too low and we need to increase it not
> only for recursive functions, but for functions that sleep or resched.
> 
> This patch updates the format of the command that can be written to
> kprobe_events so that maxactive can be optionally specified.
> 
> I need this for a bpf program attached to the kretprobe of
> inet_csk_accept, which can sleep for a long time.
> 
> BugLink: https://github.com/iovisor/bcc/issues/1072
> Signed-off-by: Alban Crequy 
> ---
>  Documentation/trace/kprobetrace.txt |  4 +++-
>  kernel/trace/trace_kprobe.c | 34 +-
>  2 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/trace/kprobetrace.txt 
> b/Documentation/trace/kprobetrace.txt
> index 41ef9d8..655ca7e 100644
> --- a/Documentation/trace/kprobetrace.txt
> +++ b/Documentation/trace/kprobetrace.txt
> @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
>  Synopsis of kprobe_events
>  -
>p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
> -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
> +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]   : Set a return 
> probe
>-:[GRP/]EVENT  : Clear a probe
>  
>   GRP : Group name. If omitted, use "kprobes" for it.
> @@ -32,6 +32,8 @@ Synopsis of kprobe_events
>   MOD : Module name which has given SYM.
>   SYM[+offs]  : Symbol+offset where the probe is inserted.
>   MEMADDR : Address where the probe is inserted.
> + MAXACTIVE   : Maximum number of instances of the specified function that
> +   can be probed simultaneously, or 0 for the default.(*)
>  
>   FETCHARGS   : Arguments. Each probe can have up to 128 args.
>%REG   : Fetch register REG
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 5f688cc..807e01c 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -282,6 +282,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
> *group,
>void *addr,
>const char *symbol,
>unsigned long offs,
> +  int maxactive,
>int nargs, bool is_return)
>  {
>   struct trace_kprobe *tk;
> @@ -309,6 +310,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
> *group,
>   else
>   tk->rp.kp.pre_handler = kprobe_dispatcher;
>  
> + tk->rp.maxactive = maxactive;
> +
>   if (!event || !is_good_name(event)) {
>   ret = -EINVAL;
>   goto error;
> @@ -598,8 +601,10 @@ static int create_trace_kprobe(int argc, char **argv)
>  {
>   /*
>* Argument syntax:
> -  *  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> -  *  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
> +  *  - Add kprobe:
> +  *  p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> +  *  - Add kretprobe:
> +  *  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
>* Fetch args:
>*  $retval : fetch return value
>*  $stack  : fetch stack address
> @@ -619,6 +624,7 @@ static int create_trace_kprobe(int argc, char **argv)
>   int i, ret = 0;
>   bool is_return = false, is_delete = false;
>   char *symbol = NULL, *event = NULL, *group = NULL;
> + int maxactive = 0;
>   char *arg;
>   unsigned long offset = 0;
>   void *addr = NULL;
> @@ -637,8 +643,26 @@ static int create_trace_kprobe(int argc, char **argv)
>   return -EINVAL;
>   }
>  
> - if (argv[0][1] == ':') {
> + if (is_return && isdigit(argv[0][1]) && strchr(&argv[0][1], ':')) {
> + event = strchr(&argv[0][1], ':') + 1;
> + event[-1] = '\0';
> + ret = kstrtouint(&argv[0][1], 0, &maxactive);
> + if (ret) {
> + pr_info("Failed to parse maxactive.\n");
> + return ret;
> + }
> + /* kretprobes instances are iterated ove

Re: [PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Masami Hiramatsu
On Tue, 28 Mar 2017 15:52:22 +0200
Alban Crequy  wrote:

> When a kretprobe is installed on a kernel function, there is a maximum
> limit of how many calls in parallel it can catch (aka "maxactive"). A
> kernel module could call register_kretprobe() and initialize maxactive
> (see example in samples/kprobes/kretprobe_example.c).
> 
> But that is not exposed to userspace and it is currently not possible to
> choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events

I see, I found same issue last week...

> 
> The default maxactive can be as low as 1 on single-core with a
> non-preemptive kernel. This is too low and we need to increase it not
> only for recursive functions, but for functions that sleep or resched.
> 
> This patch updates the format of the command that can be written to
> kprobe_events so that maxactive can be optionally specified.

Good! this is completely same what I'm planning to add.

> 
> I need this for a bpf program attached to the kretprobe of
> inet_csk_accept, which can sleep for a long time.

I'm also preparing another patch for using kmalloc(GFP_ATOMIC) in
kretprobe_pre_handler(), since this manual way is sometimes hard to
estimate how many instances needed. Anyway, since that may involve
unwilling memory allocation, this patch also needed.

> 
> BugLink: https://github.com/iovisor/bcc/issues/1072

Could you also add Lukasz to Cc list, since he had reported an issue
related this one.

https://www.spinics.net/lists/linux-trace/msg00448.html

Please see my comments below.

> Signed-off-by: Alban Crequy 
> ---
>  Documentation/trace/kprobetrace.txt |  4 +++-
>  kernel/trace/trace_kprobe.c | 34 +-
>  2 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/trace/kprobetrace.txt 
> b/Documentation/trace/kprobetrace.txt
> index 41ef9d8..655ca7e 100644
> --- a/Documentation/trace/kprobetrace.txt
> +++ b/Documentation/trace/kprobetrace.txt
> @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
>  Synopsis of kprobe_events
>  -
>p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
> -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
> +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]   : Set a return 
> probe
>-:[GRP/]EVENT  : Clear a probe
>  
>   GRP : Group name. If omitted, use "kprobes" for it.
> @@ -32,6 +32,8 @@ Synopsis of kprobe_events
>   MOD : Module name which has given SYM.
>   SYM[+offs]  : Symbol+offset where the probe is inserted.
>   MEMADDR : Address where the probe is inserted.
> + MAXACTIVE   : Maximum number of instances of the specified function that
> +   can be probed simultaneously, or 0 for the default.(*)

Here, yoy don't need (*), since [MAXACTIVE] is not a FETCHARGS.

>  
>   FETCHARGS   : Arguments. Each probe can have up to 128 args.
>%REG   : Fetch register REG
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 5f688cc..807e01c 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -282,6 +282,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
> *group,
>void *addr,
>const char *symbol,
>unsigned long offs,
> +  int maxactive,
>int nargs, bool is_return)
>  {
>   struct trace_kprobe *tk;
> @@ -309,6 +310,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
> *group,
>   else
>   tk->rp.kp.pre_handler = kprobe_dispatcher;
>  
> + tk->rp.maxactive = maxactive;
> +
>   if (!event || !is_good_name(event)) {
>   ret = -EINVAL;
>   goto error;
> @@ -598,8 +601,10 @@ static int create_trace_kprobe(int argc, char **argv)
>  {
>   /*
>* Argument syntax:
> -  *  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> -  *  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
> +  *  - Add kprobe:
> +  *  p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> +  *  - Add kretprobe:
> +  *  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
>* Fetch args:
>*  $retval : fetch return value
>*  $stack  : fetch stack address
> @@ -619,6 +624,7 @@ static int create_trace_kprobe(int argc, char **argv)
>   int i, ret = 0;
>   bool is_return = false, is_delete = false;
>   char *symbol = NULL, *event = NULL, *group = NULL;
> + int maxactive = 0;
>   char *arg;
>   unsigned long offset = 0;
>   void *addr = NULL;
> @@ -637,8 +643,26 @@ static int create_trace_kprobe(int argc, char **argv)
>   return -EINVAL;
>   }
>  
> -  

Re: [PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Steven Rostedt
On Wed, 29 Mar 2017 00:23:35 +0900
Masami Hiramatsu  wrote:

> > @@ -598,8 +601,10 @@ static int create_trace_kprobe(int argc, char **argv)
> >  {
> > /*
> >  * Argument syntax:
> > -*  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> > -*  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
> > +*  - Add kprobe:
> > +*  p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> > +*  - Add kretprobe:
> > +*  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
> >  * Fetch args:
> >  *  $retval : fetch return value
> >  *  $stack  : fetch stack address
> > @@ -619,6 +624,7 @@ static int create_trace_kprobe(int argc, char **argv)
> > int i, ret = 0;
> > bool is_return = false, is_delete = false;
> > char *symbol = NULL, *event = NULL, *group = NULL;
> > +   int maxactive = 0;
> > char *arg;
> > unsigned long offset = 0;
> > void *addr = NULL;
> > @@ -637,8 +643,26 @@ static int create_trace_kprobe(int argc, char **argv)
> > return -EINVAL;
> > }
> >  
> > -   if (argv[0][1] == ':') {
> > +   if (is_return && isdigit(argv[0][1]) && strchr(&argv[0][1], ':')) {  
> 
> This only supports r[MAXACTIVE:[GRP/]EVENT]. e.g "r100" without event name.

You mean it doesn't support adding MAXACTIVE without the ':event'.

> 
> Thank you,
> 
> > +   event = strchr(&argv[0][1], ':') + 1;
> > +   event[-1] = '\0';
> > +   ret = kstrtouint(&argv[0][1], 0, &maxactive);
> > +   if (ret) {
> > +   pr_info("Failed to parse maxactive.\n");
> > +   return ret;
> > +   }
> > +   /* kretprobes instances are iterated over via a list. The
> > +* maximum should stay reasonable.
> > +*/
> > +   if (maxactive > 1024) {

Also, can we get rid of magic numbers within the code. There should be a
const or macro defined as MAX_MAXACTIVE or something, and use that here.

-- Steve


> > +   pr_info("Maxactive is too big.\n");
> > +   return -EINVAL;
> > +   }
> > +   } else if (argv[0][1] == ':') {
> > event = &argv[0][2];
> > +   }
> > +
> > +   if (event) {
> > if (strchr(event, '/')) {
> > group = event;
> > event = strchr(group, '/') + 1;
> > @@ -718,8 +742,8 @@ static int create_trace_kprobe(int argc, char **argv)
> >  is_return ? 'r' : 'p', addr);
> > event = buf;
> > }
> > -   tk = alloc_trace_kprobe(group, event, addr, symbol, offset, argc,
> > -  is_return);
> > +   tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
> > +  argc, is_return);
> > if (IS_ERR(tk)) {
> > pr_info("Failed to allocate trace_probe.(%d)\n",
> > (int)PTR_ERR(tk));
> > -- 
> > 2.7.4
> >   
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Alban Crequy
Thanks for the review,

On Tue, Mar 28, 2017 at 5:23 PM, Masami Hiramatsu  wrote:
> On Tue, 28 Mar 2017 15:52:22 +0200
> Alban Crequy  wrote:
>
>> When a kretprobe is installed on a kernel function, there is a maximum
>> limit of how many calls in parallel it can catch (aka "maxactive"). A
>> kernel module could call register_kretprobe() and initialize maxactive
>> (see example in samples/kprobes/kretprobe_example.c).
>>
>> But that is not exposed to userspace and it is currently not possible to
>> choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
>
> I see, I found same issue last week...
>
>>
>> The default maxactive can be as low as 1 on single-core with a
>> non-preemptive kernel. This is too low and we need to increase it not
>> only for recursive functions, but for functions that sleep or resched.
>>
>> This patch updates the format of the command that can be written to
>> kprobe_events so that maxactive can be optionally specified.
>
> Good! this is completely same what I'm planning to add.
>
>>
>> I need this for a bpf program attached to the kretprobe of
>> inet_csk_accept, which can sleep for a long time.
>
> I'm also preparing another patch for using kmalloc(GFP_ATOMIC) in
> kretprobe_pre_handler(), since this manual way is sometimes hard to
> estimate how many instances needed. Anyway, since that may involve
> unwilling memory allocation, this patch also needed.

Where is that kretprobe_pre_handler()? I don't see any allocations in
pre_handler_kretprobe().

>> BugLink: https://github.com/iovisor/bcc/issues/1072
>
> Could you also add Lukasz to Cc list, since he had reported an issue
> related this one.
>
> https://www.spinics.net/lists/linux-trace/msg00448.html
>
> Please see my comments below.
>
>> Signed-off-by: Alban Crequy 
>> ---
>>  Documentation/trace/kprobetrace.txt |  4 +++-
>>  kernel/trace/trace_kprobe.c | 34 +-
>>  2 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/trace/kprobetrace.txt 
>> b/Documentation/trace/kprobetrace.txt
>> index 41ef9d8..655ca7e 100644
>> --- a/Documentation/trace/kprobetrace.txt
>> +++ b/Documentation/trace/kprobetrace.txt
>> @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
>>  Synopsis of kprobe_events
>>  -
>>p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
>> -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
>> +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]   : Set a return 
>> probe
>>-:[GRP/]EVENT  : Clear a probe
>>
>>   GRP : Group name. If omitted, use "kprobes" for it.
>> @@ -32,6 +32,8 @@ Synopsis of kprobe_events
>>   MOD : Module name which has given SYM.
>>   SYM[+offs]  : Symbol+offset where the probe is inserted.
>>   MEMADDR : Address where the probe is inserted.
>> + MAXACTIVE   : Maximum number of instances of the specified function that
>> +   can be probed simultaneously, or 0 for the default.(*)
>
> Here, yoy don't need (*), since [MAXACTIVE] is not a FETCHARGS.

Why not? (*) refers to "(*) only for return probe." and the maxactive
only makes sense for the kretprobe.

>>   FETCHARGS   : Arguments. Each probe can have up to 128 args.
>>%REG   : Fetch register REG
>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
>> index 5f688cc..807e01c 100644
>> --- a/kernel/trace/trace_kprobe.c
>> +++ b/kernel/trace/trace_kprobe.c
>> @@ -282,6 +282,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const 
>> char *group,
>>void *addr,
>>const char *symbol,
>>unsigned long offs,
>> +  int maxactive,
>>int nargs, bool is_return)
>>  {
>>   struct trace_kprobe *tk;
>> @@ -309,6 +310,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const 
>> char *group,
>>   else
>>   tk->rp.kp.pre_handler = kprobe_dispatcher;
>>
>> + tk->rp.maxactive = maxactive;
>> +
>>   if (!event || !is_good_name(event)) {
>>   ret = -EINVAL;
>>   goto error;
>> @@ -598,8 +601,10 @@ static int create_trace_kprobe(int argc, char **argv)
>>  {
>>   /*
>>* Argument syntax:
>> -  *  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
>> -  *  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
>> +  *  - Add kprobe:
>> +  *  p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
>> +  *  - Add kretprobe:
>> +  *  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
>>* Fetch args:
>>*  $retval : fetch return value
>>*  $stack  : fetch stack address
>> @@ -619,6 +624,7 @@ static int create_trace_kprobe(in

Re: [PATCH v5 6/9] coresight: add support for CPU debug module

2017-03-28 Thread Mathieu Poirier
Hi Leo,

On Sun, Mar 26, 2017 at 02:23:14AM +0800, Leo Yan wrote:
> Coresight includes debug module and usually the module connects with CPU
> debug logic. ARMv8 architecture reference manual (ARM DDI 0487A.k) has
> description for related info in "Part H: External Debug".
> 
> Chapter H7 "The Sample-based Profiling Extension" introduces several
> sampling registers, e.g. we can check program counter value with
> combined CPU exception level, secure state, etc. So this is helpful for
> analysis CPU lockup scenarios, e.g. if one CPU has run into infinite
> loop with IRQ disabled. In this case the CPU cannot switch context and
> handle any interrupt (including IPIs), as the result it cannot handle
> SMP call for stack dump.
> 
> This patch is to enable coresight debug module, so firstly this driver
> is to bind apb clock for debug module and this is to ensure the debug
> module can be accessed from program or external debugger. And the driver
> uses sample-based registers for debug purpose, e.g. when system triggers
> panic, the driver will dump program counter and combined context
> registers (EDCIDSR, EDVIDSR); by parsing context registers so can
> quickly get to know CPU secure state, exception level, etc.
> 
> Some of the debug module registers are located in CPU power domain, so
> this requires the CPU power domain stays on when access related debug
> registers, but the power management for CPU power domain is quite
> dependent on SoC integration for power management. For the platforms
> which with sane power controller implementations, this driver follows
> the method to set EDPRCR to try to pull the CPU out of low power state
> and then set 'no power down request' bit so the CPU has no chance to
> lose power.
> 
> If the SoC has not followed up this design well for power management
> controller, the driver introduces module parameter "idle_constraint".
> Setting this parameter for latency requirement in microseconds, finally
> we can constrain all or partial idle states to ensure the CPU power
> domain is enabled, this is a backup method to access coresight CPU
> debug component safely.
> 
> Signed-off-by: Leo Yan 
> ---
>  drivers/hwtracing/coresight/Kconfig   |  11 +
>  drivers/hwtracing/coresight/Makefile  |   1 +
>  drivers/hwtracing/coresight/coresight-cpu-debug.c | 704 
> ++
>  3 files changed, 716 insertions(+)
>  create mode 100644 drivers/hwtracing/coresight/coresight-cpu-debug.c
> 
> diff --git a/drivers/hwtracing/coresight/Kconfig 
> b/drivers/hwtracing/coresight/Kconfig
> index 130cb21..18d7931 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -89,4 +89,15 @@ config CORESIGHT_STM
> logging useful software events or data coming from various entities
> in the system, possibly running different OSs
>  
> +config CORESIGHT_CPU_DEBUG
> + tristate "CoreSight CPU Debug driver"
> + depends on ARM || ARM64
> + depends on DEBUG_FS
> + help
> +   This driver provides support for coresight debugging module. This
> +   is primarily used to dump sample-based profiling registers when
> +   system triggers panic, the driver will parse context registers so
> +   can quickly get to know program counter (PC), secure state,
> +   exception level, etc.
> +
>  endif
> diff --git a/drivers/hwtracing/coresight/Makefile 
> b/drivers/hwtracing/coresight/Makefile
> index af480d9..433d590 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -16,3 +16,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
>   coresight-etm4x-sysfs.o
>  obj-$(CONFIG_CORESIGHT_QCOM_REPLICATOR) += coresight-replicator-qcom.o
>  obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
> +obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
> diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
> b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> new file mode 100644
> index 000..fbec1d1
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> @@ -0,0 +1,704 @@
> +/*
> + * Copyright (c) 2017 Linaro Limited. All rights reserved.
> + *
> + * Author: Leo Yan 
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published 
> by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but 
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + *
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#incl

Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests

2017-03-28 Thread Jim Mattson
On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář  wrote:
> 2017-03-27 15:34+0200, Alexander Graf:
>> On 15/03/2017 22:22, Michael S. Tsirkin wrote:
>>> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
>>> unless explicitly provided with kernel command line argument
>>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
>>> without checking CPUID.
>>>
>>> We currently emulate that as a NOP but on VMX we can do better: let
>>> guest stop the CPU until timer, IPI or memory change.  CPU will be busy
>>> but that isn't any worse than a NOP emulation.
>>>
>>> Note that mwait within guests is not the same as on real hardware
>>> because halt causes an exit while mwait doesn't.  For this reason it
>>> might not be a good idea to use the regular MWAIT flag in CPUID to
>>> signal this capability.  Add a flag in the hypervisor leaf instead.
>>
>> So imagine we had proper MWAIT emulation capabilities based on page faults.
>> In that case, we could do something as fancy as
>>
>> Treat MWAIT as pass-through by default
>>
>> Have a per-vcpu monitor timer 10 times a second in the background that
>> checks which instruction we're in
>>
>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT,
>> if $IP was in non-mwait within that time, reset counter.
>
> Or we could reuse external interrupts for sampling.  Exits trigerred by
> them would check for current instruction (probably would be best to
> limit just to timer tick) and a sufficient ratio (> 0?) of other exits
> would imply that MWAIT is not used.
>
>> Or instead maybe just reuse the adapter hlt logic?
>
> Emulated MWAIT is very similar to emulated HLT, so reusing the logic
> makes sense.  We would just add new wakeup methods.
>
>> Either way, with that we should be able to get super low latency IPIs
>> running while still maintaining some sanity on systems which don't have
>> dedicated CPUs for workloads.
>>
>> And we wouldn't need guest modifications, which is a great plus. So older
>> guests (and Windows?) could benefit from mwait as well.
>
> There is no need guest modifications -- it could be exposed as standard
> MWAIT feature to the guest, with responsibilities for guest/host-impact
> on the user.
>
> I think that the page-fault based MWAIT would require paravirt if it
> should be enabled by default, because of performance concerns:
> Enabling write protection on a page needs a VM exit on all other VCPUs
> when beginning monitoring (to reload page permissions and prevent missed
> writes).
> We'd want to keep trapping writes to the page all the time because
> toggling is slow, but this could regress performance for an OS that has
> other data accessed by other VCPUs in that page.
> No current interface can tell the guest that it should reserve the whole
> page instead of what CPUID[5] says and that writes to the monitored page
> are not "cheap", but can trigger a VM exit ...

CPUID.05H:EBX is supposed to address the false sharing issue. IIRC,
VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX
when running Mac OS X guests. Per Intel's SDM volume 3, section
8.10.5, "To avoid false wake-ups; use the largest monitor line size to
pad the data structure used to monitor writes. Software must make sure
that beyond the data structure, no unrelated data variable exists in
the triggering area for MWAIT. A pad may be needed to avoid this
situation." Unfortunately, most operating systems do not follow this
advice.

>
> And before we disable MWAIT exiting by default, we also have to
> understand the old OS X on core 2 bug from Gabriel.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Masami Hiramatsu
On Tue, 28 Mar 2017 11:34:07 -0400
Steven Rostedt  wrote:

> On Wed, 29 Mar 2017 00:23:35 +0900
> Masami Hiramatsu  wrote:
> 
> > > @@ -598,8 +601,10 @@ static int create_trace_kprobe(int argc, char **argv)
> > >  {
> > >   /*
> > >* Argument syntax:
> > > -  *  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> > > -  *  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
> > > +  *  - Add kprobe:
> > > +  *  p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
> > > +  *  - Add kretprobe:
> > > +  *  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
> > >* Fetch args:
> > >*  $retval : fetch return value
> > >*  $stack  : fetch stack address
> > > @@ -619,6 +624,7 @@ static int create_trace_kprobe(int argc, char **argv)
> > >   int i, ret = 0;
> > >   bool is_return = false, is_delete = false;
> > >   char *symbol = NULL, *event = NULL, *group = NULL;
> > > + int maxactive = 0;
> > >   char *arg;
> > >   unsigned long offset = 0;
> > >   void *addr = NULL;
> > > @@ -637,8 +643,26 @@ static int create_trace_kprobe(int argc, char **argv)
> > >   return -EINVAL;
> > >   }
> > >  
> > > - if (argv[0][1] == ':') {
> > > + if (is_return && isdigit(argv[0][1]) && strchr(&argv[0][1], ':')) {  
> > 
> > This only supports r[MAXACTIVE:[GRP/]EVENT]. e.g "r100" without event name.
> 
> You mean it doesn't support adding MAXACTIVE without the ':event'.

Yeah, sorry for lacking the explanation...

> 
> > 
> > Thank you,
> > 
> > > + event = strchr(&argv[0][1], ':') + 1;
> > > + event[-1] = '\0';
> > > + ret = kstrtouint(&argv[0][1], 0, &maxactive);
> > > + if (ret) {
> > > + pr_info("Failed to parse maxactive.\n");
> > > + return ret;
> > > + }
> > > + /* kretprobes instances are iterated over via a list. The
> > > +  * maximum should stay reasonable.
> > > +  */
> > > + if (maxactive > 1024) {
> 
> Also, can we get rid of magic numbers within the code. There should be a
> const or macro defined as MAX_MAXACTIVE or something, and use that here.

Good catch!

Thanks,

> 
> -- Steve
> 
> 
> > > + pr_info("Maxactive is too big.\n");
> > > + return -EINVAL;
> > > + }
> > > + } else if (argv[0][1] == ':') {
> > >   event = &argv[0][2];
> > > + }
> > > +
> > > + if (event) {
> > >   if (strchr(event, '/')) {
> > >   group = event;
> > >   event = strchr(group, '/') + 1;
> > > @@ -718,8 +742,8 @@ static int create_trace_kprobe(int argc, char **argv)
> > >is_return ? 'r' : 'p', addr);
> > >   event = buf;
> > >   }
> > > - tk = alloc_trace_kprobe(group, event, addr, symbol, offset, argc,
> > > -is_return);
> > > + tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
> > > +argc, is_return);
> > >   if (IS_ERR(tk)) {
> > >   pr_info("Failed to allocate trace_probe.(%d)\n",
> > >   (int)PTR_ERR(tk));
> > > -- 
> > > 2.7.4
> > >   
> > 
> > 
> 


-- 
Masami Hiramatsu 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Masami Hiramatsu
On Tue, 28 Mar 2017 18:08:16 +0200
Alban Crequy  wrote:

> Thanks for the review,
> 
> On Tue, Mar 28, 2017 at 5:23 PM, Masami Hiramatsu  wrote:
> > On Tue, 28 Mar 2017 15:52:22 +0200
> > Alban Crequy  wrote:
> >
> >> When a kretprobe is installed on a kernel function, there is a maximum
> >> limit of how many calls in parallel it can catch (aka "maxactive"). A
> >> kernel module could call register_kretprobe() and initialize maxactive
> >> (see example in samples/kprobes/kretprobe_example.c).
> >>
> >> But that is not exposed to userspace and it is currently not possible to
> >> choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
> >
> > I see, I found same issue last week...
> >
> >>
> >> The default maxactive can be as low as 1 on single-core with a
> >> non-preemptive kernel. This is too low and we need to increase it not
> >> only for recursive functions, but for functions that sleep or resched.
> >>
> >> This patch updates the format of the command that can be written to
> >> kprobe_events so that maxactive can be optionally specified.
> >
> > Good! this is completely same what I'm planning to add.
> >
> >>
> >> I need this for a bpf program attached to the kretprobe of
> >> inet_csk_accept, which can sleep for a long time.
> >
> > I'm also preparing another patch for using kmalloc(GFP_ATOMIC) in
> > kretprobe_pre_handler(), since this manual way is sometimes hard to
> > estimate how many instances needed. Anyway, since that may involve
> > unwilling memory allocation, this patch also needed.
> 
> Where is that kretprobe_pre_handler()? I don't see any allocations in
> pre_handler_kretprobe().

pre_handler_kretprobe(), I'll send my patch, but note that I also considered
to introduce a patch which exactly same as yours. So even with my patch,
this patch should be introduced.

> 
> >> BugLink: https://github.com/iovisor/bcc/issues/1072
> >
> > Could you also add Lukasz to Cc list, since he had reported an issue
> > related this one.
> >
> > https://www.spinics.net/lists/linux-trace/msg00448.html
> >
> > Please see my comments below.
> >
> >> Signed-off-by: Alban Crequy 
> >> ---
> >>  Documentation/trace/kprobetrace.txt |  4 +++-
> >>  kernel/trace/trace_kprobe.c | 34 
> >> +-
> >>  2 files changed, 32 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Documentation/trace/kprobetrace.txt 
> >> b/Documentation/trace/kprobetrace.txt
> >> index 41ef9d8..655ca7e 100644
> >> --- a/Documentation/trace/kprobetrace.txt
> >> +++ b/Documentation/trace/kprobetrace.txt
> >> @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
> >>  Synopsis of kprobe_events
> >>  -
> >>p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
> >> -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
> >> +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]   : Set a 
> >> return probe
> >>-:[GRP/]EVENT  : Clear a 
> >> probe
> >>
> >>   GRP : Group name. If omitted, use "kprobes" for it.
> >> @@ -32,6 +32,8 @@ Synopsis of kprobe_events
> >>   MOD : Module name which has given SYM.
> >>   SYM[+offs]  : Symbol+offset where the probe is inserted.
> >>   MEMADDR : Address where the probe is inserted.
> >> + MAXACTIVE   : Maximum number of instances of the specified function that
> >> +   can be probed simultaneously, or 0 for the default.(*)
> >
> > Here, yoy don't need (*), since [MAXACTIVE] is not a FETCHARGS.
> 
> Why not? (*) refers to "(*) only for return probe." and the maxactive
> only makes sense for the kretprobe.

Because simply synopsis already explained MAXACTIVE is only for 'r' :)
The reason why I added the (*) note is that FETCHARGS are shown in
both synopsis of 'p' and 'r'.

> 
> >>   FETCHARGS   : Arguments. Each probe can have up to 128 args.
> >>%REG   : Fetch register REG
> >> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> >> index 5f688cc..807e01c 100644
> >> --- a/kernel/trace/trace_kprobe.c
> >> +++ b/kernel/trace/trace_kprobe.c
> >> @@ -282,6 +282,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const 
> >> char *group,
> >>void *addr,
> >>const char *symbol,
> >>unsigned long offs,
> >> +  int maxactive,
> >>int nargs, bool is_return)
> >>  {
> >>   struct trace_kprobe *tk;
> >> @@ -309,6 +310,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const 
> >> char *group,
> >>   else
> >>   tk->rp.kp.pre_handler = kprobe_dispatcher;
> >>
> >> + tk->rp.maxactive = maxactive;
> >> +
> >>   if (!event || !is_good_name(event)) {
> >>   ret = -EINVAL;
> >>   goto error;
> >> @@ -598,

Re: [PATCH] Add initial SX3000b platform related documentation to document tree

2017-03-28 Thread Rob Herring
On Wed, Mar 22, 2017 at 05:59:49AM +, Amit Kama IL wrote:
> Add initial SX3000b platform related documentation to document tree:
>  - Vendor prefix
>  - Platform binding documentation
>  - Interrupt Controller Unit binding documentation.

Probably should be 3 patches. Preferred subject prefix is "dt-bindings: 
..."

> 
> Signed-off-by: Amit Kama 
> 
> diff --git 
> a/Documentation/devicetree/bindings/interrupt-controller/satixfy-icu.txt 
> b/Documentation/devicetree/bindings/interrupt-controller/satixfy-icu.txt
> index 000..1893393
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/interrupt-controller/satixfy-icu.txt
> @@ -0,0 +1,47 @@
> +Satixfy SX3000B Interrupt Controller Unit (ICU)
> +
> +The ICU routes HW interrupts from the inter-module fabric to the
> +processor. For the MIPS interaptive, all interrupts are then routed

interaptive or interaptiv? I see both in the doc.


> +to the GIC.
> +
> +Required properties:
> +- compatible : Should be "satixfy,icu".

icu is fairly generic sounding. Should be satixfy,sx3000b-icu.

> +- reg - must be present and equal <0x1D4D 0x1C0>
> +- interrupt-controller : Identifies the node as an interrupt controller
> +- #interrupt-cells : Specifies the number of cells needed to encode an
> +  interrupt specifier.  Should be 1 - the GIC interrupt number
> +- interrupt-parent - Currently only the MIPS GIC is supported, so
> +<&gic> must be specified as parent
> +- interrupts : in interrupt parent form. For GIC it's
> + where x is the interrupt number
> +allocated for ICU in GIC.
> +
> +
> +
> +
> +
> +Example:
> +
> + icu: interrupt-controller@1d4d {
> + compatible = "sx,icu";
> + reg = <0x1D4D 0x1C0>;

lowercase

> +
> + interrupt-controller;
> + #interrupt-cells = <1>;
> +
> + interrupt-parent = <&gic>;
> + interrupts = ;
> + };
> +
> + uart0: uart@1D4D09C0 {

serial@1d4d09c0

> + compatible = "ns16550a";
> + reg = <0x1D4D09C0 0x100>;

lowercase

> +
> + interrupt-parent = <&icu>;
> + interrupts = <3>;
> +
> + clock-frequency = <27000>;
> +
> + reg-shift = <2>;
> + reg-io-width = <4>;
> + };
> diff --git a/Documentation/devicetree/bindings/mips/satixfy/sx3000b.txt 
> b/Documentation/devicetree/bindings/mips/satixfy/sx3000b.txt
> index 000..7cae67b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mips/satixfy/sx3000b.txt
> @@ -0,0 +1,37 @@
> +Satixfy SX3000b SoC
> +=
> +
> +Required properties:
> +
> + - compatible: Must include "satixfy,sx3000".

The "b" in SX3000b is not significant?

> +
> +CPU nodes:
> +--
> +A "cpus" node is required.  Required properties:
> + - #address-cells: Must be 1.
> + - #size-cells: Must be 0.
> +A CPU sub-node is also required for at least CPU 0.  Since the topology may
> +be probed via CPS, it is not necessary to specify secondary CPUs.  Required
> +properties:
> + - device_type: Must be "cpu".
> + - compatible: Must be "mti,interaptiv".
> + - reg: CPU number.
> + - clocks: Must include the CPU clock.  See ../../clock/clock-bindings.txt 
> for
> +   details on clock bindings.
> +Example:
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + cpu@0 {
> + device_type = "cpu";
> + compatible = "mti,interaptiv";
> + clocks  = <&ext>;
> + reg = <0>;
> + };
> + };
> +
> +Interrupt controllers:
> +--
> +Two nodes are required:
> + - mips,gic - MIPS Global Interrupt Controller - see 
> Documentation/devicetree/bindings/interrupt-controller/mips-gic.txt
> + - satixfy,icu - SX3000b SoC Interrupt Controller Unit - see 
> Documentation/devicetree/bindings/interrupt-controller/satixfy-icu.txt
> diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt 
> b/Documentation/devicetree/bindings/vendor-prefixes.txt
> index ec0bfb9..76819dd
> --- a/Documentation/devicetree/bindings/vendor-prefixes.txt
> +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
> @@ -261,6 +261,7 @@ rockchip  Fuzhou Rockchip Electronics Co., Ltd
>  samsung  Samsung Semiconductor
>  samtec   Samtec/Softing company
>  sandisk  Sandisk Corporation
> +satixfy Satixfy Technologies Ltd
>  sbs  Smart Battery System
>  schindlerSchindler
>  seagate  Seagate Technology PLC
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 6/9] coresight: add support for CPU debug module

2017-03-28 Thread Leo Yan
Hi Mathieu,

On Tue, Mar 28, 2017 at 10:50:10AM -0600, Mathieu Poirier wrote:
> On Sun, Mar 26, 2017 at 02:23:14AM +0800, Leo Yan wrote:

[...]

> > +static void debug_force_cpu_powered_up(struct debug_drvdata *drvdata)
> > +{
> > +   int timeout = DEBUG_WAIT_TIMEOUT;
> > +
> > +   drvdata->edprsr = readl_relaxed(drvdata->base + EDPRSR);
> > +
> > +   CS_UNLOCK(drvdata->base);
> > +
> > +   /* Bail out if CPU is powered up yet */
> > +   if (drvdata->edprsr & EDPRSR_PU)
> > +   goto out_powered_up;
> > +
> > +   /*
> > +* Send request to power management controller and assert
> > +* DBGPWRUPREQ signal; if power management controller has
> > +* sane implementation, it should enable CPU power domain
> > +* in case CPU is in low power state.
> > +*/
> > +   drvdata->edprsr = readl(drvdata->base + EDPRCR);
> > +   drvdata->edprsr |= EDPRCR_COREPURQ;
> > +   writel(drvdata->edprsr, drvdata->base + EDPRCR);
> 
> Here ->edprsr is used but EDPRCR is accessed.  Is this intentional or a
> copy/paste error?  The same is true for accesses in the out_powered_up 
> section.

Thanks for pointing out. This is a typo error and will fix.

> > +
> > +   /* Wait for CPU to be powered up (timeout~=32ms) */
> > +   while (timeout--) {
> > +   drvdata->edprsr = readl_relaxed(drvdata->base + EDPRSR);
> > +   if (drvdata->edprsr & EDPRSR_PU)
> > +   break;
> > +
> > +   usleep_range(1000, 2000);
> > +   }
> 
> See if function readx_poll_timeout() can be used.

Will use it.

> > +
> > +   /*
> > +* Unfortunately the CPU cannot be powered up, so return
> > +* back and later has no permission to access other
> > +* registers. For this case, should set 'idle_constraint'
> > +* to ensure CPU power domain is enabled!
> > +*/
> > +   if (!(drvdata->edprsr & EDPRSR_PU)) {
> > +   pr_err("%s: power up request for CPU%d failed\n",
> > +   __func__, drvdata->cpu);
> > +   goto out;
> > +   }
> > +
> > +out_powered_up:
> > +   debug_os_unlock(drvdata);
> > +
> > +   /*
> > +* At this point the CPU is powered up, so set the no powerdown
> > +* request bit so we don't lose power and emulate power down.
> > +*/
> > +   drvdata->edprsr = readl(drvdata->base + EDPRCR);
> > +   drvdata->edprsr |= EDPRCR_COREPURQ | EDPRCR_CORENPDRQ;
> 
> If we are here the core is already up.  Shouldn't we need to set
> EDPRCR_CORENPDRQ only?

Yeah. Will fix.

> > +   writel(drvdata->edprsr, drvdata->base + EDPRCR);
> 
> This section is a little racy - between the time the PU bit has been
> checked and the time COREPDRQ has been flipped, the state of PU may have
> changed.  You can probably get around this by checking edprsr.PU rigth here.  
> If
> it is not set you go through the process again.  Note that doing this will
> probably force a refactoring of the whole function.  

Agree. Will handle this.

[...]

> > +static ssize_t debug_func_knob_write(struct file *f,
> > +   const char __user *buf, size_t count, loff_t *ppos)
> > +{
> > e   u8 on;
> > +   int ret;
> > +
> > +   ret = kstrtou8_from_user(buf, count, 2, &on);
> > +   if (ret)
> > +   return ret;
> > +
> > +   mutex_lock(&debug_lock);
> > +
> > +   if (!on ^ debug_enable)
> > +   goto out;
> 
> I had to read this condition too many times - please refactor.

Will do it.

> > +
> > +   if (on) {
> > +   ret = debug_enable_func();
> > +   if (ret) {
> > +   pr_err("%s: unable to disable debug function: %d\n",
> > +  __func__, ret);
> 
> Based on the semantic this is the wrong error message.

Yeah. Will fix.

> > +   goto err;
> > +   }
> > +   } else
> > +   debug_disable_func();
> 
> Did checkpatch.pl complain about extra curly braces?  If not please add them.

checkpatch.pl doesn't report for this. Will add.

> > +
> > +   debug_enable = on;
> 
> Here we can't set debug_enable if we just called debug_disable_func().  Maybe
> I'm missing something.  If that's the case a comment in the code would be 
> worth
> it.

After called debug_disable_func(), debug_enable is set to 0 (on = 0).

> > +
> > +out:
> > +   ret = count;
> > +err:
> > +   mutex_unlock(&debug_lock);
> > +   return ret;
> > +}
> > +
> > +static ssize_t debug_func_knob_read(struct file *f,
> > +   char __user *ubuf, size_t count, loff_t *ppos)
> > +{
> > +   char val[] = { '0' + debug_enable, '\n' };
> > +
> > +   return simple_read_from_buffer(ubuf, count, ppos, val, sizeof(val));
> 
> Use the debug_lock to avoid race conditions.

Will do it.

> > +}
> > +
> > +static ssize_t debug_idle_constraint_write(struct file *f,
> > +   const char __user *buf, size_t count, loff_t *ppos)
> > +{
> > +   int val;
> > +   ssize_t ret;
> > +
> > +   ret = kstrtoint_from_user(buf, count, 10, &val);
> > +   if (ret)
> > +   return ret;
> > +
> > +   mutex_lock(&debug_lock);
> > +   idle_constra

Re: [PATCH v5 6/9] coresight: add support for CPU debug module

2017-03-28 Thread Leo Yan
Hi Suzuki,

On Mon, Mar 27, 2017 at 05:34:57PM +0100, Suzuki K Poulose wrote:
> On 25/03/17 18:23, Leo Yan wrote:

[...]

> Leo,
> 
> Thanks a lot for the quick rework. I don't fully understand (yet!) why we 
> need the
> idle_constraint. I will leave it for Sudeep to comment on it, as he is the 
> expert
> in that area. Some minor comments below.

Thanks a lot for quick reviewing :)

> >Signed-off-by: Leo Yan 
> >---
> > drivers/hwtracing/coresight/Kconfig   |  11 +
> > drivers/hwtracing/coresight/Makefile  |   1 +
> > drivers/hwtracing/coresight/coresight-cpu-debug.c | 704 
> > ++
> > 3 files changed, 716 insertions(+)
> > create mode 100644 drivers/hwtracing/coresight/coresight-cpu-debug.c
> >
> >diff --git a/drivers/hwtracing/coresight/Kconfig 
> >b/drivers/hwtracing/coresight/Kconfig
> >index 130cb21..18d7931 100644
> >--- a/drivers/hwtracing/coresight/Kconfig
> >+++ b/drivers/hwtracing/coresight/Kconfig
> >@@ -89,4 +89,15 @@ config CORESIGHT_STM
> >   logging useful software events or data coming from various entities
> >   in the system, possibly running different OSs
> >
> >+config CORESIGHT_CPU_DEBUG
> >+tristate "CoreSight CPU Debug driver"
> >+depends on ARM || ARM64
> >+depends on DEBUG_FS
> >+help
> >+  This driver provides support for coresight debugging module. This
> >+  is primarily used to dump sample-based profiling registers when
> >+  system triggers panic, the driver will parse context registers so
> >+  can quickly get to know program counter (PC), secure state,
> >+  exception level, etc.
> 
> May be we should mention/warn the user about the possible caveats of using
> this feature to help him make a better decision ? And / Or we should add a 
> documentation
> for it. We have collected some real good information over the discussions and
> it is a good idea to capture it somewhere.

Sure, I will add a documentation for this.

[...]

> >+static struct pm_qos_request debug_qos_req;
> >+static int idle_constraint = PM_QOS_DEFAULT_VALUE;
> >+module_param(idle_constraint, int, 0600);
> >+MODULE_PARM_DESC(idle_constraint, "Latency requirement in microseconds for 
> >CPU "
> >+ "idle states (default is -1, which means have no limiation "
> >+ "to CPU idle states; 0 means disabling all idle states; user "
> >+ "can choose other platform dependent values so can disable "
> >+ "specific idle states for the platform)");
> 
> Correct me if I am wrong,
> 
> All we want to do is disable the CPUIdle explicitly if the user knows that 
> this
> could be a problem to use CPU debug on his platform. So, in effect, we should
> only be using idle_constraint = 0 or -1.
> 
> In which case, we could make it easier for the user to tell us, either
> 
>  0 - Don't do anything with CPUIdle (default)
>  1 - Disable CPUIdle for me as I know the platform has issues with CPU debug 
> and CPUidle.

The reason for not using bool flag is: usually SoC may have many idle
states, so if user wants to partially enable some states then can set
the latency to constraint.

But of course, we can change this to binary value as you suggested,
this means turn on of turn off all states. The only one reason to use
latency value is it is more friendly for hardware design, e.g. some
platforms can enable partial states to save power and avoid overheat
after using this driver.

If you guys think this is a bit over design, I will follow up your
suggestion. I also have some replying in Mathieu's reviewing, please
help review as well.

> than explaining the miscrosecond latency etc and make the appropriate calls 
> underneath.
> something like (not necessarily the same name) :
> 
> module_param(broken_with_cpuidle, bool, 0600);
> MODULE_PARAM_DESC(broken_with_cpuidle, "Specifies whether the CPU debug has 
> issues with CPUIdle on"
>  " the platform. Non-zero value implies 
> CPUIdle has to be"
>  " explicitly disabled.",);

[...]

> >+/*
> >+ * Unfortunately the CPU cannot be powered up, so return
> >+ * back and later has no permission to access other
> >+ * registers. For this case, should set 'idle_constraint'
> >+ * to ensure CPU power domain is enabled!
> >+ */
> >+if (!(drvdata->edprsr & EDPRSR_PU)) {
> >+pr_err("%s: power up request for CPU%d failed\n",
> >+__func__, drvdata->cpu);
> >+goto out;
> >+}
> >+
> >+out_powered_up:
> >+debug_os_unlock(drvdata);
> 
> Question: Do we need a matching debug_os_lock() once we are done ?

I have checked ARM ARMv8, but there have no detailed description for
this. I refered coresight-etmv4 code and Mike's pseudo code, ther have
no debug_os_lock() related operations.

Mike, Mathieu, could you also help confirm this?

[...]

> >+static void debug_init_arch_data(void *info)
> >+{
> >+struct debug_drvdata *drvdat

[RFC PATCH tip/master 0/3] kprobes: tracing: kretprobe_instance dynamic allocation

2017-03-28 Thread Masami Hiramatsu
Here is a correction of patches to introduce kretprobe_instance
dynamic allocation for avoiding kretprobe silently miss-hits.

Original issue was reported by Lukasz on linux-trace ml.
 https://www.spinics.net/lists/linux-trace/msg00448.html

Also Alban is working on kprobe-tracer side because of
iovisor's issue.
 https://www.spinics.net/lists/netdev/msg427149.html
(Note that this series is independently applicable, no conflict)

This series is a kind of complementary patches for
Alban's patch. So I think both of them are needed.

 [1/3]: Add kretprobe's miss-hit counter to miss-hit
column on kprobe_profile. This helps user to
see what happened.
 [2/3]: Introduce kretprobe_instance dynamic allocation.
This will help user not to miss the ret probes
even it has low number of maxactive.
 [3/3]: Set maximum limitation for pre-allocated maxactive.
This can avoid miss configuration of struct kretprobe.

The downside of this patch is, dynamic allocation will
involve memory allocation, which sometimes traced by
kprobes. In that case those nested kprobes are missed.
To avoid this kind of miss-hit, user may need to make
maxactive enough large when registering kretprobes.

However, in other case, this can reduce the possibility
of miss-hit of kretprobes. Since the maxactive increased
automatically, user will not need to retry tracing with
larger maxactive.

Alban, you can reuse KRETPROBE_MAXACTIVE_ALLOC for checking
upper limiation in trace_kprobe.c too.

Thank you,

---

Masami Hiramatsu (3):
  trace: kprobes: Show sum of probe/retprobe nmissed count
  kprobes: Allocate kretprobe instance if its free list is empty
  kprobes: Limit kretprobe maximum instances


 Documentation/kprobes.txt   |   25 +++-
 include/linux/kprobes.h |2 ++
 kernel/kprobes.c|   44 +++
 kernel/trace/trace_kprobe.c |2 +-
 4 files changed, 50 insertions(+), 23 deletions(-)

--
Masami Hiramatsu (Linaro) 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH tip/master 1/3] trace: kprobes: Show sum of probe/retprobe nmissed count

2017-03-28 Thread Masami Hiramatsu
Show sum of probe and retprobe nmissed count in
kprobe_profile, since retprobe can be missed even
if the kprobe itself succeeeded.
This explains user why their return probe didn't hit
sometimes.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 013f4e7..bbdc3de 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -896,7 +896,7 @@ static int probes_profile_seq_show(struct seq_file *m, void 
*v)
seq_printf(m, "  %-44s %15lu %15lu\n",
   trace_event_name(&tk->tp.call),
   trace_kprobe_nhit(tk),
-  tk->rp.kp.nmissed);
+  tk->rp.kp.nmissed + tk->rp.nmissed);
 
return 0;
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH tip/master 2/3] kprobes: Allocate kretprobe instance if its free list is empty

2017-03-28 Thread Masami Hiramatsu
Try to allocate kretprobe instance by GFP_ATOMIC if kretprobe's
free list is empty. This can prevent kretprobe miss-hit on the
function which can be heavily invoked and slept inside (like
locking syscall entries.)

NOTE: This may easily cause nested kprobe call which will be
just skipped (but nmissed count is incremented), if someone
probe functions on the memory allocation path.

Signed-off-by: Masami Hiramatsu 
---
 Documentation/kprobes.txt |   25 +++--
 include/linux/kprobes.h   |2 ++
 kernel/kprobes.c  |   41 +
 3 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 1f6d45a..2de6533 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -131,11 +131,13 @@ kretprobe, then sets the saved instruction pointer to the 
saved return
 address, and that's where execution resumes upon return from the trap.
 
 While the probed function is executing, its return address is
-stored in an object of type kretprobe_instance.  Before calling
-register_kretprobe(), the user sets the maxactive field of the
-kretprobe struct to specify how many instances of the specified
-function can be probed simultaneously.  register_kretprobe()
-pre-allocates the indicated number of kretprobe_instance objects.
+stored in an object of type kretprobe_instance. Usually, this
+kretprobe_instance will be allocated dynamically.
+Since the dynamic allocation can cause mis-hit of other probes
+on memory allocation path, user can set maxactive field for pooling
+pre-allocated instances before calling register_kretprobe().
+This maxactive indicates how many instances of the specified
+function can be probed simultaneously.
 
 For example, if the function is non-recursive and is called with a
 spinlock held, maxactive = 1 should be enough.  If the function is
@@ -144,11 +146,14 @@ or preemption), NR_CPUS should be enough.  If maxactive 
<= 0, it is
 set to a default value.  If CONFIG_PREEMPT is enabled, the default
 is max(10, 2*NR_CPUS).  Otherwise, the default is NR_CPUS.
 
-It's not a disaster if you set maxactive too low; you'll just miss
-some probes.  In the kretprobe struct, the nmissed field is set to
-zero when the return probe is registered, and is incremented every
-time the probed function is entered but there is no kretprobe_instance
-object available for establishing the return probe.
+It's not a disaster if you set maxactive too low; you'll just see
+some probes on memory allocation path missed if it exists.
+
+In the kretprobe struct, the nmissed field is set to zero when the
+return probe is registered, and is incremented every time the probed
+function is entered but there is no kretprobe_instance object available
+and it fails to allocate new one, or hit the upper limit of maxactive
+(KRETPROBE_MAXACTIVE_ALLOC, currently it is 4096.)
 
 1.3.2 Kretprobe entry-handler
 
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 47e4da5..8064e14 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -192,6 +192,8 @@ struct kretprobe {
struct hlist_head free_instances;
raw_spinlock_t lock;
 };
+/* Upper limit of maxactive for dynamic allocation */
+#define KRETPROBE_MAXACTIVE_ALLOC 4096
 
 struct kretprobe_instance {
struct hlist_node hlist;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d733479..75c5390 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -57,7 +57,6 @@
 #define KPROBE_HASH_BITS 6
 #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
 
-
 /*
  * Some oddball architectures like 64bit powerpc have function descriptors
  * so this must be overridable.
@@ -1824,6 +1823,30 @@ void unregister_jprobes(struct jprobe **jps, int num)
 EXPORT_SYMBOL_GPL(unregister_jprobes);
 
 #ifdef CONFIG_KRETPROBES
+
+/* Try to use free instance first, if failed, try to allocate new instance */
+struct kretprobe_instance *kretprobe_alloc_instance(struct kretprobe *rp)
+{
+   struct kretprobe_instance *ri = NULL;
+   unsigned long flags = 0;
+
+   raw_spin_lock_irqsave(&rp->lock, flags);
+   if (!hlist_empty(&rp->free_instances)) {
+   ri = hlist_entry(rp->free_instances.first,
+   struct kretprobe_instance, hlist);
+   hlist_del(&ri->hlist);
+   }
+   raw_spin_unlock_irqrestore(&rp->lock, flags);
+
+   /* Populate max active instance if possible */
+   if (!ri && rp->maxactive < KRETPROBE_MAXACTIVE_ALLOC) {
+   ri = kmalloc(sizeof(*ri) + rp->data_size, GFP_ATOMIC);
+   if (ri)
+   rp->maxactive++;
+   }
+
+   return ri;
+}
 /*
  * This kprobe pre_handler is registered with every kretprobe. When probe
  * hits it will set up the return probe.
@@ -1846,14 +1869,8 @@ static int pre_handler_kretprobe(struct kprobe *p, 
struct pt_regs *regs)
}
 
/* TODO: consider to only swap the

[RFC PATCH tip/master 3/3] kprobes: Limit kretprobe maximum instances

2017-03-28 Thread Masami Hiramatsu
Limit kretprobe maximum instance up to MAXACTIVE_ALLOC.
Without this limit, kretprobe user can specify huge number
(e.g. forget to zero-fill struct kretprobe) to maxactive
and may cause out-of-memory.

Signed-off-by: Masami Hiramatsu 
---
 kernel/kprobes.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 75c5390..f1bebcf 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1942,6 +1942,9 @@ int register_kretprobe(struct kretprobe *rp)
rp->kp.break_handler = NULL;
 
/* Pre-allocate memory for max kretprobe instances */
+   if (rp->maxactive > KRETPROBE_MAXACTIVE_ALLOC)
+   return -E2BIG;
+
if (rp->maxactive <= 0) {
 #ifdef CONFIG_PREEMPT
rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH tip/master 2/3] kprobes: Allocate kretprobe instance if its free list is empty

2017-03-28 Thread Ingo Molnar

* Masami Hiramatsu  wrote:

> @@ -1824,6 +1823,30 @@ void unregister_jprobes(struct jprobe **jps, int num)
>  EXPORT_SYMBOL_GPL(unregister_jprobes);
>  
>  #ifdef CONFIG_KRETPROBES
> +
> +/* Try to use free instance first, if failed, try to allocate new instance */
> +struct kretprobe_instance *kretprobe_alloc_instance(struct kretprobe *rp)
> +{
> + struct kretprobe_instance *ri = NULL;
> + unsigned long flags = 0;
> +
> + raw_spin_lock_irqsave(&rp->lock, flags);
> + if (!hlist_empty(&rp->free_instances)) {
> + ri = hlist_entry(rp->free_instances.first,
> + struct kretprobe_instance, hlist);
> + hlist_del(&ri->hlist);
> + }
> + raw_spin_unlock_irqrestore(&rp->lock, flags);
> +
> + /* Populate max active instance if possible */
> + if (!ri && rp->maxactive < KRETPROBE_MAXACTIVE_ALLOC) {
> + ri = kmalloc(sizeof(*ri) + rp->data_size, GFP_ATOMIC);
> + if (ri)
> + rp->maxactive++;
> + }
> +
> + return ri;
> +}
>  /*
>   * This kprobe pre_handler is registered with every kretprobe. When probe
>   * hits it will set up the return probe.
> @@ -1846,14 +1869,8 @@ static int pre_handler_kretprobe(struct kprobe *p, 
> struct pt_regs *regs)
>   }
>  
>   /* TODO: consider to only swap the RA after the last pre_handler fired 
> */
> - hash = hash_ptr(current, KPROBE_HASH_BITS);
> - raw_spin_lock_irqsave(&rp->lock, flags);
> - if (!hlist_empty(&rp->free_instances)) {
> - ri = hlist_entry(rp->free_instances.first,
> - struct kretprobe_instance, hlist);
> - hlist_del(&ri->hlist);
> - raw_spin_unlock_irqrestore(&rp->lock, flags);
> -
> + ri = kretprobe_alloc_instance(rp);
> + if (ri) {
>   ri->rp = rp;
>   ri->task = current;
>  
> @@ -1868,13 +1885,13 @@ static int pre_handler_kretprobe(struct kprobe *p, 
> struct pt_regs *regs)
>  
>   /* XXX(hch): why is there no hlist_move_head? */
>   INIT_HLIST_NODE(&ri->hlist);
> + hash = hash_ptr(current, KPROBE_HASH_BITS);
>   kretprobe_table_lock(hash, &flags);
>   hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
>   kretprobe_table_unlock(hash, &flags);
> - } else {
> + } else
>   rp->nmissed++;
> - raw_spin_unlock_irqrestore(&rp->lock, flags);
> - }
> +
>   return 0;
>  }
>  NOKPROBE_SYMBOL(pre_handler_kretprobe);

So this is something I missed while the original code was merged, but the 
concept 
looks a bit weird: why do we do any "allocation" while a handler is executing?

That's fundamentally fragile. What's the maximum number of parallel 
'kretprobe_instance' required per kretprobe - one per CPU?

If so then we should preallocate all of them when they are installed and not do 
any alloc/free dance when executing them.

This will also speed them up, and increase robustness all around.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html