"interesting" entry in hibernation code was Re: [lkp-robot] [x86/asm] 51bad67ffb: int3:#[##]

2018-05-19 Thread Pavel Machek
Hi!

> Side note: doing some grepping, I find some other sequences that are a bit
> scary, like this:
> 
> arch/x86/kernel/acpi/wakeup_32.S-.data
> arch/x86/kernel/acpi/wakeup_32.S-ALIGN
> arch/x86/kernel/acpi/wakeup_32.S:ENTRY(saved_magic) .long   0
> arch/x86/kernel/acpi/wakeup_32.S:ENTRY(saved_eip)   .long   0
> 
> so apparently people are using ENTRY() for data too (the same pattern
> exists in wakeup_64.S).
> 
> So we end up having those odd 0x90 bytes (now 0xcc) in the data section as
> "padding" between those two values. Crazy.

Sorry about that. I'm pretty sure the intention was simply to use the
variable from C code, and ENTRY() worked. I was not aware that it had the
side effect of padding...

Let me see how this can be improved... (untested).

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 0c26b1b..d6f477f 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -89,8 +89,8 @@ ret_point:
 
 .data
 ALIGN
-ENTRY(saved_magic) .long   0
-ENTRY(saved_eip)   .long   0
+GLOBAL(saved_magic)  .long   0
+saved_eip: .long   0
 
 # saved registers
 saved_idt: .long   0,0


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




Re: [PATCH v2] usbip: vhci_sysfs: fix potential Spectre v1

2018-05-19 Thread Greg Kroah-Hartman
On Fri, May 18, 2018 at 05:27:22PM -0500, Gustavo A. R. Silva wrote:
> 
> 
> On 05/18/2018 11:06 AM, Shuah Khan wrote:
> > On 05/18/2018 07:47 AM, Greg Kroah-Hartman wrote:
> > > On Thu, May 17, 2018 at 03:16:28PM -0500, Gustavo A. R. Silva wrote:
> > > > pdev_nr and rhport can be controlled by user-space, hence leading to
> > > > a potential exploitation of the Spectre variant 1 vulnerability.
> > > > 
> > > > This issue was detected with the help of Smatch:
> > > > drivers/usb/usbip/vhci_sysfs.c:238 detach_store() warn: potential 
> > > > spectre issue 'vhcis'
> > > > drivers/usb/usbip/vhci_sysfs.c:328 attach_store() warn: potential 
> > > > spectre issue 'vhcis'
> > > > drivers/usb/usbip/vhci_sysfs.c:338 attach_store() warn: potential 
> > > > spectre issue 'vhci->vhci_hcd_ss->vdev'
> > > > drivers/usb/usbip/vhci_sysfs.c:340 attach_store() warn: potential 
> > > > spectre issue 'vhci->vhci_hcd_hs->vdev'
> > > > 
> > > > Fix this by sanitizing pdev_nr and rhport before using them to index
> > > > vhcis and vhci->vhci_hcd_ss->vdev respectively.
> > > > 
> > > > Notice that given that speculation windows are large, the policy is
> > > > to kill the speculation on the first load and not worry if it can be
> > > > completed with a dependent load/store [1].
> > > > 
> > > > [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
> > > > 
> > > > Cc: sta...@vger.kernel.org
> > > > Signed-off-by: Gustavo A. R. Silva 
> > > > ---
> > > > Changes in v2:
> > > >   - Place the barriers into valid_port.
> > attach_store() doesn't call valid_port() - can you make the change to
> > have attach_store() call valid_port() to protect that code path.
> > 
> > > 
> > > Thanks for the change.  I'll wait for Shuah's ack/review before queueing
> > > this up just as she knows that codebase much better than anyone else.
> > > > 
> > 
> 
> Greg,
> 
> I've been talking with Dan Williams (Intel) about these kinds of issues [1]
> and it seems my original assumptions are correct. Hence, this patch is not
> useful: in order to actually prevent speculation here we would need to
> pass the addresses of pdev_nr and rhport into valid_port(), otherwise there
> may be speculation at drivers/usb/usbip/vhci_sysfs.c:235:
> 
> if (!valid_port(pdev_nr, rhport))
> 	return -EINVAL;
> 
> hcd = platform_get_drvdata(vhcis[pdev_nr].pdev);

Ah, yes, sorry, you do need to pass the address through, my mistake
completely.  But the location for the checking is still the right place
to do it, so I was half-right :)
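A minimal userspace sketch of the point above, assuming the generic (non-x86) form of the kernel's array_index_mask_nospec(): the validator takes pointers and writes the clamped indices back, so the caller's later vhcis[pdev_nr] access uses the sanitized value even if the bounds check is mispredicted. In the kernel one would call array_index_nospec() from <linux/nospec.h> rather than open-code the mask; valid_port_sketch() and the limits below are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * Branchless mask modeled on the generic array_index_mask_nospec():
 * ~0UL when index < size, 0UL otherwise. Relies on an arithmetic right
 * shift of a negative long (true for gcc and clang). The kernel version
 * also hides `index` from the optimizer, which is omitted here.
 */
static unsigned long index_mask_nospec_sketch(unsigned long index,
					      unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG_SKETCH - 1);
}

/* Hypothetical limits standing in for the vhci controller/port counts. */
#define NR_HCS_SKETCH	4u
#define NR_PORTS_SKETCH	8u

/*
 * Hypothetical pointer-taking validator: on success the clamped values are
 * written back through the pointers, so every later array access in the
 * caller uses the sanitized index, not the raw user-controlled one.
 */
static int valid_port_sketch(unsigned int *pdev_nr, unsigned int *rhport)
{
	if (*pdev_nr >= NR_HCS_SKETCH || *rhport >= NR_PORTS_SKETCH)
		return 0;

	/* Under misprediction of the check above, these force the index to 0. */
	*pdev_nr &= index_mask_nospec_sketch(*pdev_nr, NR_HCS_SKETCH);
	*rhport &= index_mask_nospec_sketch(*rhport, NR_PORTS_SKETCH);
	return 1;
}
```

Passing the values by copy would only sanitize the callee's locals; the caller would still index with its own, unsanitized variables, which is the problem described in the quoted message.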

thanks

greg k-h


Re: [PATCH v3 2/6] mfd: at91-usart: added mfd driver for usart

2018-05-19 Thread Alexandre Belloni
On 18/05/2018 17:19:49-0500, Rob Herring wrote:
> On Fri, May 11, 2018 at 01:38:18PM +0300, Radu Pirea wrote:
> > This mfd driver is just a wrapper over atmel_serial driver and
> > spi-at91-usart driver. Selection of one of the drivers is based on a
> > property from device tree. If the property is not specified, the default
> > driver is atmel_serial.
> > 
> > Signed-off-by: Radu Pirea 
> > ---
> >  drivers/mfd/Kconfig  | 10 
> >  drivers/mfd/Makefile |  1 +
> >  drivers/mfd/at91-usart.c | 75 
> >  include/dt-bindings/mfd/at91-usart.h | 17 +++
> >  4 files changed, 103 insertions(+)
> >  create mode 100644 drivers/mfd/at91-usart.c
> >  create mode 100644 include/dt-bindings/mfd/at91-usart.h
> > 
> 
> > +#ifndef __DT_BINDINGS_AT91_USART_H__
> > +#define __DT_BINDINGS_AT91_USART_H__
> > +
> > +#define AT91_USART_MODE_SERIAL 1
> > +#define AT91_USART_MODE_SPI	2
> 
> Won't this require a DT update for serial mode to add the mode property? 
> That breaks compatibility.
> 

If the mode property is not present, it defaults to serial to keep
compatibility.
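A minimal sketch of the fallback described above. The property lookup itself is left out: prop is assumed to be NULL when the mode property is absent (in a driver this information would come from an OF helper such as of_property_read_u32(), which fails for a missing property). at91_usart_pick_mode() is a hypothetical name, not the function in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in the proposed include/dt-bindings/mfd/at91-usart.h. */
#define AT91_USART_MODE_SERIAL	1
#define AT91_USART_MODE_SPI	2

/*
 * Hypothetical helper: `prop` points at the value of the optional mode
 * property, or is NULL when the property is absent from the device tree.
 * Returns the mode to use, or -1 for an unknown value.
 */
static int at91_usart_pick_mode(const uint32_t *prop)
{
	if (!prop)	/* property absent: existing DTs keep getting serial */
		return AT91_USART_MODE_SERIAL;

	switch (*prop) {
	case AT91_USART_MODE_SERIAL:
	case AT91_USART_MODE_SPI:
		return (int)*prop;
	default:
		return -1;
	}
}
```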


-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [PATCH v2 12/26] drm/sun4i: Add support for multiple DW HDMI PHY clock parents

2018-05-19 Thread Jernej Škrabec
Hi,

On Friday, 18 May 2018 at 17:26:51 CEST, Maxime Ripard wrote:
> On Fri, May 18, 2018 at 04:46:41PM +0200, Jernej Škrabec wrote:
> > > And this is a bit sloppy, since if phy_clk_num == 3, you won't try to
> > > lookup pll-2 either.
> > 
> > It is highly unlikely this will be higher than 2, at least for this HDMI
> > PHY, since it has only 1 bit reserved for parent selection. But since I
> > have to fix it, I'll add ">= 2"
> 
> If we're only going to have two parents at most, ever, why don't we
> add just a single other boolean? This would be less intrusive, and we
> wouldn't have to check for those corner cases.

It seems that the use of the "bool" data type in structures is no longer
wanted, according to checkpatch and this: https://lkml.org/lkml/2017/11/21/384

I guess I'll use "unsigned int" as recommended by Linus and name it
"has_second_parent" to make it unambiguous that it's a boolean in reality.

Best regards,
Jernej

> 
> > BTW, I'll resend fixed version of this patch for my R40 HDMI series, since
> > there is nothing to hold it back, unlike for this.
> 
> Awesome, thanks!
> Maxime
> 
> --
> Maxime Ripard, Bootlin (formerly Free Electrons)
> Embedded Linux and Kernel engineering
> https://bootlin.com






Re: [PATCH 4.16 00/55] 4.16.10-stable review

2018-05-19 Thread Greg Kroah-Hartman
On Fri, May 18, 2018 at 02:45:08PM -0600, Shuah Khan wrote:
> On 05/18/2018 02:14 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.16.10 release.
> > There are 55 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun May 20 08:14:42 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.10-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.16.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Great, thanks for testing all 3 of these and letting me know.

greg k-h


Re: [PATCH v3 1/6] x86/stacktrace: do not unwind after user regs

2018-05-19 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Fri, May 18, 2018 at 08:55:47AM +0200, Ingo Molnar wrote:
> > 
> > * Jiri Slaby  wrote:
> > 
> > > Josh pointed out, that there is no way a frame can be after user regs.
> > > So remove the last unwind and the check.
> > > 
> > > Signed-off-by: Jiri Slaby 
> > > Cc: Thomas Gleixner 
> > > Cc: Ingo Molnar 
> > > Cc: "H. Peter Anvin" 
> > > Cc: x...@kernel.org
> > > Cc: Josh Poimboeuf 
> > 
> > Josh: an Acked-by or Reviewed-by for the whole series from you would be 
> > nice.
> 
> The patches look good, but I want to run it through some testing before
> I give them an ACK.

Sure thing and thanks!

Ingo


Re: [PATCH 4.16 00/55] 4.16.10-stable review

2018-05-19 Thread Greg Kroah-Hartman
On Sat, May 19, 2018 at 12:55:56AM +0530, Naresh Kamboju wrote:
> On 18 May 2018 at 13:44, Greg Kroah-Hartman  
> wrote:
> > This is the start of the stable review cycle for the 4.16.10 release.
> > There are 55 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun May 20 08:14:42 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.10-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.16.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> 
> Results from Linaro’s test farm.
> No regressions on arm64, arm and x86_64.

Wonderful, thanks for testing these and letting me know.

greg k-h


Re: [PATCH v6 17/28] x86/asm: use SYM_INNER_LABEL instead of GLOBAL

2018-05-19 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Fri, May 18, 2018 at 2:17 AM Jiri Slaby  wrote:
> 
> > GLOBAL had several meanings and is going away. In this patch, convert
> > all the inner function labels marked with GLOBAL to use SYM_INNER_LABEL
> > instead.
> 
> > Note that retint_user does not need to be global, perhaps since commit
> > 2ec67971facc ("x86/entry/64/compat: Remove most of the fast system call
> > machinery"), where entry_64_compat's caller was removed. So mark the
> > label as LOCAL.
> 
> 
> > -GLOBAL(entry_SYSCALL_64_after_hwframe)
> > +SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
> 
> I've missed all the context here.   I agree that GLOBAL is misleading, and
> "inner label" is nice.  But this is a rather wordy macro.  Would:
> 
> INNER_LABEL_GLOBAL(name)
> 
> be better?  (With just INNER_LABEL(name) for the local version?)

Please keep the SYM_ global namespace for all these symbol macros - but the rest
of the name can be shortened.

Thanks,

Ingo


WARNING in xfrm6_tunnel_net_exit (2)

2018-05-19 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:2c71d338bef2 Merge tag 'powerpc-4.17-6' of git://git.kerne..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12a7bd5780
kernel config:  https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
dashboard link: https://syzkaller.appspot.com/bug?extid=e9aebef558e3ed673934
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=17409d5780

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e9aebef558e3ed673...@syzkaller.appspotmail.com

bond0: Enslaving bond_slave_1 as an active interface with an up link
IPv6: ADDRCONF(NETDEV_UP): veth0_to_bond: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bond: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bond: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bond: link becomes ready
WARNING: CPU: 1 PID: 6 at net/ipv6/xfrm6_tunnel.c:348  
xfrm6_tunnel_net_exit+0x2df/0x510 net/ipv6/xfrm6_tunnel.c:348

IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bond: link becomes ready
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 4.17.0-rc5+ #57
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bond: link becomes ready
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: netns cleanup_net
IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bond: link becomes ready
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 panic+0x22f/0x4de kernel/panic.c:184
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bond: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bond: link becomes ready
 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
IPv6: ADDRCONF(NETDEV_UP): veth0_to_bond: link is not ready
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:xfrm6_tunnel_net_exit+0x2df/0x510 net/ipv6/xfrm6_tunnel.c:348
RSP: 0018:8801d9a973d8 EFLAGS: 00010293
RAX: 8801d9a88180 RBX: 8801b6eda2b8 RCX: 868ff5f5
RDX:  RSI: 868ff5ff RDI: 0007
RBP: 8801d9a974f8 R08: 8801d9a88180 R09: 0006
R10: 8801d9a88180 R11:  R12: 00ff
R13: ed003b352e82 R14: 8801d9a974d0 R15: 8801b32f0700
 ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
 cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
 kthread+0x345/0x410 kernel/kthread.c:240
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


Re: [PATCH] objtool: Detect assembly code falling through to INT3 padding

2018-05-19 Thread hpa
On May 18, 2018 10:51:36 AM PDT, Alexey Dobriyan  wrote:
>On Fri, May 18, 2018 at 09:18:14AM +0200, Ingo Molnar wrote:
>> The concept of built-in kernel tooling working at the machine code level is just
>> so powerful - we should have added our own KCC compiler 20 years ago.
>
>...for two very serious reasons
>
>* C as a language moves very slowly; the last help from the committee was
>  C99 initializers, which are OK, but, say, a memory model was explicitly
>  rejected. However, the project expands and becomes more complex much
>  faster than the C working group sets up meetings. Compiler authors help
>  with extensions but ultimately cannot be relied on (see the "inline"
>  saga).
>
>  Recently everyone was celebrating the new and improved min() and max()
>  macros, admiring the creativity and knowledge of intricate language
>  details (me too, don't get this wrong).
>
>  Now this is how it can be done in a language which is not stupid:
>
>   constexpr int min(int a, int b)
>   {
>   	return a < b ? a : b;
>   }
>
>  That's literally all. And you can also do
>
>   template <typename T>
>   void min(T a, char b) = delete;
>
>   template <typename T>
>   void min(char a, T b) = delete;
>
>  because "char" is char.
>
>  With control over the compiler, things like that can be added more
>  quickly.
>
>
>* insulating the project from the whims of compiler authors who every
>  once in a while use "undefined behaviour" or other kinds of language
>  lawyering to do strange things.
>
>  Other serious projects do this too. Database people use O_DIRECT
>  to insulate themselves from kernel people for the very same reasons.

Sounds like you are proposing switching to C++ more than anything else.

*Steps aside and grabs popcorn*
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
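For comparison with the C++ version, here is a simplified sketch of the classic GNU C approach to a type-checked min(); it is not the newer kernel macros the message refers to. Statement expressions evaluate each argument once, and comparing the addresses of the two temporaries makes the compiler complain when their types differ. min_checked() is a hypothetical name; gcc or clang is assumed for __typeof__ and ({ ... }).

```c
#include <assert.h>

/*
 * Simplified type-checked min() in the style of the classic kernel macro:
 * - each argument is evaluated exactly once (statement expression + typeof);
 * - (&_x == &_y) compares pointers to the two temporaries, so mismatched
 *   argument types trigger a "distinct pointer types" diagnostic.
 */
#define min_checked(x, y) ({		\
	__typeof__(x) _x = (x);		\
	__typeof__(y) _y = (y);		\
	(void)(&_x == &_y);		\
	_x < _y ? _x : _y; })
```

Passing, for example, an int variable and a char variable makes the compiler warn about comparing distinct pointer types, whereas the C++ deleted overloads above turn the same mix into a hard error.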


Re: [PATCH rdma-next 1/5] RDMA/hns: Implement the disassociate_ucontext API

2018-05-19 Thread Wei Hu (Xavier)


On 2018/5/17 23:00, Jason Gunthorpe wrote:
> On Thu, May 17, 2018 at 04:02:49PM +0800, Wei Hu (Xavier) wrote:
>> This patch implements the IB core disassociate_ucontext API.
>>
>> Signed-off-by: Wei Hu (Xavier) 
>>  drivers/infiniband/hw/hns/hns_roce_main.c | 36 
>> +++
>>  1 file changed, 36 insertions(+)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c 
>> b/drivers/infiniband/hw/hns/hns_roce_main.c
>> index 96fb6a9..7fafe9d 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
>> @@ -33,6 +33,9 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> +#include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -422,6 +425,38 @@ static int hns_roce_port_immutable(struct ib_device 
>> *ib_dev, u8 port_num,
>>  return 0;
>>  }
>>  
>> +static void hns_roce_disassociate_ucontext(struct ib_ucontext *ibcontext)
>> +{
>> +	struct task_struct *process;
>> +	struct mm_struct   *mm;
>> +
>> +	process = get_pid_task(ibcontext->tgid, PIDTYPE_PID);
>> +	if (!process)
>> +		return;
>> +
>> +	mm = get_task_mm(process);
>> +	if (!mm) {
>> +		pr_info("no mm, disassociate ucontext is pending task termination\n");
>> +		while (1) {
>> +			put_task_struct(process);
>> +			usleep_range(1000, 2000);
>> +			process = get_pid_task(ibcontext->tgid, PIDTYPE_PID);
>> +			if (!process || process->state == TASK_DEAD) {
>> +				pr_info("disassociate ucontext done, task was terminated\n");
>> +				/* if task was dead, need to release the task
>> +				 * struct.
>> +				 */
>> +				if (process)
>> +					put_task_struct(process);
>> +				return;
>> +			}
>> +		}
>> +	}
> I don't want to see this boilerplate code copied into every
> driver. Hoist it into the core code, have the disassociate driver callback
> accept a mm_struct parameter, and refactor the other drivers using this.

When a userspace RDMA application process is suspended for some reason
without calling ibv_close_device(), removing the RoCE kernel driver module
with rmmod hangs in the current version, with the call trace below.
It looks like a problem common to every driver, and the code segment
above is suitable for every driver.
Pardon me for asking, but do you have any plan to do this?

root@(none)# rmmod ../ko/hns-roce-hw-v2.ko

[ 1222.676069] INFO: task rmmod:1996 blocked for more than 120 seconds.
[ 1222.682423]       Not tainted 4.16.0-rc1-29112-ge237d0c-dirty #15
[ 1222.688507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1222.696327] rmmod           D    0  1996   1951 0x
[ 1222.701807] Call trace:
[ 1222.704252]  __switch_to+0x9c/0xd8
[ 1222.707644]  __schedule+0x1d8/0x854
[ 1222.711125]  schedule+0x3c/0x9c
[ 1222.714258]  schedule_timeout+0x1dc/0x3f8
[ 1222.718260]  wait_for_common+0x120/0x1e0
[ 1222.722174]  wait_for_completion+0x28/0x34
[ 1222.726264]  ib_uverbs_remove_one+0x29c/0x2bc
[ 1222.730614]  ib_unregister_device+0xe8/0x198
[ 1222.734888]  hns_roce_exit+0xb4/0xc4 [hns_roce]
[ 1222.739414]  hns_roce_hw_v2_uninit_instance+0x24/0x40 [hns_roce_hw_v2]
[ 1222.745934]  hclge_uninit_client_instance+0x88/0xb8
[ 1222.750803]  hnae3_match_n_instantiate+0xbc/0xd0
[ 1222.755411]  hnae3_unregister_client+0x50/

Re: "interesting" entry in hibernation code was Re: [lkp-robot] [x86/asm] 51bad67ffb: int3:#[##]

2018-05-19 Thread Rafael J. Wysocki
On Saturday, May 19, 2018 9:00:08 AM CEST Pavel Machek wrote:
> Hi!
> 
> > Side note: doing some grepping, I find some other sequences that are a bit
> > scary, like this:
> > 
> > arch/x86/kernel/acpi/wakeup_32.S-.data
> > arch/x86/kernel/acpi/wakeup_32.S-ALIGN
> > arch/x86/kernel/acpi/wakeup_32.S:ENTRY(saved_magic) .long   0
> > arch/x86/kernel/acpi/wakeup_32.S:ENTRY(saved_eip)   .long   0
> > 
> > so apparently people are using ENTRY() for data too (the same pattern
> > exists in wakeup_64.S).
> > 
> > So we end up having those odd 0x90 bytes (now 0xcc) in the data section as
> > "padding" between those two values. Crazy.
> 
> Sorry about that. I'm pretty sure intention was simply to use the
> variable from C code.. and ENTRY() worked. I was not aware that it has
> side effect of padding...
> 
> Let me see how this can be improved... (untested).
> 
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S 
> b/arch/x86/kernel/acpi/wakeup_32.S
> index 0c26b1b..d6f477f 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -89,8 +89,8 @@ ret_point:
>  
>  .data
>  ALIGN
> -ENTRY(saved_magic)   .long   0
> -ENTRY(saved_eip) .long   0
> +GLOBAL(saved_magic)  .long   0
> +saved_eip:   .long   0
>  
>  # saved registers
>  saved_idt:   .long   0,0

Jiri Slaby's annotation patches touch this:

https://patchwork.kernel.org/patch/10409073/

Thanks,
Rafael



Re: [PATCH 1/7] x86/boot/compressed/64: Fix trampoline page table address calculation

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> Hugh noticed that I calculate the address of the trampoline page table
> wrongly in cleanup_trampoline(). TRAMPOLINE_32BIT_PGTABLE_OFFSET has to be
> divided by sizeof(unsigned long) since trampoline_32bit is an unsigned
> long pointer.
> 
> TRAMPOLINE_32BIT_PGTABLE_OFFSET is zero, so the bug doesn't have a
> visible effect.
> 
> Signed-off-by: Kirill A. Shutemov 
> Reported-by: Hugh Dickins 
> Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page 
> table for trampoline")

Reviewed-by: Thomas Gleixner 
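A small sketch of the scaling issue, with hypothetical sizes: trampoline is an unsigned long array standing in for trampoline_32bit, and the byte offset is assumed non-zero so the difference shows. With the real offset of zero, both forms produce the same address, which is why the bug had no visible effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: a 4 KiB trampoline area and a non-zero byte offset. */
#define TRAMPOLINE_BYTES	4096u
#define PGTABLE_OFFSET_BYTES	256u

/* Stand-in for trampoline_32bit, which is an unsigned long pointer. */
static unsigned long trampoline[TRAMPOLINE_BYTES / sizeof(unsigned long)];

/* Buggy: pointer arithmetic scales the byte offset by sizeof(unsigned long). */
static unsigned long *pgtable_wrong(void)
{
	return trampoline + PGTABLE_OFFSET_BYTES;
}

/* Fixed: turn the byte offset into an element count first. */
static unsigned long *pgtable_right(void)
{
	return trampoline + PGTABLE_OFFSET_BYTES / sizeof(unsigned long);
}
```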


Re: [PATCH 2/7] x86/mm: Unify pgtable_l5_enabled usage in early boot code

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> Usually pgtable_l5_enabled is defined using cpu_feature_enabled().
> cpu_feature_enabled() is not available in early boot code. We use
> several different preprocessor tricks to get around it. It's messy.
> 
> Unify them all.
> 
> If cpu_feature_enabled() is not yet available, USE_EARLY_PGTABLE_L5 can
> be defined before all includes. It makes pgtable_l5_enabled rely on the
> __pgtable_l5_enabled variable instead. This approach fits all early
> users.
> 
> Signed-off-by: Kirill A. Shutemov 

Reviewed-by: Thomas Gleixner 


Re: [PATCH 3/7] x86/mm: Stop pretending pgtable_l5_enabled is a variable

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> pgtable_l5_enabled is defined using cpu_feature_enabled() but we refer
> to it as a variable. This is misleading.
> 
> Make pgtable_l5_enabled() a function.
> 
> We cannot literally define it as a function due to circular dependencies
> between header files. A function-like macro is close enough.
> 
> Signed-off-by: Kirill A. Shutemov 

Reviewed-by: Thomas Gleixner 
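A sketch of where patches 2 and 3 leave pgtable_l5_enabled(), reduced to plain C. The early-boot variable's initial value is assumed, and the #else branch shows roughly the normal-path definition; it is never expanded here. Because the macro is function-like, writing pgtable_l5_enabled without parentheses no longer compiles, so it cannot be mistaken for a variable.

```c
/*
 * Early-boot translation units define this before any #include, so the
 * headers pick the variable-backed definition below.
 */
#define USE_EARLY_PGTABLE_L5

#include <assert.h>

/* Stand-in for the early-boot copy of the decision (initial value assumed). */
static unsigned int __pgtable_l5_enabled = 1;

#ifdef USE_EARLY_PGTABLE_L5
/* Early code: read the variable; cpu_feature_enabled() is not usable yet. */
#define pgtable_l5_enabled()	__pgtable_l5_enabled
#else
/* Normal code: the CPU-feature check (never expanded in this sketch). */
#define pgtable_l5_enabled()	cpu_feature_enabled(X86_FEATURE_LA57)
#endif
```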


[PATCH 2/2] rtc: mxc_v2: let the core handle rtc range

2018-05-19 Thread Alexandre Belloni
This RTC is a 32-bit second counter.

This also solves an issue where mxc_rtc_set_alarm() can return with the
lock taken.

Signed-off-by: Alexandre Belloni 
---
 drivers/rtc/rtc-mxc_v2.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/rtc/rtc-mxc_v2.c b/drivers/rtc/rtc-mxc_v2.c
index 4cc121a41fe0..24ca74ca632a 100644
--- a/drivers/rtc/rtc-mxc_v2.c
+++ b/drivers/rtc/rtc-mxc_v2.c
@@ -165,11 +165,6 @@ static int mxc_rtc_set_time(struct device *dev, struct 
rtc_time *tm)
time64_t time = rtc_tm_to_time64(tm);
int ret;
 
-   if (time > U32_MAX) {
-   dev_err(dev, "RTC exceeded by %llus\n", time - U32_MAX);
-   return -EINVAL;
-   }
-
ret = mxc_rtc_lock(pdata);
if (ret)
return ret;
@@ -248,11 +243,6 @@ static int mxc_rtc_set_alarm(struct device *dev, struct 
rtc_wkalrm *alrm)
if (ret)
return ret;
 
-   if (time > U32_MAX) {
-   dev_err(dev, "Hopefully I am out of service by then :-(\n");
-   return -EINVAL;
-   }
-
writel((u32)time, pdata->ioaddr + SRTC_LPSAR);
 
/* clear alarm interrupt status bit */
@@ -348,6 +338,7 @@ static int mxc_rtc_probe(struct platform_device *pdev)
return PTR_ERR(pdata->rtc);
 
pdata->rtc->ops = &mxc_rtc_ops;
+   pdata->rtc->range_max = U32_MAX;
 
clk_disable(pdata->clk);
platform_set_drvdata(pdev, pdata);
-- 
2.17.0
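A sketch, under simplified assumptions, of the kind of check the core can perform once the driver sets range_max: an out-of-range time is rejected before the driver callback runs, so it never reaches mxc_rtc_set_alarm() after the lock has been taken. struct rtc_sketch and rtc_check_range_sketch() are hypothetical, not the RTC core's actual code, and the -ERANGE return is this sketch's choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Trimmed-down stand-in for the range fields of struct rtc_device. */
struct rtc_sketch {
	int64_t range_min;
	uint64_t range_max;
};

/* Reject times outside [range_min, range_max] before any driver code runs. */
static int rtc_check_range_sketch(const struct rtc_sketch *rtc, int64_t t)
{
	if (t < rtc->range_min || (uint64_t)t > rtc->range_max)
		return -ERANGE;
	return 0;
}
```

For this driver, a 32-bit seconds counter corresponds to range_min = 0 and range_max = U32_MAX.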



Re: [PATCH 4/7] x86/mm: Introduce 'no5lvl' kernel parameter

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> The kernel parameter allows to force kernel to use 4-level paging even
> if hardware and kernel support 5-level paging.
> 
> The option may be useful to workaround regressions related to 5-level
> paging.
> 
> Signed-off-by: Kirill A. Shutemov 

Reviewed-by: Thomas Gleixner 


Re: [PATCH 5/7] x86/cpu: Move early cpu initialization into a separate translation unit

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> __pgtable_l5_enabled shouldn't be needed after the system has booted, so we
> can mark it as __initdata, but that requires preparation.
> 
> This patch moves early cpu initialization into a separate translation
> unit. This limits the effect of USE_EARLY_PGTABLE_L5 to less code.
> 
> Without the change, cpu_init() uses __pgtable_l5_enabled. cpu_init() is
> not an __init function, so this leads to a section mismatch.
> 
> Signed-off-by: Kirill A. Shutemov 

This makes a lot of sense independent of 5level changes.

Reviewed-by: Thomas Gleixner 


[PATCH 1/2] rtc: mxc_v2: fix possible race condition

2018-05-19 Thread Alexandre Belloni
The IRQ is requested before the struct rtc is allocated and registered, but
this struct is used in the IRQ handler. This may lead to a NULL pointer
dereference.

Switch to devm_rtc_allocate_device/rtc_register_device to allocate the rtc
before requesting the IRQ.

Signed-off-by: Alexandre Belloni 
---
 drivers/rtc/rtc-mxc_v2.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/rtc/rtc-mxc_v2.c b/drivers/rtc/rtc-mxc_v2.c
index 9e14efb990b2..4cc121a41fe0 100644
--- a/drivers/rtc/rtc-mxc_v2.c
+++ b/drivers/rtc/rtc-mxc_v2.c
@@ -343,6 +343,12 @@ static int mxc_rtc_probe(struct platform_device *pdev)
return ret;
}
 
+   pdata->rtc = devm_rtc_allocate_device(&pdev->dev);
+   if (IS_ERR(pdata->rtc))
+   return PTR_ERR(pdata->rtc);
+
+   pdata->rtc->ops = &mxc_rtc_ops;
+
clk_disable(pdata->clk);
platform_set_drvdata(pdev, pdata);
ret =
@@ -354,15 +360,11 @@ static int mxc_rtc_probe(struct platform_device *pdev)
return ret;
}
 
-   pdata->rtc =
-   devm_rtc_device_register(&pdev->dev, pdev->name, &mxc_rtc_ops,
-THIS_MODULE);
-   if (IS_ERR(pdata->rtc)) {
+   ret = rtc_register_device(pdata->rtc);
+   if (ret < 0)
clk_unprepare(pdata->clk);
-   return PTR_ERR(pdata->rtc);
-   }
 
-   return 0;
+   return ret;
 }
 
 static int mxc_rtc_remove(struct platform_device *pdev)
-- 
2.17.0
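A toy model of the race described above, with hypothetical names: request_irq_sketch() simulates an interrupt that is already pending when the handler is installed, so the handler runs before probe continues. If pdata->rtc is still NULL at that point, the real handler would dereference a NULL pointer; allocating the rtc first closes that window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the driver data the IRQ handler relies on. */
struct rtc_dev_sketch { int events; };
struct pdata_sketch { struct rtc_dev_sketch *rtc; };

typedef int (*irq_handler_sketch)(struct pdata_sketch *);

static int mxc_irq_sketch(struct pdata_sketch *pdata)
{
	if (!pdata->rtc)	/* the real handler has no such check: */
		return -1;	/* this is where it would oops */
	pdata->rtc->events++;
	return 0;
}

/* Simulates a pending interrupt firing as soon as the handler is installed. */
static int request_irq_sketch(irq_handler_sketch h, struct pdata_sketch *pdata)
{
	return h(pdata);
}
```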



Re: [PATCH 7/7] x86/mm: Mark __pgtable_l5_enabled __initdata

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> __pgtable_l5_enabled shouldn't be needed after the system has booted.
> All preparation is done. We can now mark it as __initdata.
> 
> Signed-off-by: Kirill A. Shutemov 

Reviewed-by: Thomas Gleixner 


Re: [PATCH 6/7] x86/mm: Mark p4d_offset() __always_inline

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> __pgtable_l5_enabled shouldn't be needed after the system has booted, so we
> can mark it as __initdata, but that requires preparation.
> 
> KASAN initialization code is a user of USE_EARLY_PGTABLE_L5, so all
> pgtable_l5_enabled() uses are translated to __pgtable_l5_enabled there,
> including the one in p4d_offset().
> 
> It may lead to a section mismatch if the compiler does not inline
> p4d_offset() but leaves it as a standalone function: p4d_offset() is not
> marked as __init.
> 
> Marking p4d_offset() as __always_inline fixes the issue.
> 
> Signed-off-by: Kirill A. Shutemov 

Reviewed-by: Thomas Gleixner 


Re: [PATCHv5 0/7] 5-level paging changes for v4.18

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Kirill A. Shutemov wrote:

> Here are several patches that I would like to queue for v4.18. Please review
> and consider applying.
> 
> In this version I've addressed Thomas' feedback.
> 
> Changing __pgtable_l5_enabled to __initdata is not as trivial as I hoped.
> It requires a few tricks to avoid section mismatches. I'm not sure if it is
> worth the gain. We can keep it __ro_after_init.
> 
> If you feel it's too invasive, just drop the last three patches.

Well done. Thanks for cleaning it up.

Thanks,

tglx


[PATCH] rtc: mxc_v2: use rtc_time64_to_tm in mxc_rtc_read_alarm

2018-05-19 Thread Alexandre Belloni
Use the 64-bit version of rtc_time_to_tm in mxc_rtc_read_alarm

Signed-off-by: Alexandre Belloni 
---
 drivers/rtc/rtc-mxc_v2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-mxc_v2.c b/drivers/rtc/rtc-mxc_v2.c
index 24ca74ca632a..c75f26dc8fcc 100644
--- a/drivers/rtc/rtc-mxc_v2.c
+++ b/drivers/rtc/rtc-mxc_v2.c
@@ -193,7 +193,7 @@ static int mxc_rtc_read_alarm(struct device *dev, struct 
rtc_wkalrm *alrm)
if (ret)
return ret;
 
-   rtc_time_to_tm(readl(ioaddr + SRTC_LPSAR), &alrm->time);
+   rtc_time64_to_tm(readl(ioaddr + SRTC_LPSAR), &alrm->time);
alrm->pending = !!(readl(ioaddr + SRTC_LPSR) & SRTC_LPSR_ALP);
return mxc_rtc_unlock(pdata);
 }
-- 
2.17.0



Re: [PATCH 4/5] pinctrl: actions: Add gpio support for Actions S900 SoC

2018-05-19 Thread Christian Lamparter
On Friday, May 18, 2018 4:30:55 AM CEST Manivannan Sadhasivam wrote:
> Add gpio support to pinctrl driver for Actions Semi S900 SoC.
> 
> Signed-off-by: Manivannan Sadhasivam 
> ---
> [...]
> +static int owl_gpio_init(struct owl_pinctrl *pctrl)
> +{
> +	struct gpio_chip *chip;
> +	int ret;
> +
> +	chip = &pctrl->chip;
> +	chip->base = -1;
> +	chip->ngpio = pctrl->soc->ngpios;
> +	chip->label = dev_name(pctrl->dev);
> +	chip->parent = pctrl->dev;
> +	chip->owner = THIS_MODULE;
> +	chip->of_node = pctrl->dev->of_node;
> +
> +	ret = gpiochip_add_data(&pctrl->chip, pctrl);
> +	if (ret) {
> +		dev_err(pctrl->dev, "failed to register gpiochip\n");
> +		return ret;
> +	}
> +
> +	ret = gpiochip_add_pin_range(&pctrl->chip, dev_name(pctrl->dev),
> +				     0, 0, chip->ngpio);
> +	if (ret) {
> +		dev_err(pctrl->dev, "failed to add pin range\n");
> +		gpiochip_remove(&pctrl->chip);
> +		return ret;
> +	}
> +
gpiochip_add_pin_range()? That's not going to work with gpio-hogs. 

But you can easily test this. Just add a gpio-hog [0]
(Section 2. gpio-controller nodes) into the Devicetree's
pinctrl node.

Something like this (no idea if GPIO1 is already used, but any free
GPIO will do):
|   [...]
|   pinctrl@e01b {
|   compatible = "actions,s900-pinctrl";
|   reg = <0x0 0xe01b 0x0 0x1000>;
|   clocks = <&cmu CLK_GPIO>;
|   gpio-controller;
|   #gpio-cells = <2>;
|
|   line_b {
|   gpio-hog;
|   gpios = <1 GPIO_ACTIVE_HIGH>;
|   output-low;
|   line-name = "foo-bar-gpio";
|   };
|   };

The pinctrl probe will fail. You can fix this by
replacing the gpiochip_add_pin_range() and use
the gpio-ranges [0] property to define the range.

[0] 





Re: [PATCHv2] drm/i2c: tda998x: Remove VLA usage

2018-05-19 Thread Russell King - ARM Linux
On Fri, May 18, 2018 at 11:01:55AM -0700, Kees Cook wrote:
> On Tue, Apr 10, 2018 at 6:03 PM, Laura Abbott  wrote:
> > There's an ongoing effort to remove VLAs[1] from the kernel to eventually
> > turn on -Wvla. The VLA in reg_write_range is based on the length of data
> > passed. The one use of a non-constant size for this range is bounded by
> > the size of the buffer passed to hdmi_infoframe_pack, which is a fixed size.
> > Switch to this upper bound.
> >
> > [1] https://lkml.org/lkml/2018/3/7/621
> >
> > Signed-off-by: Laura Abbott 
> 
> Reviewed-by: Kees Cook 
> 
> Same question for this patch: who's best to take this?

I had decided that I'm not taking any tda998x stuff until we get the
CEC support merged upstream, as that has been hanging around for ages.
Progress has been slow on that, but it finally got to the point where
everyone was happy with it, and I sent a pull request to David Airlie
on April 24th for it.

Unfortunately, that pull request has not been actioned to date.  I've
sent a chaser, and last night, I checked with David Airlie on IRC.
It seems David is not aware of my pull request.  David says he'll look
into this on Monday.

Until David does take it, I can't add anything further to my git tree
for tda998x development, as that would change what was sent to David
back in April.

The alternative would be for drm-misc to take it - I don't think it
will conflict with anything I've already asked David to take, so that
should be a safe route for _this_ patch.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
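For readers following the VLA discussion, here is a userspace sketch of the pattern Laura's patch describes: size the stack buffer by a fixed upper bound instead of by the caller's length, and reject anything larger. The name reg_write_range_model(), the bound MAX_WRITE_RANGE_BUF = 32 and the 256-byte register array are assumptions for illustration only; they are not taken from the tda998x driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bound: assumed to be the largest range any caller writes
 * (for example, the fixed-size buffer filled by hdmi_infoframe_pack()). */
#define MAX_WRITE_RANGE_BUF 32

/* Stand-in for the device's register file. */
static unsigned char regs[256];

/*
 * Userspace model of a register-range write that uses a fixed-size stack
 * buffer instead of a VLA sized by 'cnt'. Returns 0 on success, -1 if the
 * request exceeds the bound or the modelled register file.
 */
static int reg_write_range_model(unsigned char reg, const unsigned char *p,
				 size_t cnt)
{
	/* One extra byte for the start address, sent first on an I2C write. */
	unsigned char buf[MAX_WRITE_RANGE_BUF + 1];

	assert(p != NULL);
	if (cnt > MAX_WRITE_RANGE_BUF)
		return -1;	/* reject rather than overflow the stack buffer */
	if ((size_t)reg + cnt > sizeof(regs))
		return -1;

	buf[0] = reg;
	memcpy(&buf[1], p, cnt);

	/* "Transmit": apply the payload to the modelled registers. */
	memcpy(&regs[buf[0]], &buf[1], cnt);
	return 0;
}
```

The stack usage is now known at compile time, which is what -Wvla is meant to enforce; the price is an explicit length check that a VLA never needed.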


Re: [PATCH v3 4/4] arm64: dts: Add Mediatek SoC MT8183 and evaluation board dts and Makefile

2018-05-19 Thread kbuild test robot
Hi Ben,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on robh/for-next]
[also build test ERROR on v4.17-rc5 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Erin-Lo/Add-basic-support-for-Mediatek-MT8183-SoC/20180519-160349
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
config: arm64-alldefconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

>> Error: arch/arm64/boot/dts/mediatek/mt8183.dtsi:137.9-10 syntax error
   FATAL ERROR: Unable to parse input tree

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 4/5] pinctrl: actions: Add gpio support for Actions S900 SoC

2018-05-19 Thread Manivannan Sadhasivam
Hi Christian,

On Sat, May 19, 2018 at 11:18:53AM +0200, Christian Lamparter wrote:
> On Friday, May 18, 2018 4:30:55 AM CEST Manivannan Sadhasivam wrote:
> > Add gpio support to pinctrl driver for Actions Semi S900 SoC.
> > 
> > Signed-off-by: Manivannan Sadhasivam 
> > ---
> > [...]
> > +static int owl_gpio_init(struct owl_pinctrl *pctrl)
> > +{
> > +   struct gpio_chip *chip;
> > +   int ret;
> > +
> > +   chip = &pctrl->chip;
> > +   chip->base = -1;
> > +   chip->ngpio = pctrl->soc->ngpios;
> > +   chip->label = dev_name(pctrl->dev);
> > +   chip->parent = pctrl->dev;
> > +   chip->owner = THIS_MODULE;
> > +   chip->of_node = pctrl->dev->of_node;
> > +
> > +   ret = gpiochip_add_data(&pctrl->chip, pctrl);
> > +   if (ret) {
> > +   dev_err(pctrl->dev, "failed to register gpiochip\n");
> > +   return ret;
> > +   }
> > +
> > +   ret = gpiochip_add_pin_range(&pctrl->chip, dev_name(pctrl->dev),
> > +   0, 0, chip->ngpio);
> > +   if (ret) {
> > +   dev_err(pctrl->dev, "failed to add pin range\n");
> > +   gpiochip_remove(&pctrl->chip);
> > +   return ret;
> > +   }
> > +
> gpiochip_add_pin_range()? That's not going to work with gpio-hogs. 
> 

Hmmm. I just looked into the gpio-hog mechanism and the patch you
implemented for the MSM driver. I agree with you on replacing
gpiochip_add_pin_range() with the gpio-ranges property. But I'm curious
whether we should document it somewhere (probably in [1]).

Anyway, I will send v2 incorporating your suggestion.

Thanks,
Mani

[1] Documentation/devicetree/bindings/gpio/gpio.txt

> But, you can easily test this. Just add a gpio-hog [0] 
> ( Section 2. gpio-controller nodes) into the Devicetree's
> pinctrl node.
> 
> something like: (No idea if GPIO1 is already used, but any free
> gpio will do)
> | [...]
> | pinctrl@e01b {
> | compatible = "actions,s900-pinctrl";
> | reg = <0x0 0xe01b 0x0 0x1000>;
> | clocks = <&cmu CLK_GPIO>;
> | gpio-controller;
> | #gpio-cells = <2>;
> |
> | line_b {
> | gpio-hog;
> | gpios = <1 GPIO_ACTIVE_HIGH>;
> | output-low;
> | line-name = "foo-bar-gpio";
> | };
> | };
> 
> The pinctrl probe will fail. You can fix this by
> replacing the gpiochip_add_pin_range() and use
> the gpio-ranges [0] property to define the range.
> 
> [0] 
> 
> 
> 
> 


Re: [PATCH] net: sched: don't disable bh when accessing action idr

2018-05-19 Thread Vlad Buslov

On Sat 19 May 2018 at 02:59, Cong Wang  wrote:
> On Fri, May 18, 2018 at 8:45 AM, Vlad Buslov  wrote:
>> Underlying implementation of action map has changed and doesn't require
>> disabling bh anymore. Replace all action idr spinlock usage with regular
>> calls that do not disable bh.
>
> Please explain explicitly why it is not required, don't let people
> dig, this would save everyone's time.

The underlying implementation of action lookup has changed from a hashtable
to an idr. Every current action implementation just calls the act_api lookup
function instead of implementing its own. I asked the author of the idr
change whether there is a reason to continue using the _bh versions, and he
replied that he just left them as-is.

>
> Also, this should be targeted for net-next, right?

Right.

>
> Thanks.



[tip:x86/urgent] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-19 Thread tip-bot for Dmitry Safonov
Commit-ID:  5f7633ba75acd80dd1dd4ef408bc6d98f9e2b194
Gitweb: https://git.kernel.org/tip/5f7633ba75acd80dd1dd4ef408bc6d98f9e2b194
Author: Dmitry Safonov 
AuthorDate: Fri, 18 May 2018 00:35:10 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 12:22:24 +0200

x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64-bit syscalls it selects mm->mmap_base and
for 32-bit ones mm->mmap_compat_base.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32    child->thread.status & TS_COMPAT
   x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64    !ia32 && !x32

__set_personality_x32() drops the TS_COMPAT flag, but set_personality_64bit()
keeps it, which makes in_compat_syscall() return true during the first exec()
syscall.

This has user-visible effects, as reported by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x7fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
0x0040-0x00401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0x0060-0x00601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0x00601000-0x00602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0xf7dbd000-0xf7de2000   /lib64/ld-2.27.so
0xf7fe2000-0xf7fe3000   /lib64/ld-2.27.so
0xf7fe3000-0xf7fe4000   /lib64/ld-2.27.so
0xf7fe4000-0xf7fe5000
0x7fed9abff000-0x7fed9af54000
0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
[snip]

2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).

The testcase:
$ cat wrap.c
#include <unistd.h>

int main(int argc, char *argv[]) {
  execvp(argv[1], &argv[1]);
  return 127;
}

$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE: 0x7f63b8309000
AT_BASE: 0x7faec143c000
AT_BASE: 0x7fbdb25fa000

$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE: 0xf7eff000
AT_BASE: 0xf7cee000
AT_BASE: 0x7f8b9774e000

Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev 
Bisected-by: Alexander Monakov 
Investigated-by: Andy Lutomirski 
Signed-off-by: Dmitry Safonov 
Signed-off-by: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Alexander Monakov 
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Cc: sta...@vger.kernel.org
Cc: linux...@kvack.org
Cc: Andy Lutomirski 
Cc: "H. Peter Anvin" 
Cc: Cyrill Gorcunov 
Cc: "Kirill A. Shutemov" 
Link: https://lkml.kernel.org/r/20180517233510.24996-1-d...@arista.com

---
 arch/x86/kernel/process_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4b100fe0f508..12bb445fb98d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -542,6 +542,7 @@ void set_personality_64bit(void)
clear_thread_flag(TIF_X32);
/* Pretend that this comes from a 64bit execve */
task_pt_regs(current)->orig_ax = __NR_execve;
+   current_thread_info()->status &= ~TS_COMPAT;
 
/* Ensure the corresponding mm is not marked. */
if (current->mm)


[tip:x86/urgent] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-19 Thread tip-bot for Dmitry Safonov
Commit-ID:  acf46020012ccbca1172e9c7aeab399c950d9212
Gitweb: https://git.kernel.org/tip/acf46020012ccbca1172e9c7aeab399c950d9212
Author: Dmitry Safonov 
AuthorDate: Fri, 18 May 2018 00:35:10 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 12:31:05 +0200

x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64-bit syscalls it selects mm->mmap_base and
for 32-bit ones mm->mmap_compat_base.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32    child->thread.status & TS_COMPAT
   x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64    !ia32 && !x32

__set_personality_x32() drops the TS_COMPAT flag, but set_personality_64bit()
keeps it, which makes in_compat_syscall() return true during the first exec()
syscall.

This has user-visible effects, as reported by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x7fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
0x0040-0x00401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0x0060-0x00601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0x00601000-0x00602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
0xf7dbd000-0xf7de2000   /lib64/ld-2.27.so
0xf7fe2000-0xf7fe3000   /lib64/ld-2.27.so
0xf7fe3000-0xf7fe4000   /lib64/ld-2.27.so
0xf7fe4000-0xf7fe5000
0x7fed9abff000-0x7fed9af54000
0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
[snip]

2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).

The testcase:
$ cat wrap.c
#include <unistd.h>

int main(int argc, char *argv[]) {
  execvp(argv[1], &argv[1]);
  return 127;
}

$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE: 0x7f63b8309000
AT_BASE: 0x7faec143c000
AT_BASE: 0x7fbdb25fa000

$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE: 0xf7eff000
AT_BASE: 0xf7cee000
AT_BASE: 0x7f8b9774e000

Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev 
Bisected-by: Alexander Monakov 
Investigated-by: Andy Lutomirski 
Signed-off-by: Dmitry Safonov 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Cyrill Gorcunov 
Cc: Borislav Petkov 
Cc: Alexander Monakov 
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Cc: sta...@vger.kernel.org
Cc: linux...@kvack.org
Cc: Andy Lutomirski 
Cc: "H. Peter Anvin" 
Cc: Cyrill Gorcunov 
Cc: "Kirill A. Shutemov" 
Link: https://lkml.kernel.org/r/20180517233510.24996-1-d...@arista.com
---
 arch/x86/kernel/process_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4b100fe0f508..12bb445fb98d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -542,6 +542,7 @@ void set_personality_64bit(void)
clear_thread_flag(TIF_X32);
/* Pretend that this comes from a 64bit execve */
task_pt_regs(current)->orig_ax = __NR_execve;
+   current_thread_info()->status &= ~TS_COMPAT;
 
/* Ensure the corresponding mm is not marked. */
if (current->mm)


Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-19 Thread Marc Zyngier
On Fri, 18 May 2018 19:31:50 +0100,
Nick Desaulniers wrote:
> 
> On Fri, May 18, 2018 at 11:13 AM Marc Zyngier  wrote:
> > What I'd really like is to apply that patch knowing that:
> 
> > - you have checked that with a released version of the compiler, you
> > don't observe any absolute address in any of the objects that are going
> > to be executed at EL2 on a mainline kernel,
> 
> To verify, we should disassemble objects from arch/arm64/kvm/hyp/*.o and
> make sure we don't see absolute addresses?  I can work with Sami to get a
> sense of what the before and after of this patch looks like in disassembly,
> then verify those changes are pervasive.

That seems sensible. You definitely want to look for things stored in
constant pools and subsequently used as an address. Also, you may have
to look at the .hyp.text section of the vmlinux binary, rather than
the individual *.o files, as the linker will likely rewrite things
(the compiler doesn't know about the kernel link address).

> > - you have successfully run guests with a mainline kernel,
> 
> I believe Andrey has already done this.  If he can verify (maybe
> during working hours next week), then maybe we can add his Tested-by
> to this patch's commit message?

That would definitely be the right thing to do. Make sure you (or
Andrey) test with the latest released mainline kernel (4.16 for now)
or (even better) the tip of Linus' tree.

> > - it works for a reasonable set of common kernel configurations
> > (defconfig and some of the most useful debug options),
> 
> It's easy for us to test our kernel configs for Android, ChromeOS,
> and defconfig.  I'd be curious to know the shortlist of "most useful
> debug options" just to be a better kernel developer, personally.

Activate the various sanitizers, and all the tracing options, for a
start. They are the most likely to do weird things...

> > - I can reproduce your findings with the same released compiler.
> 
> Let's wait for Andrey to confirm his test setup.  On the Android side, I
> think you should be able to get by with a released version, but I'd be
> curious to hear from Andrey.

Android has all kinds of additional patches, and I'm solely concerned
with mainline. If it is what Andrey runs, that's great.

> > Is that the case? I don't think any of the above is completely outlandish.
> 
> These are all reasonable. Thanks for the feedback.

Cheers,

M.

-- 
Jazz is not dead, it just smell funny.


Re: [PATCH] irqchip: dw-apb-ictl: switch to SPDX license identifier

2018-05-19 Thread Thomas Gleixner
On Wed, 16 May 2018, Jisheng Zhang wrote:

> Use the appropriate SPDX license identifier and drop the previous
> license text.

CC+: Sebastian.

> Signed-off-by: Jisheng Zhang 
> ---
>  drivers/irqchip/irq-dw-apb-ictl.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-dw-apb-ictl.c b/drivers/irqchip/irq-dw-apb-ictl.c
> index 0a19618ce2c8..fdc5c7d19651 100644
> --- a/drivers/irqchip/irq-dw-apb-ictl.c
> +++ b/drivers/irqchip/irq-dw-apb-ictl.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0
>  /*
>   * Synopsys DW APB ICTL irqchip driver.
>   *
> @@ -5,10 +6,6 @@
>   *
>   * based on GPL'ed 2.6 kernel sources
>   *  (c) Marvell International Ltd.
> - *
> - * This file is licensed under the terms of the GNU General Public
> - * License version 2.  This program is licensed "as is" without any
> - * warranty of any kind, whether express or implied.
>   */
>  
>  #include 
> -- 
> 2.17.0
> 
> 


[PATCH 04/18] tools lib api fs tracing_path: Introduce get/put_events_file() helpers

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

To make reading events files a tad more compact than with
get_tracing_file("events/foo").

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-do6xgtwpmfl8zjs1euxsd...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c| 15 +++
 tools/lib/api/fs/tracing_path.h|  5 +
 tools/perf/util/trace-event-info.c | 11 +--
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 6f5fe942eff4..9cd282425929 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -86,6 +86,21 @@ void put_tracing_file(char *file)
free(file);
 }
 
+char *get_events_file(const char *name)
+{
+   char *file;
+
+   if (asprintf(&file, "%s/events/%s", tracing_path_mount(), name) < 0)
+   return NULL;
+
+   return file;
+}
+
+void put_events_file(char *file)
+{
+   free(file);
+}
+
 int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
   const char *sys, const char *name)
 {
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 1b65decedfc0..3b32fb439f12 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -12,5 +12,10 @@ const char *tracing_path_mount(void);
 char *get_tracing_file(const char *name);
 void put_tracing_file(char *file);
 
+char *get_events_file(const char *name);
+void put_events_file(char *file);
+
+#define zput_events_file(ptr) ({ free(*ptr); *ptr = NULL; })
+
 int tracing_path__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name);
 #endif /* __API_FS_TRACING_PATH_H */
diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index d7f2113462fb..c85d0d1a65ed 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -103,11 +103,10 @@ static int record_file(const char *file, ssize_t hdr_sz)
 
 static int record_header_files(void)
 {
-   char *path;
+   char *path = get_events_file("header_page");
struct stat st;
int err = -EIO;
 
-   path = get_tracing_file("events/header_page");
if (!path) {
pr_debug("can't get tracing/events/header_page");
return -ENOMEM;
@@ -128,9 +127,9 @@ static int record_header_files(void)
goto out;
}
 
-   put_tracing_file(path);
+   put_events_file(path);
 
-   path = get_tracing_file("events/header_event");
+   path = get_events_file("header_event");
if (!path) {
pr_debug("can't get tracing/events/header_event");
err = -ENOMEM;
@@ -154,7 +153,7 @@ static int record_header_files(void)
 
err = 0;
 out:
-   put_tracing_file(path);
+   put_events_file(path);
return err;
 }
 
@@ -243,7 +242,7 @@ static int record_ftrace_files(struct tracepoint_path *tps)
char *path;
int ret;
 
-   path = get_tracing_file("events/ftrace");
+   path = get_events_file("ftrace");
if (!path) {
pr_debug("can't get tracing/events/ftrace");
return -ENOMEM;
-- 
2.14.3
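A userspace sketch of what get_events_file()/put_events_file() do, handy for trying the helpers outside of perf. The mount point is passed in explicitly because tracing_path_mount() is not available standalone; the *_model names are hypothetical, and the clearing macro mirrors the shape of zput_events_file() (a GNU C statement expression, so gcc or clang is assumed).

```c
#define _GNU_SOURCE /* for asprintf() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<mount>/events/<name>"; the caller frees the result. */
static char *get_events_file_model(const char *mount, const char *name)
{
	char *file;

	assert(mount != NULL && name != NULL);
	if (asprintf(&file, "%s/events/%s", mount, name) < 0)
		return NULL;
	return file;
}

static void put_events_file_model(char *file)
{
	free(file);
}

/* Same shape as zput_events_file(): free and clear the caller's pointer. */
#define zput_events_file_model(ptr) ({ free(*(ptr)); *(ptr) = NULL; })
```

Callers then pass "header_page" or "ftrace" instead of spelling out "events/header_page", which is the compaction the commit message is after.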



[PATCH 07/18] tools lib api fs tracing_path: Introduce opendir() method

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

That takes care of using the right call to get the tracing_path
directory, the one that will end up calling tracing_path_set() to figure
out where tracefs is mounted.

One more step towards reading system structures lazily, to reduce
the number of operations done unconditionally at 'perf' start.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-42zzi0f274909bg9mxzl8...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 13 +
 tools/lib/api/fs/tracing_path.h |  3 +++
 tools/perf/tests/parse-events.c |  2 +-
 tools/perf/util/parse-events.c  |  8 
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 9cd282425929..9b451af0721c 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -101,6 +101,19 @@ void put_events_file(char *file)
free(file);
 }
 
+DIR *tracing_events__opendir(void)
+{
+   DIR *dir = NULL;
+   char *path = get_tracing_file("events");
+
+   if (path) {
+   dir = opendir(path);
+   put_events_file(path);
+   }
+
+   return dir;
+}
+
 int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
   const char *sys, const char *name)
 {
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 3b32fb439f12..904d085b2ae7 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -3,9 +3,12 @@
 #define __API_FS_TRACING_PATH_H
 
 #include 
+#include 
 
 extern char tracing_events_path[];
 
+DIR *tracing_events__opendir(void);
+
 void tracing_path_set(const char *mountpoint);
 const char *tracing_path_mount(void);
 
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 6d57d7082637..b9ebe15afb13 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1323,7 +1323,7 @@ static int count_tracepoints(void)
DIR *events_dir;
int cnt = 0;
 
-   events_dir = opendir(tracing_events_path);
+   events_dir = tracing_events__opendir();
 
TEST_ASSERT_VAL("Can't open events dir", events_dir);
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 24668300b327..15eec49e71a1 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -191,7 +191,7 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
char evt_path[MAXPATHLEN];
char *dir_path;
 
-   sys_dir = opendir(tracing_events_path);
+   sys_dir = tracing_events__opendir();
if (!sys_dir)
return NULL;
 
@@ -578,7 +578,7 @@ static int add_tracepoint_multi_sys(struct list_head *list, int *idx,
DIR *events_dir;
int ret = 0;
 
-   events_dir = opendir(tracing_events_path);
+   events_dir = tracing_events__opendir();
if (!events_dir) {
tracepoint_error(err, errno, sys_name, evt_name);
return -1;
@@ -2106,7 +2106,7 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob,
bool evt_num_known = false;
 
 restart:
-   sys_dir = opendir(tracing_events_path);
+   sys_dir = tracing_events__opendir();
if (!sys_dir)
return;
 
@@ -2200,7 +2200,7 @@ int is_valid_tracepoint(const char *event_string)
char evt_path[MAXPATHLEN];
char *dir_path;
 
-   sys_dir = opendir(tracing_events_path);
+   sys_dir = tracing_events__opendir();
if (!sys_dir)
return 0;
 
-- 
2.14.3
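The point of tracing_events__opendir() is that callers stop building the events path themselves. A minimal userspace sketch of that shape follows; the mount point is an explicit parameter (an assumption, since the real helper resolves it internally), and opendir_under(), tracing_events_opendir_model() and count_entries() are hypothetical names. count_entries() skips "." and ".." as a directory walker typically must.

```c
#define _GNU_SOURCE /* for asprintf() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Open "<base>/<sub>"; returns NULL (errno set by opendir()) on failure. */
static DIR *opendir_under(const char *base, const char *sub)
{
	DIR *dir;
	char *path;

	if (asprintf(&path, "%s/%s", base, sub) < 0)
		return NULL;
	dir = opendir(path);
	free(path);	/* the DIR stream does not keep a reference to path */
	return dir;
}

/* Hypothetical model of tracing_events__opendir(): callers no longer
 * build "<mount>/events" themselves. */
static DIR *tracing_events_opendir_model(const char *mount)
{
	return opendir_under(mount, "events");
}

/* Count directory entries, skipping "." and "..". */
static int count_entries(DIR *dir)
{
	struct dirent *d;
	int n = 0;

	assert(dir != NULL);
	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") && strcmp(d->d_name, ".."))
			n++;
	}
	return n;
}
```

On a typical system the mount would be something like /sys/kernel/debug/tracing; a NULL return should be reported with errno, as add_tracepoint_multi_sys() does via tracepoint_error().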



[PATCH 13/18] perf script: Show symbol offsets by default

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Sandipan Das 

Since the ip shown for a symbol is now always a virtual address, it
becomes difficult to correlate this with objdump output and determine
the exact instruction address. So, we always show the offset from the
start of the symbol.

This can be verified on a powerpc64le system running Fedora 27 as
follows:

  # perf probe -a sys_write
  # perf record -e probe:sys_write -g ~/test

Before applying this patch:

  # perf script

  test  9710 [013] 95614.332431: probe:sys_write: (c04025b0)
  c04025b0 sys_write (/lib/modules/4.17.0-rc4+/build/vmlinux)
  c000b9e0 system_call (/lib/modules/4.17.0-rc4+/build/vmlinux)
  7fffb70d8234 __GI___libc_write (/usr/lib64/libc-2.26.so)
  7fffb7052c74 _IO_file_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
  5afc1818 [unknown] ([unknown])
  7fffb7051a60 new_do_write (/usr/lib64/libc-2.26.so)
  7fffb7054638 _IO_do_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
  7fffb7054bbc _IO_file_overflow@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
  7fffb7055a24 __overflow (/usr/lib64/libc-2.26.so)
  7fffb7044548 _IO_puts (/usr/lib64/libc-2.26.so)
  1440 main (/home/sandipan/test)
  7fffb6fe36a0 generic_start_main.isra.0 (/usr/lib64/libc-2.26.so)
  7fffb6fe3898 __libc_start_main (/usr/lib64/libc-2.26.so)
 0 [unknown] ([unknown])
  ...

After applying this patch:

  # perf script

  test  9710 [013] 95614.332431: probe:sys_write: (c04025b0)
  c04025b0 sys_write+0x10 (/lib/modules/4.17.0-rc4+/build/vmlinux)
  c000b9e0 system_call+0x58 (/lib/modules/4.17.0-rc4+/build/vmlinux)
  7fffb70d8234 __GI___libc_write+0x24 (/usr/lib64/libc-2.26.so)
  7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44 (/usr/lib64/libc-2.26.so)
  5afc1818 [unknown] ([unknown])
  7fffb7051a60 new_do_write+0x90 (/usr/lib64/libc-2.26.so)
  7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38 (/usr/lib64/libc-2.26.so)
  7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c (/usr/lib64/libc-2.26.so)
  7fffb7055a24 __overflow+0x64 (/usr/lib64/libc-2.26.so)
  7fffb7044548 _IO_puts+0x218 (/usr/lib64/libc-2.26.so)
  1440 main+0x20 (/home/sandipan/test)
  7fffb6fe36a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
  7fffb6fe3898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
 0 [unknown] ([unknown])
  ...
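The new default output is the sample address minus the start of the resolved symbol, printed in hex. A small userspace sketch of that formatting follows; struct sym_model and format_sym_offset() are hypothetical and do not reflect perf's internal types.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol record: [start, end) in the same address space as ip. */
struct sym_model {
	const char *name;
	uint64_t start;
	uint64_t end;
};

/* Format "name+0xoff" (offset from the symbol start), or "[unknown]". */
static int format_sym_offset(char *buf, size_t len,
			     const struct sym_model *sym, uint64_t ip)
{
	assert(buf != NULL && len > 0);
	if (sym == NULL || ip < sym->start || ip >= sym->end)
		return snprintf(buf, len, "[unknown]");
	return snprintf(buf, len, "%s+0x%" PRIx64, sym->name, ip - sym->start);
}
```

For example, with a made-up symbol main spanning [0x1420, 0x1500), a sample at 0x1440 formats as main+0x20, which is the form of the main line in the output above.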

Signed-off-by: Sandipan Das 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Naveen N. Rao 
Cc: Ravi Bangoria 
Link: http://lkml.kernel.org/r/20180517063326.6319-2-sandi...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c| 26 --
 .../tests/shell/record+probe_libc_inet_pton.sh | 12 +-
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index fa2c7a288750..cefc8813e91e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -153,8 +153,8 @@ static struct {
.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
- PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
- PERF_OUTPUT_PERIOD,
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
 
.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
},
@@ -165,8 +165,9 @@ static struct {
.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
- PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
- PERF_OUTPUT_PERIOD | PERF_OUTPUT_BPF_OUTPUT,
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD |
+ PERF_OUTPUT_BPF_OUTPUT,
 
.invalid_fields = PERF_OUTPUT_TRACE,
},
@@ -185,10 +186,10 @@ static struct {
.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
- PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
- PERF_OUTPUT_PERIOD |  PERF_OUTPUT_ADDR |
- PERF_OUTPUT_DATA_SRC | PERF_OUTPUT_WEIGHT |
- PERF_OUTPUT_PHYS_ADDR,
+  

[GIT PULL 00/18] perf/core improvements and fixes

2018-05-19 Thread Arnaldo Carvalho de Melo
Hi Ingo,

Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 5aafae8d097e2161ee5c6a12ad532100f8885d2b:

  Merge tag 'perf-core-for-mingo-4.18-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2018-05-16 17:56:43 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.18-20180519

for you to fetch changes up to 19422a9f2a3be7f3a046285ffae4cbb571aa853a:

  perf tools: Fix kernel_start for PTI on x86 (2018-05-19 06:42:51 -0300)


perf/core improvements and fixes:

- Record min/max LBR cycles (>= skylake) and add 'perf annotate' TUI
  hotkey to show it (c) (Jin Yao)

- Fix machine->kernel_start for PTI on x86 (Adrian Hunter)

- Make machine->env->arch always available, e.g. in 'perf top', not
  just when reading that info from perf.data files (Adrian Hunter)

- Reduce the number of files read at 'perf' start, leaving information such as
  cacheline size, tracefs mount point determination, max_stack, etc, to be
  lazily read as tools need them (Arnaldo Carvalho de Melo)

- Fixup BPF include and examples install messages (Arnaldo Carvalho de Melo)

- Fixup callchain addresses and symbol offsets in 'perf script', to help
  correlate them with objdump output (Sandipan Das)

Signed-off-by: Arnaldo Carvalho de Melo 


Adrian Hunter (2):
  perf machine: Add machine__is() to identify machine arch
  perf tools: Fix kernel_start for PTI on x86

Arnaldo Carvalho de Melo (12):
  perf config: Call perf_config__init() lazily
  tools lib api: The tracing_mnt variable doesn't need to be global
  tools lib api: Unexport 'tracing_path' variable
  tools lib api fs tracing_path: Introduce get/put_events_file() helpers
  perf tools: Reuse the path to the tracepoint /events/ directory
  perf parse-events: Use get/put_events_file()
  tools lib api fs tracing_path: Introduce opendir() method
  tools lib api fs tracing_path: Make tracing_events_path private
  tools include compiler-gcc: Add __pure attribute helper
  perf tools: Read the cache line size lazily
  perf tools: No need to unconditionally read the max_stack sysctls
  perf bpf: Fixup include and examples install messages

Jin Yao (2):
  perf annotate: Record the min/max cycles
  perf annotate: Create hotkey 'c' to show min/max cycles

Sandipan Das (2):
  perf script: Show virtual addresses instead of offsets
  perf script: Show symbol offsets by default

 tools/include/linux/compiler-gcc.h |  3 +
 tools/lib/api/fs/tracing_path.c| 40 +---
 tools/lib/api/fs/tracing_path.h|  9 ++-
 tools/perf/Makefile.perf   |  2 +
 tools/perf/builtin-script.c| 26 
 tools/perf/builtin-top.c   |  2 +-
 tools/perf/builtin-trace.c |  2 +-
 tools/perf/perf.c  | 24 +--
 tools/perf/tests/parse-events.c|  9 +--
 .../tests/shell/record+probe_libc_inet_pton.sh | 12 ++--
 tools/perf/ui/browsers/annotate.c  |  8 +++
 tools/perf/util/annotate.c | 51 ---
 tools/perf/util/annotate.h | 11 +++-
 tools/perf/util/config.c   | 16 ++---
 tools/perf/util/config.h   |  1 -
 tools/perf/util/env.c  | 18 ++
 tools/perf/util/env.h  |  2 +
 tools/perf/util/evsel.c|  2 +-
 tools/perf/util/machine.c  | 18 +-
 tools/perf/util/machine.h  |  2 +
 tools/perf/util/parse-events.c | 73 +-
 tools/perf/util/probe-file.c   |  3 +-
 tools/perf/util/sort.c |  4 +-
 tools/perf/util/sort.h |  4 +-
 tools/perf/util/trace-event-info.c | 11 ++--
 tools/perf/util/trace-event.c  |  8 ++-
 tools/perf/util/util.c | 34 +-
 tools/perf/util/util.h |  4 +-
 28 files changed, 279 insertions(+), 120 deletions(-)

Test results:

The first ones are container (docker) based builds of tools/perf with
and without libelf support.  Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.

The objtool and samples/bpf/ builds are disabled now t

[PATCH 16/18] perf bpf: Fixup include and examples install messages

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

Before:

  INSTALL  lib
install include/bpf/*.h '/home/acme/lib/include/perf/bpf'
  INSTALL  lib
install examples/bpf/*.c '/home/acme/lib/examples/perf/bpf'

After:

  INSTALL  lib
  INSTALL  include/bpf
  INSTALL  lib
  INSTALL  examples/bpf

Reported-by: Ingo Molnar 
Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Fixes: dd8e4ead6e98 ("perf bpf: Add bpf.h to be used in eBPF proggies")
Fixes: 8f12a2ff00e5 ("perf bpf: Add 'examples' directories")
Link: https://lkml.kernel.org/n/tip-icljqe87e8pak8mu6mkki...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile.perf | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index c63a3971d719..ecc9fc952655 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -770,9 +770,11 @@ endif
 ifndef NO_LIBBPF
$(call QUIET_INSTALL, lib) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf'
+   $(call QUIET_INSTALL, include/bpf) \
$(INSTALL) include/bpf/*.h '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf'
$(call QUIET_INSTALL, lib) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perf_examples_instdir_SQ)/bpf'
+   $(call QUIET_INSTALL, examples/bpf) \
$(INSTALL) examples/bpf/*.c '$(DESTDIR_SQ)$(perf_examples_instdir_SQ)/bpf'
 endif
$(call QUIET_INSTALL, perf-archive) \
-- 
2.14.3



[PATCH 17/18] perf machine: Add machine__is() to identify machine arch

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Adrian Hunter 

Add a function to identify the machine architecture.

Signed-off-by: Adrian Hunter 
Tested-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Joerg Roedel 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: x...@kernel.org
Link: http://lkml.kernel.org/r/1526548928-20790-6-git-send-email-adrian.hun...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/env.c | 18 ++
 tools/perf/util/env.h |  2 ++
 tools/perf/util/machine.c |  9 +
 tools/perf/util/machine.h |  2 ++
 4 files changed, 31 insertions(+)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4c842762e3f2..319fb0a0d05e 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -93,6 +93,24 @@ int perf_env__read_cpu_topology_map(struct perf_env *env)
return 0;
 }
 
+static int perf_env__read_arch(struct perf_env *env)
+{
+   struct utsname uts;
+
+   if (env->arch)
+   return 0;
+
+   if (!uname(&uts))
+   env->arch = strdup(uts.machine);
+
+   return env->arch ? 0 : -ENOMEM;
+}
+
+const char *perf_env__raw_arch(struct perf_env *env)
+{
+   return env && !perf_env__read_arch(env) ? env->arch : "unknown";
+}
+
 void cpu_cache_level__free(struct cpu_cache_level *cache)
 {
free(cache->type);
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c4ef2e523367..62e193948608 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -76,4 +76,6 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
 const char *perf_env__arch(struct perf_env *env);
+const char *perf_env__raw_arch(struct perf_env *env);
+
 #endif /* __PERF_ENV_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7c777cb32806..107bae7676b1 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2296,6 +2296,15 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
return 0;
 }
 
+/*
+ * Compares the raw arch string. N.B. see instead perf_env__arch() if a
+ * normalized arch is needed.
+ */
+bool machine__is(struct machine *machine, const char *arch)
+{
+   return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
+}
+
 int machine__get_kernel_start(struct machine *machine)
 {
struct map *map = machine__kernel_map(machine);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 388fb4741c54..b31d33b5aa2a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -188,6 +188,8 @@ static inline bool machine__is_host(struct machine *machine)
return machine ? machine->pid == HOST_KERNEL_ID : false;
 }
 
+bool machine__is(struct machine *machine, const char *arch);
+
 struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
 
-- 
2.14.3
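Here is a minimal user-space sketch of the same idea. It assumes a stripped-down stand-in for struct perf_env; the env_sketch type and the env_* helpers are hypothetical and are not perf's API. The raw architecture string is read once from uname(2), cached, and then compared verbatim. As a result, "x86_64" matches, but a normalized name such as "x86" would not.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* strdup() under strict ISO C modes */
#endif
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for struct perf_env: only the cached arch string. */
struct env_sketch {
	char *arch; /* NULL until first looked up */
};

/* Look up the machine string via uname(2) once; 0 on success, -1 on failure. */
static int env_read_arch(struct env_sketch *env)
{
	struct utsname uts;

	if (env->arch)
		return 0; /* already cached */

	if (uname(&uts) == 0)
		env->arch = strdup(uts.machine);

	return env->arch ? 0 : -1;
}

/* Raw, non-normalized arch (e.g. "x86_64"), or "unknown" if unavailable. */
static const char *env_raw_arch(struct env_sketch *env)
{
	return env && env_read_arch(env) == 0 ? env->arch : "unknown";
}

/* Exact string match against the raw arch. */
static bool env_is_arch(struct env_sketch *env, const char *arch)
{
	return env && strcmp(env_raw_arch(env), arch) == 0;
}
```

Because the string is cached, repeated checks cost only a strcmp() after the first call.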



[PATCH 15/18] perf annotate: Create hotkey 'c' to show min/max cycles

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Jin Yao 

In the 'perf annotate' view, a new hotkey 'c' is created for showing the
min/max cycles.

For example, when 'c' is pressed, the annotate view is:

  Percent│ IPC Cycle(min/max)
         │
         │
         │                 Disassembly of section .text:
         │
         │                 0003aab0 :
    8.22 │3.92               sub    $0x18,%rsp
         │3.92               mov    $0x1,%esi
         │3.92               xor    %eax,%eax
         │3.92               cmpl   $0x0,argp_program_version_hook@@G
         │3.92 1(2/1)      ↓ je     20
         │                   lock   cmpxchg %esi,__abort_msg@@GLIBC_P
         │                 ↓ jne    29
         │                 ↓ jmp    43
         │1.10       20:     cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
    8.93 │1.10 1(5/1)      ↓ je     43

When 'c' is pressed again, the annotate view switches back:

  Percent│ IPC Cycle
         │
         │
         │              Disassembly of section .text:
         │
         │              0003aab0 :
    8.22 │3.92            sub    $0x18,%rsp
         │3.92            mov    $0x1,%esi
         │3.92            xor    %eax,%eax
         │3.92            cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
         │3.92 1        ↓ je     20
         │                lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
         │              ↓ jne    29
         │              ↓ jmp    43
         │1.10    20:     cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
    8.93 │1.10 1        ↓ je     43

Signed-off-by: Jin Yao 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao@linux.intel.com
[ Rename all maxmin to minmax ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/ui/browsers/annotate.c |  8 
 tools/perf/util/annotate.c| 37 +++--
 tools/perf/util/annotate.h|  7 ++-
 3 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 3781d74088a7..8be40fa903aa 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -695,6 +695,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
"O Bump offset level (jump targets -> +call -> all -> cycle thru)\n"
"s Toggle source code view\n"
"t Circulate percent, total period, samples view\n"
+   "c Show min/max cycle\n"
"/ Search string\n"
"k Toggle line numbers\n"
"P Print to [symbol_name].annotation file.\n"
@@ -791,6 +792,13 @@ static int annotate_browser__run(struct annotate_browser *browser,
notes->options->show_total_period = true;
annotation__update_column_widths(notes);
continue;
+   case 'c':
+   if (notes->options->show_minmax_cycle)
+   notes->options->show_minmax_cycle = false;
+   else
+   notes->options->show_minmax_cycle = true;
+   annotation__update_column_widths(notes);
+   continue;
case K_LEFT:
case K_ESC:
case 'q':
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4fcfefea3bc2..6612c7f90af4 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2498,13 +2498,38 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
else
obj__printf(obj, "%*s ", ANNOTATION__IPC_WIDTH - 1, "IPC");
 
-   if (al->cycles)
-   obj__printf(obj, "%*" PRIu64 " ",
+   if (!notes->options->show_minmax_cycle) {
+   if (al->cycles)
+   obj__printf(obj, "%*" PRIu64 " ",
   ANNOTATION__CYCLES_WIDTH - 1, al->cycles);
-   else if (!show_title)
-   obj__printf(obj, "%*s", ANNOTATION__CYCLES_WIDTH, " ");
-   else
-   obj__printf(obj, "%*s ", ANNOTATION__CYCLES_WIDTH - 1, "Cycle");
+   else if (!show_title)
+   obj__printf(obj, "%*s",
+   ANNOTATION__CYCLES_WIDTH, " ");
+   else
+   obj__printf(obj, "%*s ",
+   ANNOTATION__CYCLES_WIDTH - 1,
+   

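The toggle and the two column formats can be sketched as follows. This is only an illustration. The names struct line_cycles, toggle_minmax() and format_cycles() are hypothetical, and the sketch assumes that the Cycle(min/max) column prints the average followed by the minimum and maximum in parentheses. That layout is inferred from the example views in the commit message rather than taken from the (truncated) patch.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-line cycle data (average, minimum, maximum). */
struct line_cycles {
	uint64_t avg;
	uint64_t min;
	uint64_t max;
};

/* What the 'c' hotkey does: flip the display mode. */
static void toggle_minmax(bool *show_minmax)
{
	*show_minmax = !*show_minmax;
}

/*
 * Render the Cycle column text: "avg" by default, "avg(min/max)" in
 * min/max mode, and an empty string for lines without cycle data.
 */
static void format_cycles(char *buf, size_t size,
			  const struct line_cycles *c, bool show_minmax)
{
	if (c->avg == 0)
		snprintf(buf, size, "%s", "");
	else if (show_minmax)
		snprintf(buf, size, "%" PRIu64 "(%" PRIu64 "/%" PRIu64 ")",
			 c->avg, c->min, c->max);
	else
		snprintf(buf, size, "%" PRIu64, c->avg);
}
```

Writing the toggle as a single negation is equivalent to the if/else in the patch's `case 'c':` handler.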
[PATCH 18/18] perf tools: Fix kernel_start for PTI on x86

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Adrian Hunter 

On x86_64, PTI entry trampolines are less than the start of kernel text,
but still above 2^63. So leave kernel_start = 1ULL << 63 for x86_64.

Signed-off-by: Adrian Hunter 
Tested-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Joerg Roedel 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: x...@kernel.org
Link: http://lkml.kernel.org/r/1526548928-20790-7-git-send-email-adrian.hun...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 107bae7676b1..e011a7160380 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2321,7 +2321,12 @@ int machine__get_kernel_start(struct machine *machine)
machine->kernel_start = 1ULL << 63;
if (map) {
err = map__load(map);
-   if (!err)
+   /*
+* On x86_64, PTI entry trampolines are less than the
+* start of kernel text, but still above 2^63. So leave
+* kernel_start = 1ULL << 63 for x86_64.
+*/
+   if (!err && !machine__is(machine, "x86_64"))
machine->kernel_start = map->start;
}
return err;
-- 
2.14.3
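The decision made in machine__get_kernel_start() can be isolated into a small pure function. This is a hypothetical helper, not part of perf: pick_kernel_start() takes the raw arch string, whether the kernel map loaded, and the map's start address. It keeps the 2^63 default on x86_64 and otherwise uses the map start.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: lowest address treated as kernel space.
 * Default is 2^63, as in the patch. On x86_64 that default is kept,
 * because PTI entry trampolines sit below the start of kernel text
 * (but still above 2^63). Elsewhere the kernel map start is used if
 * the map loaded successfully.
 */
static uint64_t pick_kernel_start(const char *raw_arch, bool map_loaded,
				  uint64_t map_start)
{
	uint64_t kernel_start = 1ULL << 63;

	if (map_loaded && strcmp(raw_arch, "x86_64") != 0)
		kernel_start = map_start;

	return kernel_start;
}
```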



[PATCH 14/18] perf annotate: Record the min/max cycles

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Jin Yao 

Currently perf has a feature to account cycles for LBRs.

For example, on Skylake:

  perf record -b ...
  perf report or perf annotate

And then browsing the annotate browser gives average cycle counts for
program blocks.

For some analysis it would be useful if we could know not only the
average cycles but also the min and max cycles.

This patch records the min and max cycles.

Signed-off-by: Jin Yao 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1526569118-14217-2-git-send-email-yao@linux.intel.com
[ Switch from max/min to min/max ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/annotate.c | 14 +-
 tools/perf/util/annotate.h |  4 
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 5d74a30fe00f..4fcfefea3bc2 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -760,6 +760,15 @@ static int __symbol__account_cycles(struct annotation *notes,
ch[offset].num_aggr++;
ch[offset].cycles_aggr += cycles;
 
+   if (cycles > ch[offset].cycles_max)
+   ch[offset].cycles_max = cycles;
+
+   if (ch[offset].cycles_min) {
+   if (cycles && cycles < ch[offset].cycles_min)
+   ch[offset].cycles_min = cycles;
+   } else
+   ch[offset].cycles_min = cycles;
+
if (!have_start && ch[offset].have_start)
return 0;
if (ch[offset].num) {
@@ -953,8 +962,11 @@ void annotation__compute_ipc(struct annotation *notes, size_t size)
if (ch->have_start)
annotation__count_and_fill(notes, ch->start, offset, ch);
al = notes->offsets[offset];
-   if (al && ch->num_aggr)
+   if (al && ch->num_aggr) {
al->cycles = ch->cycles_aggr / ch->num_aggr;
+   al->cycles_max = ch->cycles_max;
+   al->cycles_min = ch->cycles_min;
+   }
notes->have_cycles = true;
}
}
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index f28a9e43421d..d50363d56f73 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -105,6 +105,8 @@ struct annotation_line {
int  jump_sources;
floatipc;
u64  cycles;
+   u64  cycles_max;
+   u64  cycles_min;
size_t   privsize;
char*path;
u32  idx;
@@ -186,6 +188,8 @@ struct cyc_hist {
u64 start;
u64 cycles;
u64 cycles_aggr;
+   u64 cycles_max;
+   u64 cycles_min;
u32 num;
u32 num_aggr;
u8  have_start;
-- 
2.14.3
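The min/max bookkeeping added to __symbol__account_cycles() can be exercised in isolation. The sketch below is a hypothetical subset of struct cyc_hist (struct cyc_stats and its helpers are illustrative names). It mirrors the patch's rule that a cycles_min of 0 means "unset": a leading zero sample is overwritten by the next sample, and later zero samples never lower an established minimum.

```c
#include <stdint.h>

/* Hypothetical subset of perf's struct cyc_hist. */
struct cyc_stats {
	uint64_t cycles_aggr;	/* sum of all samples */
	uint32_t num_aggr;	/* number of samples */
	uint64_t cycles_min;	/* 0 means "no non-zero minimum yet" */
	uint64_t cycles_max;
};

/* Fold one LBR cycle sample into the running stats, as the patch does. */
static void cyc_stats_add(struct cyc_stats *s, uint64_t cycles)
{
	s->num_aggr++;
	s->cycles_aggr += cycles;

	if (cycles > s->cycles_max)
		s->cycles_max = cycles;

	if (s->cycles_min) {
		/* Ignore zero samples once a real minimum exists. */
		if (cycles && cycles < s->cycles_min)
			s->cycles_min = cycles;
	} else {
		s->cycles_min = cycles;
	}
}

/* Integer average, as in annotation__compute_ipc(). */
static uint64_t cyc_stats_avg(const struct cyc_stats *s)
{
	return s->num_aggr ? s->cycles_aggr / s->num_aggr : 0;
}
```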



[PATCH 09/18] tools include compiler-gcc: Add __pure attribute helper

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

Adopt it from the kernel sources; it will be used soon.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-oubheiqj8edo5rzewt11c...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/include/linux/compiler-gcc.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/include/linux/compiler-gcc.h b/tools/include/linux/compiler-gcc.h
index a3a4427441bf..70fe61295733 100644
--- a/tools/include/linux/compiler-gcc.h
+++ b/tools/include/linux/compiler-gcc.h
@@ -21,6 +21,9 @@
 /* &a[0] degrades to a pointer: a different type from an array */
 #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
 
+#ifndef __pure
+#define  __pure__attribute__((pure))
+#endif
 #define  noinline  __attribute__((noinline))
 #ifndef __packed
 #define __packed   __attribute__((packed))
-- 
2.14.3
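The following small illustration shows what the helper expands to. It assumes GCC or Clang, both of which support `__attribute__((pure))`. count_set_bits() is a made-up example and does not come from perf. A pure function has no side effects and its result depends only on its arguments and on memory it reads, so the compiler may merge repeated calls with the same arguments.

```c
/* Define __pure only if nobody else did, as the patch does for tools/. */
#ifndef __pure
#define __pure __attribute__((pure))
#endif

/* Hypothetical pure function: the result depends only on x. */
static int __pure count_set_bits(unsigned int x)
{
	int n = 0;

	while (x) {
		x &= x - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```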



[PATCH 11/18] perf tools: No need to unconditionally read the max_stack sysctls

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

Let tools that need these variables set to the current sysctl values
call a function that reads them.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9v...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-top.c   |  2 +-
 tools/perf/builtin-trace.c |  2 +-
 tools/perf/perf.c  |  7 ---
 tools/perf/util/evsel.c|  2 +-
 tools/perf/util/util.c | 13 +
 tools/perf/util/util.h |  2 ++
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 3c061c57afb6..7a349fcd3864 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1264,7 +1264,7 @@ int cmd_top(int argc, const char **argv)
.proc_map_timeout= 500,
.overwrite  = 1,
},
-   .max_stack   = sysctl_perf_event_max_stack,
+   .max_stack   = sysctl__max_stack(),
.sym_pcnt_filter = 5,
.nr_threads_synthesize = UINT_MAX,
};
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index c7effcfc40ed..560aed7da36a 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3162,7 +3162,7 @@ int cmd_trace(int argc, const char **argv)
mmap_pages_user_set = false;
 
if (trace.max_stack == UINT_MAX) {
-   trace.max_stack = input_name ? PERF_MAX_STACK_DEPTH : sysctl_perf_event_max_stack;
+   trace.max_stack = input_name ? PERF_MAX_STACK_DEPTH : sysctl__max_stack();
max_stack_user_set = false;
}
 
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index cefd8f74630c..51c81509a315 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -426,7 +426,6 @@ int main(int argc, const char **argv)
int err;
const char *cmd;
char sbuf[STRERR_BUFSIZE];
-   int value;
 
/* libsubcmd init */
exec_cmd_init("perf", PREFIX, PERF_EXEC_PATH, EXEC_PATH_ENVIRONMENT);
@@ -435,12 +434,6 @@ int main(int argc, const char **argv)
/* The page_size is placed in util object. */
page_size = sysconf(_SC_PAGE_SIZE);
 
-   if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0)
-   sysctl_perf_event_max_stack = value;
-
-   if (sysctl__read_int("kernel/perf_event_max_contexts_per_stack", &value) == 0)
-   sysctl_perf_event_max_contexts_per_stack = value;
-
cmd = extract_argv0_path(argv[0]);
if (!cmd)
cmd = "perf-help";
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 4cd2cf93f726..150db5ed7400 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2862,7 +2862,7 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
return scnprintf(msg, size,
 "Not enough memory to setup event with callchain.\n"
 "Hint: Try tweaking /proc/sys/kernel/perf_event_max_stack\n"
-"Hint: Current value: %d", sysctl_perf_event_max_stack);
+"Hint: Current value: %d", sysctl__max_stack());
break;
case ENODEV:
if (target->cpu_list)
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 99ab52165680..eac5b858a371 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -62,6 +62,19 @@ int cacheline_size(void)
 int sysctl_perf_event_max_stack = PERF_MAX_STACK_DEPTH;
 int sysctl_perf_event_max_contexts_per_stack = PERF_MAX_CONTEXTS_PER_STACK;
 
+int sysctl__max_stack(void)
+{
+   int value;
+
+   if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0)
+   sysctl_perf_event_max_stack = value;
+
+   if (sysctl__read_int("kernel/perf_event_max_contexts_per_stack", &value) == 0)
+   sysctl_perf_event_max_contexts_per_stack = value;
+
+   return sysctl_perf_event_max_stack;
+}
+
 bool test_attr__enabled;
 
 bool perf_host  = true;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 74d21dfe0d29..dc58254a2b69 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -45,6 +45,8 @@ int hex2u64(const char *ptr, u64 *val);
 extern unsigned int page_size;
 int __pure cacheline_size(void);
 
+int sysctl__max_stack(void);
+
 int fetch_kernel_version(unsigned int *puint,
 char *str, size_t str_sz);
 #define KVER_VERSION(x)(((x) >> 16) & 0xff)
-- 
2.14.3
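The on-demand read can be sketched in plain C. This is a simplification, not perf's sysctl__read_int(): read_sysctl_int() and max_stack() are hypothetical helpers that read /proc/sys directly, and DEFAULT_MAX_STACK assumes the kernel's PERF_MAX_STACK_DEPTH value of 127 as the fallback. Unlike the patch, this sketch does not also refresh a max_contexts_per_stack global.

```c
#include <stdio.h>

/* Assumption: mirrors the kernel's PERF_MAX_STACK_DEPTH as a fallback. */
#define DEFAULT_MAX_STACK 127

/* Hypothetical helper: read one integer from a file; 0 on success, -1 on failure. */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Read the sysctl only when a caller actually asks for it. */
static int max_stack(void)
{
	int value;

	if (read_sysctl_int("/proc/sys/kernel/perf_event_max_stack", &value) == 0)
		return value;
	return DEFAULT_MAX_STACK;
}
```

Tools that never look at the value, such as most 'perf' subcommands started from main(), then never touch the file.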



[PATCH 12/18] perf script: Show virtual addresses instead of offsets

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Sandipan Das 

When perf data is recorded with the call-graph option enabled, the
callchain printed by perf script uses the binary offsets of the symbols
as the ip. This is incorrect for kernel symbols, as their ip values are
always off by a fixed, architecture-dependent offset. The offsets from
the start of the symbols, if printed, are also incorrect for both kernel
and userspace symbols.

Without the call-graph option, the callchain shows the virtual addresses
of the symbols rather than their binary offsets. The offsets printed in
this case are also correct.

This fixes the inconsistency in perf script's output.

This can be verified on a powerpc64le system running Fedora 27 as
follows:

  # cat /proc/kallsyms | grep sys_write
  ...
  c04025a0 T sys_write
  c04025a0 T __se_sys_write
  ...

  # perf probe -a sys_write

Before applying this patch:

  # perf record -e probe:sys_write -g ~/test
  # perf script -F ip,sym,symoff

4125b0 sys_write+0x80008010
 1b9e0 system_call+0x80008058
118234 __GI___libc_write+0xf52c0024
 92c74 _IO_file_write@@GLIBC_2.17+0xf52c0044
  5afbfd8a [unknown]
 91a60 new_do_write+0xf52c0090
 94638 _IO_do_write@@GLIBC_2.17+0xf52c0038
 94bbc _IO_file_overflow@@GLIBC_2.17+0xf52c014c
 95a24 __overflow+0xf52c0064
 84548 _IO_puts+0xf52c0218
   440 main+0xe020
 236a0 generic_start_main.isra.0+0xf52c0140
 23898 __libc_start_main+0xf52c00b8
 0 [unknown]
  ...

  # perf record -e probe:sys_write ~/test
  # perf script -F ip,sym,symoff

  c04025b0 sys_write+0x10
  ...

After applying this patch:

  # perf record -e probe:sys_write -g ~/test
  # perf script -F ip,sym,symoff

  c04025b0 sys_write+0x10
  c000b9e0 system_call+0x58
  7fffb70d8234 __GI___libc_write+0x24
  7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44
  5afc1818 [unknown]
  7fffb7051a60 new_do_write+0x90
  7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38
  7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c
  7fffb7055a24 __overflow+0x64
  7fffb7044548 _IO_puts+0x218
  1440 main+0x20
  7fffb6fe36a0 generic_start_main.isra.0+0x140
  7fffb6fe3898 __libc_start_main+0xb8
 0 [unknown]
  ...

  # perf record -e probe:sys_write ~/test
  # perf script -F ip,sym,symoff

  c04025b0 sys_write+0x10
  ...

Signed-off-by: Sandipan Das 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Naveen N. Rao 
Cc: Ravi Bangoria 
Link: http://lkml.kernel.org/r/20180517063326.6319-1-sandi...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 72a351613d85..7c777cb32806 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1764,7 +1764,7 @@ static int add_callchain_ip(struct thread *thread,
}
 
srcline = callchain_srcline(al.map, al.sym, al.addr);
-   return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
+   return callchain_cursor_append(cursor, ip, al.map, al.sym,
   branch, flags, nr_loop_iter,
   iter_cycles, branch_from, srcline);
 }
-- 
2.14.3
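The difference between a runtime ip and a map-relative address can be made concrete. This sketch assumes the usual mapping for file-backed maps: map-relative address = ip - map start + pgoff. struct map_sketch and its helpers are hypothetical. The numbers are illustrative only; they were picked so that the __GI___libc_write frame shown above (0x118234 before, 0x7fffb70d8234 after) lines up.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a file-backed mapping. */
struct map_sketch {
	uint64_t start;	/* runtime virtual address where the mapping begins */
	uint64_t pgoff;	/* file offset mapped at 'start' */
};

/* Runtime ip -> map-relative address (assumed: ip - start + pgoff). */
static uint64_t map_ip(const struct map_sketch *m, uint64_t ip)
{
	return ip - m->start + m->pgoff;
}

/* Map-relative address -> runtime ip; the inverse of map_ip(). */
static uint64_t unmap_ip(const struct map_sketch *m, uint64_t rip)
{
	return rip + m->start - m->pgoff;
}

/* Symbol offset; sym_start is assumed to be map-relative, like DSO symbols. */
static uint64_t sym_offset(const struct map_sketch *m, uint64_t ip,
			   uint64_t sym_start)
{
	return map_ip(m, ip) - sym_start;
}
```

Appending the runtime ip, as the patch does, keeps the printed address in the same space as the non-callchain output, while the symbol offset is still computed in map-relative space.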



[PATCH 10/18] perf tools: Read the cache line size lazily

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

It is not read as commonly as 'page_size', so it makes sense to read it
lazily, caching its value when it is first read.

Fewer files are opened unconditionally at startup.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-35xhrq91u94uc1djtclek...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/perf.c  | 11 ---
 tools/perf/util/sort.c |  4 ++--
 tools/perf/util/sort.h |  4 ++--
 tools/perf/util/util.c | 21 -
 tools/perf/util/util.h |  2 +-
 5 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index d5a0878de816..cefd8f74630c 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -421,16 +421,6 @@ void pthread__unblock_sigwinch(void)
pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 }
 
-#ifdef _SC_LEVEL1_DCACHE_LINESIZE
-#define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
-#else
-static void cache_line_size(int *cacheline_sizep)
-{
-   if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
-   pr_debug("cannot determine cache line size");
-}
-#endif
-
 int main(int argc, const char **argv)
 {
int err;
@@ -444,7 +434,6 @@ int main(int argc, const char **argv)
 
/* The page_size is placed in util object. */
page_size = sysconf(_SC_PAGE_SIZE);
-   cache_line_size(&cacheline_size);
 
if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0)
sysctl_perf_event_max_stack = value;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e65903a695a6..4058ade352a5 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2582,7 +2582,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
if (sort__mode != SORT_MODE__MEMORY)
return -EINVAL;
 
-   if (sd->entry == &sort_mem_dcacheline && cacheline_size == 0)
+   if (sd->entry == &sort_mem_dcacheline && cacheline_size() == 0)
return -EINVAL;
 
if (sd->entry == &sort_mem_daddr_sym)
@@ -2628,7 +2628,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
if (*tok) {
ret = sort_dimension__add(list, tok, evlist, level);
if (ret == -EINVAL) {
-   if (!cacheline_size && !strncasecmp(tok, "dcacheline", strlen(tok)))
+   if (!cacheline_size() && !strncasecmp(tok, "dcacheline", strlen(tok)))
pr_err("The \"dcacheline\" --sort key needs to know the cacheline size and it couldn't be determined on this system");
else
pr_err("Invalid --sort key: `%s'", tok);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 035b62e2c60b..9e6896293bbd 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -186,13 +186,13 @@ static inline float hist_entry__get_percent_limit(struct hist_entry *he)
 static inline u64 cl_address(u64 address)
 {
/* return the cacheline of the address */
-   return (address & ~(cacheline_size - 1));
+   return (address & ~(cacheline_size() - 1));
 }
 
 static inline u64 cl_offset(u64 address)
 {
/* return the cacheline of the address */
-   return (address & (cacheline_size - 1));
+   return (address & (cacheline_size() - 1));
 }
 
 enum sort_mode {
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 1019bbc5dbd8..99ab52165680 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -38,7 +38,26 @@ void perf_set_multithreaded(void)
 }
 
 unsigned int page_size;
-int cacheline_size;
+
+#ifdef _SC_LEVEL1_DCACHE_LINESIZE
+#define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
+#else
+static void cache_line_size(int *cacheline_sizep)
+{
+   if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
+   pr_debug("cannot determine cache line size");
+}
+#endif
+
+int cacheline_size(void)
+{
+   static int size;
+
+   if (!size)
+   cache_line_size(&size);
+
+   return size;
+}
 
 int sysctl_perf_event_max_stack = PERF_MAX_STACK_DEPTH;
 int sysctl_perf_event_max_contexts_per_stack = PERF_MAX_CONTEXTS_PER_STACK;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index c9626c206208..74d21dfe0d29 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -43,7 +43,7 @@ size_t hex_width(u64 v);
 int hex2u64(const char *ptr, u64 *val);
 
 extern unsigned int page_size;
-extern int cacheline_size;
+int __pure cacheline_size(void);
 
 int fetch_kernel_version(unsigned int *puint,
 c

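The lazy, cached lookup and the cache-line helpers from sort.h can be sketched as follows. The names cacheline_size_lazy() and read_sysfs_line_size() are hypothetical, and cl_address()/cl_offset() take the line size as a parameter here to keep the example testable. One design difference from the patch: perf selects sysconf() or sysfs at compile time, while this sketch tries sysconf() and falls back to sysfs at run time. As in the patch, a result of 0 is not cached and the lookup is retried.

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sysfs fallback; 0 on success, -1 on failure. */
static int read_sysfs_line_size(int *size)
{
	FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", size) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Look the size up on first use and cache it; 0 means "unknown". */
static int cacheline_size_lazy(void)
{
	static int size;

	if (!size) {
		long v = -1;

#ifdef _SC_LEVEL1_DCACHE_LINESIZE
		v = sysconf(_SC_LEVEL1_DCACHE_LINESIZE); /* glibc extension */
#endif
		if (v > 0)
			size = (int)v;
		else if (read_sysfs_line_size(&size) != 0 || size < 0)
			size = 0;
	}
	return size;
}

/* Cache-line base of an address, assuming a power-of-two line size. */
static uint64_t cl_address(uint64_t addr, uint64_t line)
{
	return addr & ~(line - 1);
}

/* Offset of an address within its cache line. */
static uint64_t cl_offset(uint64_t addr, uint64_t line)
{
	return addr & (line - 1);
}
```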
[PATCH 08/18] tools lib api fs tracing_path: Make tracing_events_path private

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

It is no longer accessed outside this library, so keep it private.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-wg1m07flfrg1rm06jjzie...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 3 +--
 tools/lib/api/fs/tracing_path.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 9b451af0721c..120037496f77 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -15,8 +15,7 @@
 
 static char tracing_mnt[PATH_MAX]  = "/sys/kernel/debug";
 static char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
-char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
-
+static char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
 
 static void __tracing_path_set(const char *tracing, const char *mountpoint)
 {
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 904d085b2ae7..a19136b086dc 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -5,8 +5,6 @@
 #include 
 #include 
 
-extern char tracing_events_path[];
-
 DIR *tracing_events__opendir(void);
 
 void tracing_path_set(const char *mountpoint);
-- 
2.14.3



[PATCH 01/18] perf config: Call perf_config__init() lazily

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

We check what perf_config__init() does at each perf_config() call,
namely whether the static perf_config instance was created. So instead
of bailing out when it wasn't, try to allocate it, bailing out only if
that fails.

Next, the perf_config() call itself can be moved out of the start of
perf's main() function, so that it is done lazily as well.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Taeung Song 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-4bo45k6ivsmbxpfpdte4o...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/perf.c|  1 -
 tools/perf/util/config.c | 16 +---
 tools/perf/util/config.h |  1 -
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 20a08cb32332..cd6ea55d4b0c 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -458,7 +458,6 @@ int main(int argc, const char **argv)
 
srandom(time(NULL));
 
-   perf_config__init();
err = perf_config(perf_default_config, NULL);
if (err)
return err;
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 84eb9393c7db..5ac157056cdf 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -707,6 +707,14 @@ struct perf_config_set *perf_config_set__new(void)
return set;
 }
 
+static int perf_config__init(void)
+{
+   if (config_set == NULL)
+   config_set = perf_config_set__new();
+
+   return config_set == NULL;
+}
+
 int perf_config(config_fn_t fn, void *data)
 {
int ret = 0;
@@ -714,7 +722,7 @@ int perf_config(config_fn_t fn, void *data)
struct perf_config_section *section;
struct perf_config_item *item;
 
-   if (config_set == NULL)
+   if (config_set == NULL && perf_config__init())
return -1;
 
perf_config_set__for_each_entry(config_set, section, item) {
@@ -735,12 +743,6 @@ int perf_config(config_fn_t fn, void *data)
return ret;
 }
 
-void perf_config__init(void)
-{
-   if (config_set == NULL)
-   config_set = perf_config_set__new();
-}
-
 void perf_config__exit(void)
 {
perf_config_set__delete(config_set);
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index baf82bf227ac..bd0a5897c76a 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -38,7 +38,6 @@ struct perf_config_set *perf_config_set__new(void);
 void perf_config_set__delete(struct perf_config_set *set);
 int perf_config_set__collect(struct perf_config_set *set, const char *file_name,
 const char *var, const char *value);
-void perf_config__init(void);
 void perf_config__exit(void);
 void perf_config__refresh(void);
 
-- 
2.14.3
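The lazy-initialization pattern used here can be shown with a hypothetical stand-in: struct config_set_sketch, config_init(), config_iterate(), count_calls() and config_exit() are illustrative names, not perf's API. The first call that needs the singleton allocates it, and a non-zero return from the init helper signals allocation failure, just like the new static perf_config__init().

```c
#include <stdlib.h>

/* Hypothetical stand-in for perf's struct perf_config_set. */
struct config_set_sketch {
	int nr_sections;
};

static struct config_set_sketch *config_set;

/* Allocate the singleton on first use; return non-zero on failure. */
static int config_init(void)
{
	if (config_set == NULL)
		config_set = calloc(1, sizeof(*config_set));

	return config_set == NULL;
}

/* Entry point: no explicit init call needed in main(). */
static int config_iterate(int (*fn)(const struct config_set_sketch *, void *),
			  void *data)
{
	if (config_set == NULL && config_init())
		return -1;

	return fn(config_set, data);
}

/* Example callback: counts how many times it was invoked. */
static int count_calls(const struct config_set_sketch *set, void *data)
{
	(void)set;
	++*(int *)data;
	return 0;
}

static void config_exit(void)
{
	free(config_set);
	config_set = NULL;
}
```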



[PATCH 05/18] perf tools: Reuse the path to the tracepoint /events/ directory

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

When using for_each_event() we needlessly rebuild the whole path to
the tracepoint directory; reuse dir_path instead, saving some cycles
and reducing the size of the next patch.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-54bcs15n0cp6gwcgpc4hp...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/parse-events.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 2fc4ee8b86c1..f9d5bbd63484 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -156,13 +156,12 @@ struct event_symbol event_symbols_sw[PERF_COUNT_SW_MAX] = {
(strcmp(sys_dirent->d_name, ".")) &&\
(strcmp(sys_dirent->d_name, "..")))
 
-static int tp_event_has_id(struct dirent *sys_dir, struct dirent *evt_dir)
+static int tp_event_has_id(const char *dir_path, struct dirent *evt_dir)
 {
char evt_path[MAXPATHLEN];
int fd;
 
-   snprintf(evt_path, MAXPATHLEN, "%s/%s/%s/id", tracing_events_path,
-   sys_dir->d_name, evt_dir->d_name);
+   snprintf(evt_path, MAXPATHLEN, "%s/%s/id", dir_path, evt_dir->d_name);
fd = open(evt_path, O_RDONLY);
if (fd < 0)
return -EINVAL;
@@ -171,12 +170,12 @@ static int tp_event_has_id(struct dirent *sys_dir, struct dirent *evt_dir)
return 0;
 }
 
-#define for_each_event(sys_dirent, evt_dir, evt_dirent)\
+#define for_each_event(dir_path, evt_dir, evt_dirent)  \
while ((evt_dirent = readdir(evt_dir)) != NULL) \
if (evt_dirent->d_type == DT_DIR && \
(strcmp(evt_dirent->d_name, ".")) &&\
(strcmp(evt_dirent->d_name, "..")) &&   \
-   (!tp_event_has_id(sys_dirent, evt_dirent)))
+   (!tp_event_has_id(dir_path, evt_dirent)))
 
 #define MAX_EVENT_LENGTH 512
 
@@ -204,7 +203,7 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
if (!evt_dir)
continue;
 
-   for_each_event(sys_dirent, evt_dir, evt_dirent) {
+   for_each_event(dir_path, evt_dir, evt_dirent) {
 
scnprintf(evt_path, MAXPATHLEN, "%s/%s/id", dir_path,
  evt_dirent->d_name);
@@ -2119,7 +2118,7 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob,
if (!evt_dir)
continue;
 
-   for_each_event(sys_dirent, evt_dir, evt_dirent) {
+   for_each_event(dir_path, evt_dir, evt_dirent) {
if (event_glob != NULL &&
!strglobmatch(evt_dirent->d_name, event_glob))
continue;
@@ -2199,7 +2198,7 @@ int is_valid_tracepoint(const char *event_string)
if (!evt_dir)
continue;
 
-   for_each_event(sys_dirent, evt_dir, evt_dirent) {
+   for_each_event(dir_path, evt_dir, evt_dirent) {
snprintf(evt_path, MAXPATHLEN, "%s:%s",
 sys_dirent->d_name, evt_dirent->d_name);
if (!strcmp(evt_path, event_string)) {
-- 
2.14.3
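The path change in tp_event_has_id() amounts to appending "<event>/id" to a subsystem directory path that the caller has already built. A sketch follows; event_id_path() is a hypothetical helper. Unlike the patch's plain snprintf() call, it also reports truncation to the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir_path>/<event>/id" from an already-built subsystem directory
 * path. Returns 0 on success, -1 if the result would not fit in buf.
 */
static int event_id_path(char *buf, size_t size, const char *dir_path,
			 const char *event)
{
	int n = snprintf(buf, size, "%s/%s/id", dir_path, event);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```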



[PATCH 06/18] perf parse-events: Use get/put_events_file()

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

Use get/put_events_file() instead of accessing the tracing_events_path
variable directly, as that variable may not have been properly
initialized with respect to detecting where tracefs is mounted.

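The get/put ownership pattern can be sketched as follows. The helpers get_events_file_sketch() and put_events_file_sketch() are hypothetical. They take the tracefs mount point as an explicit parameter, whereas the calls in this patch pass only the subsystem name, so the real helpers presumably look up the mount themselves. The path is built on demand, and every successful get must be paired with a put.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: allocate "<mount>/events/<name>"; caller must put it. */
static char *get_events_file_sketch(const char *mount, const char *name)
{
	int len = snprintf(NULL, 0, "%s/events/%s", mount, name);
	char *file;

	if (len < 0)
		return NULL;
	file = malloc((size_t)len + 1);
	if (file)
		snprintf(file, (size_t)len + 1, "%s/events/%s", mount, name);
	return file;
}

/* Release a path obtained from get_events_file_sketch(). */
static void put_events_file_sketch(char *file)
{
	free(file);
}
```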
Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-id7hzn1ydgkxbumeve5wa...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/parse-events.c |  7 +++---
 tools/perf/util/parse-events.c  | 50 +++--
 tools/perf/util/trace-event.c   |  8 +--
 3 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 6829dd416a99..6d57d7082637 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1328,7 +1328,7 @@ static int count_tracepoints(void)
TEST_ASSERT_VAL("Can't open events dir", events_dir);
 
while ((events_ent = readdir(events_dir))) {
-   char sys_path[PATH_MAX];
+   char *sys_path;
struct dirent *sys_ent;
DIR *sys_dir;
 
@@ -1339,8 +1339,8 @@ static int count_tracepoints(void)
|| !strcmp(events_ent->d_name, "header_page"))
continue;
 
-   scnprintf(sys_path, PATH_MAX, "%s/%s",
- tracing_events_path, events_ent->d_name);
+   sys_path = get_events_file(events_ent->d_name);
+   TEST_ASSERT_VAL("Can't get sys path", sys_path);
 
sys_dir = opendir(sys_path);
TEST_ASSERT_VAL("Can't open sys dir", sys_dir);
@@ -1356,6 +1356,7 @@ static int count_tracepoints(void)
}
 
closedir(sys_dir);
+   put_events_file(sys_path);
}
 
closedir(events_dir);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index f9d5bbd63484..24668300b327 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -189,19 +189,19 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
int fd;
u64 id;
char evt_path[MAXPATHLEN];
-   char dir_path[MAXPATHLEN];
+   char *dir_path;
 
sys_dir = opendir(tracing_events_path);
if (!sys_dir)
return NULL;
 
for_each_subsystem(sys_dir, sys_dirent) {
-
-   snprintf(dir_path, MAXPATHLEN, "%s/%s", tracing_events_path,
-sys_dirent->d_name);
+   dir_path = get_events_file(sys_dirent->d_name);
+   if (!dir_path)
+   continue;
evt_dir = opendir(dir_path);
if (!evt_dir)
-   continue;
+   goto next;
 
for_each_event(dir_path, evt_dir, evt_dirent) {
 
@@ -217,6 +217,7 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
close(fd);
id = atoll(id_buf);
if (id == config) {
+   put_events_file(dir_path);
closedir(evt_dir);
closedir(sys_dir);
path = zalloc(sizeof(*path));
@@ -241,6 +242,8 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
}
}
closedir(evt_dir);
+next:
+   put_events_file(dir_path);
}
 
closedir(sys_dir);
@@ -511,14 +514,19 @@ static int add_tracepoint_multi_event(struct list_head *list, int *idx,
  struct parse_events_error *err,
  struct list_head *head_config)
 {
-   char evt_path[MAXPATHLEN];
+   char *evt_path;
struct dirent *evt_ent;
DIR *evt_dir;
int ret = 0, found = 0;
 
-   snprintf(evt_path, MAXPATHLEN, "%s/%s", tracing_events_path, sys_name);
+   evt_path = get_events_file(sys_name);
+   if (!evt_path) {
+   tracepoint_error(err, errno, sys_name, evt_name);
+   return -1;
+   }
evt_dir = opendir(evt_path);
if (!evt_dir) {
+   put_events_file(evt_path);
tracepoint_error(err, errno, sys_name, evt_name);
return -1;
}
@@ -544,6 +552,7 @@ static int add_tracepoint_multi_event(struct list_head *list, int *idx,
ret = -1;
}
 
+   put_events_file(evt_path);
closedir(evt_dir);
return ret;
 }
@@ -2091,7 +2100,7 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob,
DIR *sys_dir, *evt_dir;
struct dirent *sys_dirent, *evt_dirent;
char evt_path[MAXPATHLEN];
-   char dir_path[MAXPATHLEN];
+   char *dir_path;
char **evt_list = NULL;
unsigned int evt_i = 0, evt_num = 0;
bool evt_num_known = false;
@@ -2112,11 +2121,12 @
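
The conversion above replaces fixed-size stack buffers with a heap-allocated path that has to be released on every exit path, which is why tracepoint_id_to_path() gains a `next:` label. A minimal userspace sketch of that pattern follows. example_get_events_file() and example_put_events_file() are hypothetical stand-ins that take the events directory explicitly instead of detecting the tracefs mount, and example_count_openable_subsystems() is an illustrative caller, not perf code.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for get_events_file(): returns a heap-allocated
 * "<events_root>/<name>" path, or NULL on allocation failure. */
static char *example_get_events_file(const char *events_root, const char *name)
{
	size_t len = strlen(events_root) + 1 + strlen(name) + 1;
	char *file = malloc(len);

	if (!file)
		return NULL;
	snprintf(file, len, "%s/%s", events_root, name);
	return file;
}

/* Hypothetical stand-in for put_events_file(): releases the path. */
static void example_put_events_file(char *file)
{
	free(file);
}

/*
 * Count subsystem entries that can be opened as directories, using the
 * same "goto next" cleanup shape as tracepoint_id_to_path().
 * Returns -1 if the events root itself cannot be opened.
 */
static int example_count_openable_subsystems(const char *events_root)
{
	DIR *sys_dir = opendir(events_root);
	struct dirent *sys_dirent;
	int count = 0;

	if (!sys_dir)
		return -1;

	while ((sys_dirent = readdir(sys_dir))) {
		char *dir_path;
		DIR *evt_dir;

		if (!strcmp(sys_dirent->d_name, ".") ||
		    !strcmp(sys_dirent->d_name, ".."))
			continue;

		dir_path = example_get_events_file(events_root, sys_dirent->d_name);
		if (!dir_path)
			continue;	/* nothing allocated, nothing to free */

		evt_dir = opendir(dir_path);
		if (!evt_dir)
			goto next;	/* the path must still be released */

		count++;
		closedir(evt_dir);
next:
		example_put_events_file(dir_path);
	}

	closedir(sys_dir);
	return count;
}
```

The only invariant that matters is that every successful get is matched by exactly one put, whether or not the directory could be opened.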

[PATCH 03/18] tools lib api: Unexport 'tracing_path' variable

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

One should use tracing_path_mount() instead, so more things get done
lazily instead of at every 'perf' tool call startup.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-fci4yll35idd9yuslp67v...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 4 ++--
 tools/lib/api/fs/tracing_path.h | 1 -
 tools/perf/perf.c   | 5 +
 tools/perf/util/probe-file.c| 3 +--
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 4f8ec7d476b8..6f5fe942eff4 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -14,7 +14,7 @@
 #include "tracing_path.h"
 
 static char tracing_mnt[PATH_MAX]  = "/sys/kernel/debug";
-char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
+static char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
 char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
 
 
@@ -75,7 +75,7 @@ char *get_tracing_file(const char *name)
 {
char *file;
 
-   if (asprintf(&file, "%s/%s", tracing_path, name) < 0)
+   if (asprintf(&file, "%s/%s", tracing_path_mount(), name) < 0)
return NULL;
 
return file;
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 0066f06cc381..1b65decedfc0 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -4,7 +4,6 @@
 
 #include 
 
-extern char tracing_path[];
 extern char tracing_events_path[];
 
 void tracing_path_set(const char *mountpoint);
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index cd6ea55d4b0c..d5a0878de816 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -238,7 +238,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
(*argc)--;
} else if (strstarts(cmd, CMD_DEBUGFS_DIR)) {
tracing_path_set(cmd + strlen(CMD_DEBUGFS_DIR));
-   fprintf(stderr, "dir: %s\n", tracing_path);
+   fprintf(stderr, "dir: %s\n", tracing_path_mount());
if (envchanged)
*envchanged = 1;
} else if (!strcmp(cmd, "--list-cmds")) {
@@ -463,9 +463,6 @@ int main(int argc, const char **argv)
return err;
set_buildid_dir(NULL);
 
-   /* get debugfs/tracefs mount point from /proc/mounts */
-   tracing_path_mount();
-
/*
 * "perf-" is the same as "perf ", but we obviously:
 *
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 4ae1123c6794..b76088fadf3d 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -84,8 +84,7 @@ int open_trace_file(const char *trace_file, bool readwrite)
char buf[PATH_MAX];
int ret;
 
-   ret = e_snprintf(buf, PATH_MAX, "%s/%s",
-tracing_path, trace_file);
+   ret = e_snprintf(buf, PATH_MAX, "%s/%s", tracing_path_mount(), trace_file);
if (ret >= 0) {
pr_debug("Opening %s write=%d\n", buf, readwrite);
if (readwrite && !probe_event_dry_run)
-- 
2.14.3
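
The point of the change is that the tracing mount point is found on first use instead of unconditionally in main(). That lazy, once-only lookup can be sketched as a cached accessor. example_probe_tracing_mount() and the path it returns are placeholders for the real mount detection, not the tools/lib/api implementation.

```c
#include <assert.h>

/* Counts how often the (hypothetical) expensive probe actually runs. */
static int example_probe_calls;

/* Placeholder for scanning /proc/mounts for tracefs/debugfs. */
static const char *example_probe_tracing_mount(void)
{
	example_probe_calls++;
	return "/sys/kernel/tracing";
}

/*
 * Lazy accessor: the probe runs on first use only, so commands that never
 * touch tracing never pay for it. If the probe returned NULL, the next
 * call would simply retry. Not thread-safe as written.
 */
static const char *example_tracing_path_mount(void)
{
	static const char *mnt;

	if (!mnt)
		mnt = example_probe_tracing_mount();
	return mnt;
}
```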



[PATCH 02/18] tools lib api: The tracing_mnt variable doesn't need to be global

2018-05-19 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

It's only used in the file where it is defined, so just make it static.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-p5x29u6mq2ml3mtnbg984...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 7b7fd0b18551..4f8ec7d476b8 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -13,8 +13,7 @@
 
 #include "tracing_path.h"
 
-
-char tracing_mnt[PATH_MAX] = "/sys/kernel/debug";
+static char tracing_mnt[PATH_MAX]  = "/sys/kernel/debug";
 char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
 char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
 
@@ -129,7 +128,7 @@ int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
snprintf(buf, size,
 "Error:\tNo permissions to read %s/%s\n"
 "Hint:\tTry 'sudo mount -o remount,mode=755 %s'\n",
-tracing_events_path, filename, tracing_mnt);
+tracing_events_path, filename, tracing_path_mount());
}
break;
default:
-- 
2.14.3



Re: [PATCH RESEND] display: panel: Add KOE tx14d24vm1bpa display support (320x240)

2018-05-19 Thread Lukasz Majewski
Hi Thierry,

> This commit adds support for KOE's 5.7" display.
> 

Thierry, shall I perform some more work on this code, or is it
eligible for applying to your tree?

Best regards,
Łukasz

> Signed-off-by: Lukasz Majewski 
> ---
>  .../bindings/display/panel/koe,tx14d24vm1bpa.txt   | 42 ++
>  drivers/gpu/drm/panel/panel-simple.c               | 26 ++
>  2 files changed, 68 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/display/panel/koe,tx14d24vm1bpa.txt
> 
> diff --git a/Documentation/devicetree/bindings/display/panel/koe,tx14d24vm1bpa.txt b/Documentation/devicetree/bindings/display/panel/koe,tx14d24vm1bpa.txt
> new file mode 100644
> index 000000000000..be7ac666807b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/display/panel/koe,tx14d24vm1bpa.txt
> @@ -0,0 +1,42 @@
> +Kaohsiung Opto-Electronics Inc. 5.7" QVGA (320 x 240) TFT LCD panel
> +
> +Required properties:
> +- compatible: should be "koe,tx14d24vm1bpa"
> +- backlight: phandle of the backlight device attached to the panel
> +- power-supply: single regulator to provide the supply voltage
> +
> +Required nodes:
> +- port: Parallel port mapping to connect this display
> +
> +This panel needs a single power supply voltage. Its backlight is
> +controlled via a PWM signal.
> +
> +Example:
> +
> +
> +Example device-tree definition when connected to iMX53 based board
> +
> + lcd_panel: lcd-panel {
> + compatible = "koe,tx14d24vm1bpa";
> + backlight = <&backlight_lcd>;
> + power-supply = <&reg_3v3>;
> +
> + port {
> + lcd_panel_in: endpoint {
> + remote-endpoint = <&lcd_display_out>;
> + };
> + };
> + };
> +
> +Then one needs to extend the dispX node:
> +
> + lcd_display: disp1 {
> +
> + port@1 {
> + reg = <1>;
> +
> + lcd_display_out: endpoint {
> + remote-endpoint = <&lcd_panel_in>;
> + };
> + };
> + };
> diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
> index d9984bdb5bb5..103b43ce7dee 100644
> --- a/drivers/gpu/drm/panel/panel-simple.c
> +++ b/drivers/gpu/drm/panel/panel-simple.c
> @@ -1268,6 +1268,29 @@ static const struct panel_desc innolux_zj070na_01p = {
>   },
>  };
>  
> +static const struct display_timing koe_tx14d24vm1bpa_timing = {
> + .pixelclock = { 5580000, 5850000, 6200000 },
> + .hactive = { 320, 320, 320 },
> + .hfront_porch = { 30, 30, 30 },
> + .hback_porch = { 30, 30, 30 },
> + .hsync_len = { 1, 5, 17 },
> + .vactive = { 240, 240, 240 },
> + .vfront_porch = { 6, 6, 6 },
> + .vback_porch = { 5, 5, 5 },
> + .vsync_len = { 1, 2, 11 },
> + .flags = DISPLAY_FLAGS_DE_HIGH,
> +};
> +
> +static const struct panel_desc koe_tx14d24vm1bpa = {
> + .timings = &koe_tx14d24vm1bpa_timing,
> + .num_timings = 1,
> + .bpc = 6,
> + .size = {
> + .width = 115,
> + .height = 86,
> + },
> +};
> +
>  static const struct display_timing koe_tx31d200vm0baa_timing = {
>   .pixelclock = { 39600000, 43200000, 48000000 },
>   .hactive = { 1280, 1280, 1280 },
> @@ -2204,6 +2227,9 @@ static const struct of_device_id platform_of_match[] = {
>   .compatible = "innolux,zj070na-01p",
>   .data = &innolux_zj070na_01p,
>   }, {
> + .compatible = "koe,tx14d24vm1bpa",
> + .data = &koe_tx14d24vm1bpa,
> + }, {
>   .compatible = "koe,tx31d200vm0baa",
>   .data = &koe_tx31d200vm0baa,
>   }, {


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
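
As a sanity check on the typical ("typ") column of a display_timing entry, the frame rate is the pixel clock divided by the total line length times the total frame height, porches and sync included. A small sketch with a simplified, hypothetical struct (the kernel's struct display_timing keeps a min/typ/max triple per field), assuming a typical pixel clock of 5,850,000 Hz for the KOE panel as in the upstream panel-simple.c entry:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for the "typ" column of struct display_timing:
 * only the values needed to derive the frame rate.
 */
struct example_typ_timing {
	uint64_t pixelclock_hz;
	uint32_t hactive, hfront_porch, hback_porch, hsync_len;
	uint32_t vactive, vfront_porch, vback_porch, vsync_len;
};

/* Frame rate in millihertz: pixel clock / (htotal * vtotal). */
static uint64_t example_refresh_mhz(const struct example_typ_timing *t)
{
	uint64_t htotal = (uint64_t)t->hactive + t->hfront_porch +
			  t->hback_porch + t->hsync_len;
	uint64_t vtotal = (uint64_t)t->vactive + t->vfront_porch +
			  t->vback_porch + t->vsync_len;

	return t->pixelclock_hz * 1000 / (htotal * vtotal);
}

/* Typical values of koe_tx14d24vm1bpa_timing (pixel clock assumed). */
static const struct example_typ_timing example_koe_typ = {
	.pixelclock_hz = 5850000,
	.hactive = 320, .hfront_porch = 30, .hback_porch = 30, .hsync_len = 5,
	.vactive = 240, .vfront_porch = 6, .vback_porch = 5, .vsync_len = 2,
};
```

With htotal = 320 + 30 + 30 + 5 = 385 and vtotal = 240 + 6 + 5 + 2 = 253, this gives 5,850,000 / 97,405, about 60.06 Hz (60058 mHz).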




Re: [PATCH v3 RESEND] display: panel: Add AUO g070vvn01 display support (800x480)

2018-05-19 Thread Lukasz Majewski
Hi Thierry,

> This commit adds support for AUO's 7.0" display.
> 

Thierry, shall I perform some more work on this code, or is it
eligible for applying to your tree?

Best regards,
Łukasz

> Signed-off-by: Lukasz Majewski 
> Reviewed-by: Rob Herring 
> 
> ---
> Changes for v3:
> - Remove not used 'bus-format-override = "rgb565";' property
> 
> Changes for v2:
> - Add *.txt suffix to the auo,g070vvn01 file
> ---
>  .../bindings/display/panel/auo,g070vvn01.txt   | 29
>  drivers/gpu/drm/panel/panel-simple.c           | 31 ++
>  2 files changed, 60 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/display/panel/auo,g070vvn01.txt
> 
> diff --git a/Documentation/devicetree/bindings/display/panel/auo,g070vvn01.txt b/Documentation/devicetree/bindings/display/panel/auo,g070vvn01.txt
> new file mode 100644
> index 000000000000..49e4105378f6
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/display/panel/auo,g070vvn01.txt
> @@ -0,0 +1,29 @@
> +AU Optronics Corporation 7.0" FHD (800 x 480) TFT LCD panel
> +
> +Required properties:
> +- compatible: should be "auo,g070vvn01"
> +- backlight: phandle of the backlight device attached to the panel
> +- power-supply: single regulator to provide the supply voltage
> +
> +Required nodes:
> +- port: Parallel port mapping to connect this display
> +
> +This panel needs a single power supply voltage. Its backlight is
> +controlled via a PWM signal.
> +
> +Example:
> +
> +
> +Example device-tree definition when connected to iMX6Q based board
> +
> + lcd_panel: lcd-panel {
> + compatible = "auo,g070vvn01";
> + backlight = <&backlight_lcd>;
> + power-supply = <&reg_display>;
> +
> + port {
> + lcd_panel_in: endpoint {
> + remote-endpoint = <&lcd_display_out>;
> + };
> + };
> + };
> diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
> index cbf1ab404ee7..d9984bdb5bb5 100644
> --- a/drivers/gpu/drm/panel/panel-simple.c
> +++ b/drivers/gpu/drm/panel/panel-simple.c
> @@ -581,6 +581,34 @@ static const struct panel_desc auo_b133htn01 = {
>   },
>  };
>  
> +static const struct display_timing auo_g070vvn01_timings = {
> + .pixelclock = { 33300000, 34209000, 45000000 },
> + .hactive = { 800, 800, 800 },
> + .hfront_porch = { 20, 40, 200 },
> + .hback_porch = { 87, 40, 1 },
> + .hsync_len = { 1, 48, 87 },
> + .vactive = { 480, 480, 480 },
> + .vfront_porch = { 5, 13, 200 },
> + .vback_porch = { 31, 31, 29 },
> + .vsync_len = { 1, 1, 3 },
> +};
> +
> +static const struct panel_desc auo_g070vvn01 = {
> + .timings = &auo_g070vvn01_timings,
> + .num_timings = 1,
> + .bpc = 8,
> + .size = {
> + .width = 152,
> + .height = 91,
> + },
> + .delay = {
> + .prepare = 200,
> + .enable = 50,
> + .disable = 50,
> + .unprepare = 1000,
> + },
> +};
> +
>  static const struct drm_display_mode auo_g104sn02_mode = {
>   .clock = 40000,
>   .hdisplay = 800,
> @@ -2095,6 +2123,9 @@ static const struct of_device_id platform_of_match[] = {
>   .compatible = "auo,b133xtn01",
>   .data = &auo_b133xtn01,
>   }, {
> + .compatible = "auo,g070vvn01",
> + .data = &auo_g070vvn01,
> + }, {
>   .compatible = "auo,g104sn02",
>   .data = &auo_g104sn02,
>   }, {




Best regards,

Lukasz Majewski



RE: [PATCH v8 10/15] cpufreq: Add Kryo CPU scaling driver

2018-05-19 Thread ilialin
commit 4abe2cd7176a43c77e9a462e4f6ec51aa7552e73
Author: Ilia Lin 
Date:   Thu May 17 13:55:12 2018 +0300

cpufreq: Add Kryo CPU scaling driver

In certain QCOM SoCs like apq8096 and msm8996 that have KRYO processors,
the CPU frequency subset and voltage value of each OPP varies
based on the silicon variant in use. Qualcomm Process Voltage Scaling
Tables define the voltage and frequency value based on the msm-id in SMEM
and the speedbin blown in the efuse combination.
The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the
SoC to provide the OPP framework with the required information.
This is used to determine the voltage and frequency value for each OPP
of the operating-points-v2 table when it is parsed by the OPP framework.

Signed-off-by: Ilia Lin 
Acked-by: Viresh Kumar 

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index de55c7d..0bfd40e 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -124,6 +124,16 @@ config ARM_OMAP2PLUS_CPUFREQ
depends on ARCH_OMAP2PLUS
default ARCH_OMAP2PLUS

+config ARM_QCOM_CPUFREQ_KRYO
+   bool "Qualcomm Kryo based CPUFreq"
+   depends on QCOM_QFPROM
+   depends on QCOM_SMEM
+   select PM_OPP
+   help
+ This adds the CPUFreq driver for Qualcomm Kryo SoC based boards.
+
+ If in doubt, say N.
+
 config ARM_S3C_CPUFREQ
bool
help
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 8d24ade..fb4a2ec 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_MACH_MVEBU_V7)   += mvebu-cpufreq.o
 obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ)+= omap-cpufreq.o
 obj-$(CONFIG_ARM_PXA2xx_CPUFREQ)   += pxa2xx-cpufreq.o
 obj-$(CONFIG_PXA3xx)   += pxa3xx-cpufreq.o
+obj-$(CONFIG_ARM_QCOM_CPUFREQ_KRYO)+= qcom-cpufreq-kryo.o
 obj-$(CONFIG_ARM_S3C2410_CPUFREQ)  += s3c2410-cpufreq.o
 obj-$(CONFIG_ARM_S3C2412_CPUFREQ)  += s3c2412-cpufreq.o
 obj-$(CONFIG_ARM_S3C2416_CPUFREQ)  += s3c2416-cpufreq.o
diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c
index 3b585e4..77d6ab8 100644
--- a/drivers/cpufreq/cpufreq-dt-platdev.c
+++ b/drivers/cpufreq/cpufreq-dt-platdev.c
@@ -118,6 +118,9 @@

{ .compatible = "nvidia,tegra124", },

+   { .compatible = "qcom,apq8096", },
+   { .compatible = "qcom,msm8996", },
+
{ .compatible = "st,stih407", },
{ .compatible = "st,stih410", },

diff --git a/drivers/cpufreq/qcom-cpufreq-kryo.c b/drivers/cpufreq/qcom-cpufreq-kryo.c
new file mode 100644
index 0000000..b024b23
--- /dev/null
+++ b/drivers/cpufreq/qcom-cpufreq-kryo.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018, The Linux Foundation. All rights reserved.
+ */
+
+/*
+ * In certain QCOM SoCs like apq8096 and msm8996 that have KRYO processors,
+ * the CPU frequency subset and voltage value of each OPP varies
+ * based on the silicon variant in use. Qualcomm Process Voltage Scaling
+ * Tables define the voltage and frequency value based on the msm-id in SMEM
+ * and the speedbin blown in the efuse combination.
+ * The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the
+ * SoC to provide the OPP framework with the required information.
+ * This is used to determine the voltage and frequency value for each OPP
+ * of the operating-points-v2 table when it is parsed by the OPP framework.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MSM_ID_SMEM	137
+#define SILVER_LEAD	0
+#define GOLD_LEAD	2
+
+enum _msm_id {
+   MSM8996V3 = 0xF6ul,
+   APQ8096V3 = 0x123ul,
+   MSM8996SG = 0x131ul,
+   APQ8096SG = 0x138ul,
+};
+
+enum _msm8996_version {
+   MSM8996_V3,
+   MSM8996_SG,
+   NUM_OF_MSM8996_VERSIONS,
+};
+
+static enum _msm8996_version __init qcom_cpufreq_kryo_get_msm_id(void)
+{
+   size_t len;
+   u32 *msm_id;
+   enum _msm8996_version version;
+
+   msm_id = qcom_smem_get(QCOM_SMEM_HOST_ANY, MSM_ID_SMEM, &len);
+   /* The first 4 bytes are format, next to them is the actual msm-id */
+   msm_id++;
+
+   switch ((enum _msm_id)*msm_id) {
+   case MSM8996V3:
+   case APQ8096V3:
+   version = MSM8996_V3;
+   break;
+   case MSM8996SG:
+   case APQ8096SG:
+   version = MSM8996_SG;
+   break;
+   default:
+   version = NUM_OF_MSM8996_VERSIONS;
+   }
+
+   return version;
+}
+
+static int __init qcom_cpufreq_kryo_driver_init(void)
+{
+   struct device *cpu_dev_silver, *cpu_dev_gold;
+   struct opp_table *opp_silver, *opp_gold;
+   enum _msm8996_version msm8996_version;
+   struct nvmem_cell *speedbin_nvmem;
+   struct platform_device *pdev;
+   struct device_node *np;
+   u8 *speedbin;
+ 
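
qcom_cpufreq_kryo_get_msm_id() skips a leading 4-byte format word in the SMEM item and maps the next 32-bit word to a silicon version. A userspace sketch of the same decoding, with hypothetical example_* names and a buffer-length check that the excerpt above does not have:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors enum _msm8996_version from the patch above. */
enum example_msm8996_version {
	EXAMPLE_MSM8996_V3,
	EXAMPLE_MSM8996_SG,
	EXAMPLE_NUM_OF_MSM8996_VERSIONS,
};

/*
 * Decode a buffer whose first 32-bit word is a format field and whose
 * second word is the msm-id. Unknown ids and short buffers map to
 * EXAMPLE_NUM_OF_MSM8996_VERSIONS.
 */
static enum example_msm8996_version
example_decode_msm_id(const uint32_t *buf, size_t len_bytes)
{
	if (!buf || len_bytes < 2 * sizeof(uint32_t))
		return EXAMPLE_NUM_OF_MSM8996_VERSIONS;

	switch (buf[1]) {
	case 0xF6:	/* MSM8996V3 */
	case 0x123:	/* APQ8096V3 */
		return EXAMPLE_MSM8996_V3;
	case 0x131:	/* MSM8996SG */
	case 0x138:	/* APQ8096SG */
		return EXAMPLE_MSM8996_SG;
	default:
		return EXAMPLE_NUM_OF_MSM8996_VERSIONS;
	}
}
```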

Re: WARNING and PANIC in irq_matrix_free

2018-05-19 Thread Thomas Gleixner
On Fri, 18 May 2018, Dmitry Safonov wrote:
> I'm not entirely sure that it's the same fault, but at least backtrace
> looks resembling.

Yes, it's similar, but not the same issue. I'll stare at the code ...

Thanks,

tglx


[tip:x86/cache] x86/intel_rdt/mba_sc: Add initialization support

2018-05-19 Thread tip-bot for Vikas Shivappa
Commit-ID:  1bd2a63b4f0deefe745aa0fd969c07b2eb9ee99e
Gitweb: https://git.kernel.org/tip/1bd2a63b4f0deefe745aa0fd969c07b2eb9ee99e
Author: Vikas Shivappa 
AuthorDate: Fri, 20 Apr 2018 15:36:18 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:16:43 +0200

x86/intel_rdt/mba_sc: Add initialization support

When the MBA software controller is enabled, per-domain storage is required
for the user-specified bandwidth in "MBps" and for the "percentage" values
which are programmed into the IA32_MBA_THRTL_MSR. Add support for these data
structures and their initialization.

The MBA percentage values have a default maximum of 100, but the maximum
value in MBps is not available from the hardware, so it is set to U32_MAX.

This simply says that the control group can use all bandwidth by default
but does not say what the actual maximum available bandwidth is. The actual
available bandwidth may depend on a lot of factors, like the QPI link, the
number of memory channels, memory channel frequency, its width and memory
speed, how many channels are configured and whether memory interleaving is
enabled. So there is no way to reliably determine the maximum at runtime.

Signed-off-by: Vikas Shivappa 
Signed-off-by: Thomas Gleixner 
Cc: ravi.v.shan...@intel.com
Cc: tony.l...@intel.com
Cc: fenghua...@intel.com
Cc: vikas.shiva...@intel.com
Cc: a...@linux.intel.com
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-4-git-send-email-vikas.shiva...@linux.intel.com

---
 arch/x86/kernel/cpu/intel_rdt.c  | 37 +++-
 arch/x86/kernel/cpu/intel_rdt.h  |  3 +++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  3 +++
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 53ee6838c496..8c09e9db2fc6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -35,6 +35,7 @@
 
 #define MAX_MBA_BW 100u
 #define MBA_IS_LINEAR  0x4
+#define MBA_MAX_MBPS   U32_MAX
 
 /* Mutex to protect rdtgroup access. */
 DEFINE_MUTEX(rdtgroup_mutex);
@@ -439,25 +440,40 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
return NULL;
 }
 
+void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
+{
+   int i;
+
+   /*
+* Initialize the Control MSRs to have no control.
+* For Cache Allocation: Set all bits in cbm
+* For Memory Allocation: Set b/w requested to 100%
+* and the bandwidth in MBps to U32_MAX
+*/
+   for (i = 0; i < r->num_closid; i++, dc++, dm++) {
+   *dc = r->default_ctrl;
+   *dm = MBA_MAX_MBPS;
+   }
+}
+
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 {
struct msr_param m;
-   u32 *dc;
-   int i;
+   u32 *dc, *dm;
 
dc = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
if (!dc)
return -ENOMEM;
 
-   d->ctrl_val = dc;
+   dm = kmalloc_array(r->num_closid, sizeof(*d->mbps_val), GFP_KERNEL);
+   if (!dm) {
+   kfree(dc);
+   return -ENOMEM;
+   }
 
-   /*
-* Initialize the Control MSRs to having no control.
-* For Cache Allocation: Set all bits in cbm
-* For Memory Allocation: Set b/w requested to 100
-*/
-   for (i = 0; i < r->num_closid; i++, dc++)
-   *dc = r->default_ctrl;
+   d->ctrl_val = dc;
+   d->mbps_val = dm;
+   setup_default_ctrlval(r, dc, dm);
 
m.low = 0;
m.high = r->num_closid;
@@ -596,6 +612,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
}
 
kfree(d->ctrl_val);
+   kfree(d->mbps_val);
kfree(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 74aee0fdc97c..91cc31087e8a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -202,6 +202,7 @@ struct mbm_state {
  * @cqm_work_cpu:
  * worker cpu for CQM h/w counters
  * @ctrl_val:  array of cache or mem ctrl values (indexed by CLOSID)
+ * @mbps_val:  When mba_sc is enabled, this holds the bandwidth in MBps
  * @new_ctrl:  new ctrl value to be loaded
  * @have_new_ctrl: did user provide new_ctrl for this domain
  */
@@ -217,6 +218,7 @@ struct rdt_domain {
int mbm_work_cpu;
int cqm_work_cpu;
u32 *ctrl_val;
+   u32 *mbps_val;
u32 new_ctrl;
bool have_new_ctrl;
 };
@@ -448,6 +450,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
unsigned long delay_ms);
 void mbm_handle_overflow(struct work_struct *work);
 bool is_mba_sc(struct rdt_resource 
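
domain_setup_ctrlval() now allocates two per-CLOSID arrays, frees the first if the second allocation fails, and only then fills in the defaults (default_ctrl for the control values, U32_MAX for MBps). The same allocate, roll back, initialize sequence in plain C, with hypothetical example_* names and calloc() standing in for kmalloc_array():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_MBA_MAX_MBPS UINT32_MAX

/* Hypothetical userspace stand-in for the two per-domain arrays. */
struct example_domain {
	uint32_t *ctrl_val;	/* percentage / CBM value per CLOSID */
	uint32_t *mbps_val;	/* MBps value per CLOSID (mba_sc) */
};

/*
 * Allocate both arrays, undo the first allocation if the second fails,
 * then initialize every CLOSID to "no control".
 */
static int example_domain_setup(struct example_domain *d, size_t num_closid,
				uint32_t default_ctrl)
{
	uint32_t *dc, *dm;
	size_t i;

	dc = calloc(num_closid, sizeof(*dc));
	if (!dc)
		return -ENOMEM;

	dm = calloc(num_closid, sizeof(*dm));
	if (!dm) {
		free(dc);
		return -ENOMEM;
	}

	for (i = 0; i < num_closid; i++) {
		dc[i] = default_ctrl;
		dm[i] = EXAMPLE_MBA_MAX_MBPS;
	}

	d->ctrl_val = dc;
	d->mbps_val = dm;
	return 0;
}

/* Counterpart of the kfree() calls in domain_remove_cpu(). */
static void example_domain_teardown(struct example_domain *d)
{
	free(d->ctrl_val);
	free(d->mbps_val);
	d->ctrl_val = NULL;
	d->mbps_val = NULL;
}
```

Assigning the pointers into the domain only after both allocations succeed keeps the domain in a consistent state on the error path.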

[tip:x86/cache] x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)

2018-05-19 Thread tip-bot for Vikas Shivappa
Commit-ID:  d6c64a4f49fdea0ae79addc3282ae8eb8581bdfc
Gitweb: https://git.kernel.org/tip/d6c64a4f49fdea0ae79addc3282ae8eb8581bdfc
Author: Vikas Shivappa 
AuthorDate: Fri, 20 Apr 2018 15:36:16 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:16:42 +0200

x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)

Add documentation about the feedback loop mechanism (MBA software
controller) which lets the user specify the memory bandwidth allocation
in MBps. This includes some changes to the "schemata" format, with
examples.

Signed-off-by: Vikas Shivappa 
Signed-off-by: Thomas Gleixner 
Cc: ravi.v.shan...@intel.com
Cc: tony.l...@intel.com
Cc: fenghua...@intel.com
Cc: vikas.shiva...@intel.com
Cc: a...@linux.intel.com
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-2-git-send-email-vikas.shiva...@linux.intel.com

---
 Documentation/x86/intel_rdt_ui.txt | 75 ++
 1 file changed, 67 insertions(+), 8 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 71c30984e94d..a16aa2113840 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -17,12 +17,14 @@ MBA (Memory Bandwidth Allocation) - "mba"
 
 To use the feature mount the file system:
 
- # mount -t resctrl resctrl [-o cdp[,cdpl2]] /sys/fs/resctrl
+ # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
 
 mount options are:
 
 "cdp": Enable code/data prioritization in L3 cache allocations.
 "cdpl2": Enable code/data prioritization in L2 cache allocations.
+"mba_MBps": Enable the MBA Software Controller (mba_sc) to specify MBA
+ bandwidth in MBps
 
 L2 and L3 CDP are controlled separately.
 
@@ -270,10 +272,11 @@ and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
 of the capacity of the cache. You could partition the cache into four
 equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
 
-Memory bandwidth(b/w) percentage
-
-For Memory b/w resource, user controls the resource by indicating the
-percentage of total memory b/w.
+Memory bandwidth Allocation and monitoring
+------------------------------------------
+
+For Memory bandwidth resource, by default the user controls the resource
+by indicating the percentage of total memory bandwidth.
 
 The minimum bandwidth percentage value for each cpu model is predefined
 and can be looked up through "info/MB/min_bandwidth". The bandwidth
@@ -285,7 +288,47 @@ to the next control step available on the hardware.
 The bandwidth throttling is a core specific mechanism on some of Intel
 SKUs. Using a high bandwidth and a low bandwidth setting on two threads
 sharing a core will result in both threads being throttled to use the
-low bandwidth.
+low bandwidth. The fact that Memory bandwidth allocation (MBA) is a
+core-specific mechanism whereas memory bandwidth monitoring (MBM) is done
+at the package level may lead to confusion when users try to apply control
+via MBA and then monitor the bandwidth to see if the controls are
+effective. Below are such scenarios:
+
+1. User may *not* see an increase in actual bandwidth when percentage
+   values are increased:
+
+This can occur when aggregate L2 external bandwidth is more than L3
+external bandwidth. Consider an SKL SKU with 24 cores on a package
+where L2 external bandwidth is 10GBps (hence aggregate L2 external
+bandwidth is 240GBps) and L3 external bandwidth is 100GBps. Now a
+workload with '20 threads, having 50% bandwidth, each consuming 5GBps'
+consumes the max L3 bandwidth of 100GBps although the percentage value
+specified is only 50% << 100%. Hence increasing the bandwidth percentage
+will not yield any more bandwidth. This is because although the L2
+external bandwidth still has capacity, the L3 external bandwidth is fully
+used. Also note that this would depend on the number of cores the
+benchmark is run on.
+
+2. Same bandwidth percentage may mean different actual bandwidth
+   depending on # of threads:
+
+For the same SKU in #1, a 'single thread, with 10% bandwidth' and '4
+threads, with 10% bandwidth' can consume up to 10GBps and 40GBps
+respectively, although they have the same bandwidth percentage of 10%.
+This is simply because as threads start using more cores in an rdtgroup,
+the actual bandwidth may increase or vary although the user-specified
+bandwidth percentage is the same.
+
+In order to mitigate this and make the interface more user friendly,
+resctrl added support for specifying the bandwidth in MBps as well.  The
+kernel underneath would use a software feedback mechanism or a "Software
+Controller (mba_sc)" which reads the actual bandwidth using MBM counters
+and adjusts the memory bandwidth percentages to ensure
+
+   "actual bandwidth < user specified bandwidth".
+
+By default, the schemata would take the bandwidth percentage values,
+whereas the user can switch to the "MBA software controller" mode using
+a mount option 'mba_MBps'.
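
The two scenarios in the documentation can be reproduced with a deliberately simplified model, which is an assumption for illustration rather than a description of the hardware: each thread is held to a fixed per-thread bandwidth by its MBA setting, and the package-level L3 external link caps the total.

```c
#include <assert.h>

/*
 * Toy model (hypothetical): total bandwidth is the per-thread bandwidth
 * times the thread count, capped by the L3 external bandwidth.
 */
static unsigned int example_aggregate_gbps(unsigned int nthreads,
					   unsigned int per_thread_gbps,
					   unsigned int l3_limit_gbps)
{
	unsigned int demand = nthreads * per_thread_gbps;

	return demand < l3_limit_gbps ? demand : l3_limit_gbps;
}
```

With the numbers from the text, 20 threads at 5GBps already reach the 100GBps L3 limit, so doubling each thread's allowance still yields 100GBps, while one and four threads at 10GBps yield 10GBps and 40GBps.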

[tip:x86/cache] x86/intel_rdt/mba_sc: Enable/disable MBA software controller

2018-05-19 Thread tip-bot for Vikas Shivappa
Commit-ID:  19c635ab24a1e94a759e82bfb34554a6a0db215e
Gitweb: https://git.kernel.org/tip/19c635ab24a1e94a759e82bfb34554a6a0db215e
Author: Vikas Shivappa 
AuthorDate: Fri, 20 Apr 2018 15:36:17 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:16:43 +0200

x86/intel_rdt/mba_sc: Enable/disable MBA software controller

Currently the user does memory bandwidth allocation (MBA) by specifying the
bandwidth as a percentage via the resctrl schemata file:
"/sys/fs/resctrl/schemata"

Add a new mount option "mba_MBps" to enable the user to specify MBA
in MBps:

$ mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl

Signed-off-by: Vikas Shivappa 
Signed-off-by: Thomas Gleixner 
Cc: ravi.v.shan...@intel.com
Cc: tony.l...@intel.com
Cc: fenghua...@intel.com
Cc: vikas.shiva...@intel.com
Cc: a...@linux.intel.com
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-3-git-send-email-vikas.shiva...@linux.intel.com

---
 arch/x86/kernel/cpu/intel_rdt.c  |  8 
 arch/x86/kernel/cpu/intel_rdt.h  |  3 +++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 30 ++
 3 files changed, 41 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 589b948e6e01..53ee6838c496 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -230,6 +230,14 @@ static inline void cache_alloc_hsw_probe(void)
rdt_alloc_capable = true;
 }
 
+bool is_mba_sc(struct rdt_resource *r)
+{
+   if (!r)
+   return rdt_resources_all[RDT_RESOURCE_MBA].membw.mba_sc;
+
+   return r->membw.mba_sc;
+}
+
 /*
  * rdt_get_mb_table() - get a mapping of bandwidth(b/w) percentage values
  * exposed to user interface and the h/w understandable delay values.
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 3fd7a70ee04a..74aee0fdc97c 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -259,6 +259,7 @@ struct rdt_cache {
  * @min_bw:	Minimum memory bandwidth percentage user can request
  * @bw_gran:	Granularity at which the memory bandwidth is allocated
  * @delay_linear:	True if memory B/W delay is in linear scale
+ * @mba_sc:	True if MBA software controller(mba_sc) is enabled
  * @mb_map:	Mapping of memory B/W percentage to memory B/W delay
  */
 struct rdt_membw {
@@ -266,6 +267,7 @@ struct rdt_membw {
u32 min_bw;
u32 bw_gran;
u32 delay_linear;
+   bool mba_sc;
u32 *mb_map;
 };
 
@@ -445,6 +447,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_domain *d,
 void mbm_setup_overflow_handler(struct rdt_domain *dom,
unsigned long delay_ms);
 void mbm_handle_overflow(struct work_struct *work);
+bool is_mba_sc(struct rdt_resource *r);
 void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index fca759d272a1..440025446239 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1005,6 +1005,11 @@ static void l2_qos_cfg_update(void *arg)
wrmsrl(IA32_L2_QOS_CFG, *enable ? L2_QOS_CDP_ENABLE : 0ULL);
 }
 
+static inline bool is_mba_linear(void)
+{
+   return rdt_resources_all[RDT_RESOURCE_MBA].membw.delay_linear;
+}
+
 static int set_cache_qos_cfg(int level, bool enable)
 {
void (*update)(void *arg);
@@ -1041,6 +1046,25 @@ static int set_cache_qos_cfg(int level, bool enable)
return 0;
 }
 
+/*
+ * Enable or disable the MBA software controller
+ * which helps user specify bandwidth in MBps.
+ * MBA software controller is supported only if
+ * MBM is supported and MBA is in linear scale.
+ */
+static int set_mba_sc(bool mba_sc)
+{
+   struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA];
+
+   if (!is_mbm_enabled() || !is_mba_linear() ||
+   mba_sc == is_mba_sc(r))
+   return -EINVAL;
+
+   r->membw.mba_sc = mba_sc;
+
+   return 0;
+}
+
 static int cdp_enable(int level, int data_type, int code_type)
 {
struct rdt_resource *r_ldata = &rdt_resources_all[data_type];
@@ -1123,6 +1147,10 @@ static int parse_rdtgroupfs_options(char *data)
ret = cdpl2_enable();
if (ret)
goto out;
+   } else if (!strcmp(token, "mba_MBps")) {
+   ret = set_mba_sc(true);
+   if (ret)
+   goto out;
} else {
ret = -EINVAL;
goto out;
@@ -1445,6 +1473,8 @@ static void rdt_kill_sb(struct super_block

[tip:x86/cache] x86/intel_rdt/mba_sc: Add schemata support

2018-05-19 Thread tip-bot for Vikas Shivappa
Commit-ID:  8205a078ba7819c23558e31af4b3bda04d9b3bae
Gitweb: https://git.kernel.org/tip/8205a078ba7819c23558e31af4b3bda04d9b3bae
Author: Vikas Shivappa 
AuthorDate: Fri, 20 Apr 2018 15:36:19 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:16:44 +0200

x86/intel_rdt/mba_sc: Add schemata support

Currently, when the user updates the "schemata" with new MBA percentage
values, the kernel writes the corresponding bandwidth percentage values to
the IA32_MBA_THRTL_MSR.

When MBA is expressed in MBps, the schemata format changes to specify the
per-package memory bandwidth in MBps instead of as a percentage. Do not
write the IA32_MBA_THRTL_MSRs when the schemata is updated, as that is
handled separately.
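
In MBps mode the only change to bw_validate() is that the percentage range check is skipped when is_mba_sc() is true. update_domains() then stores the value in mbps_val[] and does not write the MSR. Below is a minimal user-space sketch of that validation rule. struct mba_cfg, mb_value_ok() and the example limits used with it are assumptions made for illustration, not kernel names.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the rdt_resource fields involved. */
struct mba_cfg {
	unsigned long min_bw;       /* smallest percentage the hardware accepts */
	unsigned long default_ctrl; /* largest percentage, normally 100 */
	bool mba_sc;                /* true when mounted with "mba_MBps" */
};

/*
 * Parse the value part of one MB schemata entry (the "2048" in "0=2048").
 * Percentage mode: the value must lie in [min_bw, default_ctrl].
 * MBps mode (mba_sc): any unsigned value is accepted, mirroring the
 * "&& !is_mba_sc(r)" added to bw_validate().
 */
static bool mb_value_ok(const char *buf, const struct mba_cfg *cfg,
			unsigned long *out)
{
	unsigned long bw;
	char *end;

	/* strtoul() would silently accept "-5", so insist on a leading digit. */
	if (!isdigit((unsigned char)buf[0]))
		return false;

	errno = 0;
	bw = strtoul(buf, &end, 10);
	if (errno != 0 || *end != '\0')
		return false;

	/* Range check only applies in percentage mode. */
	if (!cfg->mba_sc && (bw < cfg->min_bw || bw > cfg->default_ctrl))
		return false;

	*out = bw;
	return true;
}
```

With min_bw = 10 and default_ctrl = 100, "2048" is rejected in percentage mode but accepted once mba_sc is set.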

Signed-off-by: Vikas Shivappa 
Signed-off-by: Thomas Gleixner 
Cc: ravi.v.shan...@intel.com
Cc: tony.l...@intel.com
Cc: fenghua...@intel.com
Cc: vikas.shiva...@intel.com
Cc: a...@linux.intel.com
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-5-git-send-email-vikas.shiva...@linux.intel.com

---
 arch/x86/kernel/cpu/intel_rdt.c |  2 +-
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 24 +++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 8c09e9db2fc6..ad03d975883e 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -179,7 +179,7 @@ struct rdt_resource rdt_resources_all[] = {
.msr_update = mba_wrmsr,
.cache_level= 3,
.parse_ctrlval  = parse_bw,
-   .format_str = "%d=%*d",
+   .format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
},
 };
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 23e1d5c249c6..116d57b248d3 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -53,7 +53,8 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
return false;
}
 
-   if (bw < r->membw.min_bw || bw > r->default_ctrl) {
+   if ((bw < r->membw.min_bw || bw > r->default_ctrl) &&
+   !is_mba_sc(r)) {
rdt_last_cmd_printf("MB value %ld out of range [%d,%d]\n", bw,
r->membw.min_bw, r->default_ctrl);
return false;
@@ -179,6 +180,8 @@ static int update_domains(struct rdt_resource *r, int closid)
struct msr_param msr_param;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
+   bool mba_sc;
+   u32 *dc;
int cpu;
 
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -188,13 +191,20 @@ static int update_domains(struct rdt_resource *r, int closid)
msr_param.high = msr_param.low + 1;
msr_param.res = r;
 
+   mba_sc = is_mba_sc(r);
list_for_each_entry(d, &r->domains, list) {
-   if (d->have_new_ctrl && d->new_ctrl != d->ctrl_val[closid]) {
+   dc = !mba_sc ? d->ctrl_val : d->mbps_val;
+   if (d->have_new_ctrl && d->new_ctrl != dc[closid]) {
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
-   d->ctrl_val[closid] = d->new_ctrl;
+   dc[closid] = d->new_ctrl;
}
}
-   if (cpumask_empty(cpu_mask))
+
+   /*
+* Avoid writing the control msr with control values when
+* MBA software controller is enabled
+*/
+   if (cpumask_empty(cpu_mask) || mba_sc)
goto done;
cpu = get_cpu();
/* Update CBM on this cpu if it's in cpu_mask. */
@@ -282,13 +292,17 @@ static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 {
struct rdt_domain *dom;
bool sep = false;
+   u32 ctrl_val;
 
seq_printf(s, "%*s:", max_name_width, r->name);
list_for_each_entry(dom, &r->domains, list) {
if (sep)
seq_puts(s, ";");
+
+   ctrl_val = (!is_mba_sc(r) ? dom->ctrl_val[closid] :
+   dom->mbps_val[closid]);
seq_printf(s, r->format_str, dom->id, max_data_width,
-  dom->ctrl_val[closid]);
+  ctrl_val);
sep = true;
}
seq_puts(s, "\n");


[tip:x86/cache] x86/intel_rdt/mba_sc: Prepare for feedback loop

2018-05-19 Thread tip-bot for Vikas Shivappa
Commit-ID:  ba0f26d8529c2dfc9aa6d9e8a338180737f8c1be
Gitweb: https://git.kernel.org/tip/ba0f26d8529c2dfc9aa6d9e8a338180737f8c1be
Author: Vikas Shivappa 
AuthorDate: Fri, 20 Apr 2018 15:36:20 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:16:44 +0200

x86/intel_rdt/mba_sc: Prepare for feedback loop

This is a preparatory patch for the MBA feedback loop. Add support to
measure the "bandwidth in MBps" and the "delta bandwidth". Measure them by
reading the MBM IA32_QM_CTR MSRs and calculating the number of bytes
moved. There is no user-space interface for this; it will only be used by
the feedback loop patch.
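
The counter arithmetic below mirrors mbm_overflow_count() from this patch. Shifting both raw readings up by 64 - MBM_CNTR_WIDTH discards the unused upper bits, so the unsigned subtraction wraps correctly across one counter overflow. The MBps conversion assumes, as the patch does, that the counter is sampled exactly once per second, so bytes moved in the interval divided by 2^20 gives MBps. mbm_mbps() and the mon_scale value used with it are illustrative.

```c
#include <stdint.h>

#define MBM_CNTR_WIDTH 24 /* counter width used by this kernel version */

/* Chunks moved between two raw reads of a MBM_CNTR_WIDTH-bit counter,
 * correct across a single wrap-around. */
static uint64_t mbm_overflow_count(uint64_t prev_msr, uint64_t cur_msr)
{
	uint64_t shift = 64 - MBM_CNTR_WIDTH;
	uint64_t chunks = (cur_msr << shift) - (prev_msr << shift);

	return chunks >> shift;
}

/* Bandwidth in MBps for a 1 s interval: chunks * bytes-per-chunk / 2^20. */
static uint32_t mbm_mbps(uint64_t chunks, uint32_t mon_scale)
{
	return (uint32_t)((chunks * mon_scale) >> 20);
}
```

For example, a 24-bit counter that moves from 0xFFFFF0 to 0x10 has advanced by 32 chunks, not by a huge negative amount.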

Signed-off-by: Vikas Shivappa 
Signed-off-by: Thomas Gleixner 
Cc: ravi.v.shan...@intel.com
Cc: tony.l...@intel.com
Cc: fenghua...@intel.com
Cc: vikas.shiva...@intel.com
Cc: a...@linux.intel.com
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-6-git-send-email-vikas.shiva...@linux.intel.com

---
 arch/x86/kernel/cpu/intel_rdt.h | 10 
 arch/x86/kernel/cpu/intel_rdt_monitor.c | 44 -
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 91cc31087e8a..66a0ba37a8a3 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -180,10 +180,20 @@ struct rftype {
  * struct mbm_state - status for each MBM counter in each domain
  * @chunks:Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_msr   Value of IA32_QM_CTR for this RMID last time we read it
+ * @chunks_bw  Total local data moved. Used for bandwidth calculation
+ * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
+ * @prev_bwThe most recent bandwidth in MBps
+ * @delta_bw   Difference between the current and previous bandwidth
+ * @delta_comp Indicates whether to compute the delta_bw
  */
 struct mbm_state {
u64 chunks;
u64 prev_msr;
+   u64 chunks_bw;
+   u64 prev_bw_msr;
+   u32 prev_bw;
+   u32 delta_bw;
+   bool delta_comp;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/intel_rdt_monitor.c b/arch/x86/kernel/cpu/intel_rdt_monitor.c
index 681450eee428..7690402c42b7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_monitor.c
+++ b/arch/x86/kernel/cpu/intel_rdt_monitor.c
@@ -225,10 +225,18 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
 }
 
+static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr)
+{
+   u64 shift = 64 - MBM_CNTR_WIDTH, chunks;
+
+   chunks = (cur_msr << shift) - (prev_msr << shift);
+   return chunks >>= shift;
+}
+
 static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 {
-   u64 chunks, shift, tval;
struct mbm_state *m;
+   u64 chunks, tval;
 
tval = __rmid_read(rmid, rr->evtid);
if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
@@ -254,14 +262,12 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
}
 
if (rr->first) {
-   m->prev_msr = tval;
-   m->chunks = 0;
+   memset(m, 0, sizeof(struct mbm_state));
+   m->prev_bw_msr = m->prev_msr = tval;
return 0;
}
 
-   shift = 64 - MBM_CNTR_WIDTH;
-   chunks = (tval << shift) - (m->prev_msr << shift);
-   chunks >>= shift;
+   chunks = mbm_overflow_count(m->prev_msr, tval);
m->chunks += chunks;
m->prev_msr = tval;
 
@@ -269,6 +275,32 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
return 0;
 }
 
+/*
+ * Supporting function to calculate the memory bandwidth
+ * and delta bandwidth in MBps.
+ */
+static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
+{
+   struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];
+   struct mbm_state *m = &rr->d->mbm_local[rmid];
+   u64 tval, cur_bw, chunks;
+
+   tval = __rmid_read(rmid, rr->evtid);
+   if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+   return;
+
+   chunks = mbm_overflow_count(m->prev_bw_msr, tval);
+   m->chunks_bw += chunks;
+   m->chunks = m->chunks_bw;
+   cur_bw = (chunks * r->mon_scale) >> 20;
+
+   if (m->delta_comp)
+   m->delta_bw = abs(cur_bw - m->prev_bw);
+   m->delta_comp = false;
+   m->prev_bw = cur_bw;
+   m->prev_bw_msr = tval;
+}
+
 /*
  * This is called via IPI to read the CQM/MBM counters
  * on a domain.


[tip:x86/cache] x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth

2018-05-19 Thread tip-bot for Vikas Shivappa
Commit-ID:  de73f38f768021610bd305cf74ef3702fcf6a1eb
Gitweb: https://git.kernel.org/tip/de73f38f768021610bd305cf74ef3702fcf6a1eb
Author: Vikas Shivappa 
AuthorDate: Fri, 20 Apr 2018 15:36:21 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:16:44 +0200

x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth

mba_sc is a feedback loop where we periodically read the MBM counters and
try to keep the bandwidth below a maximum value, so that the following
always holds:

  "current bandwidth(cur_bw) < user specified bandwidth(user_bw)"

The checks currently run once per second; the updates simply piggyback on
the MBM overflow timer. A one-second interval also makes the bandwidth
calculation easy. Bandwidth is increased or decreased in steps of the
minimum granularity specified by the hardware.

Although MBA's goal is to restrict the bandwidth below a maximum, the
bandwidth may sometimes need to be increased. Since MBA controls the L2
external bandwidth whereas MBM measures the L3 external bandwidth, we may
end up restricting some rdtgroups unnecessarily. This can happen when an
rdtgroup (a set of jobs) has high "L3 <-> memory" traffic in its initial
phases, mba_sc kicks in and reduces the bandwidth percentage values, but
after some time the group generates mostly "L2 <-> L3" traffic. In this
scenario mba_sc increases the bandwidth percentage when there is less
memory traffic.
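
The update routine itself is cut off in this archive. The following is only a hypothetical sketch of one controller iteration as described above. mba_sc_step() and struct mba_sc_state are invented names; min_bw, bw_gran and MAX_MBA_BW correspond to names in the patches. The guard "increase only if user_bw > cur_bw + delta_bw" is an assumption about how the delta_bw from the preparatory patch is meant to be used.

```c
#include <stdint.h>

#define MAX_MBA_BW 100u	/* same value as in the patch */

/* Hypothetical per-rdtgroup state for this sketch only. */
struct mba_sc_state {
	uint32_t cur_pct;	/* throttle percentage currently programmed */
	uint32_t delta_bw;	/* MBps gained by the last increase (0 if none) */
};

/*
 * One 1 s iteration. cur_bw and user_bw are in MBps; min_bw and
 * bw_gran are the hardware limits from struct rdt_membw.
 * Returns the percentage to program next.
 */
static uint32_t mba_sc_step(const struct mba_sc_state *s, uint32_t cur_bw,
			    uint32_t user_bw, uint32_t min_bw, uint32_t bw_gran)
{
	uint32_t pct = s->cur_pct;

	/* Over budget: throttle one step, never going below min_bw. */
	if (cur_bw > user_bw && pct > min_bw)
		return (pct - min_bw > bw_gran) ? pct - bw_gran : min_bw;

	/*
	 * Under budget even allowing for what the previous increase
	 * added: relax one step, never going above MAX_MBA_BW.
	 */
	if (user_bw > cur_bw + s->delta_bw && pct < MAX_MBA_BW)
		return (MAX_MBA_BW - pct > bw_gran) ? pct + bw_gran : MAX_MBA_BW;

	return pct;
}
```

The returned percentage would still need to go through delay_bw_map() before being written to the throttle MSR. On linear-scale parts, delay_bw_map() yields MAX_MBA_BW - bw.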

Signed-off-by: Vikas Shivappa 
Signed-off-by: Thomas Gleixner 
Cc: ravi.v.shan...@intel.com
Cc: tony.l...@intel.com
Cc: fenghua...@intel.com
Cc: vikas.shiva...@intel.com
Cc: a...@linux.intel.com
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-7-git-send-email-vikas.shiva...@linux.intel.com

---
 arch/x86/kernel/cpu/intel_rdt.c |   3 +-
 arch/x86/kernel/cpu/intel_rdt.h |   2 +
 arch/x86/kernel/cpu/intel_rdt_monitor.c | 126 +++-
 3 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index ad03d975883e..24bfa63e86cf 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,7 +33,6 @@
 #include 
 #include "intel_rdt.h"
 
-#define MAX_MBA_BW 100u
 #define MBA_IS_LINEAR  0x4
 #define MBA_MAX_MBPS   U32_MAX
 
@@ -350,7 +349,7 @@ static int get_cache_id(int cpu, int level)
  * that can be written to QOS_MSRs.
  * There are currently no SKUs which support non linear delay values.
  */
-static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
+u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
 {
if (r->membw.delay_linear)
return MAX_MBA_BW - bw;
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 66a0ba37a8a3..39752825e376 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -28,6 +28,7 @@
 
 #define MBM_CNTR_WIDTH 24
 #define MBM_OVERFLOW_INTERVAL  1000
+#define MAX_MBA_BW 100u
 
 #define RMID_VAL_ERROR BIT_ULL(63)
 #define RMID_VAL_UNAVAIL   BIT_ULL(62)
@@ -461,6 +462,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
 void mbm_handle_overflow(struct work_struct *work);
 bool is_mba_sc(struct rdt_resource *r);
 void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm);
+u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
 void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
diff --git a/arch/x86/kernel/cpu/intel_rdt_monitor.c b/arch/x86/kernel/cpu/intel_rdt_monitor.c
index 7690402c42b7..b0f3aed76b75 100644
--- a/arch/x86/kernel/cpu/intel_rdt_monitor.c
+++ b/arch/x86/kernel/cpu/intel_rdt_monitor.c
@@ -329,6 +329,118 @@ void mon_event_count(void *info)
}
 }
 
+/*
+ * Feedback loop for MBA software controller (mba_sc)
+ *
+ * mba_sc is a feedback loop where we periodically read MBM counters and
+ * adjust the bandwidth percentage values via the IA32_MBA_THRTL_MSRs so
+ * that:
+ *
+ *   current bandwdith(cur_bw) < user specified bandwidth(user_bw)
+ *
+ * This uses the MBM counters to measure the bandwidth and MBA throttle
+ * MSRs to control the bandwidth for a particular rdtgrp. It builds on the
+ * fact that resctrl rdtgroups have both monitoring and control.
+ *
+ * The frequency of the checks is 1s and we just tag along the MBM overflow
+ * timer. Having 1s interval makes the calculation of bandwidth simpler.
+ *
+ * Although MBA's goal is to restrict the bandwidth to a maximum, there may
+ * be a need to increase the bandwidth to avoid uncecessarily restricting
+ * the L2 <-> L3 traffic.
+ *
+ * Since MBA controls the L2 external bandwidth where as MBM measures the
+ * L3 external bandwidth the following sequence could lead to such a
+ * situation.
+ *
+ * Consider an

[tip:x86/hyperv] X86/Hyper-V: Enlighten APIC access

2018-05-19 Thread tip-bot for K. Y. Srinivasan
Commit-ID:  6b48cb5f8347bc0153ff1d7b075db92e6723ffdb
Gitweb: https://git.kernel.org/tip/6b48cb5f8347bc0153ff1d7b075db92e6723ffdb
Author: K. Y. Srinivasan 
AuthorDate: Wed, 16 May 2018 14:53:30 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:23:17 +0200

X86/Hyper-V: Enlighten APIC access

Hyper-V supports MSR-based APIC access; implement the enlightenment.
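
With this enlightenment the ICR is accessed through one 64-bit MSR instead of two 32-bit MMIO registers. The sketch below reproduces the packing done in hv_apic_icr_write() as plain user-space C so the bit layout can be checked. hv_icr_pack() and hv_icr_dest() are hypothetical names. Reading bits 63:56 back assumes the xAPIC-style destination layout that SET_APIC_DEST_FIELD() produces.

```c
#include <stdint.h>

/* Same definition as the kernel's SET_APIC_DEST_FIELD() in asm/apicdef.h. */
#define SET_APIC_DEST_FIELD(x) ((x) << 24)

/* Pack (low dword, destination APIC ID) into the single 64-bit value
 * written to HV_X64_MSR_ICR, as hv_apic_icr_write() does. */
static uint64_t hv_icr_pack(uint32_t low, uint32_t id)
{
	/* 32-bit shift: only an 8-bit ID fits, in bits 31:24 of the high dword */
	uint64_t reg_val = SET_APIC_DEST_FIELD(id);

	reg_val <<= 32;		/* high dword -> bits 63:32 */
	reg_val |= low;		/* vector, delivery mode, etc. */
	return reg_val;
}

/* Recover the destination APIC ID from a packed value (bits 63:56). */
static uint32_t hv_icr_dest(uint64_t reg_val)
{
	return (uint32_t)(reg_val >> 56);
}
```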

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Michael Kelley 
Cc: o...@aepfle.de
Cc: sthem...@microsoft.com
Cc: gre...@linuxfoundation.org
Cc: jasow...@redhat.com
Cc: michael.h.kel...@microsoft.com
Cc: h...@zytor.com
Cc: a...@canonical.com
Cc: de...@linuxdriverproject.org
Cc: vkuzn...@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-1-...@linuxonhyperv.com

---
 arch/x86/hyperv/Makefile|   2 +-
 arch/x86/hyperv/hv_apic.c   | 104 
 arch/x86/hyperv/hv_init.c   |   5 +-
 arch/x86/include/asm/mshyperv.h |   4 +-
 4 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 367a8203cfcf..00ce4df01a09 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1 +1 @@
-obj-y  := hv_init.o mmu.o
+obj-y  := hv_init.o mmu.o hv_apic.o
diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
new file mode 100644
index ..ca20e31d311c
--- /dev/null
+++ b/arch/x86/hyperv/hv_apic.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Hyper-V specific APIC code.
+ *
+ * Copyright (C) 2018, Microsoft, Inc.
+ *
+ * Author : K. Y. Srinivasan 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_X86_64
+#if IS_ENABLED(CONFIG_HYPERV)
+
+static u64 hv_apic_icr_read(void)
+{
+   u64 reg_val;
+
+   rdmsrl(HV_X64_MSR_ICR, reg_val);
+   return reg_val;
+}
+
+static void hv_apic_icr_write(u32 low, u32 id)
+{
+   u64 reg_val;
+
+   reg_val = SET_APIC_DEST_FIELD(id);
+   reg_val = reg_val << 32;
+   reg_val |= low;
+
+   wrmsrl(HV_X64_MSR_ICR, reg_val);
+}
+
+static u32 hv_apic_read(u32 reg)
+{
+   u32 reg_val, hi;
+
+   switch (reg) {
+   case APIC_EOI:
+   rdmsr(HV_X64_MSR_EOI, reg_val, hi);
+   return reg_val;
+   case APIC_TASKPRI:
+   rdmsr(HV_X64_MSR_TPR, reg_val, hi);
+   return reg_val;
+
+   default:
+   return native_apic_mem_read(reg);
+   }
+}
+
+static void hv_apic_write(u32 reg, u32 val)
+{
+   switch (reg) {
+   case APIC_EOI:
+   wrmsr(HV_X64_MSR_EOI, val, 0);
+   break;
+   case APIC_TASKPRI:
+   wrmsr(HV_X64_MSR_TPR, val, 0);
+   break;
+   default:
+   native_apic_mem_write(reg, val);
+   }
+}
+
+static void hv_apic_eoi_write(u32 reg, u32 val)
+{
+   wrmsr(HV_X64_MSR_EOI, val, 0);
+}
+
+void __init hv_apic_init(void)
+{
+   if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) {
+   pr_info("Hyper-V: Using MSR based APIC access\n");
+   apic_set_eoi_write(hv_apic_eoi_write);
+   apic->read  = hv_apic_read;
+   apic->write = hv_apic_write;
+   apic->icr_write = hv_apic_icr_write;
+   apic->icr_read  = hv_apic_icr_read;
+   }
+}
+
+#endif /* CONFIG_HYPERV */
+#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index cfecc2272f2d..71e50fc2b7ef 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -242,8 +242,9 @@ static int hv_cpu_die(unsigned int cpu)
  *
  * 1. Setup the hypercall page.
  * 2. Register Hyper-V specific clocksource.
+ * 3. Setup Hyper-V specific APIC entry points.
  */
-void hyperv_init(void)
+void __init hyperv_init(void)
 {
u64 guest_id, required_msrs;
union hv_x64_msr_hypercall_contents hypercall_msr;
@@ -298,6 +299,8 @@ void hyperv_init(void)
 
hyper_alloc_mmu();
 
+   hv_apic_init();
+
/*
 * Register Hyper-V specific clocksource.
 */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index b90e79610cf7..162977b82e2e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -258,7 +258,7 @@ static inline int hv_cpu_number_to_vp_number(int cpu_number)
return hv_vp_index[cpu_number]

[tip:x86/hyperv] X86/Hyper-V: Enable IPI enlightenments

2018-05-19 Thread tip-bot for K. Y. Srinivasan
Commit-ID:  68bb7bfb7985df2bd15c2dc975cb68b7a901488a
Gitweb: https://git.kernel.org/tip/68bb7bfb7985df2bd15c2dc975cb68b7a901488a
Author: K. Y. Srinivasan 
AuthorDate: Wed, 16 May 2018 14:53:31 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:23:17 +0200

X86/Hyper-V: Enable IPI enlightenments

Hyper-V supports hypercalls to implement IPI; use them.
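
The non-extended HVCALL_SEND_IPI input in this patch carries a single 64-bit cpu_mask. That is why __send_ipi_mask() gives up once any target's VP number reaches 64, and the caller then falls back to the saved orig_apic routine. Below is a user-space sketch of that mask construction. It takes VP numbers directly rather than calling hv_cpu_number_to_vp_number(), which is an assumption of the sketch. build_ipi_cpu_mask() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build the 64-bit cpu_mask for the non-extended IPI hypercall from a
 * list of VP numbers. Returns false if any VP cannot be represented,
 * in which case the caller would use the native APIC path instead.
 */
static bool build_ipi_cpu_mask(const int *vp, size_t n, uint64_t *mask)
{
	uint64_t m = 0;

	for (size_t i = 0; i < n; i++) {
		if (vp[i] < 0 || vp[i] >= 64)
			return false;
		m |= UINT64_C(1) << vp[i];
	}
	*mask = m;
	return true;
}
```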

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Michael Kelley 
Cc: o...@aepfle.de
Cc: sthem...@microsoft.com
Cc: gre...@linuxfoundation.org
Cc: jasow...@redhat.com
Cc: michael.h.kel...@microsoft.com
Cc: h...@zytor.com
Cc: a...@canonical.com
Cc: de...@linuxdriverproject.org
Cc: vkuzn...@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-2-...@linuxonhyperv.com

---
 arch/x86/hyperv/hv_apic.c  | 117 +
 arch/x86/hyperv/hv_init.c  |  27 +
 arch/x86/include/asm/hyperv-tlfs.h |  15 +
 arch/x86/include/asm/mshyperv.h|   1 +
 4 files changed, 160 insertions(+)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index ca20e31d311c..3e0de61f1a7c 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -33,6 +33,8 @@
 #ifdef CONFIG_X86_64
 #if IS_ENABLED(CONFIG_HYPERV)
 
+static struct apic orig_apic;
+
 static u64 hv_apic_icr_read(void)
 {
u64 reg_val;
@@ -88,8 +90,123 @@ static void hv_apic_eoi_write(u32 reg, u32 val)
wrmsr(HV_X64_MSR_EOI, val, 0);
 }
 
+/*
+ * IPI implementation on Hyper-V.
+ */
+static bool __send_ipi_mask(const struct cpumask *mask, int vector)
+{
+   int cur_cpu, vcpu;
+   struct ipi_arg_non_ex **arg;
+   struct ipi_arg_non_ex *ipi_arg;
+   int ret = 1;
+   unsigned long flags;
+
+   if (cpumask_empty(mask))
+   return true;
+
+   if (!hv_hypercall_pg)
+   return false;
+
+   if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
+   return false;
+
+   local_irq_save(flags);
+   arg = (struct ipi_arg_non_ex **)this_cpu_ptr(hyperv_pcpu_input_arg);
+
+   ipi_arg = *arg;
+   if (unlikely(!ipi_arg))
+   goto ipi_mask_done;
+
+   ipi_arg->vector = vector;
+   ipi_arg->reserved = 0;
+   ipi_arg->cpu_mask = 0;
+
+   for_each_cpu(cur_cpu, mask) {
+   vcpu = hv_cpu_number_to_vp_number(cur_cpu);
+   /*
+* This particular version of the IPI hypercall can
+* only target upto 64 CPUs.
+*/
+   if (vcpu >= 64)
+   goto ipi_mask_done;
+
+   __set_bit(vcpu, (unsigned long *)&ipi_arg->cpu_mask);
+   }
+
+   ret = hv_do_hypercall(HVCALL_SEND_IPI, ipi_arg, NULL);
+
+ipi_mask_done:
+   local_irq_restore(flags);
+   return ((ret == 0) ? true : false);
+}
+
+static bool __send_ipi_one(int cpu, int vector)
+{
+   struct cpumask mask = CPU_MASK_NONE;
+
+   cpumask_set_cpu(cpu, &mask);
+   return __send_ipi_mask(&mask, vector);
+}
+
+static void hv_send_ipi(int cpu, int vector)
+{
+   if (!__send_ipi_one(cpu, vector))
+   orig_apic.send_IPI(cpu, vector);
+}
+
+static void hv_send_ipi_mask(const struct cpumask *mask, int vector)
+{
+   if (!__send_ipi_mask(mask, vector))
+   orig_apic.send_IPI_mask(mask, vector);
+}
+
+static void hv_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
+{
+   unsigned int this_cpu = smp_processor_id();
+   struct cpumask new_mask;
+   const struct cpumask *local_mask;
+
+   cpumask_copy(&new_mask, mask);
+   cpumask_clear_cpu(this_cpu, &new_mask);
+   local_mask = &new_mask;
+   if (!__send_ipi_mask(local_mask, vector))
+   orig_apic.send_IPI_mask_allbutself(mask, vector);
+}
+
+static void hv_send_ipi_allbutself(int vector)
+{
+   hv_send_ipi_mask_allbutself(cpu_online_mask, vector);
+}
+
+static void hv_send_ipi_all(int vector)
+{
+   if (!__send_ipi_mask(cpu_online_mask, vector))
+   orig_apic.send_IPI_all(vector);
+}
+
+static void hv_send_ipi_self(int vector)
+{
+   if (!__send_ipi_one(smp_processor_id(), vector))
+   orig_apic.send_IPI_self(vector);
+}
+
 void __init hv_apic_init(void)
 {
+   if (ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) {
+   pr_info("Hyper-V: Using IPI hypercalls\n");
+   /*
+* Set the IPI entry points.
+*/
+   orig_apic = *apic;
+
+   apic->send_IPI = hv_send_ipi;
+   apic->send_IPI_mask = hv_send_ipi_mask;
+   apic->send_IPI_mask_allbutself = hv_send_ipi_mask_allbutself;
+   apic->send_IPI_allbutself = hv_send_ipi_allbutself;
+   apic->send_IPI_all = hv_send_ipi_all;
+   apic->send_IPI_self = hv_send_ipi_self;
+   }
+
if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) {
pr

[tip:x86/hyperv] X86/Hyper-V: Enhanced IPI enlightenment

2018-05-19 Thread tip-bot for K. Y. Srinivasan
Commit-ID:  366f03b0cf90ef55f063d4a54cf62b0ac9b6da9d
Gitweb: https://git.kernel.org/tip/366f03b0cf90ef55f063d4a54cf62b0ac9b6da9d
Author: K. Y. Srinivasan 
AuthorDate: Wed, 16 May 2018 14:53:32 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:23:17 +0200

X86/Hyper-V: Enhanced IPI enlightenment

Support enhanced IPI enlightenments (to target more than 64 CPUs).
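
To go beyond 64 CPUs, the extended call describes targets with a sparse 4K VP set. VP n sets bit n % 64 of bank n / 64, and valid_bank_mask marks banks 0 through nr_bank - 1 as present. Below is a hypothetical user-space sketch of that encoding. It is modelled on the cpumask_to_vp_set() helper that the consolidation patch removes from mmu.c. struct vpset_sketch and vps_to_vpset() are illustrative names, and VP numbers are taken as input directly.

```c
#include <stdint.h>
#include <string.h>

#define MAX_BANKS 64	/* valid_bank_mask is 64 bits wide */

/* Simplified stand-in for struct hv_vpset, for this sketch only. */
struct vpset_sketch {
	uint64_t valid_bank_mask;
	uint64_t bank_contents[MAX_BANKS];
};

/*
 * Encode VP numbers in the sparse 4K format. Returns the number of banks
 * used, or 0 if some VP does not fit; on 0 the patch's callers switch the
 * format to HV_GENERIC_SET_ALL.
 */
static int vps_to_vpset(struct vpset_sketch *set, const int *vp, int n)
{
	int nr_bank = 1;

	/* Clear every bank so stale bits from a previous call cannot leak. */
	memset(set, 0, sizeof(*set));

	for (int i = 0; i < n; i++) {
		if (vp[i] < 0 || vp[i] / 64 >= MAX_BANKS)
			return 0;
		set->bank_contents[vp[i] / 64] |= UINT64_C(1) << (vp[i] % 64);
		if (vp[i] / 64 >= nr_bank)
			nr_bank = vp[i] / 64 + 1;
	}

	/* Banks 0..nr_bank-1 are valid; some may be empty, which is fine. */
	set->valid_bank_mask = (nr_bank == 64) ? UINT64_MAX
					       : (UINT64_C(1) << nr_bank) - 1;
	return nr_bank;
}
```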

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Michael Kelley 
Cc: o...@aepfle.de
Cc: sthem...@microsoft.com
Cc: gre...@linuxfoundation.org
Cc: jasow...@redhat.com
Cc: michael.h.kel...@microsoft.com
Cc: h...@zytor.com
Cc: a...@canonical.com
Cc: de...@linuxdriverproject.org
Cc: vkuzn...@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-3-...@linuxonhyperv.com

---
 arch/x86/hyperv/hv_apic.c  | 42 +-
 arch/x86/hyperv/mmu.c  |  2 +-
 arch/x86/include/asm/hyperv-tlfs.h | 15 +-
 arch/x86/include/asm/mshyperv.h| 33 ++
 4 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 3e0de61f1a7c..192b6ad6a361 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -93,6 +93,40 @@ static void hv_apic_eoi_write(u32 reg, u32 val)
 /*
  * IPI implementation on Hyper-V.
  */
+static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector)
+{
+   struct ipi_arg_ex **arg;
+   struct ipi_arg_ex *ipi_arg;
+   unsigned long flags;
+   int nr_bank = 0;
+   int ret = 1;
+
+   local_irq_save(flags);
+   arg = (struct ipi_arg_ex **)this_cpu_ptr(hyperv_pcpu_input_arg);
+
+   ipi_arg = *arg;
+   if (unlikely(!ipi_arg))
+   goto ipi_mask_ex_done;
+
+   ipi_arg->vector = vector;
+   ipi_arg->reserved = 0;
+   ipi_arg->vp_set.valid_bank_mask = 0;
+
+   if (!cpumask_equal(mask, cpu_present_mask)) {
+   ipi_arg->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+   nr_bank = cpumask_to_vpset(&(ipi_arg->vp_set), mask);
+   }
+   if (!nr_bank)
+   ipi_arg->vp_set.format = HV_GENERIC_SET_ALL;
+
+   ret = hv_do_rep_hypercall(HVCALL_SEND_IPI_EX, 0, nr_bank,
+ ipi_arg, NULL);
+
+ipi_mask_ex_done:
+   local_irq_restore(flags);
+   return ((ret == 0) ? true : false);
+}
+
 static bool __send_ipi_mask(const struct cpumask *mask, int vector)
 {
int cur_cpu, vcpu;
@@ -110,6 +144,9 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector)
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return false;
 
+   if ((ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
+   return __send_ipi_mask_ex(mask, vector);
+
local_irq_save(flags);
arg = (struct ipi_arg_non_ex **)this_cpu_ptr(hyperv_pcpu_input_arg);
 
@@ -193,7 +230,10 @@ static void hv_send_ipi_self(int vector)
 void __init hv_apic_init(void)
 {
if (ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) {
-   pr_info("Hyper-V: Using IPI hypercalls\n");
+   if ((ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
+   pr_info("Hyper-V: Using ext hypercalls for IPI\n");
+   else
+   pr_info("Hyper-V: Using IPI hypercalls\n");
/*
 * Set the IPI entry points.
 */
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index 56c9ebac946f..adee39a7a3f2 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -239,7 +239,7 @@ static void hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
flush->hv_vp_set.valid_bank_mask = 0;
 
if (!cpumask_equal(cpus, cpu_present_mask)) {
-   flush->hv_vp_set.format = HV_GENERIC_SET_SPARCE_4K;
+   flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
nr_bank = cpumask_to_vp_set(flush, cpus);
}
 
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 332e786d4deb..3bfa92c2793c 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -344,6 +344,7 @@ struct hv_tsc_emulation_status {
 #define HVCALL_SEND_IPI                        0x000b
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX  0x0013
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX   0x0014
+#define HVCALL_SEND_IPI_EX 0x0015
 #define HVCALL_POST_MESSAGE                    0x005c
 #define HVCALL_SIGNAL_EVENT                    0x005d
 
@@ -369,7 +370,7 @@ struct hv_tsc_emulation_status {
 #define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT BIT(3)
 
 enum HV_GENERIC_SET_FORMAT {
-   HV_GENERIC_SET_SPARCE_4K,
+   HV_GENERIC_SET_SPARSE_4K,
HV_GENERIC_SET_ALL,
 };
 
@@ -721,4 +722,16 @@ struct ipi_arg_non_ex {
u64 cpu_mask;
 };
 
+struct hv_vpset {
+

[tip:x86/hyperv] X86/Hyper-V: Consolidate code for converting cpumask to vpset

2018-05-19 Thread tip-bot for K. Y. Srinivasan
Commit-ID:  800b8f03fdc8d66885ff03de531285526a4ca0d4
Gitweb: https://git.kernel.org/tip/800b8f03fdc8d66885ff03de531285526a4ca0d4
Author: K. Y. Srinivasan 
AuthorDate: Wed, 16 May 2018 14:53:33 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:23:18 +0200

X86/Hyper-V: Consolidate code for converting cpumask to vpset

Consolidate code for converting cpumask to vpset.

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Michael Kelley 
Cc: o...@aepfle.de
Cc: sthem...@microsoft.com
Cc: gre...@linuxfoundation.org
Cc: jasow...@redhat.com
Cc: michael.h.kel...@microsoft.com
Cc: h...@zytor.com
Cc: a...@canonical.com
Cc: de...@linuxdriverproject.org
Cc: vkuzn...@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-4-...@linuxonhyperv.com

---
 arch/x86/hyperv/mmu.c | 43 ++-
 1 file changed, 2 insertions(+), 41 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index adee39a7a3f2..c9cd28f0bae4 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -25,11 +25,7 @@ struct hv_flush_pcpu {
 struct hv_flush_pcpu_ex {
u64 address_space;
u64 flags;
-   struct {
-   u64 format;
-   u64 valid_bank_mask;
-   u64 bank_contents[];
-   } hv_vp_set;
+   struct hv_vpset hv_vp_set;
u64 gva_list[];
 };
 
@@ -70,41 +66,6 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
return gva_n - offset;
 }
 
-/* Return the number of banks in the resulting vp_set */
-static inline int cpumask_to_vp_set(struct hv_flush_pcpu_ex *flush,
-   const struct cpumask *cpus)
-{
-   int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
-
-   /* valid_bank_mask can represent up to 64 banks */
-   if (hv_max_vp_index / 64 >= 64)
-   return 0;
-
-   /*
-* Clear all banks up to the maximum possible bank as hv_flush_pcpu_ex
-* structs are not cleared between calls, we risk flushing unneeded
-* vCPUs otherwise.
-*/
-   for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
-   flush->hv_vp_set.bank_contents[vcpu_bank] = 0;
-
-   /*
-* Some banks may end up being empty but this is acceptable.
-*/
-   for_each_cpu(cpu, cpus) {
-   vcpu = hv_cpu_number_to_vp_number(cpu);
-   vcpu_bank = vcpu / 64;
-   vcpu_offset = vcpu % 64;
-   __set_bit(vcpu_offset, (unsigned long *)
- &flush->hv_vp_set.bank_contents[vcpu_bank]);
-   if (vcpu_bank >= nr_bank)
-   nr_bank = vcpu_bank + 1;
-   }
-   flush->hv_vp_set.valid_bank_mask = GENMASK_ULL(nr_bank - 1, 0);
-
-   return nr_bank;
-}
-
 static void hyperv_flush_tlb_others(const struct cpumask *cpus,
const struct flush_tlb_info *info)
 {
@@ -240,7 +201,7 @@ static void hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 
if (!cpumask_equal(cpus, cpu_present_mask)) {
flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
-   nr_bank = cpumask_to_vp_set(flush, cpus);
+   nr_bank = cpumask_to_vpset(&(flush->hv_vp_set), cpus);
}
 
if (!nr_bank) {


[tip:x86/hyperv] X86/Hyper-V: Consolidate the allocation of the hypercall input page

2018-05-19 Thread tip-bot for K. Y. Srinivasan
Commit-ID:  9a2d78e291a7dea0ae4b4a06ce6bbbe4f1ab7c13
Gitweb: https://git.kernel.org/tip/9a2d78e291a7dea0ae4b4a06ce6bbbe4f1ab7c13
Author: K. Y. Srinivasan 
AuthorDate: Wed, 16 May 2018 14:53:34 -0700
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:23:18 +0200

X86/Hyper-V: Consolidate the allocation of the hypercall input page

Consolidate the allocation of the hypercall input page.

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Michael Kelley 
Cc: o...@aepfle.de
Cc: sthem...@microsoft.com
Cc: gre...@linuxfoundation.org
Cc: jasow...@redhat.com
Cc: michael.h.kel...@microsoft.com
Cc: h...@zytor.com
Cc: a...@canonical.com
Cc: de...@linuxdriverproject.org
Cc: vkuzn...@redhat.com
Link: https://lkml.kernel.org/r/20180516215334.6547-5-...@linuxonhyperv.com

---
 arch/x86/hyperv/hv_init.c   |  2 --
 arch/x86/hyperv/mmu.c   | 30 ++
 arch/x86/include/asm/mshyperv.h |  1 -
 3 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6bc90d68ac8b..4c431e1c1eff 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -324,8 +324,6 @@ void __init hyperv_init(void)
hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
-   hyper_alloc_mmu();
-
hv_apic_init();
 
/*
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index c9cd28f0bae4..5f053d7d1bd9 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -32,9 +32,6 @@ struct hv_flush_pcpu_ex {
 /* Each gva in gva_list encodes up to 4096 pages to flush */
 #define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
 
-static struct hv_flush_pcpu __percpu **pcpu_flush;
-
-static struct hv_flush_pcpu_ex __percpu **pcpu_flush_ex;
 
 /*
  * Fills in gva_list starting from offset. Returns the number of items added.
@@ -77,7 +74,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 
trace_hyperv_mmu_flush_tlb_others(cpus, info);
 
-   if (!pcpu_flush || !hv_hypercall_pg)
+   if (!hv_hypercall_pg)
goto do_native;
 
if (cpumask_empty(cpus))
@@ -85,10 +82,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 
local_irq_save(flags);
 
-   flush_pcpu = this_cpu_ptr(pcpu_flush);
-
-   if (unlikely(!*flush_pcpu))
-   *flush_pcpu = page_address(alloc_page(GFP_ATOMIC));
+   flush_pcpu = (struct hv_flush_pcpu **)
+           this_cpu_ptr(hyperv_pcpu_input_arg);
 
flush = *flush_pcpu;
 
@@ -164,7 +159,7 @@ static void hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 
trace_hyperv_mmu_flush_tlb_others(cpus, info);
 
-   if (!pcpu_flush_ex || !hv_hypercall_pg)
+   if (!hv_hypercall_pg)
goto do_native;
 
if (cpumask_empty(cpus))
@@ -172,10 +167,8 @@ static void hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 
local_irq_save(flags);
 
-   flush_pcpu = this_cpu_ptr(pcpu_flush_ex);
-
-   if (unlikely(!*flush_pcpu))
-   *flush_pcpu = page_address(alloc_page(GFP_ATOMIC));
+   flush_pcpu = (struct hv_flush_pcpu_ex **)
+           this_cpu_ptr(hyperv_pcpu_input_arg);
 
flush = *flush_pcpu;
 
@@ -257,14 +250,3 @@ void hyperv_setup_mmu_ops(void)
pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others_ex;
}
 }
-
-void hyper_alloc_mmu(void)
-{
-   if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
-   return;
-
-   if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
-   pcpu_flush = alloc_percpu(struct hv_flush_pcpu *);
-   else
-   pcpu_flush_ex = alloc_percpu(struct hv_flush_pcpu_ex *);
-}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 0ee82519957b..9aaa493f5756 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -294,7 +294,6 @@ static inline int cpumask_to_vpset(struct hv_vpset *vpset,
 
 void __init hyperv_init(void);
 void hyperv_setup_mmu_ops(void);
-void hyper_alloc_mmu(void);
 void hyperv_report_panic(struct pt_regs *regs, long err);
 bool hv_is_hyperv_initialized(void);
 void hyperv_cleanup(void);


Re: [GIT PULL 00/18] perf/core improvements and fixes

2018-05-19 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 5aafae8d097e2161ee5c6a12ad532100f8885d2b:
> 
>   Merge tag 'perf-core-for-mingo-4.18-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2018-05-16 17:56:43 +0200)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.18-20180519
> 
> for you to fetch changes up to 19422a9f2a3be7f3a046285ffae4cbb571aa853a:
> 
>   perf tools: Fix kernel_start for PTI on x86 (2018-05-19 06:42:51 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> - Record min/max LBR cycles (>= skylake) and add 'perf annotate' TUI
>   hotkey to show it (c) (Jin Yao)
> 
> - Fix machine->kernel_start for PTI on x86 (Adrian Hunter)
> 
> - Make machine->env->arch always available, e.g. in 'perf top', not
>   just when reading that info from perf.data files (Adrian Hunter)
> 
> - Reduce the number of files read at 'perf' start, leaving information such as
>   cacheline size, tracefs mount point determination, max_stack, etc., to be
>   lazily read as tools need them (Arnaldo Carvalho de Melo)
> 
> - Fixup BPF include and examples install messages (Arnaldo Carvalho de Melo)
> 
> - Fixup callchain addresses and symbol offsets in 'perf script', to help
>   correlate them with objdump output (Sandipan Das)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Adrian Hunter (2):
>   perf machine: Add machine__is() to identify machine arch
>   perf tools: Fix kernel_start for PTI on x86
> 
> Arnaldo Carvalho de Melo (12):
>   perf config: Call perf_config__init() lazily
>   tools lib api: The tracing_mnt variable doesn't need to be global
>   tools lib api: Unexport 'tracing_path' variable
>   tools lib api fs tracing_path: Introduce get/put_events_file() helpers
>   perf tools: Reuse the path to the tracepoint /events/ directory
>   perf parse-events: Use get/put_events_file()
>   tools lib api fs tracing_path: Introduce opendir() method
>   tools lib api fs tracing_path: Make tracing_events_path private
>   tools include compiler-gcc: Add __pure attribute helper
>   perf tools: Read the cache line size lazily
>   perf tools: No need to unconditionally read the max_stack sysctls
>   perf bpf: Fixup include and examples install messages
> 
> Jin Yao (2):
>   perf annotate: Record the min/max cycles
>   perf annotate: Create hotkey 'c' to show min/max cycles
> 
> Sandipan Das (2):
>   perf script: Show virtual addresses instead of offsets
>   perf script: Show symbol offsets by default
> 
>  tools/include/linux/compiler-gcc.h |  3 +
>  tools/lib/api/fs/tracing_path.c| 40 +---
>  tools/lib/api/fs/tracing_path.h|  9 ++-
>  tools/perf/Makefile.perf   |  2 +
>  tools/perf/builtin-script.c| 26 
>  tools/perf/builtin-top.c   |  2 +-
>  tools/perf/builtin-trace.c |  2 +-
>  tools/perf/perf.c  | 24 +--
>  tools/perf/tests/parse-events.c|  9 +--
>  .../tests/shell/record+probe_libc_inet_pton.sh | 12 ++--
>  tools/perf/ui/browsers/annotate.c  |  8 +++
>  tools/perf/util/annotate.c | 51 ---
>  tools/perf/util/annotate.h | 11 +++-
>  tools/perf/util/config.c   | 16 ++---
>  tools/perf/util/config.h   |  1 -
>  tools/perf/util/env.c  | 18 ++
>  tools/perf/util/env.h  |  2 +
>  tools/perf/util/evsel.c|  2 +-
>  tools/perf/util/machine.c  | 18 +-
>  tools/perf/util/machine.h  |  2 +
>  tools/perf/util/parse-events.c | 73 +-
>  tools/perf/util/probe-file.c   |  3 +-
>  tools/perf/util/sort.c |  4 +-
>  tools/perf/util/sort.h |  4 +-
>  tools/perf/util/trace-event-info.c | 11 ++--
>  tools/perf/util/trace-event.c  |  8 ++-
>  tools/perf/util/util.c | 34 +-
>  tools/perf/util/util.h |  4 +-
>  28 files changed, 279 insertions(+), 120 deletions(-)

Pulled, thanks a lot Arnaldo!

Ingo


[tip:x86/boot] x86/boot/compressed/64: Fix trampoline page table address calculation

2018-05-19 Thread tip-bot for Kirill A. Shutemov
Commit-ID:  30bbf728ba91b1e8b0e539126cd105ad7e2fa16a
Gitweb: https://git.kernel.org/tip/30bbf728ba91b1e8b0e539126cd105ad7e2fa16a
Author: Kirill A. Shutemov 
AuthorDate: Fri, 18 May 2018 13:35:22 +0300
Committer:  Ingo Molnar 
CommitDate: Sat, 19 May 2018 11:56:57 +0200

x86/boot/compressed/64: Fix trampoline page table address calculation

Hugh noticed that we calculate the address of the trampoline page table
incorrectly in cleanup_trampoline().

TRAMPOLINE_32BIT_PGTABLE_OFFSET has to be divided by sizeof(unsigned long),
since trampoline_32bit is an 'unsigned long' pointer.

TRAMPOLINE_32BIT_PGTABLE_OFFSET is zero so the bug doesn't have a
visible effect.
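The bug follows from C pointer arithmetic. Adding N to an unsigned long * advances the pointer by N * sizeof(unsigned long) bytes, so a byte offset must first be converted into an element count. TRAMPOLINE_32BIT_PGTABLE_OFFSET is zero in the kernel, so this small sketch uses a hypothetical non-zero PGTABLE_OFFSET_BYTES to make the difference visible.

```c
#include <stddef.h>

#define PGTABLE_OFFSET_BYTES 0x1000 /* hypothetical non-zero byte offset */

/* Buggy: advances by PGTABLE_OFFSET_BYTES *elements*, not bytes. */
static void *pgtable_addr_buggy(unsigned long *base)
{
	return base + PGTABLE_OFFSET_BYTES;
}

/* Fixed: convert the byte offset into a count of unsigned longs first. */
static void *pgtable_addr_fixed(unsigned long *base)
{
	return base + PGTABLE_OFFSET_BYTES / sizeof(unsigned long);
}
```

Casting the base to char * before adding the byte offset would be an equivalent fix.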

Reported-by: Hugh Dickins 
Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180518103528.59260-2-kirill.shute...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/pgtable_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index a362fa0b849c..23707e1da1ff 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -130,7 +130,7 @@ void cleanup_trampoline(void *pgtable)
 {
void *trampoline_pgtable;
 
-   trampoline_pgtable = trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET;
+   trampoline_pgtable = trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long);
 
/*
 * Move the top level page table out of trampoline memory,


[tip:x86/boot] x86/mm: Unify pgtable_l5_enabled usage in early boot code

2018-05-19 Thread tip-bot for Kirill A. Shutemov
Commit-ID:  ad3fe525b9507d8d750d60e8e5dd8e0c0836fb99
Gitweb: https://git.kernel.org/tip/ad3fe525b9507d8d750d60e8e5dd8e0c0836fb99
Author: Kirill A. Shutemov 
AuthorDate: Fri, 18 May 2018 13:35:23 +0300
Committer:  Ingo Molnar 
CommitDate: Sat, 19 May 2018 11:56:57 +0200

x86/mm: Unify pgtable_l5_enabled usage in early boot code

Usually pgtable_l5_enabled is defined using cpu_feature_enabled().
cpu_feature_enabled() is not available in early boot code. We use
several different preprocessor tricks to get around it. It's messy.

Unify them all.

If cpu_feature_enabled() is not yet available, USE_EARLY_PGTABLE_L5 can
be defined before all includes. It makes pgtable_l5_enabled rely on the
__pgtable_l5_enabled variable instead. This approach fits all early
users.
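The opt-in works only because the macro is tested when the header is preprocessed, which is why it must be defined before the first include. This single-file sketch mirrors the switch added to pgtable_64_types.h. Everything in it is a stand-in: model_early_l5 plays __pgtable_l5_enabled, and model_cpu_has_la57() plays cpu_feature_enabled(X86_FEATURE_LA57). The 39/48 values match the pgdir_shift settings in this series.

```c
#define USE_EARLY_PGTABLE_L5 /* must come before the "header" below */

unsigned int model_early_l5;             /* set by early boot code */
int model_cpu_has_la57(void) { return 0; } /* not usable this early */

/* ---- "header": same shape as the USE_EARLY_PGTABLE_L5 switch ---- */
#ifdef USE_EARLY_PGTABLE_L5
#define model_pgtable_l5_enabled model_early_l5
#else
#define model_pgtable_l5_enabled model_cpu_has_la57()
#endif
/* ---- end "header" ---- */

static unsigned int model_pgdir_shift(void)
{
	return model_pgtable_l5_enabled ? 48 : 39;
}
```

Removing the first #define makes the same caller use the feature check instead, without any #undef tricks at the call site.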

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Thomas Gleixner 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20180518103528.59260-3-kirill.shute...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c|  4 ++--
 arch/x86/boot/compressed/misc.h |  6 ++
 arch/x86/include/asm/pgtable_64_types.h | 13 ++---
 arch/x86/kernel/head64.c| 12 +---
 arch/x86/mm/kasan_init_64.c |  6 ++
 5 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index a0a50b91ecef..b87a7582853d 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -47,7 +47,7 @@
 #include 
 
 #ifdef CONFIG_X86_5LEVEL
-unsigned int pgtable_l5_enabled __ro_after_init;
+unsigned int __pgtable_l5_enabled;
 unsigned int pgdir_shift __ro_after_init = 39;
 unsigned int ptrs_per_p4d __ro_after_init = 1;
 #endif
@@ -734,7 +734,7 @@ void choose_random_location(unsigned long input,
 
 #ifdef CONFIG_X86_5LEVEL
if (__read_cr4() & X86_CR4_LA57) {
-   pgtable_l5_enabled = 1;
+   __pgtable_l5_enabled = 1;
pgdir_shift = 48;
ptrs_per_p4d = 512;
}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 9e11be4cae19..a423bdb42686 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -12,10 +12,8 @@
 #undef CONFIG_PARAVIRT_SPINLOCKS
 #undef CONFIG_KASAN
 
-#ifdef CONFIG_X86_5LEVEL
-/* cpu_feature_enabled() cannot be used that early */
-#define pgtable_l5_enabled __pgtable_l5_enabled
-#endif
+/* cpu_feature_enabled() cannot be used this early */
+#define USE_EARLY_PGTABLE_L5
 
 #include 
 #include 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index adb47552e6bb..c14a4116a693 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -22,12 +22,19 @@ typedef struct { pteval_t pte; } pte_t;
 
 #ifdef CONFIG_X86_5LEVEL
 extern unsigned int __pgtable_l5_enabled;
-#ifndef pgtable_l5_enabled
+
+#ifdef USE_EARLY_PGTABLE_L5
+/*
+ * cpu_feature_enabled() is not available in early boot code.
+ * Use variable instead.
+ */
+#define pgtable_l5_enabled __pgtable_l5_enabled
+#else
 #define pgtable_l5_enabled cpu_feature_enabled(X86_FEATURE_LA57)
-#endif
+#endif /* USE_EARLY_PGTABLE_L5 */
 #else
 #define pgtable_l5_enabled 0
-#endif
+#endif /* CONFIG_X86_5LEVEL */
 
 extern unsigned int pgdir_shift;
 extern unsigned int ptrs_per_p4d;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2d29e47c056e..494fea1dbd6e 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -6,6 +6,10 @@
  */
 
 #define DISABLE_BRANCH_PROFILING
+
+/* cpu_feature_enabled() cannot be used this early */
+#define USE_EARLY_PGTABLE_L5
+
 #include 
 #include 
 #include 
@@ -32,11 +36,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_X86_5LEVEL
-#undef pgtable_l5_enabled
-#define pgtable_l5_enabled __pgtable_l5_enabled
-#endif
-
 /*
  * Manage page tables very early on.
  */
@@ -46,7 +45,6 @@ pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #ifdef CONFIG_X86_5LEVEL
 unsigned int __pgtable_l5_enabled __ro_after_init;
-EXPORT_SYMBOL(__pgtable_l5_enabled);
 unsigned int pgdir_shift __ro_after_init = 39;
 EXPORT_SYMBOL(pgdir_shift);
 unsigned int ptrs_per_p4d __ro_after_init = 1;
@@ -88,7 +86,7 @@ static bool __head check_la57_support(unsigned long physaddr)
if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
return false;
 
-   *fixup_int(&pgtable_l5_enabled, physaddr) = 1;
+   *fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
*fixup_int(&pgdir_shift, physaddr) = 48;
*fixup_int(&ptrs_per_p4d, physaddr) = 512;
*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 980dbebd0ca7..340bb9b32e01 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -2,10 +2,8 @@
 #define 

[PATCH] cpufreq: Add Kryo CPU scaling driver

2018-05-19 Thread Ilia Lin
In certain QCOM SoCs, like apq8096 and msm8996, that have KRYO processors,
the CPU frequency subset and voltage value of each OPP varies
based on the silicon variant in use. The Qualcomm Process Voltage Scaling
Tables define the voltage and frequency value based on the msm-id in SMEM
and the speedbin blown in the efuse combination.
The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC
to provide the OPP framework with the required information.
This is used to determine the voltage and frequency value for each OPP of
the operating-points-v2 table when it is parsed by the OPP framework.
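The msm-id lookup in this patch skips a 32-bit format word and matches the next word against four known IDs. This user-space sketch works on a plain u32 buffer and assumes the layout stated in the driver comment, with the format word first and the msm-id second. The enum values are copied from the patch. The function name and the buffer argument are hypothetical, and the NULL and length checks are additions that the patch does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* IDs and versions as defined in the patch. */
enum _msm_id {
	MSM8996V3 = 0xF6ul,
	APQ8096V3 = 0x123ul,
	MSM8996SG = 0x131ul,
	APQ8096SG = 0x138ul,
};

enum _msm8996_version {
	MSM8996_V3,
	MSM8996_SG,
	NUM_OF_MSM8996_VERSIONS, /* also used as "unknown" */
};

/* smem_item stands in for the item qcom_smem_get() returns; len is in bytes. */
static enum _msm8996_version decode_msm_id(const uint32_t *smem_item, size_t len)
{
	if (!smem_item || len < 2 * sizeof(uint32_t))
		return NUM_OF_MSM8996_VERSIONS;

	switch (smem_item[1]) { /* word 0 is the format, word 1 the msm-id */
	case MSM8996V3:
	case APQ8096V3:
		return MSM8996_V3;
	case MSM8996SG:
	case APQ8096SG:
		return MSM8996_SG;
	default:
		return NUM_OF_MSM8996_VERSIONS;
	}
}
```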

Signed-off-by: Ilia Lin 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/Kconfig.arm  |  10 +++
 drivers/cpufreq/Makefile |   1 +
 drivers/cpufreq/cpufreq-dt-platdev.c |   3 +
 drivers/cpufreq/qcom-cpufreq-kryo.c  | 164 +++
 4 files changed, 178 insertions(+)
 create mode 100644 drivers/cpufreq/qcom-cpufreq-kryo.c

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index de55c7d..0bfd40e 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -124,6 +124,16 @@ config ARM_OMAP2PLUS_CPUFREQ
depends on ARCH_OMAP2PLUS
default ARCH_OMAP2PLUS
 
+config ARM_QCOM_CPUFREQ_KRYO
+   bool "Qualcomm Kryo based CPUFreq"
+   depends on QCOM_QFPROM
+   depends on QCOM_SMEM
+   select PM_OPP
+   help
+ This adds the CPUFreq driver for Qualcomm Kryo SoC based boards.
+
+ If in doubt, say N.
+
 config ARM_S3C_CPUFREQ
bool
help
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 8d24ade..fb4a2ec 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_MACH_MVEBU_V7)   += mvebu-cpufreq.o
 obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ)+= omap-cpufreq.o
 obj-$(CONFIG_ARM_PXA2xx_CPUFREQ)   += pxa2xx-cpufreq.o
 obj-$(CONFIG_PXA3xx)   += pxa3xx-cpufreq.o
+obj-$(CONFIG_ARM_QCOM_CPUFREQ_KRYO)+= qcom-cpufreq-kryo.o
 obj-$(CONFIG_ARM_S3C2410_CPUFREQ)  += s3c2410-cpufreq.o
 obj-$(CONFIG_ARM_S3C2412_CPUFREQ)  += s3c2412-cpufreq.o
 obj-$(CONFIG_ARM_S3C2416_CPUFREQ)  += s3c2416-cpufreq.o
diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c
index 3b585e4..77d6ab8 100644
--- a/drivers/cpufreq/cpufreq-dt-platdev.c
+++ b/drivers/cpufreq/cpufreq-dt-platdev.c
@@ -118,6 +118,9 @@
 
{ .compatible = "nvidia,tegra124", },
 
+   { .compatible = "qcom,apq8096", },
+   { .compatible = "qcom,msm8996", },
+
{ .compatible = "st,stih407", },
{ .compatible = "st,stih410", },
 
diff --git a/drivers/cpufreq/qcom-cpufreq-kryo.c b/drivers/cpufreq/qcom-cpufreq-kryo.c
new file mode 100644
index 000..ae2d1b9
--- /dev/null
+++ b/drivers/cpufreq/qcom-cpufreq-kryo.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018, The Linux Foundation. All rights reserved.
+ */
+
+/*
+ * In certain QCOM SoCs, like apq8096 and msm8996, that have KRYO processors,
+ * the CPU frequency subset and voltage value of each OPP varies
+ * based on the silicon variant in use. The Qualcomm Process Voltage Scaling
+ * Tables define the voltage and frequency value based on the msm-id in SMEM
+ * and the speedbin blown in the efuse combination.
+ * The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC
+ * to provide the OPP framework with the required information.
+ * This is used to determine the voltage and frequency value for each OPP of
+ * the operating-points-v2 table when it is parsed by the OPP framework.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MSM_ID_SMEM    137
+#define SILVER_LEAD    0
+#define GOLD_LEAD      2
+
+enum _msm_id {
+   MSM8996V3 = 0xF6ul,
+   APQ8096V3 = 0x123ul,
+   MSM8996SG = 0x131ul,
+   APQ8096SG = 0x138ul,
+};
+
+enum _msm8996_version {
+   MSM8996_V3,
+   MSM8996_SG,
+   NUM_OF_MSM8996_VERSIONS,
+};
+
+static enum _msm8996_version __init qcom_cpufreq_kryo_get_msm_id(void)
+{
+   size_t len;
+   u32 *msm_id;
+   enum _msm8996_version version;
+
+   msm_id = qcom_smem_get(QCOM_SMEM_HOST_ANY, MSM_ID_SMEM, &len);
+   /* The first 4 bytes are format, next to them is the actual msm-id */
+   msm_id++;
+
+   switch ((enum _msm_id)*msm_id) {
+   case MSM8996V3:
+   case APQ8096V3:
+   version = MSM8996_V3;
+   break;
+   case MSM8996SG:
+   case APQ8096SG:
+   version = MSM8996_SG;
+   break;
+   default:
+   version = NUM_OF_MSM8996_VERSIONS;
+   }
+
+   return version;
+}
+
+static int __init qcom_cpufreq_kryo_driver_init(void)
+{
+   struct device *cpu_dev_silver, *cpu_dev_gold;
+   struct opp_table *opp_silver, *opp_gold;
+   enum _msm8996_version msm8996_version;
+  

[tip:x86/boot] x86/mm: Stop pretending pgtable_l5_enabled is a variable

2018-05-19 Thread tip-bot for Kirill A. Shutemov
Commit-ID:  ed7588d5dc6f5e7202fb9bbeb14d94706ba225d7
Gitweb: https://git.kernel.org/tip/ed7588d5dc6f5e7202fb9bbeb14d94706ba225d7
Author: Kirill A. Shutemov 
AuthorDate: Fri, 18 May 2018 13:35:24 +0300
Committer:  Ingo Molnar 
CommitDate: Sat, 19 May 2018 11:56:57 +0200

x86/mm: Stop pretending pgtable_l5_enabled is a variable

pgtable_l5_enabled is defined using cpu_feature_enabled() but we refer
to it as a variable. This is misleading.

Make pgtable_l5_enabled() a function.

We cannot literally define it as a function due to circular dependencies
between header files. A function-like macro is close enough.
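The difference shows at call sites. An object-like macro reads like a variable, while a function-like macro requires parentheses and signals that something is evaluated. This sketch mirrors the __VIRTUAL_MASK_SHIFT change in page_64_types.h. model_la57_enabled() is a hypothetical stand-in for cpu_feature_enabled(X86_FEATURE_LA57).

```c
static int model_la57; /* hypothetical CPU feature state */

static int model_la57_enabled(void)
{
	return model_la57;
}

/*
 * Before: #define pgtable_l5_enabled <expr>   (looks like a variable)
 * After:  #define pgtable_l5_enabled() <expr> (callers must write "()")
 */
#define model_l5_enabled() model_la57_enabled()

#define MODEL_VIRTUAL_MASK_SHIFT (model_l5_enabled() ? 56 : 47)
```

The compiler checks a macro body only where the macro is expanded. For that reason, the header that defines it does not need the feature-check function declared, which is how a macro avoids the header cycle that a real inline function would hit.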

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Thomas Gleixner 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20180518103528.59260-4-kirill.shute...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/page_64_types.h|  2 +-
 arch/x86/include/asm/paravirt.h |  4 ++--
 arch/x86/include/asm/pgalloc.h  |  4 ++--
 arch/x86/include/asm/pgtable.h  | 10 +-
 arch/x86/include/asm/pgtable_32_types.h |  2 +-
 arch/x86/include/asm/pgtable_64.h   |  2 +-
 arch/x86/include/asm/pgtable_64_types.h | 14 +-
 arch/x86/include/asm/sparsemem.h|  4 ++--
 arch/x86/kernel/head64.c|  2 +-
 arch/x86/kernel/machine_kexec_64.c  |  3 ++-
 arch/x86/mm/dump_pagetables.c   |  6 +++---
 arch/x86/mm/fault.c |  4 ++--
 arch/x86/mm/ident_map.c |  2 +-
 arch/x86/mm/init_64.c   |  8 
 arch/x86/mm/kasan_init_64.c |  8 
 arch/x86/mm/kaslr.c |  8 
 arch/x86/mm/tlb.c   |  2 +-
 arch/x86/platform/efi/efi_64.c  |  2 +-
 arch/x86/power/hibernate_64.c   |  2 +-
 19 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 2c5a966dc222..6afac386a434 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -53,7 +53,7 @@
 #define __PHYSICAL_MASK_SHIFT  52
 
 #ifdef CONFIG_X86_5LEVEL
-#define __VIRTUAL_MASK_SHIFT   (pgtable_l5_enabled ? 56 : 47)
+#define __VIRTUAL_MASK_SHIFT   (pgtable_l5_enabled() ? 56 : 47)
 #else
 #define __VIRTUAL_MASK_SHIFT   47
 #endif
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 9be2bf13825b..d49bbf4bb5c8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -574,14 +574,14 @@ static inline void __set_pgd(pgd_t *pgdp, pgd_t pgd)
 }
 
 #define set_pgd(pgdp, pgdval) do { \
-   if (pgtable_l5_enabled) \
+   if (pgtable_l5_enabled()) \
__set_pgd(pgdp, pgdval);\
else\
set_p4d((p4d_t *)(pgdp), (p4d_t) { (pgdval).pgd }); \
 } while (0)
 
 #define pgd_clear(pgdp) do {   \
-   if (pgtable_l5_enabled) \
+   if (pgtable_l5_enabled()) \
set_pgd(pgdp, __pgd(0));\
 } while (0)
 
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 263c142a6a6c..ada6410fd2ec 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -167,7 +167,7 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
 #if CONFIG_PGTABLE_LEVELS > 4
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
 {
-   if (!pgtable_l5_enabled)
+   if (!pgtable_l5_enabled())
return;
paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
@@ -193,7 +193,7 @@ extern void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d);
 static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
  unsigned long address)
 {
-   if (pgtable_l5_enabled)
+   if (pgtable_l5_enabled())
___p4d_free_tlb(tlb, p4d);
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f1633de5a675..5715647fc4fe 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -65,7 +65,7 @@ extern pmdval_t early_pmd_flags;
 
 #ifndef __PAGETABLE_P4D_FOLDED
 #define set_pgd(pgdp, pgd) native_set_pgd(pgdp, pgd)
-#define pgd_clear(pgd) (pgtable_l5_enabled ? native_pgd_clear(pgd) : 0)
+#define pgd_clear(pgd) (pgtable_l5_enabled() ? native_pgd_clear(pgd) : 0)
 #endif
 
 #ifndef set_p4d
@@ -881,7 +881,7 @@ static inline unsigned long p4d_index(unsigned long address

[tip:x86/boot] x86/mm: Introduce the 'no5lvl' kernel parameter

2018-05-19 Thread tip-bot for Kirill A. Shutemov
Commit-ID:  372fddf709041743a93e381556f4c41aad1e28f8
Gitweb: https://git.kernel.org/tip/372fddf709041743a93e381556f4c41aad1e28f8
Author: Kirill A. Shutemov 
AuthorDate: Fri, 18 May 2018 13:35:25 +0300
Committer:  Ingo Molnar 
CommitDate: Sat, 19 May 2018 11:56:57 +0200

x86/mm: Introduce the 'no5lvl' kernel parameter

This kernel parameter forces the kernel to use 4-level paging even if
both the hardware and the kernel support 5-level paging.

The option may be useful to work around regressions related to 5-level
paging.
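The comment this patch extends in paging_prepare() lists conditions that must all hold before LA57 is used. This sketch writes that decision as a pure function. The inputs are plain parameters, and the function and parameter names are hypothetical; in the patch they come from IS_ENABLED(CONFIG_X86_5LEVEL), cmdline_find_option_bool("no5lvl"), native_cpuid_eax(0) and native_cpuid_ecx(7). The bit position is X86_FEATURE_LA57 & 31, which is 16.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_LA57_BIT 16 /* CPUID.(EAX=7,ECX=0):ECX bit 16 */

static bool model_l5_required(bool kernel_has_5level, bool cmdline_no5lvl,
			      uint32_t max_basic_leaf, uint32_t leaf7_ecx)
{
	return kernel_has_5level &&                   /* CONFIG_X86_5LEVEL=y */
	       !cmdline_no5lvl &&                     /* user did not pass no5lvl */
	       max_basic_leaf >= 7 &&                 /* CPUID leaf 7 exists */
	       (leaf7_ecx & (1u << MODEL_LA57_BIT));  /* CPU supports LA57 */
}
```

Any single false input, including no5lvl on the command line, keeps the kernel on 4-level paging.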

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Thomas Gleixner 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shute...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/x86/boot/compressed/cmdline.c  |  2 +-
 arch/x86/boot/compressed/head_64.S  |  1 +
 arch/x86/boot/compressed/pgtable_64.c   | 12 ++--
 arch/x86/kernel/cpu/common.c| 15 +++
 arch/x86/kernel/head64.c|  9 +
 6 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 11fc28ecdb6d..364a33c1534d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2600,6 +2600,9 @@
emulation library even if a 387 maths coprocessor
is present.
 
+   no5lvl  [X86-64] Disable 5-level paging mode. Forces
+   kernel to use 4-level paging instead.
+
no_console_suspend
[HW] Never suspend the console
Disable suspending of consoles during suspend and
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index 0cb325734cfb..af6cda0b7900 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "misc.h"
 
-#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE
+#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE || CONFIG_X86_5LEVEL
 
 static unsigned long fs;
 static inline void set_fs(unsigned long seg)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 8169e8b7a4dc..64037895b085 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -365,6 +365,7 @@ ENTRY(startup_64)
 * this function call.
 */
pushq   %rsi
+   movq    %rsi, %rdi  /* real mode address */
call    paging_prepare
popq    %rsi
 
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 23707e1da1ff..8c5107545251 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -31,16 +31,23 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
  */
 unsigned long *trampoline_32bit __section(.data);
 
-struct paging_config paging_prepare(void)
+extern struct boot_params *boot_params;
+int cmdline_find_option_bool(const char *option);
+
+struct paging_config paging_prepare(void *rmode)
 {
struct paging_config paging_config = {};
unsigned long bios_start, ebda_start;
 
+   /* Initialize boot_params. Required for cmdline_find_option_bool(). */
+   boot_params = rmode;
+
/*
 * Check if LA57 is desired and supported.
 *
-* There are two parts to the check:
+* There are several parts to the check:
 *   - if the kernel supports 5-level paging: CONFIG_X86_5LEVEL=y
+*   - if user asked to disable 5-level paging: no5lvl in cmdline
 *   - if the machine supports 5-level paging:
 * + CPUID leaf 7 is supported
 * + the leaf has the feature bit set
@@ -48,6 +55,7 @@ struct paging_config paging_prepare(void)
 * That's substitute for boot_cpu_has() in early boot code.
 */
if (IS_ENABLED(CONFIG_X86_5LEVEL) &&
+   !cmdline_find_option_bool("no5lvl") &&
native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
paging_config.l5_required = 1;
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 39ed2e6ff8a0..27f68d14c962 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1028,6 +1028,21 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 */
setup_clear_cpu_cap(X86_FEATURE_PCID);
 #endif
+
+   /*
+* Later in the boot process pgtable_l5_enabled() relies on
+* cpu_feature_enabled(X86_FEATURE_LA57). If 5-level paging is not
+* enabled by this point we need to clear the feature bit to avoid
+* 

[tip:x86/boot] x86/mm: Mark __pgtable_l5_enabled __initdata

2018-05-19 Thread tip-bot for Kirill A. Shutemov
Commit-ID:  e4e961e36f063484c48bed919013c106d178995d
Gitweb: https://git.kernel.org/tip/e4e961e36f063484c48bed919013c106d178995d
Author: Kirill A. Shutemov 
AuthorDate: Fri, 18 May 2018 13:35:28 +0300
Committer:  Ingo Molnar 
CommitDate: Sat, 19 May 2018 11:56:58 +0200

x86/mm: Mark __pgtable_l5_enabled __initdata

__pgtable_l5_enabled shouldn't be needed after the system has booted.
All preparation is done, so we can now mark it as __initdata.

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Thomas Gleixner 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20180518103528.59260-8-kirill.shute...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/head64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8047379e575a..a21d6ace648e 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -44,7 +44,7 @@ static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #ifdef CONFIG_X86_5LEVEL
-unsigned int __pgtable_l5_enabled __ro_after_init;
+unsigned int __pgtable_l5_enabled __initdata;
 unsigned int pgdir_shift __ro_after_init = 39;
 EXPORT_SYMBOL(pgdir_shift);
 unsigned int ptrs_per_p4d __ro_after_init = 1;


[tip:x86/boot] x86/mm: Mark p4d_offset() __always_inline

2018-05-19 Thread tip-bot for Kirill A. Shutemov
Commit-ID:  1ea66554d3b09ce09c42e6a871899c84a276bb39
Gitweb: https://git.kernel.org/tip/1ea66554d3b09ce09c42e6a871899c84a276bb39
Author: Kirill A. Shutemov 
AuthorDate: Fri, 18 May 2018 13:35:27 +0300
Committer:  Ingo Molnar 
CommitDate: Sat, 19 May 2018 11:56:57 +0200

x86/mm: Mark p4d_offset() __always_inline

__pgtable_l5_enabled shouldn't be needed after the system has booted. We
can mark it as __initdata, but that requires some preparation.

KASAN initialization code is a user of USE_EARLY_PGTABLE_L5, so every
pgtable_l5_enabled() there is translated to __pgtable_l5_enabled,
including the one in p4d_offset().

This may lead to a section mismatch if the compiler does not inline
p4d_offset() but leaves it as a standalone function, because
p4d_offset() is not marked __init.

Marking p4d_offset() as __always_inline fixes the issue.

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Thomas Gleixner 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20180518103528.59260-7-kirill.shute...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/pgtable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5715647fc4fe..99ecde23c3ec 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -898,7 +898,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
 #define pgd_page(pgd)  pfn_to_page(pgd_pfn(pgd))
 
 /* to find an entry in a page-table-directory. */
-static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
+static __always_inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
 {
if (!pgtable_l5_enabled())
return (p4d_t *)pgd;

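In the p4d_offset() shown above, a false pgtable_l5_enabled() makes the function return the pgd pointer itself, so the P4D level is folded into the PGD. The address arithmetic behind that folding can be sketched with the pgdir_shift (39 or 48) and ptrs_per_p4d (1 or 512) values set earlier in this series. Two assumptions are not shown in these diffs: a P4D_SHIFT of 39 and 512 PGD entries on x86-64. The struct and helper names are hypothetical.

```c
#include <stdint.h>

#define MODEL_P4D_SHIFT    39  /* assumption: x86-64 P4D_SHIFT */
#define MODEL_PTRS_PER_PGD 512 /* assumption: entries per PGD page */

struct model_split { unsigned pgd_index, p4d_index; };

static struct model_split model_split_addr(uint64_t addr, int l5)
{
	unsigned pgdir_shift = l5 ? 48 : 39;  /* values used in this series */
	unsigned ptrs_per_p4d = l5 ? 512 : 1; /* 1 means the level is folded */
	struct model_split s;

	s.pgd_index = (addr >> pgdir_shift) & (MODEL_PTRS_PER_PGD - 1);
	/* With one P4D entry, every address gets index 0: the "P4D table"
	 * is just the PGD entry itself, matching the early return above. */
	s.p4d_index = (addr >> MODEL_P4D_SHIFT) & (ptrs_per_p4d - 1);
	return s;
}
```

For addr = (3 << 48) | (5 << 39), 5-level paging gives PGD index 3 and P4D index 5. Under 4-level paging the PGD index is 5 and the P4D index is always 0.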

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-19 Thread Vasily Gorbik
On Fri, May 18, 2018 at 09:12:56PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2018 at 8:43 PM Al Viro  wrote:
> 
> > Not quite.  The things like
> >  if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> >  return 0;
> >  iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> >  protect most of the regular files (see mm/filemap.c).  And for devices
> > (which is where the majority of crap ->read()/->write() is) it's
> > obviously not applicable - ->s_maxbytes of *what*?
> 
> Yeah that "s_maxbytes of what" is I think the real issue. We should never
> have made s_maxbytes be super-block specific: we should have made it be
> per-inode, and then have inode_init_always() initialize it using something
> like the file_mmap_size_max() logic.
> 
> (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
> would only be used as a "fill in inode value for regular files for this
> superblock").
> 
> Then we could actually protect read/write properly, because many of the
> nasty bugs have been in character device drivers.
> 
> Oh well. It would still be a good thing to do some day, I suspect, but it's
> clearly not the case now, and so s_maxbytes actually has much less coverage
> than I was hoping for.
> 
> (And thus also the problems with /proc/vmcore - it never saw s_maxbytes
> limits before).
> 
> Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously
> means that my objections against Vasily's patch are mostly invalid. Even if
> /proc does use "generic_file_llseek()" a lot and that should limit things
> to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up
> the offset.
> 
> I'd still prefer to limit the damage to just "vmcore".
> 
> Something like the below COMPLETELY UNTESTED patch? Vasily?

Would work, but file_mmap_size_max first checks
if (S_ISREG(inode->i_mode))
return inode->i_sb->s_maxbytes;
before
if (file->f_mode & FMODE_UNSIGNED_OFFSET)
return 0;
so, as it stands, this patch does not fix the issue.

>  Linus
>
>  fs/proc/vmcore.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index a45f0af22a60..83278c547127 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
>  }
>  #endif
>  
> +/* Mark vmcore as being able and willing to do 64-bit mmaps */
> +static int vmcore_open(struct inode *inode, struct file *file)
> +{
> + file->f_mode |= FMODE_UNSIGNED_OFFSET;
> + return 0;
> +}
> +
>  static const struct file_operations proc_vmcore_operations = {
> + .open   = vmcore_open,
>   .read   = read_vmcore,
>   .llseek = default_llseek,
>   .mmap   = mmap_vmcore,



[tip:perf/core] perf config: Call perf_config__init() lazily

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  d01bd1ac920e98e2a64f6bb5adf907180e0aaac7
Gitweb: https://git.kernel.org/tip/d01bd1ac920e98e2a64f6bb5adf907180e0aaac7
Author: Arnaldo Carvalho de Melo 
AuthorDate: Wed, 16 May 2018 16:09:08 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 16 May 2018 16:11:09 -0300

perf config: Call perf_config__init() lazily

We already check at each perf_config() call what perf_config__init()
does, namely whether the static perf_config instance was created, so
instead of bailing out when it wasn't, try to allocate it, bailing out
only if that fails.

The next step is to get the perf_config() call out of the start of
perf's main() function, doing it lazily as well.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Taeung Song 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-4bo45k6ivsmbxpfpdte4o...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/perf.c|  1 -
 tools/perf/util/config.c | 16 +---
 tools/perf/util/config.h |  1 -
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 20a08cb32332..cd6ea55d4b0c 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -458,7 +458,6 @@ int main(int argc, const char **argv)
 
srandom(time(NULL));
 
-   perf_config__init();
err = perf_config(perf_default_config, NULL);
if (err)
return err;
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 84eb9393c7db..5ac157056cdf 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -707,6 +707,14 @@ struct perf_config_set *perf_config_set__new(void)
return set;
 }
 
+static int perf_config__init(void)
+{
+   if (config_set == NULL)
+   config_set = perf_config_set__new();
+
+   return config_set == NULL;
+}
+
 int perf_config(config_fn_t fn, void *data)
 {
int ret = 0;
@@ -714,7 +722,7 @@ int perf_config(config_fn_t fn, void *data)
struct perf_config_section *section;
struct perf_config_item *item;
 
-   if (config_set == NULL)
+   if (config_set == NULL && perf_config__init())
return -1;
 
perf_config_set__for_each_entry(config_set, section, item) {
@@ -735,12 +743,6 @@ int perf_config(config_fn_t fn, void *data)
return ret;
 }
 
-void perf_config__init(void)
-{
-   if (config_set == NULL)
-   config_set = perf_config_set__new();
-}
-
 void perf_config__exit(void)
 {
perf_config_set__delete(config_set);
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index baf82bf227ac..bd0a5897c76a 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -38,7 +38,6 @@ struct perf_config_set *perf_config_set__new(void);
 void perf_config_set__delete(struct perf_config_set *set);
 int perf_config_set__collect(struct perf_config_set *set, const char *file_name,
 const char *var, const char *value);
-void perf_config__init(void);
 void perf_config__exit(void);
 void perf_config__refresh(void);
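The lazy-initialization pattern in this patch can be sketched in isolation as follows. The demo_config_* names and the allocation counter are hypothetical; like the patched perf_config__init(), demo_config_init() returns nonzero when allocation fails so the caller can bail out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_config_set. */
struct demo_config_set {
	int nr_sections;
};

static struct demo_config_set *demo_config; /* NULL until first use */
static int demo_config_allocs;              /* counts allocations, for illustration */

static struct demo_config_set *demo_config_set_new(void)
{
	struct demo_config_set *set = calloc(1, sizeof(*set));

	if (set)
		demo_config_allocs++;
	return set;
}

/* Returns nonzero when the instance could not be allocated. */
static int demo_config_init(void)
{
	if (demo_config == NULL)
		demo_config = demo_config_set_new();

	return demo_config == NULL;
}

/* Like perf_config(): initialise on first call, fail only if allocation fails. */
static int demo_config_walk(void)
{
	if (demo_config == NULL && demo_config_init())
		return -1;

	assert(demo_config != NULL);
	return demo_config->nr_sections;
}

static void demo_config_exit(void)
{
	free(demo_config);
	demo_config = NULL;
}
```

Only the first call pays for the allocation; later calls see a non-NULL pointer and skip initialization entirely.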
 


[tip:perf/core] tools lib api: The tracing_mnt variable doesn't need to be global

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  00a6270361c025bfaae1d70ef1b596d182e05e8a
Gitweb: https://git.kernel.org/tip/00a6270361c025bfaae1d70ef1b596d182e05e8a
Author: Arnaldo Carvalho de Melo 
AuthorDate: Wed, 16 May 2018 16:20:12 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 16 May 2018 16:20:12 -0300

tools lib api: The tracing_mnt variable doesn't need to be global

It's only used in the file where it is defined, so just make it static.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-p5x29u6mq2ml3mtnbg984...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 7b7fd0b18551..4f8ec7d476b8 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -13,8 +13,7 @@
 
 #include "tracing_path.h"
 
-
-char tracing_mnt[PATH_MAX] = "/sys/kernel/debug";
+static char tracing_mnt[PATH_MAX]  = "/sys/kernel/debug";
 char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
 char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
 
@@ -129,7 +128,7 @@ int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
snprintf(buf, size,
 "Error:\tNo permissions to read %s/%s\n"
 "Hint:\tTry 'sudo mount -o remount,mode=755 %s'\n",
-tracing_events_path, filename, tracing_mnt);
+tracing_events_path, filename, tracing_path_mount());
}
break;
default:


[tip:perf/core] tools lib api fs tracing_path: Introduce get/put_events_file() helpers

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  40c3c0c9ac2befd28df3844dc9efdbadee3af5c0
Gitweb: https://git.kernel.org/tip/40c3c0c9ac2befd28df3844dc9efdbadee3af5c0
Author: Arnaldo Carvalho de Melo 
AuthorDate: Wed, 16 May 2018 16:42:26 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 12:01:50 -0300

tools lib api fs tracing_path: Introduce get/put_events_file() helpers

Make reading events files a tad more compact than with
get_tracing_file("events/foo").

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-do6xgtwpmfl8zjs1euxsd...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c| 15 +++
 tools/lib/api/fs/tracing_path.h|  5 +
 tools/perf/util/trace-event-info.c | 11 +--
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 6f5fe942eff4..9cd282425929 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -86,6 +86,21 @@ void put_tracing_file(char *file)
free(file);
 }
 
+char *get_events_file(const char *name)
+{
+   char *file;
+
+   if (asprintf(&file, "%s/events/%s", tracing_path_mount(), name) < 0)
+   return NULL;
+
+   return file;
+}
+
+void put_events_file(char *file)
+{
+   free(file);
+}
+
 int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
   const char *sys, const char *name)
 {
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 1b65decedfc0..3b32fb439f12 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -12,5 +12,10 @@ const char *tracing_path_mount(void);
 char *get_tracing_file(const char *name);
 void put_tracing_file(char *file);
 
+char *get_events_file(const char *name);
+void put_events_file(char *file);
+
+#define zput_events_file(ptr) ({ free(*ptr); *ptr = NULL; })
+
 int tracing_path__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name);
 #endif /* __API_FS_TRACING_PATH_H */
diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index d7f2113462fb..c85d0d1a65ed 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -103,11 +103,10 @@ out:
 
 static int record_header_files(void)
 {
-   char *path;
+   char *path = get_events_file("header_page");
struct stat st;
int err = -EIO;
 
-   path = get_tracing_file("events/header_page");
if (!path) {
pr_debug("can't get tracing/events/header_page");
return -ENOMEM;
@@ -128,9 +127,9 @@ static int record_header_files(void)
goto out;
}
 
-   put_tracing_file(path);
+   put_events_file(path);
 
-   path = get_tracing_file("events/header_event");
+   path = get_events_file("header_event");
if (!path) {
pr_debug("can't get tracing/events/header_event");
err = -ENOMEM;
@@ -154,7 +153,7 @@ static int record_header_files(void)
 
err = 0;
 out:
-   put_tracing_file(path);
+   put_events_file(path);
return err;
 }
 
@@ -243,7 +242,7 @@ static int record_ftrace_files(struct tracepoint_path *tps)
char *path;
int ret;
 
-   path = get_tracing_file("events/ftrace");
+   path = get_events_file("ftrace");
if (!path) {
pr_debug("can't get tracing/events/ftrace");
return -ENOMEM;


[tip:perf/core] tools lib api: Unexport 'tracing_path' variable

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  17c257e867be1880eaf7c1b9dac286086d75d1ec
Gitweb: https://git.kernel.org/tip/17c257e867be1880eaf7c1b9dac286086d75d1ec
Author: Arnaldo Carvalho de Melo 
AuthorDate: Wed, 16 May 2018 16:27:14 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 16 May 2018 16:27:14 -0300

tools lib api: Unexport 'tracing_path' variable

One should use tracing_path_mount() instead, so more things get done
lazily instead of at every 'perf' tool call startup.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-fci4yll35idd9yuslp67v...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 4 ++--
 tools/lib/api/fs/tracing_path.h | 1 -
 tools/perf/perf.c   | 5 +
 tools/perf/util/probe-file.c| 3 +--
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 4f8ec7d476b8..6f5fe942eff4 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -14,7 +14,7 @@
 #include "tracing_path.h"
 
 static char tracing_mnt[PATH_MAX]  = "/sys/kernel/debug";
-char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
+static char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
 char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
 
 
@@ -75,7 +75,7 @@ char *get_tracing_file(const char *name)
 {
char *file;
 
-   if (asprintf(&file, "%s/%s", tracing_path, name) < 0)
+   if (asprintf(&file, "%s/%s", tracing_path_mount(), name) < 0)
return NULL;
 
return file;
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 0066f06cc381..1b65decedfc0 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -4,7 +4,6 @@
 
 #include 
 
-extern char tracing_path[];
 extern char tracing_events_path[];
 
 void tracing_path_set(const char *mountpoint);
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index cd6ea55d4b0c..d5a0878de816 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -238,7 +238,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
(*argc)--;
} else if (strstarts(cmd, CMD_DEBUGFS_DIR)) {
tracing_path_set(cmd + strlen(CMD_DEBUGFS_DIR));
-   fprintf(stderr, "dir: %s\n", tracing_path);
+   fprintf(stderr, "dir: %s\n", tracing_path_mount());
if (envchanged)
*envchanged = 1;
} else if (!strcmp(cmd, "--list-cmds")) {
@@ -463,9 +463,6 @@ int main(int argc, const char **argv)
return err;
set_buildid_dir(NULL);
 
-   /* get debugfs/tracefs mount point from /proc/mounts */
-   tracing_path_mount();
-
/*
 * "perf-" is the same as "perf ", but we obviously:
 *
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 4ae1123c6794..b76088fadf3d 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -84,8 +84,7 @@ int open_trace_file(const char *trace_file, bool readwrite)
char buf[PATH_MAX];
int ret;
 
-   ret = e_snprintf(buf, PATH_MAX, "%s/%s",
-tracing_path, trace_file);
+   ret = e_snprintf(buf, PATH_MAX, "%s/%s", tracing_path_mount(), trace_file);
if (ret >= 0) {
pr_debug("Opening %s write=%d\n", buf, readwrite);
if (readwrite && !probe_event_dry_run)
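With callers going through tracing_path_mount() instead of reading the exported array, the lookup can happen on first use. A minimal sketch of that resolve-once idea follows; demo_detect_tracefs() is a hypothetical probe returning a fixed path, and the call counter exists only to show the probe runs once. It is not the library's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static char demo_tracing_path[256];
static bool demo_resolved;
static int demo_detect_calls; /* how many times the probe actually ran */

/* Hypothetical probe; a real one would inspect the mounted filesystems. */
static const char *demo_detect_tracefs(void)
{
	demo_detect_calls++;
	return "/sys/kernel/debug/tracing";
}

/* Resolve the tracing directory on first use, then return the cached copy. */
static const char *demo_tracing_path_mount(void)
{
	if (!demo_resolved) {
		const char *mnt = demo_detect_tracefs();
		int n;

		if (!mnt)
			return NULL;
		n = snprintf(demo_tracing_path, sizeof(demo_tracing_path), "%s", mnt);
		assert(n >= 0 && (size_t)n < sizeof(demo_tracing_path));
		demo_resolved = true;
	}
	return demo_tracing_path;
}
```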


RE: [PATCH v8 10/15] cpufreq: Add Kryo CPU scaling driver

2018-05-19 Thread ilialin
>From c5804e1d17578a63ca87cc8fd839bf756cfe3567 Mon Sep 17 00:00:00 2001
In-Reply-To: <1526555955-29960-11-git-send-email-ilia...@codeaurora.org>
References: <1526555955-29960-11-git-send-email-ilia...@codeaurora.org>
From: Ilia Lin 
Date: Thu, 17 May 2018 13:55:12 +0300
Subject: [PATCH] cpufreq: Add Kryo CPU scaling driver

In certain QCOM SoCs like apq8096 and msm8996 that have KRYO processors,
the CPU frequency subset and voltage value of each OPP varies
based on the silicon variant in use. Qualcomm Process Voltage Scaling Tables
defines the voltage and frequency value based on the msm-id in SMEM
and speedbin blown in the efuse combination.
The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC
to provide the OPP framework with required information.
This is used to determine the voltage and frequency value for each OPP of
operating-points-v2 table when it is parsed by the OPP framework.
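The SMEM lookup described above can be sketched in userspace as a parser over the item's 32-bit words. The id values come from enum _msm_id in the patch that follows; the demo_* names, the UNKNOWN result (the patch's default case is cut off here) and the length check are assumptions of this sketch, and qcom_smem_get() itself is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SoC ids copied from enum _msm_id in the patch. */
enum demo_msm_id {
	DEMO_MSM8996V3 = 0xF6,
	DEMO_APQ8096V3 = 0x123,
	DEMO_MSM8996SG = 0x131,
	DEMO_APQ8096SG = 0x138,
};

enum demo_msm8996_version {
	DEMO_MSM8996_V3,
	DEMO_MSM8996_SG,
	DEMO_MSM8996_UNKNOWN, /* hypothetical: the patch's default case is not shown */
};

/*
 * Hypothetical parser: word 0 of the SMEM item is the format, word 1 the msm-id.
 * 'len' is the item size in bytes, as qcom_smem_get() reports it.
 */
static enum demo_msm8996_version demo_get_version(const uint32_t *item, size_t len)
{
	if (item == NULL || len < 2 * sizeof(*item))
		return DEMO_MSM8996_UNKNOWN;

	switch (item[1]) {
	case DEMO_MSM8996V3:
	case DEMO_APQ8096V3:
		return DEMO_MSM8996_V3;
	case DEMO_MSM8996SG:
	case DEMO_APQ8096SG:
		return DEMO_MSM8996_SG;
	default:
		return DEMO_MSM8996_UNKNOWN;
	}
}
```

Unlike the quoted excerpt, the sketch checks the reported length before reading past the format word.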

Signed-off-by: Ilia Lin 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/Kconfig.arm  |  10 +++
 drivers/cpufreq/Makefile |   1 +
 drivers/cpufreq/cpufreq-dt-platdev.c |   3 +
 drivers/cpufreq/qcom-cpufreq-kryo.c  | 164 +++
 4 files changed, 178 insertions(+)
 create mode 100644 drivers/cpufreq/qcom-cpufreq-kryo.c

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index de55c7d..0bfd40e 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -124,6 +124,16 @@ config ARM_OMAP2PLUS_CPUFREQ
depends on ARCH_OMAP2PLUS
default ARCH_OMAP2PLUS

+config ARM_QCOM_CPUFREQ_KRYO
+   bool "Qualcomm Kryo based CPUFreq"
+   depends on QCOM_QFPROM
+   depends on QCOM_SMEM
+   select PM_OPP
+   help
+ This adds the CPUFreq driver for Qualcomm Kryo SoC based boards.
+
+ If in doubt, say N.
+
 config ARM_S3C_CPUFREQ
bool
help
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 8d24ade..fb4a2ec 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_MACH_MVEBU_V7)   += mvebu-cpufreq.o
 obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ)+= omap-cpufreq.o
 obj-$(CONFIG_ARM_PXA2xx_CPUFREQ)   += pxa2xx-cpufreq.o
 obj-$(CONFIG_PXA3xx)   += pxa3xx-cpufreq.o
+obj-$(CONFIG_ARM_QCOM_CPUFREQ_KRYO)+= qcom-cpufreq-kryo.o
 obj-$(CONFIG_ARM_S3C2410_CPUFREQ)  += s3c2410-cpufreq.o
 obj-$(CONFIG_ARM_S3C2412_CPUFREQ)  += s3c2412-cpufreq.o
 obj-$(CONFIG_ARM_S3C2416_CPUFREQ)  += s3c2416-cpufreq.o
diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c
index 3b585e4..77d6ab8 100644
--- a/drivers/cpufreq/cpufreq-dt-platdev.c
+++ b/drivers/cpufreq/cpufreq-dt-platdev.c
@@ -118,6 +118,9 @@

{ .compatible = "nvidia,tegra124", },

+   { .compatible = "qcom,apq8096", },
+   { .compatible = "qcom,msm8996", },
+
{ .compatible = "st,stih407", },
{ .compatible = "st,stih410", },

diff --git a/drivers/cpufreq/qcom-cpufreq-kryo.c b/drivers/cpufreq/qcom-cpufreq-kryo.c
new file mode 100644
index 000..ae2d1b9
--- /dev/null
+++ b/drivers/cpufreq/qcom-cpufreq-kryo.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018, The Linux Foundation. All rights reserved.
+ */
+
+/*
+ * In Certain QCOM SoCs like apq8096 and msm8996 that have KRYO processors,
+ * the CPU frequency subset and voltage value of each OPP varies
+ * based on the silicon variant in use. Qualcomm Process Voltage Scaling Tables
+ * defines the voltage and frequency value based on the msm-id in SMEM
+ * and speedbin blown in the efuse combination.
+ * The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC
+ * to provide the OPP framework with required information.
+ * This is used to determine the voltage and frequency value for each OPP of
+ * operating-points-v2 table when it is parsed by the OPP framework.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MSM_ID_SMEM137
+#define SILVER_LEAD0
+#define GOLD_LEAD  2
+
+enum _msm_id {
+   MSM8996V3 = 0xF6ul,
+   APQ8096V3 = 0x123ul,
+   MSM8996SG = 0x131ul,
+   APQ8096SG = 0x138ul,
+};
+
+enum _msm8996_version {
+   MSM8996_V3,
+   MSM8996_SG,
+   NUM_OF_MSM8996_VERSIONS,
+};
+
+static enum _msm8996_version __init qcom_cpufreq_kryo_get_msm_id(void)
+{
+   size_t len;
+   u32 *msm_id;
+   enum _msm8996_version version;
+
+   msm_id = qcom_smem_get(QCOM_SMEM_HOST_ANY, MSM_ID_SMEM, &len);
+   /* The first 4 bytes are format, next to them is the actual msm-id */
+   msm_id++;
+
+   switch ((enum _msm_id)*msm_id) {
+   case MSM8996V3:
+   case APQ8096V3:
+   version = MSM8996_V3;
+   break;
+   case MSM8996SG:
+   case APQ8096SG:
+   version = MSM8996_SG;
+   break;
+   

[tip:perf/core] perf tools: Reuse the path to the tracepoint /events/ directory

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  c02cab228e44aacf161642b63779971f8e39993b
Gitweb: https://git.kernel.org/tip/c02cab228e44aacf161642b63779971f8e39993b
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 14:22:37 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 14:25:07 -0300

perf tools: Reuse the path to the tracepoint /events/ directory

When using for_each_event() we needlessly rebuild the whole path to
the tracepoint directory; reuse the dir_path instead, saving some cycles
and reducing the size of the next patch.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-54bcs15n0cp6gwcgpc4hp...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/parse-events.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 2fc4ee8b86c1..f9d5bbd63484 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -156,13 +156,12 @@ struct event_symbol event_symbols_sw[PERF_COUNT_SW_MAX] = {
(strcmp(sys_dirent->d_name, ".")) &&\
(strcmp(sys_dirent->d_name, "..")))
 
-static int tp_event_has_id(struct dirent *sys_dir, struct dirent *evt_dir)
+static int tp_event_has_id(const char *dir_path, struct dirent *evt_dir)
 {
char evt_path[MAXPATHLEN];
int fd;
 
-   snprintf(evt_path, MAXPATHLEN, "%s/%s/%s/id", tracing_events_path,
-   sys_dir->d_name, evt_dir->d_name);
+   snprintf(evt_path, MAXPATHLEN, "%s/%s/id", dir_path, evt_dir->d_name);
fd = open(evt_path, O_RDONLY);
if (fd < 0)
return -EINVAL;
@@ -171,12 +170,12 @@ static int tp_event_has_id(struct dirent *sys_dir, struct dirent *evt_dir)
return 0;
 }
 
-#define for_each_event(sys_dirent, evt_dir, evt_dirent)\
+#define for_each_event(dir_path, evt_dir, evt_dirent)  \
while ((evt_dirent = readdir(evt_dir)) != NULL) \
if (evt_dirent->d_type == DT_DIR && \
(strcmp(evt_dirent->d_name, ".")) &&\
(strcmp(evt_dirent->d_name, "..")) &&   \
-   (!tp_event_has_id(sys_dirent, evt_dirent)))
+   (!tp_event_has_id(dir_path, evt_dirent)))
 
 #define MAX_EVENT_LENGTH 512
 
@@ -204,7 +203,7 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
if (!evt_dir)
continue;
 
-   for_each_event(sys_dirent, evt_dir, evt_dirent) {
+   for_each_event(dir_path, evt_dir, evt_dirent) {
 
scnprintf(evt_path, MAXPATHLEN, "%s/%s/id", dir_path,
  evt_dirent->d_name);
@@ -2119,7 +2118,7 @@ restart:
if (!evt_dir)
continue;
 
-   for_each_event(sys_dirent, evt_dir, evt_dirent) {
+   for_each_event(dir_path, evt_dir, evt_dirent) {
if (event_glob != NULL &&
!strglobmatch(evt_dirent->d_name, event_glob))
continue;
@@ -2199,7 +2198,7 @@ int is_valid_tracepoint(const char *event_string)
if (!evt_dir)
continue;
 
-   for_each_event(sys_dirent, evt_dir, evt_dirent) {
+   for_each_event(dir_path, evt_dir, evt_dirent) {
snprintf(evt_path, MAXPATHLEN, "%s:%s",
 sys_dirent->d_name, evt_dirent->d_name);
if (!strcmp(evt_path, event_string)) {
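The path change in this patch boils down to the equivalence below: once the caller holds "<events>/<sys>" in dir_path, appending "<event>/id" yields the same string as rebuilding it from the events root. The demo_* helpers and the example paths are hypothetical illustrations, not perf code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAXPATHLEN 4096

/* Before: rebuild "<events>/<sys>/<event>/id" from the events root each time. */
static int demo_id_path_full(char *buf, size_t size, const char *events_path,
			     const char *sys, const char *evt)
{
	return snprintf(buf, size, "%s/%s/%s/id", events_path, sys, evt);
}

/* After: the caller already holds "<events>/<sys>" in dir_path. */
static int demo_id_path_reuse(char *buf, size_t size, const char *dir_path,
			      const char *evt)
{
	assert(dir_path != NULL);
	return snprintf(buf, size, "%s/%s/id", dir_path, evt);
}
```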


[tip:perf/core] perf parse-events: Use get/put_events_file()

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  25a7d914274de38637c5199342eb90a297361386
Gitweb: https://git.kernel.org/tip/25a7d914274de38637c5199342eb90a297361386
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 14:27:29 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 14:49:36 -0300

perf parse-events: Use get/put_events_file()

Use these helpers instead of accessing the tracing_events_path variable
directly, as that variable may not have been properly initialized with
respect to detecting where tracefs is mounted.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-id7hzn1ydgkxbumeve5wa...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/parse-events.c |  7 +++---
 tools/perf/util/parse-events.c  | 50 +++--
 tools/perf/util/trace-event.c   |  8 +--
 3 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 6829dd416a99..6d57d7082637 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1328,7 +1328,7 @@ static int count_tracepoints(void)
TEST_ASSERT_VAL("Can't open events dir", events_dir);
 
while ((events_ent = readdir(events_dir))) {
-   char sys_path[PATH_MAX];
+   char *sys_path;
struct dirent *sys_ent;
DIR *sys_dir;
 
@@ -1339,8 +1339,8 @@ static int count_tracepoints(void)
|| !strcmp(events_ent->d_name, "header_page"))
continue;
 
-   scnprintf(sys_path, PATH_MAX, "%s/%s",
- tracing_events_path, events_ent->d_name);
+   sys_path = get_events_file(events_ent->d_name);
+   TEST_ASSERT_VAL("Can't get sys path", sys_path);
 
sys_dir = opendir(sys_path);
TEST_ASSERT_VAL("Can't open sys dir", sys_dir);
@@ -1356,6 +1356,7 @@ static int count_tracepoints(void)
}
 
closedir(sys_dir);
+   put_events_file(sys_path);
}
 
closedir(events_dir);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index f9d5bbd63484..24668300b327 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -189,19 +189,19 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
int fd;
u64 id;
char evt_path[MAXPATHLEN];
-   char dir_path[MAXPATHLEN];
+   char *dir_path;
 
sys_dir = opendir(tracing_events_path);
if (!sys_dir)
return NULL;
 
for_each_subsystem(sys_dir, sys_dirent) {
-
-   snprintf(dir_path, MAXPATHLEN, "%s/%s", tracing_events_path,
-sys_dirent->d_name);
+   dir_path = get_events_file(sys_dirent->d_name);
+   if (!dir_path)
+   continue;
evt_dir = opendir(dir_path);
if (!evt_dir)
-   continue;
+   goto next;
 
for_each_event(dir_path, evt_dir, evt_dirent) {
 
@@ -217,6 +217,7 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
close(fd);
id = atoll(id_buf);
if (id == config) {
+   put_events_file(dir_path);
closedir(evt_dir);
closedir(sys_dir);
path = zalloc(sizeof(*path));
@@ -241,6 +242,8 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
}
}
closedir(evt_dir);
+next:
+   put_events_file(dir_path);
}
 
closedir(sys_dir);
@@ -511,14 +514,19 @@ static int add_tracepoint_multi_event(struct list_head *list, int *idx,
  struct parse_events_error *err,
  struct list_head *head_config)
 {
-   char evt_path[MAXPATHLEN];
+   char *evt_path;
struct dirent *evt_ent;
DIR *evt_dir;
int ret = 0, found = 0;
 
-   snprintf(evt_path, MAXPATHLEN, "%s/%s", tracing_events_path, sys_name);
+   evt_path = get_events_file(sys_name);
+   if (!evt_path) {
+   tracepoint_error(err, errno, sys_name, evt_name);
+   return -1;
+   }
evt_dir = opendir(evt_path);
if (!evt_dir) {
+   put_events_file(evt_path);
tracepoint_error(err, errno, sys_name, evt_name);
return -1;
}
@@ -544,6 +552,7 @@ static int add_tracepoint_multi_event(struct list_head *list, int *idx,
ret = -1;
}
 
+   put_events_file(evt_path);
closedir(evt_dir);
return ret;
 }
@@ -2091,7 +2100,7 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob
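Switching from stack buffers to allocated paths means every exit from the loop body must release dir_path, which is what the new "next:" label in tracepoint_id_to_path() does. A minimal model of that ownership pattern follows; demo_get()/demo_put() mimic get/put_events_file(), and demo_open() is a hypothetical stand-in for opendir() that fails for names starting with 'x'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int demo_allocs, demo_frees;

/* Hypothetical allocator mirroring get_events_file(): caller must release it. */
static char *demo_get(const char *name)
{
	size_t n = strlen("events/") + strlen(name) + 1;
	char *p = malloc(n);

	if (!p)
		return NULL;
	snprintf(p, n, "events/%s", name);
	demo_allocs++;
	return p;
}

static void demo_put(char *p)
{
	if (p)
		demo_frees++;
	free(p);
}

/* Hypothetical stand-in for opendir(): paths under "events/x" fail to open. */
static int demo_open(const char *path)
{
	return strncmp(path, "events/x", 8) != 0;
}

/* Count openable subsystems; each obtained path is released on every exit path. */
static int demo_count_openable(const char *const *names, size_t nr)
{
	int opened = 0;

	for (size_t i = 0; i < nr; i++) {
		char *dir_path = demo_get(names[i]);

		if (!dir_path)
			continue;        /* nothing to release */
		if (!demo_open(dir_path))
			goto next;       /* skip, but still release dir_path */
		opened++;
next:
		demo_put(dir_path);
	}
	return opened;
}
```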

[tip:perf/core] tools lib api fs tracing_path: Introduce opendir() method

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  7014e0e3bf9b0d0b6221eb7d2f8a1f690423dd73
Gitweb: https://git.kernel.org/tip/7014e0e3bf9b0d0b6221eb7d2f8a1f690423dd73
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 14:42:39 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 14:50:38 -0300

tools lib api fs tracing_path: Introduce opendir() method

This takes care of using the right call to get the tracing_path
directory, the one that will end up calling tracing_path_set() to figure
out where tracefs is mounted.

One more step toward reading system structures only lazily, to reduce
the number of operations done unconditionally at 'perf' startup.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-42zzi0f274909bg9mxzl8...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 13 +
 tools/lib/api/fs/tracing_path.h |  3 +++
 tools/perf/tests/parse-events.c |  2 +-
 tools/perf/util/parse-events.c  |  8 
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 9cd282425929..9b451af0721c 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -101,6 +101,19 @@ void put_events_file(char *file)
free(file);
 }
 
+DIR *tracing_events__opendir(void)
+{
+   DIR *dir = NULL;
+   char *path = get_tracing_file("events");
+
+   if (path) {
+   dir = opendir(path);
+   put_events_file(path);
+   }
+
+   return dir;
+}
+
 int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
   const char *sys, const char *name)
 {
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 3b32fb439f12..904d085b2ae7 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -3,9 +3,12 @@
 #define __API_FS_TRACING_PATH_H
 
 #include 
+#include 
 
 extern char tracing_events_path[];
 
+DIR *tracing_events__opendir(void);
+
 void tracing_path_set(const char *mountpoint);
 const char *tracing_path_mount(void);
 
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 6d57d7082637..b9ebe15afb13 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1323,7 +1323,7 @@ static int count_tracepoints(void)
DIR *events_dir;
int cnt = 0;
 
-   events_dir = opendir(tracing_events_path);
+   events_dir = tracing_events__opendir();
 
TEST_ASSERT_VAL("Can't open events dir", events_dir);
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 24668300b327..15eec49e71a1 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -191,7 +191,7 @@ struct tracepoint_path *tracepoint_id_to_path(u64 config)
char evt_path[MAXPATHLEN];
char *dir_path;
 
-   sys_dir = opendir(tracing_events_path);
+   sys_dir = tracing_events__opendir();
if (!sys_dir)
return NULL;
 
@@ -578,7 +578,7 @@ static int add_tracepoint_multi_sys(struct list_head *list, int *idx,
DIR *events_dir;
int ret = 0;
 
-   events_dir = opendir(tracing_events_path);
+   events_dir = tracing_events__opendir();
if (!events_dir) {
tracepoint_error(err, errno, sys_name, evt_name);
return -1;
@@ -2106,7 +2106,7 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob,
bool evt_num_known = false;
 
 restart:
-   sys_dir = opendir(tracing_events_path);
+   sys_dir = tracing_events__opendir();
if (!sys_dir)
return;
 
@@ -2200,7 +2200,7 @@ int is_valid_tracepoint(const char *event_string)
char evt_path[MAXPATHLEN];
char *dir_path;
 
-   sys_dir = opendir(tracing_events_path);
+   sys_dir = tracing_events__opendir();
if (!sys_dir)
return 0;
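The shape of tracing_events__opendir() (build a path, open it, release the temporary string whether or not the open succeeded) can be sketched generically. Here 'root' is a hypothetical parameter standing in for the lazily detected tracing directory, and asprintf() needs _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Opens "<root>/<sub>" and frees the temporary path before returning. */
static DIR *demo_opendir_under(const char *root, const char *sub)
{
	DIR *dir;
	char *path;

	assert(root != NULL && sub != NULL);
	if (asprintf(&path, "%s/%s", root, sub) < 0)
		return NULL;

	dir = opendir(path);
	free(path); /* the DIR handle does not need the path string */
	return dir;
}
```

The caller owns only the DIR handle and releases it with closedir(), just as the parse-events callers do with the handle from tracing_events__opendir().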
 


[tip:perf/core] tools lib api fs tracing_path: Make tracing_events_path private

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  789e465058352122023e4fa7de8dcf5c513e0b0b
Gitweb: https://git.kernel.org/tip/789e465058352122023e4fa7de8dcf5c513e0b0b
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 14:51:23 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 14:51:23 -0300

tools lib api fs tracing_path: Make tracing_events_path private

It is no longer accessed outside this library, so keep it private.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-wg1m07flfrg1rm06jjzie...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 3 +--
 tools/lib/api/fs/tracing_path.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 9b451af0721c..120037496f77 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -15,8 +15,7 @@
 
 static char tracing_mnt[PATH_MAX]  = "/sys/kernel/debug";
 static char tracing_path[PATH_MAX]= "/sys/kernel/debug/tracing";
-char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
-
+static char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
 
 static void __tracing_path_set(const char *tracing, const char *mountpoint)
 {
diff --git a/tools/lib/api/fs/tracing_path.h b/tools/lib/api/fs/tracing_path.h
index 904d085b2ae7..a19136b086dc 100644
--- a/tools/lib/api/fs/tracing_path.h
+++ b/tools/lib/api/fs/tracing_path.h
@@ -5,8 +5,6 @@
 #include 
 #include 
 
-extern char tracing_events_path[];
-
 DIR *tracing_events__opendir(void);
 
 void tracing_path_set(const char *mountpoint);


[tip:perf/core] tools include compiler-gcc: Add __pure attribute helper

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  6e1690c4c0b540930f08295b6a95c8660b257745
Gitweb: https://git.kernel.org/tip/6e1690c4c0b540930f08295b6a95c8660b257745
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 15:12:36 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 15:17:21 -0300

tools include compiler-gcc: Add __pure attribute helper

Adopt it from the kernel sources; it will be used soon.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-oubheiqj8edo5rzewt11c...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/include/linux/compiler-gcc.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/include/linux/compiler-gcc.h b/tools/include/linux/compiler-gcc.h
index a3a4427441bf..70fe61295733 100644
--- a/tools/include/linux/compiler-gcc.h
+++ b/tools/include/linux/compiler-gcc.h
@@ -21,6 +21,9 @@
 /* &a[0] degrades to a pointer: a different type from an array */
 #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
 
+#ifndef __pure
+#define  __pure__attribute__((pure))
+#endif
 #define  noinline  __attribute__((noinline))
 #ifndef __packed
 #define __packed   __attribute__((packed))
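A small illustration of what the guarded __pure helper expands to and where it fits. demo_table_sum() is a hypothetical function: its result depends only on its arguments and the memory it reads, and it has no side effects, which is the contract GCC's pure attribute expresses.

```c
#include <assert.h>
#include <stddef.h>

/* Same guarded definition as the patch adds to tools/include/linux/compiler-gcc.h. */
#ifndef __pure
#define __pure __attribute__((pure))
#endif

/*
 * Hypothetical example: no side effects, so the compiler may merge repeated
 * calls with the same arguments when the table is not written in between.
 */
static int __pure demo_table_sum(const int *table, size_t n)
{
	int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += table[i];
	return sum;
}
```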


[tip:perf/core] perf tools: Read the cache line size lazily

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  9ac94e31ca8c6311ec9eb68aea513e39ad809013
Gitweb: https://git.kernel.org/tip/9ac94e31ca8c6311ec9eb68aea513e39ad809013
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 15:03:05 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 16:03:34 -0300

perf tools: Read the cache line size lazily

It is not read as commonly as 'page_size', so it makes sense to read it
lazily, caching its value when it is first read.

Fewer files are opened unconditionally at startup.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-35xhrq91u94uc1djtclek...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/perf.c  | 11 ---
 tools/perf/util/sort.c |  4 ++--
 tools/perf/util/sort.h |  4 ++--
 tools/perf/util/util.c | 21 -
 tools/perf/util/util.h |  2 +-
 5 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index d5a0878de816..cefd8f74630c 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -421,16 +421,6 @@ void pthread__unblock_sigwinch(void)
pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 }
 
-#ifdef _SC_LEVEL1_DCACHE_LINESIZE
-#define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
-#else
-static void cache_line_size(int *cacheline_sizep)
-{
-   if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
-   pr_debug("cannot determine cache line size");
-}
-#endif
-
 int main(int argc, const char **argv)
 {
int err;
@@ -444,7 +434,6 @@ int main(int argc, const char **argv)
 
/* The page_size is placed in util object. */
page_size = sysconf(_SC_PAGE_SIZE);
-   cache_line_size(&cacheline_size);
 
if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0)
sysctl_perf_event_max_stack = value;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e65903a695a6..4058ade352a5 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2582,7 +2582,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
if (sort__mode != SORT_MODE__MEMORY)
return -EINVAL;
 
-   if (sd->entry == &sort_mem_dcacheline && cacheline_size == 0)
+   if (sd->entry == &sort_mem_dcacheline && cacheline_size() == 0)
return -EINVAL;
 
if (sd->entry == &sort_mem_daddr_sym)
@@ -2628,7 +2628,7 @@ static int setup_sort_list(struct perf_hpp_list *list, char *str,
if (*tok) {
ret = sort_dimension__add(list, tok, evlist, level);
if (ret == -EINVAL) {
-   if (!cacheline_size && !strncasecmp(tok, "dcacheline", strlen(tok)))
+   if (!cacheline_size() && !strncasecmp(tok, "dcacheline", strlen(tok)))
pr_err("The \"dcacheline\" --sort key needs to know the cacheline size and it couldn't be determined on this system");
else
pr_err("Invalid --sort key: `%s'", tok);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 035b62e2c60b..9e6896293bbd 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -186,13 +186,13 @@ static inline float hist_entry__get_percent_limit(struct hist_entry *he)
 static inline u64 cl_address(u64 address)
 {
/* return the cacheline of the address */
-   return (address & ~(cacheline_size - 1));
+   return (address & ~(cacheline_size() - 1));
 }
 
 static inline u64 cl_offset(u64 address)
 {
/* return the cacheline of the address */
-   return (address & (cacheline_size - 1));
+   return (address & (cacheline_size() - 1));
 }
 
 enum sort_mode {
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 1019bbc5dbd8..99ab52165680 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -38,7 +38,26 @@ void perf_set_multithreaded(void)
 }
 
 unsigned int page_size;
-int cacheline_size;
+
+#ifdef _SC_LEVEL1_DCACHE_LINESIZE
+#define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
+#else
+static void cache_line_size(int *cacheline_sizep)
+{
+   if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
+   pr_debug("cannot determine cache line size");
+}
+#endif
+
+int cacheline_size(void)
+{
+   static int size;
+
+   if (!size)
+   cache_line_size(&size);
+
+   return size;
+}
 
 int sysctl_perf_event_max_stack = PERF_MAX_STACK_DEPTH;
 int sysctl_perf_event_max_contexts_per_stack = PERF_MAX_CONTEXTS_PER_STACK;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index c9626c206208..74d21dfe0d29 100644
---
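The lazy-read pattern can be sketched in isolation as follows. `lazy_cacheline_size()` is a hypothetical name. This sketch covers only the `sysconf()` path and omits the sysfs fallback shown in the patch. As in the patch, a zero result is not cached, so a failed lookup is retried on the next call:

```c
#include <unistd.h>

/* Hypothetical stand-in for perf's cacheline_size(): the value is
 * queried on first use only, then kept in a function-local static. */
static int lazy_cacheline_size(void)
{
	static int size;

	if (!size) {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
		long val = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);

		/* sysconf() returns -1 on error and may return 0 if unknown */
		if (val > 0)
			size = (int)val;
#endif
	}
	return size;
}
```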

[tip:perf/core] perf script: Show virtual addresses instead of offsets

2018-05-19 Thread tip-bot for Sandipan Das
Commit-ID:  19610184693c547b4c12738df4156589892c4018
Gitweb: https://git.kernel.org/tip/19610184693c547b4c12738df4156589892c4018
Author: Sandipan Das 
AuthorDate: Thu, 17 May 2018 12:03:25 +0530
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 16:55:29 -0300

perf script: Show virtual addresses instead of offsets

When perf data is recorded with the call-graph option enabled, the
callchain printed by perf script uses the binary offsets of the symbols
as the ip. This is incorrect for kernel symbols as the ip values are
always off by a fixed offset depending on the architecture. If the
offsets from the start of the symbols are printed, they are also
incorrect for both kernel and userspace symbols.

Without the call-graph option, the callchain shows the virtual addresses
of the symbols rather than their binary offsets. The offsets printed in
this case are also correct.

This fixes the inconsistency in perf script's output.

This can be verified on a powerpc64le system running Fedora 27 as
follows:

  # cat /proc/kallsyms | grep sys_write
  ...
  c04025a0 T sys_write
  c04025a0 T __se_sys_write
  ...

  # perf probe -a sys_write

Before applying this patch:

  # perf record -e probe:sys_write -g ~/test
  # perf script -F ip,sym,symoff

4125b0 sys_write+0x80008010
 1b9e0 system_call+0x80008058
118234 __GI___libc_write+0xf52c0024
 92c74 _IO_file_write@@GLIBC_2.17+0xf52c0044
  5afbfd8a [unknown]
 91a60 new_do_write+0xf52c0090
 94638 _IO_do_write@@GLIBC_2.17+0xf52c0038
 94bbc _IO_file_overflow@@GLIBC_2.17+0xf52c014c
 95a24 __overflow+0xf52c0064
 84548 _IO_puts+0xf52c0218
   440 main+0xe020
 236a0 generic_start_main.isra.0+0xf52c0140
 23898 __libc_start_main+0xf52c00b8
 0 [unknown]
  ...

  # perf record -e probe:sys_write ~/test
  # perf script -F ip,sym,symoff

  c04025b0 sys_write+0x10
  ...

After applying this patch:

  # perf record -e probe:sys_write -g ~/test
  # perf script -F ip,sym,symoff

  c04025b0 sys_write+0x10
  c000b9e0 system_call+0x58
  7fffb70d8234 __GI___libc_write+0x24
  7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44
  5afc1818 [unknown]
  7fffb7051a60 new_do_write+0x90
  7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38
  7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c
  7fffb7055a24 __overflow+0x64
  7fffb7044548 _IO_puts+0x218
  1440 main+0x20
  7fffb6fe36a0 generic_start_main.isra.0+0x140
  7fffb6fe3898 __libc_start_main+0xb8
 0 [unknown]
  ...

  # perf record -e probe:sys_write ~/test
  # perf script -F ip,sym,symoff

  c04025b0 sys_write+0x10
  ...

Signed-off-by: Sandipan Das 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Naveen N. Rao 
Cc: Ravi Bangoria 
Link: http://lkml.kernel.org/r/20180517063326.6319-1-sandi...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 72a351613d85..7c777cb32806 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1764,7 +1764,7 @@ static int add_callchain_ip(struct thread *thread,
}
 
srcline = callchain_srcline(al.map, al.sym, al.addr);
-   return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
+   return callchain_cursor_append(cursor, ip, al.map, al.sym,
   branch, flags, nr_loop_iter,
   iter_cycles, branch_from, srcline);
 }

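The symbol offset printed with `-F symoff` is the virtual ip minus the start address of the symbol. The commit message gives `sys_write` at `c04025a0` in /proc/kallsyms and a sampled ip of `c04025b0`. The sketch below reproduces that arithmetic with those figures. `symoff()` and `format_symoff()` are hypothetical helpers, not perf functions:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Offset of an ip from the start of the symbol that contains it. */
static uint64_t symoff(uint64_t ip, uint64_t sym_start)
{
	return ip - sym_start;
}

/* Render "sym+0xoff"; returns the snprintf() result (characters needed). */
static int format_symoff(char *buf, size_t len, const char *sym,
			 uint64_t ip, uint64_t sym_start)
{
	return snprintf(buf, len, "%s+0x%" PRIx64, sym, symoff(ip, sym_start));
}
```

For example, `format_symoff(buf, sizeof(buf), "sys_write", 0xc04025b0, 0xc04025a0)` writes `sys_write+0x10`, which matches the corrected output above.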

[tip:perf/core] perf tools: No need to unconditionally read the max_stack sysctls

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  029c75e5cf166f9c04744d81c798f54a44a8417c
Gitweb: https://git.kernel.org/tip/029c75e5cf166f9c04744d81c798f54a44a8417c
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 16:31:32 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 17 May 2018 16:31:32 -0300

perf tools: No need to unconditionally read the max_stack sysctls

Let tools that need to have those variables with the sysctl current
values use a function that will read them.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9v...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-top.c   |  2 +-
 tools/perf/builtin-trace.c |  2 +-
 tools/perf/perf.c  |  7 ---
 tools/perf/util/evsel.c|  2 +-
 tools/perf/util/util.c | 13 +
 tools/perf/util/util.h |  2 ++
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 3c061c57afb6..7a349fcd3864 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1264,7 +1264,7 @@ int cmd_top(int argc, const char **argv)
.proc_map_timeout= 500,
.overwrite  = 1,
},
-   .max_stack   = sysctl_perf_event_max_stack,
+   .max_stack   = sysctl__max_stack(),
.sym_pcnt_filter = 5,
.nr_threads_synthesize = UINT_MAX,
};
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index c7effcfc40ed..560aed7da36a 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3162,7 +3162,7 @@ int cmd_trace(int argc, const char **argv)
mmap_pages_user_set = false;
 
if (trace.max_stack == UINT_MAX) {
-   trace.max_stack = input_name ? PERF_MAX_STACK_DEPTH : sysctl_perf_event_max_stack;
+   trace.max_stack = input_name ? PERF_MAX_STACK_DEPTH : sysctl__max_stack();
max_stack_user_set = false;
}
 
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index cefd8f74630c..51c81509a315 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -426,7 +426,6 @@ int main(int argc, const char **argv)
int err;
const char *cmd;
char sbuf[STRERR_BUFSIZE];
-   int value;
 
/* libsubcmd init */
exec_cmd_init("perf", PREFIX, PERF_EXEC_PATH, EXEC_PATH_ENVIRONMENT);
@@ -435,12 +434,6 @@ int main(int argc, const char **argv)
/* The page_size is placed in util object. */
page_size = sysconf(_SC_PAGE_SIZE);
 
-   if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0)
-   sysctl_perf_event_max_stack = value;
-
-   if (sysctl__read_int("kernel/perf_event_max_contexts_per_stack", &value) == 0)
-   sysctl_perf_event_max_contexts_per_stack = value;
-
cmd = extract_argv0_path(argv[0]);
if (!cmd)
cmd = "perf-help";
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 4cd2cf93f726..150db5ed7400 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2862,7 +2862,7 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
return scnprintf(msg, size,
 "Not enough memory to setup event with callchain.\n"
 "Hint: Try tweaking /proc/sys/kernel/perf_event_max_stack\n"
-"Hint: Current value: %d", sysctl_perf_event_max_stack);
+"Hint: Current value: %d", sysctl__max_stack());
break;
case ENODEV:
if (target->cpu_list)
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 99ab52165680..eac5b858a371 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -62,6 +62,19 @@ int cacheline_size(void)
 int sysctl_perf_event_max_stack = PERF_MAX_STACK_DEPTH;
 int sysctl_perf_event_max_contexts_per_stack = PERF_MAX_CONTEXTS_PER_STACK;
 
+int sysctl__max_stack(void)
+{
+   int value;
+
+   if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0)
+   sysctl_perf_event_max_stack = value;
+
+   if (sysctl__read_int("kernel/perf_event_max_contexts_per_stack", &value) == 0)
+   sysctl_perf_event_max_contexts_per_stack = value;
+
+   return sysctl_perf_event_max_stack;
+}
+
 bool test_attr__enabled;
 
 bool perf_host  = true;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 74d21dfe0d29..dc58254a2b69 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -45,6 +45,8 @@ int hex2u64(const char *ptr, u64 *val);
 extern unsigned int page_size;
 int __pure cacheline_size(void);
 
+int sysctl__max_stack(void);
+
 int fetch_kernel_version(unsigned int *puint,
 
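The sketch below illustrates the read-on-demand idea under stated assumptions. It reads the integer from the procfs file only when the value is needed and falls back to the compiled-in default otherwise. `read_int_file()` and `max_stack_or_default()` are hypothetical stand-ins for perf's `sysctl__read_int()`-based helper. The value 127 is `PERF_MAX_STACK_DEPTH` from the UAPI header:

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a file such as
 * /proc/sys/kernel/perf_event_max_stack. Returns 0 on success. */
static int read_int_file(const char *path, int *value)
{
	FILE *fp = fopen(path, "r");
	int ret = -1;

	if (!fp)
		return -1;
	if (fscanf(fp, "%d", value) == 1)
		ret = 0;
	fclose(fp);
	return ret;
}

/* Read the sysctl only when asked; keep the default if it is unreadable. */
static int max_stack_or_default(const char *path, int def)
{
	int value;

	if (read_int_file(path, &value) == 0)
		return value;
	return def;
}
```

On a Linux host, `max_stack_or_default("/proc/sys/kernel/perf_event_max_stack", 127)` would return the current sysctl value. Tools that never call it do not pay for opening the file.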

RE: [PATCH v8 10/15] cpufreq: Add Kryo CPU scaling driver

2018-05-19 Thread ilialin
Hi Viresh,

If I send the patches as replies, they show up as new patches instead of
answers in the thread. Please find the file dump below.

->cat drivers/cpufreq/qcom-cpufreq-kryo.c
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2018, The Linux Foundation. All rights reserved.
 */

/*
 * In certain QCOM SoCs like apq8096 and msm8996 that have KRYO processors,
 * the CPU frequency subset and voltage value of each OPP varies
 * based on the silicon variant in use. Qualcomm Process Voltage Scaling Tables
 * define the voltage and frequency value based on the msm-id in SMEM
 * and speedbin blown in the efuse combination.
 * The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC
 * to provide the OPP framework with required information.
 * This is used to determine the voltage and frequency value for each OPP of
 * operating-points-v2 table when it is parsed by the OPP framework.
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MSM_ID_SMEM 137
#define SILVER_LEAD 0
#define GOLD_LEAD   2

enum _msm_id {
MSM8996V3 = 0xF6ul,
APQ8096V3 = 0x123ul,
MSM8996SG = 0x131ul,
APQ8096SG = 0x138ul,
};

enum _msm8996_version {
MSM8996_V3,
MSM8996_SG,
NUM_OF_MSM8996_VERSIONS,
};

static enum _msm8996_version __init qcom_cpufreq_kryo_get_msm_id(void)
{
size_t len;
u32 *msm_id;
enum _msm8996_version version;

msm_id = qcom_smem_get(QCOM_SMEM_HOST_ANY, MSM_ID_SMEM, &len);
/* The first 4 bytes are the format; the actual msm-id follows them */
msm_id++;

switch ((enum _msm_id)*msm_id) {
case MSM8996V3:
case APQ8096V3:
version = MSM8996_V3;
break;
case MSM8996SG:
case APQ8096SG:
version = MSM8996_SG;
break;
default:
version = NUM_OF_MSM8996_VERSIONS;
}

return version;
}

static int __init qcom_cpufreq_kryo_driver_init(void)
{
struct device *cpu_dev_silver, *cpu_dev_gold;
struct opp_table *opp_silver, *opp_gold;
enum _msm8996_version msm8996_version;
struct nvmem_cell *speedbin_nvmem;
struct platform_device *pdev;
struct device_node *np;
u8 *speedbin;
u32 versions;
size_t len;
int ret;

cpu_dev_silver = get_cpu_device(SILVER_LEAD);
if (IS_ERR_OR_NULL(cpu_dev_silver))
return PTR_ERR(cpu_dev_silver);

cpu_dev_gold = get_cpu_device(GOLD_LEAD);
if (IS_ERR_OR_NULL(cpu_dev_gold))
return PTR_ERR(cpu_dev_gold);

msm8996_version = qcom_cpufreq_kryo_get_msm_id();
if (NUM_OF_MSM8996_VERSIONS == msm8996_version) {
dev_err(cpu_dev_silver, "Not Snapdragon 820/821!");
return -ENODEV;
}

np = dev_pm_opp_of_get_opp_desc_node(cpu_dev_silver);
if (IS_ERR_OR_NULL(np))
return PTR_ERR(np);

if (!of_device_is_compatible(np, "operating-points-v2-kryo-cpu")) {
ret = -ENOENT;
goto free_np;
}

speedbin_nvmem = of_nvmem_cell_get(np, NULL);
if (IS_ERR(speedbin_nvmem)) {
ret = PTR_ERR(speedbin_nvmem);
dev_err(cpu_dev_silver, "Could not get nvmem cell: %d\n",
ret);
goto free_np;
}

speedbin = nvmem_cell_read(speedbin_nvmem, &len);
nvmem_cell_put(speedbin_nvmem);

switch (msm8996_version) {
case MSM8996_V3:
versions = 1 << (unsigned int)(*speedbin);
break;
case MSM8996_SG:
versions = 1 << ((unsigned int)(*speedbin) + 4);
break;
default:
BUG();
break;
}

opp_silver =
dev_pm_opp_set_supported_hw(cpu_dev_silver,&versions,1);
if (IS_ERR(opp_silver)) {
dev_err(cpu_dev_silver, "Failed to set supported hardware\n");
ret = PTR_ERR(opp_silver);
goto free_np;
}

opp_gold = dev_pm_opp_set_supported_hw(cpu_dev_gold,&versions,1);
if (IS_ERR(opp_gold)) {
dev_err(cpu_dev_gold, "Failed to set supported hardware\n");
ret = PTR_ERR(opp_gold);
goto free_opp_silver;
}

pdev = platform_device_register_simple("cpufreq-dt", -1, NULL, 0);
if (!IS_ERR_OR_NULL(pdev))
return 0;

ret = PTR_ERR(pdev);
dev_err(cpu_dev_silver, "Failed to register platform device\n");
dev_pm_opp_put_supported_hw(opp_gold);

free_opp_silver:
dev_pm_opp_put_supported_hw(opp_silver);

free_np:
of_node_put(np);

return ret;
}
late_initcall(qcom_cpufreq_kryo_driver_init);

MODULE_DESCRIPTION("Qualcomm Technologies, Inc. Kryo CPUfreq driver");
MODUL

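The driver hands a `versions` mask to `dev_pm_opp_set_supported_hw()`, and the mapping from msm-id and speedbin to that mask can be checked on its own. The userspace sketch below mirrors the constants above. The function names are hypothetical. Unlike the driver, the sketch returns 0 for an unknown version instead of calling `BUG()`:

```c
#include <stdint.h>

/* Mirrors the version split used by the driver above. */
enum msm8996_version { MSM8996_V3, MSM8996_SG, NUM_OF_MSM8996_VERSIONS };

static enum msm8996_version version_from_msm_id(uint32_t msm_id)
{
	switch (msm_id) {
	case 0xF6:  /* MSM8996V3 */
	case 0x123: /* APQ8096V3 */
		return MSM8996_V3;
	case 0x131: /* MSM8996SG */
	case 0x138: /* APQ8096SG */
		return MSM8996_SG;
	default:
		return NUM_OF_MSM8996_VERSIONS;
	}
}

/* Mask of supported hardware: V3 uses 1 << speedbin, SG is shifted up by 4. */
static uint32_t supported_hw_mask(enum msm8996_version v, uint8_t speedbin)
{
	if (speedbin >= 28)
		return 0; /* avoid an out-of-range shift */

	switch (v) {
	case MSM8996_V3:
		return 1u << speedbin;
	case MSM8996_SG:
		return 1u << (speedbin + 4);
	default:
		return 0;
	}
}
```

Assume speedbin values from 0 to 3. In that case, V3 parts would select bits 0 to 3 and SG parts would select bits 4 to 7. The OPP framework presumably matches these bits against each OPP's `opp-supported-hw` entry.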
[tip:perf/core] perf script: Show symbol offsets by default

2018-05-19 Thread tip-bot for Sandipan Das
Commit-ID:  7903a70867230d9edbd5e886cd8b8a2b248f418f
Gitweb: https://git.kernel.org/tip/7903a70867230d9edbd5e886cd8b8a2b248f418f
Author: Sandipan Das 
AuthorDate: Thu, 17 May 2018 12:03:26 +0530
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 18 May 2018 16:31:40 -0300

perf script: Show symbol offsets by default

Since the ip shown for a symbol is now always a virtual address, it
becomes difficult to correlate this with objdump output and determine
the exact instruction address. So, we always show the offset from the
start of the symbol.

This can be verified on a powerpc64le system running Fedora 27 as
follows:

  # perf probe -a sys_write
  # perf record -e probe:sys_write -g ~/test

Before applying this patch:

  # perf script

  test  9710 [013] 95614.332431: probe:sys_write: (c04025b0)
  c04025b0 sys_write (/lib/modules/4.17.0-rc4+/build/vmlinux)
  c000b9e0 system_call (/lib/modules/4.17.0-rc4+/build/vmlinux)
  7fffb70d8234 __GI___libc_write (/usr/lib64/libc-2.26.so)
  7fffb7052c74 _IO_file_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
  5afc1818 [unknown] ([unknown])
  7fffb7051a60 new_do_write (/usr/lib64/libc-2.26.so)
  7fffb7054638 _IO_do_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
  7fffb7054bbc _IO_file_overflow@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
  7fffb7055a24 __overflow (/usr/lib64/libc-2.26.so)
  7fffb7044548 _IO_puts (/usr/lib64/libc-2.26.so)
  1440 main (/home/sandipan/test)
  7fffb6fe36a0 generic_start_main.isra.0 (/usr/lib64/libc-2.26.so)
  7fffb6fe3898 __libc_start_main (/usr/lib64/libc-2.26.so)
 0 [unknown] ([unknown])
  ...

After applying this patch:

  # perf script

  test  9710 [013] 95614.332431: probe:sys_write: (c04025b0)
  c04025b0 sys_write+0x10 (/lib/modules/4.17.0-rc4+/build/vmlinux)
  c000b9e0 system_call+0x58 (/lib/modules/4.17.0-rc4+/build/vmlinux)
  7fffb70d8234 __GI___libc_write+0x24 (/usr/lib64/libc-2.26.so)
  7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44 (/usr/lib64/libc-2.26.so)
  5afc1818 [unknown] ([unknown])
  7fffb7051a60 new_do_write+0x90 (/usr/lib64/libc-2.26.so)
  7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38 (/usr/lib64/libc-2.26.so)
  7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c (/usr/lib64/libc-2.26.so)
  7fffb7055a24 __overflow+0x64 (/usr/lib64/libc-2.26.so)
  7fffb7044548 _IO_puts+0x218 (/usr/lib64/libc-2.26.so)
  1440 main+0x20 (/home/sandipan/test)
  7fffb6fe36a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
  7fffb6fe3898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
 0 [unknown] ([unknown])
  ...

Signed-off-by: Sandipan Das 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Naveen N. Rao 
Cc: Ravi Bangoria 
Link: http://lkml.kernel.org/r/20180517063326.6319-2-sandi...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c| 26 --
 .../tests/shell/record+probe_libc_inet_pton.sh | 12 +-
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index fa2c7a288750..cefc8813e91e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -153,8 +153,8 @@ static struct {
.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
- PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
- PERF_OUTPUT_PERIOD,
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
 
.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
},
@@ -165,8 +165,9 @@ static struct {
.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
- PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
- PERF_OUTPUT_PERIOD | PERF_OUTPUT_BPF_OUTPUT,
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD |
+ PERF_OUTPUT_BPF_OUTPUT,
 
.invalid_fields = PERF_OUTPUT_TRACE,
},
@@ -185,10 +186,10 @@ static struct {
.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
  PERF_OU

[tip:perf/core] perf annotate: Record the min/max cycles

2018-05-19 Thread tip-bot for Jin Yao
Commit-ID:  48659ebf37e5d9d23bda6dbf032bdbe9708929f1
Gitweb: https://git.kernel.org/tip/48659ebf37e5d9d23bda6dbf032bdbe9708929f1
Author: Jin Yao 
AuthorDate: Thu, 17 May 2018 22:58:37 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 18 May 2018 16:31:41 -0300

perf annotate: Record the min/max cycles

Currently perf has a feature to account cycles for LBRs.

For example, on Skylake:

  perf record -b ...
  perf report or perf annotate

The annotate browser then shows the average cycle counts for
program blocks.

For some analysis it would be useful if we could know not only the
average cycles but also the min and max cycles.

This patch records the min and max cycles.

Signed-off-by: Jin Yao 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1526569118-14217-2-git-send-email-yao@linux.intel.com
[ Switch from max/min to min/max ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/annotate.c | 14 +-
 tools/perf/util/annotate.h |  4 
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 5d74a30fe00f..4fcfefea3bc2 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -760,6 +760,15 @@ static int __symbol__account_cycles(struct annotation *notes,
ch[offset].num_aggr++;
ch[offset].cycles_aggr += cycles;
 
+   if (cycles > ch[offset].cycles_max)
+   ch[offset].cycles_max = cycles;
+
+   if (ch[offset].cycles_min) {
+   if (cycles && cycles < ch[offset].cycles_min)
+   ch[offset].cycles_min = cycles;
+   } else
+   ch[offset].cycles_min = cycles;
+
if (!have_start && ch[offset].have_start)
return 0;
if (ch[offset].num) {
@@ -953,8 +962,11 @@ void annotation__compute_ipc(struct annotation *notes, size_t size)
if (ch->have_start)
annotation__count_and_fill(notes, ch->start, offset, ch);
al = notes->offsets[offset];
-   if (al && ch->num_aggr)
+   if (al && ch->num_aggr) {
al->cycles = ch->cycles_aggr / ch->num_aggr;
+   al->cycles_max = ch->cycles_max;
+   al->cycles_min = ch->cycles_min;
+   }
notes->have_cycles = true;
}
}
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index f28a9e43421d..d50363d56f73 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -105,6 +105,8 @@ struct annotation_line {
int  jump_sources;
floatipc;
u64  cycles;
+   u64  cycles_max;
+   u64  cycles_min;
size_t   privsize;
char*path;
u32  idx;
@@ -186,6 +188,8 @@ struct cyc_hist {
u64 start;
u64 cycles;
u64 cycles_aggr;
+   u64 cycles_max;
+   u64 cycles_min;
u32 num;
u32 num_aggr;
u8  have_start;

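The min/max bookkeeping in `__symbol__account_cycles()` can be exercised in isolation. The sketch below uses a trimmed-down, hypothetical `struct cyc_stats` that holds only the fields the patch touches. A zero in `cycles_min` means "not yet set", and a later zero sample never lowers an existing minimum:

```c
#include <stdint.h>

/* Hypothetical subset of struct cyc_hist. */
struct cyc_stats {
	uint64_t cycles_aggr;
	uint64_t cycles_max;
	uint64_t cycles_min;
	uint32_t num_aggr;
};

/* Same update rule as the patch. */
static void cyc_stats_add(struct cyc_stats *s, uint64_t cycles)
{
	s->num_aggr++;
	s->cycles_aggr += cycles;

	if (cycles > s->cycles_max)
		s->cycles_max = cycles;

	if (s->cycles_min) {
		if (cycles && cycles < s->cycles_min)
			s->cycles_min = cycles;
	} else {
		s->cycles_min = cycles;
	}
}

/* Average as computed in annotation__compute_ipc(). */
static uint64_t cyc_stats_avg(const struct cyc_stats *s)
{
	return s->num_aggr ? s->cycles_aggr / s->num_aggr : 0;
}
```

After the samples 5, 1 and 0, the average is 6 / 3 = 2, the maximum is 5 and the minimum stays 1.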

[tip:perf/core] perf bpf: Fixup include and examples install messages

2018-05-19 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  cfc4033be77abcf5953bed2fd201100515fcb357
Gitweb: https://git.kernel.org/tip/cfc4033be77abcf5953bed2fd201100515fcb357
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 17 May 2018 17:22:22 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Sat, 19 May 2018 06:42:50 -0300

perf bpf: Fixup include and examples install messages

Before:

  INSTALL  lib
install include/bpf/*.h '/home/acme/lib/include/perf/bpf'
  INSTALL  lib
install examples/bpf/*.c '/home/acme/lib/examples/perf/bpf'

After:

  INSTALL  lib
  INSTALL  include/bpf
  INSTALL  lib
  INSTALL  examples/bpf

Reported-by: Ingo Molnar 
Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Fixes: dd8e4ead6e98 ("perf bpf: Add bpf.h to be used in eBPF proggies")
Fixes: 8f12a2ff00e5 ("perf bpf: Add 'examples' directories")
Link: https://lkml.kernel.org/n/tip-icljqe87e8pak8mu6mkki...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile.perf | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index c63a3971d719..ecc9fc952655 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -770,9 +770,11 @@ endif
 ifndef NO_LIBBPF
$(call QUIET_INSTALL, lib) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf'
+   $(call QUIET_INSTALL, include/bpf) \
$(INSTALL) include/bpf/*.h '$(DESTDIR_SQ)$(perf_include_instdir_SQ)/bpf'
$(call QUIET_INSTALL, lib) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perf_examples_instdir_SQ)/bpf'
+   $(call QUIET_INSTALL, examples/bpf) \
$(INSTALL) examples/bpf/*.c '$(DESTDIR_SQ)$(perf_examples_instdir_SQ)/bpf'
 endif
$(call QUIET_INSTALL, perf-archive) \


[tip:perf/core] perf annotate: Create hotkey 'c' to show min/max cycles

2018-05-19 Thread tip-bot for Jin Yao
Commit-ID:  3e71fc0319775723adc08991ba7fbaeff1150347
Gitweb: https://git.kernel.org/tip/3e71fc0319775723adc08991ba7fbaeff1150347
Author: Jin Yao 
AuthorDate: Thu, 17 May 2018 22:58:38 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Sat, 19 May 2018 06:42:49 -0300

perf annotate: Create hotkey 'c' to show min/max cycles

In the 'perf annotate' view, a new hotkey 'c' is created for showing the
min/max cycles.

For example, after pressing 'c', the annotate view is:

  Percent│ IPC Cycle(min/max)
 │
 │
 │ Disassembly of section .text:
 │
 │ 0003aab0 :
8.22 │3.92   sub$0x18,%rsp
 │3.92   mov$0x1,%esi
 │3.92   xor%eax,%eax
 │3.92   cmpl   $0x0,argp_program_version_hook@@G
 │3.92 1(2/1)  ↓ je 20
 │   lock   cmpxchg %esi,__abort_msg@@GLIBC_P
 │ ↓ jne29
 │ ↓ jmp43
 │1.10 20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
8.93 │1.10 1(5/1)  ↓ je 43

Pressing 'c' again switches the annotate view back:

  Percent│ IPC Cycle
 │
 │
 │Disassembly of section .text:
 │
 │0003aab0 :
8.22 │3.92  sub$0x18,%rsp
 │3.92  mov$0x1,%esi
 │3.92  xor%eax,%eax
 │3.92  cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
 │3.92 1  ↓ je 20
 │  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
 │↓ jne29
 │↓ jmp43
 │1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
8.93 │1.10 1  ↓ je 43

Signed-off-by: Jin Yao 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao@linux.intel.com
[ Rename all maxmin to minmax ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/ui/browsers/annotate.c |  8 
 tools/perf/util/annotate.c| 37 +++--
 tools/perf/util/annotate.h|  7 ++-
 3 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 3781d74088a7..8be40fa903aa 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -695,6 +695,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
"O Bump offset level (jump targets -> +call -> all -> cycle thru)\n"
"s Toggle source code view\n"
"t Circulate percent, total period, samples view\n"
+   "c Show min/max cycle\n"
"/ Search string\n"
"k Toggle line numbers\n"
"P Print to [symbol_name].annotation file.\n"
@@ -791,6 +792,13 @@ show_sup_ins:
notes->options->show_total_period = true;
annotation__update_column_widths(notes);
continue;
+   case 'c':
+   if (notes->options->show_minmax_cycle)
+   notes->options->show_minmax_cycle = false;
+   else
+   notes->options->show_minmax_cycle = true;
+   annotation__update_column_widths(notes);
+   continue;
case K_LEFT:
case K_ESC:
case 'q':
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4fcfefea3bc2..6612c7f90af4 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2498,13 +2498,38 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
else
obj__printf(obj, "%*s ", ANNOTATION__IPC_WIDTH - 1, "IPC");
 
-   if (al->cycles)
-   obj__printf(obj, "%*" PRIu64 " ",
+   if (!notes->options->show_minmax_cycle) {
+   if (al->cycles)
+   obj__printf(obj, "%*" PRIu64 " ",
   ANNOTATION__CYCLES_WIDTH - 1, al->cycles);
-   else if (!show_title)
-   obj__printf(obj, "%*s", ANNOTATION__CYCLES_WIDTH, " ");
-   else
-   obj__printf(obj, "%*s ", ANNOTATION__CYCLES_WIDTH - 1, "Cycle");
+   else if (!show_title)
+   

[tip:perf/core] perf machine: Add machine__is() to identify machine arch

2018-05-19 Thread tip-bot for Adrian Hunter
Commit-ID:  dbbd34a666ee117d0e39e71a47f38f02c4a5c698
Gitweb: https://git.kernel.org/tip/dbbd34a666ee117d0e39e71a47f38f02c4a5c698
Author: Adrian Hunter 
AuthorDate: Thu, 17 May 2018 12:21:53 +0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Sat, 19 May 2018 06:42:50 -0300

perf machine: Add machine__is() to identify machine arch

Add a function to identify the machine architecture.

Signed-off-by: Adrian Hunter 
Tested-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Joerg Roedel 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: x...@kernel.org
Link: http://lkml.kernel.org/r/1526548928-20790-6-git-send-email-adrian.hun...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/env.c | 18 ++
 tools/perf/util/env.h |  2 ++
 tools/perf/util/machine.c |  9 +
 tools/perf/util/machine.h |  2 ++
 4 files changed, 31 insertions(+)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4c842762e3f2..319fb0a0d05e 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -93,6 +93,24 @@ int perf_env__read_cpu_topology_map(struct perf_env *env)
return 0;
 }
 
+static int perf_env__read_arch(struct perf_env *env)
+{
+   struct utsname uts;
+
+   if (env->arch)
+   return 0;
+
+   if (!uname(&uts))
+   env->arch = strdup(uts.machine);
+
+   return env->arch ? 0 : -ENOMEM;
+}
+
+const char *perf_env__raw_arch(struct perf_env *env)
+{
+   return env && !perf_env__read_arch(env) ? env->arch : "unknown";
+}
+
 void cpu_cache_level__free(struct cpu_cache_level *cache)
 {
free(cache->type);
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index c4ef2e523367..62e193948608 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -76,4 +76,6 @@ int perf_env__read_cpu_topology_map(struct perf_env *env);
 void cpu_cache_level__free(struct cpu_cache_level *cache);
 
 const char *perf_env__arch(struct perf_env *env);
+const char *perf_env__raw_arch(struct perf_env *env);
+
 #endif /* __PERF_ENV_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7c777cb32806..107bae7676b1 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2296,6 +2296,15 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
return 0;
 }
 
+/*
+ * Compares the raw arch string. N.B. see instead perf_env__arch() if a
+ * normalized arch is needed.
+ */
+bool machine__is(struct machine *machine, const char *arch)
+{
+   return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
+}
+
 int machine__get_kernel_start(struct machine *machine)
 {
struct map *map = machine__kernel_map(machine);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 388fb4741c54..b31d33b5aa2a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -188,6 +188,8 @@ static inline bool machine__is_host(struct machine *machine)
return machine ? machine->pid == HOST_KERNEL_ID : false;
 }
 
+bool machine__is(struct machine *machine, const char *arch);
+
 struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
 
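`machine__is()` compares against the raw `uname()` machine string, not the normalized name from `perf_env__arch()`. The sketch below shows the same comparison for the running host. `raw_arch_is()` is a hypothetical name. Unlike `perf_env__raw_arch()`, it does not cache the string in a `perf_env`, so it always reflects the live system rather than a recorded perf.data header:

```c
#include <string.h>
#include <sys/utsname.h>

/* Compare the raw machine string (e.g. "x86_64", "ppc64le") with arch.
 * Returns 1 on a match, 0 on a mismatch or if uname() fails. */
static int raw_arch_is(const char *arch)
{
	struct utsname uts;

	if (uname(&uts))
		return 0;
	return strcmp(uts.machine, arch) == 0;
}
```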


[tip:perf/core] perf tools: Fix kernel_start for PTI on x86

2018-05-19 Thread tip-bot for Adrian Hunter
Commit-ID:  19422a9f2a3be7f3a046285ffae4cbb571aa853a
Gitweb: https://git.kernel.org/tip/19422a9f2a3be7f3a046285ffae4cbb571aa853a
Author: Adrian Hunter 
AuthorDate: Thu, 17 May 2018 12:21:54 +0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Sat, 19 May 2018 06:42:51 -0300

perf tools: Fix kernel_start for PTI on x86

On x86_64, PTI entry trampolines are less than the start of kernel text,
but still above 2^63. So leave kernel_start = 1ULL << 63 for x86_64.

Signed-off-by: Adrian Hunter 
Tested-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Joerg Roedel 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: x...@kernel.org
Link: http://lkml.kernel.org/r/1526548928-20790-7-git-send-email-adrian.hun...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 107bae7676b1..e011a7160380 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2321,7 +2321,12 @@ int machine__get_kernel_start(struct machine *machine)
machine->kernel_start = 1ULL << 63;
if (map) {
err = map__load(map);
-   if (!err)
+   /*
+    * On x86_64, PTI entry trampolines are less than the
+    * start of kernel text, but still above 2^63. So leave
+    * kernel_start = 1ULL << 63 for x86_64.
+    */
+   if (!err && !machine__is(machine, "x86_64"))
machine->kernel_start = map->start;
}
return err;

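The kernel_start fix above can be illustrated with a small, self-contained sketch of the decision and of how it affects a simple "is this a kernel address?" test. The helper names and the two example addresses (kernel text at 0xffffffff81000000, a trampoline at 0xfffffe0000000000) are assumptions chosen for illustration, not values taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default lower bound for kernel addresses used by the patch: 2^63. */
#define KERNEL_START_DEFAULT (1ULL << 63)

/*
 * Hypothetical helper mirroring machine__get_kernel_start() after the fix:
 * keep 2^63 on x86_64 so PTI entry trampolines, which sit below the kernel
 * text map, still count as kernel addresses; otherwise tighten the bound
 * to the start of the kernel map once it has loaded successfully.
 */
static uint64_t pick_kernel_start(bool map_loaded, bool is_x86_64,
				  uint64_t kernel_text_start)
{
	uint64_t start = KERNEL_START_DEFAULT;

	if (map_loaded && !is_x86_64)
		start = kernel_text_start;
	return start;
}

/* Hypothetical classifier: anything at or above kernel_start is kernel. */
static bool is_kernel_addr(uint64_t addr, uint64_t kernel_start)
{
	return addr >= kernel_start;
}
```

With the text start as the bound, the assumed trampoline address would be misclassified as a user address. With the 2^63 bound kept on x86_64, it is correctly treated as kernel.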

Re: [PATCH v8 10/15] cpufreq: Add Kryo CPU scaling driver

2018-05-19 Thread Russell King - ARM Linux
On Sat, May 19, 2018 at 02:09:24PM +0300, ilia...@codeaurora.org wrote:
> +static int __init qcom_cpufreq_kryo_driver_init(void)
> +{
> +   struct device *cpu_dev_silver, *cpu_dev_gold;
> +   struct opp_table *opp_silver, *opp_gold;
> +   enum _msm8996_version msm8996_version;
> +   struct nvmem_cell *speedbin_nvmem;
> +   struct platform_device *pdev;
> +   struct device_node *np;
> +   u8 *speedbin;
> +   u32 versions;
> +   size_t len;
> +   int ret;
> +
> +   cpu_dev_silver = get_cpu_device(SILVER_LEAD);
> +   if (IS_ERR_OR_NULL(cpu_dev_silver))
> +   return PTR_ERR(cpu_dev_silver);
> +
> +   cpu_dev_gold = get_cpu_device(SILVER_LEAD);
> +   if (IS_ERR_OR_NULL(cpu_dev_gold))
> +   return PTR_ERR(cpu_dev_gold);
> +
> +   msm8996_version = qcom_cpufreq_kryo_get_msm_id();
> +   if (NUM_OF_MSM8996_VERSIONS == msm8996_version) {
> +   dev_err(cpu_dev_silver, "Not Snapdragon 820/821!");
> +   return -ENODEV;
> +   }
> +
> +   np = dev_pm_opp_of_get_opp_desc_node(cpu_dev_silver);
> +   if (IS_ERR_OR_NULL(np))
> +   return PTR_ERR(np);

This function (qcom_cpufreq_kryo_driver_init) returns zero on success.
You are checking "np" here for being an error pointer, or NULL.
What value do you think PTR_ERR() returns in the case of PTR_ERR(NULL)?

IS_ERR_OR_NULL() is considered by some (me included) as being _very_
harmful because programmers generally fail to look at linux/err.h and
consider what it means when used as above.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

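Russell King's point about IS_ERR_OR_NULL() can be checked with a user-space re-creation of the <linux/err.h> helpers. The lowercase names below are hypothetical copies written for this sketch (the kernel macros are IS_ERR(), PTR_ERR() and friends), and returning -ENOENT for a NULL result in the fixed variant is an assumed choice, not something the review prescribes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space copies of the <linux/err.h> encoding (sketch only). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)error; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }
static inline bool is_err(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline bool is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

/* The reviewed pattern: a NULL result leaks out as 0, i.e. "success". */
static int init_buggy(void *np)
{
	if (is_err_or_null(np))
		return ptr_err(np); /* ptr_err(NULL) == 0 */
	return 0;
}

/* One possible fix: handle NULL explicitly with a real error code. */
static int init_fixed(void *np)
{
	if (!np)
		return -ENOENT; /* assumed errno for "not found" */
	if (is_err(np))
		return ptr_err(np);
	return 0;
}
```

init_buggy(NULL) returns 0, which the caller of an __init function reads as success. That is exactly the trap the review describes.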

Re: [BUG] i2c-hid: ELAN Touchpad does not work on ASUS X580GD

2018-05-19 Thread Hans de Goede

Hi,

On 18-05-18 15:15, Jarkko Nikula wrote:
> On 05/18/2018 10:48 AM, Hans de Goede wrote:
>> Could it be the i2c input clock definition in drivers/mfd/intel-lpss-pci.c
>> is also wrong for Apollo Lake (N3450)? There are lots of people having
>> various issues with i2c attached touchpads on Apollo Lake devices, this bug:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1728244
>>
>> is sort of a collection bug for these. Various laptop models, lots of
>> reporters. Note: I am not sure this is an i2c-designware issue, but it would
>> be good to double check the input clock on Apollo Lake.
>
> Do the i2c_designware_core.dyndbg=+p and i2c-hid.debug=1 command line arguments
> give any useful debug information from those machines?

A first dmesg with these flags has been provided:
https://launchpadlibrarian.net/370884912/dmesg.log

Regards,

Hans



[tip:timers/2038] timekeeping: Remove timespec64 hack

2018-05-19 Thread tip-bot for Arnd Bergmann
Commit-ID:  4f0fad9a603aee91a374e8411c23953894a77479
Gitweb: https://git.kernel.org/tip/4f0fad9a603aee91a374e8411c23953894a77479
Author: Arnd Bergmann 
AuthorDate: Fri, 27 Apr 2018 15:40:12 +0200
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:57:31 +0200

timekeeping: Remove timespec64 hack

At this point, we have converted most of the kernel to use timespec64
consistently in place of timespec, so it seems it's time to make
timespec64 the native structure and define timespec in terms of that
one on 64-bit architectures.

Starting with gcc-5, the compiler can completely optimize away the
timespec_to_timespec64 and timespec64_to_timespec functions on 64-bit
architectures. With older compilers, we introduce a couple of extra
copies of local variables, but those are easily avoided by using
the timespec64 based interfaces consistently, as we do in most of the
important code paths already.

The main upside of removing the hack is that printing the tv_sec
field of a timespec64 structure can now use the %lld format
string on all architectures without a cast to time64_t. Without
this patch, the field is a 'long' type and would have to be printed
using %ld on 64-bit architectures.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: y2...@lists.linaro.org
Cc: John Stultz 
Link: https://lkml.kernel.org/r/20180427134016.2525989-2-a...@arndb.de

---
 include/linux/time32.h| 18 +++--
 include/linux/time64.h|  7 ---
 include/linux/timekeeping32.h | 45 ---
 kernel/time/time.c|  2 --
 4 files changed, 3 insertions(+), 69 deletions(-)

diff --git a/include/linux/time32.h b/include/linux/time32.h
index d2bcd4377b56..0b14f936100a 100644
--- a/include/linux/time32.h
+++ b/include/linux/time32.h
@@ -18,25 +18,14 @@
 /* timespec64 is defined as timespec here */
 static inline struct timespec timespec64_to_timespec(const struct timespec64 ts64)
 {
-   return ts64;
+   return *(const struct timespec *)&ts64;
 }
 
 static inline struct timespec64 timespec_to_timespec64(const struct timespec ts)
 {
-   return ts;
+   return *(const struct timespec64 *)&ts;
 }
 
-# define timespec_equal		timespec64_equal
-# define timespec_compare	timespec64_compare
-# define set_normalized_timespec	set_normalized_timespec64
-# define timespec_add		timespec64_add
-# define timespec_sub		timespec64_sub
-# define timespec_valid		timespec64_valid
-# define timespec_valid_strict	timespec64_valid_strict
-# define timespec_to_ns		timespec64_to_ns
-# define ns_to_timespec		ns_to_timespec64
-# define timespec_add_ns	timespec64_add_ns
-
 #else
 static inline struct timespec timespec64_to_timespec(const struct timespec64 ts64)
 {
@@ -55,6 +44,7 @@ static inline struct timespec64 timespec_to_timespec64(const struct timespec ts)
ret.tv_nsec = ts.tv_nsec;
return ret;
 }
+#endif
 
 static inline int timespec_equal(const struct timespec *a,
 const struct timespec *b)
@@ -159,8 +149,6 @@ static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
a->tv_nsec = ns;
 }
 
-#endif
-
 /**
  * time_to_tm - converts the calendar time to local broken-down time
  *
diff --git a/include/linux/time64.h b/include/linux/time64.h
index 0d96887ba4e0..0a7b2f79cec7 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -16,11 +16,6 @@ typedef __u64 timeu64_t;
 
 #include 
 
-#if __BITS_PER_LONG == 64
-/* this trick allows us to optimize out timespec64_to_timespec */
-# define timespec64 timespec
-#define itimerspec64 itimerspec
-#else
 struct timespec64 {
 	time64_t	tv_sec;			/* seconds */
 	long		tv_nsec;		/* nanoseconds */
@@ -31,8 +26,6 @@ struct itimerspec64 {
struct timespec64 it_value;
 };
 
-#endif
-
 /* Parameters used to convert the timespec values: */
 #define MSEC_PER_SEC   1000L
 #define USEC_PER_MSEC  1000L
diff --git a/include/linux/timekeeping32.h b/include/linux/timekeeping32.h
index 3616b4becb59..4ea45d0df1d4 100644
--- a/include/linux/timekeeping32.h
+++ b/include/linux/timekeeping32.h
@@ -16,50 +16,6 @@ static inline struct timespec current_kernel_time(void)
return timespec64_to_timespec(now);
 }
 
-#if BITS_PER_LONG == 64
-/**
- * Deprecated. Use do_settimeofday64().
- */
-static inline int do_settimeofday(const struct timespec *ts)
-{
-   return do_settimeofday64(ts);
-}
-
-static inline int __getnstimeofday(struct timespec *ts)
-{
-   return __getnstimeofday64(ts);
-}
-
-static inline void getnstimeofday(struct timespec *ts)
-{
-   getnstimeofday64(ts);
-}
-
-static inline void ktime_get_ts(struct timespec *ts)
-{
-   ktime_get_ts64(ts);
-}
-
-static inline void ktime_get_real_ts(s

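The main upside named in the "Remove timespec64 hack" commit is that tv_sec can be printed with %lld everywhere. Below is a minimal user-space sketch of that property. struct timespec64_sketch and format_ts64() are hypothetical stand-ins, and tv_sec is assumed to be a 64-bit long long, matching the kernel's time64_t (a typedef of __s64).

```c
#include <stdio.h>

/* Hypothetical stand-in for time64_t: 64-bit signed on every build. */
typedef long long time64_sketch_t;

/* Post-patch layout: a real struct on all architectures, not an alias. */
struct timespec64_sketch {
	time64_sketch_t tv_sec; /* seconds */
	long tv_nsec;           /* nanoseconds */
};

/* %lld matches tv_sec on 32-bit and 64-bit builds alike, no cast needed. */
static int format_ts64(char *buf, size_t len,
		       const struct timespec64_sketch *ts)
{
	return snprintf(buf, len, "%lld.%09ld", ts->tv_sec, ts->tv_nsec);
}
```

Under the removed hack, timespec64 was an alias for timespec on 64-bit, so tv_sec had type long there and needed %ld or a cast.
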
[tip:timers/2038] timekeeping: Clean up ktime_get_real_ts64

2018-05-19 Thread tip-bot for Arnd Bergmann
Commit-ID:  edca71fecb77e2697337d192cbfe96f513407761
Gitweb: https://git.kernel.org/tip/edca71fecb77e2697337d192cbfe96f513407761
Author: Arnd Bergmann 
AuthorDate: Fri, 27 Apr 2018 15:40:13 +0200
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:57:32 +0200

timekeeping: Clean up ktime_get_real_ts64

In a move to make ktime_get_*() the preferred driver interface into the
timekeeping code, sanitize ktime_get_real_ts64() to be a proper exported
symbol rather than an alias for getnstimeofday64().

The internal __getnstimeofday64() is no longer used, so remove that
and merge it into ktime_get_real_ts64().

Signed-off-by: Arnd Bergmann 
Signed-off-by: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: y2...@lists.linaro.org
Cc: John Stultz 
Link: https://lkml.kernel.org/r/20180427134016.2525989-3-a...@arndb.de

---
 include/linux/timekeeping.h   |  8 +---
 include/linux/timekeeping32.h | 13 ++---
 kernel/time/timekeeping.c | 31 ++-
 3 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 588a0e4b1ab9..415dae6bf1f5 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -30,15 +30,13 @@ struct timespec64 current_kernel_time64(void);
 struct timespec64 get_monotonic_coarse64(void);
 extern void getrawmonotonic64(struct timespec64 *ts);
 extern void ktime_get_ts64(struct timespec64 *ts);
+extern void ktime_get_real_ts64(struct timespec64 *tv);
 extern time64_t ktime_get_seconds(void);
 extern time64_t __ktime_get_real_seconds(void);
 extern time64_t ktime_get_real_seconds(void);
 
-extern int __getnstimeofday64(struct timespec64 *tv);
-extern void getnstimeofday64(struct timespec64 *tv);
 extern void getboottime64(struct timespec64 *ts);
 
-#define ktime_get_real_ts64(ts)	getnstimeofday64(ts)
 
 /*
  * ktime_t based interfaces
@@ -210,5 +208,9 @@ extern void read_persistent_clock64(struct timespec64 *ts);
 extern void read_boot_clock64(struct timespec64 *ts);
 extern int update_persistent_clock64(struct timespec64 now);
 
+/*
+ * deprecated aliases, don't use in new code
+ */
+#define getnstimeofday64(ts)   ktime_get_real_ts64(ts)
 
 #endif
diff --git a/include/linux/timekeeping32.h b/include/linux/timekeeping32.h
index 4ea45d0df1d4..5abff52d07fd 100644
--- a/include/linux/timekeeping32.h
+++ b/include/linux/timekeeping32.h
@@ -27,20 +27,11 @@ static inline int do_settimeofday(const struct timespec *ts)
return do_settimeofday64(&ts64);
 }
 
-static inline int __getnstimeofday(struct timespec *ts)
-{
-   struct timespec64 ts64;
-   int ret = __getnstimeofday64(&ts64);
-
-   *ts = timespec64_to_timespec(ts64);
-   return ret;
-}
-
 static inline void getnstimeofday(struct timespec *ts)
 {
struct timespec64 ts64;
 
-   getnstimeofday64(&ts64);
+   ktime_get_real_ts64(&ts64);
*ts = timespec64_to_timespec(ts64);
 }
 
@@ -56,7 +47,7 @@ static inline void ktime_get_real_ts(struct timespec *ts)
 {
struct timespec64 ts64;
 
-   getnstimeofday64(&ts64);
+   ktime_get_real_ts64(&ts64);
*ts = timespec64_to_timespec(ts64);
 }
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49cbceef5deb..7bbc7a6e6095 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -705,18 +705,19 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 }
 
 /**
- * __getnstimeofday64 - Returns the time of day in a timespec64.
+ * ktime_get_real_ts64 - Returns the time of day in a timespec64.
  * @ts:pointer to the timespec to be set
  *
- * Updates the time of day in the timespec.
- * Returns 0 on success, or -ve when suspended (timespec will be undefined).
+ * Returns the time of day in a timespec64 (WARN if suspended).
  */
-int __getnstimeofday64(struct timespec64 *ts)
+void ktime_get_real_ts64(struct timespec64 *ts)
 {
struct timekeeper *tk = &tk_core.timekeeper;
unsigned long seq;
u64 nsecs;
 
+   WARN_ON(timekeeping_suspended);
+
do {
seq = read_seqcount_begin(&tk_core.seq);
 
@@ -727,28 +728,8 @@ int __getnstimeofday64(struct timespec64 *ts)
 
ts->tv_nsec = 0;
timespec64_add_ns(ts, nsecs);
-
-   /*
-    * Do not bail out early, in case there were callers still using
-    * the value, even in the face of the WARN_ON.
-    */
-   if (unlikely(timekeeping_suspended))
-   return -EAGAIN;
-   return 0;
-}
-EXPORT_SYMBOL(__getnstimeofday64);
-
-/**
- * getnstimeofday64 - Returns the time of day in a timespec64.
- * @ts:pointer to the timespec64 to be set
- *
- * Returns the time of day in a timespec64 (WARN if suspended).
- */
-void getnstimeofday64(struct timespec64 *ts)
-{
-   WARN_ON(__getnstimeofday64(ts));
 }
-EXPORT_SYMBOL(getnstimeofday64);
+EXPORT_SYMBOL(ktime_get_real_ts64);
 
 ktime_t ktime_get(void)
 {

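The new ktime_get_real_ts64() above builds its result by setting tv_nsec to 0 and then calling timespec64_add_ns() with the elapsed nanoseconds, which carries whole seconds into tv_sec. The following is a minimal sketch of that carry step. struct ts64_sketch and ts64_add_ns_sketch() are hypothetical, and the sketch uses plain division and modulo rather than the kernel's own iterative helper.

```c
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Hypothetical stand-in for struct timespec64. */
struct ts64_sketch {
	long long tv_sec;
	long tv_nsec;
};

/*
 * Add ns nanoseconds and renormalize so that 0 <= tv_nsec < 1e9.
 * Assumes ts->tv_nsec is already normalized on entry.
 */
static void ts64_add_ns_sketch(struct ts64_sketch *ts, uint64_t ns)
{
	uint64_t total = (uint64_t)ts->tv_nsec + ns;

	ts->tv_sec += (long long)(total / NSEC_PER_SEC_SKETCH);
	ts->tv_nsec = (long)(total % NSEC_PER_SEC_SKETCH);
}
```

For example, starting from { 1, 0 } and adding 2500000000 ns yields { 3, 500000000 }, just as ktime_get_real_ts64() starts from whole seconds plus tv_nsec = 0 and folds the sub-second remainder in afterwards.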

[tip:timers/2038] timekeeping: Standardize on ktime_get_*() naming

2018-05-19 Thread tip-bot for Arnd Bergmann
Commit-ID:  fb7fcc96a86cfaef0f6dcc0665516aa68611e736
Gitweb: https://git.kernel.org/tip/fb7fcc96a86cfaef0f6dcc0665516aa68611e736
Author: Arnd Bergmann 
AuthorDate: Fri, 27 Apr 2018 15:40:14 +0200
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 13:57:32 +0200

timekeeping: Standardize on ktime_get_*() naming

The current_kernel_time64, get_monotonic_coarse64, getrawmonotonic64,
get_monotonic_boottime64 and timekeeping_clocktai64 interfaces have
rather inconsistent naming, and they differ in the calling conventions
by passing the output either by reference or as a return value.

Rename them to ktime_get_coarse_real_ts64, ktime_get_coarse_ts64,
ktime_get_raw_ts64, ktime_get_boottime_ts64 and ktime_get_clocktai_ts64
respectively, and provide the interfaces with macros or inline
functions as needed.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: y2...@lists.linaro.org
Cc: John Stultz 
Link: https://lkml.kernel.org/r/20180427134016.2525989-4-a...@arndb.de

---
 include/linux/timekeeping.h   | 43 ---
 include/linux/timekeeping32.h | 14 ++
 kernel/time/timekeeping.c | 23 +--
 3 files changed, 51 insertions(+), 29 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 415dae6bf1f5..3ef9791d7d75 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -19,25 +19,25 @@ extern void xtime_update(unsigned long ticks);
 extern int do_settimeofday64(const struct timespec64 *ts);
 extern int do_sys_settimeofday64(const struct timespec64 *tv,
 const struct timezone *tz);
-/*
- * Kernel time accessors
- */
-struct timespec64 current_kernel_time64(void);
 
 /*
  * timespec64 based interfaces
  */
-struct timespec64 get_monotonic_coarse64(void);
-extern void getrawmonotonic64(struct timespec64 *ts);
+extern void ktime_get_raw_ts64(struct timespec64 *ts);
 extern void ktime_get_ts64(struct timespec64 *ts);
 extern void ktime_get_real_ts64(struct timespec64 *tv);
+extern void ktime_get_coarse_ts64(struct timespec64 *ts);
+extern void ktime_get_coarse_real_ts64(struct timespec64 *ts);
+
+void getboottime64(struct timespec64 *ts);
+
+/*
+ * time64_t base interfaces
+ */
 extern time64_t ktime_get_seconds(void);
 extern time64_t __ktime_get_real_seconds(void);
 extern time64_t ktime_get_real_seconds(void);
 
-extern void getboottime64(struct timespec64 *ts);
-
-
 /*
  * ktime_t based interfaces
  */
@@ -123,12 +123,12 @@ extern u64 ktime_get_real_fast_ns(void);
 /*
  * timespec64 interfaces utilizing the ktime based ones
  */
-static inline void get_monotonic_boottime64(struct timespec64 *ts)
+static inline void ktime_get_boottime_ts64(struct timespec64 *ts)
 {
*ts = ktime_to_timespec64(ktime_get_boottime());
 }
 
-static inline void timekeeping_clocktai64(struct timespec64 *ts)
+static inline void ktime_get_clocktai_ts64(struct timespec64 *ts)
 {
*ts = ktime_to_timespec64(ktime_get_clocktai());
 }
@@ -212,5 +212,26 @@ extern int update_persistent_clock64(struct timespec64 now);
  * deprecated aliases, don't use in new code
  */
 #define getnstimeofday64(ts)   ktime_get_real_ts64(ts)
+#define get_monotonic_boottime64(ts)   ktime_get_boottime_ts64(ts)
+#define getrawmonotonic64(ts)  ktime_get_raw_ts64(ts)
+#define timekeeping_clocktai64(ts) ktime_get_clocktai_ts64(ts)
+
+static inline struct timespec64 current_kernel_time64(void)
+{
+   struct timespec64 ts;
+
+   ktime_get_coarse_real_ts64(&ts);
+
+   return ts;
+}
+
+static inline struct timespec64 get_monotonic_coarse64(void)
+{
+   struct timespec64 ts;
+
+   ktime_get_coarse_ts64(&ts);
+
+   return ts;
+}
 
 #endif
diff --git a/include/linux/timekeeping32.h b/include/linux/timekeeping32.h
index 5abff52d07fd..8762c2f45f8b 100644
--- a/include/linux/timekeeping32.h
+++ b/include/linux/timekeeping32.h
@@ -11,9 +11,11 @@ unsigned long get_seconds(void);
 
 static inline struct timespec current_kernel_time(void)
 {
-   struct timespec64 now = current_kernel_time64();
+   struct timespec64 ts64;
+
+   ktime_get_coarse_real_ts64(&ts64);
 
-   return timespec64_to_timespec(now);
+   return timespec64_to_timespec(ts64);
 }
 
 /**
@@ -55,13 +57,17 @@ static inline void getrawmonotonic(struct timespec *ts)
 {
struct timespec64 ts64;
 
-   getrawmonotonic64(&ts64);
+   ktime_get_raw_ts64(&ts64);
*ts = timespec64_to_timespec(ts64);
 }
 
 static inline struct timespec get_monotonic_coarse(void)
 {
-   return timespec64_to_timespec(get_monotonic_coarse64());
+   struct timespec64 ts64;
+
+   ktime_get_coarse_ts64(&ts64);
+
+   return timespec64_to_timespec(ts64);
 }
 
 static inline void getboottime(struct timespec *ts)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7bbc7a6e6095..ed9b74ec9c0b 100644
--- a/kernel/time/timekeeping.c
