Re: [PATCH] Bluetooth: hci_ll: Add endianness conversion when setting baudrate

2017-12-08 Thread Marcel Holtmann
Hi David,

> This adds an endianness conversion when setting the baudrate using a
> vendor-specific command. Otherwise, bad things might happen on a big-
> endian system.
> 
> Suggested-by: Marcel Holtmann 
> Signed-off-by: David Lechner 
> ---
> drivers/bluetooth/hci_ll.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel
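
For readers following along, here is a minimal userspace sketch of why the conversion matters. It is not the hci_ll code; put_le32() is a hypothetical helper that writes a 32-bit value in little-endian byte order regardless of host endianness, which is roughly the guarantee cpu_to_le32() gives for the in-memory layout of a command parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): store a 32-bit value as four
 * little-endian bytes, independent of the byte order of the host CPU.
 */
static void put_le32(uint32_t value, uint8_t out[4])
{
	assert(out != NULL);
	out[0] = (uint8_t)(value & 0xff);         /* least significant byte first */
	out[1] = (uint8_t)((value >> 8) & 0xff);
	out[2] = (uint8_t)((value >> 16) & 0xff);
	out[3] = (uint8_t)((value >> 24) & 0xff); /* most significant byte last */
}
```

Copying a native u32 straight into the command buffer yields the same bytes only on a little-endian host; on a big-endian host the order is reversed, which is presumably the "bad things" the commit message alludes to.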



[PATCH] misc: ad525x_dpot: Unnecessary space before function pointer arguments

2017-12-08 Thread Dhaval Shah
Resolved all the "Unnecessary space before function pointer arguments"
checkpatch warnings. Issue found by checkpatch.

Signed-off-by: Dhaval Shah 
---
 drivers/misc/ad525x_dpot.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/ad525x_dpot.h b/drivers/misc/ad525x_dpot.h
index 6bd1eba23bc0..443a51fd5680 100644
--- a/drivers/misc/ad525x_dpot.h
+++ b/drivers/misc/ad525x_dpot.h
@@ -195,12 +195,12 @@ enum dpot_devid {
 struct dpot_data;
 
 struct ad_dpot_bus_ops {
-   int (*read_d8) (void *client);
-   int (*read_r8d8) (void *client, u8 reg);
-   int (*read_r8d16) (void *client, u8 reg);
-   int (*write_d8) (void *client, u8 val);
-   int (*write_r8d8) (void *client, u8 reg, u8 val);
-   int (*write_r8d16) (void *client, u8 reg, u16 val);
+   int (*read_d8)(void *client);
+   int (*read_r8d8)(void *client, u8 reg);
+   int (*read_r8d16)(void *client, u8 reg);
+   int (*write_d8)(void *client, u8 val);
+   int (*write_r8d8)(void *client, u8 reg, u8 val);
+   int (*write_r8d16)(void *client, u8 reg, u16 val);
 };
 
 struct ad_dpot_bus_data {
-- 
2.11.0



Re: [PATCH v2 1/3] Bluetooth: hci_ll: add support for setting public address

2017-12-08 Thread Marcel Holtmann
Hi David,

> This adds support for setting the public address on Texas Instruments
> Bluetooth chips using a vendor-specific command.
> 
> This has been tested on a CC2560A. The TI wiki also indicates that this
> command should work on TI WL17xx/WL18xx Bluetooth chips.
> 
> Signed-off-by: David Lechner 
> ---
> 
> v2 changes:
> * This is a new patch in v2
> 
> drivers/bluetooth/hci_ll.c | 17 +
> 1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/bluetooth/hci_ll.c b/drivers/bluetooth/hci_ll.c
> index 974a788..b732004 100644
> --- a/drivers/bluetooth/hci_ll.c
> +++ b/drivers/bluetooth/hci_ll.c
> @@ -57,6 +57,7 @@
> #include "hci_uart.h"
> 
> /* Vendor-specific HCI commands */
> +#define HCI_VS_WRITE_BD_ADDR 0xfc06
> #define HCI_VS_UPDATE_UART_HCI_BAUDRATE   0xff36
> 
> /* HCILL commands */
> @@ -662,6 +663,20 @@ static int download_firmware(struct ll_device *lldev)
>   return err;
> }
> 
> +static int ll_set_bdaddr(struct hci_dev *hdev, const bdaddr_t *bdaddr)
> +{
> + bdaddr_t bdaddr_swapped;
> + struct sk_buff *skb;
> +
> + baswap(&bdaddr_swapped, bdaddr);
> + skb = __hci_cmd_sync(hdev, HCI_VS_WRITE_BD_ADDR, sizeof(bdaddr_t),
> +  &bdaddr_swapped, HCI_INIT_TIMEOUT);
> + if (!IS_ERR(skb))
> + kfree_skb(skb);
> + 

You have trailing whitespace here.

Does the HCI command really expect the BD_ADDR in swapped order? The caller 
of hdev->set_bdaddr will provide it in the same order as the HCI Read BD 
Address command and everything else in HCI. So it seems odd that you have to 
swap it for the vendor command.

So have you actually tested this with btmgmt public-add  and checked 
that the address comes out correctly? I think ll_set_bdaddr should function 
correctly for the mgmt interface. And if needed, any other caller outside of 
mgmt has to do the swapping.

Regards

Marcel
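
To make the byte-order question concrete, here is a small userspace sketch of what a swap like baswap() does to a 6-byte address. The type and function names are stand-ins invented for this illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for bdaddr_t: six raw address bytes (name is hypothetical). */
typedef struct {
	uint8_t b[6];
} bdaddr_sketch_t;

/* Reverse the byte order of an address. src and dst must not alias. */
static void bdaddr_swap_sketch(bdaddr_sketch_t *dst, const bdaddr_sketch_t *src)
{
	assert(dst != src);
	for (int i = 0; i < 6; i++)
		dst->b[i] = src->b[5 - i];
}
```

Swapping twice gives back the original bytes, so a swap in the nvmem reader plus another in ll_set_bdaddr() would cancel out for that path while still reversing an address that arrives through the mgmt interface. That is the situation the review comment above is probing.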



Re: [PATCH v2 2/3] dt-bindings: Add optional nvmem BD address bindings to ti,wlink-st

2017-12-08 Thread Marcel Holtmann
Hi David,


> This adds optional nvmem consumer properties to the ti,wlink-st device tree
> bindings to allow specifying the BD address.
> 
> Signed-off-by: David Lechner 
> ---
> 
> v2 changes:
> * Renamed "mac-address" to "bd-address"
> * Fixed typos in example
> * Specify byte order of "bd-address"
> 
> Documentation/devicetree/bindings/net/ti,wilink-st.txt | 5 +
> 1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/ti,wilink-st.txt 
> b/Documentation/devicetree/bindings/net/ti,wilink-st.txt
> index 1649c1f..a45a508 100644
> --- a/Documentation/devicetree/bindings/net/ti,wilink-st.txt
> +++ b/Documentation/devicetree/bindings/net/ti,wilink-st.txt
> @@ -32,6 +32,9 @@ Optional properties:
>See ../clocks/clock-bindings.txt for details.
>  - clock-names : Must include the following entry:
>"ext_clock" (External clock provided to the TI combo chip).
> + - nvmem-cells: phandle to nvmem data cell that contains a 6 byte BD address
> +   with the most significant byte first (big-endian).
> + - nvmem-cell-names: "bd-address" (required when nvmem-cells is specified)
> 
> Example:
> 
> @@ -43,5 +46,7 @@ Example:
>   enable-gpios = <&gpio1 7 GPIO_ACTIVE_HIGH>;
>   clocks = <&clk32k_wl18xx>;
>   clock-names = "ext_clock";
> + nvmem-cells = <&bd_address>;
> + nvmem-cell-names = "bd-address";

This looks good to me, but I would like to get an extra ACK from Rob on this.

Regards

Marcel



Re: [lkp-robot] [sctp] ecca8f88da: ltp.test_sctp_sendrecvmsg.fail

2017-12-08 Thread Xin Long
On Fri, Dec 8, 2017 at 1:26 PM, kernel test robot  wrote:
>
> FYI, we noticed the following commit (built with gcc-6):
>
> commit: ecca8f88da5c4260cc2bccfefd2a24976704c366 ("sctp: set frag_point in 
> sctp_setsockopt_maxseg correctly")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: ltp
> with following parameters:
>
> test: net.sctp
>
> test-description: The LTP testsuite contains a collection of tools for 
> testing the Linux kernel and related features.
> test-url: http://linux-test-project.github.io/
>
>
> on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 
> 64G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
>
>
> <<>>
> tag=test_sctp_sendrecvmsg stime=1512621556
> cmdline="test_sctp_sendrecvmsg"
> contacts=""
> analysis=exit
> <<>>
> test_sctp_sendrecvmsg.c1  TBROK  :  
> /tmp/build-ltp/ltp/utils/sctp/func_tests/../testlib/sctputil.h:258: 
> setsockopt(13): Invalid argument
> test_sctp_sendrecvmsg.c2  TBROK  :  
> /tmp/build-ltp/ltp/utils/sctp/func_tests/../testlib/sctputil.h:258: Remaining 
> cases broken
> <<>>
> initiation_status="ok"
> duration=0 termination_type=exited termination_id=2 corefile=no
> cutime=0 cstime=0
> <<>>
> <<>>
> tag=test_sctp_sendrecvmsg_v6 stime=1512621556
> cmdline="test_sctp_sendrecvmsg_v6"
> contacts=""
> analysis=exit
> <<>>
> test_sctp_sendrecvmsg.c1  TBROK  :  
> /tmp/build-ltp/ltp/utils/sctp/func_tests/../testlib/sctputil.h:258: 
> setsockopt(13): Invalid argument
> test_sctp_sendrecvmsg.c2  TBROK  :  
> /tmp/build-ltp/ltp/utils/sctp/func_tests/../testlib/sctputil.h:258: Remaining 
> cases broken
> <<>>
> initiation_status="ok"
> duration=0 termination_type=exited termination_id=2 corefile=no
> cutime=1 cstime=0
> <<>>
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml  # job file is attached in this email
> bin/lkp run job.yaml
Hi, please update the sctp case according to:

https://github.com/sctp/lksctp-tools/commit/723993d3e33100fa96245d2ed46611eb9cba5236

Thanks.


Re: [PATCH V6 4/7] OF: properties: Implement get_match_data() callback

2017-12-08 Thread Lothar Waßmann
Hi,

On Thu, 7 Dec 2017 12:50:50 -0500 Sinan Kaya wrote:
> On 12/7/2017 10:20 AM, Lothar Waßmann wrote:
> > Hi,
> > 
> > On Thu, 7 Dec 2017 09:45:31 -0500 Sinan Kaya wrote:
> >> On 12/7/2017 8:10 AM, Lothar Waßmann wrote:
>  +void *of_fwnode_get_match_data(const struct fwnode_handle *fwnode,
>  +   struct device *dev)
> >>> Shouldn't this be 'const void *of_fwnode_get_match_data
> >>
> >> OF keeps the driver data as a (const void*) internally. ACPI keeps the
> >> driver data as kernel_ulong_t in struct acpi_device_id.
> >>
> >> I tried to find the middle ground here by converting output to void*
> >> but not keeping const.
> >>
> > It should be no problem to cast a (const void *) to an unsigned long
> > data type (without const qualifier).
> > 
> 
> It is the other way around. If I change this API to return a (const void *),
> the device_get_match_data() function needs to return a (const void *).
> 
> While implementing the ACPI piece, I have to convert an unsigned long to
> (const void *) in ACPI code so that the APIs are compatible.
> 
That's true, but I don't see any problem with that. Your
device_get_match_data() is merely a wrapper around of_device_get_match_data()
which returns a const pointer. I see no reason to change this to a
non-const pointer by the wrapper function.


Lothar Waßmann
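
A minimal sketch of the point under discussion, in plain C. The two structs are hypothetical stand-ins for an OF match entry (holding a const pointer) and an ACPI id entry (holding an unsigned long). It assumes unsigned long is pointer-sized, as it is in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two kinds of match table entries. */
struct of_entry_sketch {
	const void *data;
};

struct acpi_entry_sketch {
	unsigned long driver_data;
};

/* The OF side already stores a const pointer, so it is returned as-is. */
static const void *of_match_data_sketch(const struct of_entry_sketch *e)
{
	assert(e != NULL);
	return e->data;
}

/* The ACPI side converts its integer to a const pointer for the caller. */
static const void *acpi_match_data_sketch(const struct acpi_entry_sketch *e)
{
	assert(e != NULL);
	return (const void *)(uintptr_t)e->driver_data;
}
```

Both accessors can share the const-qualified return type, so a common wrapper has no need to drop the qualifier.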


Re: [PATCH v6 net-next,mips 0/7] Cavium OCTEON-III network driver.

2017-12-08 Thread Philippe Ombredanne
David,

On Fri, Dec 8, 2017 at 1:09 AM, David Daney  wrote:
[]
> Changes in v5:
[]
> o Removed redundant licensing text boilerplate.

Thank you very much!

Acked-by: Philippe Ombredanne 

-- 
Cordially
Philippe Ombredanne, the licensing scruffy


[PATCH 3/3] misc: ad525x_dpot: macros should not use a trailing semicolon

2017-12-08 Thread Dhaval Shah
Resolved all the "macros should not use a trailing semicolon"
checkpatch warnings. Issue found by checkpatch.

Signed-off-by: Dhaval Shah 
---
 drivers/misc/ad525x_dpot.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/ad525x_dpot.c b/drivers/misc/ad525x_dpot.c
index 577f5e76c8a8..bc591b7168db 100644
--- a/drivers/misc/ad525x_dpot.c
+++ b/drivers/misc/ad525x_dpot.c
@@ -515,11 +515,11 @@ set_##_name(struct device *dev, \
 #define DPOT_DEVICE_SHOW_SET(name, reg) \
 DPOT_DEVICE_SHOW(name, reg) \
 DPOT_DEVICE_SET(name, reg) \
-static DEVICE_ATTR(name, S_IWUSR | S_IRUGO, show_##name, set_##name);
+static DEVICE_ATTR(name, S_IWUSR | S_IRUGO, show_##name, set_##name)
 
 #define DPOT_DEVICE_SHOW_ONLY(name, reg) \
 DPOT_DEVICE_SHOW(name, reg) \
-static DEVICE_ATTR(name, S_IWUSR | S_IRUGO, show_##name, NULL);
+static DEVICE_ATTR(name, S_IWUSR | S_IRUGO, show_##name, NULL)
 
 DPOT_DEVICE_SHOW_SET(rdac0, DPOT_ADDR_RDAC | DPOT_RDAC0);
 DPOT_DEVICE_SHOW_SET(eeprom0, DPOT_ADDR_EEPROM | DPOT_RDAC0);
@@ -616,7 +616,7 @@ set_##_name(struct device *dev, \
 { \
return sysfs_do_cmd(dev, attr, buf, count, _cmd); \
 } \
-static DEVICE_ATTR(_name, S_IWUSR | S_IRUGO, NULL, set_##_name);
+static DEVICE_ATTR(_name, S_IWUSR | S_IRUGO, NULL, set_##_name)
 
 DPOT_DEVICE_DO_CMD(inc_all, DPOT_INC_ALL);
 DPOT_DEVICE_DO_CMD(dec_all, DPOT_DEC_ALL);
-- 
2.11.0
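
As background on the checkpatch rule, here is a small standalone sketch. DEFINE_COUNTER is a hypothetical macro shaped like DPOT_DEVICE_SHOW_ONLY: it emits a function plus a variable definition and leaves the final semicolon to the call site.

```c
#include <assert.h>

/*
 * Hypothetical macro: defines a helper function and a variable.
 * The last line deliberately has no ';' so the caller supplies it.
 */
#define DEFINE_COUNTER(name, start)				\
static int next_##name(int v) { assert(v >= 0); return v + 1; }	\
static int name = (start)

DEFINE_COUNTER(rdac0, 5);	/* the caller's ';' ends the definition */
DEFINE_COUNTER(eeprom0, 7);
```

If the macro body ended in a semicolon, each invocation above would expand to a definition followed by a stray extra `;`, which is why the call sites in ad525x_dpot.c (already written with a semicolon) read more naturally after the change.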



[PATCH] Input: atmel_mxt_ts: Add touchscreen support for Chromebooks with upstream coreboot

2017-12-08 Thread Jean Lucas
Chromebooks use coreboot for system initialization. coreboot has always
had the default mainboard vendor string for Google machines set to
"Google". Google engineers set this string to "GOOGLE" for the coreboot
copy within their Chromium OS tree. The atmel_mxt_ts driver in its
current state is set to match the latter case; it will only bind to a
Chromebook's touchscreen either if the device uses the vendor coreboot
firmware (providing the matching mainboard vendor string), or if a user
running upstream coreboot has manually set the string to "GOOGLE". Let's
add a match for coreboot's default.

Signed-off-by: Jean Lucas 
---
 drivers/input/touchscreen/atmel_mxt_ts.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c 
b/drivers/input/touchscreen/atmel_mxt_ts.c
index 7659bc48f1db..43d1ea4145d7 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -3038,6 +3038,14 @@ static const struct dmi_system_id mxt_dmi_table[] = {
},
.driver_data = chromebook_platform_data,
},
+   {
+   /* Chromebooks with a custom coreboot version */
+   .ident = "Chromebook",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "Google"),
+   },
+   .driver_data = chromebook_platform_data,
+   },
{ }
 };
 
-- 
2.15.1
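
A userspace sketch of why the second table entry is needed. It assumes that a plain DMI_MATCH() is a case-sensitive substring comparison, modelled here with strstr(); the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: a non-exact DMI match behaves like a case-sensitive strstr(). */
static bool vendor_matches_sketch(const char *sys_vendor, const char *pattern)
{
	assert(sys_vendor != NULL && pattern != NULL);
	return strstr(sys_vendor, pattern) != NULL;
}

/* Try each known vendor spelling in turn, like walking a DMI table. */
static bool is_chromebook_sketch(const char *sys_vendor)
{
	static const char *const patterns[] = { "GOOGLE", "Google" };

	for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++)
		if (vendor_matches_sketch(sys_vendor, patterns[i]))
			return true;
	return false;
}
```

With only the "GOOGLE" pattern, a board reporting "Google" would not match, which is the case the commit message describes.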



[PATCH 1/3] misc: ad525x_dpot: Prefer 'unsigned int' to bare use of 'unsigned'

2017-12-08 Thread Dhaval Shah
Resolved all the "Prefer 'unsigned int' to bare use of 'unsigned'"
checkpatch warnings. Issue found by checkpatch.

Signed-off-by: Dhaval Shah 
---
 drivers/misc/ad525x_dpot.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/misc/ad525x_dpot.c b/drivers/misc/ad525x_dpot.c
index fe1672747bc1..1c6b55655f52 100644
--- a/drivers/misc/ad525x_dpot.c
+++ b/drivers/misc/ad525x_dpot.c
@@ -84,12 +84,12 @@
 struct dpot_data {
struct ad_dpot_bus_data bdata;
struct mutex update_lock;
-   unsigned rdac_mask;
-   unsigned max_pos;
+   unsigned int rdac_mask;
+   unsigned int max_pos;
unsigned long devid;
-   unsigned uid;
-   unsigned feat;
-   unsigned wipers;
+   unsigned int uid;
+   unsigned int feat;
+   unsigned int wipers;
u16 rdac_cache[MAX_RDACS];
DECLARE_BITMAP(otp_en_mask, MAX_RDACS);
 };
@@ -126,7 +126,7 @@ static inline int dpot_write_r8d16(struct dpot_data *dpot, 
u8 reg, u16 val)
 
 static s32 dpot_read_spi(struct dpot_data *dpot, u8 reg)
 {
-   unsigned ctrl = 0;
+   unsigned int ctrl = 0;
int value;
 
if (!(reg & (DPOT_ADDR_EEPROM | DPOT_ADDR_CMD))) {
@@ -175,7 +175,7 @@ static s32 dpot_read_spi(struct dpot_data *dpot, u8 reg)
 static s32 dpot_read_i2c(struct dpot_data *dpot, u8 reg)
 {
int value;
-   unsigned ctrl = 0;
+   unsigned int ctrl = 0;
 
switch (dpot->uid) {
case DPOT_UID(AD5246_ID):
@@ -238,7 +238,7 @@ static s32 dpot_read(struct dpot_data *dpot, u8 reg)
 
 static s32 dpot_write_spi(struct dpot_data *dpot, u8 reg, u16 value)
 {
-   unsigned val = 0;
+   unsigned int val = 0;
 
if (!(reg & (DPOT_ADDR_EEPROM | DPOT_ADDR_CMD | DPOT_ADDR_OTP))) {
if (dpot->feat & F_RDACS_WONLY)
@@ -328,7 +328,7 @@ static s32 dpot_write_spi(struct dpot_data *dpot, u8 reg, 
u16 value)
 static s32 dpot_write_i2c(struct dpot_data *dpot, u8 reg, u16 value)
 {
/* Only write the instruction byte for certain commands */
-   unsigned tmp = 0, ctrl = 0;
+   unsigned int tmp = 0, ctrl = 0;
 
switch (dpot->uid) {
case DPOT_UID(AD5246_ID):
@@ -636,7 +636,7 @@ static const struct attribute_group ad525x_group_commands = 
{
 };
 
 static int ad_dpot_add_files(struct device *dev,
-   unsigned features, unsigned rdac)
+   unsigned int features, unsigned int rdac)
 {
int err = sysfs_create_file(&dev->kobj,
dpot_attrib_wipers[rdac]);
@@ -661,7 +661,7 @@ static int ad_dpot_add_files(struct device *dev,
 }
 
 static inline void ad_dpot_remove_files(struct device *dev,
-   unsigned features, unsigned rdac)
+   unsigned int features, unsigned int rdac)
 {
sysfs_remove_file(&dev->kobj,
dpot_attrib_wipers[rdac]);
-- 
2.11.0



[PATCH 0/3] misc: ad525x_dpot: Different type of warnings are resolved.

2017-12-08 Thread Dhaval Shah
Three types of checkpatch warnings are resolved.
 * First patch  : Prefer 'unsigned int' to bare use of 'unsigned'
 * Second patch : please, no space before tabs
 * Third patch  : macros should not use a trailing semicolon

Issue found by checkpatch.
./scripts/checkpatch.pl -f --strict drivers/misc/ad525x_dpot.c 

Dhaval Shah (3):
  misc: ad525x_dpot: Prefer 'unsigned int' to bare use of 'unsigned'
  misc: ad525x_dpot: please, no space before tabs
  misc: ad525x_dpot: macros should not use a trailing semicolon

 drivers/misc/ad525x_dpot.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

-- 
2.11.0



[PATCH 2/3] misc: ad525x_dpot: please, no space before tabs

2017-12-08 Thread Dhaval Shah
Resolved the "please, no space before tabs" checkpatch
warning. Issue found by checkpatch.

Signed-off-by: Dhaval Shah 
---
 drivers/misc/ad525x_dpot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/ad525x_dpot.c b/drivers/misc/ad525x_dpot.c
index 1c6b55655f52..577f5e76c8a8 100644
--- a/drivers/misc/ad525x_dpot.c
+++ b/drivers/misc/ad525x_dpot.c
@@ -3,7 +3,7 @@
  * Copyright (c) 2009-2010 Analog Devices, Inc.
  * Author: Michael Hennerich 
  *
- * DEVID   #Wipers #Positions  Resistor Options (kOhm)
+ * DEVID   #Wipers #Positions  Resistor Options (kOhm)
  * AD5258  1   64  1, 10, 50, 100
  * AD5259  1   256 5, 10, 50, 100
  * AD5251  2   64  1, 10, 50, 100
-- 
2.11.0



Re: [PATCH v2 3/3] Bluetooth: hci_ll: Add optional nvmem BD address source

2017-12-08 Thread Marcel Holtmann
Hi David,

> This adds an optional nvmem consumer to get a BD address from an external
> source. The BD address is then set in the Bluetooth chip after the
> firmware has been loaded.
> 
> This has been tested working with a TI CC2560A chip (in a LEGO MINDSTORMS
> EV3).
> 
> Signed-off-by: David Lechner 
> ---
> 
> v2 changes:
> * Add support for HCI_QUIRK_INVALID_BDADDR when there is an error getting the
>  BD address from nvmem
> * Rework error handling
> * rename "mac-address" to "bd-address"
> * use bdaddr_t, bacmp and other bluetooth helper functions
> * use ll_set_bdaddr() from new, separate patch
> 
> drivers/bluetooth/hci_ll.c | 55 ++
> 1 file changed, 55 insertions(+)
> 
> diff --git a/drivers/bluetooth/hci_ll.c b/drivers/bluetooth/hci_ll.c
> index b732004..f5fef2d 100644
> --- a/drivers/bluetooth/hci_ll.c
> +++ b/drivers/bluetooth/hci_ll.c
> @@ -53,6 +53,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include "hci_uart.h"
> 
> @@ -90,6 +91,7 @@ struct ll_device {
>   struct serdev_device *serdev;
>   struct gpio_desc *enable_gpio;
>   struct clk *ext_clk;
> + bdaddr_t bdaddr;
> };
> 
> struct ll_struct {
> @@ -715,6 +717,19 @@ static int ll_setup(struct hci_uart *hu)
>   if (err)
>   return err;
> 
> + /* Set BD address if one was specified at probe */
> + if (!bacmp(&lldev->bdaddr, BDADDR_NONE)) {
> + /*
> +  * This means that there was an error getting the BD address
> +  * during probe, so mark the device as having a bad address.
> +  */
> + set_bit(HCI_QUIRK_INVALID_BDADDR, &hu->hdev->quirks);
> + } else if (bacmp(&lldev->bdaddr, BDADDR_ANY)) {
> + err = ll_set_bdaddr(hu->hdev, &lldev->bdaddr);
> + if (err)
> + set_bit(HCI_QUIRK_INVALID_BDADDR, &hu->hdev->quirks);
> + }
> +
>   /* Operational speed if any */
>   if (hu->oper_speed)
>   speed = hu->oper_speed;
> @@ -743,6 +758,7 @@ static int hci_ti_probe(struct serdev_device *serdev)
> {
>   struct hci_uart *hu;
>   struct ll_device *lldev;
> + struct nvmem_cell *bdaddr_cell;
>   u32 max_speed = 300;
> 
>   lldev = devm_kzalloc(&serdev->dev, sizeof(struct ll_device), 
> GFP_KERNEL);
> @@ -764,6 +780,45 @@ static int hci_ti_probe(struct serdev_device *serdev)
>   of_property_read_u32(serdev->dev.of_node, "max-speed", &max_speed);
>   hci_uart_set_speeds(hu, 115200, max_speed);
> 
> + /* optional BD address from nvram */
> + bdaddr_cell = nvmem_cell_get(&serdev->dev, "bd-address");
> + if (IS_ERR(bdaddr_cell)) {
> + int err = PTR_ERR(bdaddr_cell);
> +
> + if (err == -EPROBE_DEFER)
> + return err;
> +
> + /*
> +  * ENOENT means there is no matching nvmem cell and ENOSYS
> +  * means that nvmem is not enabled in the kernel configuration.
> +  */

Fix the comment style to this:

/* foo
 * bar
 */

> + if (err != -ENOENT && err != -ENOSYS) {
> + /*
> +  * If there was some other error, give userspace a
> +  * chance to fix the problem instead of failing to load
> +  * the driver. Using BDADDR_NONE as a flag that is
> +  * tested later in the setup function.
> +  */
> + dev_warn(&serdev->dev,
> +  "Failed to get \"bd-address\" nvmem cell 
> (%d)\n",
> +  err);
> + bacpy(&lldev->bdaddr, BDADDR_NONE);
> + }
> + } else {
> + bdaddr_t *bdaddr;
> + int len;
> +
> + bdaddr = nvmem_cell_read(bdaddr_cell, &len);
> + if (len != sizeof(bdaddr_t)) {
> + dev_err(&serdev->dev, "Invalid nvmem bd-address 
> length\n");
> + nvmem_cell_put(bdaddr_cell);
> + return -EINVAL;
> + }
> +
> + baswap(&lldev->bdaddr, bdaddr);

This swapping needs a comment. Explain the format of the NVMEM storage and also 
which format the HCI vendor command takes.

> + nvmem_cell_put(bdaddr_cell);
> + }
> +
>   return hci_uart_register_device(hu, &llp);
> }

Regards

Marcel
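
The three-way decision made in ll_setup() can be sketched in userspace as follows. This assumes BDADDR_ANY is the all-zero address and BDADDR_NONE the all-0xff address; every name below is a stand-in for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for bdaddr_t (hypothetical name). */
typedef struct {
	uint8_t b[6];
} bdaddr_sketch_t;

/* Assumed sentinel values: all zeros = "nothing set", all 0xff = "lookup failed". */
static const bdaddr_sketch_t ADDR_ANY_SKETCH = { { 0, 0, 0, 0, 0, 0 } };
static const bdaddr_sketch_t ADDR_NONE_SKETCH = { { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff } };

enum addr_action {
	ADDR_LEAVE_ALONE,	/* no nvmem cell: keep the chip's own address */
	ADDR_MARK_INVALID,	/* probe failed to read one: flag the address as bad */
	ADDR_PROGRAM,		/* a real address was read: write it to the chip */
};

static enum addr_action classify_bdaddr_sketch(const bdaddr_sketch_t *addr)
{
	assert(addr != NULL);
	if (memcmp(addr, &ADDR_NONE_SKETCH, sizeof(*addr)) == 0)
		return ADDR_MARK_INVALID;
	if (memcmp(addr, &ADDR_ANY_SKETCH, sizeof(*addr)) == 0)
		return ADDR_LEAVE_ALONE;
	return ADDR_PROGRAM;
}
```

Because the device structure is zero-allocated, a board without the nvmem cell naturally ends up in the "leave alone" case without any extra flag.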



Re: [PATCH v3 2/2] clocksource: sprd: Add timer driver for Spreadtrum SC9860 platform

2017-12-08 Thread Baolin Wang
Hi Daniel,

On 8 December 2017 at 14:58, Daniel Lezcano  wrote:
> On 08/12/2017 06:03, Baolin Wang wrote:
>> The Spreadtrum SC9860 platform will use the architected timers as local
>> clock events, but we also need a broadcast timer device to wakeup the
>> cpus when the cpus are in sleep mode.
>>
>> The Spreadtrum timer can support 32bit or 64bit counter, as well as
>> supporting period mode or one-shot mode.
>>
>> Signed-off-by: Baolin Wang 
>> ---
>> Changes since v2:
>>  - Add more timer description in changelog.
>>  - Rename the driver file.
>>  - Remove GENERIC_CLOCKEVENTS and ARCH_SPRD dependency.
>>  - Remove some redundant headfiles.
>>  - Use timer-of APIs.
>>  - Change the license format according to Linus[1][2][3],
>>  Thomas[4] and Greg[5] comments on the topic.
>>  [1] https://lkml.org/lkml/2017/11/2/715
>>  [2] https://lkml.org/lkml/2017/11/25/125
>>  [3] https://lkml.org/lkml/2017/11/25/133
>>  [4] https://lkml.org/lkml/2017/11/2/805
>>  [5] https://lkml.org/lkml/2017/10/19/165
>>
>> Changes since v1:
>>  - Change to 32bit counter to avoid build warning.
>> ---
>>  drivers/clocksource/Kconfig  |7 ++
>>  drivers/clocksource/Makefile |1 +
>>  drivers/clocksource/timer-sprd.c |  168 
>> ++
>>  3 files changed, 176 insertions(+)
>>  create mode 100644 drivers/clocksource/timer-sprd.c
>>
>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>> index c729a88..9a6b087 100644
>> --- a/drivers/clocksource/Kconfig
>> +++ b/drivers/clocksource/Kconfig
>> @@ -441,6 +441,13 @@ config MTK_TIMER
>>   help
>> Support for Mediatek timer driver.
>>
>> +config SPRD_TIMER
>> + bool "Spreadtrum timer driver" if COMPILE_TEST
>> + depends on HAS_IOMEM
>> + select TIMER_OF
>> + help
>> +   Enables the support for the Spreadtrum timer driver.
>> +
>>  config SYS_SUPPORTS_SH_MTU2
>>  bool
>>
>> diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
>> index 72711f1..d6dec44 100644
>> --- a/drivers/clocksource/Makefile
>> +++ b/drivers/clocksource/Makefile
>> @@ -54,6 +54,7 @@ obj-$(CONFIG_CLKSRC_TI_32K) += timer-ti-32k.o
>>  obj-$(CONFIG_CLKSRC_NPS) += timer-nps.o
>>  obj-$(CONFIG_OXNAS_RPS_TIMER)+= timer-oxnas-rps.o
>>  obj-$(CONFIG_OWL_TIMER)  += owl-timer.o
>> +obj-$(CONFIG_SPRD_TIMER) += timer-sprd.o
>>
>>  obj-$(CONFIG_ARC_TIMERS) += arc_timer.o
>>  obj-$(CONFIG_ARM_ARCH_TIMER) += arm_arch_timer.o
>> diff --git a/drivers/clocksource/timer-sprd.c 
>> b/drivers/clocksource/timer-sprd.c
>> new file mode 100644
>> index 000..81a5f0c
>> --- /dev/null
>> +++ b/drivers/clocksource/timer-sprd.c
>> @@ -0,0 +1,168 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2017 Spreadtrum Communications Inc.
>> + */
>> +
>> +#include 
>> +#include 
>> +
>> +#include "timer-of.h"
>> +
>> +#define TIMER_NAME             "sprd_timer"
>> +
>> +#define TIMER_LOAD_LO          0x0
>> +#define TIMER_LOAD_HI          0x4
>> +#define TIMER_VALUE_LO         0x8
>> +#define TIMER_VALUE_HI         0xc
>> +
>> +#define TIMER_CTL              0x10
>> +#define TIMER_CTL_PERIOD_MODE  BIT(0)
>> +#define TIMER_CTL_ENABLE       BIT(1)
>> +#define TIMER_CTL_64BIT_WIDTH  BIT(16)
>> +
>> +#define TIMER_INT              0x14
>> +#define TIMER_INT_EN           BIT(0)
>> +#define TIMER_INT_RAW_STS      BIT(1)
>> +#define TIMER_INT_MASK_STS     BIT(2)
>> +#define TIMER_INT_CLR          BIT(3)
>> +
>> +#define TIMER_VALUE_SHDW_LO    0x18
>> +#define TIMER_VALUE_SHDW_HI    0x1c
>> +
>> +#define TIMER_VALUE_LO_MASK    GENMASK(31, 0)
>> +
>> +static void sprd_timer_enable(void __iomem *base, u32 flag)
>> +{
>> + u32 val = readl_relaxed(base + TIMER_CTL);
>> +
>> + val |= TIMER_CTL_ENABLE;
>> + if (flag & TIMER_CTL_64BIT_WIDTH)
>> + val |= TIMER_CTL_64BIT_WIDTH;
>> + else
>> + val &= ~TIMER_CTL_64BIT_WIDTH;
>> +
>> + if (flag & TIMER_CTL_PERIOD_MODE)
>> + val |= TIMER_CTL_PERIOD_MODE;
>> + else
>> + val &= ~TIMER_CTL_PERIOD_MODE;
>> +
>> + writel_relaxed(val, base + TIMER_CTL);
>> +}
>> +
>> +static void sprd_timer_disable(void __iomem *base)
>> +{
>> + u32 val = readl_relaxed(base + TIMER_CTL);
>> +
>> + val &= ~TIMER_CTL_ENABLE;
>> + writel_relaxed(val, base + TIMER_CTL);
>> +}
>> +
>> +static void sprd_timer_update_counter(void __iomem *base, unsigned long 
>> cycles)
>> +{
>> + writel_relaxed(cycles & TIMER_VALUE_LO_MASK, base + TIMER_LOAD_LO);
>> + writel_relaxed(0, base + TIMER_LOAD_HI);
>> +}
>> +
>> +static void sprd_timer_enable_interrupt(void __iomem *base)
>> +{
>> + writel_relaxed(TIMER_INT_EN, base + TIMER_INT);
>> +}
>> +
>> +static void sprd_timer_clear_interrupt(void __iomem *base)
>> +{
>> + u32 val = readl_relaxed(base + TIMER_INT);
>> +
>> + val |= TIMER_INT_CLR;
>> + writel_relaxed(val, base + TIMER_INT);
>> +}
>
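
The read-modify-write done by sprd_timer_enable() can be modelled without hardware as a pure function on the control word. This is only a sketch: the bit positions are taken from the quoted defines, and the function name is invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as given by the quoted TIMER_CTL_* defines. */
#define CTL_PERIOD_MODE_SKETCH	(1u << 0)
#define CTL_ENABLE_SKETCH	(1u << 1)
#define CTL_64BIT_WIDTH_SKETCH	(1u << 16)

/*
 * Compute the new control word: always set ENABLE, and make the width
 * and period bits follow 'flag' exactly (set if requested, else cleared).
 * All other bits of 'ctl' are preserved.
 */
static uint32_t ctl_enable_value_sketch(uint32_t ctl, uint32_t flag)
{
	/* Only the two mode bits are meaningful in 'flag'. */
	assert((flag & ~(CTL_PERIOD_MODE_SKETCH | CTL_64BIT_WIDTH_SKETCH)) == 0);

	ctl |= CTL_ENABLE_SKETCH;

	if (flag & CTL_64BIT_WIDTH_SKETCH)
		ctl |= CTL_64BIT_WIDTH_SKETCH;
	else
		ctl &= ~CTL_64BIT_WIDTH_SKETCH;

	if (flag & CTL_PERIOD_MODE_SKETCH)
		ctl |= CTL_PERIOD_MODE_SKETCH;
	else
		ctl &= ~CTL_PERIOD_MODE_SKETCH;

	return ctl;
}
```

Passing a flag of 0 therefore selects a 32-bit one-shot timer even if the register previously had the 64-bit or periodic bits set.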

Re: [PATCH v2] mm: terminate shrink_slab loop if signal is pending

2017-12-08 Thread Michal Hocko
On Thu 07-12-17 17:23:05, Suren Baghdasaryan wrote:
> Slab shrinkers can be quite time consuming and when signal
> is pending they can delay handling of the signal. If fatal
> signal is pending there is no point in shrinking that process
> since it will be killed anyway.

The thing is that we are _not_ shrinking _that_ process. We are
shrinking globally shared objects and the fact that the memory pressure
is so large that the kswapd doesn't keep pace with it means that we have
to throttle all allocation sites by doing this direct reclaim. I agree
that expediting killed task is a good thing in general because such a
process should free at least some memory.

> This change checks for pending
> fatal signals inside shrink_slab loop and if one is detected
> terminates this loop early.

This changelog doesn't really address my previous review feedback, I am
afraid. You should mention more details about the problems you are seeing
and what causes them. If we have a shrinker which takes a considerable
amount of time then we should be addressing that. If that is not
possible then it should be documented at least.

The changelog should also describe how this plays along with the
rest of the allocation path.

The patch is not mergeable in this form I am afraid.

> Signed-off-by: Suren Baghdasaryan 
> 
> ---
> V2:
> Sergey Senozhatsky:
>   - Fix missing parentheses
> ---
>  mm/vmscan.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c02c850ea349..28e4bdc72c16 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -486,6 +486,13 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>   .memcg = memcg,
>   };
>  
> + /*
> +  * We are about to die and free our memory.
> +  * Stop shrinking which might delay signal handling.
> +  */
> + if (unlikely(fatal_signal_pending(current)))
> + break;
> +
>   /*
>* If kernel memory accounting is disabled, we ignore
>* SHRINKER_MEMCG_AWARE flag and call all shrinkers
> -- 
> 2.15.1.424.g9478a66081-goog
> 

-- 
Michal Hocko
SUSE Labs
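
For reference, the mechanism the patch proposes (not an endorsement of it) can be sketched in userspace as a loop over callbacks that stops as soon as a flag is raised. All names here are hypothetical; the flag stands in for fatal_signal_pending(current).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long (*shrink_fn_sketch)(void);

/* Stand-in for "a fatal signal is pending for the current task". */
static volatile bool demo_fatal_pending;

/* Two demo shrinkers: one just frees, one frees and then "gets killed". */
static unsigned long free_ten(void)
{
	return 10;
}

static unsigned long free_ten_then_signal(void)
{
	demo_fatal_pending = true;
	return 10;
}

/* Call each shrinker in turn, bailing out early once the flag is set. */
static unsigned long shrink_all_sketch(const shrink_fn_sketch *shrinkers,
				       size_t n, const volatile bool *fatal)
{
	unsigned long freed = 0;

	assert(shrinkers != NULL && fatal != NULL);
	for (size_t i = 0; i < n; i++) {
		if (*fatal)
			break;
		freed += shrinkers[i]();
	}
	return freed;
}
```

The early exit means the remaining shrinkers never run for this caller, so less shared memory is reclaimed on this pass. That is the interaction with the rest of the allocation path that the review asks the changelog to explain.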


Re: [PATCH net-next 1/6] net: mvpp2: only free the TSO header buffers when it was allocated

2017-12-08 Thread Antoine Tenart
Hi David,

On Thu, Dec 07, 2017 at 02:53:29PM -0500, David Miller wrote:
> From: Antoine Tenart 
> Date: Thu,  7 Dec 2017 09:48:58 +0100
> 
> > This patch adds a check to only free the TSO header buffer when its
> > allocation previously succeeded.
> > 
> > Signed-off-by: Antoine Tenart 
> 
> No, please keep this as a failure to bring up.
> 
> Even if you emit a log message, it is completely unintuitive to
> have netdev features change on the user just because of a memory
> allocation failure.

OK, makes sense.

One other possibility would be to disable TSO if CMA_SIZE_MBYTES is set
to a too small value (i.e. its default). But I don't think this would be
a good solution either.

The drawback is the default configuration when selecting DMA_CMA won't
work for PPv2.

Anyway, I'll send a v2 without these patches.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


[PATCH] sched/autogroup: move sched.h include

2017-12-08 Thread Sergey Senozhatsky
Move local "sched.h" include to the bottom. sched.h defines
several macros that are getting redefined in ARCH-specific
code, for instance, finish_arch_post_lock_switch() and
prepare_arch_switch(), so we need ARCH-specific definitions
to come in first.

Suggested-by: Martin Schwidefsky 
Signed-off-by: Sergey Senozhatsky 
---
 kernel/sched/autogroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index 0786227a3f48..bb4b9fe026a1 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -1,12 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
-#include "sched.h"
-
 #include 
 #include 
 #include 
 #include 
 #include 
 
+#include "sched.h"
+
 unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
 static struct autogroup autogroup_default;
 static atomic_t autogroup_seq_nr;
-- 
2.15.1
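
A single-file sketch of the ordering issue. It assumes the local sched.h provides fallback definitions guarded by #ifndef, so an architecture override only takes effect if it is seen first. The macro and variable names are invented for the example.

```c
#include <assert.h>

/* --- what an arch header would provide (must be seen first) --- */
static int arch_hook_calls;
#define prepare_switch_sketch()	(arch_hook_calls++)

/* --- what the generic header provides: a fallback only --- */
#ifndef prepare_switch_sketch
#define prepare_switch_sketch()	((void)0)
#endif

/* Generic code calls the hook; which definition it gets depends on order. */
static int context_switch_sketch(void)
{
	assert(arch_hook_calls >= 0);
	prepare_switch_sketch();
	return arch_hook_calls;
}
```

If the two sections were reversed, the fallback would be defined first and the later arch definition would redefine an existing macro, which is the redefinition problem the commit message describes.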



Re: [PATCH -mm] mm, swap: Fix race between swapoff and some swap operations

2017-12-08 Thread Minchan Kim
On Fri, Dec 08, 2017 at 01:41:10PM +0800, Huang, Ying wrote:
> Minchan Kim  writes:
> 
> > On Thu, Dec 07, 2017 at 04:29:37PM -0800, Andrew Morton wrote:
> >> On Thu,  7 Dec 2017 09:14:26 +0800 "Huang, Ying"  
> >> wrote:
> >> 
> >> > When the swapin is performed, after getting the swap entry information
> >> > from the page table, the PTL (page table lock) will be released, then
> >> > system will go to swap in the swap entry, without any lock held to
> >> > prevent the swap device from being swapoff.  This may cause the race
> >> > like below,
> >> > 
> >> > CPU 1                          CPU 2
> >> > -----                          -----
> >> >                                do_swap_page
> >> >                                  swapin_readahead
> >> >                                    __read_swap_cache_async
> >> > swapoff                              swapcache_prepare
> >> >   p->swap_map = NULL                   __swap_duplicate
> >> >                                          p->swap_map[?] /* !!! NULL pointer access */
> >> > 
> >> > Because swapoff is usually done only at system shutdown, the race
> >> > may not hit many people in practice.  But it is still a race that needs
> >> > to be fixed.
> >> 
> >> swapoff is so rare that it's hard to get motivated about any fix which
> >> adds overhead to the regular codepaths.
> >
> > That was my concern, too when I see this patch.
> >
> >> 
> >> Is there something we can do to ensure that all the overhead of this
> >> fix is placed into the swapoff side?  stop_machine() may be a bit
> >> brutal, but a surprising amount of code uses it.  Any other ideas?
> >
> > How about this?
> >
> > I think it's the same approach as the old one, where we used si->lock
> > everywhere instead of the more fine-grained cluster lock.
> >
> > The reason I repeatedly reset p->max to zero in the loop is to avoid
> > using a lockdep annotation (maybe spin_lock_nested(something)) to
> > prevent a false positive.
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 42fe5653814a..9ce007a42bbc 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -2644,6 +2644,19 @@ SYSCALL_DEFINE1(swapoff, const char __user *, 
> > specialfile)
> > swap_file = p->swap_file;
> > old_block_size = p->old_block_size;
> > p->swap_file = NULL;
> > +
> > +   if (p->flags & SWP_SOLIDSTATE) {
> > +   unsigned long ci, nr_cluster;
> > +
> > +   nr_cluster = DIV_ROUND_UP(p->max, SWAPFILE_CLUSTER);
> > +   for (ci = 0; ci < nr_cluster; ci++) {
> > +   struct swap_cluster_info *sci;
> > +
> > +   sci = lock_cluster(p, ci * SWAPFILE_CLUSTER);
> > +   p->max = 0;
> > +   unlock_cluster(sci);
> > +   }
> > +   }
> > p->max = 0;
> > swap_map = p->swap_map;
> > p->swap_map = NULL;
> > @@ -3369,10 +3382,10 @@ static int __swap_duplicate(swp_entry_t entry, 
> > unsigned char usage)
> > goto bad_file;
> > p = swap_info[type];
> > offset = swp_offset(entry);
> > -   if (unlikely(offset >= p->max))
> > -   goto out;
> >  
> > ci = lock_cluster_or_swap_info(p, offset);
> > +   if (unlikely(offset >= p->max))
> > +   goto unlock_out;
> >  
> > count = p->swap_map[offset];
> >  
> 
> Sorry, this doesn't work, because
> 
> lock_cluster_or_swap_info()
> 
> needs to read p->cluster_info, which may be freed during swapoff too.
> 
> 
> To reduce the added overhead in the regular code path, maybe we can use SRCU
> to implement get_swap_device() and put_swap_device()?  There is only an
> increment/decrement of a CPU-local variable in srcu_read_lock/unlock().
> That should be acceptable in the not-so-hot swap path?
> 
> This needs to select CONFIG_SRCU if CONFIG_SWAP is enabled.  But I guess
> that should be acceptable too?
> 

Why do we need SRCU here? Isn't plain RCU enough, like below?

The patch may have bugs and leaves room for improvement in performance and
naming. I just wanted to show my intention.

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2417d288e016..bfe493f3bcb8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -273,6 +273,7 @@ struct swap_info_struct {
 */
struct work_struct discard_work; /* discard worker */
struct swap_cluster_list discard_clusters; /* discard clusters list */
+   struct rcu_head rcu;
 };
 
 #ifdef CONFIG_64BIT
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 42fe5653814a..ecec064f9b20 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -302,6 +302,7 @@ static inline struct swap_cluster_info 
*lock_cluster_or_swap_info(
 {
struct swap_cluster_info *ci;
 
+   rcu_read_lock();
ci = lock_cluster(si, offset);
if (!ci)
spin_lock(&si->lock);
@@ -316,6 +317,7 @@ static inline void unlock_cluster_or_swap_info(struct 
swap_info_struct *si,
unlock_cluster(ci);
else
spin_unlock(&si->lock);
+   rcu_read_unlock();
 }
 
 stati
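
For readers following the get_swap_device()/put_swap_device() idea quoted
above, here is a rough userspace sketch of the general "pin the device while
using it, refuse new users once teardown starts" pattern. It is deliberately
NOT SRCU or RCU: it uses a plain C11 atomic reference count as a stand-in,
and every name in it (swap_dev, swap_dev_get, swap_dev_put,
swap_dev_try_teardown) is hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for swap_info_struct; not kernel code. */
struct swap_dev {
	atomic_int users;	/* readers currently between get and put */
	atomic_bool dying;	/* set once teardown has started */
	unsigned long max;	/* valid offsets are [0, max) */
};

void swap_dev_init(struct swap_dev *d, unsigned long max)
{
	atomic_init(&d->users, 0);
	atomic_init(&d->dying, false);
	d->max = max;
}

/* Reader side: pin the device; fails once teardown has begun. */
bool swap_dev_get(struct swap_dev *d)
{
	atomic_fetch_add(&d->users, 1);
	if (atomic_load(&d->dying)) {
		atomic_fetch_sub(&d->users, 1);
		return false;
	}
	return true;
}

/* Reader side: drop the pin taken by a successful swap_dev_get(). */
void swap_dev_put(struct swap_dev *d)
{
	int prev = atomic_fetch_sub(&d->users, 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Teardown side: mark the device dying, then report whether all readers
 * have drained. The caller retries (or waits) while this returns false.
 */
bool swap_dev_try_teardown(struct swap_dev *d)
{
	atomic_store(&d->dying, true);
	if (atomic_load(&d->users) != 0)
		return false;
	d->max = 0;
	return true;
}
```

The point of the sketch is only the ordering: a reader publishes itself
before checking the dying flag, and teardown sets the flag before checking
for readers, so at least one side always notices the other. SRCU would
achieve the same guarantee with cheaper per-CPU counters on the read side.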

Re: [PATCH v2 2/3] x86/acpi: take rsdp address for boot params if available

2017-12-08 Thread Juergen Gross
On 08/12/17 08:05, Ingo Molnar wrote:
> 
> * Juergen Gross  wrote:
> 
>> In case the rsdp address in struct boot_params is specified don't try
>> to find the table by searching, but take the address directly as set
>> by the boot loader.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  drivers/acpi/osl.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index 3bb46cb24a99..3b25e2ad7d75 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -45,6 +45,10 @@
>>  #include 
>>  #include 
>>  
>> +#ifdef CONFIG_X86
>> +#include 
>> +#endif
>> +
>>  #include "internal.h"
>>  
>>  #define _COMPONENT  ACPI_OS_SERVICES
>> @@ -195,6 +199,10 @@ acpi_physical_address __init 
>> acpi_os_get_root_pointer(void)
>>  if (acpi_rsdp)
>>  return acpi_rsdp;
>>  #endif
>> +#ifdef CONFIG_X86
>> +if (boot_params.hdr.acpi_rsdp_addr)
>> +return boot_params.hdr.acpi_rsdp_addr;
>> +#endif
> 
> Argh, that's typical short sighted hackery, layering violations and general 
> eyesore combined into a single patch ...
> 
> Those #ifdefs are a disgrace, plus why should generic ACPI code include 
> platform 
> details like boot_params.hdr/acpi_rsdp_addr? It's also not very extensible to 
> non-x86 - so someone will have to redo this work for ARM64 as well in the 
> future 
> ...
> 
> So how about doing it right:
> 
> 1)
> 
> Add a __weak acpi_arch_get_root_pointer() function to 
> drivers/acpi/osl.c:
> 
> 
> __weak acpi_physical_address acpi_arch_get_root_pointer(void)
> {
>   return 0;
> }
> 
> 2)
> 
> use it in acpi_os_get_root_pointer():
> 
>   ...
>   pa = acpi_arch_get_root_pointer();
>   if (pa)
>   return pa;
>   ...
> 
> 3)
> 
> Override the default variant in x86's acpi.c via something like:
> 
> acpi_physical_address acpi_arch_get_root_pointer(void)
> {
>   return boot_params.hdr.acpi_rsdp_addr;
> }
> 
> 4)
> 
> Add this to arch/x86/include/asm/acpi.h:
> 
> extern acpi_physical_address acpi_arch_get_root_pointer(void);
> 
> 5)
> 
> Add #include  to drivers/acpi/osl.c.
> 
> 
> That looks much cleaner, has no layering violations and is infinitely more 
> extensible, right?

Right.

Thanks for the very constructive comment.


Juergen
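
The weak-default pattern from the quoted steps can be tried outside the
kernel. The following is a minimal userspace sketch, assuming GCC or Clang
(for __attribute__((weak))); the names phys_addr_t, arch_get_root_pointer,
scan_for_root_pointer and the address constant are made up for illustration
and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;
static_assert(sizeof(phys_addr_t) == 8, "addresses are modelled as 64-bit");

/* Made-up value standing in for "what the table scan would find". */
#define FAKE_SCANNED_RSDP 0xE0000ULL

/*
 * Generic default. An architecture can provide a non-weak definition of
 * the same symbol in another object file, and the linker will pick that
 * one instead of this stub.
 */
__attribute__((weak)) phys_addr_t arch_get_root_pointer(void)
{
	return 0;
}

/* Stand-in for the generic search path. */
static phys_addr_t scan_for_root_pointer(void)
{
	return FAKE_SCANNED_RSDP;
}

/* Generic code: ask the architecture first, fall back to scanning. */
phys_addr_t get_root_pointer(void)
{
	phys_addr_t pa = arch_get_root_pointer();

	if (pa)
		return pa;
	return scan_for_root_pointer();
}
```

With only this file linked, the weak stub returns 0 and the scan path is
taken. Linking a second file that defines a strong arch_get_root_pointer()
changes the result without touching the generic code, which is the
layering benefit argued for above.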


Re: [RFC PATCH] mm: kasan: suppress soft lockup in slub when !CONFIG_PREEMPT

2017-12-08 Thread Dmitry Vyukov
On Fri, Dec 8, 2017 at 12:40 AM, Matthew Wilcox  wrote:
> On Fri, Dec 08, 2017 at 07:30:07AM +0800, Yang Shi wrote:
>> When running stress test with KASAN enabled, the below softlockup may
>> happen occasionally:
>>
>> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
>> hardirqs last  enabled at (0): [<  (null)>]  (null)
>> hardirqs last disabled at (0): [] copy_process.part.30+0x5c6/0x1f50
>> softirqs last  enabled at (0): [] copy_process.part.30+0x5c6/0x1f50
>> softirqs last disabled at (0): [<  (null)>]  (null)
>
>> Call Trace:
>>  [] __slab_free+0x19c/0x270
>>  [] ___cache_free+0xa6/0xb0
>>  [] qlist_free_all+0x47/0x80
>>  [] quarantine_reduce+0x159/0x190
>>  [] kasan_kmalloc+0xaf/0xc0
>>  [] kasan_slab_alloc+0x12/0x20
>>  [] kmem_cache_alloc+0xfa/0x360
>>  [] ? getname_flags+0x4f/0x1f0
>>  [] getname_flags+0x4f/0x1f0
>>  [] getname+0x12/0x20
>>  [] do_sys_open+0xf9/0x210
>>  [] SyS_open+0x1e/0x20
>>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> This feels like papering over a problem.  KASAN only calls
> quarantine_reduce() when it's allowed to block.  Presumably it has
> millions of entries on the free list at this point.  I think the right
> thing to do is for qlist_free_all() to call cond_resched() after freeing
> every N items.


Agree. Adding touch_softlockup_watchdog() to a random low-level
function looks like the wrong thing to do.
quarantine_reduce() already has this logic. Look at
QUARANTINE_BATCHES. It's meant to do exactly this -- limit the amount of
work in quarantine_reduce() and in quarantine_remove_cache() to
reasonably-sized batches. We could simply increase the number of batches
to make them smaller. But it would be good to understand what exactly
happens in this case. Batches should be on the order of ~1MB. Why does
freeing 1MB worth of objects (smallest of which is 32b) take 22 seconds?
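
As an illustration of the "yield after every N items" idea quoted above,
here is a small userspace sketch. It is not the KASAN code: qnode,
build_list, free_all_batched and yield_point are hypothetical names, and
yield_point() merely counts calls where a kernel implementation would
reschedule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked free list node. */
struct qnode {
	struct qnode *next;
};

/* Counts how often the loop offered to yield. */
unsigned long yield_calls;

static void yield_point(void)
{
	yield_calls++;
}

/* Build a list of n nodes (fewer if allocation fails). */
struct qnode *build_list(size_t n)
{
	struct qnode *head = NULL;

	while (n--) {
		struct qnode *node = malloc(sizeof(*node));

		if (!node)
			break;
		node->next = head;
		head = node;
	}
	return head;
}

/* Free every node, offering to yield after each full batch. */
size_t free_all_batched(struct qnode *head, size_t batch)
{
	size_t freed = 0;

	assert(batch > 0);
	while (head) {
		struct qnode *next = head->next;

		free(head);
		head = next;
		if (++freed % batch == 0)
			yield_point();
	}
	return freed;
}
```

Bounding the work between yield points is the same effect that smaller
quarantine batches would have; which of the two fits the kernel better is
exactly the open question in this thread.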



>> The code is run in irq disabled or preempt disabled context, so
>> cond_resched() can't be used in this case. Touch softlockup watchdog when
>> KASAN is enabled to suppress the warning.
>>
>> Signed-off-by: Yang Shi 
>> ---
>>  mm/slub.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index cfd56e5..4ae435e 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -35,6 +35,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include 
>>
>> @@ -2266,6 +2267,10 @@ static void put_cpu_partial(struct kmem_cache *s, 
>> struct page *page, int drain)
>>   page->pobjects = pobjects;
>>   page->next = oldpage;
>>
>> +#ifdef CONFIG_KASAN
>> + touch_softlockup_watchdog();
>> +#endif
>> +
>>   } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
>>   != oldpage);
>>   if (unlikely(!s->cpu_partial)) {
>> --
>> 1.8.3.1
>>


Re: [PATCH v4] mm, thp: introduce generic transparent huge page allocation interfaces

2017-12-08 Thread Michal Hocko
On Fri 08-12-17 12:42:55, changbin...@intel.com wrote:
> From: Changbin Du 
> 
> This patch introduces 4 new interfaces to allocate a prepared transparent
> huge page. These interfaces merge the scattered two-step allocation into a
> single step. They also avoid issues like forgetting to call
> prep_transhuge_page() or calling it on the wrong page. A real fix of that kind:
> 40a899e ("mm: migrate: fix an incorrect call of prep_transhuge_page()")
> 
> Anyway, I just want to show that exposing direct allocation interfaces is
> better than an interface that only does the second part of it.
> 
> These are similar to alloc_hugepage_xxx which are for hugetlbfs pages. New
> interfaces are:
>   - alloc_transhuge_page_vma
>   - alloc_transhuge_page_nodemask
>   - alloc_transhuge_page_node
>   - alloc_transhuge_page
> 
> These interfaces implicitly add the __GFP_COMP gfp flag, which is the minimum
> needed for huge page allocation. Any further flags are left to the callers.
> 
> This patch makes the following changes:
>   - define alloc_transhuge_page_xxx interfaces
>   - apply them to all existing code
>   - declare prep_transhuge_page as static since no others use it
>   - remove alloc_hugepage_vma definition since it no longer has users

I am not really convinced this is a huge win, to be honest. Just look at
the diffstat. Very few callsites get marginally simpler while we add a
lot of stubs and the code churn.

> Signed-off-by: Changbin Du 
> 
> ---
> v4:
>   - Revise the nop function definition. (Andrew)
> 
> v3:
>   - Rebase to latest mainline.
> 
> v2:
> Anshuman Khandu:
>   - Remove redundant 'VM_BUG_ON(!(gfp_mask & __GFP_COMP))'.
> Andrew Morton:
>   - Fix build error if thp is disabled.
> ---
>  include/linux/gfp.h |  4 
>  include/linux/huge_mm.h | 35 +--
>  include/linux/migrate.h | 14 +-
>  mm/huge_memory.c| 48 +---
>  mm/khugepaged.c | 11 ++-
>  mm/mempolicy.c  | 14 +++---
>  mm/migrate.c| 14 --
>  mm/shmem.c  |  6 ++
>  8 files changed, 90 insertions(+), 56 deletions(-)
-- 
Michal Hocko
SUSE Labs


Re: WARNING in x86_emulate_insn

2017-12-08 Thread Tianyu Lan
Hi Jim&Wanpeng:
 Thanks for your help.

2017-12-08 5:25 GMT+08:00 Jim Mattson :
> Try disabling the module parameter, "unrestricted_guest." Make sure
> that the module parameter, "emulate_invalid_guest_state" is enabled.
> This combination allows userspace to feed invalid guest state into the
> in-kernel emulator.

Yes, you are right. I need to disable unrestricted_guest to reproduce the issue.

I found that this is a pop instruction emulation issue. According to "SDM VOL2,
chapter INSTRUCTION
SET REFERENCE. POP—Pop a Value from the Stack"

Protected Mode Exceptions
#GP(0) If attempt is made to load SS register with NULL segment selector.

This test case hits it, but the current code doesn't check for this case.
The following patch can fix the issue.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index abe74f7..e2ac5cc 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1844,6 +1844,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
int rc;
struct segmented_address addr;

+   if ( !get_segment_selector(ctxt, VCPU_SREG_SS))
+   return emulate_gp(ctxt, 0);
+
addr.ea = reg_read(ctxt, VCPU_REGS_RSP) & stack_mask(ctxt);
addr.seg = VCPU_SREG_SS;
rc = segmented_read(ctxt, addr, dest, len);


Re: [Xen-devel] [PATCH v2 1/3] x86/boot: add acpi rsdp address to setup_header

2017-12-08 Thread Jan Beulich
>>> On 08.12.17 at 08:16,  wrote:
> Also, a more fundamental question: why doesn't Xen use EFI to hand over 
> hardware configuration details?

Iirc the main purpose of the change here is to allow booting PVH
(guest or Dom0) with Grub2 in the middle. PVH, at least for the
time being, is something that gets away without any firmware
(and I'm pretty certain this is going to remain that way for Dom0).
ACPI tables are being built by the tool stack (guest) or hypervisor
(Dom0). Hence there simply isn't any EFI which could be used to
propagate such information.

Jan




Re: [PATCH net-next 1/6] net: mvpp2: only free the TSO header buffers when it was allocated

2017-12-08 Thread Antoine Tenart
On Thu, Dec 07, 2017 at 02:53:29PM -0500, David Miller wrote:
> From: Antoine Tenart 
> Date: Thu,  7 Dec 2017 09:48:58 +0100
> 
> > This patch adds a check to only free the TSO header buffer when its
> > allocation previously succeeded.
> > 
> > Signed-off-by: Antoine Tenart 
> 
> No, please keep this as a failure to bring up.
> 
> Even if you emit a log message, it is completely unintuitive to
> have netdev features change on the user just because of a memory
> allocation failure.

One thing I forgot: this patch is still needed for proper error path
handling. We can't be sure all buffers were allocated correctly when
calling mvpp2_txq_deinit() (e.g. if one of them was the reason for the
failure).

I'll send a v2 without the patch 2/6, but I'll keep this one.

Thanks,
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


[PATCH v2] usb: dwc2: Change TxFIFO and RxFIFO flushing flow

2017-12-08 Thread Minas Harutyunyan
Before flushing the FIFOs, it is required to check the AHB master state and
flush only when the AHB master is in the IDLE state.

Signed-off-by: Minas Harutyunyan 
---
 drivers/usb/dwc2/core.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/usb/dwc2/core.c b/drivers/usb/dwc2/core.c
index dbca3b8890da..4d2a8c452e6b 100644
--- a/drivers/usb/dwc2/core.c
+++ b/drivers/usb/dwc2/core.c
@@ -670,10 +670,23 @@ void dwc2_flush_tx_fifo(struct dwc2_hsotg *hsotg, const 
int num)
 
dev_vdbg(hsotg->dev, "Flush Tx FIFO %d\n", num);
 
+   /* Wait for AHB master IDLE state */
+   do {
+   greset = dwc2_readl(hsotg->regs + GRSTCTL);
+   if (++count > 1) {
+   dev_warn(hsotg->dev,
+"%s() HANG! AHB Idle GRSTCTL=%0x\n",
+__func__, greset);
+   return;
+   }
+   udelay(1);
+   } while (!(greset & GRSTCTL_AHBIDLE));
+
greset = GRSTCTL_TXFFLSH;
greset |= num << GRSTCTL_TXFNUM_SHIFT & GRSTCTL_TXFNUM_MASK;
dwc2_writel(greset, hsotg->regs + GRSTCTL);
 
+   count = 0;
do {
greset = dwc2_readl(hsotg->regs + GRSTCTL);
if (++count > 1) {
@@ -702,9 +715,23 @@ void dwc2_flush_rx_fifo(struct dwc2_hsotg *hsotg)
 
dev_vdbg(hsotg->dev, "%s()\n", __func__);
 
+   /* Wait for AHB master IDLE state */
+   do {
+   greset = dwc2_readl(hsotg->regs + GRSTCTL);
+   if (++count > 1) {
+   dev_warn(hsotg->dev,
+"%s() HANG! AHB Idle GRSTCTL=%0x\n",
+__func__, greset);
+   return;
+   }
+   udelay(1);
+   } while (!(greset & GRSTCTL_AHBIDLE));
+
greset = GRSTCTL_RXFFLSH;
dwc2_writel(greset, hsotg->regs + GRSTCTL);
 
+   /* Wait for RxFIFO flush done */
+   count = 0;
do {
greset = dwc2_readl(hsotg->regs + GRSTCTL);
if (++count > 1) {
-- 
2.11.0
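
The wait loops added by this patch follow a common "poll a status bit with
a bounded number of tries" shape. Below is a minimal userspace sketch of
that shape only. The register is simulated, and poll_bit_set, fake_reg and
fake_read are hypothetical names; nothing here is dwc2 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Callback that returns the current value of a status register. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until any bit in mask is set, giving up after max_tries reads.
 * A real driver would delay (for example udelay(1)) between reads.
 */
bool poll_bit_set(reg_read_fn read, void *ctx, uint32_t mask,
		  unsigned int max_tries)
{
	assert(max_tries > 0);
	for (unsigned int i = 0; i < max_tries; i++) {
		if (read(ctx) & mask)
			return true;
	}
	return false;
}

/* Simulated register: reports 'bit' once reads_until_set reads have passed. */
struct fake_reg {
	unsigned int reads_until_set;
	uint32_t bit;
};

uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->reads_until_set == 0)
		return r->bit;
	r->reads_until_set--;
	return 0;
}
```

Returning a bool lets the caller decide whether to warn and bail out, as
the patch does, instead of writing the flush bit while the bus is busy.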



Re: Traversing XFS mounted VM puts process in D state

2017-12-08 Thread Dinesh Pathak
On Fri, Dec 8, 2017 at 7:08 AM, Dave Chinner  wrote:
> [cc linux-...@vger.kernel.org]
>
> On Fri, Dec 08, 2017 at 06:42:32AM +0530, Dinesh Pathak wrote:
>> Hi, We are mounting and traversing one backup of a VM with XFS filesystem.
>> Sometimes during traversing, the process goes into D state and can not be
>> killed. Eventually system needs to IPMI rebooted. This happens once in 100
>> times.
>>
>> This VM backup is kept on NFS storage. So we first do NFS mounting. Then do
>> loopback mount of the partition which contain XFS. After that we traverse
>> the file system, but this traversing is not necessarily multi threaded (We
>> have seen the issue in both single-threaded and multi-threaded traversal)
>>
> >> I see a similar problem reported here:
> >> https://access.redhat.com/solutions/2456711
> >> The resolution given there is to upgrade the Linux kernel to
> >> kernel-3.10.0-514.el7 (RHSA-2016-2574, RHEL 7.3). Upgrading the
> >> kernel may not be possible for us. Are there any patches that we can
> >> apply to fix this issue?
>
> Oh, it's RHEL kernel. This is not a mainline kernel so you need to
> report this to your local Red Hat support engineer rather than to
> upstream kernel lists.
>
> -Dave.

Hi Dave, thanks for your time. The above link only reports a similar
bug with the same kernel trace, which we found on the internet. Our
client machine, where the traversal is done, is running CentOS.

$ hostnamectl
   Static hostname: coh-tw-cl01-node-4
 Icon name: computer-server
   Chassis: server
Machine ID: b38a4225b6544e20b25a2e55f63ed5fa
   Boot ID: 90dc6e0a0cdd4b6581ae62941d74587c
  Operating System: CentOS Linux 7 (Core)
   CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-327.22.2.el7.x86_64
  Architecture: x86-64

Thanks,
Dinesh

>
>> One more thread here says that this issue is fixed only in the above kernel
>> version. It is seen in previous as well as later versions.
>> https://bugs.centos.org/view.php?id=13843&history=1
>>
> >> Is there any way to reproduce this problem? All our efforts to reproduce
> >> this issue have not succeeded.
>>
>> Please help me know if any more debugging can be done.
>>
>> Thanks,
>> Dinesh
>>
>> Kernel version of source VM, whose backup is taken.
>>
>> root@web-2318 ~]# uname -a
>>
>> Linux web-2318.website.oxilion.nl 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul
>> 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>> Kernel version of the machine where backup is mounted and traversed.
>> 3.10.0-327.22.2.el7.x86_64 #1 SMP Tue Jul 5 12:41:09 PDT 2016 x86_64 x86_64
>> x86_64 GNU/Linux
>>
>>
>> Mon Dec  4 21:08:21 2017] yoda_exec   D  0 48948  
>> 48938
>> 0x
>>
>> [Mon Dec  4 21:08:21 2017]  8801052437b0 0086
>> 88000aa02e00 880105243fd8
>>
>> [Mon Dec  4 21:08:21 2017]  880105243fd8 880105243fd8
>> 88000aa02e00 88010521e730
>>
>> [Mon Dec  4 21:08:21 2017]  7fff 88000aa02e00
>> 0002 
>>
>> [Mon Dec  4 21:08:21 2017] Call Trace:
>>
>> [Mon Dec  4 21:08:21 2017]  [] schedule+0x29/0x70
>>
>> [Mon Dec  4 21:08:21 2017]  []
>> schedule_timeout+0x209/0x2d0
>>
>> [Mon Dec  4 21:08:21 2017]  [] ?
>> xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] __down_common+0xd2/0x14a
>>
>> [Mon Dec  4 21:08:21 2017]  [] ?
>> _xfs_buf_find+0x16d/0x2c0 [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] __down+0x1d/0x1f
>>
>> [Mon Dec  4 21:08:21 2017]  [] down+0x41/0x50
>>
>> [Mon Dec  4 21:08:21 2017]  [] xfs_buf_lock+0x3c/0xd0
>> [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] _xfs_buf_find+0x16d/0x2c0
>> [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] xfs_buf_get_map+0x2a/0x180
>> [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  []
>> xfs_buf_read_map+0x2c/0x140 [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  []
>> xfs_trans_read_buf_map+0x199/0x400 [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] xfs_da_read_buf+0xd4/0x100
>> [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  []
>> xfs_da3_node_read+0x23/0xd0 [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] ?
>> kmem_cache_alloc+0x1ba/0x1d0
>>
>> [Mon Dec  4 21:08:21 2017]  []
>> xfs_da3_node_lookup_int+0x6e/0x2f0 [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  []
>> xfs_dir2_node_lookup+0x4d/0x170
>> [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] xfs_dir_lookup+0x195/0x1b0
>> [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] xfs_lookup+0x66/0x110 [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] xfs_vn_lookup+0x7b/0xd0
>> [xfs]
>>
>> [Mon Dec  4 21:08:21 2017]  [] lookup_real+0x1d/0x50
>>
>> [Mon Dec  4 21:08:21 2017]  [] __lookup_hash+0x42/0x60
>>
>> [Mon Dec  4 21:08:21 2017]  [] lookup_slow+0x42/0xa7
>>
>> [Mon Dec  4 21:08:21 2017]  [] path_lookupat+0x773/0x7a0
>>
>> [Mon Dec  4 21:08:21 2017]  [] ? kvfree+0x2a/0x40
>>
>> [Mon Dec  4 21:08:21 2017]  [] ?
>> kmem_cache_alloc+0x35/0x1d0
>>
>> [Mon Dec  4 21:08:21 2017]  [] ? getname_flags+0x4f/0x1a0
>>
>> [Mon Dec  4 21:08:21 2017]  [

Re: [PATCH 0/2] mm: introduce MAP_FIXED_SAFE

2017-12-08 Thread Michal Hocko
On Thu 07-12-17 11:57:27, Matthew Wilcox wrote:
> On Thu, Dec 07, 2017 at 11:14:27AM -0800, Kees Cook wrote:
> > On Wed, Dec 6, 2017 at 9:46 PM, Michael Ellerman  
> > wrote:
> > > Matthew Wilcox  writes:
> > >> So, just like we currently say "exactly one of MAP_SHARED or 
> > >> MAP_PRIVATE",
> > >> we could add a new paragraph saying "at most one of MAP_FIXED or
> > >> MAP_REQUIRED" and "any of the following values".
> > >
> > > MAP_REQUIRED doesn't immediately grab me, but I don't actively dislike
> > > it either :)
> > >
> > > What about MAP_AT_ADDR ?
> > >
> > > It's short, and says what it does on the tin. The first argument to mmap
> > > is actually called "addr" too.
> > 
> > "FIXED" is supposed to do this too.
> > 
> > Pavel suggested:
> > 
> > MAP_ADD_FIXED
> > 
> > (which is different from "use fixed", and describes why it would fail:
> > can't add since it already exists.)
> > 
> > Perhaps "MAP_FIXED_NEW"?
> > 
> > There has been a request to drop "FIXED" from the name, so these:
> > 
> > MAP_FIXED_NOCLOBBER
> > MAP_FIXED_NOREPLACE
> > MAP_FIXED_ADD
> > MAP_FIXED_NEW
> > 
> > Could be:
> > 
> > MAP_NOCLOBBER
> > MAP_NOREPLACE
> > MAP_ADD
> > MAP_NEW
> > 
> > and we still have the unloved, but acceptable:
> > 
> > MAP_REQUIRED
> > 
> > My vote is still for "NOREPLACE" or "NOCLOBBER" since it's very
> > specific, though "NEW" is pretty clear too.
> 
> How about MAP_NOFORCE?

OK, this doesn't seem to lead anywhere. The more this is discussed
the more names we are getting. So you know what? I will resubmit and
keep my original name. If somebody really hates it then feel free to
nack the patch and push an alternative and gain consensus on it.

I will keep MAP_FIXED_SAFE because it is an alternative to MAP_FIXED so
having that in the name is _useful_ for everybody familiar with
MAP_FIXED already. And _SAFE suffix tells that the operation doesn't
cause any silent memory corruptions or other unexpected side effects.
-- 
Michal Hocko
SUSE Labs
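
To make the "silent memory corruption" concern concrete, here is a small
sketch of what plain MAP_FIXED does when the requested range is already
mapped. It assumes a Linux or other POSIX system with MAP_ANONYMOUS; the
function name map_fixed_clobbers is made up. The proposed new flag is not
used, since this sketch relies only on existing MAP_FIXED behaviour.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page, write to it, then map the same address again with
 * MAP_FIXED. Returns 1 if the second mapping silently replaced the
 * first (the written byte is gone), 0 if not, -1 on mmap failure.
 */
int map_fixed_clobbers(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *a, *b;
	int clobbered;

	assert(page > 0);
	a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (a == MAP_FAILED)
		return -1;
	a[0] = 42;

	/* Same address, MAP_FIXED: the old mapping is discarded without error. */
	b = mmap(a, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (b == MAP_FAILED) {
		munmap(a, (size_t)page);
		return -1;
	}
	clobbered = (b == a && b[0] == 0);
	munmap(b, (size_t)page);
	return clobbered;
}
```

A flag with the semantics discussed in this thread would make the second
mmap() fail instead of replacing the first mapping.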


Re: [PATCH v8 4/6] clocksource: stm32: only use 32 bits timers

2017-12-08 Thread Daniel Lezcano
On 14/11/2017 09:52, Benjamin Gaignard wrote:
> The clock driving counters is at 90MHz so the maximum period
> for 16 bits counters is around 750 ms

728 us

> which is a short period for a clocksource.

Which clocksource are you talking about ?

> For 32 bits counters this period is close to
> 47 seconds which is more acceptable.
> 
> This patch removes 16 bits counters support and makes sure that
> they won't be probed anymore.

Are we talking about clockevent or clocksource?

Is this issue present today ? Or is it if we add the clocksource support
? We are talking about clocksource but we change the clockevent code.

All this is very confusing.

I have a rough idea of what is happening, but it is not up to me to
decode and infer from the changes, you need to describe *clearly* the
situation.

 - What happens if we use a 16bits timer as a clockevent ?
 - What happens if we use a 16bits timer as a clocksource ?
 - Why is it preferable to remove the support of the 16bits timers
instead of downgrading them with the rating ?
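
The wrap periods discussed at the top of this message can be checked with
a few lines of arithmetic. The sketch below is only that, arithmetic: the
helper name counter_wrap_us is made up, and the 90 MHz rate and the 1024
prescaler are taken from the quoted description and the removed code. That
the "around 750 ms" figure corresponds to the prescaled 16-bit case is an
inference, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time for a free-running counter of 'bits' width to wrap, in
 * microseconds, when clocked at rate_hz divided by 'prescaler'.
 */
uint64_t counter_wrap_us(unsigned int bits, uint64_t rate_hz,
			 uint32_t prescaler)
{
	uint64_t ticks;

	assert(bits >= 1 && bits <= 32 && rate_hz > 0 && prescaler > 0);
	ticks = UINT64_C(1) << bits;
	return ticks * prescaler * UINT64_C(1000000) / rate_hz;
}
```

With these inputs, a 16-bit counter at 90 MHz wraps after about 728 us
without a prescaler and after about 746 ms with a prescaler of 1024, while
a 32-bit counter without a prescaler wraps after roughly 47.7 s.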

> Signed-off-by: Benjamin Gaignard 

> ---
>  drivers/clocksource/timer-stm32.c | 26 --
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/clocksource/timer-stm32.c 
> b/drivers/clocksource/timer-stm32.c
> index ae41a19..8173bcf 100644
> --- a/drivers/clocksource/timer-stm32.c
> +++ b/drivers/clocksource/timer-stm32.c
> @@ -83,9 +83,9 @@ static irqreturn_t stm32_clock_event_handler(int irq, void 
> *dev_id)
>  static int __init stm32_clockevent_init(struct device_node *node)
>  {
>   struct reset_control *rstc;
> - unsigned long max_delta;
> - int ret, bits, prescaler = 1;
> + unsigned long max_arr;
>   struct timer_of *to;
> + int ret;
>  
>   to = kzalloc(sizeof(*to), GFP_KERNEL);
>   if (!to)
> @@ -115,29 +115,27 @@ static int __init stm32_clockevent_init(struct 
> device_node *node)
>  
>   /* Detect whether the timer is 16 or 32 bits */
>   writel_relaxed(~0U, timer_of_base(to) + TIM_ARR);
> - max_delta = readl_relaxed(timer_of_base(to) + TIM_ARR);
> - if (max_delta == ~0U) {
> - prescaler = 1;
> - bits = 32;
> - } else {
> - prescaler = 1024;
> - bits = 16;
> + max_arr = readl_relaxed(timer_of_base(to) + TIM_ARR);
> + if (max_arr != ~0U) {
> + pr_err("32 bits timer is needed\n");
> + ret = -EINVAL;
> + goto deinit;
>   }

Wrap this in a function:

static bool stm32_timer_is_32bits(struct timer_of *to)
{
return readl_relaxed(timer_of_base(to) + TIM_ARR) == ~0U;
}

Then clearly inform the user.

if (!stm32_timer_is_32bits(to)) {
pr_warn("Timer %pOF is a 16 bits timer\n", node);
/* abort the registration or downgrade the timer's rating */
}

> +
>   writel_relaxed(0, timer_of_base(to) + TIM_ARR);
>  
> - writel_relaxed(prescaler - 1, timer_of_base(to) + TIM_PSC);
> + writel_relaxed(0, timer_of_base(to) + TIM_PSC);
>   writel_relaxed(TIM_EGR_UG, timer_of_base(to) + TIM_EGR);
>   writel_relaxed(TIM_DIER_UIE, timer_of_base(to) + TIM_DIER);
>   writel_relaxed(0, timer_of_base(to) + TIM_SR);
>  
>   clockevents_config_and_register(&to->clkevt,
> - timer_of_period(to), MIN_DELTA, 
> max_delta);
> -
> - pr_info("%pOF: STM32 clockevent driver initialized (%d bits)\n",
> - node, bits);
> + timer_of_period(to), MIN_DELTA, ~0U);
>  
>   return 0;
>  
> +deinit:
> + timer_of_exit(to);

Fix this please (timer_of_cleanup).

In the future, make sure the patches are git-bisect safe.



-- 
  Linaro.org │ Open source software for ARM SoCs




Re: [PATCH 4.4 71/96] e1000e: Separate signaling for link check/link up

2017-12-08 Thread Benjamin Poirier
On 2017/12/07 20:02, Ben Hutchings wrote:
> On Tue, 2017-11-28 at 11:23 +0100, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Benjamin Poirier 
> > 
> > commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 upstream.
> [...]
> > --- a/drivers/net/ethernet/intel/e1000e/mac.c
> > +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> > @@ -410,6 +410,9 @@ void e1000e_clear_hw_cntrs_base(struct e
> >   *  Checks to see of the link status of the hardware has changed.  If a
> >   *  change in link status has been detected, then we read the PHY registers
> >   *  to get the current speed/duplex if link exists.
> > + *
> > + *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 
> > (link
> > + *  up).
> >   **/
> >  s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
> >  {
> [...]
> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > @@ -5017,7 +5017,7 @@ static bool e1000e_has_link(struct e1000
> > >   case e1000_media_type_copper:
> > >   if (hw->mac.get_link_status) {
> > >   ret_val = hw->mac.ops.check_for_link(hw);
> > > - link_active = !hw->mac.get_link_status;
> > > + link_active = ret_val > 0;
> > >   } else {
> > >   link_active = true;
> > >   }
> 
> As this change in e1000e_has_link() is conditional only on the media
> type, doesn't e1000_check_for_copper_link_ich8lan() also need to be
> changed to return 1 for link up?

You're right. I looked at it again, in the commit log I wrote that
"hw->mac.ops.check_for_link(hw) === e1000e_check_for_copper_link" which
is true for the race condition reported (because that's the function in
use on adapters that have msix vectors mac.type == e1000_82574) but not
generally true. The other check_for_link callback needs to be adjusted
likewise.

However, I happen to have an I218-LM (e1000_pch_lpt) so I tested 4.14.3
and this error only delays link up, it doesn't prevent it.
e1000_check_for_copper_link_ich8lan() sets mac->get_link_status = false;
and on the next watchdog execution, we fall into the second branch of the
following e1000e_has_link code:

case e1000_media_type_copper:
if (hw->mac.get_link_status) {
ret_val = hw->mac.ops.check_for_link(hw);
link_active = ret_val > 0;
} else {
link_active = true;

OTOH, there are multiple reports in
https://bugzilla.kernel.org/show_bug.cgi?id=198047
that reverting 830466993daf ("e1000e: Separate signaling for link
check/link up") fixes the issue so there's something I'm missing.
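
The "delayed by one watchdog run" behaviour described above can be modelled
in a few lines. This is a toy model, not driver code: fake_mac, has_link
and the two check functions are hypothetical, and they only mimic the
get_link_status flag and the "ret_val > 0" test from the excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the MAC state consulted by the link check. */
struct fake_mac {
	bool get_link_status;	/* true: link state must be re-read */
	int (*check_for_link)(struct fake_mac *mac);
};

/* Callback that finds link but still returns 0, like the unpatched ich8lan path. */
int check_reports_zero(struct fake_mac *mac)
{
	mac->get_link_status = false;
	return 0;
}

/* Callback that finds link and returns 1, following the new convention. */
int check_reports_one(struct fake_mac *mac)
{
	mac->get_link_status = false;
	return 1;
}

/* Mirrors the copper branch: negative = error, 0 = down, 1 = up. */
bool has_link(struct fake_mac *mac)
{
	if (mac->get_link_status) {
		int ret = mac->check_for_link(mac);

		return ret > 0;
	}
	return true;
}
```

With the zero-returning callback the first call reports no link and the
second call reports link, because the flag was cleared in between. With the
callback that returns 1, link is reported on the first call.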

Gabriel and Christian, can you test the following patch?

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d6d4ed7acf03..31277d3bb7dc 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1367,6 +1367,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, 
bool force)
  *  Checks to see of the link status of the hardware has changed.  If a
  *  change in link status has been detected, then we read the PHY registers
  *  to get the current speed/duplex if link exists.
+ *
+ *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
+ *  up).
  **/
 static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 {
@@ -1382,7 +1385,7 @@ static s32 e1000_check_for_copper_link_ich8lan(struct 
e1000_hw *hw)
 * Change or Rx Sequence Error interrupt.
 */
if (!mac->get_link_status)
-   return 0;
+   return 1;
 
/* First we want to see if the MII Status Register reports
 * link.  If so, then we want to get the current speed/duplex
@@ -1613,10 +1616,12 @@ static s32 e1000_check_for_copper_link_ich8lan(struct 
e1000_hw *hw)
 * different link partner.
 */
ret_val = e1000e_config_fc_after_link_up(hw);
-   if (ret_val)
+   if (ret_val) {
e_dbg("Error configuring flow control\n");
+   return ret_val;
+   }
 
-   return ret_val;
+   return 1;
 }
 
 static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
-- 
2.15.1



[PATCH 3/3] clk: sunxi-ng: sun8i: a83t: Use sigma-delta modulation for audio PLL

2017-12-08 Thread Chen-Yu Tsai
The audio blocks require specific clock rates. Until now we were using
the closest clock rate possible with integer N-M factors. This resulted
in audio playback being slightly slower than it should be.

The vendor kernel gets around this (for newer SoCs) by using sigma-delta
modulation to generate a fractional-N factor. This patch copies the
parameters for the A83T.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/clk/sunxi-ng/ccu-sun8i-a83t.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c 
b/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c
index 06b69e433d0f..04a9c33f53f0 100644
--- a/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c
+++ b/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c
@@ -76,17 +76,26 @@ static struct ccu_mult pll_c1cpux_clk = {
  */
 #define SUN8I_A83T_PLL_AUDIO_REG   0x008
 
+/* clock rates doubled for post divider */
+static struct ccu_sdm_setting pll_audio_sdm_table[] = {
+   { .rate = 45158400, .pattern = 0xc00121ff, .m = 29, .n = 54 },
+   { .rate = 49152000, .pattern = 0xc000e147, .m = 30, .n = 61 },
+};
+
 static struct ccu_nm pll_audio_clk = {
.enable = BIT(31),
.lock   = BIT(2),
.n  = _SUNXI_CCU_MULT_OFFSET_MIN_MAX(8, 8, 0, 12, 0),
.m  = _SUNXI_CCU_DIV(0, 6),
.fixed_post_div = 2,
+   .sdm= _SUNXI_CCU_SDM(pll_audio_sdm_table, BIT(24),
+0x284, BIT(31)),
.common = {
.reg= SUN8I_A83T_PLL_AUDIO_REG,
.lock_reg   = CCU_SUN8I_A83T_LOCK_REG,
.features   = CCU_FEATURE_LOCK_REG |
- CCU_FEATURE_FIXED_POSTDIV,
+ CCU_FEATURE_FIXED_POSTDIV |
+ CCU_FEATURE_SIGMA_DELTA_MOD,
.hw.init= CLK_HW_INIT("pll-audio", "osc24M",
  &ccu_nm_ops, CLK_SET_RATE_UNGATE),
},
-- 
2.15.0
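
As a sanity check on the table in this patch, the sketch below computes
the PLL output for each row under one assumption that is NOT stated in the
patch: that the low 17 bits of .pattern hold the fractional part of N in
units of 1/2^17. That reading happens to reproduce both target rates to
within a few Hz from the 24 MHz parent, but it is an observation about the
numbers, not documented hardware behaviour. The helper name sdm_pll_rate is
made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the fractional field inside the SDM pattern. */
#define SDM_FRAC_BITS 17

/*
 * Rate = parent * (n + frac / 2^17) / m, computed in integers.
 * 'frac' is taken from the low SDM_FRAC_BITS bits of 'pattern'.
 */
uint64_t sdm_pll_rate(uint64_t parent_hz, uint32_t n, uint32_t m,
		      uint32_t pattern)
{
	uint64_t frac = pattern & ((UINT32_C(1) << SDM_FRAC_BITS) - 1);
	uint64_t num = ((uint64_t)n << SDM_FRAC_BITS) + frac;

	assert(m > 0);
	return parent_hz * num / ((uint64_t)m << SDM_FRAC_BITS);
}
```

For comparison, the integer factors of the first row alone (24 MHz * 54 /
29) give about 44.69 MHz, roughly 1% below the 45.1584 MHz target, which is
the kind of error the commit message attributes to integer-only factors.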



Re: [Xen-devel] [PATCH v2 1/3] x86/boot: add acpi rsdp address to setup_header

2017-12-08 Thread Ingo Molnar

* Jan Beulich  wrote:

> >>> On 08.12.17 at 08:16,  wrote:
> > Also, a more fundamental question: why doesn't Xen use EFI to hand over 
> > hardware configuration details?
> 
> Iirc the main purpose of the change here is to allow booting PVH
> (guest or Dom0) with Grub2 in the middle. PVH, at least for the
> time being, is something that gets away without any firmware
> (and I'm pretty certain this is going to remain that way for Dom0).
> ACPI tables are being built by the tool stack (guest) or hypervisor
> (Dom0). Hence there simply isn't any EFI which could be used to
> propagate such information.

Ok, that's fair enough. If hpa (or someone else) doesn't object to the boot 
protocol extension this approach looks good to me in principle.

Thanks,

Ingo



[PATCH 2/3] clk: sunxi-ng: sun8i: a83t: Add /2 fixed post divider to audio PLL

2017-12-08 Thread Chen-Yu Tsai
On the A83T, the audio PLL should have its div1 set to 0, or /1, and
div2 set to 1, or /2. This setting is the default, and is required
to match the sigma-delta modulation parameters from the BSP kernel.

This patch adds a /2 fixed post divider to the audio PLL, and fixes
the enforced d1 & d2 values. This also resolves the mismatch between
the values mentioned in the comment for the audio PLL, and the actual
enforced values.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/clk/sunxi-ng/ccu-sun8i-a83t.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c 
b/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c
index 5cedcd0d8be8..06b69e433d0f 100644
--- a/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c
+++ b/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c
@@ -81,10 +81,12 @@ static struct ccu_nm pll_audio_clk = {
.lock   = BIT(2),
.n  = _SUNXI_CCU_MULT_OFFSET_MIN_MAX(8, 8, 0, 12, 0),
.m  = _SUNXI_CCU_DIV(0, 6),
+   .fixed_post_div = 2,
.common = {
.reg= SUN8I_A83T_PLL_AUDIO_REG,
.lock_reg   = CCU_SUN8I_A83T_LOCK_REG,
-   .features   = CCU_FEATURE_LOCK_REG,
+   .features   = CCU_FEATURE_LOCK_REG |
+ CCU_FEATURE_FIXED_POSTDIV,
.hw.init= CLK_HW_INIT("pll-audio", "osc24M",
  &ccu_nm_ops, CLK_SET_RATE_UNGATE),
},
@@ -889,9 +891,10 @@ static int sun8i_a83t_ccu_probe(struct platform_device 
*pdev)
if (IS_ERR(reg))
return PTR_ERR(reg);
 
-   /* Enforce d1 = 0, d2 = 0 for Audio PLL */
+   /* Enforce d1 = 0, d2 = 1 for Audio PLL */
val = readl(reg + SUN8I_A83T_PLL_AUDIO_REG);
-   val &= ~(BIT(16) | BIT(18));
+   val &= ~BIT(16);
+   val |= BIT(18);
writel(val, reg + SUN8I_A83T_PLL_AUDIO_REG);
 
/* Enforce P = 1 for both CPU cluster PLLs */
-- 
2.15.0
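
For readers following the register change above, here is a minimal
standalone sketch of the d1/d2 enforcement. The bit positions (16 and 18)
come from the patch; the macro and function names are hypothetical and are
not part of the driver.

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Bit positions as used in the patch: bit 16 is d1, bit 18 is d2. */
#define PLL_AUDIO_D1 BIT(16) /* hypothetical name */
#define PLL_AUDIO_D2 BIT(18) /* hypothetical name */

/* Return the register value with d1 = 0 (/1) and d2 = 1 (/2) enforced. */
static uint32_t enforce_audio_pll_dividers(uint32_t val)
{
	val &= ~PLL_AUDIO_D1; /* clear d1 */
	val |= PLL_AUDIO_D2;  /* set d2 */
	return val;
}
```

All other bits pass through unchanged, which mirrors the
read-modify-write sequence in the probe function.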



[PATCH 1/3] clk: sunxi-ng: Support fixed post-dividers on NM style clocks

2017-12-08 Thread Chen-Yu Tsai
On the A83T, the audio PLL should have its div1 set to 0, or /1, and
div2 set to 1, or /2. This setting is the default, and is required
to match the sigma-delta modulation parameters from the BSP kernel.

To do this, we first add fixed post-divider support to the NM style
clocks, which is the clock type the audio PLL is modeled as.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/clk/sunxi-ng/ccu_nm.c | 50 ---
 drivers/clk/sunxi-ng/ccu_nm.h |  2 ++
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/drivers/clk/sunxi-ng/ccu_nm.c b/drivers/clk/sunxi-ng/ccu_nm.c
index 7620aa973a6e..a16de092bf94 100644
--- a/drivers/clk/sunxi-ng/ccu_nm.c
+++ b/drivers/clk/sunxi-ng/ccu_nm.c
@@ -70,11 +70,18 @@ static unsigned long ccu_nm_recalc_rate(struct clk_hw *hw,
unsigned long parent_rate)
 {
struct ccu_nm *nm = hw_to_ccu_nm(hw);
+   unsigned long rate;
unsigned long n, m;
u32 reg;
 
-   if (ccu_frac_helper_is_enabled(&nm->common, &nm->frac))
-   return ccu_frac_helper_read_rate(&nm->common, &nm->frac);
+   if (ccu_frac_helper_is_enabled(&nm->common, &nm->frac)) {
+   rate = ccu_frac_helper_read_rate(&nm->common, &nm->frac);
+
+   if (nm->common.features & CCU_FEATURE_FIXED_POSTDIV)
+   rate /= nm->fixed_post_div;
+
+   return rate;
+   }
 
reg = readl(nm->common.base + nm->common.reg);
 
@@ -90,15 +97,15 @@ static unsigned long ccu_nm_recalc_rate(struct clk_hw *hw,
if (!m)
m++;
 
-   if (ccu_sdm_helper_is_enabled(&nm->common, &nm->sdm)) {
-   unsigned long rate =
-   ccu_sdm_helper_read_rate(&nm->common, &nm->sdm,
-m, n);
-   if (rate)
-   return rate;
-   }
+   if (ccu_sdm_helper_is_enabled(&nm->common, &nm->sdm))
+   rate = ccu_sdm_helper_read_rate(&nm->common, &nm->sdm, m, n);
+   else
+   rate = parent_rate * n / m;
+
+   if (nm->common.features & CCU_FEATURE_FIXED_POSTDIV)
+   rate /= nm->fixed_post_div;
 
-   return parent_rate * n / m;
+   return rate;
 }
 
 static long ccu_nm_round_rate(struct clk_hw *hw, unsigned long rate,
@@ -107,11 +114,20 @@ static long ccu_nm_round_rate(struct clk_hw *hw, unsigned 
long rate,
struct ccu_nm *nm = hw_to_ccu_nm(hw);
struct _ccu_nm _nm;
 
-   if (ccu_frac_helper_has_rate(&nm->common, &nm->frac, rate))
+   if (nm->common.features & CCU_FEATURE_FIXED_POSTDIV)
+   rate *= nm->fixed_post_div;
+
+   if (ccu_frac_helper_has_rate(&nm->common, &nm->frac, rate)) {
+   if (nm->common.features & CCU_FEATURE_FIXED_POSTDIV)
+   rate /= nm->fixed_post_div;
return rate;
+   }
 
-   if (ccu_sdm_helper_has_rate(&nm->common, &nm->sdm, rate))
+   if (ccu_sdm_helper_has_rate(&nm->common, &nm->sdm, rate)) {
+   if (nm->common.features & CCU_FEATURE_FIXED_POSTDIV)
+   rate /= nm->fixed_post_div;
return rate;
+   }
 
_nm.min_n = nm->n.min ?: 1;
_nm.max_n = nm->n.max ?: 1 << nm->n.width;
@@ -119,8 +135,12 @@ static long ccu_nm_round_rate(struct clk_hw *hw, unsigned 
long rate,
_nm.max_m = nm->m.max ?: 1 << nm->m.width;
 
ccu_nm_find_best(*parent_rate, rate, &_nm);
+   rate = *parent_rate * _nm.n / _nm.m;
+
+   if (nm->common.features & CCU_FEATURE_FIXED_POSTDIV)
+   rate /= nm->fixed_post_div;
 
-   return *parent_rate * _nm.n / _nm.m;
+   return rate;
 }
 
 static int ccu_nm_set_rate(struct clk_hw *hw, unsigned long rate,
@@ -131,6 +151,10 @@ static int ccu_nm_set_rate(struct clk_hw *hw, unsigned 
long rate,
unsigned long flags;
u32 reg;
 
+   /* Adjust target rate according to post-dividers */
+   if (nm->common.features & CCU_FEATURE_FIXED_POSTDIV)
+   rate = rate * nm->fixed_post_div;
+
if (ccu_frac_helper_has_rate(&nm->common, &nm->frac, rate)) {
spin_lock_irqsave(nm->common.lock, flags);
 
diff --git a/drivers/clk/sunxi-ng/ccu_nm.h b/drivers/clk/sunxi-ng/ccu_nm.h
index c623b0c7a23c..eba586b4c7d0 100644
--- a/drivers/clk/sunxi-ng/ccu_nm.h
+++ b/drivers/clk/sunxi-ng/ccu_nm.h
@@ -36,6 +36,8 @@ struct ccu_nm {
struct ccu_frac_internalfrac;
struct ccu_sdm_internal sdm;
 
+   unsigned intfixed_post_div;
+
struct ccu_common   common;
 };
 
-- 
2.15.0
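
The rate arithmetic this patch introduces can be sketched outside the
kernel roughly as follows. This is a simplified model, not the driver
code: struct nm_clk and both function names are hypothetical stand-ins,
and the fractional and sigma-delta paths are left out.

```c
#include <stdbool.h>

/* Simplified stand-in for struct ccu_nm; field names are illustrative. */
struct nm_clk {
	unsigned long n;
	unsigned long m;
	bool has_fixed_post_div;     /* models CCU_FEATURE_FIXED_POSTDIV */
	unsigned int fixed_post_div;
};

/* Output rate: parent * n / m, then the fixed post-divider if present. */
static unsigned long nm_recalc_rate(const struct nm_clk *clk,
				    unsigned long parent_rate)
{
	unsigned long rate = parent_rate * clk->n / clk->m;

	if (clk->has_fixed_post_div)
		rate /= clk->fixed_post_div;

	return rate;
}

/* Rate the N/M stage must produce so that the output reaches 'rate'. */
static unsigned long nm_pre_div_target(const struct nm_clk *clk,
				       unsigned long rate)
{
	if (clk->has_fixed_post_div)
		rate *= clk->fixed_post_div;

	return rate;
}
```

The second helper corresponds to the adjustment made at the top of the
round_rate and set_rate paths: the requested rate is scaled up before
the N/M factors are searched, and scaled back down before it is
reported.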



[PATCH 0/3] clk: sunxi-ng: sun8i: a83t: Use sigma-delta modulation for audio PLL

2017-12-08 Thread Chen-Yu Tsai
Hi,

This series follows previous improvements for the other Allwinner SoCs to
improve audio quality, in particular the speed and pitch of audio playback.
The audio PLLs in Allwinner SoCs cannot produce the correct frequency to
match the audio sample rate families through integer factors. As such
the audio is either too fast or too slow.

This is dealt with by using sigma-delta modulation, a form of fractional-N
synthesis, on the divider. The parameters used are copied from the BSP
kernel.

As these parameters assume that one of the untracked dividers is set to
2, we first add a fixed /2 post divider to the PLL, and force the
untracked divider to /2. Then we introduce the sigma-delta modulation
parameters.

This has been tested with SPDIF playback on the Cubietruck Plus, and an
external PCM5122 DAC from a PiFi DAC 2.0+, connected via I2S to the
Banana Pi M3. The I2C and I2S support used in this latter test will be
sent as a separate series later.

Please have a look.

Regards
ChenYu

Chen-Yu Tsai (3):
  clk: sunxi-ng: Support fixed post-dividers on NM style clocks
  clk: sunxi-ng: sun8i: a83t: Add /2 fixed post divider to audio PLL
  clk: sunxi-ng: sun8i: a83t: Use sigma-delta modulation for audio PLL

 drivers/clk/sunxi-ng/ccu-sun8i-a83t.c | 18 ++---
 drivers/clk/sunxi-ng/ccu_nm.c | 50 ++-
 drivers/clk/sunxi-ng/ccu_nm.h |  2 ++
 3 files changed, 54 insertions(+), 16 deletions(-)

-- 
2.15.0
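
To illustrate the claim that integer factors alone cannot hit an audio
rate, here is a small brute-force sketch. The assumptions are mine, not
the series': a 24 MHz parent (osc24M), N up to 256, M up to 64, a /2
post divider, and a target of 22579200 Hz (44.1 kHz x 512). The function
name is hypothetical.

```c
#include <stdint.h>

/* Smallest absolute error in Hz reachable with integer n and m factors. */
static uint64_t best_integer_error(uint64_t parent, uint64_t target,
				   unsigned int max_n, unsigned int max_m,
				   unsigned int post_div)
{
	uint64_t best = UINT64_MAX;

	for (unsigned int n = 1; n <= max_n; n++) {
		for (unsigned int m = 1; m <= max_m; m++) {
			uint64_t rate = parent * n / m / post_div;
			uint64_t err = rate > target ? rate - target
						     : target - rate;

			if (err < best)
				best = err;
		}
	}

	return best;
}
```

Under these assumptions the search never reaches an error of zero for
the 22579200 Hz target, which is the gap that fractional-N techniques
such as sigma-delta modulation are meant to close.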



Re: [PATCH v2 1/3] x86/boot: add acpi rsdp address to setup_header

2017-12-08 Thread Juergen Gross
On 08/12/17 08:16, Ingo Molnar wrote:
> 
> * Juergen Gross  wrote:
> 
>> Xen PVH guests receive the address of the RSDP table from Xen. In order
>> to support booting a Xen PVH guest via grub2 using the standard x86
>> boot entry we need a way fro grub2 to pass the RSDP address to the
>> kernel.
>>
>> For this purpose expand the struct setup_header to hold the physical
>> address of the RSDP address. Being zero means it isn't specified and
>> has to be located the legacy way (searching through low memory or
>> EBDA).
> 
> s/fro
>  /for
> 
> pedantry:
> 
> s/grub2
>  /Grub2

Okay.

> 
>> Signed-off-by: Juergen Gross 
>> Reviewed-by: Roger Pau Monné 
>> ---
>>  Documentation/x86/boot.txt| 19 +++
>>  arch/x86/boot/header.S|  6 +-
>>  arch/x86/include/uapi/asm/bootparam.h |  1 +
>>  3 files changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
>> index 5e9b826b5f62..a33c224797e4 100644
>> --- a/Documentation/x86/boot.txt
>> +++ b/Documentation/x86/boot.txt
>> @@ -61,6 +61,13 @@ Protocol 2.12:(Kernel 3.8) Added the xloadflags field 
>> and extension fields
>>  to struct boot_params for loading bzImage and ramdisk
>>  above 4G in 64bit.
>>  
>> +Protocol 2.13:  (Kernel 3.14) Support 32- and 64-bit flags being set in
>> +xloadflags to support booting a 64 bit kernel from 32 bit
>> +EFI
> 
> The changelog should I think declare that we add documentation for the 2.13 
> protocol iteration as well.
> 
> Also, please use a consistent spelling of '32-bit' and '64-bit' in the same 
> sentence!

Okay.

> 
>> +Field name: acpi_rsdp_addr
>> +Type:   write
>> +Offset/size:0x268/8
>> +Protocol:   2.14+
>> +
>> +  This field can be set by the boot loader to tell the kernel the
>> +  physical address of the ACPI RSDP table.
>> +
>> +  A value of 0 indicates the kernel should fall back to the standard
>> +  methods to locate the RSDP (search in EBDA/low memory).
> 
> That's not the only method used: the ACPI RSDP address can also be discovered 
> via 
> efi.rsdp20 and efi.rsdp, both of which appear to be 32-bit values.

Sure, but this is valid for booting via EFI only.

> 
>>   THE IMAGE CHECKSUM
>>  
>> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
>> index 850b8762e889..e7184127f309 100644
>> --- a/arch/x86/boot/header.S
>> +++ b/arch/x86/boot/header.S
>> @@ -300,7 +300,7 @@ _start:
>>  # Part 2 of the header, from the old setup.S
>>  
>>  .ascii  "HdrS"  # header signature
>> -.word   0x020d  # header version number (>= 0x0105)
>> +.word   0x020e  # header version number (>= 0x0105)
>>  # or else old loadlin-1.5 will fail)
>>  .globl realmode_swtch
>>  realmode_swtch: .word   0, 0# default_switch, SETUPSEG
>> @@ -558,6 +558,10 @@ pref_address:   .quad LOAD_PHYSICAL_ADDR
>> # preferred load addr
>>  init_size:  .long INIT_SIZE # kernel initialization size
>>  handover_offset:.long 0 # Filled in by build.c
>>  
>> +acpi_rsdp_addr: .quad 0 # 64-bit physical 
>> pointer to
>> +# ACPI RSDP table, added with
>> +# version 2.14
> 
> s/pointer to ACPI RSDP table
>  /pointer to the ACPI RSDP table

Okay.

> 
> Also, a more fundamental question: why doesn't Xen use EFI to hand over 
> hardware 
> configuration details?

I think Jan has answered this question quite well.


Juergen
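
As a rough illustration of the documented semantics of acpi_rsdp_addr
(non-zero means "use this address", zero means "fall back"), a minimal
sketch might look like this. The function names and the stubbed legacy
scan are hypothetical; the real kernel lookup is more involved.

```c
#include <stdint.h>

typedef uint64_t (*rsdp_scan_fn)(void);

/* Hypothetical stand-in for the legacy EBDA/low-memory search. */
static uint64_t fake_legacy_scan(void)
{
	return 0xE0000; /* arbitrary illustrative address */
}

/*
 * Prefer the address supplied by the boot loader; a value of 0 means it
 * was not specified, so fall back to the legacy search.
 */
static uint64_t pick_rsdp_address(uint64_t acpi_rsdp_addr,
				  rsdp_scan_fn legacy_scan)
{
	if (acpi_rsdp_addr != 0)
		return acpi_rsdp_addr;

	return legacy_scan();
}
```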


[PATCH RFC 1/7] kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR

2017-12-08 Thread Quan Xu
From: Ben Luo 

The guest enables the pv timer functionality using this MSR.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Signed-off-by: Ben Luo 
---
 arch/x86/include/asm/kvm_host.h  |5 +
 arch/x86/include/uapi/asm/kvm_para.h |6 ++
 arch/x86/kvm/lapic.c |   22 ++
 arch/x86/kvm/lapic.h |6 ++
 arch/x86/kvm/x86.c   |8 
 5 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493..641b4aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -684,6 +684,11 @@ struct kvm_vcpu_arch {
bool pv_unhalted;
} pv;
 
+   struct {
+   u64 msr_val;
+   struct gfn_to_hva_cache data;
+   } pv_timer;
+
int pending_ioapic_eoi;
int pending_external_vector;
 
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 554aa8f..3dd6116 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -41,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_PV_TIMER_EN0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
@@ -64,6 +65,11 @@ struct kvm_clock_pairing {
 #define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
 #define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
 
+struct pvtimer_vcpu_event_info {
+   __u64 expire_tsc;
+   __u64 next_sync_tsc;
+} __attribute__((__packed__));
+
 #define KVM_MAX_MMU_OP_BATCH   32
 
 #define KVM_ASYNC_PF_ENABLED   (1 << 0)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 36c90d6..55c9ba3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1991,6 +1991,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool 
init_event)
kvm_lapic_set_base(vcpu,
vcpu->arch.apic_base | MSR_IA32_APICBASE_BSP);
vcpu->arch.pv_eoi.msr_val = 0;
+   vcpu->arch.pv_timer.msr_val = 0;
apic_update_ppr(apic);
if (vcpu->arch.apicv_active) {
kvm_x86_ops->apicv_post_state_restore(vcpu);
@@ -2478,6 +2479,27 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 
data)
 addr, sizeof(u8));
 }
 
+int kvm_lapic_enable_pv_timer(struct kvm_vcpu *vcpu, u64 data)
+{
+   u64 addr = data & ~KVM_MSR_ENABLED;
+   int ret;
+
+   if (!lapic_in_kernel(vcpu))
+   return 1;
+
+   if (!IS_ALIGNED(addr, 4))
+   return 1;
+
+   vcpu->arch.pv_timer.msr_val = data;
+   if (!pv_timer_enabled(vcpu))
+   return 0;
+
+   ret = kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_timer.data,
+   addr, sizeof(struct pvtimer_vcpu_event_info));
+
+   return ret;
+}
+
 void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4b9935a..539a738 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -113,6 +113,7 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct 
kvm_vcpu *vcpu)
 }
 
 int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
+int kvm_lapic_enable_pv_timer(struct kvm_vcpu *vcpu, u64 data);
 void kvm_lapic_init(void);
 void kvm_lapic_exit(void);
 
@@ -207,6 +208,11 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu 
*vcpu)
return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, 
&vcpu->arch.apic->pending_events);
 }
 
+static inline bool pv_timer_enabled(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.pv_timer.msr_val & KVM_MSR_ENABLED;
+}
+
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb..5668774 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1025,6 +1025,7 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
HV_X64_MSR_STIMER0_CONFIG,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_KVM_PV_EOI_EN,
+   MSR_KVM_PV_TIMER_EN,
 
MSR_IA32_TSC_ADJUST,
MSR_IA32_TSCDEADLINE,
@@ -2279,6 +2280,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
if (kvm_lapic_enable_pv_eoi(vcpu, data))
return 1;
break;
+   case MSR_KVM_PV_TIMER_EN:
+   if (kvm_lapic_enable_pv_timer(vcpu, data))
+   return 1;
+   break;
 
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
@@ -2510,6 +2515,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_KVM_PV_E

[PATCH RFC 0/7] kvm pvtimer

2017-12-08 Thread Quan Xu
From: Ben Luo 

This patchset introduces a new paravirtualized mechanism to reduce the
VM-exits caused by guest timer accesses.

In general, a KVM guest programs the tsc-deadline timestamp into the
MSR_IA32_TSC_DEADLINE MSR. This causes a VM-exit, and KVM then handles the
timer for the guest.

Also, KVM always registers the timer on the CPU the vCPU was running on. When
the vCPU thread is rescheduled to another CPU, the timer is migrated to the
target CPU as well. When the timer expires, the timer interrupt can force a
guest-mode vCPU on that CPU to VM-exit.

When pvtimer is enabled:

   - The tsc-deadline timestamp is mostly recorded in a shared page, with fewer
     VM-exits. We introduce a periodically running kthread that scans the
     shared page and synchronizes the timer setting for the guest on a
     dedicated CPU.

   - Since the worker kthread scans periodically, some timer events may be
     lost or delayed. Such tsc-deadline timestamps still have to be programmed
     to MSR_IA32_TSC_DEADLINE as normal, which causes a VM-exit; KVM then
     signals the worker thread through an IPI to program the timer, instead of
     registering it on the current CPU.

   - The timer interrupt is delivered to vCPUs by the posted-interrupt
     mechanism, with fewer VM-exits.

Ben Luo (7):
  kvm: x86: emulate MSR_KVM_PV_TIMER_EN MSR
  kvm: x86: add a function to exchange value
  KVM: timer: synchronize tsc-deadline timestamp for guest
  KVM: timer: program timer to a dedicated CPU
  KVM: timer: ignore timer migration if pvtimer is enabled
  Doc/KVM: introduce a new cpuid bit for kvm pvtimer
  kvm: guest: reprogram guest timer

 Documentation/virtual/kvm/cpuid.txt  |4 +
 arch/x86/include/asm/kvm_host.h  |5 +
 arch/x86/include/asm/kvm_para.h  |9 ++
 arch/x86/include/uapi/asm/kvm_para.h |7 ++
 arch/x86/kernel/apic/apic.c  |9 +-
 arch/x86/kernel/kvm.c|   38 
 arch/x86/kvm/cpuid.c |1 +
 arch/x86/kvm/lapic.c |  170 +-
 arch/x86/kvm/lapic.h |   11 ++
 arch/x86/kvm/x86.c   |   15 +++-
 include/linux/kvm_host.h |3 +
 virt/kvm/kvm_main.c  |   42 +
 12 files changed, 308 insertions(+), 6 deletions(-)
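
Patch 1/7 of this series packs the shared-page address and an enable
flag into one MSR value. A minimal sketch of how such a value can be
decoded is shown below. KVM_MSR_ENABLED is taken to be bit 0, as in the
existing KVM paravirt MSRs, and the 4-byte alignment check follows the
patch; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_MSR_ENABLED 1ULL /* bit 0 carries the enable flag */

/* True if the guest set the enable bit in the MSR value. */
static bool pv_timer_msr_enabled(uint64_t msr_val)
{
	return (msr_val & KVM_MSR_ENABLED) != 0;
}

/* Guest physical address of the shared area, with the flag masked off. */
static uint64_t pv_timer_msr_addr(uint64_t msr_val)
{
	return msr_val & ~KVM_MSR_ENABLED;
}

/* The patch rejects addresses that are not 4-byte aligned. */
static bool pv_timer_msr_valid(uint64_t msr_val)
{
	return (pv_timer_msr_addr(msr_val) & 3) == 0;
}
```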



Re: [PATCH v2 3/3] x86/xen: supply rsdp address in boot params for pvh guests

2017-12-08 Thread Juergen Gross
On 08/12/17 08:22, Ingo Molnar wrote:
> 
> * Juergen Gross  wrote:
> 
>> When booted via the special PVH entry save the RSDP address set in the
>> boot information block in struct boot_params. This will enable Xen to
>> locate the RSDP at an arbitrary address.
>>
>> Set the boot loader version to 2.14 (0x020e) replacing the wrong 0x0212
>> which should have been 0x020c.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>> V2: set bootloader version to 2.14 (Roger Pau Monné)
>> ---
>>  arch/x86/xen/enlighten_pvh.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
>> index 436c4f003e17..036e3a5f284a 100644
>> --- a/arch/x86/xen/enlighten_pvh.c
>> +++ b/arch/x86/xen/enlighten_pvh.c
>> @@ -68,9 +68,12 @@ static void __init init_pvh_bootparams(void)
>>   *
>>   * Version 2.12 supports Xen entry point but we will use default x86/PC
>>   * environment (i.e. hardware_subarch 0).
>> + * The RSDP address is available from version 2.14 on.
>>   */
>> -pvh_bootparams.hdr.version = 0x212;
>> +pvh_bootparams.hdr.version = 0x20e;
> 
> While 0x212 was "obvious" to read but totally wrong, it would be less fragile 
> and 
> more readable if the version was generated as something like:
> 
>   pvh_bootparams.hdr.version = (2 << 8) | 14;

Sure, I can make that change.

> 
> similar to how it's written in other cases:
> 
>>  pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */
> 
> Also, shouldn't the 0x212 fix be a separate patch, Cc: stable? The bug 
> appears to 
> have been introduced at around v4.12.

While not really being very important, this seems to be cleaner, yes.
After all this value is visible in sysfs, so it should be correct.


Juergen
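
The encoding discussed above, with the major number in the high byte and
the minor number in the low byte, can be sketched as a small macro. The
macro name is hypothetical and is not something the kernel defines.

```c
#include <stdint.h>

/* Boot protocol version: major in the high byte, minor in the low byte. */
#define BOOT_PROTOCOL_VERSION(major, minor) \
	((uint16_t)(((major) << 8) | (minor)))
```

Written this way, version 2.14 comes out as 0x020e and 2.12 as 0x020c,
which makes it harder to repeat the 0x212 mistake of reading the decimal
minor number as hex digits.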


[PATCH RFC 2/7] kvm: x86: add a function to exchange value

2017-12-08 Thread Quan Xu
From: Ben Luo 

Introduce kvm_xchg_guest_cached() to atomically exchange a value
with a guest page.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Signed-off-by: Ben Luo 
---
 include/linux/kvm_host.h |3 +++
 virt/kvm/kvm_main.c  |   42 ++
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6882538..32949ed 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -688,6 +688,9 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct 
gfn_to_hva_cache *ghc,
   void *data, int offset, unsigned long len);
 int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
  gpa_t gpa, unsigned long len);
+unsigned long kvm_xchg_guest_cached(struct kvm *kvm,
+   struct gfn_to_hva_cache *ghc, unsigned long offset,
+   unsigned long new, int size);
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9deb5a2..3149e17 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2010,6 +2010,48 @@ int kvm_read_guest_cached(struct kvm *kvm, struct 
gfn_to_hva_cache *ghc,
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
 
+unsigned long kvm_xchg_guest_cached(struct kvm *kvm,
+   struct gfn_to_hva_cache *ghc, unsigned long offset,
+   unsigned long new, int size)
+{
+   unsigned long r;
+   void *kva;
+   struct page *page;
+   kvm_pfn_t pfn;
+
+   WARN_ON(offset > ghc->len);
+
+   pfn = gfn_to_pfn_atomic(kvm, ghc->gpa >> PAGE_SHIFT);
+   page = kvm_pfn_to_page(pfn);
+
+   if (is_error_page(page))
+   return -EFAULT;
+
+   kva = kmap_atomic(page) + offset_in_page(ghc->gpa) + offset;
+   switch (size) {
+   case 1:
+   r = xchg((char *)kva, new);
+   break;
+   case 2:
+   r = xchg((short *)kva, new);
+   break;
+   case 4:
+   r = xchg((int *)kva, new);
+   break;
+   case 8:
+   r = xchg((long *)kva, new);
+   break;
+   default:
+   kunmap_atomic(kva);
+   return -EFAULT;
+   }
+
+   kunmap_atomic(kva);
+   mark_page_dirty_in_slot(ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+
+   return r;
+}
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
-- 
1.7.1



[PATCH RFC 5/7] KVM: timer: ignore timer migration if pvtimer is enabled

2017-12-08 Thread Quan Xu
From: Ben Luo 

When pvtimer is enabled, KVM programs the timer on a dedicated CPU
through an IPI. Whether the vCPU is on the dedicated CPU or any
other CPU, the timer interrupt will be delivered properly, so
there is no need to migrate the timer.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Signed-off-by: Ben Luo 
---
 arch/x86/kvm/lapic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5835a27..265efe6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2282,7 +2282,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 {
struct hrtimer *timer;
 
-   if (!lapic_in_kernel(vcpu))
+   if (!lapic_in_kernel(vcpu) || pv_timer_enabled(vcpu))
return;
 
timer = &vcpu->arch.apic->lapic_timer.timer;
-- 
1.7.1



[PATCH RFC 3/7] KVM: timer: synchronize tsc-deadline timestamp for guest

2017-12-08 Thread Quan Xu
From: Ben Luo 

In general, a KVM guest programs the tsc-deadline timestamp into
the MSR_IA32_TSC_DEADLINE MSR. This causes a VM-exit, and KVM
then handles the timer for the guest.

The tsc-deadline timestamp is mostly recorded in a shared page,
with fewer VM-exits. We introduce a periodically running kthread
that scans the shared page and synchronizes the timer setting for
the guest on a dedicated CPU.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Signed-off-by: Ben Luo 
---
 arch/x86/kvm/lapic.c |  138 ++
 arch/x86/kvm/lapic.h |5 ++
 2 files changed, 143 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 55c9ba3..20a23bb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -36,6 +36,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include "kvm_cache_regs.h"
 #include "irq.h"
 #include "trace.h"
@@ -70,6 +74,12 @@
 #define APIC_BROADCAST 0xFF
 #define X2APIC_BROADCAST   0xFFFFFFFFul
 
+static struct hrtimer pv_sync_timer;
+static long pv_timer_period_ns = PVTIMER_PERIOD_NS;
+static struct task_struct *pv_timer_polling_worker;
+
+module_param(pv_timer_period_ns, long, 0644);
+
 static inline int apic_test_vector(int vec, void *bitmap)
 {
return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
@@ -2542,8 +2552,130 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
}
 }
 
+static enum hrtimer_restart pv_sync_timer_callback(struct hrtimer *timer)
+{
+   hrtimer_forward_now(timer, ns_to_ktime(pv_timer_period_ns));
+   wake_up_process(pv_timer_polling_worker);
+
+   return HRTIMER_RESTART;
+}
+
+void kvm_apic_sync_pv_timer(void *data)
+{
+   struct kvm_vcpu *vcpu = data;
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   unsigned long flags, this_tsc_khz = vcpu->arch.virtual_tsc_khz;
+   u64 guest_tsc, expire_tsc;
+   long rem_tsc;
+
+   if (!lapic_in_kernel(vcpu) || !pv_timer_enabled(vcpu))
+   return;
+
+   local_irq_save(flags);
+   guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+   rem_tsc = ktime_to_ns(hrtimer_get_remaining(&pv_sync_timer))
+   * this_tsc_khz;
+   if (rem_tsc <= 0)
+   rem_tsc += pv_timer_period_ns * this_tsc_khz;
+   do_div(rem_tsc, 100L);
+
+   /*
+* make sure guest_tsc and rem_tsc are assigned before to update
+* next_sync_tsc.
+*/
+   smp_wmb();
+   kvm_xchg_guest_cached(vcpu->kvm, &vcpu->arch.pv_timer.data,
+   offsetof(struct pvtimer_vcpu_event_info, next_sync_tsc),
+   guest_tsc + rem_tsc, 8);
+
+   /* make sure next_sync_tsc is visible */
+   smp_wmb();
+
+   expire_tsc = kvm_xchg_guest_cached(vcpu->kvm, &vcpu->arch.pv_timer.data,
+   offsetof(struct pvtimer_vcpu_event_info, expire_tsc),
+   0UL, 8);
+
+   /* make sure expire_tsc is visible */
+   smp_wmb();
+
+   if (expire_tsc) {
+   if (expire_tsc > guest_tsc)
+   /*
+* As we bind this thread to a dedicated CPU through
+* IPI, the timer is registered on that dedicated
+* CPU here.
+*/
+   kvm_set_lapic_tscdeadline_msr(apic->vcpu, expire_tsc);
+   else
+   /* deliver immediately if expired */
+   kvm_apic_local_deliver(apic, APIC_LVTT);
+   }
+   local_irq_restore(flags);
+}
+
+static int pv_timer_polling(void *arg)
+{
+   struct kvm *kvm;
+   struct kvm_vcpu *vcpu;
+   int i;
+   mm_segment_t oldfs = get_fs();
+
+   while (1) {
+   set_current_state(TASK_INTERRUPTIBLE);
+
+   if (kthread_should_stop()) {
+   __set_current_state(TASK_RUNNING);
+   break;
+   }
+
+   spin_lock(&kvm_lock);
+   __set_current_state(TASK_RUNNING);
+   list_for_each_entry(kvm, &vm_list, vm_list) {
+   set_fs(USER_DS);
+   use_mm(kvm->mm);
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   kvm_apic_sync_pv_timer(vcpu);
+   }
+   unuse_mm(kvm->mm);
+   set_fs(oldfs);
+   }
+
+   spin_unlock(&kvm_lock);
+
+   schedule();
+   }
+
+   return 0;
+}
+
+static void kvm_pv_timer_init(void)
+{
+   ktime_t ktime = ktime_set(0, pv_timer_period_ns);
+
+   hrtimer_init(&pv_sync_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+   pv_sync_timer.function = &pv_sync_timer_callback;
+
+   /* kthread for pv_timer sync buffer */
+   pv_timer_polling_worker = kthread_create(pv_timer_polling, NULL,
+   "pv_timer_polling

[PATCH RFC 7/7] kvm: guest: reprogram guest timer

2017-12-08 Thread Quan Xu
From: Ben Luo 

In general, a KVM guest programs the tsc-deadline timestamp into
the MSR_IA32_TSC_DEADLINE MSR.

When pvtimer is enabled, we introduce a new mechanism to
reprogram the KVM guest timer. A periodically running kthread
scans the shared page and synchronizes the timer setting for the
guest on a dedicated CPU. The next timer event of this kthread
serves as a threshold for deciding whether to program the
tsc-deadline timestamp into the MSR_IA32_TSC_DEADLINE MSR or to
write it to the shared page.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Signed-off-by: Ben Luo 
---
 arch/x86/include/asm/kvm_para.h |9 +
 arch/x86/kernel/apic/apic.c |9 ++---
 arch/x86/kernel/kvm.c   |   38 ++
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index c373e44..109e706 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
@@ -92,6 +93,8 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned 
long p1,
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
+int kvm_pv_timer_next_event(unsigned long tsc,
+   struct clock_event_device *evt);
 extern void kvm_disable_steal_time(void);
 
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
@@ -126,6 +129,12 @@ static inline void kvm_disable_steal_time(void)
 {
return;
 }
+
+static inline int kvm_pv_timer_next_event(unsigned long tsc,
+   struct clock_event_device *evt)
+{
+   return 0;
+}
 #endif
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ff89177..286c1b3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -471,10 +471,13 @@ static int lapic_next_event(unsigned long delta,
 static int lapic_next_deadline(unsigned long delta,
   struct clock_event_device *evt)
 {
-   u64 tsc;
+   u64 tsc = rdtsc() + (((u64) delta) * TSC_DIVISOR);
 
-   tsc = rdtsc();
-   wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR));
+   /* TODO: undisciplined function call */
+   if (kvm_pv_timer_next_event(tsc, evt))
+   return 0;
+
+   wrmsrl(MSR_IA32_TSC_DEADLINE, tsc);
return 0;
 }
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8bb9594..ec7aff1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -328,6 +328,35 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 
val)
apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
+static DEFINE_PER_CPU(int, pvtimer_enabled);
+static DEFINE_PER_CPU(struct pvtimer_vcpu_event_info,
+ pvtimer_shared_buf) = {0};
+#define PVTIMER_PADDING25000
+int kvm_pv_timer_next_event(unsigned long tsc,
+   struct clock_event_device *evt)
+{
+   struct pvtimer_vcpu_event_info *src;
+   u64 now;
+
+   if (!this_cpu_read(pvtimer_enabled))
+   return false;
+
+   src = this_cpu_ptr(&pvtimer_shared_buf);
+   xchg((u64 *)&src->expire_tsc, tsc);
+
+   barrier();
+
+   if (tsc < src->next_sync_tsc)
+   return false;
+
+   rdtscll(now);
+   if (tsc < now || tsc - now < PVTIMER_PADDING)
+   return false;
+
+   return true;
+}
+EXPORT_SYMBOL_GPL(kvm_pv_timer_next_event);
+
 static void kvm_guest_cpu_init(void)
 {
if (!kvm_para_available())
@@ -362,6 +391,15 @@ static void kvm_guest_cpu_init(void)
 
if (has_steal_clock)
kvm_register_steal_time();
+
+   if (kvm_para_has_feature(KVM_FEATURE_PV_TIMER)) {
+   unsigned long data;
+
+   data  = slow_virt_to_phys(this_cpu_ptr(&pvtimer_shared_buf))
+ | KVM_MSR_ENABLED;
+   wrmsrl(MSR_KVM_PV_TIMER_EN, data);
+   this_cpu_write(pvtimer_enabled, 1);
+   }
 }
 
 static void kvm_pv_disable_apf(void)
-- 
1.7.1



[PATCH RFC 6/7] Doc/KVM: introduce a new cpuid bit for kvm pvtimer

2017-12-08 Thread Quan Xu
From: Ben Luo 

KVM_FEATURE_PV_TIMER lets the guest check whether pvtimer
can be enabled.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Signed-off-by: Ben Luo 
---
 Documentation/virtual/kvm/cpuid.txt  |4 
 arch/x86/include/uapi/asm/kvm_para.h |1 +
 arch/x86/kvm/cpuid.c |1 +
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..b26b31c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,10 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks 
this feature bit
||   || before enabling paravirtualized
||   || spinlock support.
 --
+KVM_FEATURE_PV_TIMER   || 8 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || timer support.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 3dd6116..46734a8 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_PV_TIMER   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..e02fd23 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -593,6 +593,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 (1 << KVM_FEATURE_CLOCKSOURCE2) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_PV_EOI) |
+(1 << KVM_FEATURE_PV_TIMER) |
 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
 (1 << KVM_FEATURE_PV_UNHALT);
 
-- 
1.7.1



[PATCH RFC 4/7] KVM: timer: program timer to a dedicated CPU

2017-12-08 Thread Quan Xu
From: Ben Luo 

KVM always registers the timer on the CPU the vCPU was running on.
When the vCPU thread is rescheduled to another CPU, the timer is
migrated to the target CPU as well. When the timer expires, the
timer interrupt can force a guest-mode vCPU on that CPU to VM-exit.

Since the worker kthread scans periodically, some timer events may
be lost or delayed. Such tsc-deadline timestamps still have to be
programmed to MSR_IA32_TSC_DEADLINE as normal, which causes a
VM-exit; KVM then signals the worker thread through an IPI to
program the timer, instead of registering it on the current CPU.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Signed-off-by: Ben Luo 
---
 arch/x86/kvm/lapic.c |8 +++-
 arch/x86/kvm/x86.c   |7 ++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 20a23bb..5835a27 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2072,7 +2072,13 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer 
*data)
struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, 
lapic_timer);
 
-   apic_timer_expired(apic);
+
+   if (pv_timer_enabled(apic->vcpu)) {
+   kvm_apic_local_deliver(apic, APIC_LVTT);
+   if (apic_lvtt_tscdeadline(apic))
+   apic->lapic_timer.tscdeadline = 0;
+   } else
+   apic_timer_expired(apic);
 
if (lapic_is_periodic(apic)) {
advance_periodic_target_expiration(apic);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5668774..3cbb223 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -26,6 +26,7 @@
 #include "tss.h"
 #include "kvm_cache_regs.h"
 #include "x86.h"
+#include "lapic.h"
 #include "cpuid.h"
 #include "pmu.h"
 #include "hyperv.h"
@@ -2196,7 +2197,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
return kvm_x2apic_msr_write(vcpu, msr, data);
case MSR_IA32_TSCDEADLINE:
-   kvm_set_lapic_tscdeadline_msr(vcpu, data);
+   if (pv_timer_enabled(vcpu))
+   smp_call_function_single(PVTIMER_SYNC_CPU,
+   kvm_apic_sync_pv_timer, vcpu, 0);
+   else
+   kvm_set_lapic_tscdeadline_msr(vcpu, data);
break;
case MSR_IA32_TSC_ADJUST:
if (guest_cpuid_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
-- 
1.7.1



Re: WARNING in x86_emulate_insn

2017-12-08 Thread Ingo Molnar

* Tianyu Lan  wrote:

> Hi Jim&Wanpeng:
>  Thanks for your help.
> 
> 2017-12-08 5:25 GMT+08:00 Jim Mattson :
> > Try disabling the module parameter, "unrestricted_guest." Make sure
> > that the module parameter, "emulate_invalid_guest_state" is enabled.
> > This combination allows userspace to feed invalid guest state into the
> > in-kernel emulator.
> 
> Yes, you are right. I need to disable unrestricted_guest to reproduce the 
> issue.
> 
> I find this is pop instruction emulation issue. According "SDM VOL2,
> chapter INSTRUCTION
> SET REFERENCE. POP—Pop a Value from the Stack"
> 
> Protected Mode Exceptions
> #GP(0) If attempt is made to load SS register with NULL segment selector.
> 
> This test case hits it but current code doesn't check such case.
> The following patch can fix the issue.
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index abe74f7..e2ac5cc 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1844,6 +1844,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
> int rc;
> struct segmented_address addr;
> 
> +   if ( !get_segment_selector(ctxt, VCPU_SREG_SS))
> +   return emulate_gp(ctxt, 0);
> +
> addr.ea = reg_read(ctxt, VCPU_REGS_RSP) & stack_mask(ctxt);
> addr.seg = VCPU_SREG_SS;
> rc = segmented_read(ctxt, addr, dest, len);

s/if ( !get_segment_selector
 /if (!get_segment_selector

I think it would also be nice to convert the syzkaller testcase to a new KVM
unit test:

  git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git

There's a test_pop() function in kvm-unit-tests/x86/emulator.c.

Thanks,

Ingo


[tip:sched/core] sched/fair: Remove unused 'curr' parameter from wakeup_gran

2017-12-08 Thread tip-bot for Cheng Jian
Commit-ID:  a555e9d86ee384d9d3cb3310a57aed33f7e053d4
Gitweb: https://git.kernel.org/tip/a555e9d86ee384d9d3cb3310a57aed33f7e053d4
Author: Cheng Jian 
AuthorDate: Thu, 7 Dec 2017 21:30:43 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 8 Dec 2017 07:51:53 +0100

sched/fair: Remove unused 'curr' parameter from wakeup_gran

The first parameter of wakeup_gran(), 'curr', is unnecessary now.

Signed-off-by: Cheng Jian 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: huawei.li...@huawei.com
Cc: xiexi...@huawei.com
Link: 
http://lkml.kernel.org/r/1512653443-179848-1-git-send-email-cj.chengj...@huawei.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2fe3aa8..2915c0d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6449,8 +6449,7 @@ static void task_dead_fair(struct task_struct *p)
 }
 #endif /* CONFIG_SMP */
 
-static unsigned long
-wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
+static unsigned long wakeup_gran(struct sched_entity *se)
 {
unsigned long gran = sysctl_sched_wakeup_granularity;
 
@@ -6492,7 +6491,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct 
sched_entity *se)
if (vdiff <= 0)
return -1;
 
-   gran = wakeup_gran(curr, se);
+   gran = wakeup_gran(se);
if (vdiff > gran)
return 1;
 


Re: kernel BUG at net/core/skbuff.c:LINE! (2)

2017-12-08 Thread Xin Long
On Fri, Dec 8, 2017 at 4:16 PM, syzbot wrote:
> syzkaller has found reproducer for the following crash on
> 82bcf1def3b5f1251177ad47c44f7e17af039b4b
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
>
>
> skbuff: skb_over_panic: text:10b86b8d len:196 put:20
> head:3b477e60 data:0e85441e tail:0xd4 end:0xc0 dev:lo
> [ cut here ]
> kernel BUG at net/core/skbuff.c:104!
> invalid opcode:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
> RSP: 0018:8801db307508 EFLAGS: 00010286
> RAX: 0082 RBX: 8801c517e840 RCX: 
> RDX: 0082 RSI: 11003b660e61 RDI: ed003b660e95
> RBP: 8801db307570 R08: 11003b660e23 R09: 
> R10:  R11:  R12: 85bd4020
> R13: 84754ed2 R14: 0014 R15: 8801c4e26540
> FS:  () GS:8801db30() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 00463610 CR3: 0001c6698000 CR4: 001406e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  skb_over_panic net/core/skbuff.c:109 [inline]
>  skb_put+0x181/0x1c0 net/core/skbuff.c:1694
>  add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
>  add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
>  mld_send_cr net/ipv6/mcast.c:1903 [inline]
>  mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
>  call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
>  expire_timers kernel/time/timer.c:1357 [inline]
>  __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
>  run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
>  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1d3/0x210 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:540 [inline]
>  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920
>  
> RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
> RSP: 0018:8801d9f97da8 EFLAGS: 0282 ORIG_RAX: ff11
> RAX: dc00 RBX: 11003b3f2fb8 RCX: 
> RDX: 10c59734 RSI: 0001 RDI: 862cb9a0
> RBP: 8801d9f97da8 R08:  R09: 
> R10:  R11:  R12: 0001
> R13: 8801d9f97e60 R14: 869eb920 R15: 
>  arch_safe_halt arch/x86/include/asm/paravirt.h:93 [inline]
>  default_idle+0xbf/0x430 arch/x86/kernel/process.c:355
>  arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:346
>  default_idle_call+0x36/0x90 kernel/sched/idle.c:98
>  cpuidle_idle_call kernel/sched/idle.c:156 [inline]
>  do_idle+0x24a/0x3b0 kernel/sched/idle.c:246
>  cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:351
>  start_secondary+0x330/0x460 arch/x86/kernel/smpboot.c:277
>  secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:237
> Code: 03 0f b6 04 01 84 c0 74 04 3c 03 7e 20 8b 4b 78 41 57 48 c7 c7 a0 38
> bd 85 52 56 4c 89 ea 41 50 4c 89 e6 45 89 f0 e8 0c b6 3d fd <0f> 0b 4c 89 4d
> b8 4c 89 45 c0 48 89 75 c8 48 89 55 d0 e8 7d 93
> RIP: skb_panic+0x15c/0x1f0 net/core/skbuff.c:100 RSP: 8801db307508
> ---[ end trace 941a8a0f633e271f ]---
>
This isn't an sctp problem but an mld one. It seems that when lo's mtu
becomes 0, an skb is allocated without enough space in add_grec():
  if (AVAILABLE(skb) < sizeof(*psrc) +
first*sizeof(struct mld2_grec)) {
if (truncate && !first)
break;   /* truncate these */
if (pgr)
pgr->grec_nsrcs = htons(scount);
if (skb)
mld_sendpack(skb);
skb = mld_newpack(idev, dev->mtu); <---

I will check this for sure later on both igmp and mld.
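
To make the numbers in the report concrete, here is a minimal userspace model
of the tail/end check, not the kernel's sk_buff. It assumes the tail value in
the skb_over_panic line (0xd4) is the value after the 20-byte put, so the
tail was already at end (0xc0) before it and there was no tailroom left. The
struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the tail/end bookkeeping of an skb. */
struct buf_model {
    size_t tail;  /* offset of the first unused byte */
    size_t end;   /* offset one past the last usable byte */
    size_t len;   /* bytes of payload currently stored */
};

/* Bytes that can still be appended. */
static size_t buf_tailroom(const struct buf_model *b)
{
    return b->end > b->tail ? b->end - b->tail : 0;
}

/*
 * Append 'n' bytes. Returns 0 on success, or -1 in the case where the
 * kernel would report skb_over_panic.
 */
static int buf_put(struct buf_model *b, size_t n)
{
    if (n > buf_tailroom(b))
        return -1;
    b->tail += n;
    b->len += n;
    return 0;
}
```

With tail == end == 0xc0, buf_put(&b, 20) fails, which matches the report:
the buffer handed back by the allocation path had no room for the 20-byte
group record header.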


[PATCH] cgroup: avoid cgroup root name longer than max

2017-12-08 Thread Ma Shimiao
The cgroup root name has a maximum length, so we should avoid copying
a name longer than that into it.

Signed-off-by: Ma Shimiao 
---
 kernel/cgroup/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 0b1ffe147f24..3614a21ad6b8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1866,7 +1866,7 @@ void init_cgroup_root(struct cgroup_root *root, struct 
cgroup_sb_opts *opts)
if (opts->release_agent)
strcpy(root->release_agent_path, opts->release_agent);
if (opts->name)
-   strcpy(root->name, opts->name);
+   strncpy(root->name, opts->name, MAX_CGROUP_ROOT_NAMELEN);
if (opts->cpuset_clone_children)
set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
 }
-- 
2.13.6
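
One thing worth keeping in mind with the change above: standard strncpy()
does not write a terminating NUL when the source is at least as long as the
size argument. If the destination array is exactly MAX_CGROUP_ROOT_NAMELEN
bytes (an assumption here), an over-long name would be left unterminated.
The userspace sketch below shows a bounded copy that always terminates and
reports truncation; bounded_copy() is a hypothetical helper for illustration,
similar in spirit to the kernel's strscpy().

```c
#include <stddef.h>

/*
 * Hypothetical bounded copy: writes at most 'size' bytes to 'dst',
 * always including a terminating NUL when size > 0.
 * Returns the number of characters copied, or -1 if 'src' was
 * truncated or 'size' is 0.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
    size_t n;

    if (size == 0)
        return -1;

    /* Copy until the source ends or only the NUL slot is left. */
    for (n = 0; n < size - 1 && src[n] != '\0'; n++)
        dst[n] = src[n];
    dst[n] = '\0';

    /* If we stopped before the end of 'src', the name was truncated. */
    return src[n] == '\0' ? (long)n : -1;
}
```

A caller can treat a negative return as "name too long" and reject the mount
option instead of silently storing a shortened name.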





Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement

2017-12-08 Thread Michal Hocko
On Fri 08-12-17 16:38:46, kemi wrote:
> 
> 
> On 2017年11月30日 17:45, Michal Hocko wrote:
> > On Thu 30-11-17 17:32:08, kemi wrote:
> 
> > Do not get me wrong. If we want to make per-node stats more optimal,
> > then by all means let's do that. But having 3 sets of counters is just
> > way to much.
> > 
> 
> Hi, Michal
>   Apologize to respond later in this email thread.
> 
> After thinking about how to optimize our per-node stats more gracefully, 
> we may add u64 vm_numa_stat_diff[] in struct per_cpu_nodestat, thus,
> we can keep everything in per cpu counter and sum them up when read /proc
> or /sys for numa stats. 
> What's your idea for that? thanks

I would like to see a strong argument why we cannot make it a _standard_
node counter.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 1/3] x86/boot: add acpi rsdp address to setup_header

2017-12-08 Thread Ingo Molnar

* Juergen Gross  wrote:

> >> +Offset/size:  0x268/8
> >> +Protocol: 2.14+
> >> +
> >> +  This field can be set by the boot loader to tell the kernel the
> >> +  physical address of the ACPI RSDP table.
> >> +
> >> +  A value of 0 indicates the kernel should fall back to the standard
> >> +  methods to locate the RSDP (search in EBDA/low memory).
> > 
> > That's not the only method used: the ACPI RSDP address can also be 
> > discovered via 
> > efi.rsdp20 and efi.rsdp, both of which appear to be 32-bit values.
> 
> Sure, but this is valid for booting via EFI only.

Yeah, so what I tried to say is that the description as written is not fully 
correct and triggered my pedantry:

 +  A value of 0 indicates the kernel should fall back to the standard
 +  methods to locate the RSDP (search in EBDA/low memory).

To make it correct we need to either write less:

 +  A value of 0 indicates the kernel should fall back to the standard
 +  methods to locate the RSDP.

or write more and make it open ended so it doesn't have to be extended with
every method of getting the RSDP that might be added in the future:

 +  A value of 0 indicates the kernel should fall back to the standard
 +  methods to locate the RSDP (search in EBDA/low memory, get it from
 +  EFI if present, etc.).

... or so?

Thanks,

Ingo
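
The fallback behaviour under discussion can be sketched as follows. This is
only a model of the documented semantics (a value of 0 means "not provided,
use the standard methods"), not kernel code. locate_rsdp(), the callback
type and the two sample callbacks with their addresses are all hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical lookup callback: returns a physical address, or 0 if
 * the method found nothing.
 */
typedef uint64_t (*rsdp_lookup_fn)(void);

/* Stand-ins for real discovery methods, for illustration only. */
static uint64_t sample_lookup_none(void)    { return 0; }
static uint64_t sample_lookup_low_mem(void) { return 0xf0000; }

/*
 * Prefer the address supplied by the boot loader. A value of 0 means
 * "not provided", so try each fallback method in order.
 */
static uint64_t locate_rsdp(uint64_t boot_param_addr,
                            const rsdp_lookup_fn *fallbacks,
                            unsigned int nr_fallbacks)
{
    unsigned int i;

    if (boot_param_addr)
        return boot_param_addr;

    for (i = 0; i < nr_fallbacks; i++) {
        uint64_t addr = fallbacks[i]();

        if (addr)
            return addr;
    }
    return 0;
}
```

Keeping the fallback list open ended in code mirrors the point about the
wording: the documentation does not need to enumerate every method.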


Re: WARNING in x86_emulate_insn

2017-12-08 Thread Tianyu Lan
2017-12-08 16:44 GMT+08:00 Ingo Molnar :
>
> * Tianyu Lan  wrote:
>
>> Hi Jim&Wanpeng:
>>  Thanks for your help.
>>
>> 2017-12-08 5:25 GMT+08:00 Jim Mattson :
>> > Try disabling the module parameter, "unrestricted_guest." Make sure
>> > that the module parameter, "emulate_invalid_guest_state" is enabled.
>> > This combination allows userspace to feed invalid guest state into the
>> > in-kernel emulator.
>>
>> Yes, you are right. I need to disable unrestricted_guest to reproduce the 
>> issue.
>>
>> I find this is pop instruction emulation issue. According "SDM VOL2,
>> chapter INSTRUCTION
>> SET REFERENCE. POP—Pop a Value from the Stack"
>>
>> Protected Mode Exceptions
>> #GP(0) If attempt is made to load SS register with NULL segment selector.
>>
>> This test case hits it but current code doesn't check such case.
>> The following patch can fix the issue.
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index abe74f7..e2ac5cc 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -1844,6 +1844,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
>> int rc;
>> struct segmented_address addr;
>>
>> +   if ( !get_segment_selector(ctxt, VCPU_SREG_SS))
>> +   return emulate_gp(ctxt, 0);
>> +
>> addr.ea = reg_read(ctxt, VCPU_REGS_RSP) & stack_mask(ctxt);
>> addr.seg = VCPU_SREG_SS;
>> rc = segmented_read(ctxt, addr, dest, len);
>
> s/if ( !get_segment_selector
>  /if (!get_segment_selector

Sorry. I mixed xen and kernel code style...

>
> I think it would also be nice to convert the syzkaller testcase to a new KVM 
> unit
> test:

Sure. I will add it.

>
>   git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
>
> There's a test_pop() function in kvm-unit-tests/x86/emulator.c.
>
> Thanks,
>
> Ingo



-- 
Best regards
Tianyu Lan


Re: [RFC PATCH v2 1/2] xen/pvh: Add memory map pointer to hvm_start_info struct

2017-12-08 Thread Jan Beulich
>>> On 07.12.17 at 23:45,  wrote:
> The start info structure that is defined as part of the x86/HVM direct
> boot ABI and used for starting Xen PVH guests would be more versatile if
> it also included a way to efficiently pass information about the memory
> map to the guest.
> 
> That way Xen PVH guests would not be forced to use a hypercall to get the
> information and would make it easier for KVM guests to share the PVH
> entry point.
> ---
>  include/xen/interface/hvm/start_info.h | 34 
> +++---
>  1 file changed, 31 insertions(+), 3 deletions(-)

First of all such a change should be submitted against the canonical
copy of the header, which lives in the Xen tree.

The argument of avoiding a hypercall doesn't really count imo - this
isn't in any way performance critical code. The argument of making
re-use easier is fine, though.

> --- a/include/xen/interface/hvm/start_info.h
> +++ b/include/xen/interface/hvm/start_info.h
> @@ -33,7 +33,7 @@
>   *| magic  | Contains the magic value XEN_HVM_START_MAGIC_VALUE
>   *|| ("xEn3" with the 0x80 bit of the "E" set).
>   *  4 ++
> - *| version| Version of this structure. Current version is 0. New
> + *| version| Version of this structure. Current version is 1. New
>   *|| versions are guaranteed to be backwards-compatible.
>   *  8 ++
>   *| flags  | SIF_xxx flags.
> @@ -48,6 +48,12 @@
>   * 32 ++
>   *| rsdp_paddr | Physical address of the RSDP ACPI data structure.
>   * 40 ++
> + *| memmap_paddr   | Physical address of the memory map. Only present in
> + *|| version 1 and newer of the structure.
> + * 48 ++
> + *| memmap_entries | Number of entries in the memory map table. Only
> + *|| present in version 1 and newer of the structure.
> + * 52 ++

Please let's make this optional even in v1 (and later), i.e. spell out
that it may be zero. That way Xen code could continue to use the
hypercall approach even.

Also please spell out a 4-byte reserved entry at the end, to make
the specified structure a multiple of 8 in size again regardless of
bitness of the producer/consumer.

> @@ -62,6 +68,17 @@
>   *| reserved   |
>   * 32 ++
>   *
> + * The layout of each entry in the memory map table is as follows and no
> + * padding is used between entries in the array:
> + *
> + *  0 ++
> + *| addr   | Base address
> + *  8 ++
> + *| size   | Size of mapping
> + * 16 ++
> + *| type   | E820_TYPE_xxx
> + * 20 +|

I'm not convinced of re-using E820 types here. I can see that this
might ease the consumption in Linux, but I don't think there should
be any connection to x86 aspects here - the data being supplied is
x86-agnostic, and Linux's placement of the header is also making
no connection to x86 (oddly enough, the current placement in the
Xen tree does, for a reason which escapes me).

I could also imagine reasons to add new types without them being
sanctioned by whoever maintains E820 type assignments.

As to the size field - you need to spell out whether these are bytes
or pages (it might be worthwhile to also make this explicit for the
addr one, but there I view it as less of a problem, since "address"
doesn't commonly mean a page granular entity).

Also this again lacks a 4-byte reserved field at the end.

> @@ -86,13 +103,24 @@ struct hvm_start_info {
>  uint64_t cmdline_paddr; /* Physical address of the command line. 
> */
>  uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data
> */
>  /* structure.   
>  */
> -};
> +uint64_t memmap_paddr;   /* Physical address of an array of   */
> + /* hvm_memmap_table_entry. Only present in   */
> + /* Ver 1 or later. For e820 mem map table.   */
> +uint32_t memmap_entries; /* Only present in Ver 1 or later. Number of */
> + /* entries in the memmap table.  */
> +} __attribute__((packed));

No packed attribute here and below please, at least not in the
canonical (non-Linux) variant of the header.

Jan
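
A possible shape for the layout suggested above, with explicit 4-byte
reserved fields instead of a packed attribute, is sketched below. The
sketch_ prefixes and the field name "reserved" are placeholders, not a
proposal for the final names. The fields nr_modules and modlist_paddr are
not visible in the quoted hunks and are filled in here on the assumption
that they precede cmdline_paddr, which is consistent with rsdp_paddr sitting
at offset 32 in the quoted layout comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a memory map entry with an explicit reserved word. */
struct sketch_memmap_entry {
    uint64_t addr;      /* base address */
    uint64_t size;      /* size of the region; unit still to be spelled out */
    uint32_t type;      /* region type */
    uint32_t reserved;  /* placeholder name; keeps sizeof a multiple of 8 */
};

/* Sketch of the start info structure with the two new fields. */
struct sketch_start_info {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;      /* assumed field, see note above */
    uint64_t modlist_paddr;   /* assumed field, see note above */
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
    uint64_t memmap_paddr;    /* may be 0 if no map is passed */
    uint32_t memmap_entries;  /* may be 0 if no map is passed */
    uint32_t reserved;        /* placeholder name; pads to a multiple of 8 */
};

/* With explicit padding the layout needs no packed attribute. */
_Static_assert(sizeof(struct sketch_memmap_entry) == 24,
               "entry size must be a multiple of 8");
_Static_assert(offsetof(struct sketch_start_info, memmap_paddr) == 40,
               "memmap_paddr follows rsdp_paddr");
_Static_assert(sizeof(struct sketch_start_info) == 56,
               "start info size must be a multiple of 8");
```

Because every field sits at its natural offset and the tail is padded by
hand, 32-bit and 64-bit producers and consumers should agree on the layout.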


Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement

2017-12-08 Thread kemi


On 2017年11月30日 17:45, Michal Hocko wrote:
> On Thu 30-11-17 17:32:08, kemi wrote:

> Do not get me wrong. If we want to make per-node stats more optimal,
> then by all means let's do that. But having 3 sets of counters is just
> way to much.
> 

Hi, Michal
  Apologies for responding so late in this email thread.

After thinking about how to optimize our per-node stats more gracefully,
we could add u64 vm_numa_stat_diff[] to struct per_cpu_nodestat. That way
we can keep everything in per-cpu counters and sum them up when /proc
or /sys is read for numa stats.
What do you think of that idea? Thanks

The motivation for that modification is listed below:
1) Thanks to the 0-day system, a bug was reported against the V1 patch:

[0.00] BUG: unable to handle kernel paging request at 0392b000
[0.00] IP: __inc_numa_state+0x2a/0x34
[0.00] *pdpt =  *pde = f000ff53f000ff53 
[0.00] Oops: 0002 [#1] PREEMPT SMP
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-12996-g81611e2 #1
[0.00] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[0.00] task: cbf56000 task.stack: cbf4e000
[0.00] EIP: __inc_numa_state+0x2a/0x34
[0.00] EFLAGS: 00210006 CPU: 0
[0.00] EAX: 0392b000 EBX:  ECX:  EDX: cbef90ef
[0.00] ESI: cffdb320 EDI: 0004 EBP: cbf4fd80 ESP: cbf4fd7c
[0.00]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[0.00] CR0: 80050033 CR2: 0392b000 CR3: 0c0a8000 CR4: 000406b0
[0.00] DR0:  DR1:  DR2:  DR3: 
[0.00] DR6: fffe0ff0 DR7: 0400
[0.00] Call Trace:
[0.00]  zone_statistics+0x4d/0x5b
[0.00]  get_page_from_freelist+0x257/0x993
[0.00]  __alloc_pages_nodemask+0x108/0x8c8
[0.00]  ? __bitmap_weight+0x38/0x41
[0.00]  ? pcpu_next_md_free_region+0xe/0xab
[0.00]  ? pcpu_chunk_refresh_hint+0x8b/0xbc
[0.00]  ? pcpu_chunk_slot+0x1e/0x24
[0.00]  ? pcpu_chunk_relocate+0x15/0x6d
[0.00]  ? find_next_bit+0xa/0xd
[0.00]  ? cpumask_next+0x15/0x18
[0.00]  ? pcpu_alloc+0x399/0x538
[0.00]  cache_grow_begin+0x85/0x31c
[0.00]  cache_alloc+0x147/0x1e0
[0.00]  ? debug_smp_processor_id+0x12/0x14
[0.00]  kmem_cache_alloc+0x80/0x145
[0.00]  create_kmalloc_cache+0x22/0x64
[0.00]  kmem_cache_init+0xf9/0x16c
[0.00]  start_kernel+0x1d4/0x3d6
[0.00]  i386_start_kernel+0x9a/0x9e
[0.00]  startup_32_smp+0x15f/0x170

That is because the u64 percpu pointer vm_numa_stat is used before it is
initialized.

[...]
> +extern u64 __percpu *vm_numa_stat;
[...]
> +#ifdef CONFIG_NUMA
> + size = sizeof(u64) * num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS;
> + align = __alignof__(u64[num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS]);
> + vm_numa_stat = (u64 __percpu *)__alloc_percpu(size, align);
> +#endif

The pointer is used in mm_init->kmem_cache_init->create_kmalloc_cache->...->
__alloc_pages() when CONFIG_SLAB/CONFIG_ZONE_DMA is set in kconfig, while
vm_numa_stat is initialized in setup_per_cpu_pageset, after mm_init is called.
The proposal mentioned above can fix this by making the numa stats counters
ready before mm_init is called (start_kernel->build_all_zonelists() can help
with that).

2) Compared to the V1 patch, this modification makes the semantics of the
per-node numa stats clearer for review and maintenance.
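
The idea of keeping per-cpu diff counters and summing them at read time can
be modelled in userspace roughly as below. This is only a sketch: the
sketch_ names, the fixed CPU count and the item count are hypothetical, and
real per-cpu data would use the kernel's percpu accessors rather than a
plain array.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS       4   /* hypothetical sizes for the model */
#define SKETCH_NR_NUMA_ITEMS 3

/* Models the proposed vm_numa_stat_diff[] member of per_cpu_nodestat. */
struct sketch_per_cpu_nodestat {
    uint64_t vm_numa_stat_diff[SKETCH_NR_NUMA_ITEMS];
};

/* Static storage: zero-initialized and usable without any allocation. */
static struct sketch_per_cpu_nodestat sketch_pcp[SKETCH_NR_CPUS];

/* Fast path: bump only the counter that belongs to the given CPU. */
static void sketch_inc_numa_state(unsigned int cpu, unsigned int item)
{
    sketch_pcp[cpu].vm_numa_stat_diff[item]++;
}

/* Slow path (e.g. a /proc or /sys read): sum over all CPUs. */
static uint64_t sketch_sum_numa_state(unsigned int item)
{
    uint64_t sum = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sum += sketch_pcp[cpu].vm_numa_stat_diff[item];
    return sum;
}
```

The update path touches only CPU-local data, and the cost of aggregation is
paid by the rare reader.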


Re: [alsa-devel] [PATCH v9 01/13] Documentation: Add SLIMbus summary

2017-12-08 Thread Vinod Koul
On Thu, Dec 07, 2017 at 11:22:51PM +, Srinivas Kandagatla wrote:
> Thankyou for taking time to review the patch,
> 
> On 07/12/17 17:32, Jonathan Corbet wrote:
> >On Thu,  7 Dec 2017 10:27:08 +
> >srinivas.kandaga...@linaro.org wrote:
> >
> >A couple of overall comments...
> >
> >>  Documentation/driver-api/index.rst   |   1 +
> >>  Documentation/driver-api/slimbus/index.rst   |  15 
> >>  Documentation/driver-api/slimbus/summary.rst | 106 
> >> +++
> >>  3 files changed, 122 insertions(+)
> >>  create mode 100644 Documentation/driver-api/slimbus/index.rst
> >>  create mode 100644 Documentation/driver-api/slimbus/summary.rst
> >
> >Do we really need a separate subdirectory for a single file?
> >
> May be not, TBH, I did take some inspiration from soundwire patches.

FWIW, SoundWire patches have more Documentation. We have 4 files atm, though
they are not part of current series, so a directory looks apt for that

> I can drop that in next version. We can think of adding directory if we end
> up adding more apis for the new features in future.
> 
> >It seems you have kerneldoc comments for your data structures and at least
> >some of your exported symbols.  If you really want to document this stuff
> >well, I'd suggest finishing out those comments, then pulling them into the
> >documentation in the appropriate places.
> Am sure all the exported symbols have kernel doc, I will pull them into
> relevant sub sections.
> 
> Do you think something like this http://paste.ubuntu.com/26135862/ makes
> sense?
> 
> thanks,
> srini
> >
> >Thanks,
> >
> >jon
> >
> ___
> Alsa-devel mailing list
> alsa-de...@alsa-project.org
> http://mailman.alsa-project.org/mailman/listinfo/alsa-devel

-- 
~Vinod


Re: [PATCH -mm] mm, swap: Fix race between swapoff and some swap operations

2017-12-08 Thread Huang, Ying
Minchan Kim  writes:

> On Fri, Dec 08, 2017 at 01:41:10PM +0800, Huang, Ying wrote:
>> Minchan Kim  writes:
>> 
>> > On Thu, Dec 07, 2017 at 04:29:37PM -0800, Andrew Morton wrote:
>> >> On Thu,  7 Dec 2017 09:14:26 +0800 "Huang, Ying"  
>> >> wrote:
>> >> 
>> >> > When the swapin is performed, after getting the swap entry information
>> >> > from the page table, the PTL (page table lock) will be released, then
>> >> > system will go to swap in the swap entry, without any lock held to
>> >> > prevent the swap device from being swapoff.  This may cause the race
>> >> > like below,
>> >> > 
>> >> > CPU 1   CPU 2
>> >> > -   -
>> >> > do_swap_page
>> >> >   swapin_readahead
>> >> > __read_swap_cache_async
>> >> > swapoff   swapcache_prepare
>> >> >   p->swap_map = NULL__swap_duplicate
>> >> >   p->swap_map[?] /* !!! NULL 
>> >> > pointer access */
>> >> > 
>> >> > Because swap off is usually done when system shutdown only, the race
>> >> > may not hit many people in practice.  But it is still a race need to
>> >> > be fixed.
>> >> 
>> >> swapoff is so rare that it's hard to get motivated about any fix which
>> >> adds overhead to the regular codepaths.
>> >
>> > That was my concern, too when I see this patch.
>> >
>> >> 
>> >> Is there something we can do to ensure that all the overhead of this
>> >> fix is placed into the swapoff side?  stop_machine() may be a bit
>> >> brutal, but a surprising amount of code uses it.  Any other ideas?
>> >
>> > How about this?
>> >
>> > I think It's same approach with old where we uses si->lock everywhere
>> > instead of more fine-grained cluster lock.
>> >
>> > The reason I repeated to reset p->max to zero in the loop is to avoid
>> > using lockdep annotation(maybe, spin_lock_nested(something) to prevent
>> > false positive.
>> >
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 42fe5653814a..9ce007a42bbc 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -2644,6 +2644,19 @@ SYSCALL_DEFINE1(swapoff, const char __user *, 
>> > specialfile)
>> >swap_file = p->swap_file;
>> >old_block_size = p->old_block_size;
>> >p->swap_file = NULL;
>> > +
>> > +  if (p->flags & SWP_SOLIDSTATE) {
>> > +  unsigned long ci, nr_cluster;
>> > +
>> > +  nr_cluster = DIV_ROUND_UP(p->max, SWAPFILE_CLUSTER);
>> > +  for (ci = 0; ci < nr_cluster; ci++) {
>> > +  struct swap_cluster_info *sci;
>> > +
>> > +  sci = lock_cluster(p, ci * SWAPFILE_CLUSTER);
>> > +  p->max = 0;
>> > +  unlock_cluster(sci);
>> > +  }
>> > +  }
>> >p->max = 0;
>> >swap_map = p->swap_map;
>> >p->swap_map = NULL;
>> > @@ -3369,10 +3382,10 @@ static int __swap_duplicate(swp_entry_t entry, 
>> > unsigned char usage)
>> >goto bad_file;
>> >p = swap_info[type];
>> >offset = swp_offset(entry);
>> > -  if (unlikely(offset >= p->max))
>> > -  goto out;
>> >  
>> >ci = lock_cluster_or_swap_info(p, offset);
>> > +  if (unlikely(offset >= p->max))
>> > +  goto unlock_out;
>> >  
>> >count = p->swap_map[offset];
>> >  
>> 
>> Sorry, this doesn't work, because
>> 
>> lock_cluster_or_swap_info()
>> 
>> Need to read p->cluster_info, which may be freed during swapoff too.
>> 
>> 
>> To reduce the added overhead in regular code path, Maybe we can use SRCU
>> to implement get_swap_device() and put_swap_device()?  There is only
>> increment/decrement on CPU local variable in srcu_read_lock/unlock().
>> Should be acceptable in not so hot swap path?
>> 
>> This needs to select CONFIG_SRCU if CONFIG_SWAP is enabled.  But I guess
>> that should be acceptable too?
>> 
>
> Why do we need srcu here? Is it enough with rcu like below?
>
> It might have a bug/room to be optimized about performance/naming.
> I just wanted to show my intention.

Yes.  rcu should work too.  But if we use rcu, it may need to be taken
several times to make sure the swap device under us doesn't go away, for
example, when checking si->max in __swp_swapcount() and
add_swap_count_continuation().  And I found we need rcu to protect the swap
cache radix tree array too.  So I think it may be better to use one
call to srcu_read_lock/unlock() instead of multiple calls to
rcu_read_lock/unlock().

Best Regards,
Huang, Ying


Re: [PATCH v2 1/3] x86/boot: add acpi rsdp address to setup_header

2017-12-08 Thread Juergen Gross
On 08/12/17 09:48, Ingo Molnar wrote:
> 
> * Juergen Gross  wrote:
> 
>>>> +Offset/size:  0x268/8
>>>> +Protocol: 2.14+
>>>> +
>>>> +  This field can be set by the boot loader to tell the kernel the
>>>> +  physical address of the ACPI RSDP table.
>>>> +
>>>> +  A value of 0 indicates the kernel should fall back to the standard
>>>> +  methods to locate the RSDP (search in EBDA/low memory).
>>>
>>> That's not the only method used: the ACPI RSDP address can also be 
>>> discovered via 
>>> efi.rsdp20 and efi.rsdp, both of which appear to be 32-bit values.
>>
>> Sure, but this is valid for booting via EFI only.
> 
> Yeah, so what I tried to say is that the description as written is not fully 
> correct and triggered my pedantry:
> 
>  +  A value of 0 indicates the kernel should fall back to the standard
>  +  methods to locate the RSDP (search in EBDA/low memory).
> 
> To make it correct we need to either write less:
> 
>  +  A value of 0 indicates the kernel should fall back to the standard
>  +  methods to locate the RSDP.
> 
> or write more and make it open ended so it doesn't have to be extended with 
> every 
> method of getting the RSDP that might be added in the future:
> 
>  +  A value of 0 indicates the kernel should fall back to the standard
>  +  methods to locate the RSDP (search in EBDA/low memory, get it from
>  +  EFI if present, etc.).
> 
> ... or so?

Aah, okay. I got your remark wrong then.

I think I'll go with the shorter variant.


Juergen


Re: [RFC PATCH 1/6] drm: Add Content Protection property

2017-12-08 Thread Daniel Vetter
On Thu, Dec 07, 2017 at 02:30:52PM +, Alan Cox wrote:
> > If you want to actually lock down a machine to implement content
> > protection, then you need secure boot without unlockable boot-loader and a
> > pile more bits in userspace. 
> 
> So let me take my Intel hat off for a moment.
> 
> The upstream policy has always been that we don't merge things which
> don't have an open usable user space. Is the HDCP encryption feature
> useful on its own ? What do users get from it ?
> 
> If this is just an enabler for a lump of binary stuff in ChromeOS then I
> don't think it belongs, if it is useful standalone then it seems it does
> belong ?

The cros side is ofc all open source. dri-devel is extremely strict with
not taking anything that doesn't fullfil this requirement, probably more
strict than anyone else. Sean has the link in the cover letter of his
patch series.

For more context, here's our documented expectations about the userspace
side of any uapi addition to drm:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] sched/autogroup: move sched.h include

2017-12-08 Thread Petr Mladek
On Fri 2017-12-08 17:24:22, Sergey Senozhatsky wrote:
> Move local "sched.h" include to the bottom. sched.h defines
> several macros that are getting redefined in ARCH-specific
> code, for instance, finish_arch_post_lock_switch() and
> prepare_arch_switch(), so we need ARCH-specific definitions
> to come in first.

This patch is needed to fix compilation error [1] caused by a patchset
that deprecates %pf/%pF printk modifiers[2].

IMHO, we should make sure that this fix goes into Linus' tree
before the printk-related patchset. What is the best practice,
please?

I see two reasonable possibilities. Either sched people could
push this for-4.15-rcX. Or I could put it into printk.git for-4.16
in the right order.

What do you think?

References:
[1] http://lkml.kernel.org/r/201712080259.tvo64xfa%fengguang...@intel.com
[2] https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/commit/?h=for-next&id=98fff2c57b7e88d643cb42ffd910fe9905b33176

Best Regards,
Petr


Re: [PATCH 0/1] About MIPS/Loongson maintainance

2017-12-08 Thread Huacai Chen
Hi, James,

Of course we don't want to send PRs directly if there is a better way.
So I hope you can officially become a co-maintainer of linux-mips; as
a result, our community will become more active. I think most MIPS
developers share this wish.

Huacai

On Fri, Dec 8, 2017 at 3:51 PM, James Hogan  wrote:
> On Fri, Dec 08, 2017 at 12:01:46PM +0800, Jiaxun Yang wrote:
>> Also we're going to separate code between
>> Loongson2 and Loongson3 since they are becoming more and more
>> identical.
>
> Do you mean you want to combine them?
>
>> But It will cause a lot of changes under march of loongson64
>>  that currently maintaining by linux-mips community. Send plenty of
>> patches to mailing list would not be a wise way to do that. So we can
>> PR these changes to linux-next directly and PR to linux-mips before
>> merge window.
>
> For the avoidance of doubt, a pull request would not excempt you from
> needing your patches properly reviewed on the mailing lists first.
>
> And quoting Stephen's boilerplate response to linux-next additions:
>> Thanks for adding your subsystem tree as a participant of linux-next.  As
>> you may know, this is not a judgement of your code.  The purpose of
>> linux-next is for integration testing and to lower the impact of
>> conflicts between subsystems in the next merge window.
>>
>> You will need to ensure that the patches/commits in your tree/series have
>> been:
>>  * submitted under GPL v2 (or later) and include the Contributor's
>> Signed-off-by,
>>  * posted to the relevant mailing list,
>>  * reviewed by you (or another maintainer of your subsystem tree),
>>  * successfully unit tested, and
>>  * destined for the current or next Linux merge window.
>>
>> Basically, this should be just what you would send to Linus (or ask him
>> to fetch).  It is allowed to be rebased if you deem it necessary.
>
> Cheers
> James


Re: [RFC v3 PATCH 0/2] Introduce Security Version to EFI Stub

2017-12-08 Thread Gary Lin
On Thu, Dec 07, 2017 at 11:35:52AM +0100, Ingo Molnar wrote:
> 
> 
> * Gary Lin  wrote:
> 
> > On Thu, Dec 07, 2017 at 09:18:16AM +0100, Ingo Molnar wrote:
> > > 
> > > * Gary Lin  wrote:
> > > 
> > > > On Thu, Dec 07, 2017 at 07:09:27AM +0100, Ingo Molnar wrote:
> > > > > 
> > > > > * Gary Lin  wrote:
> > > > > 
> > > > > > On Wed, Dec 06, 2017 at 07:37:34PM +0100, Ingo Molnar wrote:
> > > > > > > 
> > > > > > > * Gary Lin  wrote:
> > > > > > > 
> > > > > > > > On Tue, Dec 05, 2017 at 04:14:26PM -0500, Josh Boyer wrote:
> > > > > > > > > > On Tue, Dec 5, 2017 at 5:01 AM, Gary Lin wrote:
> > > > > > > > > > > The series of patches introduce Security Version to EFI stub.
> > > > > > > > > > >
> > > > > > > > > > > Security Version is a monotonically increasing number and designed to
> > > > > > > > > > > prevent the user from loading an insecure kernel accidentally. The
> > > > > > > > > > > bootloader maintains a list of security versions corresponding to
> > > > > > > > > > > different distributions. After fixing a critical vulnerability, the
> > > > > > > > > > > distribution kernel maintainer bumps the "version", and the bootloader
> > > > > > > > > > > updates the list automatically. When the user tries to load a kernel
> > > > > > > > > > > with a lower security version, the bootloader shows a warning prompt
> > > > > > > > > > > to notify the user the potential risk.
> > > > > > > > > 
> > > > > > > > > > If a distribution releases a kernel with a higher security version and
> > > > > > > > > > that is automatically updated on boot, what happens if that kernel
> > > > > > > > > > contains a different bug that causes it to fail to boot or break
> > > > > > > > > > critical functionality?  At that point, the user's machine would be in
> > > > > > > > > > a state where the higher security version is enforced but the only
> > > > > > > > > > kernel that provides that is broken.  Wouldn't that make a bad
> > > > > > > > > > situation even worse by now requiring manual acceptance of the older
> > > > > > > > > > SV kernel boot physically at the machine?
> > > > > > > > > 
> > > > > > > > > I feel like I'm missing a detail here or something.
> > > > > > > > > 
> > > > > > > > > If the new kernel fails to boot, then the user has to choose the kernel
> > > > > > > > > manually anyway, and there will be an option in the warning prompt to
> > > > > > > > > lower SV.
> > > > > > > 
> > > > > > > And what if the firmware does not support a lowering of the SV?
> > > > > > > 
> > > > > > The SV list is manipulated by the bootloader, and the firmware only
> > > > > > provides the interface to the storage, i.e. non-volatile flash.
> > > > > 
> > > > > > What about systems where the bootloader is part of the system and users only have
> > > > > > the ability to provide kernel images, but no ability to change the boot loader?
> > > > 
> > > > It depends on how the bootloader works. If the system uses my
> > > > implementation of shim loader, it surely has the ability to lower SV,
> > > > but it requires physical access on purpose.
> > > 
> > > > And that's my problem: if in practice the bootloader is 'part of the system', is
> > > > signed and is updated like the firmware, then putting a "Security Version" into
> > > > the kernel image and architecting a boot protocol for a monotonic method for the
> > > > bootloader to restrict the loading of kernel images is an obviously bad idea.
> > > 
> > Even though the bootloader doesn't actually block the booting?
> 
> We don't know that for sure, in that scenario *how* the bootloader interprets the
> SV is not under the user's control...
> 
OK, it seems the implementation in shim brings up some concern. I'll
discuss with my colleagues for other possible solutions.

Cheers,

Gary Lin
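
For readers following the thread, the check described in the quoted cover
letter can be sketched roughly as below. This is only an illustration under
stated assumptions: the struct, the function names and the "unknown
distribution boots without a warning" policy are hypothetical and are not
taken from the shim or kernel patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record kept by the bootloader: one entry per distribution. */
struct sv_entry {
	const char *distro;
	uint32_t sv;		/* highest Security Version seen so far */
};

enum sv_verdict { SV_BOOT, SV_WARN };

/* Look up the stored entry for a distribution, or NULL if none exists. */
static struct sv_entry *sv_find(struct sv_entry *list, size_t n,
				const char *distro)
{
	assert(distro != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i].distro, distro) == 0)
			return &list[i];
	return NULL;
}

/*
 * Compare the kernel's SV with the stored one. A lower SV yields a warning
 * verdict and leaves the list untouched; a higher SV bumps the stored value.
 * Assumption: a distribution with no entry boots without a warning.
 */
static enum sv_verdict sv_check(struct sv_entry *list, size_t n,
				const char *distro, uint32_t kernel_sv)
{
	struct sv_entry *e = sv_find(list, n, distro);

	if (!e)
		return SV_BOOT;
	if (kernel_sv < e->sv)
		return SV_WARN;
	if (kernel_sv > e->sv)
		e->sv = kernel_sv;	/* automatic, monotonic update */
	return SV_BOOT;
}
```

The sketch also shows the behaviour Josh and Ingo question above: once the
stored value has been bumped, every older kernel lands on the warning path,
and what happens there is decided by the bootloader rather than the user.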


Re: [RESEND PATCH 0/4] drm/meson: power domain init related fixes

2017-12-08 Thread Jerome Brunet
On Wed, 2017-12-06 at 12:54 +0100, Neil Armstrong wrote:
> On the Amlogic Gx SoCs (GXBB, GXL & GXM), the VPU power domain is initialized
> by the vendor U-Boot code, but running mainline U-boot has been possible
> on these SoCs. But lacking such init made the system lock at kernel boot.
> 
> A PM Power Domain driver has been pushed at [1] to solve the main issue.
> The following patches :
> - updates the DT bindings accordingly
> - adds support for missing regulators and registers init
> 
> Neil Armstrong (4):
>   dt-bindings: display: amlogic,meson-vpu: Add optional power domain
> property
>   dt-bindings: display: amlogic,meson-dw-hdmi: Add optional HDMI 5V
> regulator
>   drm/meson: dw_hdmi: Add support for an optional external 5V regulator
>   drm/meson: Add missing VPU init
> 
>  .../devicetree/bindings/display/amlogic,meson-dw-hdmi.txt   |  4 
>  .../devicetree/bindings/display/amlogic,meson-vpu.txt   |  4 
>  drivers/gpu/drm/meson/meson_drv.c   |  9 +
>  drivers/gpu/drm/meson/meson_dw_hdmi.c   | 13 +
>  drivers/gpu/drm/meson/meson_registers.h |  4 
>  5 files changed, 34 insertions(+)
> 

No dependencies on the bootloader anymore, this is great ! Thanks 
Series tested on libretech-cc s905x

Tested-by: Jerome Brunet 
Reviewed-by: Jerome Brunet 


Re: [PATCH 0/2] ARM: sun8i: a83t: Enable EMAC Ethernet

2017-12-08 Thread Maxime Ripard
On Fri, Dec 08, 2017 at 03:31:55PM +0800, Chen-Yu Tsai wrote:
> Hi,
> 
> This is my spin on enabling Ethernet on the A83T. It consists of
> Corentin's dtsi patch plus my board level patch. There's nothing
> really special about them.
> 
> ChenYu

Applied both, thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [PATCH 0/3] clk: sunxi-ng: sun8i: a83t: Use sigma-delta modulation for audio PLL

2017-12-08 Thread Maxime Ripard
On Fri, Dec 08, 2017 at 04:35:09PM +0800, Chen-Yu Tsai wrote:
> Hi,
> 
> This series follows previous improvements for the other Allwinner SoCs to
> improve audio quality, in particular the speed and pitch of audio playback.
> The audio PLLs in Allwinner SoCs cannot produce the correct frequency to
> match the audio sample rate families through integer factors. As such
> the audio is either too fast or too slow.
> 
> This is dealt with by using sigma-delta modulation, a form of fractional-N
> synthesis, on the divider. The parameters used are copied from the BSP
> kernel.
> 
> As these parameters assume that one of the untracked dividers is set to
> 2, we first add a fixed /2 post divider to the PLL, and force the
> untracked divider to /2. Then we introduce the sigma-delta modulation
> parameters.
> 
> This has been tested with SPDIF playback on the Cubietruck Plus, and an
> external PCM5122 DAC from a PiFi DAC 2.0+, connected via I2S to the
> Banana Pi M3. The I2C and I2S support used in this latter test will be
> sent as a separate series later.

Applied, thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [alsa-devel] [PATCH v9 01/13] Documentation: Add SLIMbus summary

2017-12-08 Thread Srinivas Kandagatla



On 08/12/17 08:44, Vinod Koul wrote:

Do we really need a separate subdirectory for a single file?


Maybe not. TBH, I did take some inspiration from the SoundWire patches.

FWIW, SoundWire patches have more Documentation. We have 4 files atm, though
they are not part of current series, so a directory looks apt for that


Yes, it does make sense for SoundWire!!


Re: [PATCH -mm] mm, swap: Fix race between swapoff and some swap operations

2017-12-08 Thread Minchan Kim
On Fri, Dec 08, 2017 at 04:41:38PM +0800, Huang, Ying wrote:
> Minchan Kim  writes:
> 
> > On Fri, Dec 08, 2017 at 01:41:10PM +0800, Huang, Ying wrote:
> >> Minchan Kim  writes:
> >> 
> >> > On Thu, Dec 07, 2017 at 04:29:37PM -0800, Andrew Morton wrote:
> >> >> On Thu,  7 Dec 2017 09:14:26 +0800 "Huang, Ying"  
> >> >> wrote:
> >> >> 
> >> >> > When the swapin is performed, after getting the swap entry information
> >> >> > from the page table, the PTL (page table lock) will be released, then
> >> >> > system will go to swap in the swap entry, without any lock held to
> >> >> > prevent the swap device from being swapoff.  This may cause the race
> >> >> > like below,
> >> >> > 
> >> >> > CPU 1                         CPU 2
> >> >> > -----                         -----
> >> >> >                               do_swap_page
> >> >> >                                 swapin_readahead
> >> >> >                                   __read_swap_cache_async
> >> >> > swapoff                             swapcache_prepare
> >> >> >   p->swap_map = NULL                  __swap_duplicate
> >> >> >                                         p->swap_map[?] /* !!! NULL pointer access */
> >> >> > 
> >> >> > Because swap off is usually done when system shutdown only, the race
> >> >> > may not hit many people in practice.  But it is still a race need to
> >> >> > be fixed.
> >> >> 
> >> >> swapoff is so rare that it's hard to get motivated about any fix which
> >> >> adds overhead to the regular codepaths.
> >> >
> >> > That was my concern, too when I see this patch.
> >> >
> >> >> 
> >> >> Is there something we can do to ensure that all the overhead of this
> >> >> fix is placed into the swapoff side?  stop_machine() may be a bit
> >> >> brutal, but a surprising amount of code uses it.  Any other ideas?
> >> >
> >> > How about this?
> >> >
> >> > I think It's same approach with old where we uses si->lock everywhere
> >> > instead of more fine-grained cluster lock.
> >> >
> >> > The reason I repeated to reset p->max to zero in the loop is to avoid
> >> > using lockdep annotation(maybe, spin_lock_nested(something) to prevent
> >> > false positive.
> >> >
> >> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> > index 42fe5653814a..9ce007a42bbc 100644
> >> > --- a/mm/swapfile.c
> >> > +++ b/mm/swapfile.c
> >> > @@ -2644,6 +2644,19 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> >> >  swap_file = p->swap_file;
> >> >  old_block_size = p->old_block_size;
> >> >  p->swap_file = NULL;
> >> > +
> >> > +if (p->flags & SWP_SOLIDSTATE) {
> >> > +unsigned long ci, nr_cluster;
> >> > +
> >> > +nr_cluster = DIV_ROUND_UP(p->max, SWAPFILE_CLUSTER);
> >> > +for (ci = 0; ci < nr_cluster; ci++) {
> >> > +struct swap_cluster_info *sci;
> >> > +
> >> > +sci = lock_cluster(p, ci * SWAPFILE_CLUSTER);
> >> > +p->max = 0;
> >> > +unlock_cluster(sci);
> >> > +}
> >> > +}
> >> >  p->max = 0;
> >> >  swap_map = p->swap_map;
> >> >  p->swap_map = NULL;
> >> > @@ -3369,10 +3382,10 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
> >> >  goto bad_file;
> >> >  p = swap_info[type];
> >> >  offset = swp_offset(entry);
> >> > -if (unlikely(offset >= p->max))
> >> > -goto out;
> >> >  
> >> >  ci = lock_cluster_or_swap_info(p, offset);
> >> > +if (unlikely(offset >= p->max))
> >> > +goto unlock_out;
> >> >  
> >> >  count = p->swap_map[offset];
> >> >  
> >> 
> >> Sorry, this doesn't work, because
> >> 
> >> lock_cluster_or_swap_info()
> >> 
> >> Need to read p->cluster_info, which may be freed during swapoff too.
> >> 
> >> 
> >> To reduce the added overhead in regular code path, Maybe we can use SRCU
> >> to implement get_swap_device() and put_swap_device()?  There is only
> >> increment/decrement on CPU local variable in srcu_read_lock/unlock().
> >> Should be acceptable in not so hot swap path?
> >> 
> >> This needs to select CONFIG_SRCU if CONFIG_SWAP is enabled.  But I guess
> >> that should be acceptable too?
> >> 
> >
> > Why do we need srcu here? Is it enough with rcu like below?
> >
> > It might have a bug/room to be optimized about performance/naming.
> > I just wanted to show my intention.
> 
> Yes.  rcu should work too.  But if we use rcu, it may need to be called
> several times to make sure the swap device under us doesn't go away, for
> example, when checking si->max in __swp_swapcount() and

I think it's not a big concern from a performance point of view, and the
benefit is a good abstraction through the current locking functions, so we
don't need much churn.

> add_swap_count_continuation().  And I found we need rcu to protect swap
> cache radix tree array too.  So I think it may be better 
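
The get_swap_device()/put_swap_device() idea mentioned earlier in this thread
can be illustrated with a small userspace sketch. It uses a plain C11 atomic
reference count as a stand-in, so it is neither RCU nor SRCU, and every name
in it (fake_swap_device, fake_get_device, fake_swapoff and so on) is
hypothetical rather than kernel API. It only shows the ordering that closes
the race: readers pin the device before touching the map, and teardown
refuses to free the map while anyone holds a pin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct swap_info_struct. */
struct fake_swap_device {
	atomic_bool online;	/* cleared first by the teardown path */
	atomic_int users;	/* readers currently holding a pin */
	unsigned char *map;	/* stands in for p->swap_map */
	size_t max;		/* stands in for p->max */
};

/* Pin the device; fails once teardown has started. */
static bool fake_get_device(struct fake_swap_device *d)
{
	atomic_fetch_add(&d->users, 1);
	if (!atomic_load(&d->online)) {
		atomic_fetch_sub(&d->users, 1);
		return false;
	}
	return true;
}

/* Drop a pin taken by fake_get_device(). */
static void fake_put_device(struct fake_swap_device *d)
{
	int prev = atomic_fetch_sub(&d->users, 1);

	assert(prev > 0);
}

/*
 * Teardown: mark the device offline, then release the map only if no reader
 * holds a pin. Returns false if readers are still active; the caller is
 * expected to retry later.
 */
static bool fake_swapoff(struct fake_swap_device *d)
{
	atomic_store(&d->online, false);
	if (atomic_load(&d->users) != 0)
		return false;
	d->map = NULL;
	d->max = 0;
	return true;
}

/* Read a map entry under a pin; returns -1 if the device is gone. */
static int fake_read_count(struct fake_swap_device *d, size_t offset)
{
	int count = -1;

	if (!fake_get_device(d))
		return -1;
	if (offset < d->max)
		count = d->map[offset];
	fake_put_device(d);
	return count;
}
```

Unlike the per-CPU counters discussed above, this sketch puts two atomic
operations on a shared cache line into every reader, which is exactly the
kind of fast-path overhead the thread is trying to avoid.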

Re: [PATCH net-next v2 3/8] net: phy: meson-gxl: add read and write helpers for bank registers

2017-12-08 Thread Russell King - ARM Linux
On Thu, Dec 07, 2017 at 05:02:35PM +0100, Andrew Lunn wrote:
> > Banks actually comes from the datasheet, Yes.
> > I don't mind renaming it but I would be making things up. As you wish ?
> 
> Keep it as is for the moment.
>  
> > Do the usual pages come with this weird toggle thing to open the access ?
> > Would we be able to use these generic helpers with this kind of quirk ?
> 
> I don't think the API has been defined yet. But what has been
> discussed is adding functions to struct phy_driver. The driver can
> then implement whatever is needed to select a given page. There will
> then be helpers which take the lock, select the page, do the
> read/write, select page 0, and unlock.

I'm not sure adding generic helpers really works, because the indirection
through phy_driver just makes the code more complex.  For Marvell, I
ended up with:

http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=phy&id=9ca46084228bb1b8851e7d24276d85c4ec6e13ae

and the preceding three commits.

The "generic" versions I came up with were basically lifting
marvell_read_paged(), marvell_write_paged() and marvell_modify_paged()
to phylib, along with the lower level marvell_save_page(),
marvell_select_page() and marvell_restore_page().  The result is a
fair amount of out-of-line code and an assumption that it's a single
register.  As soon as a PHY has other requirements, these generic
implementations don't work.

Also, unfortunately, we can't get help from sparse for statically
checking the locking - the __acquire()/__release() annotations don't
work for mutexes.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH V6 4/7] OF: properties: Implement get_match_data() callback

2017-12-08 Thread Lothar Waßmann
Hi,

On Thu, 7 Dec 2017 12:50:50 -0500 Sinan Kaya wrote:
> On 12/7/2017 10:20 AM, Lothar Waßmann wrote:
> > Hi,
> > 
> > On Thu, 7 Dec 2017 09:45:31 -0500 Sinan Kaya wrote:
> >> On 12/7/2017 8:10 AM, Lothar Waßmann wrote:
>  +void *of_fwnode_get_match_data(const struct fwnode_handle *fwnode,
>  +   struct device *dev)
> >>> Shouldn't this be 'const void *of_fwnode_get_match_data
> >>
> >> OF keeps the driver data as a (const void*) internally. ACPI keeps the
> >> driver data as kernel_ulong_t in struct acpi_device_id.
> >>
> >> I tried to find the middle ground here by converting output to void*
> >> but not keeping const.
> >>
> > It should be no problem to cast a (const void *) to an unsigned long
> > data type (without const qualifier).
> > 
> 
> It is the other way around. If I change this API to return a (const void *),
> the device_get_match_data() function need to return a (const void *).
> 
> While implementing the ACPI piece, I have to convert an unsigned long to
> (const void *) in ACPI code so that the APIs are compatible.
> 
Just one more remark: Do you need write access to the data the pointer
returned by device_get_match_data() or of_fwnode_get_match_data()
points to?
If not, the return type of those functions should be 'const void *'.


Lothar Waßmann


[PATCH] KVM: X86: Fix host dr6 miss restore

2017-12-08 Thread Wanpeng Li
From: Wanpeng Li 

Reported by syzkaller:

   WARNING: CPU: 0 PID: 12927 at arch/x86/kernel/traps.c:780 do_debug+0x222/0x250
   CPU: 0 PID: 12927 Comm: syz-executor Tainted: G   OE   4.15.0-rc2+ #16
   RIP: 0010:do_debug+0x222/0x250
   Call Trace:
<#DB>
debug+0x3e/0x70
   RIP: 0010:copy_user_enhanced_fast_string+0x10/0x20

_copy_from_user+0x5b/0x90
SyS_timer_create+0x33/0x80
entry_SYSCALL_64_fastpath+0x23/0x9a

The syzkaller testcase mmaps a buffer which is also used as the struct sigevent
parameter of timer_create(), and calls perf_event_open() to set a breakpoint
(BP) on that buffer. When the kernel implementation of timer_create() fetches
the struct sigevent parameter with copy_from_user(), rep movsb triggers the BP.
The testcase also sets the debug registers for the guest; however, KVM only
restores the host debug registers when there are active breakpoints. By adding
prints, I can sporadically observe that the dr6 single-step bit is set while
!hw_breakpoint_active() when running the testcase with heavy multithreading.
The do_debug() triggered by rep movsb then hits the warning when
(dr6 & DR_STEP && !user_mode(regs)).

This patch fixes it by restoring host dr6 unconditionally before preempt/irq
enable.

Reported-by: Dmitry Vyukov 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: David Hildenbrand 
Cc: Dmitry Vyukov 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c5d55c..a6370fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7065,6 +7065,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 */
if (hw_breakpoint_active())
hw_breakpoint_restore();
+   else
+   set_debugreg(current->thread.debugreg6, 6);
 
vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 
-- 
2.7.4



Re: [RFC PATCH] mm: kasan: suppress soft lockup in slub when !CONFIG_PREEMPT

2017-12-08 Thread Andrey Ryabinin
On 12/08/2017 11:26 AM, Dmitry Vyukov wrote:
> On Fri, Dec 8, 2017 at 12:40 AM, Matthew Wilcox  wrote:
>> On Fri, Dec 08, 2017 at 07:30:07AM +0800, Yang Shi wrote:
>>> When running stress test with KASAN enabled, the below softlockup may
>>> happen occasionally:
>>>
>>> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
>>> hardirqs last  enabled at (0): [<  (null)>]  (null)
>>> hardirqs last disabled at (0): [] copy_process.part.30+0x5c6/0x1f50
>>> softirqs last  enabled at (0): [] copy_process.part.30+0x5c6/0x1f50
>>> softirqs last disabled at (0): [<  (null)>]  (null)
>>
>>> Call Trace:
>>>  [] __slab_free+0x19c/0x270
>>>  [] ___cache_free+0xa6/0xb0
>>>  [] qlist_free_all+0x47/0x80
>>>  [] quarantine_reduce+0x159/0x190
>>>  [] kasan_kmalloc+0xaf/0xc0
>>>  [] kasan_slab_alloc+0x12/0x20
>>>  [] kmem_cache_alloc+0xfa/0x360
>>>  [] ? getname_flags+0x4f/0x1f0
>>>  [] getname_flags+0x4f/0x1f0
>>>  [] getname+0x12/0x20
>>>  [] do_sys_open+0xf9/0x210
>>>  [] SyS_open+0x1e/0x20
>>>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>>
>> This feels like papering over a problem.  KASAN only calls
>> quarantine_reduce() when it's allowed to block.  Presumably it has
>> millions of entries on the free list at this point.  I think the right
>> thing to do is for qlist_free_all() to call cond_resched() after freeing
>> every N items.
> 
> 
> Agree. Adding touch_softlockup_watchdog() to a random low-level
> function looks like a wrong thing to do.
> quarantine_reduce() already has this logic. Look at
> QUARANTINE_BATCHES. It's meant to do exactly this -- limit amount of
> work in quarantine_reduce() and in quarantine_remove_cache() to
> reasonably-sized batches. We could simply increase number of batches
> to make them smaller. But it would be good to understand what exactly
> happens in this case. Batches should be on a par with ~1MB. Why does freeing
> 1MB worth of objects (smallest of which is 32b) take 22 seconds?
> 

I think the problem here is that kernel 4.9.44-003.ali3000.alios7.x86_64.debug
doesn't have 64abdcb24351 ("kasan: eliminate long stalls during quarantine reduction").

We should probably ask for that commit to be included in stable, but it would be
good to hear a confirmation from Yang that it really helps.
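
For reference, the "yield every N items" idea suggested earlier in the thread
looks roughly like the userspace sketch below. It is not the KASAN quarantine
code and not what the commit named above does: sched_yield() stands in for
cond_resched(), and the qnode/qlist_* names are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node standing in for a quarantined object. */
struct qnode {
	struct qnode *next;
};

/* Build a singly linked list of n heap-allocated nodes (helper for demos). */
static struct qnode *qlist_build(size_t n)
{
	struct qnode *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct qnode *node = malloc(sizeof(*node));

		assert(node != NULL);
		node->next = head;
		head = node;
	}
	return head;
}

/*
 * Free every node, giving up the CPU after each full batch so that a very
 * long list cannot monopolise it. Returns how many times it yielded.
 */
static size_t qlist_free_all_batched(struct qnode *head, size_t batch)
{
	size_t freed = 0, yields = 0;

	assert(batch > 0);
	while (head) {
		struct qnode *next = head->next;

		free(head);
		head = next;
		if (++freed % batch == 0) {
			sched_yield();	/* stand-in for cond_resched() */
			yields++;
		}
	}
	return yields;
}
```

Whether this would help here depends on why a batch takes so long in the
first place, which is the open question raised above.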


Re: [PATCH] ARM: dts: sun8i-h3: Remove allwinner,leds-active-low for non internal PHY

2017-12-08 Thread Maxime Ripard
On Thu, Dec 07, 2017 at 07:21:02PM +0100, Corentin Labbe wrote:
> allwinner,leds-active-low has an effect only on boards which use the internal PHY.
> So this patch removes it from all boards which do not use the internal PHY.
> 
> Signed-off-by: Corentin Labbe 

Applied, thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




[PATCH] PCI: qcom: add missing supplies required for msm8996

2017-12-08 Thread srinivas . kandagatla
From: Srinivas Kandagatla 

This patch adds supplies that are required for msm8996. Two of them, vdda
and vdda-1p8, are analog supplies that go into the controller, and the
other two, vddpe and vddpe1, are supplies to PCIe endpoints.

Without these supplies, PCIe endpoints which require power supplies are
not enumerated at all, as nothing powers them up.

Signed-off-by: Srinivas Kandagatla 
---
 .../devicetree/bindings/pci/qcom,pcie.txt  | 16 +
 drivers/pci/dwc/pcie-qcom.c| 28 --
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/qcom,pcie.txt b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
index 3c9d321b3d3b..045102cb3e12 100644
--- a/Documentation/devicetree/bindings/pci/qcom,pcie.txt
+++ b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
@@ -179,6 +179,11 @@
Value type: 
Definition: A phandle to the core analog power supply
 
+- vdda-1p8-supply:
+   Usage: required for msm8996
+   Value type: 
+   Definition: A phandle to the 1.8v analog power supply
+
 - vdda_phy-supply:
Usage: required for ipq/apq8064
Value type: 
@@ -189,6 +194,15 @@
Value type: 
Definition: A phandle to the analog power supply for IC which generates
reference clock
+- vddpe-supply:
+   Usage: optional
+   Value type: 
+   Definition: A phandle to the PCIe endpoint power supply
+
+- vddpe1-supply:
+   Usage: optional
+   Value type: 
+   Definition: A phandle to the PCIe endpoint power supply 1
 
 - phys:
Usage: required for apq8084
@@ -205,6 +219,8 @@
Value type: 
Definition: List of phandle and GPIO specifier pairs. Should contain
- "perst-gpios" PCIe endpoint reset signal line
+   - "pe_en-gpios" PCIe endpoint enable signal line
+   - "pe_en1-gpios" PCIe endpoint enable1 signal line
- "wake-gpios"  PCIe endpoint wake signal line
 
 * Example for ipq/apq8064
diff --git a/drivers/pci/dwc/pcie-qcom.c b/drivers/pci/dwc/pcie-qcom.c
index 952a4fc4bf3c..01488f90da31 100644
--- a/drivers/pci/dwc/pcie-qcom.c
+++ b/drivers/pci/dwc/pcie-qcom.c
@@ -109,13 +109,15 @@ struct qcom_pcie_resources_1_0_0 {
struct reset_control *core;
struct regulator *vdda;
 };
-
+#define QCOM_PCIE_MAX_SUPPLY   4
 struct qcom_pcie_resources_2_3_2 {
struct clk *aux_clk;
struct clk *master_clk;
struct clk *slave_clk;
struct clk *cfg_clk;
struct clk *pipe_clk;
+   int num_supplies;
+   struct regulator_bulk_data supplies[QCOM_PCIE_MAX_SUPPLY];
 };
 
 struct qcom_pcie_resources_2_4_0 {
@@ -529,6 +531,17 @@ static int qcom_pcie_get_resources_2_3_2(struct qcom_pcie *pcie)
struct qcom_pcie_resources_2_3_2 *res = &pcie->res.v2_3_2;
struct dw_pcie *pci = pcie->pci;
struct device *dev = pci->dev;
+   int ret;
+
+   res->supplies[0].supply = "vdda";
+   res->supplies[1].supply = "vdda-1p8";
+   res->supplies[2].supply = "vddpe";
+   res->supplies[3].supply = "vddpe1";
+   res->num_supplies = QCOM_PCIE_MAX_SUPPLY;
+   ret = devm_regulator_bulk_get(dev, QCOM_PCIE_MAX_SUPPLY,
+ res->supplies);
+   if (ret)
+   return ret;
 
res->aux_clk = devm_clk_get(dev, "aux");
if (IS_ERR(res->aux_clk))
@@ -558,6 +571,8 @@ static void qcom_pcie_deinit_2_3_2(struct qcom_pcie *pcie)
clk_disable_unprepare(res->master_clk);
clk_disable_unprepare(res->cfg_clk);
clk_disable_unprepare(res->aux_clk);
+
+   regulator_bulk_disable(res->num_supplies, res->supplies);
 }
 
 static void qcom_pcie_post_deinit_2_3_2(struct qcom_pcie *pcie)
@@ -575,10 +590,16 @@ static int qcom_pcie_init_2_3_2(struct qcom_pcie *pcie)
u32 val;
int ret;
 
+   ret = regulator_bulk_enable(res->num_supplies, res->supplies);
+   if (ret < 0) {
+   dev_err(dev, "cannot enable regulators\n");
+   return ret;
+   }
+
ret = clk_prepare_enable(res->aux_clk);
if (ret) {
dev_err(dev, "cannot prepare/enable aux clock\n");
-   return ret;
+   goto err_aux_clk;
}
 
ret = clk_prepare_enable(res->cfg_clk);
@@ -629,6 +650,9 @@ static int qcom_pcie_init_2_3_2(struct qcom_pcie *pcie)
 err_cfg_clk:
clk_disable_unprepare(res->aux_clk);
 
+err_aux_clk:
+   regulator_bulk_disable(res->num_supplies, res->supplies);
+
return ret;
 }
 
-- 
2.15.0



[PATCH net] net: mvpp2: fix the RSS table entry offset

2017-12-08 Thread Antoine Tenart
The macro used to access or set an RSS table entry was using an offset
of 8, while it should use an offset of 0. This led to a wrongly configured
RSS table, as the right entries were not accessed.

Fixes: 1d7d15d79fb4 ("net: mvpp2: initialize the RSS tables")
Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvpp2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index fed2b2f909fc..634b2f41cc9e 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -85,7 +85,7 @@
 
 /* RSS Registers */
 #define MVPP22_RSS_INDEX   0x1500
-#define MVPP22_RSS_INDEX_TABLE_ENTRY(idx)  ((idx) << 8)
+#define MVPP22_RSS_INDEX_TABLE_ENTRY(idx)  (idx)
 #define MVPP22_RSS_INDEX_TABLE(idx)((idx) << 8)
 #define MVPP22_RSS_INDEX_QUEUE(idx)((idx) << 16)
 #define MVPP22_RSS_TABLE_ENTRY 0x1508
-- 
2.14.3
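
The effect of the one-line change can be seen in a small standalone sketch.
The shift values are copied from the diff above; the helper functions, the
shortened macro names and the assumption that the driver ORs the table and
entry fields into one index word are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values as in the diff; the _OLD variant is the pre-patch macro. */
#define RSS_INDEX_TABLE_ENTRY_OLD(idx)	((uint32_t)(idx) << 8)
#define RSS_INDEX_TABLE_ENTRY(idx)	((uint32_t)(idx))
#define RSS_INDEX_TABLE(idx)		((uint32_t)(idx) << 8)

/* Index word built with the old macro: both fields land on bit 8 and up. */
static uint32_t rss_index_old(unsigned int table, unsigned int entry)
{
	assert(table < 256 && entry < 256);
	return RSS_INDEX_TABLE(table) | RSS_INDEX_TABLE_ENTRY_OLD(entry);
}

/* Index word built with the fixed macro: entry in bits 0-7, table above. */
static uint32_t rss_index_fixed(unsigned int table, unsigned int entry)
{
	assert(table < 256 && entry < 256);
	return RSS_INDEX_TABLE(table) | RSS_INDEX_TABLE_ENTRY(entry);
}
```

With the old macro, table 1 / entry 3 and table 3 / entry 1 both produce
0x300, so distinct entries alias. With the fix they produce 0x103 and 0x301.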



Re: [PATCH v4 2/5] kasan/Makefile: Support LLVM style asan parameters.

2017-12-08 Thread Alexander Potapenko
On Mon, Dec 4, 2017 at 8:17 PM, Paul Lawrence  wrote:
> From: Andrey Ryabinin 
>
> LLVM doesn't understand GCC-style parameters ("--param asan-foo=bar"),
> thus we currently don't use inline/globals/stack instrumentation
> when building the kernel with clang.
>
> Add support for LLVM-style parameters ("-mllvm -asan-foo=bar") to
> enable all KASAN features.
>
> Signed-off-by: Andrey Ryabinin 
> Signed-off-by: Paul Lawrence 
> ---
>  scripts/Makefile.kasan | 29 ++---
>  1 file changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
> index 1ce7115aa499..d5a1a4b6d079 100644
> --- a/scripts/Makefile.kasan
> +++ b/scripts/Makefile.kasan
> @@ -10,10 +10,7 @@ KASAN_SHADOW_OFFSET ?= $(CONFIG_KASAN_SHADOW_OFFSET)
>
>  CFLAGS_KASAN_MINIMAL := -fsanitize=kernel-address
>
> -CFLAGS_KASAN := $(call cc-option, -fsanitize=kernel-address \
> -   -fasan-shadow-offset=$(KASAN_SHADOW_OFFSET) \
> -   --param asan-stack=1 --param asan-globals=1 \
> -   --param asan-instrumentation-with-call-threshold=$(call_threshold))
> +cc-param = $(call cc-option, -mllvm -$(1), $(call cc-option, --param $(1)))
>
>  ifeq ($(call cc-option, $(CFLAGS_KASAN_MINIMAL) -Werror),)
> ifneq ($(CONFIG_COMPILE_TEST),y)
> @@ -21,13 +18,23 @@ ifeq ($(call cc-option, $(CFLAGS_KASAN_MINIMAL) -Werror),)
>  -fsanitize=kernel-address is not supported by compiler)
> endif
>  else
> -ifeq ($(CFLAGS_KASAN),)
> -ifneq ($(CONFIG_COMPILE_TEST),y)
> -$(warning CONFIG_KASAN: compiler does not support all options.\
> -Trying minimal configuration)
> -endif
> -CFLAGS_KASAN := $(CFLAGS_KASAN_MINIMAL)
> -endif
> +   # -fasan-shadow-offset fails without -fsanitize
Would be nice to have a comment here explaining that
-fasan-shadow-offset is a GCC flag whereas -asan-mapping-offset is an
LLVM one.
> +   CFLAGS_KASAN_SHADOW := $(call cc-option, -fsanitize=kernel-address \
> +   -fasan-shadow-offset=$(KASAN_SHADOW_OFFSET), \
> +   $(call cc-option, -fsanitize=kernel-address \
> +   -mllvm -asan-mapping-offset=$(KASAN_SHADOW_OFFSET)))
> +
> +   ifeq ($(strip $(CFLAGS_KASAN_SHADOW)),)
> +  CFLAGS_KASAN := $(CFLAGS_KASAN_MINIMAL)
> +   else
> +  # Now add all the compiler specific options that are valid standalone
> +  CFLAGS_KASAN := $(CFLAGS_KASAN_SHADOW) \
> +   $(call cc-param,asan-globals=1) \
> +   $(call cc-param,asan-instrumentation-with-call-threshold=$(call_threshold)) \
> +   $(call cc-param,asan-stack=1) \
> +   $(call cc-param,asan-use-after-scope=1)
> +   endif
> +
>  endif
>
>  CFLAGS_KASAN += $(call cc-option, -fsanitize-address-use-after-scope)
> --
> 2.15.0.531.g2ccb3012c9-goog
>
Reviewed-by: Alexander Potapenko 


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


Re: [PATCH v8 4/6] clocksource: stm32: only use 32 bits timers

2017-12-08 Thread Benjamin Gaignard
2017-12-08 9:34 GMT+01:00 Daniel Lezcano :
> On 14/11/2017 09:52, Benjamin Gaignard wrote:
>> The clock driving counters is at 90MHz so the maximum period
>> for 16 bits counters is around 750 ms
>
> 728 us
>
>> which is a short period for a clocksource.
>
> Which clocksource are you talking about ?
>
>> For 32 bits counters this period is close to
>> 47 seconds which is more acceptable.
>>
>> This patch removes 16 bits counters support and makes sure that
>> they won't be probed anymore.
>
> Are we talking about clockevent or clocksource?
>
> Is this issue present today ? Or is it if we add the clocksource support
> ? We are talking about clocksource but we change the clockevent code.
>
> All this is very confusing.
>
> I have a rough idea of what is happening, but it is not up to me to
> decode and infer from the changes, you need to describe *clearly* the
> situation.
>
>  - What happens if we use a 16bits timer as a clockevent ?
>  - What happens if we use a 16bits timer as a clocksource ?
>  - Why is it preferable to remove the support of the 16bits timers
> instead of downgrading them with the rating ?

Up to this patch it is only about clockevent; clocksource code is
introduced in patch 5.
In both cases, 16 bits counters have too short a period (728us) and
can't be used, so downgrading the rating is not a solution.

I will change the wording in v9
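
The two periods quoted in this thread follow directly from the counter width
and the 90MHz clock. A small sketch of the arithmetic, assuming a prescaler
of 1 (the function name is made up for the example):

```c
#include <assert.h>

/*
 * Time for a free-running counter of the given width to wrap once when it
 * is clocked at clk_hz with no prescaling: 2^bits / clk_hz seconds.
 */
static double counter_period_seconds(unsigned int bits, double clk_hz)
{
	assert(bits >= 1 && bits <= 32);
	assert(clk_hz > 0.0);
	return (double)(1ULL << bits) / clk_hz;
}
```

For 16 bits this gives 65536 / 90e6, about 728 microseconds, and for 32 bits
4294967296 / 90e6, about 47.7 seconds, matching the figures above.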

>
>> Signed-off-by: Benjamin Gaignard 
>
>> ---
>>  drivers/clocksource/timer-stm32.c | 26 --
>>  1 file changed, 12 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/clocksource/timer-stm32.c b/drivers/clocksource/timer-stm32.c
>> index ae41a19..8173bcf 100644
>> --- a/drivers/clocksource/timer-stm32.c
>> +++ b/drivers/clocksource/timer-stm32.c
>> @@ -83,9 +83,9 @@ static irqreturn_t stm32_clock_event_handler(int irq, void *dev_id)
>>  static int __init stm32_clockevent_init(struct device_node *node)
>>  {
>>   struct reset_control *rstc;
>> - unsigned long max_delta;
>> - int ret, bits, prescaler = 1;
>> + unsigned long max_arr;
>>   struct timer_of *to;
>> + int ret;
>>
>>   to = kzalloc(sizeof(*to), GFP_KERNEL);
>>   if (!to)
>> @@ -115,29 +115,27 @@ static int __init stm32_clockevent_init(struct device_node *node)
>>
>>   /* Detect whether the timer is 16 or 32 bits */
>>   writel_relaxed(~0U, timer_of_base(to) + TIM_ARR);
>> - max_delta = readl_relaxed(timer_of_base(to) + TIM_ARR);
>> - if (max_delta == ~0U) {
>> - prescaler = 1;
>> - bits = 32;
>> - } else {
>> - prescaler = 1024;
>> - bits = 16;
>> + max_arr = readl_relaxed(timer_of_base(to) + TIM_ARR);
>> + if (max_arr != ~0U) {
>> + pr_err("32 bits timer is needed\n");
>> + ret = -EINVAL;
>> + goto deinit;
>>   }
>
> Wrap this in a function:
>
> static bool stm32_timer_is_32bits(struct timer_of *to)
> {
> return readl_relaxed(timer_of_base(to) + TIM_ARR) == ~0UL;
> }
>
> Then clearly inform the user.
>
> if (!stm32_timer_is_32bits(to)) {
> pr_warn("Timer %pOF is a 16 bits timer\n", node);
> /* abort the registration or downgrade the timer's rating */
> }

Ok I will change that in v9
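
For readers following along, the width probe discussed above can be modelled outside the kernel. This is only a minimal sketch, assuming a fake register that keeps as many bits as the hardware implements; `fake_timer`, `fake_arr_write`, `fake_arr_read` and `timer_is_32bits` are hypothetical names and are not part of timer-stm32.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a timer whose auto-reload register (ARR)
 * implements only `width` bits; the unimplemented bits read back as zero.
 */
struct fake_timer {
	unsigned int width;	/* 16 or 32 */
	uint32_t arr;
};

static void fake_arr_write(struct fake_timer *t, uint32_t val)
{
	uint32_t mask;

	assert(t->width == 16 || t->width == 32);
	mask = (t->width == 32) ? UINT32_MAX : ((1u << t->width) - 1);
	t->arr = val & mask;
}

static uint32_t fake_arr_read(const struct fake_timer *t)
{
	return t->arr;
}

/* Write all-ones and read back: only a 32-bit register returns all-ones. */
static bool timer_is_32bits(struct fake_timer *t)
{
	fake_arr_write(t, UINT32_MAX);
	return fake_arr_read(t) == UINT32_MAX;
}
```

Comparing against a fixed-width all-ones value rather than `~0UL` keeps the check independent of the width of `long`. In this model a 16-bit timer reads back 0xffff and is rejected.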

>
>> +
>>   writel_relaxed(0, timer_of_base(to) + TIM_ARR);
>>
>> - writel_relaxed(prescaler - 1, timer_of_base(to) + TIM_PSC);
>> + writel_relaxed(0, timer_of_base(to) + TIM_PSC);
>>   writel_relaxed(TIM_EGR_UG, timer_of_base(to) + TIM_EGR);
>>   writel_relaxed(TIM_DIER_UIE, timer_of_base(to) + TIM_DIER);
>>   writel_relaxed(0, timer_of_base(to) + TIM_SR);
>>
>>   clockevents_config_and_register(&to->clkevt,
>> - timer_of_period(to), MIN_DELTA, 
>> max_delta);
>> -
>> - pr_info("%pOF: STM32 clockevent driver initialized (%d bits)\n",
>> - node, bits);
>> + timer_of_period(to), MIN_DELTA, ~0U);
>>
>>   return 0;
>>
>> +deinit:
>> + timer_of_exit(to);
>
> Fix this please (timer_of_cleanup).
>
> In the future, make sure the patches are git-bisect safe.
>
>
>
> --
>   Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:   Facebook |
>  Twitter |
>  Blog


Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap

2017-12-08 Thread David Rientjes
On Thu, 7 Dec 2017, David Rientjes wrote:

> I'm backporting and testing the following patch against Linus's tree.  To 
> clarify an earlier point, we don't actually have any change from upstream 
> code that allows for free_pgtables() before the 
> set_bit(MMF_OOM_SKIP);down_write();up_write() cycle.
> 
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -66,6 +66,15 @@ static inline bool tsk_is_oom_victim(struct task_struct * 
> tsk)
>   return tsk->signal->oom_mm;
>  }
>  
> +/*
> + * Use this helper if tsk->mm != mm and the victim mm needs a special
> + * handling. This is guaranteed to stay true after once set.
> + */
> +static inline bool mm_is_oom_victim(struct mm_struct *mm)
> +{
> + return test_bit(MMF_OOM_VICTIM, &mm->flags);
> +}
> +
>  /*
>   * Checks whether a page fault on the given mm is still reliable.
>   * This is no longer true if the oom reaper started to reap the
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -71,6 +71,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>  #define MMF_HUGE_ZERO_PAGE   23  /* mm has ever used the global huge 
> zero page */
>  #define MMF_DISABLE_THP  24  /* disable THP for all VMAs */
>  #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
> +#define MMF_OOM_VICTIM   25  /* mm is the oom victim */
>  
>  #define MMF_INIT_MASK(MMF_DUMPABLE_MASK | 
> MMF_DUMP_FILTER_MASK |\
>MMF_DISABLE_THP_MASK)
> diff --git a/mm/mmap.c b/mm/mmap.c
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3019,20 +3019,20 @@ void exit_mmap(struct mm_struct *mm)
>   /* Use -1 here to ensure all VMAs in the mm are unmapped */
>   unmap_vmas(&tlb, vma, 0, -1);
>  
> - set_bit(MMF_OOM_SKIP, &mm->flags);
> - if (unlikely(tsk_is_oom_victim(current))) {
> + if (unlikely(mm_is_oom_victim(mm))) {
>   /*
>* Wait for oom_reap_task() to stop working on this
>* mm. Because MMF_OOM_SKIP is already set before
>* calling down_read(), oom_reap_task() will not run
>* on this "mm" post up_write().
>*
> -  * tsk_is_oom_victim() cannot be set from under us
> +  * mm_is_oom_victim() cannot be set from under us
>* either because current->mm is already set to NULL
>* under task_lock before calling mmput and oom_mm is
>* set not NULL by the OOM killer only if current->mm
>* is found not NULL while holding the task_lock.
>*/
> + set_bit(MMF_OOM_SKIP, &mm->flags);
>   down_write(&mm->mmap_sem);
>   up_write(&mm->mmap_sem);
>   }
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -683,8 +683,10 @@ static void mark_oom_victim(struct task_struct *tsk)
>   return;
>  
>   /* oom_mm is bound to the signal struct life time. */
> - if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
> + if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
>   mmgrab(tsk->signal->oom_mm);
> + set_bit(MMF_OOM_VICTIM, &mm->flags);
> + }
>  
>   /*
>* Make sure that the task is woken up from uninterruptible sleep
> 

This passes all functional testing that I have and I can create a 
synthetic testcase that can trigger at least MMF_OOM_VICTIM getting set 
while oom_reaper is still working on an mm that this prevents, so feel 
free to add an

Acked-by: David Rientjes 

with a variant of your previous changelogs.  Thanks!

I think it would be appropriate to cc stable for 4.14 and add a

Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run 
concurrently")

if nobody disagrees, which I think you may have already done on a previous 
iteration.

We can still discuss if there are any VM_LOCKED subtleties in this 
thread, but I have no evidence that it is responsible for any issues.


Re: WARNING in x86_emulate_insn

2017-12-08 Thread Wanpeng Li
2017-12-08 16:28 GMT+08:00 Tianyu Lan :
> Hi Jim&Wanpeng:
>  Thanks for your help.
>
> 2017-12-08 5:25 GMT+08:00 Jim Mattson :
>> Try disabling the module parameter, "unrestricted_guest." Make sure
>> that the module parameter, "emulate_invalid_guest_state" is enabled.
>> This combination allows userspace to feed invalid guest state into the
>> in-kernel emulator.
>
> Yes, you are right. I need to disable unrestricted_guest to reproduce the 
> issue.

I can observe ctxt->exception.vector == 0xff which triggers Dmitry's
report. Did you figure out the reason?

Regards,
Wanpeng Li

>
> I find this is a pop instruction emulation issue. According to "SDM VOL2,
> chapter INSTRUCTION
> SET REFERENCE. POP—Pop a Value from the Stack"
>
> Protected Mode Exceptions
> #GP(0) If attempt is made to load SS register with NULL segment selector.
>
> This test case hits it but current code doesn't check such case.
> The following patch can fix the issue.
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index abe74f7..e2ac5cc 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1844,6 +1844,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
> int rc;
> struct segmented_address addr;
>
> +   if ( !get_segment_selector(ctxt, VCPU_SREG_SS))
> +   return emulate_gp(ctxt, 0);
> +
> addr.ea = reg_read(ctxt, VCPU_REGS_RSP) & stack_mask(ctxt);
> addr.seg = VCPU_SREG_SS;
> rc = segmented_read(ctxt, addr, dest, len);


Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray

2017-12-08 Thread Byungchul Park

On 12/8/2017 4:25 PM, Dave Chinner wrote:

On Fri, Dec 08, 2017 at 01:45:52PM +0900, Byungchul Park wrote:

On Fri, Dec 08, 2017 at 09:22:16AM +1100, Dave Chinner wrote:

On Thu, Dec 07, 2017 at 11:06:34AM -0500, Theodore Ts'o wrote:

On Wed, Dec 06, 2017 at 06:06:48AM -0800, Matthew Wilcox wrote:

Unfortunately for you, I don't find arguments along the lines of
"lockdep will save us" at all convincing.  lockdep already throws
too many false positives to be useful as a tool that reliably and
accurately points out rare, exciting, complex, intricate locking
problems.


But it does reliably and accurately point out "dude, you forgot to take
the lock".  It's caught a number of real problems in my own testing that
you never got to see.


The problem is that if it has too many false positives --- and it's
gotten *way* worse with the completion callback "feature", people will
just stop using Lockdep as being too annoying and a waste of developer
time when trying to figure what is a legitimate locking bug versus
lockdep getting confused.

I can't even disable the new Lockdep feature which is throwing
lots of new false positives --- it's just all or nothing.

Dave has just said he's already stopped using Lockdep, as a result.


This is completely OT, but FYI I stopped using lockdep a long time
ago.  We've spent orders of magnitude more time and effort to shut
up lockdep false positives in the XFS code than we ever have on
locking problems that lockdep has uncovered. And still lockdep
throws too many false positives on XFS workloads to be useful to me.

But it's more than that: I understand just how much lockdep *doesn't
check* and that means *I know I can't rely on lockdep* for potential
deadlock detection. e.g.  it doesn't cover semaphores, which means


Hello,

I'm careful in saying the following since you seem not to think well of
crossrelease and even lockdep. Now that cross-release has been
introduced, semaphores can be covered as you might know. Actually, all
general waiters can.


And all it will do is create a whole bunch more work for us XFS guys
to shut up all the false positive crap that falls out from it
because the locking model we have is far more complex than any of
the lockdep developers thought was necessary to support, just like
happened with the XFS inode annotations all those years ago.

e.g. nobody has ever bothered to ask us what is needed to describe
XFS's semaphore locking model.  If you did that, you'd know that we
nest *thousands* of locked semaphores in completely random lock
order during metadata buffer writeback. And that this lock order
does not reflect the actual locking order rules we have for locking
buffers during transactions.

Oh, and you'd also know that a semaphore's lock order and context
can change multiple times during the life time of the buffer.  Say
we free a block and then reallocate it as something else before it is
reclaimed - that buffer now might have a different lock order. Or
maybe we promote a buffer to be a root btree block as a result of a
join - it's now the first buffer in a lock run, rather than a child.
Or we split a tree, and the root is now a node and so no longer is
the first buffer in a lock run. Or that we walk sideways along the
leaf nodes siblings during searches.  IOWs, there is no well defined
static lock ordering at all for buffers - and therefore semaphores -
in XFS at all.

And knowing that, you wouldn't simply mention that lockdep can
support semaphores now as though that is necessary to "make it work"
for XFS.  It's going to be much simpler for us to just turn off
lockdep and ignore whatever crap it sends our way than it is to
spend unplanned weeks of our time to try to make lockdep sorta work
again. Sure, we might get there in the end, but it's likely to take
months, if not years like it did with the XFS inode annotations.


it has zero coverage of the entire XFS metadata buffer subsystem and
the complex locking orders we have for metadata updates.

Put simply: lockdep doesn't provide me with any benefit, so I don't
use it...


Sad..


I don't think you understand. I'll try to explain.

The lockdep infrastructure by itself doesn't make lockdep a useful
tool - it mostly generates false positives because it has no
concept of locking models that don't match its internal tracking
assumptions and/or limitations.

That means if we can't suppress the false positives, then lockdep is
going to be too noisy to find real problems.  It's taken the XFS
developers months of work over the past 7-8 years to suppress all
the *common* false positives that lockdep throws on XFS. And despite
all that work, there are still too many false positives occurring
because we can't easily suppress them with annotations. IOWs, the
signal to noise ratio is still too low for lockdep to find real
problems.

That's why lockdep isn't useful to me - the noise floor is too high,
and the effort to lower the noise floor further is too great.

This is important, because cross-r

Re: [PATCH v8 3/6] clocksource: stm32: increase min delta value

2017-12-08 Thread Daniel Lezcano
On 14/11/2017 09:52, Benjamin Gaignard wrote:
> The CPU is a CortexM4 @ 200MHz and the clocks driving
> the timers are at 90MHz. With a min delta of 1 you could
> have an interrupt every 0.01 ms, which is really too much.
> Increasing it to 0x60 gives the CPU more time (around 1 ms)
> to handle the interrupt.
> 
> Signed-off-by: Benjamin Gaignard 
> ---
>  drivers/clocksource/timer-stm32.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/clocksource/timer-stm32.c 
> b/drivers/clocksource/timer-stm32.c
> index fc61fd1..ae41a19 100644
> --- a/drivers/clocksource/timer-stm32.c
> +++ b/drivers/clocksource/timer-stm32.c
> @@ -36,6 +36,8 @@
>  
>  #define TIM_EGR_UG   BIT(0)
>  
> +#define MIN_DELTA0x60

Explain why 0x60 is a good value.
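
For reference, the figures in the changelog can be checked with a little arithmetic. This is a minimal sketch, assuming the 90 MHz timer clock quoted above and the two prescaler values the driver programs at this point in the series (1 for 32-bit timers, 1024 for 16-bit ones); `ticks_to_ns` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a number of timer ticks to nanoseconds, rounding down.
 * One tick lasts prescaler / clk_hz seconds. The intermediate product
 * is assumed to fit in 64 bits, which holds for the small values used here.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t clk_hz, uint32_t prescaler)
{
	assert(clk_hz > 0 && prescaler > 0);
	return ticks * prescaler * 1000000000ULL / clk_hz;
}
```

Under these assumptions a delta of 1 is roughly 11 ns without prescaling and roughly 11 us with the 1024 prescaler, while 0x60 is roughly 1.07 us and 1.09 ms respectively. The "0.01 ms" and "around 1 ms" figures therefore appear to describe the prescaled 16-bit case.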

>  static int stm32_clock_event_shutdown(struct clock_event_device *evt)
>  {
>   struct timer_of *to = to_timer_of(evt);
> @@ -129,7 +131,7 @@ static int __init stm32_clockevent_init(struct 
> device_node *node)
>   writel_relaxed(0, timer_of_base(to) + TIM_SR);
>  
>   clockevents_config_and_register(&to->clkevt,
> - timer_of_period(to), 0x1, max_delta);
> + timer_of_period(to), MIN_DELTA, 
> max_delta);
>  
>   pr_info("%pOF: STM32 clockevent driver initialized (%d bits)\n",
>   node, bits);
> 


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH v8 4/6] clocksource: stm32: only use 32 bits timers

2017-12-08 Thread Daniel Lezcano
On 08/12/2017 10:25, Benjamin Gaignard wrote:
> 2017-12-08 9:34 GMT+01:00 Daniel Lezcano :
>> On 14/11/2017 09:52, Benjamin Gaignard wrote:
>>> The clock driving counters is at 90MHz so the maximum period
>>> for 16 bits counters is around 750 ms
>>
>> 728 us
>>
>>> which is a short period for a clocksource.
>>
>> Which clocksource are you talking about ?
>>
>>> For 32 bits counters this period is close to
>>> 47 seconds which is more acceptable.
>>>
>>> This patch removes 16 bits counters support and makes sure that
>>> they won't be probed anymore.
>>
>> Are we talking about clockevent or clocksource?
>>
>> Is this issue present today ? Or is it if we add the clocksource support
>> ? We are talking about clocksource but we change the clockevent code.
>>
>> All this is very confusing.
>>
>> I have a rough idea of what is happening, but it is not up to me to
>> decode and infer from the changes, you need to describe *clearly* the
>> situation.
>>
>>  - What happens if we use a 16bits timer as a clockevent ?
>>  - What happens if we use a 16bits timer as a clocksource ?
>>  - Why is it preferable to remove the support of the 16bits timers
>> instead of downgrading them with the rating ?
> 
> Up to this patch it is only about clockevent, clocksource code is
> introduced in patch 5.
> In both cases a 16-bit counter has too short a period (728us) and
> can't be used, so downgrading the rating is not a solution.

You have to explain why the period is too short. I would be happy to see
an example of the issues the user is facing.
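
As a starting point for that explanation, the wrap periods mentioned in this thread can be reproduced with a short calculation. This is a minimal sketch, assuming a free-running counter clocked at 90 MHz; `counter_wrap_us` is a hypothetical helper introduced only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in microseconds for a free-running counter of `bits` width to wrap,
 * rounded down. The prescaler is capped so that the intermediate product
 * cannot overflow 64 bits.
 */
static uint64_t counter_wrap_us(unsigned int bits, uint64_t clk_hz,
				uint32_t prescaler)
{
	uint64_t counts;

	assert(bits >= 1 && bits <= 32);
	assert(clk_hz > 0);
	assert(prescaler >= 1 && prescaler <= 4096);

	counts = 1ULL << bits;	/* number of counter states before a wrap */

	return counts * prescaler * 1000000ULL / clk_hz;
}
```

With these inputs a 16-bit counter wraps after about 728 us unprescaled, or about 746 ms with a prescaler of 1024, and a 32-bit counter wraps after about 47.7 s. That is consistent with the 728 us, "around 750 ms" and 47 seconds figures quoted above, the 750 ms one apparently with the prescaler applied.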



> I will change the wording in v9
> 
>>
>>> Signed-off-by: Benjamin Gaignard 
>>
>>> ---
>>>  drivers/clocksource/timer-stm32.c | 26 --
>>>  1 file changed, 12 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/clocksource/timer-stm32.c 
>>> b/drivers/clocksource/timer-stm32.c
>>> index ae41a19..8173bcf 100644
>>> --- a/drivers/clocksource/timer-stm32.c
>>> +++ b/drivers/clocksource/timer-stm32.c
>>> @@ -83,9 +83,9 @@ static irqreturn_t stm32_clock_event_handler(int irq, 
>>> void *dev_id)
>>>  static int __init stm32_clockevent_init(struct device_node *node)
>>>  {
>>>   struct reset_control *rstc;
>>> - unsigned long max_delta;
>>> - int ret, bits, prescaler = 1;
>>> + unsigned long max_arr;
>>>   struct timer_of *to;
>>> + int ret;
>>>
>>>   to = kzalloc(sizeof(*to), GFP_KERNEL);
>>>   if (!to)
>>> @@ -115,29 +115,27 @@ static int __init stm32_clockevent_init(struct 
>>> device_node *node)
>>>
>>>   /* Detect whether the timer is 16 or 32 bits */
>>>   writel_relaxed(~0U, timer_of_base(to) + TIM_ARR);
>>> - max_delta = readl_relaxed(timer_of_base(to) + TIM_ARR);
>>> - if (max_delta == ~0U) {
>>> - prescaler = 1;
>>> - bits = 32;
>>> - } else {
>>> - prescaler = 1024;
>>> - bits = 16;
>>> + max_arr = readl_relaxed(timer_of_base(to) + TIM_ARR);
>>> + if (max_arr != ~0U) {
>>> + pr_err("32 bits timer is needed\n");
>>> + ret = -EINVAL;
>>> + goto deinit;
>>>   }
>>
>> Wrap this in a function:
>>
>> static bool stm32_timer_is_32bits(struct timer_of *to)
>> {
>> return readl_relaxed(timer_of_base(to) + TIM_ARR) == ~0UL;
>> }
>>
>> Then clearly inform the user.
>>
>> if (!stm32_timer_is_32bits(to)) {
>> pr_warn("Timer %pOF is a 16 bits timer\n", node);
>> /* abort the registration or downgrade the timer's rating */
>> }
> 
> Ok I will change that in v9
> 
>>
>>> +
>>>   writel_relaxed(0, timer_of_base(to) + TIM_ARR);
>>>
>>> - writel_relaxed(prescaler - 1, timer_of_base(to) + TIM_PSC);
>>> + writel_relaxed(0, timer_of_base(to) + TIM_PSC);
>>>   writel_relaxed(TIM_EGR_UG, timer_of_base(to) + TIM_EGR);
>>>   writel_relaxed(TIM_DIER_UIE, timer_of_base(to) + TIM_DIER);
>>>   writel_relaxed(0, timer_of_base(to) + TIM_SR);
>>>
>>>   clockevents_config_and_register(&to->clkevt,
>>> - timer_of_period(to), MIN_DELTA, 
>>> max_delta);
>>> -
>>> - pr_info("%pOF: STM32 clockevent driver initialized (%d bits)\n",
>>> - node, bits);
>>> + timer_of_period(to), MIN_DELTA, ~0U);
>>>
>>>   return 0;
>>>
>>> +deinit:
>>> + timer_of_exit(to);
>>
>> Fix this please (timer_of_cleanup).
>>
>> In the future, make sure the patches are git-bisect safe.


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



[PATCH v4 00/12] Krait clocks + Krait CPUfreq

2017-12-08 Thread Sricharan R
Mostly a resend of the v3 posted by Stephen quite some time back [1],
except for a few changes.
  Based on reading some feedback from the list,
  * Dropped the patch "clk: Add safe switch hook" from v3 [2].
    This is now taken care of by patch#10 in this series, only for Krait.
  * Dropped the patch "clk: Avoid sending high rates to downstream
    clocks during set_rate" from v3 [3].
  * Rebased on top of clk-next.
  * Dropped the DT update from the series. Will send it separately.
  * Now with cpufreq-dt+opp supporting voltage scaling, registering the
    krait cpu supplies in DT should be sufficient. But one issue is that
    the qcom-cpufreq driver reads the efuse, registers the opp data based
    on that, and then registers the cpufreq-dt device. So when the
    cpufreq-dt driver probes and registers the regulator to the OPP
    framework, it expects that the opp data for the device has not been
    registered before the regulator. Will send an RFC patch removing that
    check, to find out the right way of doing it.

These patches provide cpufreq scaling on devices with Krait CPUs.
In Krait CPU designs there's one PLL and two muxes per CPU, allowing
us to switch CPU frequencies independently.

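                      secondary
 +-----+                  +
 | QSB |-------+----------|\
 +-----+       |          | |--+
               |   +------|/   |
               |   |      +    |
 +-----+       |   |           |
 | PLL |-------|---+           |      primary
 +-----+       |   |           |         +
               |   |           +---------|\      +------+
 +-------+     |   |                     | \     |      |
 | HFPLL |-----|---|---+-----------------|  |----| CPU0 |
 +-------+     |   |   |                 |  |    |      |
               |   |   |   +-----+       | /     +------+
               |   |   +---| / 2 |-------|/
               |   |       +-----+       +
               |   |      secondary
               |   |          +
               +---|----------|\
                   |          | |--+
                   +----------|/   |      primary
                              +    |         +
                                   +---------|\      +------+
 +-------+                                   | \     |      |
 | HFPLL |-------------+---------------------|  |----| CPU1 |
 +-------+             |                     |  |    |      |
                       |   +-----+           | /     +------+
                       +---| / 2 |-----------|/
                           +-----+           +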

To support this in the common clock framework we model the muxes,
dividers, and PLLs as different clocks. CPUfreq only interacts
with the primary mux (farthest right in the diagram). When CPUfreq
sets a rate, the mux code finds the best parent that can provide the rate.
Due to the design, QSB and the top PLL are always a fixed rate and thus
only support one frequency each. These sources provide the lowest
frequencies for the CPUs. The HFPLLs are where we can make the CPU go
faster (GHz range). Sometimes we need to run the HFPLL twice as
fast and divide it by two to get a particular frequency.

When switching rates we can't leave the CPU clocked by the HFPLL because
we need to turn off the output of the PLL when changing its frequency.
This means we have to switch over to the secondary mux and use one of the
fixed sources. This is why we need something like the safe parent patch.
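
The switching sequence described above can be sketched in plain C. This is a simplified model under stated assumptions, not the code from this series: `cpu_clk`, `cpu_rate` and `set_hfpll_rate_safely` are hypothetical names, the fixed source behind the secondary mux is assumed to have a non-zero rate, and waiting for PLL lock is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one CPU's clock path: a primary mux that selects
 * either the HFPLL or the secondary mux (fed by a fixed-rate source).
 */
enum primary_sel { SEL_HFPLL, SEL_SECONDARY };

struct cpu_clk {
	enum primary_sel sel;
	bool hfpll_enabled;
	unsigned long hfpll_rate;	/* Hz */
	unsigned long safe_rate;	/* Hz, fixed source behind the secondary mux */
};

/* Rate the CPU currently sees; 0 means it is fed by a gated PLL. */
static unsigned long cpu_rate(const struct cpu_clk *c)
{
	if (c->sel == SEL_SECONDARY)
		return c->safe_rate;
	return c->hfpll_enabled ? c->hfpll_rate : 0;
}

/* Reprogram the HFPLL without ever leaving the CPU on a gated clock. */
static void set_hfpll_rate_safely(struct cpu_clk *c, unsigned long new_rate)
{
	c->sel = SEL_SECONDARY;		/* 1. park the CPU on the fixed source */
	assert(cpu_rate(c) != 0);

	c->hfpll_enabled = false;	/* 2. gate the PLL output */
	c->hfpll_rate = new_rate;	/* 3. reprogram the PLL */
	c->hfpll_enabled = true;	/* 4. ungate it (lock wait omitted) */

	c->sel = SEL_HFPLL;		/* 5. switch the CPU back to the HFPLL */
}
```

The point of the ordering is that the CPU is moved to the fixed source before the PLL output is gated, so it always has a running clock. Any rates passed in are illustrative only.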

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332607.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332615.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332608.html

Sricharan R (2):
  clk: qcom: Add safe switch hook for krait mux clocks
  cpufreq: dt: Reintroduce independent_clocks platform data

Stephen Boyd (10):
  ARM: Add Krait L2 register accessor functions
  clk: mux: Split out register accessors for reuse
  clk: qcom: Add support for High-Frequency PLLs (HFPLLs)
  clk: qcom: Add HFPLL driver
  clk: qcom: Add MSM8960/APQ8064's HFPLLs
  clk: qcom: Add IPQ806X's HFPLLs
  clk: qcom: Add support for Krait clocks
  clk: qcom: Add KPSS ACC/GCC driver
  clk: qcom: Add Krait clock controller driver
  cpufreq: Add module to register cpufreq on Krait CPUs

 .../devicetree/bindings/arm/msm/qcom,kpss-acc.txt  |   7 +
 .../devicetree/bindings/arm/msm/qcom,kpss-gcc.txt  |  28 ++
 .../devicetree/bindings/arm/msm/qcom,pvs.txt   |  38 ++
 .../devicetree/bindings/clock/qcom,hfpll.txt   |  40 ++
 .../devicetree/bindings/clock/qcom,krait-cc.txt|  22 ++
 arch/arm/common/Kconfig|   3 +
 arch/arm/common/Makefile   |   1 +
 arch/arm/common/krait-l2-accessors.c   |  58 +++
 arch/arm/include/asm/krait-l2-accessors.h  |  20 +
 drivers/clk/clk

[PATCH v4 01/12] ARM: Add Krait L2 register accessor functions

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

Krait CPUs have a handful of L2 cache controller registers that
live behind a cp15 based indirection register. First you program
the indirection register (l2cpselr) to point the L2 'window'
register (l2cpdr) at what you want to read/write.  Then you
read/write the 'window' register to do what you want. The
l2cpselr register is not banked per-cpu so we must lock around
accesses to it to prevent other CPUs from re-pointing l2cpdr
underneath us.

Cc: Mark Rutland 
Cc: Russell King 
Cc: Courtney Cavin 
Signed-off-by: Stephen Boyd 
---
 arch/arm/common/Kconfig   |  3 ++
 arch/arm/common/Makefile  |  1 +
 arch/arm/common/krait-l2-accessors.c  | 58 +++
 arch/arm/include/asm/krait-l2-accessors.h | 20 +++
 4 files changed, 82 insertions(+)
 create mode 100644 arch/arm/common/krait-l2-accessors.c
 create mode 100644 arch/arm/include/asm/krait-l2-accessors.h

diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig
index e5ad070..c8e1986 100644
--- a/arch/arm/common/Kconfig
+++ b/arch/arm/common/Kconfig
@@ -7,6 +7,9 @@ config DMABOUNCE
bool
select ZONE_DMA
 
+config KRAIT_L2_ACCESSORS
+   bool
+
 config SHARP_LOCOMO
bool
 
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index 70b4a14..eec6cd1 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -7,6 +7,7 @@ obj-y   += firmware.o
 
 obj-$(CONFIG_SA)   += sa.o
 obj-$(CONFIG_DMABOUNCE)+= dmabounce.o
+obj-$(CONFIG_KRAIT_L2_ACCESSORS) += krait-l2-accessors.o
 obj-$(CONFIG_SHARP_LOCOMO) += locomo.o
 obj-$(CONFIG_SHARP_PARAM)  += sharpsl_param.o
 obj-$(CONFIG_SHARP_SCOOP)  += scoop.o
diff --git a/arch/arm/common/krait-l2-accessors.c 
b/arch/arm/common/krait-l2-accessors.c
new file mode 100644
index 000..5d514bb
--- /dev/null
+++ b/arch/arm/common/krait-l2-accessors.c
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2011-2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+static DEFINE_RAW_SPINLOCK(krait_l2_lock);
+
+void krait_set_l2_indirect_reg(u32 addr, u32 val)
+{
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&krait_l2_lock, flags);
+   /*
+* Select the L2 window by poking l2cpselr, then write to the window
+* via l2cpdr.
+*/
+   asm volatile ("mcr p15, 3, %0, c15, c0, 6 @ l2cpselr" : : "r" (addr));
+   isb();
+   asm volatile ("mcr p15, 3, %0, c15, c0, 7 @ l2cpdr" : : "r" (val));
+   isb();
+
+   raw_spin_unlock_irqrestore(&krait_l2_lock, flags);
+}
+EXPORT_SYMBOL(krait_set_l2_indirect_reg);
+
+u32 krait_get_l2_indirect_reg(u32 addr)
+{
+   u32 val;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&krait_l2_lock, flags);
+   /*
+* Select the L2 window by poking l2cpselr, then read from the window
+* via l2cpdr.
+*/
+   asm volatile ("mcr p15, 3, %0, c15, c0, 6 @ l2cpselr" : : "r" (addr));
+   isb();
+   asm volatile ("mrc p15, 3, %0, c15, c0, 7 @ l2cpdr" : "=r" (val));
+
+   raw_spin_unlock_irqrestore(&krait_l2_lock, flags);
+
+   return val;
+}
+EXPORT_SYMBOL(krait_get_l2_indirect_reg);
diff --git a/arch/arm/include/asm/krait-l2-accessors.h 
b/arch/arm/include/asm/krait-l2-accessors.h
new file mode 100644
index 000..48fe552
--- /dev/null
+++ b/arch/arm/include/asm/krait-l2-accessors.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (c) 2011-2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __ASMARM_KRAIT_L2_ACCESSORS_H
+#define __ASMARM_KRAIT_L2_ACCESSORS_H
+
+extern void krait_set_l2_indirect_reg(u32 addr, u32 val);
+extern u32 krait_get_l2_indirect_reg(u32 addr);
+
+#endif
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v4 05/12] clk: qcom: Add MSM8960/APQ8064's HFPLLs

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

Describe the HFPLLs present on MSM8960 and APQ8064 devices.

Signed-off-by: Stephen Boyd 
---
 drivers/clk/qcom/gcc-msm8960.c   | 172 +++
 include/dt-bindings/clock/qcom,gcc-msm8960.h |   2 +
 2 files changed, 174 insertions(+)

diff --git a/drivers/clk/qcom/gcc-msm8960.c b/drivers/clk/qcom/gcc-msm8960.c
index eb551c7..809f16a 100644
--- a/drivers/clk/qcom/gcc-msm8960.c
+++ b/drivers/clk/qcom/gcc-msm8960.c
@@ -30,6 +30,7 @@
 #include "clk-pll.h"
 #include "clk-rcg.h"
 #include "clk-branch.h"
+#include "clk-hfpll.h"
 #include "reset.h"
 
 static struct clk_pll pll3 = {
@@ -86,6 +87,164 @@
},
 };
 
+static struct hfpll_data hfpll0_data = {
+   .mode_reg = 0x3200,
+   .l_reg = 0x3208,
+   .m_reg = 0x320c,
+   .n_reg = 0x3210,
+   .config_reg = 0x3204,
+   .status_reg = 0x321c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3214,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll0 = {
+   .d = &hfpll0_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .name = "hfpll0",
+   .ops = &clk_ops_hfpll,
+   .flags = CLK_IGNORE_UNUSED,
+   },
+   .lock = __SPIN_LOCK_UNLOCKED(hfpll0.lock),
+};
+
+static struct hfpll_data hfpll1_8064_data = {
+   .mode_reg = 0x3240,
+   .l_reg = 0x3248,
+   .m_reg = 0x324c,
+   .n_reg = 0x3250,
+   .config_reg = 0x3244,
+   .status_reg = 0x325c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3254,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct hfpll_data hfpll1_data = {
+   .mode_reg = 0x3300,
+   .l_reg = 0x3308,
+   .m_reg = 0x330c,
+   .n_reg = 0x3310,
+   .config_reg = 0x3304,
+   .status_reg = 0x331c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3314,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll1 = {
+   .d = &hfpll1_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .name = "hfpll1",
+   .ops = &clk_ops_hfpll,
+   .flags = CLK_IGNORE_UNUSED,
+   },
+   .lock = __SPIN_LOCK_UNLOCKED(hfpll1.lock),
+};
+
+static struct hfpll_data hfpll2_data = {
+   .mode_reg = 0x3280,
+   .l_reg = 0x3288,
+   .m_reg = 0x328c,
+   .n_reg = 0x3290,
+   .config_reg = 0x3284,
+   .status_reg = 0x329c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3294,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll2 = {
+   .d = &hfpll2_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .name = "hfpll2",
+   .ops = &clk_ops_hfpll,
+   .flags = CLK_IGNORE_UNUSED,
+   },
+   .lock = __SPIN_LOCK_UNLOCKED(hfpll2.lock),
+};
+
+static struct hfpll_data hfpll3_data = {
+   .mode_reg = 0x32c0,
+   .l_reg = 0x32c8,
+   .m_reg = 0x32cc,
+   .n_reg = 0x32d0,
+   .config_reg = 0x32c4,
+   .status_reg = 0x32dc,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x32d4,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll3 = {
+   .d = &hfpll3_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .name = "hfpll3",
+   .ops = &clk_ops_hfpll,
+   .flags = CLK_IGNORE_UNUSED,
+   },
+   .lock = __SPIN_LOCK_UNLOCKED(hfpll3.lock),
+};
+
+static struct hfpll_data hfpll_l2_8064_data = {
+   .mode_reg = 0x3300,
+   .l_reg = 0x3308,
+   .m_reg = 0x330c,
+   .n_reg = 0x3310,
+   .config_reg = 0x3304,
+   .status_reg = 0x331c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3314,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct hfpll_data hfpll_l2_data = {
+   .mode_reg = 0x3400,
+   .l_reg = 0x3408,
+   .m_reg = 0x340c,
+   .n_reg = 0x3410,
+   .config_reg = 0x3404,
+   .status_reg = 0x341c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3414,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll_l2 = {
+   .d = &hfpll_l2_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .paren

[PATCH v4 02/12] clk: mux: Split out register accessors for reuse

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

We want to reuse the logic in clk-mux.c for other clock drivers
that don't use readl as register accessors. Fortunately, there
really isn't much to the mux code besides the table indirection
and quirk flags if you assume any bit shifting and masking has
been done already. Pull that logic out into reusable functions
that operate on an optional table and some flags so that other
drivers can use the same logic.
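
As an illustration of the split, the value-to-index translation can be exercised in userspace once the register read is factored out. This is a minimal sketch that mirrors the logic in the diff below under stated assumptions: `mux_val_to_index` and `lowest_set_bit` are hypothetical names, and the `MUX_INDEX_*` macros are local stand-ins for the kernel's quirk flags.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local stand-ins for the CLK_MUX_INDEX_* quirk flags (values assumed). */
#define MUX_INDEX_ONE	(1UL << 0)	/* register value counts from 1 */
#define MUX_INDEX_BIT	(1UL << 1)	/* register value is one-hot */

/* Position of the lowest set bit; the caller guarantees v != 0. */
static unsigned int lowest_set_bit(unsigned int v)
{
	unsigned int i = 0;

	while (!(v & 1u)) {
		v >>= 1;
		i++;
	}
	return i;
}

/* Translate an already shifted-and-masked register value to a parent index. */
static int mux_val_to_index(unsigned int val, const unsigned int *table,
			    unsigned int num_parents, unsigned long flags)
{
	unsigned int i;

	assert(num_parents > 0);

	if (table) {
		for (i = 0; i < num_parents; i++)
			if (table[i] == val)
				return (int)i;
		return -EINVAL;	/* value not present in the table */
	}

	if (val && (flags & MUX_INDEX_BIT))
		val = lowest_set_bit(val);

	if (val && (flags & MUX_INDEX_ONE))
		val--;

	if (val >= num_parents)
		return -EINVAL;

	return (int)val;
}
```

Because the helper takes the value, table and flags as plain arguments, a driver with its own register accessor can do the read, shift and mask itself and then reuse the same translation.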

Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk-mux.c| 75 +++-
 include/linux/clk-provider.h |  9 --
 2 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/clk/clk-mux.c b/drivers/clk/clk-mux.c
index 39cabe1..22ebf99 100644
--- a/drivers/clk/clk-mux.c
+++ b/drivers/clk/clk-mux.c
@@ -26,35 +26,24 @@
  * parent - parent is adjustable through clk_set_parent
  */
 
-static u8 clk_mux_get_parent(struct clk_hw *hw)
+unsigned int clk_mux_get_parent(struct clk_hw *hw, unsigned int val,
+   unsigned int *table, unsigned long flags)
 {
-   struct clk_mux *mux = to_clk_mux(hw);
int num_parents = clk_hw_get_num_parents(hw);
-   u32 val;
-
-   /*
-* FIXME need a mux-specific flag to determine if val is bitwise or numeric
-* e.g. sys_clkin_ck's clksel field is 3 bits wide, but ranges from 0x1
-* to 0x7 (index starts at one)
-* OTOH, pmd_trace_clk_mux_ck uses a separate bit for each clock, so
-* val = 0x4 really means "bit 2, index starts at bit 0"
-*/
-   val = clk_readl(mux->reg) >> mux->shift;
-   val &= mux->mask;
 
-   if (mux->table) {
+   if (table) {
int i;
 
for (i = 0; i < num_parents; i++)
-   if (mux->table[i] == val)
+   if (table[i] == val)
return i;
return -EINVAL;
}
 
-   if (val && (mux->flags & CLK_MUX_INDEX_BIT))
+   if (val && (flags & CLK_MUX_INDEX_BIT))
val = ffs(val) - 1;
 
-   if (val && (mux->flags & CLK_MUX_INDEX_ONE))
+   if (val && (flags & CLK_MUX_INDEX_ONE))
val--;
 
if (val >= num_parents)
@@ -62,23 +51,53 @@ static u8 clk_mux_get_parent(struct clk_hw *hw)
 
return val;
 }
+EXPORT_SYMBOL_GPL(clk_mux_get_parent);
 
-static int clk_mux_set_parent(struct clk_hw *hw, u8 index)
+static u8 _clk_mux_get_parent(struct clk_hw *hw)
 {
struct clk_mux *mux = to_clk_mux(hw);
u32 val;
-   unsigned long flags = 0;
 
-   if (mux->table) {
-   index = mux->table[index];
+   /*
+* FIXME need a mux-specific flag to determine if val is bitwise or
+* numeric e.g. sys_clkin_ck's clksel field is 3 bits wide,
+* but ranges from 0x1 to 0x7 (index starts at one)
+* OTOH, pmd_trace_clk_mux_ck uses a separate bit for each clock, so
+* val = 0x4 really means "bit 2, index starts at bit 0"
+*/
+   val = clk_readl(mux->reg) >> mux->shift;
+   val &= mux->mask;
+
+   return clk_mux_get_parent(hw, val, mux->table, mux->flags);
+}
+
+unsigned int clk_mux_reindex(u8 index, unsigned int *table,
+unsigned long flags)
+{
+   unsigned int val = index;
+
+   if (table) {
+   val = table[val];
} else {
-   if (mux->flags & CLK_MUX_INDEX_BIT)
-   index = 1 << index;
+   if (flags & CLK_MUX_INDEX_BIT)
+   val = 1 << index;
 
-   if (mux->flags & CLK_MUX_INDEX_ONE)
-   index++;
+   if (flags & CLK_MUX_INDEX_ONE)
+   val++;
}
 
+   return val;
+}
+EXPORT_SYMBOL_GPL(clk_mux_reindex);
+
+static int clk_mux_set_parent(struct clk_hw *hw, u8 index)
+{
+   struct clk_mux *mux = to_clk_mux(hw);
+   u32 val;
+   unsigned long flags = 0;
+
+   index = clk_mux_reindex(index, mux->table, mux->flags);
+
if (mux->lock)
spin_lock_irqsave(mux->lock, flags);
else
@@ -102,14 +121,14 @@ static int clk_mux_set_parent(struct clk_hw *hw, u8 index)
 }
 
 const struct clk_ops clk_mux_ops = {
-   .get_parent = clk_mux_get_parent,
+   .get_parent = _clk_mux_get_parent,
.set_parent = clk_mux_set_parent,
.determine_rate = __clk_mux_determine_rate,
 };
 EXPORT_SYMBOL_GPL(clk_mux_ops);
 
 const struct clk_ops clk_mux_ro_ops = {
-   .get_parent = clk_mux_get_parent,
+   .get_parent = _clk_mux_get_parent,
 };
 EXPORT_SYMBOL_GPL(clk_mux_ro_ops);
 
@@ -117,7 +136,7 @@ struct clk_hw *clk_hw_register_mux_table(struct device *dev, const char *name,
const char * const *parent_names, u8 num_parents,
unsigned long flags,
void __iomem *reg, u8 shift, u32 mask,
-   u8 clk_mux_flags, u32 *table, spinlock_t *lock)
+   u8 clk_mux_flags, unsi

[PATCH v4 09/12] clk: qcom: Add Krait clock controller driver

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

The Krait CPU clocks are made up of a primary mux and secondary
mux for each CPU and the L2, controlled via cp15 accessors. For
Kraits within KPSSv1 each secondary mux accepts a different aux
source, but on KPSSv2 each secondary mux accepts the same aux
source.

Cc: 
Signed-off-by: Stephen Boyd 
---
 .../devicetree/bindings/clock/qcom,krait-cc.txt|  22 ++
 drivers/clk/qcom/Kconfig   |   8 +
 drivers/clk/qcom/Makefile  |   1 +
 drivers/clk/qcom/krait-cc.c| 350 +
 4 files changed, 381 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/qcom,krait-cc.txt
 create mode 100644 drivers/clk/qcom/krait-cc.c

diff --git a/Documentation/devicetree/bindings/clock/qcom,krait-cc.txt b/Documentation/devicetree/bindings/clock/qcom,krait-cc.txt
new file mode 100644
index 000..874138f
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/qcom,krait-cc.txt
@@ -0,0 +1,22 @@
+Krait Clock Controller
+
+PROPERTIES
+
+- compatible:
+   Usage: required
+   Value type: 
+   Definition: must be one of:
+   "qcom,krait-cc-v1"
+   "qcom,krait-cc-v2"
+
+- #clock-cells:
+   Usage: required
+   Value type: 
+   Definition: must be 1
+
+Example:
+
+   kraitcc: clock-controller {
+   compatible = "qcom,krait-cc-v1";
+   #clock-cells = <1>;
+   };
diff --git a/drivers/clk/qcom/Kconfig b/drivers/clk/qcom/Kconfig
index 17dcb88..de6b60d 100644
--- a/drivers/clk/qcom/Kconfig
+++ b/drivers/clk/qcom/Kconfig
@@ -222,6 +222,14 @@ config KPSS_XCC
  if you want to support CPU frequency scaling on devices such
  as MSM8960, APQ8064, etc.
 
+config KRAITCC
+   tristate "Krait Clock Controller"
+   depends on COMMON_CLK_QCOM && ARM
+   select KRAIT_CLOCKS
+   help
+ Support for the Krait CPU clocks on Qualcomm devices.
+ Say Y if you want to support CPU frequency scaling.
+
 config KRAIT_CLOCKS
bool
select KRAIT_L2_ACCESSORS
diff --git a/drivers/clk/qcom/Makefile b/drivers/clk/qcom/Makefile
index 7ad2302..6e6c700 100644
--- a/drivers/clk/qcom/Makefile
+++ b/drivers/clk/qcom/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_QCOM_CLK_SMD_RPM) += clk-smd-rpm.o
 obj-$(CONFIG_SPMI_PMIC_CLKDIV) += clk-spmi-pmic-div.o
 obj-$(CONFIG_KPSS_XCC) += kpss-xcc.o
 obj-$(CONFIG_QCOM_HFPLL) += hfpll.o
+obj-$(CONFIG_KRAITCC) += krait-cc.o
diff --git a/drivers/clk/qcom/krait-cc.c b/drivers/clk/qcom/krait-cc.c
new file mode 100644
index 000..f5ffb1a
--- /dev/null
+++ b/drivers/clk/qcom/krait-cc.c
@@ -0,0 +1,350 @@
+/* Copyright (c) 2013-2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "clk-krait.h"
+
+static unsigned int sec_mux_map[] = {
+   2,
+   0,
+};
+
+static unsigned int pri_mux_map[] = {
+   1,
+   2,
+   0,
+};
+
+static int
+krait_add_div(struct device *dev, int id, const char *s, unsigned int offset)
+{
+   struct krait_div2_clk *div;
+   struct clk_init_data init = {
+   .num_parents = 1,
+   .ops = &krait_div2_clk_ops,
+   .flags = CLK_SET_RATE_PARENT,
+   };
+   const char *p_names[1];
+   struct clk *clk;
+
+   div = devm_kzalloc(dev, sizeof(*div), GFP_KERNEL);
+   if (!div)
+   return -ENOMEM;
+
+   div->width = 2;
+   div->shift = 6;
+   div->lpl = id >= 0;
+   div->offset = offset;
+   div->hw.init = &init;
+
+   init.name = kasprintf(GFP_KERNEL, "hfpll%s_div", s);
+   if (!init.name)
+   return -ENOMEM;
+
+   init.parent_names = p_names;
+   p_names[0] = kasprintf(GFP_KERNEL, "hfpll%s", s);
+   if (!p_names[0]) {
+   kfree(init.name);
+   return -ENOMEM;
+   }
+
+   clk = devm_clk_register(dev, &div->hw);
+   kfree(p_names[0]);
+   kfree(init.name);
+
+   return PTR_ERR_OR_ZERO(clk);
+}
+
+static int
+krait_add_sec_mux(struct device *dev, int id, const char *s,
+ unsigned int offset, bool unique_aux)
+{
+   struct krait_mux_clk *mux;
+   static const char *sec_mux_list[] = {
+   "acpu_aux",
+   "qsb",
+   };
+   struct clk_init_data init = {
+   .parent_names = sec_mux_list,
+   .num_parents =

[PATCH v4 08/12] clk: qcom: Add KPSS ACC/GCC driver

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

The ACC and GCC regions present in KPSSv1 contain registers to
control clocks and power to each Krait CPU and L2. For CPUfreq
purposes probe these devices and expose a mux clock that chooses
between PXO and PLL8.

Cc: 
Signed-off-by: Stephen Boyd 
---
 .../devicetree/bindings/arm/msm/qcom,kpss-acc.txt  |  7 ++
 .../devicetree/bindings/arm/msm/qcom,kpss-gcc.txt  | 28 +++
 drivers/clk/qcom/Kconfig   |  8 ++
 drivers/clk/qcom/Makefile  |  1 +
 drivers/clk/qcom/kpss-xcc.c| 96 ++
 5 files changed, 140 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/msm/qcom,kpss-gcc.txt
 create mode 100644 drivers/clk/qcom/kpss-xcc.c

diff --git a/Documentation/devicetree/bindings/arm/msm/qcom,kpss-acc.txt b/Documentation/devicetree/bindings/arm/msm/qcom,kpss-acc.txt
index 1333db9..382a574 100644
--- a/Documentation/devicetree/bindings/arm/msm/qcom,kpss-acc.txt
+++ b/Documentation/devicetree/bindings/arm/msm/qcom,kpss-acc.txt
@@ -21,10 +21,17 @@ PROPERTIES
the register region. An optional second element specifies
the base address and size of the alias register region.
 
+- clock-output-names:
+   Usage: optional
+   Value type: 
+   Definition: Name of the output clock. Typically acpuX_aux where X is a
+   CPU number starting at 0.
+
 Example:
 
clock-controller@2088000 {
compatible = "qcom,kpss-acc-v2";
reg = <0x02088000 0x1000>,
  <0x02008000 0x1000>;
+   clock-output-names = "acpu0_aux";
};
diff --git a/Documentation/devicetree/bindings/arm/msm/qcom,kpss-gcc.txt b/Documentation/devicetree/bindings/arm/msm/qcom,kpss-gcc.txt
new file mode 100644
index 000..d1e12f1
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/msm/qcom,kpss-gcc.txt
@@ -0,0 +1,28 @@
+Krait Processor Sub-system (KPSS) Global Clock Controller (GCC)
+
+PROPERTIES
+
+- compatible:
+   Usage: required
+   Value type: 
+   Definition: should be one of:
+   "qcom,kpss-gcc"
+
+- reg:
+   Usage: required
+   Value type: 
+   Definition: base address and size of the register region
+
+- clock-output-names:
+   Usage: required
+   Value type: 
+   Definition: Name of the output clock. Typically acpu_l2_aux indicating
+   an L2 cache auxiliary clock.
+
+Example:
+
+   l2cc: clock-controller@2011000 {
+   compatible = "qcom,kpss-gcc";
+   reg = <0x2011000 0x1000>;
+   clock-output-names = "acpu_l2_aux";
+   };
diff --git a/drivers/clk/qcom/Kconfig b/drivers/clk/qcom/Kconfig
index 6592595..17dcb88 100644
--- a/drivers/clk/qcom/Kconfig
+++ b/drivers/clk/qcom/Kconfig
@@ -214,6 +214,14 @@ config QCOM_HFPLL
  Say Y if you want to support CPU frequency scaling on devices
  such as MSM8974, APQ8084, etc.
 
+config KPSS_XCC
+   tristate "KPSS Clock Controller"
+   depends on COMMON_CLK_QCOM
+   help
+ Support for the Krait ACC and GCC clock controllers. Say Y
+ if you want to support CPU frequency scaling on devices such
+ as MSM8960, APQ8064, etc.
+
 config KRAIT_CLOCKS
bool
select KRAIT_L2_ACCESSORS
diff --git a/drivers/clk/qcom/Makefile b/drivers/clk/qcom/Makefile
index b6741b0..7ad2302 100644
--- a/drivers/clk/qcom/Makefile
+++ b/drivers/clk/qcom/Makefile
@@ -37,4 +37,5 @@ obj-$(CONFIG_MSM_MMCC_8996) += mmcc-msm8996.o
 obj-$(CONFIG_QCOM_CLK_RPM) += clk-rpm.o
 obj-$(CONFIG_QCOM_CLK_SMD_RPM) += clk-smd-rpm.o
 obj-$(CONFIG_SPMI_PMIC_CLKDIV) += clk-spmi-pmic-div.o
+obj-$(CONFIG_KPSS_XCC) += kpss-xcc.o
 obj-$(CONFIG_QCOM_HFPLL) += hfpll.o
diff --git a/drivers/clk/qcom/kpss-xcc.c b/drivers/clk/qcom/kpss-xcc.c
new file mode 100644
index 000..bed51c6
--- /dev/null
+++ b/drivers/clk/qcom/kpss-xcc.c
@@ -0,0 +1,96 @@
+/* Copyright (c) 2014-2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const char *aux_parents[] = {
+   "pll8_vote",
+   "pxo",
+};
+
+static unsigned int aux_parent_map[] = {
+   3,
+   0,
+};
+
+static const struct of_device_id kpss_xcc_match_table[] = {
+   { .compatible = "qcom,kpss-acc-v1", .data = (void *)1UL },
+   { .compatible = "qcom,kpss-gcc" },
+   {}
+};
+MODULE_DEVI

Re: [PATCH v8 4/6] clocksource: stm32: only use 32 bits timers

2017-12-08 Thread Benjamin Gaignard
2017-12-08 10:29 GMT+01:00 Daniel Lezcano :
> On 08/12/2017 10:25, Benjamin Gaignard wrote:
>> 2017-12-08 9:34 GMT+01:00 Daniel Lezcano :
>>> On 14/11/2017 09:52, Benjamin Gaignard wrote:
 The clock driving counters is at 90MHz so the maximum period
 for 16 bits counters is around 750 ms
>>>
>>> 728 us
>>>
 which is a short period for a clocksource.
>>>
>>> Which clocksource are you talking about ?
>>>
 For 32 bits counters this period is close to
 47 seconds which is more acceptable.

 This patch removes 16 bits counters support and makes sure that
 they won't be probed anymore.
>>>
>>> Are we talking about clockevent or clocksource?
>>>
>>> Is this issue present today ? Or is it if we add the clocksource support
>>> ? We are talking about clocksource but we change the clockevent code.
>>>
>>> All this is very confusing.
>>>
>>> I have a rough idea of what is happening, but it is not up to me to
>>> decode and infer from the changes, you need to describe *clearly* the
>>> situation.
>>>
>>>  - What happens if we use a 16bits timer as a clockevent ?
>>>  - What happens if we use a 16bits timer as a clocksource ?
>>>  - Why is it preferable to remove the support of the 16bits timers
>>> instead of downgrading them with the rating ?
>>
>> Up to this patch it is only about clockevent; the clocksource code is
>> introduced in patch 5.
>> In both cases a 16 bits counter has too short a period (728 us) to be
>> usable, so downgrading the rating is not a solution.
>
> You have to explain why it is a too short period. I will be happy to see
> an example of the issues the user is facing.

This is a very basic issue: the kernel doesn't boot at all...
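
For reference, the wrap periods discussed above can be checked with a
few lines of C. This is a back-of-the-envelope sketch, not driver code:
counter_wrap_us() is a made-up helper, and the 90 MHz clock and the
prescaler of 1024 are the values quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time, in microseconds, before a free-running counter of the given
 * width wraps, for an input clock of clk_hz divided by prescaler.
 */
static double counter_wrap_us(unsigned int bits, double clk_hz,
                              unsigned int prescaler)
{
    uint64_t ticks;

    assert(bits >= 1 && bits <= 32);
    assert(clk_hz > 0.0 && prescaler >= 1);

    ticks = (uint64_t)1 << bits;
    return (double)ticks * prescaler / clk_hz * 1e6;
}
```

counter_wrap_us(16, 90e6, 1) gives about 728 us, and
counter_wrap_us(32, 90e6, 1) gives about 47.7 s. With the prescaler of
1024 that the old 16 bits path programmed, the 16 bits figure becomes
about 746 ms, which is presumably where the "around 750 ms" in the
commit message comes from.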

>
>
>
>> I will change the wording in v9
>>
>>>
 Signed-off-by: Benjamin Gaignard 
>>>
 ---
  drivers/clocksource/timer-stm32.c | 26 --
  1 file changed, 12 insertions(+), 14 deletions(-)

 diff --git a/drivers/clocksource/timer-stm32.c b/drivers/clocksource/timer-stm32.c
 index ae41a19..8173bcf 100644
 --- a/drivers/clocksource/timer-stm32.c
 +++ b/drivers/clocksource/timer-stm32.c
 @@ -83,9 +83,9 @@ static irqreturn_t stm32_clock_event_handler(int irq, void *dev_id)
  static int __init stm32_clockevent_init(struct device_node *node)
  {
   struct reset_control *rstc;
 - unsigned long max_delta;
 - int ret, bits, prescaler = 1;
 + unsigned long max_arr;
   struct timer_of *to;
 + int ret;

   to = kzalloc(sizeof(*to), GFP_KERNEL);
   if (!to)
 @@ -115,29 +115,27 @@ static int __init stm32_clockevent_init(struct device_node *node)

   /* Detect whether the timer is 16 or 32 bits */
   writel_relaxed(~0U, timer_of_base(to) + TIM_ARR);
 - max_delta = readl_relaxed(timer_of_base(to) + TIM_ARR);
 - if (max_delta == ~0U) {
 - prescaler = 1;
 - bits = 32;
 - } else {
 - prescaler = 1024;
 - bits = 16;
 + max_arr = readl_relaxed(timer_of_base(to) + TIM_ARR);
 + if (max_arr != ~0U) {
 + pr_err("32 bits timer is needed\n");
 + ret = -EINVAL;
 + goto deinit;
   }
>>>
>>> Wrap this in a function:
>>>
>>> static bool stm32_timer_is_32bits(struct timer_of *to)
>>> {
>>> return readl_relaxed(timer_of_base(to) + TIM_ARR) == ~0UL;
>>> }
>>>
>>> Then clearly inform the user.
>>>
>>> if (!stm32_timer_is_32bits(to)) {
>>> pr_warn("Timer %pOF is a 16 bits timer\n", node);
>>> /* abort the registration or downgrade the timer's rating */
>>> }
>>
>> Ok I will change that in v9
>>
>>>
 +
   writel_relaxed(0, timer_of_base(to) + TIM_ARR);

 - writel_relaxed(prescaler - 1, timer_of_base(to) + TIM_PSC);
 + writel_relaxed(0, timer_of_base(to) + TIM_PSC);
   writel_relaxed(TIM_EGR_UG, timer_of_base(to) + TIM_EGR);
   writel_relaxed(TIM_DIER_UIE, timer_of_base(to) + TIM_DIER);
   writel_relaxed(0, timer_of_base(to) + TIM_SR);

   clockevents_config_and_register(&to->clkevt,
 - timer_of_period(to), MIN_DELTA, max_delta);
 -
 - pr_info("%pOF: STM32 clockevent driver initialized (%d bits)\n",
 - node, bits);
 + timer_of_period(to), MIN_DELTA, ~0U);

   return 0;

 +deinit:
 + timer_of_exit(to);
>>>
>>> Fix this please (timer_of_cleanup).
>>>
>>> In the future, make sure the patches are git-bisect safe.
>>>
>>>
>>>
>>> --
>>>   Linaro.org │ Open source software for ARM SoCs
>>>
>>> Follow Linaro:   Facebook |
>>>  Twitter |
>>>  Blog
>
>
> --
>  

[PATCH v4 10/12] clk: qcom: Add safe switch hook for krait mux clocks

2017-12-08 Thread Sricharan R
When the HFPLLs are reprogrammed during a rate change, the primary
muxes that are sourced from the same HFPLL for higher frequencies
need to be switched to the 'safe secondary mux' as the parent for
that small window. This is done by registering a clk notifier for
the muxes, switching to the safe parent in the PRE_RATE_CHANGE
notifier and back to the original parent in the POST_RATE_CHANGE
notifier.
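
The intended sequence can be modelled in userspace roughly as follows.
This is a simplified sketch of the bookkeeping only: struct mux_model,
the *_EV event names and the mux_model_*() helpers are invented for the
illustration and do not exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>

enum rate_event { PRE_RATE_CHANGE_EV, POST_RATE_CHANGE_EV };

/* Minimal stand-in for the mux state the notifier cares about. */
struct mux_model {
    unsigned int cur_index;  /* parent currently selected */
    unsigned int safe_index; /* parent that stays valid during reprogramming */
    unsigned int old_index;  /* parent saved at PRE_RATE_CHANGE */
    bool reparent;           /* set whenever set_parent runs */
};

static void mux_model_set_parent(struct mux_model *m, unsigned int index)
{
    m->cur_index = index;
    m->reparent = true;
}

static void mux_model_notify(struct mux_model *m, enum rate_event ev)
{
    assert(m);

    if (ev == PRE_RATE_CHANGE_EV) {
        /* Remember where we were, then park on the safe parent. */
        m->old_index = m->cur_index;
        mux_model_set_parent(m, m->safe_index);
        m->reparent = false;
    } else if (ev == POST_RATE_CHANGE_EV) {
        /* Restore only if nobody picked a new parent in between. */
        if (!m->reparent)
            mux_model_set_parent(m, m->old_index);
    }
}

/*
 * Run one rate change. A new_parent >= 0 models the clk framework
 * selecting a different parent between the two notifications.
 * Returns the parent selected once the change has completed.
 */
static unsigned int mux_model_rate_change(unsigned int start,
                                          unsigned int safe, int new_parent)
{
    struct mux_model m = { .cur_index = start, .safe_index = safe };

    mux_model_notify(&m, PRE_RATE_CHANGE_EV);
    assert(m.cur_index == safe); /* on the safe parent during the window */
    if (new_parent >= 0)
        mux_model_set_parent(&m, (unsigned int)new_parent);
    mux_model_notify(&m, POST_RATE_CHANGE_EV);

    return m.cur_index;
}
```

Starting on parent 0 with safe parent 2, a plain rate change ends back
on parent 0, while a rate change in which the framework moved the mux to
parent 1 ends on parent 1.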

Signed-off-by: Sricharan R 
---
 drivers/clk/qcom/clk-krait.c |  2 ++
 drivers/clk/qcom/clk-krait.h |  3 +++
 drivers/clk/qcom/krait-cc.c  | 56 
 3 files changed, 61 insertions(+)

diff --git a/drivers/clk/qcom/clk-krait.c b/drivers/clk/qcom/clk-krait.c
index 6099307..1627a6e 100644
--- a/drivers/clk/qcom/clk-krait.c
+++ b/drivers/clk/qcom/clk-krait.c
@@ -60,6 +60,8 @@ static int krait_mux_set_parent(struct clk_hw *hw, u8 index)
if (__clk_is_enabled(hw->clk))
__krait_mux_set_sel(mux, sel);
 
+   mux->reparent = true;
+
return 0;
 }
 
diff --git a/drivers/clk/qcom/clk-krait.h b/drivers/clk/qcom/clk-krait.h
index af89782..f9e1279 100644
--- a/drivers/clk/qcom/clk-krait.h
+++ b/drivers/clk/qcom/clk-krait.h
@@ -23,6 +23,9 @@ struct krait_mux_clk {
u32 shift;
u32 en_mask;
boollpl;
+   u8  safe_sel;
+   u8  old_index;
+   boolreparent;
 
struct clk_hw   hw;
struct notifier_block   clk_nb;
diff --git a/drivers/clk/qcom/krait-cc.c b/drivers/clk/qcom/krait-cc.c
index f5ffb1a..ec899ad 100644
--- a/drivers/clk/qcom/krait-cc.c
+++ b/drivers/clk/qcom/krait-cc.c
@@ -35,6 +35,49 @@
0,
 };
 
+/*
+ * Notifier function for switching the muxes to safe parent
+ * while the hfpll is getting reprogrammed.
+ */
+static int krait_notifier_cb(struct notifier_block *nb,
+unsigned long event,
+void *data)
+{
+   int ret = 0;
+   struct krait_mux_clk *mux = container_of(nb, struct krait_mux_clk,
+clk_nb);
+   /* Switch to safe parent */
+   if (event == PRE_RATE_CHANGE) {
+   mux->old_index = krait_mux_clk_ops.get_parent(&mux->hw);
+   ret = krait_mux_clk_ops.set_parent(&mux->hw, mux->safe_sel);
+   mux->reparent = false;
+   /*
+* By the time POST_RATE_CHANGE notifier is called,
+* clk framework itself would have changed the parent for the new rate.
+* Only otherwise, put back to the old parent.
+*/
+   } else if (event == POST_RATE_CHANGE) {
+   if (!mux->reparent)
+   ret = krait_mux_clk_ops.set_parent(&mux->hw,
+  mux->old_index);
+   }
+
+   return notifier_from_errno(ret);
+}
+
+static int krait_notifier_register(struct device *dev, struct clk *clk,
+  struct krait_mux_clk *mux)
+{
+   int ret = 0;
+
+   mux->clk_nb.notifier_call = krait_notifier_cb;
+   ret = clk_notifier_register(clk, &mux->clk_nb);
+   if (ret)
+   dev_err(dev, "failed to register clock notifier: %d\n", ret);
+
+   return ret;
+}
+
 static int
 krait_add_div(struct device *dev, int id, const char *s, unsigned int offset)
 {
@@ -79,6 +122,7 @@
 krait_add_sec_mux(struct device *dev, int id, const char *s,
  unsigned int offset, bool unique_aux)
 {
+   int ret;
struct krait_mux_clk *mux;
static const char *sec_mux_list[] = {
"acpu_aux",
@@ -102,6 +146,7 @@
mux->shift = 2;
mux->parent_map = sec_mux_map;
mux->hw.init = &init;
+   mux->safe_sel = 0;
 
init.name = kasprintf(GFP_KERNEL, "krait%s_sec_mux", s);
if (!init.name)
@@ -117,6 +162,11 @@
 
clk = devm_clk_register(dev, &mux->hw);
 
+   ret = krait_notifier_register(dev, clk, mux);
+   if (ret)
+   goto unique_aux;
+
+unique_aux:
if (unique_aux)
kfree(sec_mux_list[0]);
 err_aux:
@@ -128,6 +178,7 @@
 krait_add_pri_mux(struct device *dev, int id, const char *s,
  unsigned int offset)
 {
+   int ret;
struct krait_mux_clk *mux;
const char *p_names[3];
struct clk_init_data init = {
@@ -148,6 +199,7 @@
mux->lpl = id >= 0;
mux->parent_map = pri_mux_map;
mux->hw.init = &init;
+   mux->safe_sel = 2;
 
init.name = kasprintf(GFP_KERNEL, "krait%s_pri_mux", s);
if (!init.name)
@@ -173,6 +225,10 @@
 
clk = devm_clk_register(dev, &mux->hw);
 
+   ret = krait_notifier_register(dev, clk, mux);
+   if (ret)
+   goto err_p3;
+err_p3:
kfree(p_names[2]);
 err_p2:
kfree(p_names[1]);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The 

[PATCH v4 07/12] clk: qcom: Add support for Krait clocks

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

The Krait clocks are made up of a series of muxes and a divider
that choose between a fixed rate clock and dedicated HFPLLs for
each CPU. Instead of using mmio accesses to remux parents, the
Krait implementation exposes the remux control via cp15
registers. Support these clocks.
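
As a rough illustration of the remux step, the select field update is a
plain read-modify-write on the register value, mirrored into a second
field LPL_SHIFT bits higher when the lpl flag is set. The sketch below
works on an ordinary integer instead of the cp15 accessors;
krait_sel_update() is a made-up name, and the mask and shift values
used in the examples are assumptions for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LPL_SHIFT 8

/*
 * Return regval with the select field replaced by sel. When lpl is
 * non-zero the same selection is also written to the copy of the
 * field that sits LPL_SHIFT bits above it.
 */
static uint32_t krait_sel_update(uint32_t regval, uint32_t sel,
                                 uint32_t mask, uint32_t shift, int lpl)
{
    assert(shift + LPL_SHIFT < 32); /* both fields must fit in 32 bits */

    regval &= ~(mask << shift);
    regval |= (sel & mask) << shift;
    if (lpl) {
        regval &= ~(mask << (shift + LPL_SHIFT));
        regval |= (sel & mask) << (shift + LPL_SHIFT);
    }

    return regval;
}
```

For example, with a 2-bit mask (0x3) at shift 0 and lpl set, selecting 2
on an all-zero register yields 0x202: the field itself plus its mirror
eight bits up. Bits outside the two fields are left untouched.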

Signed-off-by: Stephen Boyd 
---
 drivers/clk/qcom/Kconfig |   4 ++
 drivers/clk/qcom/Makefile|   1 +
 drivers/clk/qcom/clk-krait.c | 134 +++
 drivers/clk/qcom/clk-krait.h |  48 
 4 files changed, 187 insertions(+)
 create mode 100644 drivers/clk/qcom/clk-krait.c
 create mode 100644 drivers/clk/qcom/clk-krait.h

diff --git a/drivers/clk/qcom/Kconfig b/drivers/clk/qcom/Kconfig
index 6c811bd..6592595 100644
--- a/drivers/clk/qcom/Kconfig
+++ b/drivers/clk/qcom/Kconfig
@@ -213,3 +213,7 @@ config QCOM_HFPLL
  Support for the high-frequency PLLs present on Qualcomm devices.
  Say Y if you want to support CPU frequency scaling on devices
  such as MSM8974, APQ8084, etc.
+
+config KRAIT_CLOCKS
+   bool
+   select KRAIT_L2_ACCESSORS
diff --git a/drivers/clk/qcom/Makefile b/drivers/clk/qcom/Makefile
index 4a4bf38..b6741b0 100644
--- a/drivers/clk/qcom/Makefile
+++ b/drivers/clk/qcom/Makefile
@@ -10,6 +10,7 @@ clk-qcom-y += clk-rcg2.o
 clk-qcom-y += clk-branch.o
 clk-qcom-y += clk-regmap-divider.o
 clk-qcom-y += clk-regmap-mux.o
+clk-qcom-$(CONFIG_KRAIT_CLOCKS) += clk-krait.o
 clk-qcom-y += clk-hfpll.o
 clk-qcom-y += reset.o
 clk-qcom-$(CONFIG_QCOM_GDSC) += gdsc.o
diff --git a/drivers/clk/qcom/clk-krait.c b/drivers/clk/qcom/clk-krait.c
new file mode 100644
index 000..6099307
--- /dev/null
+++ b/drivers/clk/qcom/clk-krait.c
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2013-2014, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "clk-krait.h"
+
+/* Secondary and primary muxes share the same cp15 register */
+static DEFINE_SPINLOCK(krait_clock_reg_lock);
+
+#define LPL_SHIFT  8
+static void __krait_mux_set_sel(struct krait_mux_clk *mux, int sel)
+{
+   unsigned long flags;
+   u32 regval;
+
+   spin_lock_irqsave(&krait_clock_reg_lock, flags);
+   regval = krait_get_l2_indirect_reg(mux->offset);
+   regval &= ~(mux->mask << mux->shift);
+   regval |= (sel & mux->mask) << mux->shift;
+   if (mux->lpl) {
+   regval &= ~(mux->mask << (mux->shift + LPL_SHIFT));
+   regval |= (sel & mux->mask) << (mux->shift + LPL_SHIFT);
+   }
+   krait_set_l2_indirect_reg(mux->offset, regval);
+   spin_unlock_irqrestore(&krait_clock_reg_lock, flags);
+
+   /* Wait for switch to complete. */
+   mb();
+   udelay(1);
+}
+
+static int krait_mux_set_parent(struct clk_hw *hw, u8 index)
+{
+   struct krait_mux_clk *mux = to_krait_mux_clk(hw);
+   u32 sel;
+
+   sel = clk_mux_reindex(index, mux->parent_map, 0);
+   mux->en_mask = sel;
+   /* Don't touch mux if CPU is off as it won't work */
+   if (__clk_is_enabled(hw->clk))
+   __krait_mux_set_sel(mux, sel);
+
+   return 0;
+}
+
+static u8 krait_mux_get_parent(struct clk_hw *hw)
+{
+   struct krait_mux_clk *mux = to_krait_mux_clk(hw);
+   u32 sel;
+
+   sel = krait_get_l2_indirect_reg(mux->offset);
+   sel >>= mux->shift;
+   sel &= mux->mask;
+   mux->en_mask = sel;
+
+   return clk_mux_get_parent(hw, sel, mux->parent_map, 0);
+}
+
+const struct clk_ops krait_mux_clk_ops = {
+   .set_parent = krait_mux_set_parent,
+   .get_parent = krait_mux_get_parent,
+   .determine_rate = __clk_mux_determine_rate_closest,
+};
+EXPORT_SYMBOL_GPL(krait_mux_clk_ops);
+
+/* The divider can divide by 2, 4, 6 and 8. But we only really need div-2. */
+static long krait_div2_round_rate(struct clk_hw *hw, unsigned long rate,
+ unsigned long *parent_rate)
+{
+   *parent_rate = clk_hw_round_rate(clk_hw_get_parent(hw), rate * 2);
+   return DIV_ROUND_UP(*parent_rate, 2);
+}
+
+static int krait_div2_set_rate(struct clk_hw *hw, unsigned long rate,
+  unsigned long parent_rate)
+{
+   struct krait_div2_clk *d = to_krait_div2_clk(hw);
+   unsigned long flags;
+   u32 val;
+   u32 mask = BIT(d->width) - 1;
+
+   if (d->lpl)
+   mask = mask << (d->shift + LPL_SHIFT) | mask << d->shift;
+
+   spin_lock_irqsave

[PATCH v4 06/12] clk: qcom: Add IPQ806X's HFPLLs

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

Describe the HFPLLs present on IPQ806X devices.

Signed-off-by: Stephen Boyd 
---
 drivers/clk/qcom/gcc-ipq806x.c | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/drivers/clk/qcom/gcc-ipq806x.c b/drivers/clk/qcom/gcc-ipq806x.c
index 28eb200..d571cf8 100644
--- a/drivers/clk/qcom/gcc-ipq806x.c
+++ b/drivers/clk/qcom/gcc-ipq806x.c
@@ -30,6 +30,7 @@
 #include "clk-pll.h"
 #include "clk-rcg.h"
 #include "clk-branch.h"
+#include "clk-hfpll.h"
 #include "reset.h"
 
 static struct clk_pll pll0 = {
@@ -113,6 +114,84 @@
},
 };
 
+static struct hfpll_data hfpll0_data = {
+   .mode_reg = 0x3200,
+   .l_reg = 0x3208,
+   .m_reg = 0x320c,
+   .n_reg = 0x3210,
+   .config_reg = 0x3204,
+   .status_reg = 0x321c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3214,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll0 = {
+   .d = &hfpll0_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .name = "hfpll0",
+   .ops = &clk_ops_hfpll,
+   .flags = CLK_IGNORE_UNUSED,
+   },
+   .lock = __SPIN_LOCK_UNLOCKED(hfpll0.lock),
+};
+
+static struct hfpll_data hfpll1_data = {
+   .mode_reg = 0x3240,
+   .l_reg = 0x3248,
+   .m_reg = 0x324c,
+   .n_reg = 0x3250,
+   .config_reg = 0x3244,
+   .status_reg = 0x325c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3314,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll1 = {
+   .d = &hfpll1_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .name = "hfpll1",
+   .ops = &clk_ops_hfpll,
+   .flags = CLK_IGNORE_UNUSED,
+   },
+   .lock = __SPIN_LOCK_UNLOCKED(hfpll1.lock),
+};
+
+static struct hfpll_data hfpll_l2_data = {
+   .mode_reg = 0x3300,
+   .l_reg = 0x3308,
+   .m_reg = 0x330c,
+   .n_reg = 0x3310,
+   .config_reg = 0x3304,
+   .status_reg = 0x331c,
+   .config_val = 0x7845c665,
+   .droop_reg = 0x3314,
+   .droop_val = 0x0108c000,
+   .min_rate = 6UL,
+   .max_rate = 18UL,
+};
+
+static struct clk_hfpll hfpll_l2 = {
+   .d = &hfpll_l2_data,
+   .clkr.hw.init = &(struct clk_init_data){
+   .parent_names = (const char *[]){ "pxo" },
+   .num_parents = 1,
+   .name = "hfpll_l2",
+   .ops = &clk_ops_hfpll,
+   .flags = CLK_IGNORE_UNUSED,
+   },
+   .lock = __SPIN_LOCK_UNLOCKED(hfpll_l2.lock),
+};
+
 static struct clk_pll pll14 = {
.l_reg = 0x31c4,
.m_reg = 0x31c8,
@@ -2800,6 +2879,9 @@ enum {
[UBI32_CORE2_CLK_SRC] = &ubi32_core2_src_clk.clkr,
[NSSTCM_CLK_SRC] = &nss_tcm_src.clkr,
[NSSTCM_CLK] = &nss_tcm_clk.clkr,
+   [PLL9] = &hfpll0.clkr,
+   [PLL10] = &hfpll1.clkr,
+   [PLL12] = &hfpll_l2.clkr,
 };
 
 static const struct qcom_reset_map gcc_ipq806x_resets[] = {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v4 11/12] cpufreq: Add module to register cpufreq on Krait CPUs

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

Register a cpufreq-generic device whenever we detect that a
"qcom,krait" compatible CPU is present in DT.

Cc: 
Signed-off-by: Stephen Boyd 
---
 .../devicetree/bindings/arm/msm/qcom,pvs.txt   |  38 
 drivers/cpufreq/Kconfig.arm|   9 +
 drivers/cpufreq/Makefile   |   1 +
 drivers/cpufreq/qcom-cpufreq.c | 204 +
 4 files changed, 252 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/msm/qcom,pvs.txt
 create mode 100644 drivers/cpufreq/qcom-cpufreq.c

diff --git a/Documentation/devicetree/bindings/arm/msm/qcom,pvs.txt b/Documentation/devicetree/bindings/arm/msm/qcom,pvs.txt
new file mode 100644
index 000..e7cb104
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/msm/qcom,pvs.txt
@@ -0,0 +1,38 @@
+Qualcomm Process Voltage Scaling Tables
+
+The node name is required to be "qcom,pvs". There shall only be one
+such node present in the root of the tree.
+
+PROPERTIES
+
+- qcom,pvs-format-a or qcom,pvs-format-b:
+   Usage: required
+   Value type: 
+   Definition: Indicates the format of qcom,speedX-pvsY-bin-vZ properties.
+   If qcom,pvs-format-a is used the table is two columns
+   (frequency and voltage in that order). If qcom,pvs-format-b 
is used the table is three columns (frequency, voltage,
+   and current in that order).
+
+- qcom,speedX-pvsY-bin-vZ:
+   Usage: required
+   Value type: 
+   Definition: The PVS table corresponding to the speed bin X, pvs bin Y,
+   and version Z.
+Example:
+
+   qcom,pvs {
+   qcom,pvs-format-a;
+   qcom,speed0-pvs0-bin-v0 =
+   <  38400  95 >,
+   <  48600  975000 >,
+   <  59400 100 >,
+   <  70200 1025000 >,
+   <  81000 1075000 >,
+   <  91800 110 >,
+   < 102600 1125000 >,
+   < 113400 1175000 >,
+   < 124200 120 >,
+   < 135000 1225000 >,
+   < 145800 1237500 >,
+   < 151200 125 >;
+   };
diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index bdce448..60f28e7 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -100,6 +100,15 @@ config ARM_OMAP2PLUS_CPUFREQ
depends on ARCH_OMAP2PLUS
default ARCH_OMAP2PLUS
 
+config ARM_QCOM_CPUFREQ
+   tristate "Qualcomm based"
+   depends on ARCH_QCOM
+   select PM_OPP
+   help
+ This adds the CPUFreq driver for Qualcomm SoC based boards.
+
+ If in doubt, say N.
+
 config ARM_S3C_CPUFREQ
bool
help
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 812f9e0..1496464 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_ARM_MEDIATEK_CPUFREQ)+= mediatek-cpufreq.o
 obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ)+= omap-cpufreq.o
 obj-$(CONFIG_ARM_PXA2xx_CPUFREQ)   += pxa2xx-cpufreq.o
 obj-$(CONFIG_PXA3xx)   += pxa3xx-cpufreq.o
+obj-$(CONFIG_ARM_QCOM_CPUFREQ) += qcom-cpufreq.o
 obj-$(CONFIG_ARM_S3C24XX_CPUFREQ)  += s3c24xx-cpufreq.o
 obj-$(CONFIG_ARM_S3C24XX_CPUFREQ_DEBUGFS) += s3c24xx-cpufreq-debugfs.o
 obj-$(CONFIG_ARM_S3C2410_CPUFREQ)  += s3c2410-cpufreq.o
diff --git a/drivers/cpufreq/qcom-cpufreq.c b/drivers/cpufreq/qcom-cpufreq.c
new file mode 100644
index 000..95ed2ba
--- /dev/null
+++ b/drivers/cpufreq/qcom-cpufreq.c
@@ -0,0 +1,204 @@
+/* Copyright (c) 2014, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "cpufreq-dt.h"
+
+static void __init get_krait_bin_format_a(int *speed, int *pvs, int *pvs_ver)
+{
+   void __iomem *base;
+   u32 pte_efuse;
+
+   *speed = *pvs = *pvs_ver = 0;
+
+   base = ioremap(0x007000c0, 4);
+   if (!base) {
+   pr_warn("Unable to read efuse data. Defaulting to 0!\n");
+   return;
+   }
+
+   pte_efuse = readl_relaxed(base);
+   iounmap(base);
+
+   *speed = pte_efuse & 0xf;
+   if (*speed == 0xf)
+   *speed = (pte_efuse >> 4) & 0xf;
+
+   if (*speed == 0

Re: [PATCH v5 2/4] clk: meson-axg: add clocks dt-bindings required header

2017-12-08 Thread Jerome Brunet
On Thu, 2017-12-07 at 16:10 -0600, Rob Herring wrote:
> On Thu, Dec 07, 2017 at 05:52:58PM +0800, Yixun Lan wrote:
> > From: Qiufang Dai 
> > 
> > Add the required header for the clocks ID dt-bindings
> > exported from various subsystem in the Meson-AXG SoC.
> > 
> > Signed-off-by: Qiufang Dai 
> > Signed-off-by: Yixun Lan 
> > ---
> >  include/dt-bindings/clock/axg-clkc.h | 71
> > 
> >  1 file changed, 71 insertions(+)
> >  create mode 100644 include/dt-bindings/clock/axg-clkc.h
> 
> Please add acks when posting new versions.
> 
> Rob

Yixun, please be consistent about this. Maintainers are not going to dig
through your previous revisions to collect tags.

I believe Neil acked patch 3, didn't he?

Please resend your series with the tags collected.
If not already done, please make sure patches 1-3 apply on top of rc1 and patch
4 on Kevin's dt64 branch.

Thanks
Jerome



[GIT PULL] xen: fixes for 4.15-rc3

2017-12-08 Thread Juergen Gross
Linus,

Please git pull the following tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-4.15-rc3-tag

xen: fixes for 4.15-rc3

Those are just two small fixes for the new pvcalls frontend driver.

Thanks.

Juergen

 drivers/xen/pvcalls-front.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Dan Carpenter (2):
  xen/pvcalls: check for xenbus_read() errors
  xen/pvcalls: Fix a check in pvcalls_front_remove()


[PATCH v4 12/12] cpufreq: dt: Reintroduce independent_clocks platform data

2017-12-08 Thread Sricharan R
The platform data was removed earlier by
commit eb96924acddc ("cpufreq: dt: Kill platform-data")
since there were no users at that time.
It is now required when each of the CPU clocks
can be scaled independently, which is the case
for Krait cores. So reintroduce it.

Signed-off-by: Sricharan R 
---
 drivers/cpufreq/cpufreq-dt.c | 7 ++-
 drivers/cpufreq/cpufreq-dt.h | 6 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 545946a..8f0e881 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -228,7 +228,10 @@ static int cpufreq_init(struct cpufreq_policy *policy)
}
 
if (fallback) {
-   cpumask_setall(policy->cpus);
+   struct cpufreq_dt_platform_data *pd = cpufreq_get_driver_data();
+
+   if (!pd || !pd->independent_clocks)
+   cpumask_setall(policy->cpus);
 
/*
 * OPP tables are initialized only for policy->cpu, do it for
@@ -380,6 +383,8 @@ static int dt_cpufreq_probe(struct platform_device *pdev)
if (data && data->have_governor_per_policy)
dt_cpufreq_driver.flags |= CPUFREQ_HAVE_GOVERNOR_PER_POLICY;
 
+   dt_cpufreq_driver.driver_data = data;
+
ret = cpufreq_register_driver(&dt_cpufreq_driver);
if (ret)
dev_err(&pdev->dev, "failed register driver: %d\n", ret);
diff --git a/drivers/cpufreq/cpufreq-dt.h b/drivers/cpufreq/cpufreq-dt.h
index 54d774e..dcc03c6 100644
--- a/drivers/cpufreq/cpufreq-dt.h
+++ b/drivers/cpufreq/cpufreq-dt.h
@@ -13,6 +13,12 @@
 #include 
 
 struct cpufreq_dt_platform_data {
+   /*
+* True when each CPU has its own clock to control its
+* frequency, false when all CPUs are controlled by a single
+* clock.
+*/
+   bool independent_clocks;
bool have_governor_per_policy;
 };
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation
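
For readers following the cpufreq_init() hunk in this patch, a small stand-alone model of the decision it makes may help. This is only a sketch under stated assumptions: the function name policy_cpu_mask(), the struct dt_platform_data and the 32-bit mask are invented for illustration, and the non-fallback path (where the real driver consults OPP sharing information) is simplified to "own CPU only".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpufreq_dt_platform_data. */
struct dt_platform_data {
	bool independent_clocks;
};

/*
 * Model of which CPUs end up sharing a policy with 'cpu'.
 * Assumes cpu < nr_cpus <= 32 so that a 32-bit mask is enough.
 */
static uint32_t policy_cpu_mask(unsigned int cpu, unsigned int nr_cpus,
				bool fallback,
				const struct dt_platform_data *pd)
{
	/* A policy always covers at least the CPU it was created for. */
	uint32_t mask = 1u << cpu;

	/*
	 * Fallback with no platform data, or with a shared clock:
	 * every CPU joins the same policy.
	 */
	if (fallback && (!pd || !pd->independent_clocks))
		mask = (nr_cpus >= 32) ? UINT32_MAX : (1u << nr_cpus) - 1;

	return mask;
}
```

With independent_clocks set, each CPU keeps a policy of its own, which is what per-CPU frequency scaling on Krait needs; without it the behaviour is the same as before the patch.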



Re: [PATCH] LDT improvements

2017-12-08 Thread Thomas Gleixner
On Fri, 8 Dec 2017, Ingo Molnar wrote:
> * Andy Lutomirski  wrote:
> > I don't love mucking with user address space.  I'm also quite nervous about
> > putting it in or near anything that could pass an access_ok check, since we're
> > totally screwed if the bad guys can figure out how to write to it.
> 
> Hm, robustness of the LDT address wrt. access_ok() is a valid concern.
> 
> Can we have vmas with high addresses, in the vmalloc space for example?
> IIRC the GPU code has precedents in that area.
> 
> Since this is x86-64, limitation of the vmalloc() space is not an issue.
> 
> I like Thomas's solution:
> 
>  - have the LDT in a regular mmap space vma (hence per process ASLR randomized),
>    but with the system bit set.
> 
>  - That would be an advantage even for non-PTI kernels, because mmap() is probably
>    more randomized than kmalloc().

Randomization is pointless as long as you can get the LDT address in user
space, i.e. w/o UMIP.

>  - It would also be a cleaner approach all around, and would avoid the fixmap
>    complications and the scheduler muckery.

The error code of such an access is always 0x03. So I added a special
handler, which checks whether the address is in the LDT map range and
verifies that the access bit in the descriptor is 0. If that's the case it
sets it and returns. If not, the thing dies. That works.
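
As a rough illustration of the check described above (a user-space model, not the actual handler): the struct, function and macro names below are made up for this sketch, the descriptor table is a plain array, and the only architectural facts assumed are that error code 0x03 means a write to a present page and that the accessed bit is bit 40 of an 8-byte segment descriptor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural page-fault error code bits (names chosen for this sketch). */
#define PF_PROT		0x1u	/* fault on a present page */
#define PF_WRITE	0x2u	/* fault caused by a write */

/* Accessed bit of an 8-byte code/data segment descriptor. */
#define DESC_ACCESSED	(1ull << 40)

/* Hypothetical description of where the LDT is mapped. */
struct ldt_map {
	uint64_t *entries;	/* the descriptors themselves */
	uintptr_t start;	/* address the table is mapped at */
	size_t nr_entries;
};

/*
 * Returns true if the fault was the CPU trying to set the accessed
 * bit of an LDT descriptor and the bit has now been set on its behalf.
 * Returns false for anything else, i.e. the fault stays unhandled.
 */
static bool ldt_fixup_accessed(struct ldt_map *map, uintptr_t addr,
			       unsigned int error_code)
{
	uint64_t *desc;

	/* Only a write to a present page qualifies. */
	if (error_code != (PF_PROT | PF_WRITE))
		return false;

	/* The faulting address must lie inside the LDT mapping. */
	if (addr < map->start ||
	    addr >= map->start + map->nr_entries * sizeof(uint64_t))
		return false;

	desc = &map->entries[(addr - map->start) / sizeof(uint64_t)];

	/* If the bit is already set, this fault has some other cause. */
	if (*desc & DESC_ACCESSED)
		return false;

	*desc |= DESC_ACCESSED;
	return true;
}
```

Anything that fails one of the three tests is reported as unhandled, which corresponds to the "thing dies" case.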

Thanks,

tglx


[PATCH v4 04/12] clk: qcom: Add HFPLL driver

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

On some devices (MSM8974 for example), the HFPLLs are
instantiated within the Krait processor subsystem as separate
register regions. Add a driver for these PLLs so that we can
provide HFPLL clocks for use by the system.

Cc: 
Signed-off-by: Stephen Boyd 
---
 .../devicetree/bindings/clock/qcom,hfpll.txt   |  40 
 drivers/clk/qcom/Kconfig   |   8 ++
 drivers/clk/qcom/Makefile  |   1 +
 drivers/clk/qcom/hfpll.c   | 106 +
 4 files changed, 155 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/qcom,hfpll.txt
 create mode 100644 drivers/clk/qcom/hfpll.c

diff --git a/Documentation/devicetree/bindings/clock/qcom,hfpll.txt b/Documentation/devicetree/bindings/clock/qcom,hfpll.txt
new file mode 100644
index 000..fee92bb
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/qcom,hfpll.txt
@@ -0,0 +1,40 @@
+High-Frequency PLL (HFPLL)
+
+PROPERTIES
+
+- compatible:
+   Usage: required
+   Value type: 
+   Definition: must be "qcom,hfpll"
+
+- reg:
+   Usage: required
+   Value type: 
+   Definition: address and size of HFPLL registers. An optional second
+   element specifies the address and size of the alias
+   register region.
+
+- clock-output-names:
+   Usage: required
+   Value type: 
+   Definition: Name of the PLL. Typically hfpllX where X is a CPU number
+   starting at 0. Otherwise hfpll_Y where Y is more specific
+   such as "l2".
+
+Example:
+
+1) An HFPLL for the L2 cache.
+
+   clock-controller@f9016000 {
+   compatible = "qcom,hfpll";
+   reg = <0xf9016000 0x30>;
+   clock-output-names = "hfpll_l2";
+   };
+
+2) An HFPLL for CPU0. This HFPLL has the alias register region.
+
+   clock-controller@f908a000 {
+   compatible = "qcom,hfpll";
+   reg = <0xf908a000 0x30>, <0xf900a000 0x30>;
+   clock-output-names = "hfpll0";
+   };
diff --git a/drivers/clk/qcom/Kconfig b/drivers/clk/qcom/Kconfig
index 20b5d6f..6c811bd 100644
--- a/drivers/clk/qcom/Kconfig
+++ b/drivers/clk/qcom/Kconfig
@@ -205,3 +205,11 @@ config SPMI_PMIC_CLKDIV
  Technologies, Inc. SPMI PMIC. It configures the frequency of
  clkdiv outputs of the PMIC. These clocks are typically wired
  through alternate functions on GPIO pins.
+
+config QCOM_HFPLL
+   tristate "High-Frequency PLL (HFPLL) Clock Controller"
+   depends on COMMON_CLK_QCOM
+   help
+ Support for the high-frequency PLLs present on Qualcomm devices.
+ Say Y if you want to support CPU frequency scaling on devices
+ such as MSM8974, APQ8084, etc.
diff --git a/drivers/clk/qcom/Makefile b/drivers/clk/qcom/Makefile
index 4795e21..4a4bf38 100644
--- a/drivers/clk/qcom/Makefile
+++ b/drivers/clk/qcom/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_MSM_MMCC_8996) += mmcc-msm8996.o
 obj-$(CONFIG_QCOM_CLK_RPM) += clk-rpm.o
 obj-$(CONFIG_QCOM_CLK_SMD_RPM) += clk-smd-rpm.o
 obj-$(CONFIG_SPMI_PMIC_CLKDIV) += clk-spmi-pmic-div.o
+obj-$(CONFIG_QCOM_HFPLL) += hfpll.o
diff --git a/drivers/clk/qcom/hfpll.c b/drivers/clk/qcom/hfpll.c
new file mode 100644
index 000..7405bb6
--- /dev/null
+++ b/drivers/clk/qcom/hfpll.c
@@ -0,0 +1,106 @@
+/*
+ * Copyright (c) 2013-2014, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "clk-regmap.h"
+#include "clk-hfpll.h"
+
+static const struct hfpll_data hdata = {
+   .mode_reg = 0x00,
+   .l_reg = 0x04,
+   .m_reg = 0x08,
+   .n_reg = 0x0c,
+   .user_reg = 0x10,
+   .config_reg = 0x14,
+   .config_val = 0x430405d,
+   .status_reg = 0x1c,
+   .lock_bit = 16,
+
+   .user_val = 0x8,
+   .user_vco_mask = 0x10,
+   .low_vco_max_rate = 124800,
+   .min_rate = 53760UL,
+   .max_rate = 29UL,
+};
+
+static const struct of_device_id qcom_hfpll_match_table[] = {
+   { .compatible = "qcom,hfpll" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, qcom_hfpll_match_table);
+
+static const struct regmap_config hfpll_regmap_config = {
+   .reg_bits   = 32,
+   .reg_stride = 4,
+   .val_bits   = 32,
+   .max_register   = 0x30,
+   .fast_io    = true,
+};
+
+static int qcom_hfpll_probe(struct platform_device *pdev)
+{
+ 

Re: [PATCH 0/1] About MIPS/Loongson maintainance

2017-12-08 Thread Jiaxun Yang
On 2017-12-08 Fri 07:51 +,James Hogan Wrote:
> On Fri, Dec 08, 2017 at 12:01:46PM +0800, Jiaxun Yang wrote:
> > Also we're going to separate code between
> > Loongson2 and Loongson3 since they are becoming more and more
> > identical.
> 
> Do you mean you want to combine them?

Sorry, my fault. They're becoming more and more different, and I'm going
to separate loongson64 into loongson2 and loongson3.

> 
> > But it will cause a lot of changes under march of loongson64
> > that is currently maintained by the linux-mips community. Sending plenty
> > of patches to the mailing list would not be a wise way to do that. So we
> > can PR these changes to linux-next directly and PR to linux-mips before
> > the merge window.

So we can commit by ourselves after the subsystem's review to reduce
linux-mips's workload.
Since Huacai Chen said that we won't send a PR, maybe it's unnecessary.
Thanks.

> For the avoidance of doubt, a pull request would not exempt you from
> needing your patches properly reviewed on the mailing lists first.
> 
> And quoting Stephen's boilerplate response to linux-next additions:
> > Thanks for adding your subsystem tree as a participant of linux-next.  As
> > you may know, this is not a judgement of your code.  The purpose of
> > linux-next is for integration testing and to lower the impact of
> > conflicts between subsystems in the next merge window.
> > 
> > You will need to ensure that the patches/commits in your tree/series have
> > been:
> >  * submitted under GPL v2 (or later) and include the Contributor's
> >    Signed-off-by,
> >  * posted to the relevant mailing list,
> >  * reviewed by you (or another maintainer of your subsystem tree),
> >  * successfully unit tested, and
> >  * destined for the current or next Linux merge window.
> > 
> > Basically, this should be just what you would send to Linus (or ask him
> > to fetch).  It is allowed to be rebased if you deem it necessary.
> 
> Cheers
> James
-- 
Jiaxun Yang


Re: [PATCH v4 00/12] Krait clocks + Krait CPUfreq

2017-12-08 Thread Sricharan R
oops, got Mike's id wrong. Will just re-post fixing his id.

On 12/8/2017 2:59 PM, Sricharan R wrote:
> Mostly a resend of the v3 posted by Stephen quite some time back [1]
> except for few changes.
>   Based on reading some feedback from list,
>   * Dropped the patch "clk: Add safe switch hook" from v3 [2].
> Now this is taken care by patch#10 in this series only for Krait.
>   * Dropped the patch "clk: Avoid sending high rates to downstream
> clocks during set_rate" from v3 [3].
>   * Rebased on top of clk-next.
>   * Dropped the DT update from the series. Will send separately.
>   * Now with cpufreq-dt+opp supporting voltage scaling, registering the
> krait cpu supplies in DT should be sufficient. But one issue is,
> the qcom-cpufreq driver reads the efuse and based on that registers
> the opp data and then registers the cpufreq-dt device. So when
> cpufreq-dt driver probes and registers the regulator to the OPP framework,
> it expects that the opp data for the device should not be registered before
> the regulator. Will send a RFC patch removing that check, to find out the
> right way of doing it.
> 
> These patches provide cpufreq scaling on devices with Krait CPUs.
> In Krait CPU designs there's one PLL and two muxes per CPU, allowing
> us to switch CPU frequencies independently.
> 
> [ASCII clock-tree diagram, garbled in this archive. Its labels: QSB, PLL,
> one HFPLL and one "/ 2" divider per CPU, a secondary and a primary mux
> per CPU, CPU0 and CPU1.]
> 
> To support this in the common clock framework we model the muxes,
> dividers, and PLLs as different clocks. CPUfreq only interacts
> with the primary mux (farthest right in the diagram). When CPUfreq
> sets a rate, the mux code finds the best parent that can provide the rate.
> Due to the design, QSB and the top PLL are always a fixed rate and thus
> only support one frequency each. These sources provide the lowest
> frequencies for the CPUs. The HFPLLs are where we can make the CPU go
> faster (GHz range). Sometimes we need to run the HFPLL twice as
> fast and divide it by two to get a particular frequency.
> 
> When switching rates we can't leave the CPU clocked by the HFPLL because
> we need to turn off the output of the PLL when changing its frequency.
> This means we have to switch over to the secondary mux and use one of the
> fixed sources. This is why we need something like the safe parent patch.
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332607.html
> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332615.html
> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332608.html
> 
> Sricharan R (2):
>   clk: qcom: Add safe switch hook for krait mux clocks
>   cpufreq: dt: Reintroduce independent_clocks platform data
> 
> Stephen Boyd (10):
>   ARM: Add Krait L2 register accessor functions
>   clk: mux: Split out register accessors for reuse
>   clk: qcom: Add support for High-Frequency PLLs (HFPLLs)
>   clk: qcom: Add HFPLL driver
>   clk: qcom: Add MSM8960/APQ8064's HFPLLs
>   clk: qcom: Add IPQ806X's HFPLLs
>   clk: qcom: Add support for Krait clocks
>   clk: qcom: Add KPSS ACC/GCC driver
>   clk: qcom: Add Krait clock controller driver
>   cpufreq: Add module to register cpufreq on Krait CPUs
> 
>  .../devicetree/bindings/arm/msm/qcom,kpss-acc.txt  |   7 +
>  .../devicetree/bindings/arm/msm/qcom,kpss-gcc.txt  |  28 ++
>  .../devicetree/bindings/arm/msm/qcom,pvs.txt   |  38 ++
>  .../devicetree/bindings/clock/qcom,hfpll.txt   |  40 ++
>  .../devicetree/bindings/clock/qcom,krait-cc.txt|  22 ++
>  arch/arm/common/K

[PATCH v4 03/12] clk: qcom: Add support for High-Frequency PLLs (HFPLLs)

2017-12-08 Thread Sricharan R
From: Stephen Boyd 

HFPLLs are the main frequency source for Krait CPU clocks. Add
support for changing the rate of these PLLs.

Signed-off-by: Stephen Boyd 
---
 drivers/clk/qcom/Makefile|   1 +
 drivers/clk/qcom/clk-hfpll.c | 253 +++
 drivers/clk/qcom/clk-hfpll.h |  54 +
 3 files changed, 308 insertions(+)
 create mode 100644 drivers/clk/qcom/clk-hfpll.c
 create mode 100644 drivers/clk/qcom/clk-hfpll.h

diff --git a/drivers/clk/qcom/Makefile b/drivers/clk/qcom/Makefile
index 602af38..4795e21 100644
--- a/drivers/clk/qcom/Makefile
+++ b/drivers/clk/qcom/Makefile
@@ -10,6 +10,7 @@ clk-qcom-y += clk-rcg2.o
 clk-qcom-y += clk-branch.o
 clk-qcom-y += clk-regmap-divider.o
 clk-qcom-y += clk-regmap-mux.o
+clk-qcom-y += clk-hfpll.o
 clk-qcom-y += reset.o
 clk-qcom-$(CONFIG_QCOM_GDSC) += gdsc.o
 
diff --git a/drivers/clk/qcom/clk-hfpll.c b/drivers/clk/qcom/clk-hfpll.c
new file mode 100644
index 000..5b0bb71
--- /dev/null
+++ b/drivers/clk/qcom/clk-hfpll.c
@@ -0,0 +1,253 @@
+/*
+ * Copyright (c) 2013-2014, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "clk-regmap.h"
+#include "clk-hfpll.h"
+
+#define PLL_OUTCTRL    BIT(0)
+#define PLL_BYPASSNL   BIT(1)
+#define PLL_RESET_N    BIT(2)
+
+/* Initialize a HFPLL at a given rate and enable it. */
+static void __clk_hfpll_init_once(struct clk_hw *hw)
+{
+   struct clk_hfpll *h = to_clk_hfpll(hw);
+   struct hfpll_data const *hd = h->d;
+   struct regmap *regmap = h->clkr.regmap;
+
+   if (likely(h->init_done))
+   return;
+
+   /* Configure PLL parameters for integer mode. */
+   if (hd->config_val)
+   regmap_write(regmap, hd->config_reg, hd->config_val);
+   regmap_write(regmap, hd->m_reg, 0);
+   regmap_write(regmap, hd->n_reg, 1);
+
+   if (hd->user_reg) {
+   u32 regval = hd->user_val;
+   unsigned long rate;
+
+   rate = clk_hw_get_rate(hw);
+
+   /* Pick the right VCO. */
+   if (hd->user_vco_mask && rate > hd->low_vco_max_rate)
+   regval |= hd->user_vco_mask;
+   regmap_write(regmap, hd->user_reg, regval);
+   }
+
+   if (hd->droop_reg)
+   regmap_write(regmap, hd->droop_reg, hd->droop_val);
+
+   h->init_done = true;
+}
+
+static void __clk_hfpll_enable(struct clk_hw *hw)
+{
+   struct clk_hfpll *h = to_clk_hfpll(hw);
+   struct hfpll_data const *hd = h->d;
+   struct regmap *regmap = h->clkr.regmap;
+   u32 val;
+
+   __clk_hfpll_init_once(hw);
+
+   /* Disable PLL bypass mode. */
+   regmap_update_bits(regmap, hd->mode_reg, PLL_BYPASSNL, PLL_BYPASSNL);
+
+   /*
+* H/W requires a 5us delay between disabling the bypass and
+* de-asserting the reset. Delay 10us just to be safe.
+*/
+   usleep_range(10, 100);
+
+   /* De-assert active-low PLL reset. */
+   regmap_update_bits(regmap, hd->mode_reg, PLL_RESET_N, PLL_RESET_N);
+
+   /* Wait for PLL to lock. */
+   if (hd->status_reg) {
+   do {
+   regmap_read(regmap, hd->status_reg, &val);
+   } while (!(val & BIT(hd->lock_bit)));
+   } else {
+   usleep_range(60, 100);
+   }
+
+   /* Enable PLL output. */
+   regmap_update_bits(regmap, hd->mode_reg, PLL_OUTCTRL, PLL_OUTCTRL);
+}
+
+/* Enable an already-configured HFPLL. */
+static int clk_hfpll_enable(struct clk_hw *hw)
+{
+   unsigned long flags;
+   struct clk_hfpll *h = to_clk_hfpll(hw);
+   struct hfpll_data const *hd = h->d;
+   struct regmap *regmap = h->clkr.regmap;
+   u32 mode;
+
+   spin_lock_irqsave(&h->lock, flags);
+   regmap_read(regmap, hd->mode_reg, &mode);
+   if (!(mode & (PLL_BYPASSNL | PLL_RESET_N | PLL_OUTCTRL)))
+   __clk_hfpll_enable(hw);
+   spin_unlock_irqrestore(&h->lock, flags);
+
+   return 0;
+}
+
+static void __clk_hfpll_disable(struct clk_hfpll *h)
+{
+   struct hfpll_data const *hd = h->d;
+   struct regmap *regmap = h->clkr.regmap;
+
+   /*
+* Disable the PLL output, disable test mode, enable the bypass mode,
+* and assert the reset.
+*/
+   regmap_update_bits(regmap, hd->mode_reg,
+  PLL_BYPASSNL | PLL_RESET_N | PLL_OUTCTRL, 0);
+}
+
+static void clk_hfpll_disable(struct clk_hw *hw)
+{
+   str

Re: [PATCH V3 2/2] Xen/PCIback: Implement PCI flr/slot/bus reset with 'reset' SysFS attribute

2017-12-08 Thread Jan Beulich
>>> On 07.12.17 at 23:21,  wrote:
> Due to the complexity with the PCI lock we cannot do the reset when a
> device is bound ('echo $BDF > bind') or when unbound ('echo $BDF > unbind')
> as the pci_[slot|bus]_reset also takes the same lock resulting in a
> dead-lock.

It took me a moment to figure that here you're referring to the
process of (un)binding, not the state. To avoid that ambiguity in
wording, how about "... we cannot do the reset while a device is
being bound (...) or while it is being unbound ..."?

> --- a/Documentation/ABI/testing/sysfs-driver-pciback
> +++ b/Documentation/ABI/testing/sysfs-driver-pciback
> @@ -11,3 +11,18 @@ Description:
>  #echo 00:19.0-E0:2:FF > /sys/bus/pci/drivers/pciback/quirks
>  will allow the guest to read and write to the configuration
>  register 0x0E.
> +
> +What:   /sys/bus/pci/drivers/pciback/reset
> +Date:   Dec 2017
> +KernelVersion:  4.15
> +Contact:xen-de...@lists.xenproject.org 
> +Description:
> +An option to perform a flr/slot/bus reset when a PCI device
> + is owned by Xen PCI backend. Writing a string of :BB:DD.F

:BB:DD.F (or else the D-s are ambiguous, all the more so as "domain"
in Xen code is ambiguous anyway - I continue to be misled by struct
pcistub_device_id's domain field)

Also I assume the  part is optional (default zero), which
probably can and should be expressed in some way.
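
To make the "optional, default zero" reading concrete, here is a minimal stand-alone sketch of such a parser. It is not the driver's str_to_slot(); the names parse_pci_addr() and struct pci_addr are invented for illustration, and it assumes all fields are hexadecimal and that a single trailing newline (as produced by echo) is acceptable.

```c
#include <stdio.h>

/* Hypothetical container for the parsed address. */
struct pci_addr {
	unsigned int domain, bus, slot, func;
};

/* Accept end of string, optionally preceded by one newline. */
static int at_end(const char *s)
{
	if (*s == '\n')
		s++;
	return *s == '\0';
}

/*
 * Parse "xxxx:bb:dd.f", or the short form "bb:dd.f" where the
 * leading field defaults to zero.
 * Returns 0 on success, -1 on malformed input; *a may hold
 * partial values after a failure.
 */
static int parse_pci_addr(const char *s, struct pci_addr *a)
{
	int end = 0;

	/* Long form: all four fields present. */
	if (sscanf(s, "%x:%x:%x.%x%n", &a->domain, &a->bus,
		   &a->slot, &a->func, &end) == 4 && at_end(s + end))
		return 0;

	/* Short form: leading field omitted, so it defaults to 0. */
	a->domain = 0;
	end = 0;
	if (sscanf(s, "%x:%x.%x%n", &a->bus, &a->slot,
		   &a->func, &end) == 3 && at_end(s + end))
		return 0;

	return -1;
}
```

Documenting the short form next to the long one in the ABI text would then match what a parser of this shape accepts.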

> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -313,6 +313,102 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
>   up_write(&pcistub_sem);
>  }
>  
> +struct pcistub_args {
> + const struct pci_dev *dev;
> + unsigned int dcount;

The sole use of this field is for a debug message. Why not drop it
and make "dev" the "data" argument without further indirection?

> +static int pcistub_device_search(struct pci_dev *dev, void *data)
> +{
> + struct pcistub_device *psdev;
> + struct pcistub_args *arg = data;
> + bool found = false;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pcistub_devices_lock, flags);
> +
> + list_for_each_entry(psdev, &pcistub_devices, dev_list) {
> + if (psdev->dev == dev) {
> + found = true;
> + arg->dcount++;
> + break;

Neither here nor in the caller can I see a check whether the device
is currently assigned to a guest. Ownership by pciback alone imo is
not sufficient to allow a reset to be performed.

> +static int pcistub_device_reset(struct pci_dev *dev)
> +{
> + struct xen_pcibk_dev_data *dev_data;
> + bool slot = false, bus = false;
> + struct pcistub_args arg = {};
> +
> + if (!dev)
> + return -EINVAL;
> +
> + dev_dbg(&dev->dev, "[%s]\n", __func__);
> +
> + /* First check and try FLR */
> + if (pcie_has_flr(dev)) {
> + dev_dbg(&dev->dev, "resetting %s device using FLR\n",
> + pci_name(dev));
> + pcie_flr(dev);

The lack of error check here puzzled me, but I see the function
indeed returns void right now. I think the prereq patch should
change this along with exporting the function - you really don't
want the device to be handed to a guest when the FLR timed
out.

> + return 0;
> + }
> +
> + if (!pci_probe_reset_slot(dev->slot))
> + slot = true;
> + else if ((!pci_probe_reset_bus(dev->bus)) &&
> +  (!pci_is_root_bus(dev->bus)))

Too many parentheses for my taste.

> +static ssize_t reset_store(struct device_driver *drv, const char *buf,
> +size_t count)
> +{
> + struct pcistub_device *psdev;
> + int domain, bus, slot, func;
> + int err;
> +
> + err = str_to_slot(buf, &domain, &bus, &slot, &func);
> + if (err)
> + return err;
> +
> + psdev = pcistub_device_find(domain, bus, slot, func);
> + if (psdev) {
> + err = pcistub_device_reset(psdev->dev);
> + pcistub_device_put(psdev);
> + } else {
> + err = -ENODEV;
> + }
> +
> + if (!err)
> + err = count;
> +
> + return err;
> +}
> +static DRIVER_ATTR_WO(reset);

Would it be worthwhile for reads of the file to return whether the device
can be reset this way (i.e. the result of the checks you do before
actually doing the reset)?

Jan

