date:20160423

Re: [PATCH RFC 10/22] block, bfq: add full hierarchical scheduling and cgroups support

2016-04-23 Thread Paolo Valente

On 22 Apr 2016, at 21:32, Tejun Heo wrote:

> Hello, Paolo.
> 
> On Fri, Apr 22, 2016 at 09:05:14PM +0200, Paolo Valente wrote:
>>> Ah, right, I was confused.  cic is always associated with the task and
>>> yes a writeback worker can trigger blkcg changed events frequently as
>>> it walks through different cgroups.  Is this an issue?
>> 
>> That’s exactly the source of my confusion: why does the worker walk
>> through different cgroups all the time if the I/O is originated by
>> the same process, which never changes group?
> 
> Because the workqueue workers aren't tied to individual workqueues,
> they wander around serving different workqueues.

There is certainly something I don’t know here, because I don’t understand why 
there is also a workqueue containing root-group I/O all the time, if the only 
process doing I/O belongs to a different (sub)group.

Anyway, if this is expected, then there is no reason to bother you further
about it. The actual problem I see is the following. If one third or half of
the bios belong to a different group than the writer that one wants to
isolate, then, whatever weight is assigned to the writer group, we will never
be able to let the writer get the desired share of the time (or of the
bandwidth, with bfq and all quasi-sequential workloads). For instance, in the
scenario you suggested I try, the writer will never get 50% of the time, with
any scheduler. Am I missing something here as well?

Thanks,
Paolo

>  This might change if
> we eventually update workqueues to be cgroup aware but for now it's
> expected to happen.
> 
> Thanks.
> 
> -- 
> tejun

[RFC PATCH 1/2] regulator: refactor valid_ops_mask checking code

2016-04-23 Thread WEN Pingbo

To make the code more compact and centralized, this patch adds a
unified function, regulator_ops_is_valid(), so that extra checks
can easily be added later.

Signed-off-by: WEN Pingbo 
---
 drivers/regulator/core.c | 88 
 1 file changed, 29 insertions(+), 59 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index e0b7642..fe47d38 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -132,6 +132,19 @@ static bool have_full_constraints(void)
 	return has_full_constraints || of_have_populated_dt();
 }
 
+static bool regulator_ops_is_valid(struct regulator_dev *rdev, int ops)
+{
+	if (!rdev->constraints) {
+		rdev_err(rdev, "no constraints\n");
+		return false;
+	}
+
+	if (rdev->constraints->valid_ops_mask & ops)
+		return true;
+
+	return false;
+}
+
 static inline struct regulator_dev *rdev_get_supply(struct regulator_dev *rdev)
 {
 	if (rdev && rdev->supply)
@@ -198,28 +211,13 @@ static struct device_node *of_get_regulator(struct device *dev, const char *supp
 	return regnode;
 }
 
-static int _regulator_can_change_status(struct regulator_dev *rdev)
-{
-	if (!rdev->constraints)
-		return 0;
-
-	if (rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_STATUS)
-		return 1;
-	else
-		return 0;
-}
-
 /* Platform voltage constraint check */
 static int regulator_check_voltage(struct regulator_dev *rdev,
 				   int *min_uV, int *max_uV)
 {
 	BUG_ON(*min_uV > *max_uV);
 
-	if (!rdev->constraints) {
-		rdev_err(rdev, "no constraints\n");
-		return -ENODEV;
-	}
-	if (!(rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_VOLTAGE)) {
+	if (!regulator_ops_is_valid(rdev, REGULATOR_CHANGE_VOLTAGE)) {
 		rdev_err(rdev, "voltage operation not allowed\n");
 		return -EPERM;
 	}
@@ -275,11 +273,7 @@ static int regulator_check_current_limit(struct regulator_dev *rdev,
 {
 	BUG_ON(*min_uA > *max_uA);
 
-	if (!rdev->constraints) {
-		rdev_err(rdev, "no constraints\n");
-		return -ENODEV;
-	}
-	if (!(rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_CURRENT)) {
+	if (!regulator_ops_is_valid(rdev, REGULATOR_CHANGE_CURRENT)) {
 		rdev_err(rdev, "current operation not allowed\n");
 		return -EPERM;
 	}
@@ -312,11 +306,7 @@ static int regulator_mode_constrain(struct regulator_dev *rdev, int *mode)
 		return -EINVAL;
 	}
 
-	if (!rdev->constraints) {
-		rdev_err(rdev, "no constraints\n");
-		return -ENODEV;
-	}
-	if (!(rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_MODE)) {
+	if (!regulator_ops_is_valid(rdev, REGULATOR_CHANGE_MODE)) {
 		rdev_err(rdev, "mode operation not allowed\n");
 		return -EPERM;
 	}
@@ -333,20 +323,6 @@ static int regulator_mode_constrain(struct regulator_dev *rdev, int *mode)
 	return -EINVAL;
 }
 
-/* dynamic regulator mode switching constraint check */
-static int regulator_check_drms(struct regulator_dev *rdev)
-{
-	if (!rdev->constraints) {
-		rdev_err(rdev, "no constraints\n");
-		return -ENODEV;
-	}
-	if (!(rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_DRMS)) {
-		rdev_dbg(rdev, "drms operation not allowed\n");
-		return -EPERM;
-	}
-	return 0;
-}
-
 static ssize_t regulator_uV_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
@@ -692,8 +668,7 @@ static int drms_uA_update(struct regulator_dev *rdev)
 	 * first check to see if we can set modes at all, otherwise just
 	 * tell the consumer everything is OK.
 	 */
-	err = regulator_check_drms(rdev);
-	if (err < 0)
+	if (!regulator_ops_is_valid(rdev, REGULATOR_CHANGE_DRMS))
 		return 0;
 
 	if (!rdev->desc->ops->get_optimum_mode &&
@@ -893,7 +868,7 @@ static void print_constraints(struct regulator_dev *rdev)
 	rdev_dbg(rdev, "%s\n", buf);
 
 	if ((constraints->min_uV != constraints->max_uV) &&
-	    !(constraints->valid_ops_mask & REGULATOR_CHANGE_VOLTAGE))
+	    !regulator_ops_is_valid(rdev, REGULATOR_CHANGE_VOLTAGE))
 		rdev_warn(rdev,
 			  "Voltage range but no REGULATOR_CHANGE_VOLTAGE\n");
 }
@@ -1334,7 +1309,7 @@ static struct regulator *create_regulator(struct regulator_dev *rdev,
 	 * it is then we don't need to do nearly so much work for
 	 * enable/disable calls.
 	 */
-	if (!_regulator_can_change_status(rdev) &&
+	if (!regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS) &&
 	    _regulator_is_enabled(rdev))

[RFC PATCH 2/2] regulator: add boot protection flag

2016-04-23 Thread WEN Pingbo

On some platforms, a critical shared regulator is initialized before the
kernel is loaded. During kernel boot, however, the driver probing order
and conflicting operations from other consumers of the regulator may put
it into an undefined state, which can cause serious problems.

This patch adds a boot_protection flag to the regulator constraints, so
that the regulator core rejects the specified operations during kernel
boot.

The boot_protection flag only takes effect before late_initcall. Like
other constraints, it can be specified in a board file or in a DTS file.
By default, once this flag is set on a regulator, all operations on that
regulator are rejected during kernel boot. Individual operations can
still be allowed by setting boot_valid_ops_mask.

[ This patch depends on the regulator_ops_is_valid patch. Some
documentation still needs to be added, but I would like to hear feedback
first. ]

Signed-off-by: WEN Pingbo 
---
 drivers/regulator/core.c  | 24 +---
 drivers/regulator/of_regulator.c  | 29 +
 include/linux/regulator/machine.h |  2 ++
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index fe47d38..5b9dc22 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -55,6 +55,7 @@ static LIST_HEAD(regulator_map_list);
 static LIST_HEAD(regulator_ena_gpio_list);
 static LIST_HEAD(regulator_supply_alias_list);
 static bool has_full_constraints;
+static bool regulator_has_booted;
 
 static struct dentry *debugfs_root;
 
@@ -139,7 +140,15 @@ static bool regulator_ops_is_valid(struct regulator_dev *rdev, int ops)
 		return false;
 	}
 
-	if (rdev->constraints->valid_ops_mask & ops)
+	/*
+	 * Ignore regulator boot-protection after late_initcall.
+	 */
+	if (!regulator_has_booted && rdev->constraints->boot_protection) {
+		if (rdev->constraints->boot_valid_ops_mask & ops)
+			return true;
+		else
+			rdev_info(rdev, "rejected operation 0x%02x\n", ops);
+	} else if (rdev->constraints->valid_ops_mask & ops)
 		return true;
 
 	return false;
@@ -868,7 +877,7 @@ static void print_constraints(struct regulator_dev *rdev)
 	rdev_dbg(rdev, "%s\n", buf);
 
 	if ((constraints->min_uV != constraints->max_uV) &&
-	    !regulator_ops_is_valid(rdev, REGULATOR_CHANGE_VOLTAGE))
+	    !(constraints->valid_ops_mask & REGULATOR_CHANGE_VOLTAGE))
 		rdev_warn(rdev,
 			  "Voltage range but no REGULATOR_CHANGE_VOLTAGE\n");
 }
@@ -1309,7 +1318,8 @@ static struct regulator *create_regulator(struct regulator_dev *rdev,
 	 * it is then we don't need to do nearly so much work for
 	 * enable/disable calls.
 	 */
-	if (!regulator_ops_is_valid(rdev, REGULATOR_CHANGE_STATUS) &&
+	if (rdev->constraints &&
+	    !(rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_STATUS) &&
 	    _regulator_is_enabled(rdev))
 		regulator->always_on = true;
 
@@ -4353,6 +4363,12 @@ static int __init regulator_late_cleanup(struct device *dev, void *data)
 	struct regulation_constraints *c = rdev->constraints;
 	int enabled, ret;
 
+	/*
+	 * The kernel boot is finished, let's unset boot_protection
+	 * Need a lock?
+	 */
+	c->boot_protection = 0;
+
 	if (c && c->always_on)
 		return 0;
 
@@ -4406,6 +4422,8 @@ static int __init regulator_init_complete(void)
 	if (of_have_populated_dt())
 		has_full_constraints = true;
 
+	regulator_has_booted = true;
+
 	/* If we have a full configuration then disable any regulators
 	 * we have permission to change the status for and which are
 	 * not in use or always_on.  This is effectively the default
diff --git a/drivers/regulator/of_regulator.c b/drivers/regulator/of_regulator.c
index 6b0aa80..bfec59c 100644
--- a/drivers/regulator/of_regulator.c
+++ b/drivers/regulator/of_regulator.c
@@ -78,6 +78,35 @@ static void of_get_regulation_constraints(struct device_node *np,
 	if (of_property_read_bool(np, "regulator-allow-set-load"))
 		constraints->valid_ops_mask |= REGULATOR_CHANGE_DRMS;
 
+	constraints->boot_protection = of_property_read_bool(np,
+					"regulator-boot-protection");
+
+	if (constraints->boot_protection) {
+		if (of_property_read_bool(np, "boot-allow-set-voltage"))
+			constraints->boot_valid_ops_mask |=
+				REGULATOR_CHANGE_VOLTAGE;
+		if (of_property_read_bool(np, "boot-allow-set-current"))
+			constraints->boot_valid_ops_mask |=
+				REGULATOR_CHANGE_CURRENT;
+		if (of_property_read_bool(np, "boot-allow-set-mode"))
+   con

[PATCH] mm: update the document of numa_zonelist_order

2016-04-23 Thread Xishi Qiu

Commit 3193913ce62c63056bc67a6ae378beaf494afa66 changed the default value
of numa_zonelist_order; this patch updates the documentation accordingly.

Signed-off-by: Xishi Qiu 
---
 Documentation/sysctl/vm.txt |   19 ++-
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index cb03684..34a5fec 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -581,15 +581,16 @@ Specify "[Nn]ode" for node order
 "Zone Order" orders the zonelists by zone type, then by node within each
 zone.  Specify "[Zz]one" for zone order.
 
-Specify "[Dd]efault" to request automatic configuration.  Autoconfiguration
-will select "node" order in following case.
-(1) if the DMA zone does not exist or
-(2) if the DMA zone comprises greater than 50% of the available memory or
-(3) if any node's DMA zone comprises greater than 70% of its local memory and
-the amount of local memory is big enough.
-
-Otherwise, "zone" order will be selected. Default order is recommended unless
-this is causing problems for your system/application.
+Specify "[Dd]efault" to request automatic configuration.
+
+On 32-bit, the Normal zone needs to be preserved for allocations accessible
+by the kernel, so "zone" order will be selected.
+
+On 64-bit, devices that require DMA32/DMA are relatively rare, so "node"
+order will be selected.
+
+Default order is recommended unless this is causing problems for your
+system/application.
 
 ==
 
-- 
1.7.1

Re: [patch] Bluetooth: ath3k: Silence uninitialized variable warning

2016-04-23 Thread Dan Carpenter

On Sat, Apr 23, 2016 at 12:17:45PM +0530, Afzal Mohammed wrote:
> Hi,
> 
> On Fri, Apr 22, 2016 at 01:02:55PM +0300, Dan Carpenter wrote:
> 
> > -   int err, pipe, len, size, count, sent = 0;
> > +   int len = 0;
> > +   int err, pipe, size, count, sent = 0;
> 
> Is there any particular reason to avoid initializing more than one
> variable in a single-line definition, like
> 
>   int err, pipe, size, count, sent = 0, len = 0;
> 
> I have observed that none of your uninitialized variable warning fixes
> do it as shown above.

That sort of initialization is slightly less readable...

regards,
dan carpenter

[patch added to 3.12-stable] netfilter: x_tables: make sure e->next_offset covers remaining blob size

2016-04-23 Thread Jiri Slaby

From: Florian Westphal 

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===

commit 6e94e0cfb0887e4013b3b930fa6ab1fe6bb6ba91 upstream.

Otherwise this function may read data beyond the ruleset blob.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
Cc: Michal Kubecek 
Signed-off-by: Jiri Slaby 
---
 net/ipv4/netfilter/arp_tables.c | 6 --
 net/ipv4/netfilter/ip_tables.c  | 6 --
 net/ipv6/netfilter/ip6_tables.c | 6 --
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 29aa90ea4c8d..456fc6efe05d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -558,7 +558,8 @@ static inline int check_entry_size_and_hooks(struct arpt_entry *e,
 	int err;
 
 	if ((unsigned long)e % __alignof__(struct arpt_entry) != 0 ||
-	    (unsigned char *)e + sizeof(struct arpt_entry) >= limit) {
+	    (unsigned char *)e + sizeof(struct arpt_entry) >= limit ||
+	    (unsigned char *)e + e->next_offset > limit) {
 		duprintf("Bad offset %p\n", e);
 		return -EINVAL;
 	}
@@ -1218,7 +1219,8 @@ check_compat_entry_size_and_hooks(struct compat_arpt_entry *e,
 
 	duprintf("check_compat_entry_size_and_hooks %p\n", e);
 	if ((unsigned long)e % __alignof__(struct compat_arpt_entry) != 0 ||
-	    (unsigned char *)e + sizeof(struct compat_arpt_entry) >= limit) {
+	    (unsigned char *)e + sizeof(struct compat_arpt_entry) >= limit ||
+	    (unsigned char *)e + e->next_offset > limit) {
 		duprintf("Bad offset %p, limit = %p\n", e, limit);
 		return -EINVAL;
 	}
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index d400a2ad7c56..a5bd3c8eee84 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -720,7 +720,8 @@ check_entry_size_and_hooks(struct ipt_entry *e,
 	int err;
 
 	if ((unsigned long)e % __alignof__(struct ipt_entry) != 0 ||
-	    (unsigned char *)e + sizeof(struct ipt_entry) >= limit) {
+	    (unsigned char *)e + sizeof(struct ipt_entry) >= limit ||
+	    (unsigned char *)e + e->next_offset > limit) {
 		duprintf("Bad offset %p\n", e);
 		return -EINVAL;
 	}
@@ -1483,7 +1484,8 @@ check_compat_entry_size_and_hooks(struct compat_ipt_entry *e,
 
 	duprintf("check_compat_entry_size_and_hooks %p\n", e);
 	if ((unsigned long)e % __alignof__(struct compat_ipt_entry) != 0 ||
-	    (unsigned char *)e + sizeof(struct compat_ipt_entry) >= limit) {
+	    (unsigned char *)e + sizeof(struct compat_ipt_entry) >= limit ||
+	    (unsigned char *)e + e->next_offset > limit) {
 		duprintf("Bad offset %p, limit = %p\n", e, limit);
 		return -EINVAL;
 	}
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index e0538df697c8..fb8a146abed8 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -731,7 +731,8 @@ check_entry_size_and_hooks(struct ip6t_entry *e,
 	int err;
 
 	if ((unsigned long)e % __alignof__(struct ip6t_entry) != 0 ||
-	    (unsigned char *)e + sizeof(struct ip6t_entry) >= limit) {
+	    (unsigned char *)e + sizeof(struct ip6t_entry) >= limit ||
+	    (unsigned char *)e + e->next_offset > limit) {
 		duprintf("Bad offset %p\n", e);
 		return -EINVAL;
 	}
@@ -1495,7 +1496,8 @@ check_compat_entry_size_and_hooks(struct compat_ip6t_entry *e,
 
 	duprintf("check_compat_entry_size_and_hooks %p\n", e);
 	if ((unsigned long)e % __alignof__(struct compat_ip6t_entry) != 0 ||
-	    (unsigned char *)e + sizeof(struct compat_ip6t_entry) >= limit) {
+	    (unsigned char *)e + sizeof(struct compat_ip6t_entry) >= limit ||
+	    (unsigned char *)e + e->next_offset > limit) {
 		duprintf("Bad offset %p, limit = %p\n", e, limit);
 		return -EINVAL;
 	}
-- 
2.8.1

Re: [PATCH 1/8] dt/bindings: firmware: Add Qualcomm SCM binding

2016-04-23 Thread Stanimir Varbanov

Hi Andy,

On 04/23/2016 01:17 AM, Andy Gross wrote:
> This patch adds the device tree support for the Qualcomm SCM firmware.
> 
> Signed-off-by: Andy Gross 
> ---
>  .../devicetree/bindings/firmware/qcom,scm.txt  | 31 ++
>  1 file changed, 31 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/firmware/qcom,scm.txt
> 
> diff --git a/Documentation/devicetree/bindings/firmware/qcom,scm.txt b/Documentation/devicetree/bindings/firmware/qcom,scm.txt
> new file mode 100644
> index 000..57b9b3a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/firmware/qcom,scm.txt
> @@ -0,0 +1,31 @@
> +QCOM Secure Channel Manager (SCM)
> +
> +Qualcomm processors include an interface to communicate to the secure firmware.
> +This interface allows for clients to request different types of actions.  These
> +can include CPU power up/down, HDCP requests, loading of firmware, and other
> +assorted actions.
> +
> +Required properties:
> +- compatible: must contain one of the following:
> + * "qcom,scm-apq8064" for APQ8064
> + * "qcom,scm-apq8084" for MSM8084

s/MSM8084/APQ8084

regards,
Stan

[PATCH 3/3] clk: hisilicon: add CRG driver for hi3519 soc

2016-04-23 Thread Jiancheng Xue

The CRG (Clock and Reset Generator) block provides clock
and reset signals for other modules in the Hi3519 SoC.

Signed-off-by: Jiancheng Xue 
Acked-by: Rob Herring 
---
 .../devicetree/bindings/clock/hi3519-crg.txt   |  46 
 drivers/clk/hisilicon/Kconfig  |   8 ++
 drivers/clk/hisilicon/Makefile |   1 +
 drivers/clk/hisilicon/clk-hi3519.c | 131 +
 include/dt-bindings/clock/hi3519-clock.h   |  40 +++
 5 files changed, 226 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/hi3519-crg.txt
 create mode 100644 drivers/clk/hisilicon/clk-hi3519.c
 create mode 100644 include/dt-bindings/clock/hi3519-clock.h

diff --git a/Documentation/devicetree/bindings/clock/hi3519-crg.txt b/Documentation/devicetree/bindings/clock/hi3519-crg.txt
new file mode 100644
index 000..acd1f23
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/hi3519-crg.txt
@@ -0,0 +1,46 @@
+* Hisilicon Hi3519 Clock and Reset Generator (CRG)
+
+The Hi3519 CRG module provides clock and reset signals to various
+controllers within the SoC.
+
+This binding uses the following bindings:
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+Documentation/devicetree/bindings/reset/reset.txt
+
+Required Properties:
+
+- compatible: should be one of the following.
+  - "hisilicon,hi3519-crg" - controller compatible with Hi3519 SoC.
+
+- reg: physical base address of the controller and length of memory mapped
+  region.
+
+- #clock-cells: should be 1.
+
+Each clock is assigned an identifier and client nodes use this identifier
+to specify the clock which they consume.
+
+All these identifiers can be found in .
+
+- #reset-cells: should be 2.
+
+A reset signal can be controlled by writing a bit in a register of the CRG module.
+The reset specifier consists of two cells. The first cell represents the
+register offset relative to the base address. The second cell represents the
+bit index in the register.
+
+Example: CRG nodes
+CRG: clock-reset-controller@1201 {
+	compatible = "hisilicon,hi3519-crg";
+	reg = <0x1201 0x1>;
+	#clock-cells = <1>;
+	#reset-cells = <2>;
+};
+
+Example: consumer nodes
+i2c0: i2c@1211 {
+	compatible = "hisilicon,hi3519-i2c";
+	reg = <0x1211 0x1000>;
+	clocks = <&CRG HI3519_I2C0_RST>;
+	resets = <&CRG 0xe4 0>;
+};
diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig
index 3cd349c..3f537a0 100644
--- a/drivers/clk/hisilicon/Kconfig
+++ b/drivers/clk/hisilicon/Kconfig
@@ -1,3 +1,11 @@
+config COMMON_CLK_HI3519
+	tristate "Hi3519 Clock Driver"
+	depends on ARCH_HISI || COMPILE_TEST
+	select RESET_HISI
+	default ARCH_HISI
+	help
+	  Build the clock driver for hi3519.
+
 config COMMON_CLK_HI6220
 	bool "Hi6220 Clock Driver"
 	depends on ARCH_HISI || COMPILE_TEST
diff --git a/drivers/clk/hisilicon/Makefile b/drivers/clk/hisilicon/Makefile
index c037753..e169ec7 100644
--- a/drivers/clk/hisilicon/Makefile
+++ b/drivers/clk/hisilicon/Makefile
@@ -7,6 +7,7 @@ obj-y   += clk.o clkgate-separated.o clkdivider-hi6220.o
 obj-$(CONFIG_ARCH_HI3xxx)  += clk-hi3620.o
 obj-$(CONFIG_ARCH_HIP04)   += clk-hip04.o
 obj-$(CONFIG_ARCH_HIX5HD2) += clk-hix5hd2.o
+obj-$(CONFIG_COMMON_CLK_HI3519)	+= clk-hi3519.o
 obj-$(CONFIG_COMMON_CLK_HI6220)	+= clk-hi6220.o
 obj-$(CONFIG_RESET_HISI)   += reset.o
 obj-$(CONFIG_STUB_CLK_HI6220)  += clk-hi6220-stub.o
diff --git a/drivers/clk/hisilicon/clk-hi3519.c b/drivers/clk/hisilicon/clk-hi3519.c
new file mode 100644
index 000..715c730
--- /dev/null
+++ b/drivers/clk/hisilicon/clk-hi3519.c
@@ -0,0 +1,131 @@
+/*
+ * Hi3519 Clock Driver
+ *
+ * Copyright (c) 2015-2016 HiSilicon Technologies Co., Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "clk.h"
+#include "reset.h"
+
+#define HI3519_INNER_CLK_OFFSET	64
+#define HI3519_FIXED_24M   65
+#define HI3519_FIXED_50M   66
+#define HI3519_FIXED_75M   67
+#define HI3519_FIXED_125M  68
+#define HI3519_FIXED_150M  69
+#define HI3519_FIXED_200M  70
+#define HI3519_FIXED_250M  71
+#define HI3519_FIXED_300M  72
+#define HI3519_FIXED_400M  73
+#define HI3519_FMC_MUX 74

[PATCH 1/3] reset: hisilicon: add reset controller driver for hisilicon SOCs

2016-04-23 Thread Jiancheng Xue

In most HiSilicon SoCs, the reset controller and the clock provider are
combined in a single block named CRG (Clock and Reset Generator).
This patch implements the reset function.

Signed-off-by: Jiancheng Xue 
Acked-by: Philipp Zabel 
---
 drivers/clk/hisilicon/Kconfig  |   7 +++
 drivers/clk/hisilicon/Makefile |   1 +
 drivers/clk/hisilicon/reset.c  | 134 +
 drivers/clk/hisilicon/reset.h  |  36 +++
 4 files changed, 178 insertions(+)
 create mode 100644 drivers/clk/hisilicon/reset.c
 create mode 100644 drivers/clk/hisilicon/reset.h

diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig
index e434854..3cd349c 100644
--- a/drivers/clk/hisilicon/Kconfig
+++ b/drivers/clk/hisilicon/Kconfig
@@ -5,6 +5,13 @@ config COMMON_CLK_HI6220
 	help
 	  Build the Hisilicon Hi6220 clock driver based on the common clock framework.
 
+config RESET_HISI
+	bool "HiSilicon Reset Controller Driver"
+	depends on ARCH_HISI || COMPILE_TEST
+	select RESET_CONTROLLER
+	help
+	  Build reset controller driver for HiSilicon device chipsets.
+
 config STUB_CLK_HI6220
 	bool "Hi6220 Stub Clock Driver"
 	depends on COMMON_CLK_HI6220 && MAILBOX
diff --git a/drivers/clk/hisilicon/Makefile b/drivers/clk/hisilicon/Makefile
index 74dba31..c037753 100644
--- a/drivers/clk/hisilicon/Makefile
+++ b/drivers/clk/hisilicon/Makefile
@@ -8,4 +8,5 @@ obj-$(CONFIG_ARCH_HI3xxx)   += clk-hi3620.o
 obj-$(CONFIG_ARCH_HIP04)   += clk-hip04.o
 obj-$(CONFIG_ARCH_HIX5HD2) += clk-hix5hd2.o
 obj-$(CONFIG_COMMON_CLK_HI6220)	+= clk-hi6220.o
+obj-$(CONFIG_RESET_HISI)   += reset.o
 obj-$(CONFIG_STUB_CLK_HI6220)  += clk-hi6220-stub.o
diff --git a/drivers/clk/hisilicon/reset.c b/drivers/clk/hisilicon/reset.c
new file mode 100644
index 000..6aa49c2
--- /dev/null
+++ b/drivers/clk/hisilicon/reset.c
@@ -0,0 +1,134 @@
+/*
+ * Hisilicon Reset Controller Driver
+ *
+ * Copyright (c) 2015-2016 HiSilicon Technologies Co., Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "reset.h"
+
+#define	HISI_RESET_BIT_MASK	0x1f
+#define	HISI_RESET_OFFSET_SHIFT	8
+#define	HISI_RESET_OFFSET_MASK	0xffff00
+
+struct hisi_reset_controller {
+	spinlock_t	lock;
+	void __iomem	*membase;
+   struct reset_controller_dev rcdev;
+};
+
+
+#define to_hisi_reset_controller(rcdev)  \
+	container_of(rcdev, struct hisi_reset_controller, rcdev)
+
+static int hisi_reset_of_xlate(struct reset_controller_dev *rcdev,
+			const struct of_phandle_args *reset_spec)
+{
+	u32 offset;
+	u8 bit;
+
+	offset = (reset_spec->args[0] << HISI_RESET_OFFSET_SHIFT)
+		& HISI_RESET_OFFSET_MASK;
+	bit = reset_spec->args[1] & HISI_RESET_BIT_MASK;
+
+	return (offset | bit);
+}
+
+static int hisi_reset_assert(struct reset_controller_dev *rcdev,
+			      unsigned long id)
+{
+	struct hisi_reset_controller *rstc = to_hisi_reset_controller(rcdev);
+	unsigned long flags;
+	u32 offset, reg;
+	u8 bit;
+
+	offset = (id & HISI_RESET_OFFSET_MASK) >> HISI_RESET_OFFSET_SHIFT;
+	bit = id & HISI_RESET_BIT_MASK;
+
+	spin_lock_irqsave(&rstc->lock, flags);
+
+	reg = readl(rstc->membase + offset);
+	writel(reg | BIT(bit), rstc->membase + offset);
+
+	spin_unlock_irqrestore(&rstc->lock, flags);
+
+	return 0;
+}
+
+static int hisi_reset_deassert(struct reset_controller_dev *rcdev,
+				unsigned long id)
+{
+	struct hisi_reset_controller *rstc = to_hisi_reset_controller(rcdev);
+	unsigned long flags;
+	u32 offset, reg;
+	u8 bit;
+
+	offset = (id & HISI_RESET_OFFSET_MASK) >> HISI_RESET_OFFSET_SHIFT;
+	bit = id & HISI_RESET_BIT_MASK;
+
+	spin_lock_irqsave(&rstc->lock, flags);
+
+	reg = readl(rstc->membase + offset);
+	writel(reg & ~BIT(bit), rstc->membase + offset);
+
+	spin_unlock_irqrestore(&rstc->lock, flags);
+
+	return 0;
+}
+
+static const struct reset_control_ops hisi_reset_ops = {
+	.assert		= hisi_reset_assert,
+	.deassert	= hisi_reset_deassert,
+};
+
+struct hisi_reset_controller *hisi_reset_in

[PATCH 0/3] clock: hisilicon: Add CRG driver for hi3519 soc

2016-04-23 Thread Jiancheng Xue

This patch set adds a CRG driver for the Hi3519 SoC.
It is derived from the patch set "[RESEND PATCH v10 0/6] ARM: hisi:
Add initial support including clock driver for Hi3519 soc" (see
https://lkml.org/lkml/2016/3/31/175) and includes the patch "[PATCH
v2] reset: hisilicon: add reset controller driver for hisilicon SOCs"
(see https://lkml.org/lkml/2016/4/21/126). If the reset patch is OK, I
hope it can be merged separately, because other upcoming HiSilicon CRG
drivers also depend on it. Thank you very much!

change log
v1:
-Added header .
-Removed CLK_IS_ROOT.
-Added cleanup code for the error paths in the probe function.
-Removed module_exit(hi3519_clk_exit).
 The reason is that this clock driver is never actually removed while
 the system is running, just like the clock drivers that use
 builtin_platform_driver().

Jiancheng Xue (3):
  reset: hisilicon: add reset controller driver for hisilicon SOCs
  clk: hisilicon: export some hisilicon APIs to modules
  clk: hisilicon: add CRG driver for hi3519 soc

 .../devicetree/bindings/clock/hi3519-crg.txt   |  46 +++
 drivers/clk/hisilicon/Kconfig  |  15 +++
 drivers/clk/hisilicon/Makefile |   2 +
 drivers/clk/hisilicon/clk-hi3519.c | 131 
 drivers/clk/hisilicon/clk.c|  23 ++--
 drivers/clk/hisilicon/clk.h|  14 +--
 drivers/clk/hisilicon/reset.c  | 134 +
 drivers/clk/hisilicon/reset.h  |  36 ++
 include/dt-bindings/clock/hi3519-clock.h   |  40 ++
 9 files changed, 426 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/hi3519-crg.txt
 create mode 100644 drivers/clk/hisilicon/clk-hi3519.c
 create mode 100644 drivers/clk/hisilicon/reset.c
 create mode 100644 drivers/clk/hisilicon/reset.h
 create mode 100644 include/dt-bindings/clock/hi3519-clock.h

-- 
1.9.1

Re: random(4) changes

2016-04-23 Thread Stephan Mueller

On Friday, 22 April 2016, 18:27:48, Sandy Harris wrote:

Hi Sandy,

> Stephan has recently proposed some extensive changes to this driver,
> and I proposed a quite different set earlier. My set can be found at:
> https://github.com/sandy-harris
> 
> This post tries to find the bits of both proposals that seem clearly
> worth doing and entail neither large implementation problems nor large
> risk of throwing out any babies with the bathwater.
> 
> Unfortunately, nothing here deals with the elephant in the room -- the
> distinctly hard problem of making sure the driver is initialised well
> enough & early enough. That needs a separate post, probably a separate
> thread. I do not find Stephan's solution to this problem plausible and
> my stuff does not claim to deal with it, though it includes some
> things that might help.

Interesting; I thought I had solved that issue. But if you think it is not
solved, let us cover it in a separate thread.
> 
> I really like Stephan's idea of simplifying the interrupt handling,
> replacing the multiple entropy-gathering calls in the current driver
> with one routine called for all interrupts. See section 1.2 of his
> doc. That seems to me a much cleaner design, easier both to analyse
> and to optimise as a fast interrupt handler. I also find Stephan's
> arguments that this will work better on modern systems -- VMs,
> machines with SSDs, etc. -- quite plausible.
> 
> Note, though, that I am only talking about the actual interrupt
> handling, not the rest of Stephan's input handling code: the parity
> calculation and XORing the resulting single bit into the entropy pool.
> I'd be happier, at least initially, with a patch that only implemented
> a single-source interrupt handler that gave 32 or 64 bits to existing
> input-handling code.
> 
> Stephan: would you want to provide such a patch?

Sure, if this is the will of the council, I will see it done.

> Ted: would you be inclined to accept it?
> 
> I also quite like Stephan's idea of replacing the two output pools
> with a NIST-approved DRBG, mainly because this would probably make
> getting various certifications easier. I also like the idea of using
> crypto lib code for that since it makes both testing & maintenance
> easier. This strikes me, though, as a do-when-convenient sort of
> cleanup task, not at all urgent unless there are specific
> certifications we need soon.
> 
> As for my proposals, I of course think they are full of good ideas,
> but there's only one I think is really important.
> 
> In the current driver -- and I think in Stephan's, though I have not
> looked at his code in any detail, only his paper -- heavy use of
> /dev/urandom or the kernel get_random_bytes() call can deplete the
> entropy available to /dev/random. That can be a serious problem in
> some circumstances, but I think I have a fix.

To quote from my paper:

"""
When the secondary DRBG requests a reseeding from the primary DRBG and
the primary DRBG pulls from the entropy pool, an emergency entropy level
of 512 bits of entropy is left in the entropy pool. This emergency entropy is
provided to serve /dev/random even while /dev/urandom is stressed.
"""

Note: the 512 bits were chosen arbitrarily and can be set at compile time to 
any other value via LRNG_EMERG_POOLSIZE. If needed, we can even make this 
runtime-configurable.
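A minimal sketch of the reserve rule quoted above, assuming the pool's entropy is tracked as a plain bit count. Only the LRNG_EMERG_POOLSIZE name and the 512-bit default come from the text; the helper lrng_pull_for_reseed() and the rule that /dev/random reseeds may dip into the reserve are illustrative assumptions, not LRNG code.

```c
#include <assert.h>

/* Compile-time reserve; 512 bits is the default quoted from the paper. */
#ifndef LRNG_EMERG_POOLSIZE
#define LRNG_EMERG_POOLSIZE 512
#endif

/*
 * Hypothetical helper: how many bits a DRBG reseed may take from the
 * entropy pool. Assumption: reseeds on behalf of /dev/urandom must leave
 * the emergency reserve in the pool, while /dev/random may consume it.
 */
static unsigned int lrng_pull_for_reseed(unsigned int *pool_bits,
					 unsigned int wanted,
					 int for_dev_random)
{
	unsigned int floor = for_dev_random ? 0 : LRNG_EMERG_POOLSIZE;
	unsigned int avail = *pool_bits > floor ? *pool_bits - floor : 0;
	unsigned int got = wanted < avail ? wanted : avail;

	*pool_bits -= got;	/* debit what was handed out */
	return got;
}
```

With 600 bits in the pool, a /dev/urandom-driven reseed asking for 256 bits gets only 88, and the remaining 512 stay available for /dev/random.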
> 
> You have an input pool (I) plus a blocking pool (B) & a non-blocking
> pool (NB). The problem is what to do when NB must produce a lot of
> output but you do not want to deplete I too much. B & NB might be
> replaced by DRBGs and the problem would not change.
> 
> B must be reseeded before every /dev/random output, NB after some
> number of output blocks. I used #define SAFE_OUT 503 but some other
> number might be better depending on how NB is implemented & how
> paranoid/conservative one feels.
> 
> B can only produce one full-entropy output, suitable for /dev/random,
> per reseed but B and NB are basically the same design so B can also
> produce SAFE_OUT reasonably good random numbers per reseed. Use those
> to reseed NB, and you reduce the load on I for reseeding NB from
> SAFE_OUT (use I every time NB is reseeded) to SAFE_OUT*SAFE_OUT (use I
> only to reseed B).
> 
> This does need analysis by cryptographers, but at a minimum it is
> basically plausible and, even with some fairly small value for
> SAFE_OUT, it greatly alleviates the problem.
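The load reduction claimed above can be checked with a simple counting model, which is not a DRBG. It counts how often the input pool I is tapped while NB produces a given number of outputs, either by reseeding NB straight from I or by reseeding NB from B and only B from I. SAFE_OUT comes from the text; demo_input_draws() and the choice to reseed a generator exactly when its budget runs out are assumptions, and B's single full-entropy /dev/random output per reseed is not modelled.

```c
#include <assert.h>

#define SAFE_OUT 503UL	/* outputs a generator may emit per reseed */

/* Count input-pool (I) draws while NB emits nb_outputs blocks. */
static unsigned long demo_input_draws(unsigned long nb_outputs, int cascaded)
{
	unsigned long i_draws = 0;
	unsigned long nb_left = 0, b_left = 0;	/* outputs left before reseed */

	for (unsigned long n = 0; n < nb_outputs; n++) {
		if (nb_left == 0) {		/* NB needs a reseed */
			if (cascaded) {
				if (b_left == 0) {	/* B needs a reseed from I */
					i_draws++;
					b_left = SAFE_OUT;
				}
				b_left--;	/* one B output seeds NB */
			} else {
				i_draws++;	/* seed NB straight from I */
			}
			nb_left = SAFE_OUT;
		}
		nb_left--;
	}
	return i_draws;
}
```

For SAFE_OUT * SAFE_OUT outputs from NB, the direct scheme taps I SAFE_OUT times; the cascaded scheme taps it once.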


Ciao
Stephan

[PATCH 2/3] clk: hisilicon: export some hisilicon APIs to modules

2016-04-23 Thread Jiancheng Xue

From: Jiancheng Xue 

Change some arguments to constant type.
Export some hisilicon APIs to modules.
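Taking the clock tables as pointers to const documents that registration only reads them, and lets callers keep their descriptions in read-only static const arrays without casts. A user-space sketch of that effect; demo_fixed_rate_clock, demo_clock_data and demo_register_fixed_rate() are hypothetical stand-ins that only mimic the (clks, nums, data) argument shape, not the HiSilicon API.

```c
#include <assert.h>

/* Hypothetical stand-ins for the clock description and output data. */
struct demo_fixed_rate_clock {
	unsigned int id;
	const char *name;
	unsigned long fixed_rate;
};

struct demo_clock_data {
	unsigned long rates[4];
};

/* The table is only read, so it can be passed as const. */
static void demo_register_fixed_rate(const struct demo_fixed_rate_clock *clks,
				     int nums, struct demo_clock_data *data)
{
	for (int i = 0; i < nums; i++)
		data->rates[clks[i].id] = clks[i].fixed_rate;
}

/* A table that can now live in read-only data. */
static const struct demo_fixed_rate_clock demo_clks[] = {
	{ 0, "fixed_24m", 24000000UL },
	{ 1, "fixed_50m", 50000000UL },
};
```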

Signed-off-by: Jiancheng Xue 
---
 drivers/clk/hisilicon/clk.c | 23 +++
 drivers/clk/hisilicon/clk.h | 14 +++---
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/clk/hisilicon/clk.c b/drivers/clk/hisilicon/clk.c
index 9f8e766..9b15adb 100644
--- a/drivers/clk/hisilicon/clk.c
+++ b/drivers/clk/hisilicon/clk.c
@@ -37,7 +37,7 @@
 
 static DEFINE_SPINLOCK(hisi_clk_lock);
 
-struct hisi_clock_data __init *hisi_clk_init(struct device_node *np,
+struct hisi_clock_data *hisi_clk_init(struct device_node *np,
 int nr_clks)
 {
struct hisi_clock_data *clk_data;
@@ -71,8 +71,9 @@ err_data:
 err:
return NULL;
 }
+EXPORT_SYMBOL_GPL(hisi_clk_init);
 
-void __init hisi_clk_register_fixed_rate(struct hisi_fixed_rate_clock *clks,
+void hisi_clk_register_fixed_rate(const struct hisi_fixed_rate_clock *clks,
 int nums, struct hisi_clock_data *data)
 {
struct clk *clk;
@@ -91,8 +92,9 @@ void __init hisi_clk_register_fixed_rate(struct hisi_fixed_rate_clock *clks,
data->clk_data.clks[clks[i].id] = clk;
}
 }
+EXPORT_SYMBOL_GPL(hisi_clk_register_fixed_rate);
 
-void __init hisi_clk_register_fixed_factor(struct hisi_fixed_factor_clock *clks,
+void hisi_clk_register_fixed_factor(const struct hisi_fixed_factor_clock *clks,
   int nums,
   struct hisi_clock_data *data)
 {
@@ -112,8 +114,9 @@ void __init hisi_clk_register_fixed_factor(struct hisi_fixed_factor_clock *clks,
data->clk_data.clks[clks[i].id] = clk;
}
 }
+EXPORT_SYMBOL_GPL(hisi_clk_register_fixed_factor);
 
-void __init hisi_clk_register_mux(struct hisi_mux_clock *clks,
+void hisi_clk_register_mux(const struct hisi_mux_clock *clks,
  int nums, struct hisi_clock_data *data)
 {
struct clk *clk;
@@ -141,8 +144,9 @@ void __init hisi_clk_register_mux(struct hisi_mux_clock *clks,
data->clk_data.clks[clks[i].id] = clk;
}
 }
+EXPORT_SYMBOL_GPL(hisi_clk_register_mux);
 
-void __init hisi_clk_register_divider(struct hisi_divider_clock *clks,
+void hisi_clk_register_divider(const struct hisi_divider_clock *clks,
  int nums, struct hisi_clock_data *data)
 {
struct clk *clk;
@@ -170,8 +174,9 @@ void __init hisi_clk_register_divider(struct hisi_divider_clock *clks,
data->clk_data.clks[clks[i].id] = clk;
}
 }
+EXPORT_SYMBOL_GPL(hisi_clk_register_divider);
 
-void __init hisi_clk_register_gate(struct hisi_gate_clock *clks,
+void hisi_clk_register_gate(const struct hisi_gate_clock *clks,
   int nums, struct hisi_clock_data *data)
 {
struct clk *clk;
@@ -198,8 +203,9 @@ void __init hisi_clk_register_gate(struct hisi_gate_clock *clks,
data->clk_data.clks[clks[i].id] = clk;
}
 }
+EXPORT_SYMBOL_GPL(hisi_clk_register_gate);
 
-void __init hisi_clk_register_gate_sep(struct hisi_gate_clock *clks,
+void hisi_clk_register_gate_sep(const struct hisi_gate_clock *clks,
   int nums, struct hisi_clock_data *data)
 {
struct clk *clk;
@@ -226,8 +232,9 @@ void __init hisi_clk_register_gate_sep(struct hisi_gate_clock *clks,
data->clk_data.clks[clks[i].id] = clk;
}
 }
+EXPORT_SYMBOL_GPL(hisi_clk_register_gate_sep);
 
-void __init hi6220_clk_register_divider(struct hi6220_divider_clock *clks,
+void __init hi6220_clk_register_divider(const struct hi6220_divider_clock *clks,
int nums, struct hisi_clock_data *data)
 {
struct clk *clk;
diff --git a/drivers/clk/hisilicon/clk.h b/drivers/clk/hisilicon/clk.h
index b56fbc1..20d64af 100644
--- a/drivers/clk/hisilicon/clk.h
+++ b/drivers/clk/hisilicon/clk.h
@@ -111,18 +111,18 @@ struct clk *hi6220_register_clkdiv(struct device *dev, const char *name,
u8 shift, u8 width, u32 mask_bit, spinlock_t *lock);
 
 struct hisi_clock_data *hisi_clk_init(struct device_node *, int);
-void hisi_clk_register_fixed_rate(struct hisi_fixed_rate_clock *,
+void hisi_clk_register_fixed_rate(const struct hisi_fixed_rate_clock *,
int, struct hisi_clock_data *);
-void hisi_clk_register_fixed_factor(struct hisi_fixed_factor_clock *,
+void hisi_clk_register_fixed_factor(const struct hisi_fixed_factor_clock *,
int, struct hisi_clock_data *);
-void hisi_clk_register_mux(struct hisi_mux_clock *, int,
+void hisi_clk_register_mux(const struct hisi_mux_clock *, int,
struct hisi_clock_data *);
-void hisi_clk_register_divider(struct hisi_divider_clock *,
+void hisi_clk_register_divider(const struct his

Re: [PATCH 8/8] writeback: throttle buffered writeback

2016-04-23 Thread xiakaixu

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 40b57bf4852c..d941f69dfb4b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -39,6 +39,7 @@
>  
>  #include "blk.h"
>  #include "blk-mq.h"
> +#include "blk-wb.h"
>  
>  EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
> @@ -880,6 +881,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
>  
>  fail:
>   blk_free_flush_queue(q->fq);
> + blk_wb_exit(q);
>   return NULL;
>  }
>  EXPORT_SYMBOL(blk_init_allocated_queue);
> @@ -1395,6 +1397,7 @@ void blk_requeue_request(struct request_queue *q, struct request *rq)
>   blk_delete_timer(rq);
>   blk_clear_rq_complete(rq);
>   trace_block_rq_requeue(q, rq);
> + blk_wb_requeue(q->rq_wb, rq);
>  
>   if (rq->cmd_flags & REQ_QUEUED)
>   blk_queue_end_tag(q, rq);
> @@ -1485,6 +1488,8 @@ void __blk_put_request(struct request_queue *q, struct request *req)
>   /* this is a bio leak */
>   WARN_ON(req->bio != NULL);
>  
> + blk_wb_done(q->rq_wb, req);
> +
>   /*
>* Request may not have originated from ll_rw_blk. if not,
>* it didn't come out of our reserved rq pools
> @@ -1714,6 +1719,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>   int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT;
>   struct request *req;
>   unsigned int request_count = 0;
> + bool wb_acct;
>  
>   /*
>* low level driver can indicate that it wants pages above a
> @@ -1766,6 +1772,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>   }
>  
>  get_rq:
> + wb_acct = blk_wb_wait(q->rq_wb, bio, q->queue_lock);
> +
>   /*
>* This sync check and mask will be re-done in init_request_from_bio(),
>* but we need to set it earlier to expose the sync flag to the
> @@ -1781,11 +1789,16 @@ get_rq:
>*/
>   req = get_request(q, rw_flags, bio, GFP_NOIO);
>   if (IS_ERR(req)) {
> + if (wb_acct)
> + __blk_wb_done(q->rq_wb);
>   bio->bi_error = PTR_ERR(req);
>   bio_endio(bio);
>   goto out_unlock;
>   }
>  
> + if (wb_acct)
> + req->cmd_flags |= REQ_BUF_INFLIGHT;
> +
>   /*
>* After dropping the lock and possibly sleeping here, our request
>* may now be mergeable after it had proven unmergeable (above).
> @@ -2515,6 +2528,7 @@ void blk_start_request(struct request *req)
>   blk_dequeue_request(req);
>  
>   req->issue_time = ktime_to_ns(ktime_get());
> + blk_wb_issue(req->q->rq_wb, req);
>  
>   /*
>* We are now handing the request to the hardware, initialize
> @@ -2751,6 +2765,7 @@ void blk_finish_request(struct request *req, int error)
>   blk_unprep_request(req);
>  
>   blk_account_io_done(req);
> + blk_wb_done(req->q->rq_wb, req);

Hi Jens,

It seems blk_wb_done() will be executed twice, even if the end_io
callback is set.
The same thing may happen in blk-mq.c.
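One possible way to make a second completion call harmless is to test and clear the per-request in-flight flag before dropping the counter, so only the first call does the accounting. A user-space sketch using C11 atomics; demo_wb, demo_rq, DEMO_BUF_INFLIGHT and demo_wb_done() are hypothetical and only loosely modelled on REQ_BUF_INFLIGHT and blk_wb_done() from the patch. It assumes the caller serializes completion of a given request (e.g. under the queue lock), since the flag update itself is not atomic here.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_BUF_INFLIGHT (1u << 0)	/* stand-in for REQ_BUF_INFLIGHT */

struct demo_wb { atomic_int inflight; };	/* buffered writes in flight */
struct demo_rq { unsigned int cmd_flags; };

/*
 * Drop the accounting only if this request was counted and has not been
 * uncounted yet; clearing the flag turns any repeat call into a no-op.
 */
static void demo_wb_done(struct demo_wb *wb, struct demo_rq *rq)
{
	if (!(rq->cmd_flags & DEMO_BUF_INFLIGHT))
		return;
	rq->cmd_flags &= ~DEMO_BUF_INFLIGHT;
	atomic_fetch_sub(&wb->inflight, 1);
}
```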

>  
>   if (req->end_io)
>   req->end_io(req, error);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 71b4a13fbf94..c0c5207fe7fd 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -30,6 +30,7 @@
>  #include "blk-mq.h"
>  #include "blk-mq-tag.h"
>  #include "blk-stat.h"
> +#include "blk-wb.h"
>  
>  static DEFINE_MUTEX(all_q_mutex);
>  static LIST_HEAD(all_q_list);
> @@ -275,6 +276,9 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx,
>  
>   if (rq->cmd_flags & REQ_MQ_INFLIGHT)
>   atomic_dec(&hctx->nr_active);
> +
> + blk_wb_done(q->rq_wb, rq);
> +
>   rq->cmd_flags = 0;
>  
>   clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> @@ -305,6 +309,7 @@ EXPORT_SYMBOL_GPL(blk_mq_free_request);
>  inline void __blk_mq_end_request(struct request *rq, int error)
>  {
>   blk_account_io_done(rq);
> + blk_wb_done(rq->q->rq_wb, rq);
>  
>   if (rq->end_io) {
>   rq->end_io(rq, error);
> @@ -414,6 +419,7 @@ void blk_mq_start_request(struct request *rq)
>   rq->next_rq->resid_len = blk_rq_bytes(rq->next_rq);
>  
>   rq->issue_time = ktime_to_ns(ktime_get());
> + blk_wb_issue(q->rq_wb, rq);
>  
>   blk_add_timer(rq);
>  
> @@ -450,6 +456,7 @@ static void __blk_mq_requeue_request(struct request *rq)
>   struct request_queue *q = rq->q;
>  
>   trace_block_rq_requeue(q, rq);
> + blk_wb_requeue(q->rq_wb, rq);
>  
>   if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
>   if (q->dma_drain_size && blk_rq_bytes(rq))
> @@ -1265,6 +1272,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>   struct blk_plug *plug;
>   struct request *same_queue_rq = NULL;
>   blk_qc_t cookie;
> + bool wb_acct;
>  
>   blk_queue_bounce(q, &bio);
>  
> @@ -1282,9 +

Re: [PATCH v12 3/3] printk: make printk.synchronous param rw

2016-04-23 Thread Sergey Senozhatsky

Hello,

On (04/23/16 08:56), Jan Kara wrote:
> > 
> > Signed-off-by: Sergey Senozhatsky 
> 
> The patch looks good to me. One suggestion below:
> 
> > @@ -1785,7 +1782,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  * operate in sync mode once panic() occurred.
> >  */
> > if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH &&
> > -   printk_kthread) {
> > +   !printk_sync && printk_kthread) {
> > /* Offload printing to a schedulable context. */
> > printk_kthread_need_flush_console = true;
> > wake_up_process(printk_kthread);
> 
> It would seem more future-proof to hide '!printk_sync && printk_kthread'
> into a wrapper function as it is somewhat subtle detail that printk_kthread
> needn't exist while !printk_sync and I can imagine someone forgetting to
> check that in the future. Something like 'can_print_async()'? But I don't
> feel too strongly about that so feel free to add:

hm, yes. this is what I eventually do in "yet to be posted"
make-console_unlock()-async patch. I move printing kthread
wakeup-s and those async printing checks out of vprintk_emit()
and wake_up_klogd_work_func() to a special function:

static bool console_unlock_async_flush(void)
{
	...
	if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH &&
	    !printk_sync && printk_kthread) {
		/* Offload printing to a schedulable context. */
		printk_kthread_need_flush_console = true;
		console_locked = 0;
		up_console_sem();
		wake_up_process(printk_kthread);
		return true;
	}
	return false;
}


so async_printk flags live in one place (which makes it easier
to maintain) and vprintk_emit()/wake_up_klogd_work_func() simply
do:

	if (console_trylock())
		console_unlock();


console_unlock() is the one who decides if it can do async
printk or a 'direct printing' via console_flush_and_unlock().

void console_unlock(void)
{
	if (console_unlock_async_flush())
		return;
	console_flush_and_unlock();
}


console_flush_and_unlock() is what was previously known
as console_unlock() - emit the messages and call_console_drivers().


I guess I can send out an updated version of 0003 as a reply
to the initial patch and hide '!printk_sync && printk_kthread'.
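A sketch of the wrapper Jan suggested, using his proposed name can_print_async(). The globals are stubbed so the snippet compiles on its own: printk_kthread is a void * here rather than a task pointer, which is an assumption for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stubs standing in for the printk.c globals. */
static bool printk_sync;		/* printk.synchronous= */
static void *printk_kthread;		/* may be NULL while printk_sync */

/*
 * Keep the subtle rule in one place: offloading needs async mode AND an
 * existing printk kthread, since the kthread need not exist while
 * printk_sync is set.
 */
static bool can_print_async(void)
{
	return !printk_sync && printk_kthread;
}
```

The check in vprintk_emit() would then read `console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH && can_print_async()`.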


> Reviewed-by: Jan Kara 
> 
> regardless whether you change this or not.

thanks.

-ss

[PATCH] coresight: etm4x: Add DT implementation.

2016-04-23 Thread lipengcheng

Add DT implementation for A72 board.

Signed-off-by: Li Pengcheng 
Signed-off-by: Li Zhong 
---
 drivers/hwtracing/coresight/coresight-etm4x.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index 6396b28..462f0dc 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -825,6 +825,11 @@ static struct amba_id etm4_ids[] = {
.mask   = 0x000f,
.data   = "ETM 4.0",
},
+   {   /* ETM 4.0 - A72, Maia, HiSilicon */
+   .id = 0x000bb95a,
+   .mask = 0x000f,
+   .data = "ETM 4.0",
+   },
{ 0, 0},
 };
 
-- 
1.8.3.2

Re: [linux-next PATCH] sched: cgroup: enable interrupt before calling threadgroup_change_begin

2016-04-23 Thread Peter Zijlstra

On Fri, Apr 22, 2016 at 08:56:28PM -0700, Yang Shi wrote:
> When kernel oops happens in some kernel thread, i.e. kcompactd in the test,
> the below bug might be triggered by the oops handler:

What are you trying to fix? You already oopsed; the thing is wrecked.

[PATCH] net: tsi108: use NULL for pointer-typed argument

2016-04-23 Thread Julia Lawall

The first argument of pci_free_consistent has type struct pci_dev *, so use
NULL instead of 0.

The semantic patch that performs this transformation is as follows:
(http://coccinelle.lip6.fr/)

// 
@@
@@
pci_free_consistent(
- 0
+ NULL
  , ...)
// 

Signed-off-by: Julia Lawall 

---
 drivers/net/ethernet/tundra/tsi108_eth.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -u -p a/drivers/net/ethernet/tundra/tsi108_eth.c b/drivers/net/ethernet/tundra/tsi108_eth.c
--- a/drivers/net/ethernet/tundra/tsi108_eth.c
+++ b/drivers/net/ethernet/tundra/tsi108_eth.c
@@ -1314,7 +1314,8 @@ static int tsi108_open(struct net_device
data->txring = dma_zalloc_coherent(NULL, txring_size, &data->txdma,
   GFP_KERNEL);
if (!data->txring) {
-   pci_free_consistent(0, rxring_size, data->rxring, data->rxdma);
+   pci_free_consistent(NULL, rxring_size, data->rxring,
+   data->rxdma);
return -ENOMEM;
}

[PATCH] of: iommu: make of_iommu_init() postcore_initcall_sync

2016-04-23 Thread Kefeng Wang

of_iommu_init() is called from arch code in several places. Make it
a postcore_initcall_sync so that all of those calls can be dropped.

Note that the IOMMUs should have a chance to perform some basic
initialisation before we start adding masters to them, so
postcore_initcall_sync is a good choice: it ensures that
of_iommu_init() is called before of_platform_populate().
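Initcalls run level by level, and each _sync variant runs after all plain initcalls of the same level. A user-space model of that ordering (the enum is illustrative, not the kernel's internal representation, and rootfs_initcall is omitted) shows why postcore_initcall_sync is early enough, assuming, as the arm and arm64 hunks suggest, that of_platform_populate() is reached from an arch-level initcall.

```c
#include <assert.h>

/* Ordered model of initcall levels: lower values run first. */
enum demo_initcall_level {
	DEMO_PURE, DEMO_CORE, DEMO_CORE_SYNC,
	DEMO_POSTCORE, DEMO_POSTCORE_SYNC,
	DEMO_ARCH, DEMO_ARCH_SYNC,
	DEMO_SUBSYS, DEMO_SUBSYS_SYNC,
	DEMO_FS, DEMO_FS_SYNC,
	DEMO_DEVICE, DEMO_DEVICE_SYNC,
	DEMO_LATE, DEMO_LATE_SYNC,
};

/* True if an initcall at level a is guaranteed to run before one at b. */
static int demo_runs_before(enum demo_initcall_level a,
			    enum demo_initcall_level b)
{
	return a < b;
}
```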

Cc: Arnd Bergmann 
Cc: Marek Szyprowski 
Cc: Rich Felker 
Cc: Rob Herring 
Cc: Robin Murphy 
Cc: Will Deacon 
Signed-off-by: Kefeng Wang 
---
 arch/arm/kernel/setup.c | 2 --
 arch/arm64/kernel/setup.c   | 2 --
 arch/sh/boards/of-generic.c | 2 --
 drivers/iommu/of_iommu.c| 5 -
 include/linux/of_iommu.h| 2 --
 5 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 2c4bea3..18a29a0 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -902,7 +901,6 @@ static int __init customize_machine(void)
 * machine from the device tree, if no callback is provided,
 * otherwise we would always need an init_machine callback.
 */
-   of_iommu_init();
if (machine_desc->init_machine)
machine_desc->init_machine();
 #ifdef CONFIG_OF
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 9dc6776..e20b64f 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -39,7 +39,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -365,7 +364,6 @@ void __init setup_arch(char **cmdline_p)
 static int __init arm64_device_init(void)
 {
if (of_have_populated_dt()) {
-   of_iommu_init();
of_platform_populate(NULL, of_default_bus_match_table,
 NULL, NULL);
} else if (acpi_disabled) {
diff --git a/arch/sh/boards/of-generic.c b/arch/sh/boards/of-generic.c
index bf3a166..b4d4313 100644
--- a/arch/sh/boards/of-generic.c
+++ b/arch/sh/boards/of-generic.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -185,7 +184,6 @@ static int __init sh_of_device_init(void)
 {
pr_info("SH generic board support: populating platform devices\n");
if (of_have_populated_dt()) {
-   of_iommu_init();
of_platform_populate(NULL, of_default_bus_match_table,
 NULL, NULL);
} else {
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5fea665..04cc80f 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -174,7 +174,7 @@ err_put_node:
return NULL;
 }
 
-void __init of_iommu_init(void)
+static int __init of_iommu_init(void)
 {
struct device_node *np;
const struct of_device_id *match, *matches = &__iommu_of_table;
@@ -186,4 +186,7 @@ void __init of_iommu_init(void)
pr_err("Failed to initialise IOMMU %s\n",
of_node_full_name(np));
}
+
+   return 0;
 }
+postcore_initcall_sync(of_iommu_init);
diff --git a/include/linux/of_iommu.h b/include/linux/of_iommu.h
index ffbe470..e2c2e71 100644
--- a/include/linux/of_iommu.h
+++ b/include/linux/of_iommu.h
@@ -11,7 +11,6 @@ extern int of_get_dma_window(struct device_node *dn, const char *prefix,
 int index, unsigned long *busno, dma_addr_t *addr,
 size_t *size);
 
-extern void of_iommu_init(void);
 extern struct iommu_ops *of_iommu_configure(struct device *dev,
struct device_node *master_np);
 
@@ -24,7 +23,6 @@ static inline int of_get_dma_window(struct device_node *dn, const char *prefix,
return -EINVAL;
 }
 
-static inline void of_iommu_init(void) { }
 static inline struct iommu_ops *of_iommu_configure(struct device *dev,
 struct device_node *master_np)
 {
-- 
2.6.0.GIT

[PATCH v4] dmaengine: tegra-apb: proper default init of channel slave_id

2016-04-23 Thread Shardar Shariff Md

Initialize the default channel slave_id (req_sel) to an invalid id
(i.e. the maximum supported slave id + 1), so that
tegra_dma_slave_config() takes slave_id from the client data only
when it was not initialized through DT.
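A user-space sketch of the sentinel scheme: the default slave_id is one past the largest value the 5-bit REQ_SEL field can hold, so "not set through DT" is distinguishable from a legitimate id 0. Only the two macro values come from the patch; demo_chan and demo_slave_config() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define TEGRA_APBDMA_CSR_REQ_SEL_MASK	0x1F
#define TEGRA_APBDMA_SLAVE_ID_INVALID	(TEGRA_APBDMA_CSR_REQ_SEL_MASK + 1)

struct demo_chan { unsigned int slave_id; };

/*
 * Use the client-supplied id only if DT did not set one, and reject ids
 * that do not fit in the REQ_SEL field.
 */
static int demo_slave_config(struct demo_chan *c, unsigned int client_id)
{
	if (c->slave_id == TEGRA_APBDMA_SLAVE_ID_INVALID) {
		if (client_id > TEGRA_APBDMA_CSR_REQ_SEL_MASK)
			return -EINVAL;
		c->slave_id = client_id;
	}
	return 0;
}
```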

Signed-off-by: Shardar Shariff Md 

---
Changes from v1:
- Instead of initializing the slave id to -1 define macros for
  max slave id and invalid slave id and do the checks accordingly.

Changes from v2:
- Check slave id boundary before dma channel is allocated to
  avoid channel leakage.
- Calculate the max slave id from the CSR slave id(req sel)
  mask bits.

Changes from v3:
- Remove *MAX_SLAVD_ID macro and instead use *REQ_SEL_MASK
- During tegra_dma_slave_config() check for slave_id boundary
  condition
---
 drivers/dma/tegra20-apb-dma.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
index 3871f29..01e316f 100644
--- a/drivers/dma/tegra20-apb-dma.c
+++ b/drivers/dma/tegra20-apb-dma.c
@@ -54,6 +54,7 @@
 #define TEGRA_APBDMA_CSR_ONCE  BIT(27)
 #define TEGRA_APBDMA_CSR_FLOW  BIT(21)
 #define TEGRA_APBDMA_CSR_REQ_SEL_SHIFT 16
+#define TEGRA_APBDMA_CSR_REQ_SEL_MASK  0x1F
 #define TEGRA_APBDMA_CSR_WCOUNT_MASK   0xFFFC
 
 /* STATUS register */
@@ -114,6 +115,8 @@
 /* Channel base address offset from APBDMA base address */
 #define TEGRA_APBDMA_CHANNEL_BASE_ADD_OFFSET   0x1000
 
+#define TEGRA_APBDMA_SLAVE_ID_INVALID  (TEGRA_APBDMA_CSR_REQ_SEL_MASK + 1)
+
 struct tegra_dma;
 
 /*
@@ -353,8 +356,11 @@ static int tegra_dma_slave_config(struct dma_chan *dc,
}
 
memcpy(&tdc->dma_sconfig, sconfig, sizeof(*sconfig));
-   if (!tdc->slave_id)
+   if (tdc->slave_id == TEGRA_APBDMA_SLAVE_ID_INVALID) {
+   if (sconfig->slave_id > TEGRA_APBDMA_CSR_REQ_SEL_MASK)
+   return -EINVAL;
tdc->slave_id = sconfig->slave_id;
+   }
tdc->config_init = true;
return 0;
 }
@@ -1236,7 +1242,7 @@ static void tegra_dma_free_chan_resources(struct dma_chan *dc)
}
pm_runtime_put(tdma->dev);
 
-   tdc->slave_id = 0;
+   tdc->slave_id = TEGRA_APBDMA_SLAVE_ID_INVALID;
 }
 
 static struct dma_chan *tegra_dma_of_xlate(struct of_phandle_args *dma_spec,
@@ -1246,6 +1252,11 @@ static struct dma_chan *tegra_dma_of_xlate(struct of_phandle_args *dma_spec,
struct dma_chan *chan;
struct tegra_dma_channel *tdc;
 
+   if (dma_spec->args[0] > TEGRA_APBDMA_CSR_REQ_SEL_MASK) {
+   dev_err(tdma->dev, "Invalid slave id: %d\n", dma_spec->args[0]);
+   return NULL;
+   }
+
chan = dma_get_any_slave_channel(&tdma->dma_dev);
if (!chan)
return NULL;
@@ -1389,6 +1400,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
&tdma->dma_dev.channels);
tdc->tdma = tdma;
tdc->id = i;
+   tdc->slave_id = TEGRA_APBDMA_SLAVE_ID_INVALID;
 
tasklet_init(&tdc->tasklet, tegra_dma_tasklet,
(unsigned long)tdc);
-- 
1.8.1.5

Re: [PATCH] mmc: mediatek: fix request blocked by cancel_delayed_work

2016-04-23 Thread Chaotian Jing

Hi,
On Fri, 2016-04-22 at 14:24 +0200, Ulf Hansson wrote:
> On 18 April 2016 at 09:13, Chaotian Jing  wrote:
> > There are two cases in which mmc_request_done() is never called,
> > which eventually leaves the caller thread blocked:
> >
> > A. If the card is busy, cancel_delayed_work() returns false because
> > the delayed work has not been scheduled yet. In this case,
> > mod_delayed_work() must be called before msdc_cmd_is_ready().
> >
> > B. If a request really needs more than 5s (some SanDisk TF cards), the
> > timeout work ends up calling cancel_delayed_work() on itself, which
> > also returns false, so use in_interrupt() to avoid this case.
> >
> > Signed-off-by: Chaotian Jing 
> > ---
> >  drivers/mmc/host/mtk-sd.c | 11 ---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c
> > index b17f30d..1511b1b 100644
> > --- a/drivers/mmc/host/mtk-sd.c
> > +++ b/drivers/mmc/host/mtk-sd.c
> > @@ -724,7 +724,7 @@ static void msdc_request_done(struct msdc_host *host, struct mmc_request *mrq)
> > bool ret;
> >
> > ret = cancel_delayed_work(&host->req_timeout);
> > -   if (!ret) {
> > +   if (!ret && in_interrupt()) {
> > /* delay work already running */
> > return;
> > }
> > @@ -824,7 +824,12 @@ static inline bool msdc_cmd_is_ready(struct msdc_host *host,
> > }
> >
> > if (mmc_resp_type(cmd) == MMC_RSP_R1B || cmd->data) {
> > -   tmo = jiffies + msecs_to_jiffies(20);
> > +   /*
> > +* 2550ms is from EXT_CSD[248], after switch to hs200,
> > +* using CMD13 to polling card status, it will get response
> > +* of 0x800, but EMMC still pull-low DAT0.
> > +*/
> 
> Seems like you are solving a eMMC specific issue on your driver?
> 
> Perhaps we should try to use a card quirk instead?

Actually, this is a bug in __mmc_switch(). Per the JEDEC spec, CMD13
should not be used to get the card status while switching speed mode,
because its response does not reflect whether the card is still busy.
For this CMD6 switch-to-HS200 case, I tried several Samsung/SanDisk/KSI
eMMC parts: CMD13 always returns 0x800. Even after the eMMC has changed
to transfer state and DAT0 is high, the CMD13 response is still 0x800
and never becomes 0x900.
So it is a bug for __mmc_switch() to use CMD13 to find out whether the
card has already changed to transfer state.
Our host does not support MMC_CAP_WAIT_WHILE_BUSY, which is why we hit
this issue.
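For reference, the two CMD13 responses mentioned above can be decoded with the standard R1 card-status layout (CURRENT_STATE in bits 12:9, READY_FOR_DATA in bit 8, the same layout as the kernel's R1_CURRENT_STATE() and R1_READY_FOR_DATA): 0x800 is the "tran" state without READY_FOR_DATA, and 0x900 is "tran" with it. The DEMO_ macros and demo_r1_ready() below are local illustrations, not mmc core code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* R1 card-status fields, same layout as the kernel's mmc.h macros. */
#define DEMO_R1_CURRENT_STATE(x)	(((x) & 0x00001E00u) >> 9)
#define DEMO_R1_READY_FOR_DATA		(1u << 8)
#define DEMO_R1_STATE_TRAN		4

/* Card is in transfer state and signals it can accept data again. */
static bool demo_r1_ready(uint32_t status)
{
	return DEMO_R1_CURRENT_STATE(status) == DEMO_R1_STATE_TRAN &&
	       (status & DEMO_R1_READY_FOR_DATA);
}
```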

Could you give some advice on this?
Thx!
> 
> 
> > +   tmo = jiffies + msecs_to_jiffies(2550);
> > /* R1B or with data, should check SDCBUSY */
> > while ((readl(host->base + SDC_STS) & SDC_STS_SDCBUSY) &&
> > time_before(jiffies, tmo))
> > @@ -847,6 +852,7 @@ static void msdc_start_command(struct msdc_host *host,
> > WARN_ON(host->cmd);
> > host->cmd = cmd;
> >
> > +   mod_delayed_work(system_wq, &host->req_timeout, DAT_TIMEOUT);
> > if (!msdc_cmd_is_ready(host, mrq, cmd))
> > return;
> >
> > @@ -858,7 +864,6 @@ static void msdc_start_command(struct msdc_host *host,
> >
> > cmd->error = 0;
> > rawcmd = msdc_cmd_prepare_raw_cmd(host, mrq, cmd);
> > -   mod_delayed_work(system_wq, &host->req_timeout, DAT_TIMEOUT);
> >
> > sdr_set_bits(host->base + MSDC_INTEN, cmd_ints_mask);
> > writel(cmd->arg, host->base + SDC_ARG);
> > --
> > 1.8.1.1.dirty
> >
> 
> Kind regards
> Uffe

[GIT PULL] move ARM LCD display driver to auxdisplay

2016-04-23 Thread Linus Walleij

Hi ARM SoC guys,

these two patches move the ARM character LCD driver from
misc drivers to the auxdisplay subsystem where it belongs and
update the defconfig for the RealView accordingly.

Please pull it into some cleanup branch in the ARM SoC
tree.

I tried to get an ACK from the auxdisplay maintainer but got no
reaction.

Yours,
Linus Walleij

The following changes since commit bf16200689118d19de1b8d2a3c314fc21f5dc7bb:

  Linux 4.6-rc3 (2016-04-10 17:58:30 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator.git tags/move-auxdisplay

for you to fetch changes up to 700ce70081d4f5ac7bc4fbfba209072152f96c89:

  ARM: realview: update defconfig to match new subsystem (2016-04-23 11:38:25 +0200)


This moves the ARM RealView auxiliary display out of
the misc drivers and into the new auxdisplay subsystem.


Linus Walleij (2):
  auxdisplay: move the ARM LCD driver into auxdisplay
  ARM: realview: update defconfig to match new subsystem

 arch/arm/configs/realview_defconfig|  3 ++-
 drivers/auxdisplay/Kconfig | 10 ++
 drivers/auxdisplay/Makefile|  1 +
 drivers/{misc => auxdisplay}/arm-charlcd.c |  0
 drivers/misc/Kconfig   | 10 --
 drivers/misc/Makefile  |  1 -
 6 files changed, 13 insertions(+), 12 deletions(-)
 rename drivers/{misc => auxdisplay}/arm-charlcd.c (100%)

Re: [PATCH v2] pinctrl: pinctrl-single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs

2016-04-23 Thread Linus Walleij

On Fri, Apr 15, 2016 at 5:22 PM, Tony Lindgren  wrote:
> * Linus Walleij  [160415 02:29]:
>> On Thu, Apr 14, 2016 at 6:59 AM, Keerthy  wrote:
>>
>> > pcs_parse_bits_in_pinctrl_entry uses ffs which gives bit indices
>> > ranging from 1 to MAX. This leads to a corner case where we try to request
>> > pin number MAX, which fails.
>> >
>> > The bit_pos value is calculated using ffs. pin_num_from_lsb uses
>> > bit_pos value. pins array is populated with:
>> >
>> > pin + pin_num_from_lsb.
>> >
>> > The above is 1 more than usual bit indices as bit_pos uses ffs to compute
>> > first set bit. Hence the last of the pins array is populated with the MAX
>> > value and not MAX - 1 which causes error when we call pin_request.
>> >
>> > mask_pos is rightly calculated as ((pcs->fmask) << (bit_pos - 1))
>> > Consequently val_pos and submask are correct.
>> >
>> > Hence use __ffs which gives (ffs(x) - 1) as the first bit set.
>> >
>> > fixes: 4e7e8017a8 ("pinctrl: pinctrl-single: enhance to configure multiple pins of different modules")
>> > Signed-off-by: Keerthy 
>> > ---
>> >
>> > Changes in v2:
>> >
>> >   * Changed pcs->fshift to use __ffs instead of ffs to be consistent.
>> >
>> > Boot tested on da850-evm and checked the pinctrl sysfs nodes.
>>
>> Patch applied for fixes with Tony's ACK.
>>
>> Should it also be tagged for stable?
>
> Probably a good idea, I can see somebody pulling hair out because
> of this in various product trees.

Oops, sorry, I totally forgot to add that :(

Please ask Greg to take it as a selected stable patch.

Yours,
Linus Walleij
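The off-by-one in this thread comes from ffs() numbering bits from 1 while the kernel's __ffs() numbers them from 0. A user-space sketch built on POSIX ffs(); demo___ffs() mimics the kernel helper (and, like it, is undefined for 0), and demo_pin_index() is a hypothetical, simplified version of the "pin + pin_num_from_lsb" computation, not the pinctrl-single code.

```c
#include <assert.h>
#include <strings.h>	/* POSIX ffs(): 1-based, 0 if no bit is set */

/* 0-based index of the lowest set bit, like the kernel's __ffs(). */
static unsigned int demo___ffs(unsigned int x)
{
	assert(x != 0);		/* undefined for 0, as in the kernel */
	return (unsigned int)ffs((int)x) - 1;
}

/* Hypothetical: pin offset of the field starting at mask's lowest bit. */
static unsigned int demo_pin_index(unsigned int mask, unsigned int bits_per_pin)
{
	return demo___ffs(mask) / bits_per_pin;
}
```

With one bit per pin and 16 pins, mask 0x8000 must map to pin 15; using ffs() directly would yield 16, one past the last valid pin.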

Re: [PATCH 0/2] Embedding Position Independent Executables

2016-04-23 Thread Afzal Mohammed

Hi,

On Sat, Apr 23, 2016 at 12:49:58AM +0200, Alexandre Belloni wrote:

> I think Heiko clarified it but there are actually multiple platforms
> that will benefit from this infrastructure. I can name at least at91,
> rockchip, sunxi and am335x. On am335x, this has been solved by running
> that code on the cortex M3 instead of doing that from Linux but it
> forces you to compile and load firmware on the Cortex-M3, so it is not
> available for anything else.

afaik on am335x, it has been solved so far by not yet supporting
suspend-resume in mainline ;)

afaiu, am335x suspend-resume has 2 parts, one run in A8 & other in M3.
Saving & restoring RAM config, putting to self refresh in addition to
wfi invocation is handled by A8 code running in OCMC, while M3 cuts A8
clock & does other PM things that can't be done in A8.

Regards
afzal

Re: [PATCH 1/2] EDAC, altera: remove useless casts

2016-04-23 Thread Borislav Petkov

On Sat, Apr 16, 2016 at 10:13:55PM +0200, Arnd Bergmann wrote:
> The altera EDAC driver refers to its per-device data
> using a cast to '(void *)', which makes the pointer
> non-const, though both the source and destination are
> actually const.
> 
> Removing the annotation makes the reference (almost)
> fit into a single line for improved readability, and
> ensures that it is actually defined as const.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/edac/altera_edac.c | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)

Both applied, thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
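Why dropping the cast in Arnd's patch helps: a `const void *` converts implicitly to a pointer to a const object type, so no cast is needed, whereas a `(void *)` cast silently discards the qualifier and would let a later write through the pointer compile. A minimal sketch with hypothetical types; demo_match stands in for struct of_device_id, whose .data member is a `const void *`.

```c
#include <assert.h>

struct demo_prv_data { int ecc_enable_mask; };	/* hypothetical per-device data */
struct demo_match { const void *data; };	/* like of_device_id.data */

static const struct demo_prv_data demo_ocram = { 0x1 };
static const struct demo_match demo_match_tbl = { .data = &demo_ocram };

/* No cast: the result stays const, so writes via it are rejected. */
static const struct demo_prv_data *demo_get_prv(const struct demo_match *m)
{
	const struct demo_prv_data *prv = m->data;

	return prv;
}
```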

[GIT PULL] TTY/Serial fixes for 4.6-rc5

2016-04-23 Thread Greg KH

The following changes since commit c3b46c73264b03000d1e18b22f5caf63332547c9:

  Linux 4.6-rc4 (2016-04-17 19:13:32 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git/ tags/tty-4.6-rc5

for you to fetch changes up to f077b73682910bc7dc9439e50e7b1ad97f28f3f1:

  Revert "serial: 8250: Add hardware dependency to RT288X option" (2016-04-19 15:17:37 +0900)


Serial fixes for 4.6-rc6

Here are 3 serial driver fixes for issues that have been reported.  Two
are reverts, fixing problems that were in the big TTY/Serial driver
merge in 4.6-rc1, and the last one is a simple bugfix for a regression
that showed up in 4.6-rc1 as well.

All have been in linux-next with no reported issues.

Signed-off-by: Greg Kroah-Hartman 


Greg Kroah-Hartman (1):
  Revert "serial: 8250: Add hardware dependency to RT288X option"

Sudip Mukherjee (1):
  Revert "serial-uartlite: Constify uartlite_be/uartlite_le"

Yegor Yefremov (1):
  tty/serial/8250: fix RS485 half-duplex RX

 drivers/tty/serial/8250/8250_port.c | 11 ++-
 drivers/tty/serial/8250/Kconfig |  1 -
 drivers/tty/serial/uartlite.c   |  8 
 3 files changed, 14 insertions(+), 6 deletions(-)

[GIT PULL] USB driver fixes for 4.6-rc5

2016-04-23 Thread Greg KH

The following changes since commit c3b46c73264b03000d1e18b22f5caf63332547c9:

  Linux 4.6-rc4 (2016-04-17 19:13:32 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/ tags/usb-4.6-rc5

for you to fetch changes up to d40d334743776937132d3a0c5a203d245407e5a7:

  Merge tag 'phy-for-4.6-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-linus (2016-04-22 17:13:24 +0900)


USB / PHY driver fixes for 4.6-rc5

Here are two small sets of patches, both from subsystem trees, USB
gadget and PHY drivers.

Full details are in the shortlog, and they have all been in linux-next
for a while (before I merged them to the USB tree.)

Signed-off-by: Greg Kroah-Hartman 


Du, Changbin (1):
  usb: dwc3: fix memory leak of dwc->regset

Felipe Balbi (2):
  usb: dwc3: omap: fix up error path on probe()
  usb: dwc3: core: fix PHY handling during suspend

Greg Kroah-Hartman (2):
  Merge tag 'fixes-for-v4.6-rc5' of git://git.kernel.org/.../balbi/usb into usb-linus
  Merge tag 'phy-for-4.6-rc' of git://git.kernel.org/.../kishon/linux-phy into usb-linus

Heiko Stuebner (3):
  phy: rockchip-dp: should be a child device of the GRF
  phy: rockchip-emmc: should be a child device of the GRF
  phy: rockchip-emmc: adapt binding to specifiy register offset and length

John Youn (1):
  usb: gadget: composite: Clear reserved fields of SSP Dev Cap

Lars-Peter Clausen (1):
  usb: gadget: f_fs: Fix use-after-free

Roger Quadros (1):
  usb: dwc3: gadget: Fix suspend/resume during device mode

 .../devicetree/bindings/phy/rockchip-dp-phy.txt| 18 ++---
 .../devicetree/bindings/phy/rockchip-emmc-phy.txt  | 22 +
 drivers/phy/phy-rockchip-dp.c  |  7 +--
 drivers/phy/phy-rockchip-emmc.c|  5 -
 drivers/usb/dwc3/core.c| 23 +-
 drivers/usb/dwc3/debugfs.c | 13 +++-
 drivers/usb/dwc3/dwc3-omap.c   | 12 ---
 drivers/usb/dwc3/gadget.c  |  6 ++
 drivers/usb/gadget/composite.c |  2 ++
 drivers/usb/gadget/function/f_fs.c |  5 ++---
 10 files changed, 78 insertions(+), 35 deletions(-)

[PATCH] console: Add persistent scrollback buffers for all VGA consoles

2016-04-23 Thread Manuel Schölling

Add a scrollback buffer for each VGA console. The benefit is that
the scrollback history is no longer flushed when switching between
consoles but persists instead.
The buffers are allocated on demand when a new console is opened.
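
A user-space sketch of the per-console scheme described above, for illustration only: the struct, `MAX_CONSOLES`, `BUF_BYTES` and the helpers `switch_console()` and `save_text()` are hypothetical and far simpler than vgacon's code. Each console owns a lazily allocated ring buffer, so a switch only changes which buffer is current and never wipes another console's history.

```c
#include <stdlib.h>

#define MAX_CONSOLES 8     /* hypothetical; the kernel uses MAX_NR_CONSOLES */
#define BUF_BYTES    4096  /* hypothetical per-console buffer size */

/* Per-console scrollback state (illustrative, not vgacon's struct). */
struct scrollback {
	char *data;    /* NULL until the console is first used */
	size_t tail;   /* next write offset in the ring */
	size_t cnt;    /* bytes stored so far, capped at BUF_BYTES */
};

static struct scrollback buffers[MAX_CONSOLES];
static struct scrollback *cur;   /* buffer of the active console */

/* Make console 'con' active, allocating its buffer on demand.
 * Returns 0 on success, -1 on a bad index or allocation failure. */
static int switch_console(int con)
{
	struct scrollback *sb;

	if (con < 0 || con >= MAX_CONSOLES)
		return -1;
	sb = &buffers[con];
	if (!sb->data) {
		sb->data = calloc(1, BUF_BYTES);
		if (!sb->data)
			return -1;
	}
	cur = sb;   /* the previous console's history is left untouched */
	return 0;
}

/* Append text to the active console's ring, overwriting the oldest bytes.
 * Must only be called after a successful switch_console(). */
static void save_text(const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		cur->data[cur->tail] = s[i];
		cur->tail = (cur->tail + 1) % BUF_BYTES;
		if (cur->cnt < BUF_BYTES)
			cur->cnt++;
	}
}
```

Writing "hello" on console 0, switching to console 1 and back leaves those five bytes in console 0's buffer, whereas a single shared buffer would have been flushed by the switch.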

This breaks tools like clear_console that rely on flushing the
scrollback history by switching back and forth between consoles,
which is why this feature is disabled by default.
Use the escape sequence \e[3J instead to flush the buffer.
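
Instead of relying on console switching, a tool can emit the escape sequence itself. A minimal sketch, assuming the output is a Linux VT; the helper name `clear_with_scrollback()` is hypothetical and this is not clear_console's actual code. `\033[H\033[2J` homes the cursor and clears the visible screen, and `\033[3J` is the `\e[3J` sequence mentioned above, which erases the scrollback.

```c
#include <stdio.h>
#include <string.h>

/* ESC[H: cursor home, ESC[2J: erase the screen, ESC[3J: erase scrollback. */
static const char CLEAR_SEQ[] = "\033[H\033[2J\033[3J";

/* Write the sequence to 'out'. Returns 0 on success, -1 on error. */
static int clear_with_scrollback(FILE *out)
{
	size_t len = strlen(CLEAR_SEQ);

	if (fwrite(CLEAR_SEQ, 1, len, out) != len)
		return -1;
	return fflush(out) == 0 ? 0 : -1;
}
```

Calling `clear_with_scrollback(stdout)` from a program running on a VGA console should clear both the screen and the saved history.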

Signed-off-by: Manuel Schölling 
---
 drivers/video/console/Kconfig  |  23 +-
 drivers/video/console/vgacon.c | 172 +++--
 2 files changed, 134 insertions(+), 61 deletions(-)

diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index 38da6e2..67e52f0 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -43,9 +43,26 @@ config VGACON_SOFT_SCROLLBACK_SIZE
range 1 1024
default "64"
help
- Enter the amount of System RAM to allocate for the scrollback
-buffer.  Each 64KB will give you approximately 16 80x25
-screenfuls of scrollback buffer
+ Enter the amount of System RAM to allocate for scrollback
+ buffers of VGA consoles. Each 64KB will give you approximately
+ 16 80x25 screenfuls of scrollback buffer.
+
+config VGACON_SOFT_SCROLLBACK_FOR_EACH_CONSOLE
+   bool "Persistent Scrollback History for each console"
+   depends on VGACON_SOFT_SCROLLBACK
+   default n
+   help
+ Say Y here if a scrollback buffer should be allocated for each
+ VGA console. The scrollback history will persist when switching
+ between consoles. If you say N here, scrollback is only supported
+ for the active VGA console and scrollback history will be flushed
+ when switching between consoles.
+
+ This breaks legacy versions of tools like clear_console which
+ might cause security issues.
+ Use the escape sequence \e[3J instead if this feature is activated.
+
+ If you use a RAM-constrained system, say N here.
 
 config MDA_CONSOLE
depends on !M68K && !PARISC && ISA
diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c
index 517f565..6c0b9ba 100644
--- a/drivers/video/console/vgacon.c
+++ b/drivers/video/console/vgacon.c
@@ -1,5 +1,5 @@
 /*
- *  linux/drivers/video/vgacon.c -- Low level VGA based console driver
+ *  linux/drivers/video/console/vgacon.c -- Low level VGA based console driver
  *
  * Created 28 Sep 1997 by Geert Uytterhoeven
  *
@@ -106,12 +106,12 @@ static unsigned char  vga_hardscroll_enabled  __read_mostly;
 static unsigned char   vga_hardscroll_user_enable __read_mostly = 1;
 static unsigned char   vga_font_is_default = 1;
 static int vga_vesa_blanked;
-static int vga_palette_blanked;
-static int vga_is_gfx;
-static int vga_512_chars;
-static int vga_video_font_height;
-static int vga_scan_lines  __read_mostly;
-static unsigned int vga_rolled_over;
+static int vga_palette_blanked;
+static int vga_is_gfx;
+static int vga_512_chars;
+static int vga_video_font_height;
+static int vga_scan_lines  __read_mostly;
+static unsigned int vga_rolled_over;
 
 static int vgacon_text_mode_force;
 
@@ -182,70 +182,125 @@ static inline void vga_set_mem_top(struct vc_data *c)
 
 #ifdef CONFIG_VGACON_SOFT_SCROLLBACK
 /* software scrollback */
-static void *vgacon_scrollback;
-static int vgacon_scrollback_tail;
-static int vgacon_scrollback_size;
-static int vgacon_scrollback_rows;
-static int vgacon_scrollback_cnt;
-static int vgacon_scrollback_cur;
-static int vgacon_scrollback_save;
-static int vgacon_scrollback_restore;
-
-static void vgacon_scrollback_init(int pitch)
+struct vgacon_scrollback_info {
+   void *data;
+   int tail;
+   int size;
+   int rows;
+   int cnt;
+   int cur;
+   int save;
+   int restore;
+};
+static struct vgacon_scrollback_info *vgacon_scrollback_cur;
+#ifdef CONFIG_VGACON_SOFT_SCROLLBACK_FOR_EACH_CONSOLE
+static struct vgacon_scrollback_info vgacon_scrollbacks[MAX_NR_CONSOLES];
+#else
+static struct vgacon_scrollback_info vgacon_scrollbacks[1];
+#endif
+
+static void vgacon_scrollback_reset(size_t reset_size)
 {
-   int rows = CONFIG_VGACON_SOFT_SCROLLBACK_SIZE * 1024/pitch;
-
-   if (vgacon_scrollback) {
-   vgacon_scrollback_cnt  = 0;
-   vgacon_scrollback_tail = 0;
-   vgacon_scrollback_cur  = 0;
-   vgacon_scrollback_rows = rows - 1;
-   vgacon_scrollback_size = rows * pitch;
+   if (vgacon_scrollback_cur->data && reset_size > 0)
+   memset(vgacon_scrollback_cur->data, 0, reset_size);
+
+   vgacon_scrollback_cur->cnt  = 0;
+   vgacon_scrollback_cur->tail = 0;
+   vgacon_scrollback_cur->c

[PATCH v5 02/21] devicetree: bindings: IB: Add binding document for HiSilicon RoCE

2016-04-23 Thread Lijun Ou

This patch adds the DT binding document for the HiSilicon RoCE driver.
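
The binding in this patch expects 34 interrupt-names in a fixed order: roce_ce0_irq to roce_ce31_irq, then roce_ae_irq and roce_common_irq. The user-space sketch below builds that list and looks an entry up by name; `build_roce_irq_names()`, `roce_irq_index()` and the constants are hypothetical helpers, not part of the driver. In a kernel driver, the by-name lookup would typically go through platform_get_irq_byname().

```c
#include <stdio.h>
#include <string.h>

#define ROCE_CE_IRQ_NUM   32  /* roce_ce0_irq .. roce_ce31_irq */
#define ROCE_IRQ_NUM      34  /* plus roce_ae_irq and roce_common_irq */
#define ROCE_IRQ_NAME_LEN 16  /* longest name, "roce_common_irq", plus NUL */

/* Fill 'names' with the binding's interrupt-names, in binding order. */
static void build_roce_irq_names(char names[ROCE_IRQ_NUM][ROCE_IRQ_NAME_LEN])
{
	int i;

	for (i = 0; i < ROCE_CE_IRQ_NUM; i++)
		snprintf(names[i], ROCE_IRQ_NAME_LEN, "roce_ce%d_irq", i);
	snprintf(names[ROCE_CE_IRQ_NUM], ROCE_IRQ_NAME_LEN, "%s", "roce_ae_irq");
	snprintf(names[ROCE_CE_IRQ_NUM + 1], ROCE_IRQ_NAME_LEN, "%s",
		 "roce_common_irq");
}

/* Return the position of 'name' in the list, or -1 if it is absent. */
static int roce_irq_index(char names[ROCE_IRQ_NUM][ROCE_IRQ_NAME_LEN],
			  const char *name)
{
	int i;

	for (i = 0; i < ROCE_IRQ_NUM; i++)
		if (strcmp(names[i], name) == 0)
			return i;
	return -1;
}
```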

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 .../bindings/infiniband/hisilicon-hns-roce.txt | 107 +
 1 file changed, 107 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt

diff --git a/Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt b/Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
new file mode 100644
index 000..5180fef
--- /dev/null
+++ b/Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
@@ -0,0 +1,107 @@
+HiSilicon RoCE DT description
+
+The HiSilicon RoCE engine is part of the network subsystem.
+It depends on other parts of the network subsystem, such as the GMAC
+and the DSA fabric.
+
+Additional properties are described here:
+
+Required properties:
+- compatible: Should contain "hisilicon,hns-roce-v1".
+- reg: Physical base address of the roce driver and
+length of memory mapped region.
+- eth-handle: phandle, specifies a reference to a node
+representing an ethernet device.
+- dsaf-handle: phandle, specifies a reference to a node
+representing a dsaf device.
+- #address-cells: must be 2
+- #size-cells: must be 2
+Optional properties:
+- dma-coherent: Present if DMA operations are coherent.
+- interrupt-parent: the interrupt parent of this device.
+- interrupts: should contain 32 completion event irqs, 1 async event irq
+and 1 event overflow irq.
+- interrupt-names: should contain the names of the 34 irqs of the RoCE device:
+  - roce_ce0_irq ~ roce_ce31_irq: 32 completion event irqs
+  - roce_ae_irq: 1 async event irq
+  - roce_common_irq: named common exception warning irq
+Example:
+   infiniband@c400 {
+   compatible = "hisilicon,hns-roce-v1";
+   reg = <0x0 0xc400 0x0 0x10>;
+   dma-coherent;
+   eth-handle = <&eth2 &eth3 &eth4 &eth5 &eth6 &eth7>;
+   dsaf-handle = <&soc0_dsa>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+   interrupt-parent = <&mbigen_dsa>;
+   interrupts = <722 1>,
+   <723 1>,
+   <724 1>,
+   <725 1>,
+   <726 1>,
+   <727 1>,
+   <728 1>,
+   <729 1>,
+   <730 1>,
+   <731 1>,
+   <732 1>,
+   <733 1>,
+   <734 1>,
+   <735 1>,
+   <736 1>,
+   <737 1>,
+   <738 1>,
+   <739 1>,
+   <740 1>,
+   <741 1>,
+   <742 1>,
+   <743 1>,
+   <744 1>,
+   <745 1>,
+   <746 1>,
+   <747 1>,
+   <748 1>,
+   <749 1>,
+   <750 1>,
+   <751 1>,
+   <752 1>,
+   <753 1>,
+   <785 1>,
+   <754 4>;
+
+   interrupt-names = "roce_ce0_irq",
+   "roce_ce1_irq",
+   "roce_ce2_irq",
+   "roce_ce3_irq",
+   "roce_ce4_irq",
+   "roce_ce5_irq",
+   "roce_ce6_irq",
+   "roce_ce7_irq",
+   "roce_ce8_irq",
+   "roce_ce9_irq",
+   "roce_ce10_irq",
+   "roce_ce11_irq",
+   "roce_ce12_irq",
+   "roce_ce13_irq",
+   "roce_ce14_irq",
+   "roce_ce15_irq",
+   "roce_ce16_irq",
+   "roce_ce17_irq",
+   "roce_ce18_irq",
+   "roce_ce19_irq",
+   "roc

[PATCH v5 12/21] IB/hns: Set mtu and gid support

2016-04-23 Thread Lijun Ou

This patch mainly sets the MTU and GID resources. These resources
will be used to set up network transmission between nodes.
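
The new ROCEE_SMAC_H masks place a 16-bit SMAC field in bits 0-15 and a 4-bit port MTU field in bits 16-19 of one register, while the four GID registers at 0x50-0x5C together hold one 128-bit GID. A sketch of composing such values in user space, using a read-modify-write helper shaped like the `roce_set_field()` macro from the event-queue patch (07/21) of this series; the helper names, the choice of MAC bits in `smac_hi` and the GID word order are assumptions, not taken from the driver.

```c
#include <stdint.h>

/* Field layout copied from the patch's ROCEE_SMAC_H definitions. */
#define ROCEE_SMAC_H_ROCEE_SMAC_H_S 0
#define ROCEE_SMAC_H_ROCEE_SMAC_H_M \
	(((1UL << 16) - 1) << ROCEE_SMAC_H_ROCEE_SMAC_H_S)
#define ROCEE_SMAC_H_ROCEE_PORT_MTU_S 16
#define ROCEE_SMAC_H_ROCEE_PORT_MTU_M \
	(((1UL << 4) - 1) << ROCEE_SMAC_H_ROCEE_PORT_MTU_S)

/* Read-modify-write of one field, same shape as roce_set_field(). */
#define set_field(origin, mask, shift, val) \
	do { \
		(origin) &= ~(mask); \
		(origin) |= (((val) << (shift)) & (mask)); \
	} while (0)

/* Update SMAC_H: 16 MAC bits in bits 0..15 and an MTU code in bits 16..19.
 * Other bits of 'reg' are preserved. Which MAC bytes form 'smac_hi' is an
 * assumption left to the caller. */
static uint32_t make_smac_h(uint32_t reg, uint16_t smac_hi, uint32_t mtu)
{
	set_field(reg, ROCEE_SMAC_H_ROCEE_SMAC_H_M,
		  ROCEE_SMAC_H_ROCEE_SMAC_H_S, (uint32_t)smac_hi);
	set_field(reg, ROCEE_SMAC_H_ROCEE_PORT_MTU_M,
		  ROCEE_SMAC_H_ROCEE_PORT_MTU_S, mtu);
	return reg;
}

/* Split a 16-byte GID into four 32-bit words for the GID_L/ML/MH/H
 * registers. Assumption: words[0] holds bytes 0..3, least significant first. */
static void split_gid(const uint8_t gid[16], uint32_t words[4])
{
	int i;

	for (i = 0; i < 4; i++)
		words[i] = (uint32_t)gid[4 * i] |
			   ((uint32_t)gid[4 * i + 1] << 8) |
			   ((uint32_t)gid[4 * i + 2] << 16) |
			   ((uint32_t)gid[4 * i + 3] << 24);
}
```

Starting from an all-ones register shows that each field update clears and rewrites only its own bits: bits 20-31 stay set.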

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_common.h |  16 
 drivers/infiniband/hw/hns/hns_roce_device.h |  14 
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  64 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   1 +
 drivers/infiniband/hw/hns/hns_roce_main.c   | 123 
 5 files changed, 218 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index fbe6a68..9283f05 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -133,6 +133,14 @@
 
 #define ROCEE_BT_CMD_H_ROCEE_BT_CMD_HW_SYNS_S 31
 
+#define ROCEE_SMAC_H_ROCEE_SMAC_H_S 0
+#define ROCEE_SMAC_H_ROCEE_SMAC_H_M   \
+   (((1UL << 16) - 1) << ROCEE_SMAC_H_ROCEE_SMAC_H_S)
+
+#define ROCEE_SMAC_H_ROCEE_PORT_MTU_S 16
+#define ROCEE_SMAC_H_ROCEE_PORT_MTU_M   \
+   (((1UL << 4) - 1) << ROCEE_SMAC_H_ROCEE_PORT_MTU_S)
+
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S 0
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_M   \
(((1UL << 2) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S)
@@ -173,8 +181,16 @@
 #define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
 #define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
 
+#define ROCEE_PORT_GID_L_0_REG 0x50
+#define ROCEE_PORT_GID_ML_0_REG 0x54
+#define ROCEE_PORT_GID_MH_0_REG 0x58
+#define ROCEE_PORT_GID_H_0_REG 0x5C
+
 #define ROCEE_BT_CMD_H_REG 0x204
 
+#define ROCEE_SMAC_L_0_REG 0x240
+#define ROCEE_SMAC_H_0_REG 0x244
+
 #define ROCEE_CAEP_AEQE_CONS_IDX_REG   0x3AC
 #define ROCEE_CAEP_CEQC_CONS_IDX_0_REG 0x3BC
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3719c557..a4d4d4c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -21,6 +21,8 @@
 
 #define DRV_NAME "hns_roce"
 
+#define MAC_ADDR_OCTET_NUM 6
+
 #define HNS_ROCE_BA_SIZE   (32 * 4096)
 
 #define HNS_ROCE_MAX_IRQ_NUM   34
@@ -31,6 +33,9 @@
 #define HNS_ROCE_AEQE_VEC_NUM  1
 #define HNS_ROCE_AEQE_OF_VEC_NUM   1
 
+#define HNS_ROCE_MAX_PORTS 6
+#define HNS_ROCE_MAX_GID_NUM   16
+
 #define ADDR_SHIFT_12  12
 #define ADDR_SHIFT_32  32
 #define ADDR_SHIFT_44  44
@@ -237,6 +242,8 @@ struct hns_roce_qp {
 
 struct hns_roce_ib_iboe {
struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
+   /* The 16 GIDs are shared by the 6 ports in the v1 engine. */
+   union ib_gid gid_table[HNS_ROCE_MAX_GID_NUM];
u8  phy_port[HNS_ROCE_MAX_PORTS];
 };
 
@@ -311,6 +318,11 @@ struct hns_roce_hw {
void (*hw_profile)(struct hns_roce_dev *hr_dev);
int (*hw_init)(struct hns_roce_dev *hr_dev);
void (*hw_uninit)(struct hns_roce_dev *hr_dev);
+   void (*set_gid)(struct hns_roce_dev *hr_dev, u8 port, int gid_index,
+   union ib_gid *gid);
+   void (*set_mac)(struct hns_roce_dev *hr_dev, u8 phy_port, u8 *addr);
+   void (*set_mtu)(struct hns_roce_dev *hr_dev, u8 phy_port,
+   enum ib_mtu mtu);
void *priv;
 };
 
@@ -328,6 +340,7 @@ struct hns_roce_dev {
struct hns_roce_caps caps;
struct radix_tree_root  qp_table_tree;
 
+   unsigned char   dev_addr[HNS_ROCE_MAX_PORTS][MAC_ADDR_OCTET_NUM];
u64 fw_ver;
u64 sys_image_guid;
u32 vendor_id;
@@ -397,6 +410,7 @@ void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap, u32 obj,
 void hns_roce_cq_completion(struct hns_roce_dev *hr_dev, u32 cqn);
 void hns_roce_cq_event(struct hns_roce_dev *hr_dev, u32 cqn, int event_type);
 void hns_roce_qp_event(struct hns_roce_dev *hr_dev, u32 qpn, int event_type);
+int hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, int gid_index);
 
 extern struct hns_roce_hw hns_roce_hw_v1;
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 26b0d70..b9d396b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -601,9 +601,73 @@ void hns_roce_v1_uninit(struct hns_roce_dev *hr_dev)
hns_roce_db_free(hr_dev);
 }
 
+void hns_roce_v1_set_gid(struct hns_roce_dev *hr_dev, u8 port, int gid_index,
+union ib_gid *gid)
+{
+   u32 *p = NULL;
+   u8 gid_idx = 0;
+
+   gid_idx = hns_get_gid_index(hr_dev, port, gid_index);
+
+

[PATCH v5 09/21] IB/hns: Add hca support

2016-04-23 Thread Lijun Ou

This patch mainly sets up the HCA for RoCE. It performs the
following initialization steps:
  1. init uar table, allocate uar resource
  2. init pd table
  3. init cq table
  4. init mr table
  5. init qp table
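
A common kernel idiom for a sequence of table initializations like the one listed above is to unwind only the steps that already succeeded, in reverse order, using goto labels. The sketch below shows that idiom in user space; `init_step()`, `cleanup_step()`, `setup_hca()` and the trace buffer are hypothetical stand-ins, not the driver's functions, and the step order simply follows the commit message.

```c
#include <string.h>

/* Hypothetical stand-ins for the table init/cleanup functions: they only
 * record the call order in 'trace'. 'fail_at' selects a step to fail. */
static char trace[128];
static int fail_at = -1;

static int init_step(int idx, const char *name)
{
	strcat(trace, name);
	strcat(trace, " ");
	return idx == fail_at ? -1 : 0;
}

static void cleanup_step(const char *name)
{
	strcat(trace, "~");
	strcat(trace, name);
	strcat(trace, " ");
}

/* Initialize uar, pd, cq, mr, qp in the order listed in the commit message;
 * on failure, undo the steps that already succeeded in reverse order. */
static int setup_hca(void)
{
	if (init_step(0, "uar"))
		goto err_out;
	if (init_step(1, "pd"))
		goto err_uar;
	if (init_step(2, "cq"))
		goto err_pd;
	if (init_step(3, "mr"))
		goto err_cq;
	if (init_step(4, "qp"))
		goto err_mr;
	return 0;

err_mr:
	cleanup_step("mr");
err_cq:
	cleanup_step("cq");
err_pd:
	cleanup_step("pd");
err_uar:
	cleanup_step("uar");
err_out:
	return -1;
}
```

If the mr step fails, only cq, pd and uar are torn down; the failed step itself is expected to clean up after itself.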

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_alloc.c  | 104 
 drivers/infiniband/hw/hns/hns_roce_cq.c |  25 
 drivers/infiniband/hw/hns/hns_roce_device.h |  69 ++
 drivers/infiniband/hw/hns/hns_roce_eq.c |   1 -
 drivers/infiniband/hw/hns/hns_roce_icm.c|  88 +
 drivers/infiniband/hw/hns/hns_roce_icm.h|   9 ++
 drivers/infiniband/hw/hns/hns_roce_main.c   |  79 
 drivers/infiniband/hw/hns/hns_roce_mr.c | 187 
 drivers/infiniband/hw/hns/hns_roce_pd.c |  65 ++
 drivers/infiniband/hw/hns/hns_roce_qp.c |  30 +
 10 files changed, 656 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_alloc.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_mr.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_pd.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c b/drivers/infiniband/hw/hns/hns_roce_alloc.c
new file mode 100644
index 000..0c76f1b
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -0,0 +1,104 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_device.h"
+
+int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap, u32 *obj)
+{
+   int ret = 0;
+
+   spin_lock(&bitmap->lock);
+   *obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last);
+   if (*obj >= bitmap->max) {
+   bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
+  & bitmap->mask;
+   *obj = find_first_zero_bit(bitmap->table, bitmap->max);
+   }
+
+   if (*obj < bitmap->max) {
+   set_bit(*obj, bitmap->table);
+   bitmap->last = (*obj + 1);
+   if (bitmap->last == bitmap->max)
+   bitmap->last = 0;
+   *obj |= bitmap->top;
+   } else {
+   ret = -1;
+   }
+
+   spin_unlock(&bitmap->lock);
+
+   return ret;
+}
+
+void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, u32 obj)
+{
+   hns_roce_bitmap_free_range(bitmap, obj, 1);
+}
+
+void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap, u32 obj,
+   int cnt)
+{
+   int i;
+
+   obj &= bitmap->max + bitmap->reserved_top - 1;
+
+   spin_lock(&bitmap->lock);
+   for (i = 0; i < cnt; i++)
+   clear_bit(obj + i, bitmap->table);
+
+   bitmap->last = min(bitmap->last, obj);
+   bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
+  & bitmap->mask;
+   spin_unlock(&bitmap->lock);
+}
+
+int hns_roce_bitmap_init(struct hns_roce_bitmap *bitmap, u32 num, u32 mask,
+u32 reserved_bot, u32 reserved_top)
+{
+   u32 i;
+
+   if (num != roundup_pow_of_two(num))
+   return -EINVAL;
+
+   bitmap->last = 0;
+   bitmap->top = 0;
+   bitmap->max = num - reserved_top;
+   bitmap->mask = mask;
+   bitmap->reserved_top = reserved_top;
+   spin_lock_init(&bitmap->lock);
+   bitmap->table = kcalloc(BITS_TO_LONGS(bitmap->max), sizeof(long),
+   GFP_KERNEL);
+   if (!bitmap->table)
+   return -ENOMEM;
+
+   for (i = 0; i < reserved_bot; ++i)
+   set_bit(i, bitmap->table);
+
+   return 0;
+}
+
+void hns_roce_bitmap_cleanup(struct hns_roce_bitmap *bitmap)
+{
+   kfree(bitmap->table);
+}
+
+void hns_roce_cleanup_bitmap(struct hns_roce_dev *hr_dev)
+{
+   hns_roce_cleanup_qp_table(hr_dev);
+   hns_roce_cleanup_cq_table(hr_dev);
+   hns_roce_cleanup_mr_table(hr_dev);
+   hns_roce_cleanup_pd_table(hr_dev);
+   hns_roce_cleanup_uar_table(hr_dev);
+}
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 1dc8635..f7baf82 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -52,3 +52,28 @@ void hns_roce_cq_event(struct hns_roce_dev *hr_dev, u32 cqn, int event_type)
if (atomic_dec_and_test(&cq->refcount))
complete(&cq->free);
 }
+
+int hns_roce_init_cq_table(struct hns_roce_dev *hr_dev)
+{
+   struct hns_roce_cq_table *cq_table = &hr_dev->cq_table;
+   struct device *dev = &hr_dev->pdev->dev;
+   int ret;
+
+   spin_lock_init(&cq_table->lock);
+

[PATCH v5 18/21] IB/hns: Add CQ operation implemention support

2016-04-23 Thread Lijun Ou

This patch implements the Completion Queue (CQ) operations.
A CQ can be used to multiplex work completions from multiple work
queues across queue pairs on the same HCA, and serves as the
notification mechanism for Work Request completions.
The CQ operations are implemented as follows:
1. Create CQ. CQs are created through the Channel Interface.
   The maximum number of Completion Queue Entries (CQEs) that
   may be outstanding on a CQ must be specified when the CQ
   is created.
2. Destroy CQ. Destroys the specified CQ. Resources allocated
   by the Channel Interface to implement the CQ must be
   deallocated during the destroy operation.
3. Request completion notification. Requests that the CQ event
   handler be called when the next completion entry of the
   specified type is added to the specified CQ.
4. Poll CQ. Polls the specified CQ for a Work Completion.
   A Work Completion indicates that a Work Request for a Work Queue
   associated with the CQ is done.
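
The four operations above can be illustrated with a small software model of a completion queue. This is a hedged sketch, not the hns hardware format: `struct sw_cq`, `struct sw_wc` and all `sw_*` helpers are hypothetical. In user space the corresponding verbs calls are ibv_create_cq(), ibv_destroy_cq(), ibv_req_notify_cq() and ibv_poll_cq().

```c
#include <stdlib.h>

/* Hypothetical work completion: which work request finished, and how. */
struct sw_wc {
	unsigned long wr_id;
	int status;                     /* 0 = success */
};

struct sw_cq {
	struct sw_wc *ring;
	int cqe;                        /* capacity, fixed at creation */
	int head, tail, count;
	int armed;                      /* one-shot notification request */
	void (*comp_handler)(struct sw_cq *cq);
};

/* 1. Create: the maximum number of outstanding CQEs is given up front. */
static struct sw_cq *sw_create_cq(int cqe, void (*handler)(struct sw_cq *))
{
	struct sw_cq *cq;

	if (cqe <= 0 || !(cq = calloc(1, sizeof(*cq))))
		return NULL;
	cq->ring = calloc((size_t)cqe, sizeof(*cq->ring));
	if (!cq->ring) {
		free(cq);
		return NULL;
	}
	cq->cqe = cqe;
	cq->comp_handler = handler;
	return cq;
}

/* 2. Destroy: release everything allocated at creation. */
static void sw_destroy_cq(struct sw_cq *cq)
{
	if (cq)
		free(cq->ring);
	free(cq);
}

/* 3. Request notification: invoke the handler on the next completion. */
static void sw_req_notify_cq(struct sw_cq *cq)
{
	cq->armed = 1;
}

/* Producer side (stands in for the hardware): -1 if the CQ is full. */
static int sw_cq_push(struct sw_cq *cq, unsigned long wr_id, int status)
{
	if (cq->count == cq->cqe)
		return -1;
	cq->ring[cq->tail].wr_id = wr_id;
	cq->ring[cq->tail].status = status;
	cq->tail = (cq->tail + 1) % cq->cqe;
	cq->count++;
	if (cq->armed && cq->comp_handler) {
		cq->armed = 0;          /* one-shot: must be re-armed */
		cq->comp_handler(cq);
	}
	return 0;
}

/* 4. Poll: copy out up to n completions and return how many were found. */
static int sw_poll_cq(struct sw_cq *cq, int n, struct sw_wc *wc)
{
	int i;

	for (i = 0; i < n && cq->count > 0; i++) {
		wc[i] = cq->ring[cq->head];
		cq->head = (cq->head + 1) % cq->cqe;
		cq->count--;
	}
	return i;
}

/* Example handler for the usage below: counts notifications. */
static int notify_count;

static void count_notifications(struct sw_cq *cq)
{
	(void)cq;
	notify_count++;
}
```

Arming the CQ and then adding two completions invokes the handler only once, because a notification request is one-shot and must be renewed; a third completion into a two-entry CQ is rejected, loosely corresponding to the overflow condition the driver reports as HNS_ROCE_EVENT_TYPE_CQ_OVERFLOW.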

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_cq.c | 356 
 drivers/infiniband/hw/hns/hns_roce_device.h |  34 ++-
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 340 ++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 117 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |  10 +
 drivers/infiniband/hw/hns/hns_roce_user.h   |   4 +
 6 files changed, 860 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index f7baf82..16eaa8d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -11,6 +11,362 @@
 #include 
 #include 
 #include "hns_roce_device.h"
+#include "hns_roce_cmd.h"
+#include "hns_roce_icm.h"
+#include "hns_roce_user.h"
+#include "hns_roce_common.h"
+
+static void hns_roce_ib_cq_comp(struct hns_roce_cq *hr_cq)
+{
+   struct ib_cq *ibcq = &hr_cq->ib_cq;
+
+   ibcq->comp_handler(ibcq, ibcq->cq_context);
+}
+
+static void hns_roce_ib_cq_event(struct hns_roce_cq *hr_cq,
+enum hns_roce_event event_type)
+{
+   struct hns_roce_dev *hr_dev;
+   struct ib_event event;
+   struct ib_cq *ibcq;
+
+   ibcq = &hr_cq->ib_cq;
+   hr_dev = to_hr_dev(ibcq->device);
+
+   if (event_type != HNS_ROCE_EVENT_TYPE_CQ_ID_INVALID &&
+   event_type != HNS_ROCE_EVENT_TYPE_CQ_ACCESS_ERROR &&
+   event_type != HNS_ROCE_EVENT_TYPE_CQ_OVERFLOW) {
+   dev_err(&hr_dev->pdev->dev,
+   "hns_roce_ib: Unexpected event type 0x%x on CQ %06x\n",
+   event_type, hr_cq->cqn);
+   return;
+   }
+
+   if (ibcq->event_handler) {
+   event.device = ibcq->device;
+   event.event = IB_EVENT_CQ_ERR;
+   event.element.cq = ibcq;
+   ibcq->event_handler(&event, ibcq->cq_context);
+   }
+}
+
+static int hns_roce_sw2hw_cq(struct hns_roce_dev *dev,
+struct hns_roce_cmd_mailbox *mailbox, int cq_num)
+{
+   return hns_roce_cmd(dev, mailbox->dma, cq_num, 0,
+   HNS_ROCE_CMD_SW2HW_CQ, HNS_ROCE_CMD_TIME_CLASS_A);
+}
+
+static int hns_roce_cq_alloc(struct hns_roce_dev *hr_dev, int nent,
+struct hns_roce_mtt *hr_mtt,
+struct hns_roce_uar *hr_uar,
+struct hns_roce_cq *hr_cq, int vector,
+int collapsed)
+{
+   struct hns_roce_cmd_mailbox *mailbox = NULL;
+   struct hns_roce_cq_table *cq_table = NULL;
+   struct device *dev = &hr_dev->pdev->dev;
+   dma_addr_t dma_handle;
+   u64 *mtts = NULL;
+   int ret = 0;
+
+   cq_table = &hr_dev->cq_table;
+
+   /* Get the physical address of cq buf */
+   mtts = hns_roce_table_find(&hr_dev->mr_table.mtt_table,
+  hr_mtt->first_seg, &dma_handle);
+   if (!mtts) {
+   dev_err(dev, "CQ alloc: failed to find CQ buffer address.\n");
+   return -EINVAL;
+   }
+
+   if (vector >= hr_dev->caps.num_comp_vectors) {
+   dev_err(dev, "CQ alloc: invalid vector.\n");
+   return -EINVAL;
+   }
+   hr_cq->vector = vector;
+
+   ret = hns_roce_bitmap_alloc(&cq_table->bitmap, &hr_cq->cqn);
+   if (ret == -1) {
+   dev_err(dev, "CQ alloc: failed to allocate CQ index.\n");
+   return -ENOMEM;
+   }
+
+   /* Get CQC memory icm table */
+   ret = hns_roce_table_get(hr_dev, &cq_table->table, hr_cq->cqn);
+   if (ret) {
+   dev_err(dev, "CQ alloc: failed to get context memory.\n");
+   goto err_out;
+   }
+
+   /* Insert the CQ into the radix tree */
+   spin_lock_irq(&cq_table->lock);
+   /* Radix tree: associates a pointer with a long integer key */
+

[PATCH v5 20/21] IB/hns: Kconfig and Makefile for RoCE module

2016-04-23 Thread Lijun Ou

This patch adds a Kconfig and a Makefile for building the RoCE module.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/Kconfig |  1 +
 drivers/infiniband/hw/Makefile |  1 +
 drivers/infiniband/hw/hns/Kconfig  | 10 ++
 drivers/infiniband/hw/hns/Makefile |  9 +
 4 files changed, 21 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/Kconfig
 create mode 100644 drivers/infiniband/hw/hns/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 6425c0e..726a4ca 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -74,6 +74,7 @@ source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
 source "drivers/infiniband/hw/ocrdma/Kconfig"
 source "drivers/infiniband/hw/usnic/Kconfig"
+source "drivers/infiniband/hw/hns/Kconfig"
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index c7ad0a4..223eb78 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_MLX5_INFINIBAND)   += mlx5/
 obj-$(CONFIG_INFINIBAND_NES)   += nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA) += ocrdma/
 obj-$(CONFIG_INFINIBAND_USNIC) += usnic/
+obj-$(CONFIG_INFINIBAND_HISILICON_HNS) += hns/
diff --git a/drivers/infiniband/hw/hns/Kconfig b/drivers/infiniband/hw/hns/Kconfig
new file mode 100644
index 000..c47c168
--- /dev/null
+++ b/drivers/infiniband/hw/hns/Kconfig
@@ -0,0 +1,10 @@
+config INFINIBAND_HISILICON_HNS
+   tristate "Hisilicon Hns ROCE Driver"
+   depends on NET_VENDOR_HISILICON
+   depends on ARM64 && HNS && HNS_DSAF && HNS_ENET
+   ---help---
+ This is a RoCE/RDMA driver for the Hisilicon RoCE engine. The engine
+ is used in the Hisilicon Hi1610 and later ICT SoCs.
+
+ To compile this driver as a module, choose M here: the module
+ will be called hns-roce.
diff --git a/drivers/infiniband/hw/hns/Makefile b/drivers/infiniband/hw/hns/Makefile
new file mode 100644
index 000..404a700
--- /dev/null
+++ b/drivers/infiniband/hw/hns/Makefile
@@ -0,0 +1,9 @@
+#
+# Makefile for the HISILICON RoCE drivers.
+#
+
+obj-$(CONFIG_INFINIBAND_HISILICON_HNS) += hns-roce.o
+hns-roce-objs := hns_roce_main.o hns_roce_cmd.o hns_roce_eq.o hns_roce_pd.o \
+   hns_roce_ah.o hns_roce_icm.o hns_roce_mr.o hns_roce_qp.o \
+   hns_roce_cq.o hns_roce_alloc.o hns_roce_hw_v1.o
+
-- 
1.9.1

[PATCH v5 21/21] MAINTAINERS: Add maintainers for HiSilicon RoCE driver

2016-04-23 Thread Lijun Ou

This patch adds maintainers for the HiSilicon RoCE driver.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 61a323a..cb45b6f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10012,6 +10012,14 @@ W: http://www.emulex.com
 S: Supported
 F: drivers/infiniband/hw/ocrdma/
 
+HISILICON ROCE DRIVER
+M: Wei Hu(Xavier) 
+M: Lijun Ou 
+L: linux-r...@vger.kernel.org
+S: Maintained
+F: drivers/infiniband/hw/hns/
+F: Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
+
 SFC NETWORK DRIVER
 M: Solarflare linux maintainers 
 M: Shradha Shah 
-- 
1.9.1

[PATCH v5 07/21] IB/hns: Add event queue support

2016-04-23 Thread Lijun Ou

This patch adds event queue support to the RoCE driver. Event
queues are used to handle RoCE interrupts: 32 completion event irqs,
1 asynchronous event irq and 1 common overflow irq.
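
When a command-completion event is handled, hns_roce_cmd_event() in this patch receives a token, a status and an output parameter, uses `token & token_mask` to find the waiting command context, and returns early if the stored token does not match (presumably a late or stale completion). A user-space sketch of that lookup: `struct cmd_ctx`, `ctxs`, `CTX_NUM` and `cmd_event()` are hypothetical stand-ins, a plain `done` flag replaces `struct completion`, and treating status 0 as success is an assumption about HNS_ROCE_CMD_SUCCESS.

```c
#include <errno.h>
#include <stdint.h>

#define CTX_NUM     32              /* power of two so the mask works */
#define TOKEN_MASK  (CTX_NUM - 1)
#define CMD_SUCCESS 0               /* assumption: HNS_ROCE_CMD_SUCCESS */

/* Simplified command context; 'done' stands in for struct completion. */
struct cmd_ctx {
	uint16_t token;      /* token of the command that owns this slot */
	int result;
	uint64_t out_param;
	int done;
};

static struct cmd_ctx ctxs[CTX_NUM];

/* Handle one command-completion event. */
static void cmd_event(uint16_t token, uint8_t status, uint64_t out_param)
{
	struct cmd_ctx *ctx = &ctxs[token & TOKEN_MASK];

	if (token != ctx->token)   /* slot now belongs to another command */
		return;
	ctx->result = status == CMD_SUCCESS ? 0 : -EIO;
	ctx->out_param = out_param;
	ctx->done = 1;             /* the driver calls complete() here */
}
```

Tokens 3 and 35 share slot 3; once the slot's stored token is 35, an event carrying token 3 is ignored.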

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c|  22 +
 drivers/infiniband/hw/hns/hns_roce_common.h |  73 ++-
 drivers/infiniband/hw/hns/hns_roce_cq.c |  54 ++
 drivers/infiniband/hw/hns/hns_roce_device.h | 138 +
 drivers/infiniband/hw/hns/hns_roce_eq.c | 758 
 drivers/infiniband/hw/hns/hns_roce_eq.h |  95 
 drivers/infiniband/hw/hns/hns_roce_main.c   |  24 +
 drivers/infiniband/hw/hns/hns_roce_qp.c |  39 ++
 8 files changed, 1202 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cq.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_eq.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_eq.h
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_qp.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 9a274de..b55c200 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -18,6 +18,14 @@
 
 #define CMD_MAX_NUM 32
 
+static int hns_roce_status_to_errno(u8 orig_status)
+{
+   if (orig_status == HNS_ROCE_CMD_SUCCESS)
+   return 0;
+   else
+   return -EIO;
+}
+
 int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
 {
struct device *dev = &hr_dev->pdev->dev;
@@ -90,3 +98,17 @@ void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
kfree(hr_cmd->context);
up(&hr_cmd->poll_sem);
 }
+
+void hns_roce_cmd_event(struct hns_roce_dev *hr_dev, u16 token, u8 status,
+   u64 out_param)
+{
+   struct hns_roce_cmd_context
+   *context = &hr_dev->cmd.context[token & hr_dev->cmd.token_mask];
+
+   if (token != context->token)
+   return;
+
+   context->result = hns_roce_status_to_errno(status);
+   context->out_param = out_param;
+   complete(&context->done);
+}
diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 84a9580..2d083c2 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -10,8 +10,58 @@
 #ifndef _HNS_ROCE_COMMON_H
 #define _HNS_ROCE_COMMON_H
 
-/*ROCEE_REG DEFINITION/
+#define roce_writel(value, addr) writel((value), (addr))
+#define roce_readl(addr) readl((addr))
+#define roce_raw_write(value, addr) \
+   __raw_writel((__force u32)cpu_to_le32(value), (addr))
+
+#define roce_get_field(origin, mask, shift) \
+   (((origin) & (mask)) >> (shift))
+
+#define roce_get_bit(origin, shift) \
+   roce_get_field((origin), (1ul << (shift)), (shift))
+
+#define roce_set_field(origin, mask, shift, val) \
+   do { \
+   (origin) &= (~(mask)); \
+   (origin) |= (((val) << (shift)) & (mask)); \
+   } while (0)
+
+#define roce_set_bit(origin, shift, val) \
+   roce_set_field((origin), (1ul << (shift)), (shift), (val))
+
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S 0
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_M   \
+   (((1UL << 2) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S)
+
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_AEQE_SHIFT_S 8
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_AEQE_SHIFT_M   \
+   (((1UL << 4) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_AEQE_SHIFT_S)
+
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQ_ALM_OVF_INT_ST_S 17
+
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQ_BT_H_S 0
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQ_BT_H_M   \
+   (((1UL << 5) - 1) << ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQ_BT_H_S)
+
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQE_CUR_IDX_S 16
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQE_CUR_IDX_M   \
+   (((1UL << 16) - 1) << ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQE_CUR_IDX_S)
 
+#define ROCEE_CAEP_AEQE_CONS_IDX_CAEP_AEQE_CONS_IDX_S 0
+#define ROCEE_CAEP_AEQE_CONS_IDX_CAEP_AEQE_CONS_IDX_M   \
+   (((1UL << 16) - 1) << ROCEE_CAEP_AEQE_CONS_IDX_CAEP_AEQE_CONS_IDX_S)
+
+#define ROCEE_CAEP_CEQC_SHIFT_CAEP_CEQ_ALM_OVF_INT_ST_S 16
+#define ROCEE_CAEP_CE_IRQ_MASK_CAEP_CEQ_ALM_OVF_MASK_S 1
+#define ROCEE_CAEP_CEQ_ALM_OVF_CAEP_CEQ_ALM_OVF_S 0
+
+#define ROCEE_CAEP_AE_MASK_CAEP_AEQ_ALM_OVF_MASK_S 0
+#define ROCEE_CAEP_AE_MASK_CAEP_AE_IRQ_MASK_S 1
+
+#define ROCEE_CAEP_AE_ST_CAEP_AEQ_ALM_OVF_S 0
+
+/*ROCEE_REG DEFINITION/
 #define ROCEE_VENDOR_ID_REG 0x0
 #define ROCEE_VENDOR_PART_ID_REG   0x4
 
@@ -20,8 +70,29 @@
 #define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
 #define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
 
+#define ROCEE_CAEP_AEQE_CONS_IDX_REG   0x3AC
+#define ROCEE_CAEP_CEQC_CONS_IDX_0_REG 0x3BC
+
+#define ROCEE_ECC_UCERR_ALM1_REG

[PATCH v5 13/21] IB/hns: Add interface of the protocol stack registration

2016-04-23 Thread Lijun Ou

This patch mainly adds the code that registers RoCE with the
network protocol stack's notifier chains. It includes the
following interface functions:
1. The RoCE handler that is called when a netdevice event is
   generated by the registered protocol stack.
2. The RoCE handler that is called when an IP address in the
   registered protocol stack changes.
In addition, it frees the related resources when RoCE is
removed.
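
On NETDEV_UP, NETDEV_CHANGE, NETDEV_REGISTER and NETDEV_CHANGEADDR, handle_en_event() in this patch rebuilds GID 0 from the port's MAC address with hns_roce_make_default_gid(): the link-local prefix fe80::/64 followed by a modified EUI-64 interface ID. The sketch below reproduces that conversion in user space; `ifid_eui48()` and `make_default_gid()` are hypothetical stand-ins that take a raw MAC array instead of a struct net_device.

```c
#include <stdint.h>
#include <string.h>

/* Build the 8-byte interface ID from a MAC, mirroring the patch's
 * hns_roce_addrconf_ifid_eui48(): insert the VLAN ID (or ff:fe when
 * vlan_id >= 0x1000, i.e. no VLAN) and flip the universal/local bit. */
static void ifid_eui48(uint8_t *eui, uint16_t vlan_id, const uint8_t mac[6])
{
	memcpy(eui, mac, 3);
	memcpy(eui + 5, mac + 3, 3);
	if (vlan_id < 0x1000) {
		eui[3] = (uint8_t)(vlan_id >> 8);
		eui[4] = (uint8_t)(vlan_id & 0xff);
	} else {
		eui[3] = 0xff;
		eui[4] = 0xfe;
	}
	eui[0] ^= 2;
}

/* Link-local default GID: fe80:0000:0000:0000 followed by the interface ID. */
static void make_default_gid(uint8_t gid[16], const uint8_t mac[6],
			     uint16_t vlan_id)
{
	memset(gid, 0, 16);
	gid[0] = 0xfe;
	gid[1] = 0x80;
	ifid_eui48(&gid[8], vlan_id, mac);
}
```

For MAC 00:11:22:33:44:55 with no VLAN, the result is fe80::211:22ff:fe33:4455: the universal/local bit of the first byte is flipped and ff:fe is inserted in the middle.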

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |   3 +
 drivers/infiniband/hw/hns/hns_roce_main.c   | 210 
 2 files changed, 213 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index a4d4d4c..6565b43 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -241,7 +241,10 @@ struct hns_roce_qp {
 };
 
 struct hns_roce_ib_iboe {
+   spinlock_t  lock;
struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
+   struct notifier_block   nb;
+   struct notifier_block   nb_inet;
/* The 16 GIDs are shared by the 6 ports in the v1 engine. */
union ib_gid gid_table[HNS_ROCE_MAX_GID_NUM];
u8  phy_port[HNS_ROCE_MAX_PORTS];
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 415f1fa..8aa8c55 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -44,6 +44,46 @@
 #include "hns_roce_icm.h"
 
 /**
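+ * hns_roce_addrconf_ifid_eui48 - Build the interface ID part of a GID.
+ * @eui: output buffer for the 8-byte interface ID.
+ * @vlan_id: VLAN ID, or a value >= 0x1000 if there is no VLAN.
+ * @dev: net device whose MAC address is used.
+ * Description:
+ *Convert the MAC address to a GID: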
+ *gid[0..7] = fe80   
+ *gid[8] = mac[0] ^ 2
+ *gid[9] = mac[1]
+ *gid[10] = mac[2]
+ *gid[11] = ff(VLAN ID high byte (4 MS bits))
+ *gid[12] = fe(VLAN ID low byte)
+ *gid[13] = mac[3]
+ *gid[14] = mac[4]
+ *gid[15] = mac[5]
+ */
+static void hns_roce_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
+struct net_device *dev)
+{
+   memcpy(eui, dev->dev_addr, 3);
+   memcpy(eui + 5, dev->dev_addr + 3, 3);
+   if (vlan_id < 0x1000) {
+   eui[3] = vlan_id >> 8;
+   eui[4] = vlan_id & 0xff;
+   } else {
+   eui[3] = 0xff;
+   eui[4] = 0xfe;
+   }
+   eui[0] ^= 2;
+}
+
+void hns_roce_make_default_gid(struct net_device *dev, union ib_gid *gid)
+{
+   memset(gid, 0, sizeof(*gid));
+   gid->raw[0] = 0xFE;
+   gid->raw[1] = 0x80;
+   hns_roce_addrconf_ifid_eui48(&gid->raw[8], 0xffff, dev);
+}
+
+/**
  * hns_get_gid_index - Get gid index.
  * @hr_dev: pointer to structure hns_roce_dev.
  * @port:  port, value range: 0 ~ MAX
@@ -121,6 +161,152 @@ void hns_roce_update_gids(struct hns_roce_dev *hr_dev, int port)
ib_dispatch_event(&event);
 }
 
+static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
+  unsigned long event)
+{
+   struct device *dev = &hr_dev->pdev->dev;
+   struct net_device *netdev;
+   unsigned long flags;
+   union ib_gid gid;
+   int ret = 0;
+
+   netdev = hr_dev->iboe.netdevs[port];
+   if (!netdev) {
+   dev_err(dev, "port(%d) can't find netdev\n", port);
+   return -ENODEV;
+   }
+
+   spin_lock_irqsave(&hr_dev->iboe.lock, flags);
+
+   switch (event) {
+   case NETDEV_UP:
+   case NETDEV_CHANGE:
+   case NETDEV_REGISTER:
+   case NETDEV_CHANGEADDR:
+   hns_roce_set_mac(hr_dev, port, netdev->dev_addr);
+   hns_roce_make_default_gid(netdev, &gid);
+   ret = hns_roce_set_gid(hr_dev, port, 0, &gid);
+   if (!ret)
+   hns_roce_update_gids(hr_dev, port);
+   break;
+   case NETDEV_DOWN:
+   /*
+   * The v1 engine only supports closing all ports together.
+   */
+   break;
+   default:
+   dev_dbg(dev, "NETDEV event = 0x%x!\n", (u32)(event));
+   break;
+   }
+
+   spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
+   return ret;
+}
+
+static int hns_roce_netdev_event(struct notifier_block *self,
+unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct hns_roce_ib_iboe *iboe = NULL;
+   struct hns_roce_dev *hr_dev = NULL;
+   u8 port = 0;
+   int ret = 0;
+
+   hr_dev = container_of(self, struct hns_roce_dev, iboe.nb);
+   iboe = &hr_dev->iboe;
+
+   for (port = 0; port < hr_dev->caps.num_ports; port++) {
+   if (dev == iboe->netdevs[port]) {
+   ret = handle_en_event(hr_dev, port, event);
+

[PATCH v5 05/21] IB/hns: Add initial profile resource

2016-04-23 Thread Lijun Ou

This patch configures the initial profile resources, for example
vendor_id, the hardware version and various data structure sizes.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_common.h | 25 +
 drivers/infiniband/hw/hns/hns_roce_device.h | 56 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 78 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 36 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |  9 
 5 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_common.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
new file mode 100644
index 0000000..0f90214
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _HNS_ROCE_COMMON_H
+#define _HNS_ROCE_COMMON_H
+
+/* ROCEE_REG DEFINITION */
+
+#define ROCEE_VENDOR_ID_REG        0x0
+#define ROCEE_VENDOR_PART_ID_REG   0x4
+
+#define ROCEE_HW_VERSION_REG   0x8
+
+#define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
+#define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
+
+#define ROCEE_ACK_DELAY_REG        0x14
+
+#endif /* _HNS_ROCE_COMMON_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 24ac1a8..e3e59d0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -24,17 +24,65 @@
 #define HNS_ROCE_MAX_IRQ_NUM   34
 #define HNS_ROCE_MAX_PORTS 6
 
+#define HNS_ROCE_COMP_VEC_NUM  32
+
+#define HNS_ROCE_AEQE_VEC_NUM  1
+#define HNS_ROCE_AEQE_OF_VEC_NUM   1
+
+#define ADDR_SHIFT_32  32
+
 struct hns_roce_ib_iboe {
struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
u8  phy_port[HNS_ROCE_MAX_PORTS];
 };
 
 struct hns_roce_caps {
-   u8  num_ports;
+   u64 fw_ver;
+   u8  num_ports;
+   int gid_table_len[HNS_ROCE_MAX_PORTS];
+   int pkey_table_len[HNS_ROCE_MAX_PORTS];
+   int local_ca_ack_delay;
+   int num_uars;
+   u32 phy_num_uars;
+   u32 max_sq_sg;  /* 2 */
+   u32 max_sq_inline;  /* 32 */
+   u32 max_rq_sg;  /* 2 */
+   int num_qps;    /* 256k */
+   u32 max_wqes;   /* 16k */
+   u32 max_sq_desc_sz; /* 64 */
+   u32 max_rq_desc_sz; /* 64 */
+   int max_qp_init_rdma;
+   int max_qp_dest_rdma;
+   int sqp_start;
+   int num_cqs;
+   int max_cqes;
+   int reserved_cqs;
+   int num_aeq_vectors;    /* 1 */
+   int num_comp_vectors;   /* 32 ceq */
+   int num_other_vectors;
+   int num_mtpts;
+   u32 num_mtt_segs;
+   int reserved_mtts;
+   int reserved_mrws;
+   int reserved_uars;
+   int num_pds;
+   int reserved_pds;
+   u32 mtt_entry_sz;
+   u32 cq_entry_sz;
+   u32 page_size_cap;
+   u32 reserved_lkey;
+   int mtpt_entry_sz;
+   int qpc_entry_sz;
+   int irrl_entry_sz;
+   int cqc_entry_sz;
+   int aeqe_depth;
+   int ceqe_depth[HNS_ROCE_COMP_VEC_NUM];
+   enum ib_mtu max_mtu;
 };
 
 struct hns_roce_hw {
int (*reset)(struct hns_roce_dev *hr_dev, u32 val);
+   void (*hw_profile)(struct hns_roce_dev *hr_dev);
 };
 
 struct hns_roce_dev {
@@ -46,6 +94,12 @@ struct hns_roce_dev {
u8 __iomem  *reg_base;
struct hns_roce_caps  caps;
 
+   u64 fw_ver;
+   u64 sys_image_guid;
+   u32 vendor_id;
+   u32 vendor_part_id;
+   u32 hw_rev;
+
int cmd_mod;
int loop_idc;
struct hns_roce_hw  *hw;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index ea39e56..54831bf 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include "hns_roce_common.h"
 #include "hns_roce_device.h"
 #include "hns_roce_hw_v1.h"
 
@@ -49,6 +50,83 @@ int hns_roce_v1_reset(struct hns_roce_dev *hr_dev, u32 val)
return ret;
 }
 
+void hns_roce_v1_profile(struct h

[PATCH v5 16/21] IB/hns: Add ah operation support

2016-04-23 Thread Lijun Ou

This patch implements the address handle (AH) operations. It adds
three verbs: create AH, query AH and destroy AH. They are implemented
entirely within the RoCE driver.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_ah.c | 109 
 drivers/infiniband/hw/hns/hns_roce_common.h |   4 +
 drivers/infiniband/hw/hns/hns_roce_device.h |  31 +++-
 drivers/infiniband/hw/hns/hns_roce_main.c   |   5 ++
 4 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_ah.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_ah.c b/drivers/infiniband/hw/hns/hns_roce_ah.c
new file mode 100644
index 0000000..9d0eb61
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_ah.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_common.h"
+#include "hns_roce_device.h"
+
+#define HNS_ROCE_PORT_NUM_SHIFT    24
+#define HNS_ROCE_VLAN_SL_BIT_MASK  7
+#define HNS_ROCE_VLAN_SL_SHIFT 13
+
+struct ib_ah *hns_roce_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *ah_attr)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ibpd->device);
+   struct device *dev = &hr_dev->pdev->dev;
+   struct ib_gid_attr gid_attr;
+   struct hns_roce_ah *ah;
+   u16 vlan_tag = 0xffff;
+   struct in6_addr in6;
+   union ib_gid sgid;
+   int ret;
+
+   ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+   if (!ah)
+   return ERR_PTR(-ENOMEM);
+
+   /* Get mac address */
+   memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(ah_attr->grh.dgid.raw));
+   if (rdma_is_multicast_addr(&in6))
+   rdma_get_mcast_mac(&in6, ah->av.mac);
+   else
+   memcpy(ah->av.mac, ah_attr->dmac, sizeof(ah_attr->dmac));
+
+   /* Get source gid */
+   ret = ib_get_cached_gid(ibpd->device, ah_attr->port_num,
+   ah_attr->grh.sgid_index, &sgid, &gid_attr);
+   if (ret) {
+   dev_err(dev, "get sgid failed! ret = %d\n", ret);
+   kfree(ah);
+   return ERR_PTR(ret);
+   }
+
+   if (gid_attr.ndev) {
+   if (is_vlan_dev(gid_attr.ndev))
+   vlan_tag = vlan_dev_vlan_id(gid_attr.ndev);
+   dev_put(gid_attr.ndev);
+   }
+
+   if (vlan_tag < 0x1000)
+   vlan_tag |= (ah_attr->sl & HNS_ROCE_VLAN_SL_BIT_MASK) <<
+HNS_ROCE_VLAN_SL_SHIFT;
+
+   ah->av.port_pd = cpu_to_le32(to_hr_pd(ibpd)->pdn | (ah_attr->port_num <<
+HNS_ROCE_PORT_NUM_SHIFT));
+   ah->av.gid_index = ah_attr->grh.sgid_index;
+   ah->av.vlan = cpu_to_le16(vlan_tag);
+   dev_dbg(dev, "gid_index = 0x%x,vlan = 0x%x\n", ah->av.gid_index,
+   ah->av.vlan);
+
+   if (ah_attr->static_rate)
+   ah->av.stat_rate = IB_RATE_10_GBPS;
+
+   memcpy(ah->av.dgid, ah_attr->grh.dgid.raw, HNS_ROCE_GID_SIZE);
+   ah->av.sl_tclass_flowlabel = cpu_to_le32(ah_attr->sl <<
+HNS_ROCE_SL_SHIFT);
+
+   return &ah->ibah;
+}
+
+int hns_roce_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+   struct hns_roce_ah *ah = to_hr_ah(ibah);
+
+   memset(ah_attr, 0, sizeof(*ah_attr));
+
+   ah_attr->sl = le32_to_cpu(ah->av.sl_tclass_flowlabel) >>
+ HNS_ROCE_SL_SHIFT;
+   ah_attr->port_num = le32_to_cpu(ah->av.port_pd) >>
+   HNS_ROCE_PORT_NUM_SHIFT;
+   ah_attr->static_rate = ah->av.stat_rate;
+   ah_attr->ah_flags = IB_AH_GRH;
+   ah_attr->grh.traffic_class = le32_to_cpu(ah->av.sl_tclass_flowlabel) >>
+HNS_ROCE_TCLASS_SHIFT;
+   ah_attr->grh.flow_label = le32_to_cpu(ah->av.sl_tclass_flowlabel) &
+ HNS_ROCE_FLOW_LABLE_MASK;
+   ah_attr->grh.hop_limit = ah->av.hop_limit;
+   ah_attr->grh.sgid_index = ah->av.gid_index;
+   memcpy(ah_attr->grh.dgid.raw, ah->av.dgid, HNS_ROCE_GID_SIZE);
+
+   return 0;
+}
+
+int hns_roce_destroy_ah(struct ib_ah *ah)
+{
+   kfree(to_hr_ah(ah));
+
+   return 0;
+}
diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 9283f05..db32a52 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -10,6 +10,10 @@
 #ifndef _HNS_ROCE_COMMON_H
 #define _HNS_ROCE_COMMON_H
 
+#ifndef assert
+#define assert(cond)
+#endif
+
 #define roce_write

[PATCH v5 14/21] IB/hns: Add operations support for IB device and port

2016-04-23 Thread Lijun Ou

This patch registers the device and port related verbs with the kernel.
These operation functions are called by users, for example:
1. modify_device
2. query_device
3. query_port
4. modify_port
and so on.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  21 +++
 drivers/infiniband/hw/hns/hns_roce_main.c   | 229 +++-
 drivers/infiniband/hw/hns/hns_roce_user.h   |  17 +++
 3 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_user.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 6565b43..e871c60 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -22,6 +22,8 @@
 #define DRV_NAME "hns_roce"
 
 #define MAC_ADDR_OCTET_NUM 6
+#define HNS_ROCE_MAX_MSG_LEN   0x8000
+
 
 #define HNS_ROCE_BA_SIZE   (32 * 4096)
 
@@ -35,7 +37,10 @@
 
 #define HNS_ROCE_MAX_PORTS 6
 #define HNS_ROCE_MAX_GID_NUM   16
+#define HNS_ROCE_GID_SIZE  16
 
+#define PKEY_ID        0xffff
+#define NODE_DESC_SIZE 64
 #define ADDR_SHIFT_12  12
 #define ADDR_SHIFT_32  32
 #define ADDR_SHIFT_44  44
@@ -106,6 +111,11 @@ struct hns_roce_uar {
u32 index;
 };
 
+struct hns_roce_ucontext {
+   struct ib_ucontext  ibucontext;
+   struct hns_roce_uar uar;
+};
+
 struct hns_roce_bitmap {
/* Bitmap Traversal last a bit which is 1 */
u32  last;
@@ -363,6 +373,17 @@ struct hns_roce_dev {
struct hns_roce_hw  *hw;
 };
 
+static inline struct hns_roce_dev *to_hr_dev(struct ib_device *ib_dev)
+{
+   return container_of(ib_dev, struct hns_roce_dev, ib_dev);
+}
+
+static inline struct hns_roce_ucontext
+   *to_hr_ucontext(struct ib_ucontext *ibucontext)
+{
+   return container_of(ibucontext, struct hns_roce_ucontext, ibucontext);
+}
+
 static inline void hns_roce_write64_k(__be32 val[2], void __iomem *dest)
 {
__raw_writeq(*(u64 *) val, dest);
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 8aa8c55..9c111ec 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -41,6 +41,7 @@
 #include 
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
+#include "hns_roce_user.h"
 #include "hns_roce_icm.h"
 
 /**
@@ -341,6 +342,217 @@ int hns_roce_setup_mtu_gids(struct hns_roce_dev  *hr_dev)
return ret;
 }
 
+static int hns_roce_query_device(struct ib_device *ib_dev,
+struct ib_device_attr *props,
+struct ib_udata *uhw)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
+
+   memset(props, 0, sizeof(*props));
+
+   props->fw_ver = hr_dev->fw_ver;
+   props->sys_image_guid = hr_dev->sys_image_guid;
+   props->max_mr_size = (u64)(~(0ULL));
+   props->page_size_cap = hr_dev->caps.page_size_cap;
+   props->vendor_id = hr_dev->vendor_id;
+   props->vendor_part_id = hr_dev->vendor_part_id;
+   props->hw_ver = hr_dev->hw_rev;
+   props->max_qp = hr_dev->caps.num_qps;
+   props->max_qp_wr = hr_dev->caps.max_wqes;
+   props->device_cap_flags = IB_DEVICE_PORT_ACTIVE_EVENT |
+ IB_DEVICE_RC_RNR_NAK_GEN |
+ IB_DEVICE_LOCAL_DMA_LKEY;
+   props->max_sge = hr_dev->caps.max_sq_sg;
+   props->max_sge_rd = 1;
+   props->max_cq = hr_dev->caps.num_cqs;
+   props->max_cqe = hr_dev->caps.max_cqes;
+   props->max_mr = hr_dev->caps.num_mtpts;
+   props->max_pd = hr_dev->caps.num_pds;
+   props->max_qp_rd_atom = hr_dev->caps.max_qp_dest_rdma;
+   props->max_qp_init_rd_atom = hr_dev->caps.max_qp_init_rdma;
+   props->atomic_cap = IB_ATOMIC_NONE;
+   props->max_pkeys = 1;
+   props->local_ca_ack_delay = hr_dev->caps.local_ca_ack_delay;
+
+   return 0;
+}
+
+static int hns_roce_query_port(struct ib_device *ib_dev, u8 port_num,
+  struct ib_port_attr *props)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
+   struct device *dev = &hr_dev->pdev->dev;
+   struct net_device *net_dev;
+   unsigned long flags;
+   enum ib_mtu mtu;
+   u8 port;
+
+   assert(port_num > 0);
+   port = port_num - 1;
+
+   memset(props, 0, sizeof(*props));
+
+   props->max_mtu = hr_dev->caps.max_mtu;
+   props->gid_tbl_len = hr_dev->caps.gid_table_len[port];
+   props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_REINIT_SUP |
+   IB_PORT_VENDOR_CLASS_SUP |
+

[PATCH v5 08/21] IB/hns: Add icm support

2016-04-23 Thread Lijun Ou

This patch adds ICM support for RoCE. It initializes the ICM, which
manages the memory blocks used by RoCE. The RoCE data structures, for
example the CQ table, QP table and MTPT table, are located in it.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_common.h |  19 ++
 drivers/infiniband/hw/hns/hns_roce_device.h |  30 ++
 drivers/infiniband/hw/hns/hns_roce_eq.c |   3 +-
 drivers/infiniband/hw/hns/hns_roce_icm.c| 436 
 drivers/infiniband/hw/hns/hns_roce_icm.h|  95 ++
 drivers/infiniband/hw/hns/hns_roce_main.c   |  84 ++
 6 files changed, 665 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_icm.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_icm.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 2d083c2..5e181de 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -30,6 +30,22 @@
 #define roce_set_bit(origin, shift, val) \
roce_set_field((origin), (1ul << (shift)), (shift), (val))
 
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S 0
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_M   \
+   (((1UL << 19) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S)
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_S 19
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_MDF_S 20
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_MDF_M   \
+   (((1UL << 2) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_MDF_S)
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_BA_H_S 22
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_BA_H_S)
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_HW_SYNS_S 31
+
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S 0
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_M   \
(((1UL << 2) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S)
@@ -70,6 +86,8 @@
 #define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
 #define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
 
+#define ROCEE_BT_CMD_H_REG 0x204
+
 #define ROCEE_CAEP_AEQE_CONS_IDX_REG   0x3AC
 #define ROCEE_CAEP_CEQC_CONS_IDX_0_REG 0x3BC
 
@@ -82,6 +100,7 @@
 
 #define ROCEE_CAEP_CE_INTERVAL_CFG_REG 0x190
 #define ROCEE_CAEP_CE_BURST_NUM_CFG_REG    0x194
+#define ROCEE_BT_CMD_L_REG 0x200
 
 #define ROCEE_MB1_REG  0x210
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 6835223..decd4fe 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -91,6 +91,26 @@ enum {
HNS_ROCE_CMD_SUCCESS= 1,
 };
 
+struct hns_roce_icm_table {
+   /* ICM type: 0 = qpc 1 = mtt 2 = cqc 3 = srq 4 = other */
+   u32            type;
+   /* ICM array element num */
+   int            num_icm;
+   /* ICM entry record obj total num */
+   int            num_obj;
+   /* Single obj size */
+   int            obj_size;
+   int            lowmem;
+   int            coherent;
+   struct mutex   mutex;
+   struct hns_roce_icm **icm;
+};
+
+struct hns_roce_mr_table {
+   struct hns_roce_icm_table   mtt_table;
+   struct hns_roce_icm_table   mtpt_table;
+};
+
 struct hns_roce_buf_list {
void*buf;
dma_addr_t  map;
@@ -106,11 +126,14 @@ struct hns_roce_cq {
 
 struct hns_roce_qp_table {
spinlock_t  lock;
+   struct hns_roce_icm_table   qp_table;
+   struct hns_roce_icm_table   irrl_table;
 };
 
 struct hns_roce_cq_table {
spinlock_t  lock;
struct radix_tree_root  tree;
+   struct hns_roce_icm_table   table;
 };
 
 struct hns_roce_cmd_context {
@@ -239,6 +262,7 @@ struct hns_roce_hw {
 struct hns_roce_dev {
struct ib_deviceib_dev;
struct platform_device  *pdev;
+   spinlock_t  bt_cmd_lock;
struct hns_roce_ib_iboe iboe;
 
int irq[HNS_ROCE_MAX_IRQ_NUM];
@@ -253,6 +277,7 @@ struct hns_roce_dev {
u32 hw_rev;
 
struct hns_roce_cmdq  cmd;
+   struct hns_roce_mr_table  mr_table;
struct hns_roce_cq_table  cq_table;
struct hns_roce_qp_table  qp_table;
struct hns_roce_eq_table  eq_table;
@@ -262,6 +287,11 @@ struct hns_roce_dev {
struct hns_roce_hw  *hw;
 };
 
+static inline void hns_roce_write64_k(__be32 val[2], void __iomem *dest)
+{
+   __raw_writeq(*(u64 *) val, dest);
+}
+
 static inline struct hns_roce_qp
*__hns_roce_qp_lookup(struct hns_roce_dev *hr_dev, u32 qpn)
 {
diff --git a/drivers/infiniband/hw/hns/hns_roce_eq.c b/drivers/infiniband/hw/hns/hns_roce_eq.c
index d8ca66f..7d848a0 100644
--- a/drivers/infiniband/hw/hns/hns_roc

[PATCH v5 00/21] Add HiSilicon RoCE driver

2016-04-23 Thread Lijun Ou

The HiSilicon Network Subsystem is a long-term evolution IP which is
supposed to be used in HiSilicon ICT SoCs. HNS (HiSilicon Network
Subsystem) also has hardware support for performing RDMA with
RoCEE.
The driver for HiSilicon RoCEE (RoCE Engine) is a platform driver and will
support multiple versions of SoCs in the future. This version of the driver is
meant to support the Hip06 SoC (which conforms to the RoCEEv1 hardware specifications).
 
Changes v4 -> v5:
1. redesign the patchset for the RoCE modules in order to split the huge
patch into small patches.
2. fix the directory path for the RoCE module; delete the hisilicon level.
3. rename roce_v1_hw to roce_hw_v1.

Changes v3 -> v4:
1. rename roce.o to hns-roce.o in the Makefile and Kconfig file.

Changes v2 -> v3:
1. reformat the RoCE driver code based on the v2 review comments. Also,
use kmalloc_array instead of kmalloc and kcalloc instead of kzalloc
for array allocations.
2. remove unused functions and macros.
3. update the binding document for the RoCE DT based on v2, adding interrupt-names.
4. redesign the port_map and sl_map in hns_dsaf_roce_reset.
5. add the HiSilicon RoCE driver maintainers to the MAINTAINERS file.

Changes v1 -> v2:
1. reformat the RoCE driver code based on review comments.
2. update the bindings file for the RoCE DTS; add the attribute named
interrupt-names.
3. modify the way the port mode is defined in hns_dsaf_main.c.
4. move the Kconfig file into the hns directory and send it together with the RoCE driver.

Lijun Ou (21):
  net: hns: Add reset function support for RoCE driver
  devicetree: bindings: IB: Add binding document for HiSilicon RoCE
  IB/hns: Add initial main frame driver and get cfg info
  IB/hns: Add RoCE engine reset function
  IB/hns: Add initial profile resource
  IB/hns: Add initial cmd operation
  IB/hns: Add event queue support
  IB/hns: Add icm support
  IB/hns: Add hca support
  IB/hns: Add process flow to init RoCE engine
  IB/hns: Add IB device registration function
  IB/hns: Set mtu and gid support
  IB/hns: Add interface of the protocol stack registration
  IB/hns: Add operations support for IB device and port
  IB/hns: Add PD operations support
  IB/hns: Add ah operation support
  IB/hns: Add QP operation implemention support
  IB/hns: Add CQ operation implemention support
  IB/hns: Add memory region operation support
  IB/hns: Kconfig and Makefile for RoCE module
  MAINTAINERS: Add maintainers for HiSilicon RoCE driver

 .../bindings/infiniband/hisilicon-hns-roce.txt |  107 +
 MAINTAINERS|8 +
 drivers/infiniband/Kconfig |1 +
 drivers/infiniband/hw/Makefile |1 +
 drivers/infiniband/hw/hns/Kconfig  |   10 +
 drivers/infiniband/hw/hns/Makefile |9 +
 drivers/infiniband/hw/hns/hns_roce_ah.c|  109 +
 drivers/infiniband/hw/hns/hns_roce_alloc.c |  238 ++
 drivers/infiniband/hw/hns/hns_roce_cmd.c   |  324 +++
 drivers/infiniband/hw/hns/hns_roce_cmd.h   |   80 +
 drivers/infiniband/hw/hns/hns_roce_common.h|  302 +++
 drivers/infiniband/hw/hns/hns_roce_cq.c|  435 +++
 drivers/infiniband/hw/hns/hns_roce_device.h|  731 +
 drivers/infiniband/hw/hns/hns_roce_eq.c|  756 ++
 drivers/infiniband/hw/hns/hns_roce_eq.h|   95 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 2811 
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h |  966 +++
 drivers/infiniband/hw/hns/hns_roce_icm.c   |  579 
 drivers/infiniband/hw/hns/hns_roce_icm.h   |  112 +
 drivers/infiniband/hw/hns/hns_roce_main.c  | 1075 
 drivers/infiniband/hw/hns/hns_roce_mr.c|  599 +
 drivers/infiniband/hw/hns/hns_roce_pd.c|  127 +
 drivers/infiniband/hw/hns/hns_roce_qp.c|  835 ++
 drivers/infiniband/hw/hns/hns_roce_user.h  |   27 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c |   89 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |   30 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c |   62 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |   16 +-
 28 files changed, 10520 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
 create mode 100644 drivers/infiniband/hw/hns/Kconfig
 create mode 100644 drivers/infiniband/hw/hns/Makefile
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_ah.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_alloc.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cmd.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cmd.h
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_common.h
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cq.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_device.h
 create mode 100644 drivers/infiniband/h

[PATCH v5 15/21] IB/hns: Add PD operations support

2016-04-23 Thread Lijun Ou

This patch adds the verbs for operating on PDs, mainly the functions
for allocating and deallocating a PD.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_device.h | 17 
 drivers/infiniband/hw/hns/hns_roce_main.c   |  8 +++-
 drivers/infiniband/hw/hns/hns_roce_pd.c | 62 +
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index e871c60..68d78e9 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -116,6 +116,11 @@ struct hns_roce_ucontext {
struct hns_roce_uar uar;
 };
 
+struct hns_roce_pd {
+   struct ib_pd ibpd;
+   u32  pdn;
+};
+
 struct hns_roce_bitmap {
/* Bitmap Traversal last a bit which is 1 */
u32  last;
@@ -384,6 +389,11 @@ static inline struct hns_roce_ucontext
return container_of(ibucontext, struct hns_roce_ucontext, ibucontext);
 }
 
+static inline struct hns_roce_pd *to_hr_pd(struct ib_pd *ibpd)
+{
+   return container_of(ibpd, struct hns_roce_pd, ibpd);
+}
+
 static inline void hns_roce_write64_k(__be32 val[2], void __iomem *dest)
 {
__raw_writeq(*(u64 *) val, dest);
@@ -431,6 +441,13 @@ int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
 void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap, u32 obj,
int cnt);
 
+struct ib_pd *hns_roce_alloc_pd(struct ib_device *ib_dev,
+   struct ib_ucontext *context,
+   struct ib_udata *udata);
+int hns_roce_pd_alloc(struct hns_roce_dev *hr_dev, u32 *pdn);
+void hns_roce_pd_free(struct hns_roce_dev *hr_dev, u32 pdn);
+int hns_roce_dealloc_pd(struct ib_pd *pd);
+
 void hns_roce_cq_completion(struct hns_roce_dev *hr_dev, u32 cqn);
 void hns_roce_cq_event(struct hns_roce_dev *hr_dev, u32 cqn, int event_type);
 void hns_roce_qp_event(struct hns_roce_dev *hr_dev, u32 qpn, int event_type);
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 9c111ec..eeae944 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -585,7 +585,9 @@ int hns_roce_register_device(struct hns_roce_dev *hr_dev)
ib_dev->uverbs_cmd_mask =
(1ULL << IB_USER_VERBS_CMD_GET_CONTEXT) |
(1ULL << IB_USER_VERBS_CMD_QUERY_DEVICE) |
-   (1ULL << IB_USER_VERBS_CMD_QUERY_PORT);
+   (1ULL << IB_USER_VERBS_CMD_QUERY_PORT) |
+   (1ULL << IB_USER_VERBS_CMD_ALLOC_PD) |
+   (1ULL << IB_USER_VERBS_CMD_DEALLOC_PD);
 
/* HCA||device||port */
ib_dev->modify_device   = hns_roce_modify_device;
@@ -599,6 +601,10 @@ int hns_roce_register_device(struct hns_roce_dev *hr_dev)
ib_dev->dealloc_ucontext    = hns_roce_dealloc_ucontext;
ib_dev->mmap    = hns_roce_mmap;
 
+   /* PD */
+   ib_dev->alloc_pd    = hns_roce_alloc_pd;
+   ib_dev->dealloc_pd  = hns_roce_dealloc_pd;
+
ret = ib_register_device(ib_dev, NULL);
if (ret) {
dev_err(dev, "ib_register_device failed!\n");
diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c b/drivers/infiniband/hw/hns/hns_roce_pd.c
index fb0f7c65..57739eb 100644
--- a/drivers/infiniband/hw/hns/hns_roce_pd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
@@ -17,6 +17,28 @@
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
 
+int hns_roce_pd_alloc(struct hns_roce_dev *hr_dev, u32 *pdn)
+{
+   struct device *dev = &hr_dev->pdev->dev;
+   u32 pd_number;
+   int ret = 0;
+
+   ret = hns_roce_bitmap_alloc(&hr_dev->pd_bitmap, &pd_number);
+   if (ret == -1) {
+   dev_err(dev, "alloc pdn from pdbitmap failed\n");
+   return -ENOMEM;
+   }
+
+   *pdn = pd_number;
+
+   return 0;
+}
+
+void hns_roce_pd_free(struct hns_roce_dev *hr_dev, u32 pdn)
+{
+   hns_roce_bitmap_free(&hr_dev->pd_bitmap, pdn);
+}
+
 int hns_roce_init_pd_table(struct hns_roce_dev *hr_dev)
 {
return hns_roce_bitmap_init(&hr_dev->pd_bitmap, hr_dev->caps.num_pds,
@@ -29,6 +51,46 @@ void hns_roce_cleanup_pd_table(struct hns_roce_dev *hr_dev)
hns_roce_bitmap_cleanup(&hr_dev->pd_bitmap);
 }
 
+struct ib_pd *hns_roce_alloc_pd(struct ib_device *ib_dev,
+   struct ib_ucontext *context,
+   struct ib_udata *udata)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
+   struct device *dev = &hr_dev->pdev->dev;
+   struct hns_roce_pd *pd;
+   int ret;
+
+   pd = kmalloc(sizeof(*pd), GFP_KERNEL);
+   if (!pd)
+   return ERR_PTR(-ENOMEM);
+
+   ret = hns_roce_pd_alloc(to_hr_dev(ib_dev), &pd->pdn)

[PATCH v5 01/21] net: hns: Add reset function support for RoCE driver

2016-04-23 Thread Lijun Ou

This patch adds a reset function for the RoCE driver. RoCE is a feature
of HNS. On the Hip06 SoC, the RoCE reset process needs to reset the dsaf
channels and configure the port and SL map info. The RoCE reset function
is located in the dsaf module; the RoCE driver only calls it when needed.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
Signed-off-by: Lisheng 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 89 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h | 30 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 62 ---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  | 16 +++-
 4 files changed, 183 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index 5978a5c..593162b 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -748,9 +749,8 @@ static void hns_dsaf_tbl_stat_en(struct dsaf_device *dsaf_dev)
  */
 static void hns_dsaf_rocee_bp_en(struct dsaf_device *dsaf_dev)
 {
-   if (AE_IS_VER1(dsaf_dev->dsaf_ver))
-   dsaf_set_dev_bit(dsaf_dev, DSAF_XGE_CTRL_SIG_CFG_0_REG,
-DSAF_FC_XGE_TX_PAUSE_S, 1);
+   dsaf_set_dev_bit(dsaf_dev, DSAF_XGE_CTRL_SIG_CFG_0_REG,
+DSAF_FC_XGE_TX_PAUSE_S, 1);
 }
 
 /* set msk for dsaf exception irq*/
@@ -2594,6 +2594,89 @@ static struct platform_driver g_dsaf_driver = {
 
 module_platform_driver(g_dsaf_driver);
 
+/**
+ * hns_dsaf_roce_reset - reset dsaf and roce
+ * @dsaf_fwnode: Pointer to framework node for the dsaf
+ * @val: 0 - request reset, 1 - drop reset
+ * return 0 - success, negative - fail
+ */
+int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, u32 val)
+{
+   struct dsaf_device *dsaf_dev;
+   struct platform_device *pdev;
+   unsigned int mp;
+   unsigned int sl;
+   unsigned int credit;
+   int i;
+   const u32 port_map[DSAF_ROCE_CREDIT_CHN][DSAF_ROCE_CHAN_MODE_NUM] = {
+   {DSAF_ROCE_PORT_0, DSAF_ROCE_PORT_0, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_1, DSAF_ROCE_PORT_0, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_2, DSAF_ROCE_PORT_1, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_3, DSAF_ROCE_PORT_1, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_4, DSAF_ROCE_PORT_2, DSAF_ROCE_PORT_1},
+   {DSAF_ROCE_PORT_4, DSAF_ROCE_PORT_2, DSAF_ROCE_PORT_1},
+   {DSAF_ROCE_PORT_5, DSAF_ROCE_PORT_3, DSAF_ROCE_PORT_1},
+   {DSAF_ROCE_PORT_5, DSAF_ROCE_PORT_3, DSAF_ROCE_PORT_1},
+   };
+   const u32 sl_map[DSAF_ROCE_CREDIT_CHN][DSAF_ROCE_CHAN_MODE_NUM] = {
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_0},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_1, DSAF_ROCE_SL_1},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_2},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_1, DSAF_ROCE_SL_3},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_0},
+   {DSAF_ROCE_SL_1, DSAF_ROCE_SL_1, DSAF_ROCE_SL_1},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_2},
+   {DSAF_ROCE_SL_1, DSAF_ROCE_SL_1, DSAF_ROCE_SL_3},
+   };
+
+   if (!is_of_node(dsaf_fwnode)) {
+   pr_err("hisi_dsaf: Only support DT node!\n");
+   return -EINVAL;
+   }
+   pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));
+   dsaf_dev = dev_get_drvdata(&pdev->dev);
+   if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
+   dev_err(dsaf_dev->dev, "%s v1 chip does not support roce!\n",
+   dsaf_dev->ae_dev.name);
+   return -ENODEV;
+   }
+
+   if (!val) {
+   /* Reset rocee-channels in dsaf and rocee */
+   hns_dsaf_srst_chns(dsaf_dev, DSAF_CHNS_MASK, 0);
+   hns_dsaf_roce_srst(dsaf_dev, 0);
+   } else {
+   /* Configure dsaf tx roce according to port map and sl map */
+   mp = dsaf_read_dev(dsaf_dev, DSAF_ROCE_PORT_MAP_REG);
+   for (i = 0; i < DSAF_ROCE_CREDIT_CHN; i++)
+   dsaf_set_field(mp, 7 << i * 3, i * 3,
+  port_map[i][DSAF_ROCE_6PORT_MODE]);
+   dsaf_set_field(mp, 3 << i * 3, i * 3, 0);
+   dsaf_write_dev(dsaf_dev, DSAF_ROCE_PORT_MAP_REG, mp);
+
+   sl = dsaf_read_dev(dsaf_dev, DSAF_ROCE_SL_MAP_REG);
+   for (i = 0; i < DSAF_ROCE_CREDIT_CHN; i++)
+   dsaf_set_field(sl, 3 << i * 2, i * 2,
+  sl_map[i][DSAF_ROCE_6PORT_MODE]);
+   dsaf_write_dev(dsaf_dev, DSAF_ROCE_SL_MAP_REG, sl);
+
+   /* De-reset rocee-channels in dsaf and rocee */
+   hns_dsaf_srst_chns(dsaf_dev, DSAF_CHNS_MASK, 1);
+

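For readers following the port-map and SL-map loops in hns_dsaf_roce_reset(): each credit channel i owns a fixed-width slot in one register word (3 bits at bit i * 3 for the port map, 2 bits at bit i * 2 for the SL map). The user-space sketch below assumes dsaf_set_field() clears the mask and then ORs in the shifted value; set_field(), pack_channels() and get_channel() are hypothetical names, not driver functions.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for dsaf_set_field() (assumed semantics):
 * clear the bits in `mask`, then OR in `val` shifted into place.
 */
uint32_t set_field(uint32_t origin, uint32_t mask, unsigned shift,
		   uint32_t val)
{
	origin &= ~mask;
	origin |= (val << shift) & mask;
	return origin;
}

/*
 * Pack one `width`-bit value per channel into `reg`; channel i occupies
 * bits [i * width, i * width + width - 1]. Bits outside those slots are
 * left untouched.
 */
uint32_t pack_channels(uint32_t reg, const uint32_t *vals, unsigned n,
		       unsigned width)
{
	uint32_t field_mask = (1u << width) - 1;
	unsigned i;

	for (i = 0; i < n; i++)
		reg = set_field(reg, field_mask << (i * width), i * width,
				vals[i]);
	return reg;
}

/* Read channel i's field back out of a packed word. */
uint32_t get_channel(uint32_t reg, unsigned i, unsigned width)
{
	return (reg >> (i * width)) & ((1u << width) - 1);
}
```

Because each call only clears its own slot, packing eight 3-bit channels into a word whose top byte is already set leaves that byte intact.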
[patch] tracing: checking for NULL instead of IS_ERR()

2016-04-23 Thread Dan Carpenter

tracing_map_elt_alloc() returns ERR_PTRs on error, never NULL.

Fixes: 08d43a5fa063 ('tracing: Add lock-free tracing_map')
Signed-off-by: Dan Carpenter 

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index e0f1729..e7dfc5e 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -814,7 +814,7 @@ static struct tracing_map_elt *copy_elt(struct tracing_map_elt *elt)
unsigned int i;
 
dup_elt = tracing_map_elt_alloc(elt->map);
-   if (!dup_elt)
+   if (IS_ERR(dup_elt))
return NULL;
 
if (elt->map->ops && elt->map->ops->elt_copy)

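The fix matters because an ERR_PTR-style allocator never returns NULL on failure, so a `!ptr` check silently accepts an encoded error as a valid element. A simplified user-space illustration of the encoding, assuming the same idea as include/linux/err.h (error codes stored in the last MAX_ERRNO addresses); err_ptr(), is_err(), ptr_err() and alloc_elt() are hypothetical stand-ins, not the kernel macros.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Errors occupy the top MAX_ERRNO values of the address space. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)error; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical allocator that, like tracing_map_elt_alloc(), reports
 * failure with an encoded error pointer rather than NULL.
 */
static int dummy_elt;

void *alloc_elt(int fail)
{
	return fail ? err_ptr(-ENOMEM) : &dummy_elt;
}
```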
[PATCH v5 10/21] IB/hns: Add process flow to init RoCE engine

2016-04-23 Thread Lijun Ou

This patch initializes the RoCE engine, which is required for the RoCE
function to work. It configures the DMAE user, initializes the doorbell
and RAQ operations, and enables the ports.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_common.h | 107 +++
 drivers/infiniband/hw/hns/hns_roce_device.h |  15 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 477 
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |  66 
 drivers/infiniband/hw/hns/hns_roce_main.c   |  20 ++
 5 files changed, 685 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 5e181de..fbe6a68 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -30,6 +30,93 @@
 #define roce_set_bit(origin, shift, val) \
roce_set_field((origin), (1ul << (shift)), (shift), (val))
 
+#define ROCEE_GLB_CFG_ROCEE_DB_SQ_MODE_S 3
+#define ROCEE_GLB_CFG_ROCEE_DB_OTH_MODE_S 4
+
+#define ROCEE_GLB_CFG_SQ_EXT_DB_MODE_S 5
+
+#define ROCEE_GLB_CFG_OTH_EXT_DB_MODE_S 6
+
+#define ROCEE_GLB_CFG_ROCEE_PORT_ST_S 10
+#define ROCEE_GLB_CFG_ROCEE_PORT_ST_M  \
+   (((1UL << 6) - 1) << ROCEE_GLB_CFG_ROCEE_PORT_ST_S)
+
+#define ROCEE_GLB_CFG_TRP_RAQ_DROP_EN_S 16
+
+#define ROCEE_DMAE_USER_CFG1_ROCEE_STREAM_ID_TB_CFG_S 0
+#define ROCEE_DMAE_USER_CFG1_ROCEE_STREAM_ID_TB_CFG_M  \
+   (((1UL << 24) - 1) << ROCEE_DMAE_USER_CFG1_ROCEE_STREAM_ID_TB_CFG_S)
+
+#define ROCEE_DMAE_USER_CFG1_ROCEE_CACHE_TB_CFG_S 24
+#define ROCEE_DMAE_USER_CFG1_ROCEE_CACHE_TB_CFG_M  \
+   (((1UL << 4) - 1) << ROCEE_DMAE_USER_CFG1_ROCEE_CACHE_TB_CFG_S)
+
+#define ROCEE_DMAE_USER_CFG2_ROCEE_STREAM_ID_PKT_CFG_S 0
+#define ROCEE_DMAE_USER_CFG2_ROCEE_STREAM_ID_PKT_CFG_M   \
+   (((1UL << 24) - 1) << ROCEE_DMAE_USER_CFG2_ROCEE_STREAM_ID_PKT_CFG_S)
+
+#define ROCEE_DMAE_USER_CFG2_ROCEE_CACHE_PKT_CFG_S 24
+#define ROCEE_DMAE_USER_CFG2_ROCEE_CACHE_PKT_CFG_M   \
+   (((1UL << 4) - 1) << ROCEE_DMAE_USER_CFG2_ROCEE_CACHE_PKT_CFG_S)
+
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_S 0
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_S)
+
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_EMPTY_S 16
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_EMPTY_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_EMPTY_S)
+
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_S 0
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_S)
+
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_EMPTY_S 16
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_EMPTY_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_EMPTY_S)
+
+#define ROCEE_RAQ_WL_ROCEE_RAQ_WL_S 0
+#define ROCEE_RAQ_WL_ROCEE_RAQ_WL_M   \
+   (((1UL << 8) - 1) << ROCEE_RAQ_WL_ROCEE_RAQ_WL_S)
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_POL_TIME_INTERVAL_S 0
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_POL_TIME_INTERVAL_M   \
+   (((1UL << 15) - 1) << \
+   ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_POL_TIME_INTERVAL_S)
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_CFG_S 16
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_CFG_M   \
+   (((1UL << 4) - 1) << \
+   ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_CFG_S)
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_EN_S 20
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_EXT_RAQ_MODE 21
+
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_SHIFT_S 0
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_SHIFT_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_SHIFT_S)
+
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_BA_H_S 5
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_BA_H_S)
+
+#define ROCEE_EXT_DB_OTH_H_EXT_DB_OTH_SHIFT_S 0
+#define ROCEE_EXT_DB_OTH_H_EXT_DB_OTH_SHIFT_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_OTH_H_EXT_DB_OTH_SHIFT_S)
+
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_OTH_BA_H_S 5
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_OTH_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_SQ_H_EXT_DB_OTH_BA_H_S)
+
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_SHIFT_S 0
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_SHIFT_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_RAQ_H_EXT_RAQ_SHIFT_S)
+
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_BA_H_S 8
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_RAQ_H_EXT_RAQ_BA_H_S)
+
 #define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S 0
 #define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_M   \
(((1UL << 19) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S)
@@ -97,6 +184,26 @@
 #define ROCEE_ECC_CERR_ALM2_REG	0xB48
 
 #define ROCEE_ACK_DELAY_REG	0x14
+#define ROCEE_GLB_CFG_REG  0x18
+
+#define ROCEE_DMAE_USER_CFG1_REG   0x40
+#define ROCEE_DMAE_USER_CFG2_REG   0x44
+
+#define ROCEE_DB_SQ_WL_REG 0x154
+#define ROCEE_DB_O

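Each `_M` macro in this hunk is a contiguous mask of n bits starting at the matching `_S` shift, built as `((1UL << n) - 1) << S`. A minimal sketch of how such a mask/shift pair is used, assuming field_set()/field_get() behave like roce_set_field() from the hunk context (clear the masked bits, then insert the shifted value); both helper names are hypothetical.

```c
#include <stdint.h>

/* Field layout copied from the patch: 6-bit port state at bit 10. */
#define ROCEE_GLB_CFG_ROCEE_PORT_ST_S 10
#define ROCEE_GLB_CFG_ROCEE_PORT_ST_M \
	(((1UL << 6) - 1) << ROCEE_GLB_CFG_ROCEE_PORT_ST_S)

/* Hypothetical setter: clear the field, then insert `val` at `shift`. */
static inline uint32_t field_set(uint32_t origin, unsigned long mask,
				 unsigned shift, uint32_t val)
{
	origin &= ~(uint32_t)mask;
	origin |= (uint32_t)(((unsigned long)val << shift) & mask);
	return origin;
}

/* Hypothetical getter: isolate the field and shift it down to bit 0. */
static inline uint32_t field_get(uint32_t origin, unsigned long mask,
				 unsigned shift)
{
	return (uint32_t)((origin & mask) >> shift);
}
```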
[PATCH v5 19/21] IB/hns: Add memory region operation support

2016-04-23 Thread Lijun Ou

This patch implements memory region (MR) support.
Memory Registration provides mechanisms that allow Consumers
to describe a set of virtually contiguous memory locations or
a set of physically contiguous memory locations.
The MR operations are as follows:
1. get a DMA MR in kernel mode
2. get an MR in user mode
3. deregister an MR
In addition, the locations of some functions were adjusted
in some files.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.h|   9 +
 drivers/infiniband/hw/hns/hns_roce_device.h |  46 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 156 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 103 
 drivers/infiniband/hw/hns/hns_roce_icm.c|   2 +-
 drivers/infiniband/hw/hns/hns_roce_icm.h|   1 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |   7 +
 drivers/infiniband/hw/hns/hns_roce_mr.c | 252 
 drivers/infiniband/hw/hns/hns_roce_qp.c |   1 +
 9 files changed, 575 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h b/drivers/infiniband/hw/hns/hns_roce_cmd.h
index 9583546..4de4cfc 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.h
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h
@@ -13,6 +13,14 @@
 #include 
 
 enum {
+   /* TPT commands */
+   HNS_ROCE_CMD_SW2HW_MPT  = 0xd,
+   HNS_ROCE_CMD_HW2SW_MPT  = 0xf,
+
+   /* CQ commands */
+   HNS_ROCE_CMD_SW2HW_CQ   = 0x16,
+   HNS_ROCE_CMD_HW2SW_CQ   = 0x17,
+
/* QP/EE commands */
HNS_ROCE_CMD_RST2INIT_QP= 0x19,
HNS_ROCE_CMD_INIT2RTR_QP= 0x1a,
@@ -28,6 +36,7 @@ enum {
 
 enum {
HNS_ROCE_CMD_TIME_CLASS_A   = 1,
+   HNS_ROCE_CMD_TIME_CLASS_B   = 1,
HNS_ROCE_CMD_TIME_CLASS_C   = 1,
 };
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 21698c5..38d1099 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -34,8 +34,11 @@
 #define HNS_ROCE_MIN_CQE_NUM   0x40
 #define HNS_ROCE_MIN_WQE_NUM   0x20
 
+/* Hardware specification only for v1 engine */
+#define HNS_ROCE_MAX_INNER_MTPT_NUM	0x7
+#define HNS_ROCE_MAX_MTPT_PBL_NUM  0x10
+
 #define HNS_ROCE_MAX_IRQ_NUM   34
-#define HNS_ROCE_MAX_PORTS 6
 
 #define HNS_ROCE_COMP_VEC_NUM  32
 
@@ -51,13 +54,25 @@
 #define HNS_ROCE_MAX_GID_NUM   16
 #define HNS_ROCE_GID_SIZE  16
 
+#define MR_TYPE_MR 0x00
+#define MR_TYPE_DMA	0x03
+
 #define PKEY_ID0x
 #define NODE_DESC_SIZE 64
+
+#define SERV_TYPE_RC   0
+#define SERV_TYPE_RD   1
+#define SERV_TYPE_UC   2
+#define SERV_TYPE_UD   3
+
 #define ADDR_SHIFT_12  12
 #define ADDR_SHIFT_32  32
 #define ADDR_SHIFT_44  44
 
+#define PAGES_SHIFT_8  8
 #define PAGES_SHIFT_16 16
+#define PAGES_SHIFT_24 24
+#define PAGES_SHIFT_32 32
 
 enum hns_roce_qp_state {
HNS_ROCE_QP_STATE_RST= 0,
@@ -201,6 +216,23 @@ struct hns_roce_mtt {
intpage_shift;
 };
 
+/* Only 4K page size is supported for MR registration */
+#define MR_SIZE_4K 0
+
+struct hns_roce_mr {
+   struct ib_mr	ibmr;
+   struct ib_umem  *umem;
+   u64 iova; /* MR's original virtual addr */
+   u64 size; /* Address range of MR */
+   u32 key; /* Key of MR */
+   u32 pd;   /* PD num of MR */
+   u32 access;/* Access permission of MR */
+   int enabled; /* MR's active status */
+   int type;   /* MR's register type */
+   u64 *pbl_buf;/* MR's PBL space */
+   dma_addr_t  pbl_dma_addr;   /* MR's PBL space PA */
+};
+
 struct hns_roce_mr_table {
struct hns_roce_bitmap  mtpt_bitmap;
struct hns_roce_buddy   mtt_buddy;
@@ -471,6 +503,7 @@ struct hns_roce_hw {
void (*set_mac)(struct hns_roce_dev *hr_dev, u8 phy_port, u8 *addr);
void (*set_mtu)(struct hns_roce_dev *hr_dev, u8 phy_port,
enum ib_mtu mtu);
+   int (*write_mtpt)(void *mb_buf, struct hns_roce_mr *mr, int mtpt_idx);
void (*write_cqc)(struct hns_roce_dev *hr_dev,
  struct hns_roce_cq *hr_cq, void *mb_buf, u64 *mtts,
  dma_addr_

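With only 4K pages supported for MR registration (MR_SIZE_4K), the number of page-list (PBL) entries an MR needs depends on how many 4K pages its [iova, iova + size) range touches, counting a partial first and last page. mr_num_4k_pages() is a hypothetical helper illustrating that arithmetic, not code from this patch.

```c
#include <stdint.h>

/* Only 4K pages are supported for MR registration (MR_SIZE_4K). */
#define MR_PAGE_SHIFT 12

/*
 * Hypothetical helper: number of 4K pages touched by the byte range
 * [iova, iova + size). A zero-length range touches no pages.
 */
uint64_t mr_num_4k_pages(uint64_t iova, uint64_t size)
{
	uint64_t first, last;

	if (size == 0)
		return 0;
	first = iova >> MR_PAGE_SHIFT;
	last = (iova + size - 1) >> MR_PAGE_SHIFT;
	return last - first + 1;
}
```

For example, two bytes straddling a page boundary at 0x1fff need two entries, while a page-aligned 4K range needs one.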
[PATCH v5 17/21] IB/hns: Add QP operation implemention support

2016-04-23 Thread Lijun Ou

This patch implements queue pair (QP) operations. A QP consists
of a Send Work Queue and a Receive Work Queue. Send and receive
queues are always created as a pair and remain that way throughout
their lifetime. A Queue Pair is identified by its Queue Pair Number.
The QP operations are implemented as follows:
1. create QP. When a QP is created, a complete set of initial
   attributes must be specified by the Consumer.
2. query QP. Returns the attribute list and current values for
   the specified QP.
3. modify QP. Modifies the attributes of the specified QP.
4. destroy QP. When a QP is destroyed, any outstanding Work Requests
   are no longer considered to be in the scope of the Channel Interface.
   It is the responsibility of the Consumer to be able to clean up
   any resources.
5. post send request. Builds one or more WQEs for the Send Queue in
   the specified QP.
6. post receive request. Builds one or more WQEs for the Receive Queue in
   the specified QP.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_alloc.c  |  134 +++
 drivers/infiniband/hw/hns/hns_roce_cmd.c|  236 +++-
 drivers/infiniband/hw/hns/hns_roce_cmd.h|   54 +-
 drivers/infiniband/hw/hns/hns_roce_common.h |   58 +
 drivers/infiniband/hw/hns/hns_roce_device.h |  168 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 1642 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |  626 ++
 drivers/infiniband/hw/hns/hns_roce_icm.c|   55 +
 drivers/infiniband/hw/hns/hns_roce_icm.h|   11 +-
 drivers/infiniband/hw/hns/hns_roce_main.c   |   14 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c |  160 +++
 drivers/infiniband/hw/hns/hns_roce_qp.c |  765 +
 drivers/infiniband/hw/hns/hns_roce_user.h   |6 +
 13 files changed, 3912 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index 0c76f1b..5bcffe6 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -47,6 +47,45 @@ void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, u32 obj)
hns_roce_bitmap_free_range(bitmap, obj, 1);
 }
 
+int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
+   int align, u32 *obj)
+{
+   int ret = 0;
+   int i;
+
+   if (likely(cnt == 1 && align == 1))
+   return hns_roce_bitmap_alloc(bitmap, obj);
+
+   spin_lock(&bitmap->lock);
+
+   *obj = bitmap_find_next_zero_area(bitmap->table, bitmap->max,
+ bitmap->last, cnt, align - 1);
+   if (*obj >= bitmap->max) {
+   bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
+  & bitmap->mask;
+   *obj = bitmap_find_next_zero_area(bitmap->table, bitmap->max, 0,
+ cnt, align - 1);
+   }
+
+   if (*obj < bitmap->max) {
+   for (i = 0; i < cnt; i++)
+   set_bit(*obj + i, bitmap->table);
+
+   if (*obj == bitmap->last) {
+   bitmap->last = (*obj + cnt);
+   if (bitmap->last >= bitmap->max)
+   bitmap->last = 0;
+   }
+   *obj |= bitmap->top;
+   } else {
+   ret = -1;
+   }
+
+   spin_unlock(&bitmap->lock);
+
+   return ret;
+}
+
 void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap, u32 obj,
int cnt)
 {
@@ -94,6 +133,101 @@ void hns_roce_bitmap_cleanup(struct hns_roce_bitmap *bitmap)
kfree(bitmap->table);
 }
 
+void hns_roce_buf_free(struct hns_roce_dev *hr_dev, u32 size,
+  struct hns_roce_buf *buf)
+{
+   int i;
+   struct device *dev = &hr_dev->pdev->dev;
+   u32 bits_per_long = BITS_PER_LONG;
+
+   if (buf->nbufs == 1) {
+   dma_free_coherent(dev, size, buf->direct.buf, buf->direct.map);
+   } else {
+   if (bits_per_long == 64)
+   vunmap(buf->direct.buf);
+
+   for (i = 0; i < buf->nbufs; ++i)
+   if (buf->page_list[i].buf)
+   dma_free_coherent(&hr_dev->pdev->dev, PAGE_SIZE,
+ buf->page_list[i].buf,
+ buf->page_list[i].map);
+   kfree(buf->page_list);
+   }
+}
+
+int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 size, u32 max_direct,
+  struct hns_roce_buf *buf)
+{
+   int i = 0;
+   dma_addr_t t;
+   struct page **pages;
+   struct device *dev = &hr_dev->pdev->dev;
+   u32 bits_per_long = BITS_PER_LONG;
+
+   /* SQ/RQ buf less than one page, SQ + RQ = 8K */
+   if (size <= max_direct

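hns_roce_bitmap_alloc_range() relies on bitmap_find_next_zero_area() to find `cnt` consecutive free bits starting at an index aligned to `align` (passed as the mask `align - 1`), retrying from 0 when the search from `bitmap->last` fails. A simplified model of that aligned search over a bool array; find_zero_area() is a hypothetical stand-in for the kernel helper, and returning `size` on failure fits the `*obj >= bitmap->max` check in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Find the first index >= start, aligned to (align_mask + 1), at which
 * `nr` consecutive entries are free. Returns `size` if there is none.
 */
size_t find_zero_area(const bool *used, size_t size, size_t start,
		      size_t nr, size_t align_mask)
{
	size_t idx = (start + align_mask) & ~align_mask;

	while (idx + nr <= size) {
		size_t i;

		/* count free entries from idx until the first busy one */
		for (i = 0; i < nr && !used[idx + i]; i++)
			;
		if (i == nr)
			return idx;
		/* skip past the busy entry, then round up to alignment */
		idx = (idx + i + 1 + align_mask) & ~align_mask;
	}
	return size;
}
```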
[PATCH v5 03/21] IB/hns: Add initial main frame driver and get cfg info

2016-04-23 Thread Lijun Ou

This patch adds the initial bare main driver. It can get the
related configuration information of the net node.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  49 
 drivers/infiniband/hw/hns/hns_roce_main.c   | 182 
 2 files changed, 231 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_device.h
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_main.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
new file mode 100644
index 000..b48f518
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _HNS_ROCE_DEVICE_H
+#define _HNS_ROCE_DEVICE_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRV_NAME "hns_roce"
+
+#define HNS_ROCE_MAX_IRQ_NUM   34
+#define HNS_ROCE_MAX_PORTS 6
+
+struct hns_roce_ib_iboe {
+   struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
+   u8  phy_port[HNS_ROCE_MAX_PORTS];
+};
+
+struct hns_roce_caps {
+   u8  num_ports;
+};
+
+struct hns_roce_dev {
+   struct ib_device	ib_dev;
+   struct platform_device  *pdev;
+   struct hns_roce_ib_iboe iboe;
+
+   int irq[HNS_ROCE_MAX_IRQ_NUM];
+   u8 __iomem  *reg_base;
+   struct hns_roce_caps	caps;
+
+   int cmd_mod;
+   int loop_idc;
+};
+
+#endif /* _HNS_ROCE_DEVICE_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
new file mode 100644
index 000..5bd84f2
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * Authors: Wei Hu 
+ * Authors: Znlong 
+ * Authors: oulijun 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_device.h"
+
+int hns_roce_get_cfg(struct hns_roce_dev *hr_dev)
+{
+   int i;
+   u8 phy_port;
+   int port_cnt = 0;
+   struct device *dev = &hr_dev->pdev->dev;
+   struct device_node *np = dev->of_node;
+   struct device_node *net_node;
+   struct net_device *netdev = NULL;
+   struct platform_device *pdev = NULL;
+   struct resource *res;
+
+   if (!of_device_is_compatible(np, "hisilicon,hns-roce-v1")) {
+   dev_err(dev, "device no compatible!\n");
+   return -EINVAL;
+   }
+
+   res = platform_get_resource(hr_dev->pdev, IORESOURCE_MEM, 0);
+   hr_dev->reg_base = devm_ioremap_resource(dev, res);
+   if (IS_ERR(hr_dev->reg_base)) {
+   dev_err(dev, "devm_ioremap_resource failed!\n");
+   return PTR_ERR(hr_dev->reg_base);
+   }
+
+   for (i = 0; i < HNS_ROCE_MAX_PORTS; i++) {
+   net_node = of_parse_phandle(np, "eth-handle", i);
+   if (net_node) {
+   pdev = of_find_device_by_node(net_node);
+   netdev = platform_get_drvdata(pdev);
+   phy_port = (u8)i;
+   if (netdev) {
+   hr_dev->iboe.netdevs[port_cnt] = netdev;
+   hr_dev->iboe.phy_port[port_cnt] = phy_port;
+   } else {
+   return -ENODEV;
+   }
+   port_cnt++;
+   }
+   }
+
+   hr_dev->caps.num_ports = port_cnt;
+
+   /* Cmd issue mode: 0 is poll, 1 is event */
+   hr_dev->cmd_mod = 1;
+   hr_dev->loop_idc = 0;
+
+   for (i = 0; i < HNS_ROCE_MAX_IRQ_NUM; i++) {
+   hr_dev->irq[i] = platform_get_irq(hr_dev->pdev, i);
+   if (hr_dev->irq[i] <= 0) {
+   dev_err(dev, "Get No.%d irq resource failed!\n", i);
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
+/**
+* hns_roce_probe - RoCE driver entrance
+* @pdev: pointer to platform device
+* Return : int
+*
+*/
+static int hns_roce_probe(struct pla

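In hns_roce_get_cfg(), the index i of each present eth-handle phandle becomes the physical port number, while port_cnt packs the present ports densely into iboe.netdevs[] and iboe.phy_port[]. A minimal model of that compaction, assuming a NULL slot means there is no phandle at that index; it omits the -ENODEV path taken when a node exists but has no net device, and compact_ports() and struct port_table are hypothetical.

```c
#include <stddef.h>

#define MAX_PORTS 6	/* mirrors HNS_ROCE_MAX_PORTS */

struct port_table {
	const void *netdevs[MAX_PORTS];	/* dense list of present ports */
	unsigned char phy_port[MAX_PORTS];	/* original phandle index */
	int num_ports;
};

/*
 * slots[i] stands for the net device found via phandle index i, or NULL
 * when that index has no phandle.
 */
void compact_ports(const void *slots[MAX_PORTS], struct port_table *t)
{
	int i;

	t->num_ports = 0;
	for (i = 0; i < MAX_PORTS; i++) {
		if (!slots[i])
			continue;
		t->netdevs[t->num_ports] = slots[i];
		t->phy_port[t->num_ports] = (unsigned char)i;
		t->num_ports++;
	}
}
```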
[PATCH v5 04/21] IB/hns: Add RoCE engine reset function

2016-04-23 Thread Lijun Ou

This patch adds the RoCE engine reset process to the RoCE driver.
It is needed when the RoCE driver is loaded and removed.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  7 
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 54 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 17 +
 drivers/infiniband/hw/hns/hns_roce_main.c   | 18 --
 4 files changed, 94 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_hw_v1.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_hw_v1.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index b48f518..24ac1a8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -33,6 +33,10 @@ struct hns_roce_caps {
u8  num_ports;
 };
 
+struct hns_roce_hw {
+   int (*reset)(struct hns_roce_dev *hr_dev, u32 val);
+};
+
 struct hns_roce_dev {
struct ib_device	ib_dev;
struct platform_device  *pdev;
@@ -44,6 +48,9 @@ struct hns_roce_dev {
 
int cmd_mod;
int loop_idc;
+   struct hns_roce_hw  *hw;
 };
 
+extern struct hns_roce_hw hns_roce_hw_v1;
+
 #endif /* _HNS_ROCE_DEVICE_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
new file mode 100644
index 000..ea39e56
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * Authors: Wei Hu 
+ * Authors: Znlong 
+ * Authors: oulijun 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_device.h"
+#include "hns_roce_hw_v1.h"
+
+/**
+ * hns_roce_v1_reset - reset roce
+ * @hr_dev: roce device struct pointer
+ * @val: 1 - drop reset, 0 - reset
+ * return 0 - success, negative - fail
+ */
+int hns_roce_v1_reset(struct hns_roce_dev *hr_dev, u32 val)
+{
+   struct device_node *dsaf_node;
+   struct device *dev = &hr_dev->pdev->dev;
+   struct device_node *np = dev->of_node;
+   int ret;
+
+   dsaf_node = of_parse_phandle(np, "dsaf-handle", 0);
+
+   if (!val) {
+   ret = hns_dsaf_roce_reset(&dsaf_node->fwnode, 0);
+   } else {
+   ret = hns_dsaf_roce_reset(&dsaf_node->fwnode, 0);
+   if (ret)
+   return ret;
+
+   msleep(SLEEP_TIME_INTERVAL);
+   ret = hns_dsaf_roce_reset(&dsaf_node->fwnode, 1);
+   }
+
+   return ret;
+}
+
+struct hns_roce_hw hns_roce_hw_v1 = {
+   .reset = hns_roce_v1_reset,
+};
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
new file mode 100644
index 000..164041d
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _HNS_ROCE_HW_V1_H
+#define _HNS_ROCE_HW_V1_H
+
+#define SLEEP_TIME_INTERVAL	20
+
+extern int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, u32 val);
+
+#endif
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 5bd84f2..df3116f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -53,7 +53,9 @@ int hns_roce_get_cfg(struct hns_roce_dev *hr_dev)
struct platform_device *pdev = NULL;
struct resource *res;
 
-   if (!of_device_is_compatible(np, "hisilicon,hns-roce-v1")) {
+   if (of_device_is_compatible(np, "hisilicon,hns-roce-v1")) {
+   hr_dev->hw = &hns_roce_hw_v1;
+   } else {
dev_err(dev, "device no compatible!\n");
return -EINVAL;
}
@@ -98,6 +100,10 @@ int hns_roce_get_cfg(struct hns_roce_dev *hr_dev)
return 0;
 }
 
+int hns_roce_engine_reset(struct hns_roce_dev *hr_dev, u32 val)
+{
+   return hr_dev->hw->reset(hr_dev, val);
+}
 /**
 * hns_roce_probe - RoCE driver entrance
 * @pdev: pointer to platform device
@@ -131,13 +137,18 @@ static int hns_roce_probe(struct platform_device *pdev)
ret = -EIO;
goto error_failed_get_cfg;
}
-
ret = hns_roce_get_cfg(hr_dev);
if (ret) {
dev_err(dev, "Get Configuration failed!\n");
g

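The reset path dispatches through the hns_roce_hw ops table: hns_roce_engine_reset() calls hr_dev->hw->reset(), and hns_roce_v1_reset() either only asserts the DSAF reset (val == 0) or asserts it, waits SLEEP_TIME_INTERVAL ms, and then releases it (val == 1). A user-space model of that sequence, with a hypothetical fake_dsaf_reset() that records its arguments instead of touching hardware; the sleep is omitted.

```c
#include <stddef.h>

struct fake_dev;

/* Per-hardware ops table, in the spirit of struct hns_roce_hw. */
struct hw_ops {
	int (*reset)(struct fake_dev *dev, unsigned int val);
};

struct fake_dev {
	const struct hw_ops *hw;
	unsigned int calls[4];	/* values passed to fake_dsaf_reset() */
	size_t ncalls;
};

/* Hypothetical DSAF-side reset: records each requested value. */
static int fake_dsaf_reset(struct fake_dev *dev, unsigned int val)
{
	if (dev->ncalls >= 4)
		return -1;
	dev->calls[dev->ncalls++] = val;
	return 0;
}

/* val == 0: hold the engine in reset; val == 1: reset, then release. */
static int v1_reset(struct fake_dev *dev, unsigned int val)
{
	int ret = fake_dsaf_reset(dev, 0);

	if (ret || !val)
		return ret;
	/* the driver sleeps SLEEP_TIME_INTERVAL ms here */
	return fake_dsaf_reset(dev, 1);
}

const struct hw_ops v1_ops = { .reset = v1_reset };

/* Dispatch through the ops table, as hns_roce_engine_reset() does. */
int engine_reset(struct fake_dev *dev, unsigned int val)
{
	return dev->hw->reset(dev, val);
}
```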
[PATCH v5 11/21] IB/hns: Add IB device registration function

2016-04-23 Thread Lijun Ou

This patch registers the IB device, and unregisters it when the driver is removed.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_main.c | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 4c42a24..f98ba23 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -43,6 +43,42 @@
 #include "hns_roce_device.h"
 #include "hns_roce_icm.h"
 
+void hns_roce_unregister_device(struct hns_roce_dev *hr_dev)
+{
+   ib_unregister_device(&hr_dev->ib_dev);
+}
+
+int hns_roce_register_device(struct hns_roce_dev *hr_dev)
+{
+   int ret;
+   struct hns_roce_ib_iboe *iboe = NULL;
+   struct ib_device *ib_dev = NULL;
+   struct device *dev = &hr_dev->pdev->dev;
+
+   iboe = &hr_dev->iboe;
+
+   ib_dev = &hr_dev->ib_dev;
+   strlcpy(ib_dev->name, "hisi_%d", IB_DEVICE_NAME_MAX);
+
+   ib_dev->owner   = THIS_MODULE;
+   ib_dev->node_type   = RDMA_NODE_IB_CA;
+   ib_dev->dma_device  = dev;
+
+   ib_dev->phys_port_cnt   = hr_dev->caps.num_ports;
+   ib_dev->local_dma_lkey  = hr_dev->caps.reserved_lkey;
+   ib_dev->num_comp_vectors= hr_dev->caps.num_comp_vectors;
+   ib_dev->uverbs_abi_ver  = 1;
+
+   ret = ib_register_device(ib_dev, NULL);
+   if (ret) {
+   dev_err(dev, "ib_register_device failed!\n");
+   return ret;
+   }
+
+   return 0;
+}
+
+
 int hns_roce_get_cfg(struct hns_roce_dev *hr_dev)
 {
int i;
@@ -347,6 +383,17 @@ static int hns_roce_probe(struct platform_device *pdev)
goto error_failed_engine_init;
}
 
+   ret = hns_roce_register_device(hr_dev);
+   if (ret) {
+   dev_err(dev, "register_device failed!\n");
+   goto error_failed_register_device;
+   }
+
+   return 0;
+
+error_failed_register_device:
+   hns_roce_engine_uninit(hr_dev);
+
 error_failed_engine_init:
hns_roce_cleanup_bitmap(hr_dev);
 
@@ -383,6 +430,7 @@ static int hns_roce_remove(struct platform_device *pdev)
struct hns_roce_dev *hr_dev = platform_get_drvdata(pdev);
int ret = 0;
 
+   hns_roce_unregister_device(hr_dev);
hns_roce_engine_uninit(hr_dev);
hns_roce_cleanup_bitmap(hr_dev);
hns_roce_cleanup_icm(hr_dev);
-- 
1.9.1

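The probe changes follow the usual kernel error-unwinding pattern: each completed step gets a cleanup label, and a failure jumps to the label that undoes only what has already succeeded, in reverse order (a register_device failure runs engine_uninit, then cleanup_bitmap). A hypothetical model that logs init steps in lowercase and cleanup steps in uppercase:

```c
#include <string.h>

char probe_log[64];	/* e.g. "beEB": bitmap, engine, undo engine, undo bitmap */

static void note(char c)
{
	size_t n = strlen(probe_log);

	if (n + 1 < sizeof(probe_log)) {
		probe_log[n] = c;
		probe_log[n + 1] = '\0';
	}
}

/* fail_at: 0 = success, 1 = bitmap, 2 = engine, 3 = register fails. */
int fake_probe(int fail_at)
{
	probe_log[0] = '\0';

	if (fail_at == 1)	/* nothing to undo yet */
		return -1;
	note('b');		/* bitmap initialized */

	if (fail_at == 2)
		goto err_cleanup_bitmap;
	note('e');		/* engine initialized */

	if (fail_at == 3)
		goto err_engine_uninit;
	note('r');		/* device registered */
	return 0;

err_engine_uninit:
	note('E');		/* like hns_roce_engine_uninit() */
err_cleanup_bitmap:
	note('B');		/* like hns_roce_cleanup_bitmap() */
	return -1;
}
```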
[PATCH v5 06/21] IB/hns: Add initial cmd operation

2016-04-23 Thread Lijun Ou

This patch adds the cmd initialization operation. In addition, it
adds some functions for initializing the eq table and selecting the
cmd mode in later stages.

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu(Xavier) 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c| 92 +
 drivers/infiniband/hw/hns/hns_roce_cmd.h| 19 ++
 drivers/infiniband/hw/hns/hns_roce_common.h |  2 +
 drivers/infiniband/hw/hns/hns_roce_device.h | 41 +
 drivers/infiniband/hw/hns/hns_roce_main.c   | 15 +
 5 files changed, 169 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cmd.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cmd.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
new file mode 100644
index 000..9a274de
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_common.h"
+#include "hns_roce_device.h"
+#include "hns_roce_cmd.h"
+
+#define CMD_MAX_NUM	32
+
+int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
+{
+   struct device *dev = &hr_dev->pdev->dev;
+
+   mutex_init(&hr_dev->cmd.hcr_mutex);
+   sema_init(&hr_dev->cmd.poll_sem, 1);
+   hr_dev->cmd.use_events = 0;
+   hr_dev->cmd.toggle = 1;
+   hr_dev->cmd.max_cmds = CMD_MAX_NUM;
+   hr_dev->cmd.hcr = hr_dev->reg_base + ROCEE_MB1_REG;
+   hr_dev->cmd.pool = dma_pool_create("hns_roce_cmd", dev,
+  HNS_ROCE_MAILBOX_SIZE,
+  HNS_ROCE_MAILBOX_SIZE, 0);
+   if (!hr_dev->cmd.pool) {
+   dev_err(dev, "Couldn't create mailbox pool for cmd.\n");
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+void hns_roce_cmd_cleanup(struct hns_roce_dev *hr_dev)
+{
+   dma_pool_destroy(hr_dev->cmd.pool);
+}
+
+int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
+{
+   struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
+   int i;
+
+   hr_cmd->context = kmalloc(hr_cmd->max_cmds *
+ sizeof(struct hns_roce_cmd_context),
+ GFP_KERNEL);
+   if (!hr_cmd->context)
+   return -ENOMEM;
+
+   for (i = 0; i < hr_cmd->max_cmds; ++i) {
+   hr_cmd->context[i].token = i;
+   hr_cmd->context[i].next = i + 1;
+   }
+
+   hr_cmd->context[hr_cmd->max_cmds - 1].next = -1;
+   hr_cmd->free_head = 0;
+
+   sema_init(&hr_cmd->event_sem, hr_cmd->max_cmds);
+   spin_lock_init(&hr_cmd->context_lock);
+
+   for (hr_cmd->token_mask = 1; hr_cmd->token_mask < hr_cmd->max_cmds;
+hr_cmd->token_mask <<= 1)
+   ;
+   --hr_cmd->token_mask;
+   hr_cmd->use_events = 1;
+
+   down(&hr_cmd->poll_sem);
+
+   return 0;
+}
+
+void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
+{
+   struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
+   int i;
+
+   hr_cmd->use_events = 0;
+
+   for (i = 0; i < hr_cmd->max_cmds; ++i)
+   down(&hr_cmd->event_sem);
+
+   kfree(hr_cmd->context);
+   up(&hr_cmd->poll_sem);
+}
diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h b/drivers/infiniband/hw/hns/hns_roce_cmd.h
new file mode 100644
index 000..4e102a4
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h
@@ -0,0 +1,19 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _HNS_ROCE_CMD_H
+#define _HNS_ROCE_CMD_H
+
+#include 
+
+enum {
+   HNS_ROCE_MAILBOX_SIZE   =  4096
+};
+
+#endif /* _HNS_ROCE_CMD_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 0f90214..84a9580 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -22,4 +22,6 @@
 
 #define ROCEE_ACK_DELAY_REG	0x14
 
+#define ROCEE_MB1_REG  0x210
+
 #endif /* _HNS_ROCE_COMMON_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index e3e59d0..209c5c0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -31,6 +31,40 @@
 
 #define ADDR_SHIFT_32  32
 
+struct hns_roce_cmd_context {
+   int next;

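hns_roce_cmd_use_events() threads the command contexts into a free list through `next` (terminated by -1) and derives token_mask by rounding max_cmds up to a power of two and subtracting one. A user-space sketch of both pieces; token_mask_for(), struct cmd_pool, pool_init(), pool_get() and pool_put() are hypothetical names, and the driver's spinlock and semaphores are left out.

```c
#include <stdlib.h>

/* Same loop as the patch: smallest power of two >= max_cmds, minus one. */
unsigned int token_mask_for(unsigned int max_cmds)
{
	unsigned int mask;

	for (mask = 1; mask < max_cmds; mask <<= 1)
		;
	return mask - 1;
}

struct cmd_context {
	int token;
	int next;	/* index of the next free context, or -1 */
};

struct cmd_pool {
	struct cmd_context *ctx;
	int free_head;
};

int pool_init(struct cmd_pool *p, int max_cmds)
{
	int i;

	if (max_cmds < 1)
		return -1;
	p->ctx = calloc((size_t)max_cmds, sizeof(*p->ctx));
	if (!p->ctx)
		return -1;
	for (i = 0; i < max_cmds; i++) {
		p->ctx[i].token = i;
		p->ctx[i].next = i + 1;
	}
	p->ctx[max_cmds - 1].next = -1;
	p->free_head = 0;
	return 0;
}

/* Pop a free context; returns its index, or -1 if none is free. */
int pool_get(struct cmd_pool *p)
{
	int i = p->free_head;

	if (i >= 0)
		p->free_head = p->ctx[i].next;
	return i;
}

/* Push a finished context back onto the head of the free list. */
void pool_put(struct cmd_pool *p, int i)
{
	p->ctx[i].next = p->free_head;
	p->free_head = i;
}
```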
[PATCH v4 1/3] i2c: tegra: calculate timeout for config load when needed

2016-04-23 Thread Shardar Shariff Md

Instead of calculating the timeout for the config load during init,
calculate it after the config load register is written, by using
readx_poll_timeout().

Signed-off-by: Shardar Shariff Md 

Changes since v1:
- Split timeout calculation to separate patch
---
 drivers/i2c/busses/i2c-tegra.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index d764d64..c1b02c7 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -110,6 +111,8 @@
 #define I2C_CLKEN_OVERRIDE 0x090
 #define I2C_MST_CORE_CLKEN_OVR (1 << 0)
 
+#define I2C_CONFIG_LOAD_TIMEOUT	100
+
 /*
  * msg_end_type: The bus control which need to be send at end of transfer.
  * @MSG_END_STOP: Send stop pulse at end of transfer.
@@ -428,7 +431,6 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
u32 val;
int err = 0;
u32 clk_divisor;
-   unsigned long timeout = jiffies + HZ;
 
err = tegra_i2c_clock_enable(i2c_dev);
if (err < 0) {
@@ -478,24 +480,27 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
i2c_writel(i2c_dev, I2C_MST_CORE_CLKEN_OVR, I2C_CLKEN_OVERRIDE);
 
if (i2c_dev->hw->has_config_load_reg) {
+   u32 val;
+
i2c_writel(i2c_dev, I2C_MSTR_CONFIG_LOAD, I2C_CONFIG_LOAD);
-   while (i2c_readl(i2c_dev, I2C_CONFIG_LOAD) != 0) {
-   if (time_after(jiffies, timeout)) {
-   dev_warn(i2c_dev->dev,
-   "timeout waiting for config load\n");
-   return -ETIMEDOUT;
-   }
-   msleep(1);
+   err = readx_poll_timeout(readl, i2c_dev->base +
+tegra_i2c_reg_addr(i2c_dev,
+I2C_CONFIG_LOAD), val, val == 0,
+1000, I2C_CONFIG_LOAD_TIMEOUT);
+   if (err) {
+   dev_warn(i2c_dev->dev,
+"timeout waiting for config load\n");
+   goto err;
}
}
 
-   tegra_i2c_clock_disable(i2c_dev);
-
if (i2c_dev->irq_disabled) {
i2c_dev->irq_disabled = 0;
enable_irq(i2c_dev->irq);
}
 
+err:
+   tegra_i2c_clock_disable(i2c_dev);
return err;
 }
 
-- 
1.8.1.5
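As a rough illustration of what readx_poll_timeout() does in place of the old jiffies loop, here is a userspace C model. It is a sketch, not the kernel macro: fake_reg, fake_readl() and poll_until_zero() are hypothetical names, and elapsed time is charged per iteration instead of being measured. Mirroring the kernel helper, it reads the value once more after the deadline before reporting -ETIMEDOUT, and a timeout of 0 means no timeout.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register model: reads nonzero for a few polls, then 0. */
struct fake_reg {
	uint32_t busy_reads_left;
};

static uint32_t fake_readl(struct fake_reg *r)
{
	if (r->busy_reads_left) {
		r->busy_reads_left--;
		return 1;	/* config load still pending */
	}
	return 0;		/* config load done */
}

/*
 * Poll until the register reads 0 or the time budget is used up.
 * Time is modelled by charging sleep_us per iteration instead of
 * actually sleeping (so with sleep_us == 0 this model never times
 * out); timeout_us == 0 means "no timeout".  After the deadline the
 * value is read one last time, so a late completion still succeeds.
 */
static int poll_until_zero(struct fake_reg *r, uint32_t *val,
			   unsigned int sleep_us, unsigned int timeout_us)
{
	unsigned int elapsed_us = 0;

	for (;;) {
		*val = fake_readl(r);
		if (*val == 0)
			break;
		if (timeout_us && elapsed_us > timeout_us) {
			*val = fake_readl(r);
			break;
		}
		elapsed_us += sleep_us;	/* stands in for usleep_range() */
	}
	return *val == 0 ? 0 : -ETIMEDOUT;
}
```

In this model, passing sleep_us = 1000 with timeout_us = 100 (the values used in this patch) means the deadline has already passed after the first sleep, so a register that never clears is read at most three times before -ETIMEDOUT.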

[PATCH v4 2/3] i2c: tegra: add separate function for config_load

2016-04-23 Thread Shardar Shariff Md

Move the configuration load register handling into a separate
function so that other functions can use it later.

Signed-off-by: Shardar Shariff Md 

---
Changes in v2:
- Remove unnecessary parentheses and align to 80 characters per line

Changes in v3:
- Add separate function for config load handling

Changes in v4:
- Move timeout calculation to separate patch
---
 drivers/i2c/busses/i2c-tegra.c | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index c1b02c7..8d49995 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -426,6 +426,27 @@ static inline void tegra_i2c_clock_disable(struct tegra_i2c_dev *i2c_dev)
clk_disable(i2c_dev->fast_clk);
 }
 
+static int tegra_i2c_wait_for_config_load(struct tegra_i2c_dev *i2c_dev)
+{
+   if (i2c_dev->hw->has_config_load_reg) {
+   u32 val;
+   int err;
+
+   i2c_writel(i2c_dev, I2C_MSTR_CONFIG_LOAD, I2C_CONFIG_LOAD);
+   err = readx_poll_timeout(readl, i2c_dev->base +
+tegra_i2c_reg_addr(i2c_dev,
+I2C_CONFIG_LOAD), val, val == 0,
+1000, I2C_CONFIG_LOAD_TIMEOUT);
+   if (err) {
+   dev_warn(i2c_dev->dev,
+"timeout waiting for config load\n");
+   return err;
+   }
+   }
+
+   return 0;
+}
+
 static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
 {
u32 val;
@@ -479,20 +500,9 @@ static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
if (i2c_dev->is_multimaster_mode && i2c_dev->hw->has_slcg_override_reg)
i2c_writel(i2c_dev, I2C_MST_CORE_CLKEN_OVR, I2C_CLKEN_OVERRIDE);
 
-   if (i2c_dev->hw->has_config_load_reg) {
-   u32 val;
-
-   i2c_writel(i2c_dev, I2C_MSTR_CONFIG_LOAD, I2C_CONFIG_LOAD);
-   err = readx_poll_timeout(readl, i2c_dev->base +
-tegra_i2c_reg_addr(i2c_dev,
-I2C_CONFIG_LOAD), val, val == 0,
-1000, I2C_CONFIG_LOAD_TIMEOUT);
-   if (err) {
-   dev_warn(i2c_dev->dev,
-"timeout waiting for config load\n");
-   goto err;
-   }
-   }
+   err = tegra_i2c_wait_for_config_load(i2c_dev);
+   if (err)
+   goto err;
 
if (i2c_dev->irq_disabled) {
i2c_dev->irq_disabled = 0;
-- 
1.8.1.5

[PATCH v4 3/3] i2c: tegra: proper handling of error cases

2016-04-23 Thread Shardar Shariff Md

To summarize the issue observed in error cases:

SW flow: For an I2C message transfer, the packet header and data
payload are posted first, and the required error/packet completion
interrupts are enabled afterwards.

HW flow: HW processes the packet as soon as the packet header is
posted. If an ARB lost/NACK error occurs at that point, SW does not
handle it immediately, because the error interrupts are not enabled
yet. HW assumes the error has been acknowledged and clears the current
data in the FIFO, but SW then posts the remaining data payload, which
stays in the FIFO as stale data (data without a packet header).

Once the interrupts are enabled, SW handles the ARB lost/NACK error by
clearing the ARB lost/NACK interrupt. HW then assumes that SW has
attended to the error and parses/processes the stale data (data without
a packet header) present in the FIFO, which causes invalid NACK errors.

Fix: Enable the error interrupts before posting the packet into the
FIFO, which ensures that HW does not clear the FIFO. Also disable
packet mode before acknowledging errors (ARB lost/NACK) so that no
stale data is processed. Because the error interrupts are now enabled
before the packet header is posted, use a spinlock to avoid being
preempted while posting the packet.
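The failure described in this commit message comes down to ordering. The toy model below uses hypothetical types and functions that do not correspond to the Tegra register interface; it counts how many headerless payload words are left in the FIFO when the header hits a NACK, comparing error interrupts unmasked after posting (old order) with unmasked before posting (new order). It does not model the spinlock or the packet-mode handling.

```c
#include <stdbool.h>

/*
 * Toy model (hypothetical): if the controller hits a NACK/arbitration
 * error while error interrupts are still masked, it treats the error
 * as acknowledged and drops whatever is in the TX FIFO.
 */
struct toy_i2c {
	bool err_irq_enabled;
	int fifo_words;		/* words currently queued in the TX FIFO */
	bool header_dropped;	/* HW discarded the packet header */
};

static void toy_post_header(struct toy_i2c *c, bool bus_error)
{
	c->fifo_words = 1;
	if (bus_error && !c->err_irq_enabled) {
		c->fifo_words = 0;	/* HW clears the FIFO on its own */
		c->header_dropped = true;
	}
}

static void toy_post_payload(struct toy_i2c *c, int words)
{
	c->fifo_words += words;
}

/* Returns how many headerless (stale) payload words end up in the FIFO. */
static int toy_xfer(bool unmask_errors_first, int payload_words)
{
	struct toy_i2c c = { 0 };

	if (unmask_errors_first)
		c.err_irq_enabled = true;	/* order used by the fix */
	toy_post_header(&c, true);		/* header triggers a NACK */
	toy_post_payload(&c, payload_words);
	if (!unmask_errors_first)
		c.err_irq_enabled = true;	/* old order: unmask afterwards */

	return c.header_dropped ? c.fifo_words : 0;
}
```

With the old order, toy_xfer(false, 3) leaves 3 stale words behind; with the new order, toy_xfer(true, 3) leaves none.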

Signed-off-by: Shardar Shariff Md 

---
Changes in v2:
- Align the commit message to 72 characters per line.
- Remove unnecessary parentheses.
- Handle error in isr

Changes in v3:
- An error is already printed if tegra_i2c_disable_packet_mode()
  fails, so the error is no longer handled in the ISR as it was in
  v2; the error return of *wait_for_config_load() is kept because it
  is used in tegra_i2c_init()
---
 drivers/i2c/busses/i2c-tegra.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 8d49995..1181bbf 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -194,6 +194,7 @@ struct tegra_i2c_dev {
u16 clk_divisor_non_hs_mode;
bool is_suspended;
bool is_multimaster_mode;
+   spinlock_t xfer_lock;
 };
 
 static void dvc_writel(struct tegra_i2c_dev *i2c_dev, u32 val, unsigned long reg)
@@ -514,14 +515,27 @@ err:
return err;
 }
 
+static int tegra_i2c_disable_packet_mode(struct tegra_i2c_dev *i2c_dev)
+{
+   u32 cnfg;
+
+   cnfg = i2c_readl(i2c_dev, I2C_CNFG);
+   if (cnfg & I2C_CNFG_PACKET_MODE_EN)
+   i2c_writel(i2c_dev, cnfg & ~I2C_CNFG_PACKET_MODE_EN, I2C_CNFG);
+
+   return tegra_i2c_wait_for_config_load(i2c_dev);
+}
+
 static irqreturn_t tegra_i2c_isr(int irq, void *dev_id)
 {
u32 status;
const u32 status_err = I2C_INT_NO_ACK | I2C_INT_ARBITRATION_LOST;
struct tegra_i2c_dev *i2c_dev = dev_id;
+   unsigned long flags;
 
status = i2c_readl(i2c_dev, I2C_INT_STATUS);
 
+   spin_lock_irqsave(&i2c_dev->xfer_lock, flags);
if (status == 0) {
dev_warn(i2c_dev->dev, "irq status 0 %08x %08x %08x\n",
 i2c_readl(i2c_dev, I2C_PACKET_TRANSFER_STATUS),
@@ -537,6 +551,7 @@ static irqreturn_t tegra_i2c_isr(int irq, void *dev_id)
}
 
if (unlikely(status & status_err)) {
+   tegra_i2c_disable_packet_mode(i2c_dev);
if (status & I2C_INT_NO_ACK)
i2c_dev->msg_err |= I2C_ERR_NO_ACK;
if (status & I2C_INT_ARBITRATION_LOST)
@@ -566,7 +581,7 @@ static irqreturn_t tegra_i2c_isr(int irq, void *dev_id)
BUG_ON(i2c_dev->msg_buf_remaining);
complete(&i2c_dev->msg_complete);
}
-   return IRQ_HANDLED;
+   goto done;
 err:
/* An error occurred, mask all interrupts */
tegra_i2c_mask_irq(i2c_dev, I2C_INT_NO_ACK | I2C_INT_ARBITRATION_LOST |
@@ -577,6 +592,8 @@ err:
dvc_writel(i2c_dev, DVC_STATUS_I2C_DONE_INTR, DVC_STATUS);
 
complete(&i2c_dev->msg_complete);
+done:
+   spin_unlock_irqrestore(&i2c_dev->xfer_lock, flags);
return IRQ_HANDLED;
 }
 
@@ -586,6 +603,7 @@ static int tegra_i2c_xfer_msg(struct tegra_i2c_dev *i2c_dev,
u32 packet_header;
u32 int_mask;
unsigned long time_left;
+   unsigned long flags;
 
tegra_i2c_flush_fifos(i2c_dev);
 
@@ -598,6 +616,11 @@ static int tegra_i2c_xfer_msg(struct tegra_i2c_dev *i2c_dev,
i2c_dev->msg_read = (msg->flags & I2C_M_RD);
reinit_completion(&i2c_dev->msg_complete);
 
+   spin_lock_irqsave(&i2c_dev->xfer_lock, flags);
+
+   int_mask = I2C_INT_NO_ACK | I2C_INT_ARBITRATION_LOST;
+   tegra_i2c_unmask_irq(i2c_dev, int_mask);
+
packet_header = (0 << PACKET_HEADER0_HEADER_SIZE_SHIFT) |
PACKET_HEADER0_PROTOCOL_I2C |
(i2c_dev->cont_id << PACKET_HEADER0_CONT_ID_SHIFT) |
@@ -627,14 +650,15 @@ static int tegra_i2c_xfer_msg(struct tegra_i2c_dev *i2c_dev,
if (!(msg->flags & I2C_M_RD))
tegra_i2c_fill_tx_fifo(i2c_dev);
 
-   int_mask = I2C_INT_

Re: [PATCH] x86/boot: Rename overlapping memcpy() to memmove()

2016-04-23 Thread Ingo Molnar


* Kees Cook  wrote:

> --- a/arch/x86/boot/compressed/string.c
> +++ b/arch/x86/boot/compressed/string.c
> @@ -1,7 +1,13 @@
> +/*
> + * This provides an optimized implementation of memcpy, and a simplified
> + * implementation of memset and memmove, to avoid problems with the
> + * built-in implementations when running in the restricted decompression
> + * stub environment.
> + */

Does 'built in' here mean the compiler's implementation?

We cannot call kernel built-in functions yet, so we have to duplicate everything
we might need, right?

Thanks,

Ingo

[GIT PULL] objtool fixes

2016-04-23 Thread Ingo Molnar

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: c2bb9e32e2315971a8535fee77335c04a739d71d objtool: Fix Makefile to properly see if libelf is supported

A handful of objtool fixes: two improvements to how warnings are printed plus a
false positive warning fix, and a build environment fix.

 Thanks,

Ingo

-->
Josh Poimboeuf (2):
  objtool: Add workaround for GCC switch jump table bug
  objtool: Detect falling through to the next function

Steven Rostedt (1):
  objtool: Fix Makefile to properly see if libelf is supported


 Makefile |  3 +-
 tools/objtool/Documentation/stack-validation.txt | 38 +++---
 tools/objtool/builtin-check.c| 97 ++--
 3 files changed, 103 insertions(+), 35 deletions(-)

diff --git a/Makefile b/Makefile
index 1d0aef03eae7..70ca38ef9f4b 100644
--- a/Makefile
+++ b/Makefile
@@ -1008,7 +1008,8 @@ prepare0: archprepare FORCE
 prepare: prepare0 prepare-objtool
 
 ifdef CONFIG_STACK_VALIDATION
-  has_libelf := $(shell echo "int main() {}" | $(HOSTCC) -xc -o /dev/null -lelf - &> /dev/null && echo 1 || echo 0)
+  has_libelf := $(call try-run,\
+   echo "int main() {}" | $(HOSTCC) -xc -o /dev/null -lelf -,1,0)
   ifeq ($(has_libelf),1)
 objtool_target := tools/objtool FORCE
   else
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 5a95896105bc..55a60d331f47 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -299,18 +299,38 @@ they mean, and suggestions for how to fix them.
 Errors in .c files
 --
 
-If you're getting an objtool error in a compiled .c file, chances are
-the file uses an asm() statement which has a "call" instruction.  An
-asm() statement with a call instruction must declare the use of the
-stack pointer in its output operand.  For example, on x86_64:
+1. c_file.o: warning: objtool: funcA() falls through to next function funcB()
 
-   register void *__sp asm("rsp");
-   asm volatile("call func" : "+r" (__sp));
+   This means that funcA() doesn't end with a return instruction or an
+   unconditional jump, and that objtool has determined that the function
+   can fall through into the next function.  There could be different
+   reasons for this:
 
-Otherwise the stack frame may not get created before the call.
+   1) funcA()'s last instruction is a call to a "noreturn" function like
+  panic().  In this case the noreturn function needs to be added to
+  objtool's hard-coded global_noreturns array.  Feel free to bug the
+  objtool maintainer, or you can submit a patch.
 
-Another possible cause for errors in C code is if the Makefile removes
--fno-omit-frame-pointer or adds -fomit-frame-pointer to the gcc options.
+   2) funcA() uses the unreachable() annotation in a section of code
+  that is actually reachable.
+
+   3) If funcA() calls an inline function, the object code for funcA()
+  might be corrupt due to a gcc bug.  For more details, see:
+  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
+
+2. If you're getting any other objtool error in a compiled .c file, it
+   may be because the file uses an asm() statement which has a "call"
+   instruction.  An asm() statement with a call instruction must declare
+   the use of the stack pointer in its output operand.  For example, on
+   x86_64:
+
+ register void *__sp asm("rsp");
+ asm volatile("call func" : "+r" (__sp));
+
+   Otherwise the stack frame may not get created before the call.
+
+3. Another possible cause for errors in C code is if the Makefile removes
+   -fno-omit-frame-pointer or adds -fomit-frame-pointer to the gcc options.
 
 Also see the above section for .S file errors for more information what
 the individual error messages mean.
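Case 1) in the updated documentation is easiest to see with a small C example. In this sketch, die() is a hypothetical stand-in for a noreturn function such as panic(). Because it is declared noreturn, GCC is free to end checked_value() with the call itself and emit no return instruction after it, which is the pattern objtool reports as falling through to the next function unless the callee is in its noreturn list. Whether the compiler actually omits the return depends on the optimization level.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error helper; the attribute tells GCC it never returns. */
static void die(const char *msg) __attribute__((noreturn));

static void die(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	exit(1);
}

/*
 * On the negative path the last thing this function does is call die().
 * Since die() is noreturn, no "ret" needs to follow that call, so a tool
 * that reads only object code and does not know die() is noreturn may
 * conclude that execution runs into whatever function comes next.
 */
int checked_value(int x)
{
	if (x >= 0)
		return x;
	die("negative value");
}
```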
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 7515cb2e879a..e8a1e69eb92c 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -54,6 +54,7 @@ struct instruction {
struct symbol *call_dest;
struct instruction *jump_dest;
struct list_head alts;
+   struct symbol *func;
 };
 
 struct alternative {
@@ -66,6 +67,7 @@ struct objtool_file {
struct list_head insn_list;
DECLARE_HASHTABLE(insn_hash, 16);
struct section *rodata, *whitelist;
+   bool ignore_unreachables, c_file;
 };
 
 const char *objname;
@@ -228,7 +230,7 @@ static int __dead_end_function(struct objtool_file *file, struct symbol *func,
}
}
 
-   if (insn->type == INSN_JUMP_DYNAMIC)
+   if (insn->type == INSN_JUMP_DYNAMIC && list_empty(&insn->alts))
/* sibling call */

[GIT PULL] irq fixes

2016-04-23 Thread Ingo Molnar

Linus,

Please pull the latest irq-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-for-linus

   # HEAD: 91951f980e521d8f7e92283735b99fb9f4b05d93 irqchip/mips-gic: Don't overrun pcpu_masks array

A core irq affinity masks related fix and a MIPS irqchip driver fix.

 Thanks,

Ingo

-->
Matt Redfearn (1):
  genirq: Dont allow affinity mask to be updated on IPIs

Paul Burton (1):
  irqchip/mips-gic: Don't overrun pcpu_masks array


 drivers/irqchip/irq-mips-gic.c | 4 ++--
 kernel/irq/ipi.c   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index 94a30da0cfac..4dffccf532a2 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -467,7 +467,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *cpumask,
gic_map_to_vpe(irq, mips_cm_vp_id(cpumask_first(&tmp)));
 
/* Update the pcpu_masks */
-   for (i = 0; i < gic_vpes; i++)
+   for (i = 0; i < min(gic_vpes, NR_CPUS); i++)
clear_bit(irq, pcpu_masks[i].pcpu_mask);
set_bit(irq, pcpu_masks[cpumask_first(&tmp)].pcpu_mask);
 
@@ -707,7 +707,7 @@ static int gic_shared_irq_domain_map(struct irq_domain *d, unsigned int virq,
spin_lock_irqsave(&gic_lock, flags);
gic_map_to_pin(intr, gic_cpu_pin);
gic_map_to_vpe(intr, vpe);
-   for (i = 0; i < gic_vpes; i++)
+   for (i = 0; i < min(gic_vpes, NR_CPUS); i++)
clear_bit(intr, pcpu_masks[i].pcpu_mask);
set_bit(intr, pcpu_masks[vpe].pcpu_mask);
spin_unlock_irqrestore(&gic_lock, flags);
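The pcpu_masks fix follows a common pattern: when a hardware-reported count (gic_vpes) can exceed the length of a statically sized array (pcpu_masks, presumably NR_CPUS entries given the min()), bound the loop by the smaller of the two. A userspace sketch in which NR_CPUS_EXAMPLE, masks and clear_irq_all() are hypothetical stand-ins for the driver's names:

```c
#define NR_CPUS_EXAMPLE 4	/* hypothetical stand-in for NR_CPUS */

struct pcpu_mask_example {
	unsigned long mask;
};

static struct pcpu_mask_example masks[NR_CPUS_EXAMPLE];

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Clear bit 'irq' in every per-CPU mask.  The hardware may report more
 * VPEs than the array has slots, so the loop is bounded by both values.
 * Returns how many masks were touched.
 */
static int clear_irq_all(int irq, int hw_vpes)
{
	int i, n = min_int(hw_vpes, NR_CPUS_EXAMPLE);

	for (i = 0; i < n; i++)
		masks[i].mask &= ~(1UL << irq);
	return n;
}
```

Without the min_int() bound, hw_vpes = 16 would write 12 entries past the end of masks[].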
diff --git a/kernel/irq/ipi.c b/kernel/irq/ipi.c
index c37f34b00a11..14777af8e097 100644
--- a/kernel/irq/ipi.c
+++ b/kernel/irq/ipi.c
@@ -94,6 +94,7 @@ unsigned int irq_reserve_ipi(struct irq_domain *domain,
data = irq_get_irq_data(virq + i);
cpumask_copy(data->common->affinity, dest);
data->common->ipi_offset = offset;
+   irq_set_status_flags(virq + i, IRQ_NO_BALANCING);
}
return virq;

Re: [PATCH] mm/vmalloc: Keep a separate lazy-free list

2016-04-23 Thread Roman Peniaev

On Fri, Apr 22, 2016 at 11:49 PM, Andrew Morton
 wrote:
> On Fri, 15 Apr 2016 12:14:31 +0100 Chris Wilson  
> wrote:
>
>> > > purge_fragmented_blocks() manages per-cpu lists, so that looks safe
>> > > under its own rcu_read_lock.
>> > >
>> > > Yes, it looks feasible to remove the purge_lock if we can relax sync.
>> >
>> > what is still left is waiting on vmap_area_lock for !sync mode.
>> > but probably is not that bad.
>>
>> Ok, that's bit beyond my comfort zone with a patch to change the free
>> list handling. I'll chicken out for the time being, atm I am more
>> concerned that i915.ko may call set_page_wb() frequently on individual
>> pages.
>
> Nick Piggin's vmap rewrite.  20x (or more) faster.
> https://lwn.net/Articles/285341/
>
> 10 years ago, never finished.

But that's exactly what we are changing making 20.5x faster :)

--
Roman

[GIT PULL] locking fixes

2016-04-23 Thread Ingo Molnar

Linus,

Please pull the latest locking-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-urgent-for-linus

   # HEAD: fba7cd681b6155e2d93e7862fcd6f970336b83c3 asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()

Misc fixes:
 - pvqspinlocks: an instrumentation fix
 - futexes: a preempt-count vs. pagefault_disable decouple corner case fix
 - futexes: futex requeue plist race window fix
 - futexes: a futex UNLOCK_PI transaction fix for a corner case

 Thanks,

Ingo

-->
Davidlohr Bueso (2):
  locking/pvqspinlock: Fix division by zero in qstat_read()
  futex: Acknowledge a new waiter in counter before plist

Romain Perier (1):
  asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()

Sebastian Andrzej Siewior (1):
  futex: Handle unlock_pi race gracefully


 include/asm-generic/futex.h |  8 ++--
 kernel/futex.c  | 27 +++
 kernel/locking/qspinlock_stat.h |  8 +---
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/include/asm-generic/futex.h b/include/asm-generic/futex.h
index e56272c919b5..bf2d34c9d804 100644
--- a/include/asm-generic/futex.h
+++ b/include/asm-generic/futex.h
@@ -108,11 +108,15 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 val;
 
preempt_disable();
-   if (unlikely(get_user(val, uaddr) != 0))
+   if (unlikely(get_user(val, uaddr) != 0)) {
+   preempt_enable();
return -EFAULT;
+   }
 
-   if (val == oldval && unlikely(put_user(newval, uaddr) != 0))
+   if (val == oldval && unlikely(put_user(newval, uaddr) != 0)) {
+   preempt_enable();
return -EFAULT;
+   }
 
*uval = val;
preempt_enable();
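The asm-generic/futex fix is about keeping preempt_disable()/preempt_enable() balanced on every exit path, including the -EFAULT ones. A userspace sketch in which a plain counter stands in for the preempt count and two flags simulate get_user()/put_user() faults; all names here are hypothetical:

```c
#include <errno.h>

/* Hypothetical stand-ins for preempt_disable()/preempt_enable(). */
static int preempt_count_example;

static void fake_preempt_disable(void) { preempt_count_example++; }
static void fake_preempt_enable(void)  { preempt_count_example--; }

/*
 * Same shape as the fixed generic helper: every return path re-enables
 * preemption.  fault_on_read/fault_on_write simulate get_user()/put_user()
 * failures; the write is only attempted when the old value matches.
 */
static int cmpxchg_example(unsigned int *uval, unsigned int *addr,
			   unsigned int oldval, unsigned int newval,
			   int fault_on_read, int fault_on_write)
{
	unsigned int val;

	fake_preempt_disable();
	if (fault_on_read) {
		fake_preempt_enable();
		return -EFAULT;
	}
	val = *addr;

	if (val == oldval) {
		if (fault_on_write) {
			fake_preempt_enable();
			return -EFAULT;
		}
		*addr = newval;
	}

	*uval = val;
	fake_preempt_enable();
	return 0;
}
```

After any mix of successful and faulting calls, preempt_count_example returns to 0; in the pre-fix shape, each -EFAULT return would have left it one higher.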
diff --git a/kernel/futex.c b/kernel/futex.c
index a5d2e74c89e0..c20f06f38ef3 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1295,10 +1295,20 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
if (unlikely(should_fail_futex(true)))
ret = -EFAULT;
 
-   if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
+   if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) {
ret = -EFAULT;
-   else if (curval != uval)
-   ret = -EINVAL;
+   } else if (curval != uval) {
+   /*
+* If a unconditional UNLOCK_PI operation (user space did not
+* try the TID->0 transition) raced with a waiter setting the
+* FUTEX_WAITERS flag between get_user() and locking the hash
+* bucket lock, retry the operation.
+*/
+   if ((FUTEX_TID_MASK & curval) == uval)
+   ret = -EAGAIN;
+   else
+   ret = -EINVAL;
+   }
if (ret) {
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
return ret;
@@ -1525,8 +1535,8 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
if (likely(&hb1->chain != &hb2->chain)) {
plist_del(&q->list, &hb1->chain);
hb_waiters_dec(hb1);
-   plist_add(&q->list, &hb2->chain);
hb_waiters_inc(hb2);
+   plist_add(&q->list, &hb2->chain);
q->lock_ptr = &hb2->lock;
}
get_futex_key_refs(key2);
@@ -2623,6 +2633,15 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
if (ret == -EFAULT)
goto pi_faulted;
/*
+* A unconditional UNLOCK_PI op raced against a waiter
+* setting the FUTEX_WAITERS bit. Try again.
+*/
+   if (ret == -EAGAIN) {
+   spin_unlock(&hb->lock);
+   put_futex_key(&key);
+   goto retry;
+   }
+   /*
 * wake_futex_pi has detected invalid state. Tell user
 * space.
 */
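The new -EAGAIN path in wake_futex_pi() separates a benign race from a genuinely inconsistent futex word by masking off the non-TID bits of the value actually found. Below is a standalone sketch of that comparison. The constants are the uapi FUTEX_WAITERS and FUTEX_TID_MASK values; classify_unlock_race() is a hypothetical helper that mirrors only the comparison logic, not the locking or the retry.

```c
#include <errno.h>
#include <stdint.h>

/* Values as defined in the uapi <linux/futex.h> header. */
#define FUTEX_WAITERS	0x80000000u
#define FUTEX_TID_MASK	0x3fffffffu

/*
 * 'uval' is the value the unlocker read earlier (its own TID);
 * 'curval' is what the cmpxchg actually found in the futex word.
 */
static int classify_unlock_race(uint32_t uval, uint32_t curval)
{
	if (curval == uval)
		return 0;		/* cmpxchg would have succeeded */
	if ((FUTEX_TID_MASK & curval) == uval)
		return -EAGAIN;		/* TID still ours, a waiter set a flag: retry */
	return -EINVAL;			/* owner TID differs: invalid state */
}
```

For example, with an owner TID of 1234, finding 1234 | FUTEX_WAITERS yields -EAGAIN, while finding 4321 | FUTEX_WAITERS yields -EINVAL.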
diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
index eb2a2c9bc3fc..d734b7502001 100644
--- a/kernel/locking/qspinlock_stat.h
+++ b/kernel/locking/qspinlock_stat.h
@@ -136,10 +136,12 @@ static ssize_t qstat_read(struct file *file, char __user *user_buf,
}
 
if (counter == qstat_pv_hash_hops) {
-   u64 frac;
+   u64 frac = 0;
 
-   frac = 100ULL * do_div(stat, kicks);
-   frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
+   if (kicks) {
+   frac = 100ULL * do_div(stat, kicks);
+   frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
+   }
 
/*
 * Return a X.XX decimal number
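The qstat_read() fix guards the fixed-point formatting of stat/kicks against kicks == 0. Below is a userspace approximation. It assumes do_div() divides in place and returns the remainder, and that DIV_ROUND_CLOSEST_ULL() rounds as (n + d/2) / d. The name hash_hops_ratio() is hypothetical.

```c
#include <stdint.h>

/*
 * Compute stat/kicks as a whole part and a two-digit fraction (X.XX).
 * When kicks is 0 the fraction stays 0 and stat is reported undivided,
 * instead of dividing by zero.
 */
static void hash_hops_ratio(uint64_t stat, uint64_t kicks,
			    uint64_t *whole, uint64_t *frac)
{
	*frac = 0;
	if (kicks) {
		uint64_t rem = stat % kicks;	/* do_div() returns the remainder */

		stat /= kicks;			/* ...and leaves the quotient */
		*frac = 100ULL * rem;
		*frac = (*frac + kicks / 2) / kicks;	/* round to closest */
	}
	*whole = stat;
}
```

For example, stat = 7 and kicks = 2 give 3.50, stat = 10 and kicks = 3 give 3.33, and kicks = 0 now yields stat.00 instead of a division by zero.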

[GIT PULL] perf fix

2016-04-23 Thread Ingo Molnar

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: a19cad6d66823ddd54b0e7c88d7bddd307cb1161 Merge tag 'perf-urgent-for-mingo-20160418' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

A single tooling fix for a user-triggerable segfault.

 Thanks,

Ingo

-->
Adrian Hunter (1):
  perf intel-pt: Fix segfault tracing transactions


 tools/perf/util/intel-pt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 407f11b97c8d..617578440989 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1130,7 +1130,7 @@ static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
   ret);
 
-   if (pt->synth_opts.callchain)
+   if (pt->synth_opts.last_branch)
intel_pt_reset_last_branch_rb(ptq);
 
return ret;

[GIT PULL] CPU hotplug fix

2016-04-23 Thread Ingo Molnar

Linus,

Please pull the latest smp-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp-urgent-for-linus

   # HEAD: 3b9d6da67e11ca8f78fde887918983523a36b0fa cpu/hotplug: Fix rollback during error-out in __cpu_disable()

Fix a CPU hotplug corner case regression, introduced by the recent hotplug 
rework.

 Thanks,

Ingo

-->
Sebastian Andrzej Siewior (1):
  cpu/hotplug: Fix rollback during error-out in __cpu_disable()


 kernel/cpu.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6ea42e8da861..3e3f6e49eabb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -36,6 +36,7 @@
  * @target:The target state
  * @thread:Pointer to the hotplug thread
  * @should_run:Thread should execute
+ * @rollback:  Perform a rollback
  * @cb_stat:   The state for a single callback (install/uninstall)
  * @cb:Single callback function (install/uninstall)
  * @result:Result of the operation
@@ -47,6 +48,7 @@ struct cpuhp_cpu_state {
 #ifdef CONFIG_SMP
struct task_struct  *thread;
bool should_run;
+   bool rollback;
enum cpuhp_state cb_state;
int (*cb)(unsigned int cpu);
int result;
@@ -301,6 +303,11 @@ static int cpu_notify(unsigned long val, unsigned int cpu)
return __cpu_notify(val, cpu, -1, NULL);
 }
 
+static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
+{
+   BUG_ON(cpu_notify(val, cpu));
+}
+
 /* Notifier wrappers for transitioning to state machine */
 static int notify_prepare(unsigned int cpu)
 {
@@ -477,6 +484,16 @@ static void cpuhp_thread_fun(unsigned int cpu)
} else {
ret = cpuhp_invoke_callback(cpu, st->cb_state, st->cb);
}
+   } else if (st->rollback) {
+   BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
+
+   undo_cpu_down(cpu, st, cpuhp_ap_states);
+   /*
+* This is a momentary workaround to keep the notifier users
+* happy. Will go away once we got rid of the notifiers.
+*/
+   cpu_notify_nofail(CPU_DOWN_FAILED, cpu);
+   st->rollback = false;
} else {
/* Cannot happen  */
BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
@@ -636,11 +653,6 @@ static inline void check_for_tasks(int dead_cpu)
read_unlock(&tasklist_lock);
 }
 
-static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
-{
-   BUG_ON(cpu_notify(val, cpu));
-}
-
 static int notify_down_prepare(unsigned int cpu)
 {
int err, nr_calls = 0;
@@ -721,9 +733,10 @@ static int takedown_cpu(unsigned int cpu)
 */
err = stop_machine(take_cpu_down, NULL, cpumask_of(cpu));
if (err) {
-   /* CPU didn't die: tell everyone.  Can't complain. */
-   cpu_notify_nofail(CPU_DOWN_FAILED, cpu);
+   /* CPU refused to die */
irq_unlock_sparse();
+   /* Unpark the hotplug thread so we can rollback there */
+   kthread_unpark(per_cpu_ptr(&cpuhp_state, cpu)->thread);
return err;
}
BUG_ON(cpu_online(cpu));
@@ -832,6 +845,11 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 * to do the further cleanups.
 */
ret = cpuhp_down_callbacks(cpu, st, cpuhp_bp_states, target);
+   if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+   st->target = prev_state;
+   st->rollback = true;
+   cpuhp_kick_ap_work(cpu);
+   }
 
hasdied = prev_state != st->state && st->state == CPUHP_OFFLINE;
 out:
@@ -1249,6 +1267,7 @@ static struct cpuhp_step cpuhp_ap_states[] = {
.name   = "notify:online",
.startup= notify_online,
.teardown   = notify_down_prepare,
+   .skip_onerr = true,
},
 #endif
/*

[GIT PULL] timer fix

2016-04-23 Thread Ingo Molnar

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 16eeed7e5558a3dcf30f75526a896b2632f299f9 clocksource/drivers/tango-xtal: Fix boot hang due to incorrect test

Fix a boot hang in the ARM based Tango SoC clocksource driver.

 Thanks,

Ingo

-->
Daniel Lezcano (1):
  clocksource/drivers/tango-xtal: Fix boot hang due to incorrect test


 drivers/clocksource/tango_xtal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/tango_xtal.c b/drivers/clocksource/tango_xtal.c
index 2bcecafdeaea..c407c47a3232 100644
--- a/drivers/clocksource/tango_xtal.c
+++ b/drivers/clocksource/tango_xtal.c
@@ -42,7 +42,7 @@ static void __init tango_clocksource_init(struct device_node *np)
 
ret = clocksource_mmio_init(xtal_in_cnt, "tango-xtal", xtal_freq, 350,
32, clocksource_mmio_readl_up);
-   if (!ret) {
+   if (ret) {
pr_err("%s: registration failed\n", np->full_name);
return;
}

[GIT PULL] x86 fixes

2016-04-23 Thread Ingo Molnar

Linus,

Please pull the latest x86-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus

   # HEAD: ea5dfb5fae81939f777ca569d8cfb599252da2e8 x86 EDAC, sb_edac.c: Take account of channel hashing when needed

Misc fixes: two EDAC driver fixes, a Xen crash fix, a HyperV log spam fix and a 
documentation fix.


  out-of-topic modifications in x86-urgent-for-linus:
  -
  drivers/edac/sb_edac.c # ea5dfb5fae81: x86 EDAC, sb_edac.c: Take ac
   # ff15e95c8276: x86 EDAC, sb_edac.c: Repair 

 Thanks,

Ingo

-->
Jan Beulich (1):
  x86/mm/xen: Suppress hugetlbfs in PV guests

Juergen Gross (1):
  x86/doc: Correct limits in Documentation/x86/x86_64/mm.txt

Tony Luck (2):
  x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
  x86 EDAC, sb_edac.c: Take account of channel hashing when needed

Vitaly Kuznetsov (1):
  x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances


 Documentation/x86/x86_64/mm.txt |  6 +++---
 arch/x86/include/asm/hugetlb.h  |  1 +
 arch/x86/kernel/cpu/mshyperv.c  | 12 
 drivers/edac/sb_edac.c  | 30 ++
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index c518dce7da4d..5aa738346062 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -19,7 +19,7 @@ ff00 - ff7f (=39 bits) %esp fixup stacks
 ffef -  (=64 GB) EFI region mapping space
 ... unused hole ...
 8000 - a000 (=512 MB)  kernel text mapping, from phys 0
-a000 - ff5f (=1525 MB) module mapping space
+a000 - ff5f (=1526 MB) module mapping space
 ff60 - ffdf (=8 MB) vsyscalls
 ffe0 -  (=2 MB) unused hole
 
@@ -31,8 +31,8 @@ vmalloc space is lazily synchronized into the different PML4 pages of
 the processes using the page fault handler, with init_level4_pgt as
 reference.
 
-Current X86-64 implementations only support 40 bits of address space,
-but we support up to 46 bits. This expands into MBZ space in the page tables.
+Current X86-64 implementations support up to 46 bits of address space (64 TB),
+which is our current limit. This expands into MBZ space in the page tables.
 
 We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual
 memory window (this size is arbitrary, it can be raised later if needed).
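A quick arithmetic check of the corrected module mapping size. The leading f digits of the addresses have been cut off in this archive copy, so the sketch assumes the full inclusive range is 0xffffffffa0000000 to 0xffffffffff5fffff, and likewise for the vsyscall range. range_mib() is a hypothetical helper.

```c
#include <stdint.h>

/* Size of an inclusive [start, end] virtual address range in MiB. */
static uint64_t range_mib(uint64_t start, uint64_t end_inclusive)
{
	return (end_inclusive - start + 1) >> 20;
}
```

Under that assumption, 0xffffffffff5fffff - 0xffffffffa0000000 + 1 = 0x5f600000 bytes, and 0x5f600000 >> 20 = 0x5f6 = 1526 MiB, which matches the updated "(=1526 MB)" figure.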
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index f8a29d2c97b0..e6a8613fbfb0 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 
+#define hugepages_supported() cpu_has_pse
 
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr,
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4e7c6933691c..10c11b4da31d 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -152,6 +152,11 @@ static struct clocksource hyperv_cs = {
.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
+static unsigned char hv_get_nmi_reason(void)
+{
+   return 0;
+}
+
 static void __init ms_hyperv_init_platform(void)
 {
/*
@@ -191,6 +196,13 @@ static void __init ms_hyperv_init_platform(void)
machine_ops.crash_shutdown = hv_machine_crash_shutdown;
 #endif
mark_tsc_unstable("running on Hyper-V");
+
+   /*
+* Generation 2 instances don't support reading the NMI status from
+* 0x61 port.
+*/
+   if (efi_enabled(EFI_BOOT))
+   x86_platform.get_nmi_reason = hv_get_nmi_reason;
 }
 
 const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 93f0d4120289..468447aff8eb 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -362,6 +362,7 @@ struct sbridge_pvt {
 
/* Memory type detection */
bool is_mirrored, is_lockstep, is_close_pg;
+   bool is_chan_hash;
 
/* Fifo double buffers */
struct mce  mce_entry[MCE_LOG_LEN];
@@ -1060,6 +1061,20 @@ static inline u8 sad_pkg_ha(u8 pkg)
return (pkg >> 2) & 0x1;
 }
 
+static int haswell_chan_hash(int idx, u64 addr)
+{
+   int i;
+
+   /*
+* XOR even bits from 12:26 to bit0 of idx,
+* odd bits from 13:27 to bit1
+*/
+   for (i = 12; i < 28; i += 2)
+   idx ^= (addr >> i) & 3;
+
+   return idx;
+}
+
 /
Memory check routines
  **
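The Haswell channel-hash step added above is easy to check in isolation. Below is a standalone copy of its loop, renamed chan_hash_example() (a hypothetical name) and using a fixed-width address type.

```c
#include <stdint.h>

/*
 * XOR each 2-bit pair of address bits [12..27] into the low two bits
 * of idx: even bits 12, 14, ..., 26 land in bit 0 and odd bits
 * 13, 15, ..., 27 land in bit 1.
 */
static int chan_hash_example(int idx, uint64_t addr)
{
	int i;

	for (i = 12; i < 28; i += 2)
		idx ^= (addr >> i) & 3;

	return idx;
}
```

For example, setting only address bit 12 flips bit 0 of idx and setting only bit 13 flips bit 1. Setting bits 12 and 14 together cancels out. Bit 28 and above have no effect.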

Re: [PATCH 1/3] MIPS: JZ4740: Qi LB60: Remove support for AVT2 variant

2016-04-23 Thread Lars-Peter Clausen

On 04/18/2016 08:58 PM, Maarten ter Huurne wrote:
> AVT2 was a prototype board of which about 5 were made, none of which
> are in use anymore.
> 
> Signed-off-by: Maarten ter Huurne 

Acked-by: Lars-Peter Clausen 

Thanks.

Re: [PATCH v1 00/23] ata: sata_dwc_460ex: make it working again

2016-04-23 Thread Julian Margetson


On 4/22/2016 7:06 AM, Christian Lamparter wrote:

On Friday, April 22, 2016 06:50:44 AM Julian Margetson wrote:

On 4/21/2016 4:25 PM, Christian Lamparter wrote:

On Thursday, April 21, 2016 09:15:21 PM Andy Shevchenko wrote:

The last approach in the commit 8b3444852a2b ("sata_dwc_460ex: move to generic
DMA driver") to switch to generic DMA engine API wasn't tested on bare metal.
Besides that, we are expecting new board support coming with the same SATA IP but
with different DMA.

The driver has been tested myself on Sam460ex and WD MyBookLive (apollo3g)
boards. In any case I ask Christian, Måns, and Julian to independently test and
provide Tested-by tag or error report.

I did a test run on my WD MyBook Live. I applied all the patches in
this series on top of the topic/dw branch of Vinod Koul:


Tested-by: Christian Lamparter
---
results for my old ST3808110AS HDD. filesystem is ext4.

# hdparm -t /dev/sda

/dev/sda:
   Timing buffered disk reads: 204 MB in  3.02 seconds =  67.51 MB/sec

# bonnie++ -u mbl
Using uid:1000, gid:1000.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
mbl496M98  99 26011  21 17589  20   538  99 80138  39 208.9   8
Latency 95267us1409ms 295ms   26947us9644us1787ms
Version  1.97   --Sequential Create-- Random Create
mbl -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
   16  6959  78 + +++  5197  40  7250  79 + +++  4718  37
Latency   149ms6742us 212ms 177ms 767us 217ms
1.97,1.97,mbl,1,1461269771,496M,,98,99,26011,21,17589,20,538,99,80138,39,208.9,8,16,6959,78,+,+++,5197,40,7250,79,+,+++,4718,37,95267us,1409ms,295ms,26947us,9644us,1787ms,149ms,6742us,212ms,177ms,767us,217ms


Again on copy partitions.

OK, here's a copy of my off-list mail.

Well, an unrelated driver "m41t80" caused a crash:
[   12.912739] Oops: Kernel access of bad area, sig: 11 [#3]
[   12.912743] PREEMPT Canyonlands
[   12.912753] CPU: 0 PID: 1413 Comm: irq/45-m41t80 Tainted: G  D 4.6.0-rc4-next-20160421-sam460ex-jm #1
[   12.912757] task: ea9834e0 ti: eea6c000 task.ti: eea6c000
[   12.912760] NIP: c0224480 LR: c0023494 CTR: c0042508
[   12.912764] REGS: eea6daf0 TRAP: 0300   Tainted: G  D  (4.6.0-rc4-next-20160421-sam460ex-jm)
[   12.912774] MSR: 00029000   CR: 24008282  XER: 
[   12.912825] DEAR: 0008 ESR: 
[...]
[   12.912927] --- interrupt: 300 at mutex_lock+0x0/0x1c
[   12.912927] LR = m41t80_handle_irq+0x28/0xac
[   12.912932] [eea6de40] []   (null) (unreliable)
[   12.912938] [eea6de60] [c004ffac] irq_thread_fn+0x2c/0x48
[   12.912944] [eea6de80] [c00501cc] irq_thread+0xc4/0x160
[   12.912951] [eea6ded0] [c003a3f8] kthread+0xc8/0xcc
[   12.912957] [eea6df40] [c000aee8] ret_from_kernel_thread+0x5c/0x64
[   12.912960] Instruction dump:
[   12.912974] 80010014 7fc3f378 bbc10008 7c0803a6 38210010 4be24ca8 9421ffd0 7c0802a6
[   12.912987] bf210014 90010034 3b4302d8 812302ec <83890008> 812302d8 7f9a4840 419e011c
[   12.912995] Fixing recursive fault but reboot is needed!
  ^^^ "reboot is needed!"

Another thing that came to my mind: Have you checked if your hard drive
and the cables are ok? Are there any pending sectors or suspicious smart
values? Has the drive passed the extended offline test?
  
Otherwise, I can't reproduce the error with my MyBook system. I've tested
your kernel and it worked on the device without crashing. (I copied/dd'ed
80GB from and back to the hard drive. It was long and boring, but I didn't
encounter any issues and the crc32 matched.)

Sorry, but I can't help you if I can't reproduce it... And short of sending
your box to test, I see no efficient way to debug it. However, here is what I
can do, if you are interested: I have a few "build your own" My Book Live kits.
Each just needs a 3.5" hard drive and a 12V power adapter. If you are interested,
PM me off-list; this way you can verify that the kernels you build do work,
just in case this error is due to a hardware issue (zapped controller,
bad RAM/drive/cable?) with your sam460ex box.

Regards,
Christian



My hardware seems OK.
I have swapped cables and drives.

[tip:perf/core] perf/x86/intel: Add model number for Skylake Server to perf

2016-04-23 Thread tip-bot for Andi Kleen

Commit-ID:  b89c173788c3a8ed571652c203bf59a0e9d700aa
Gitweb: http://git.kernel.org/tip/b89c173788c3a8ed571652c203bf59a0e9d700aa
Author: Andi Kleen 
AuthorDate: Fri, 15 Apr 2016 13:25:33 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 13:46:44 +0200

perf/x86/intel: Add model number for Skylake Server to perf

Everything is the same as base Skylake; only the model number is new.

Signed-off-by: Andi Kleen 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/1460751933-2264-1-git-send-email-a...@firstfloor.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 68fa55b..aff7988 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3639,6 +3639,7 @@ __init int intel_pmu_init(void)
 
case 78: /* 14nm Skylake Mobile */
case 94: /* 14nm Skylake Desktop */
+   case 85: /* 14nm Skylake Server */
x86_pmu.late_ack = true;
memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
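For reference, the number on the new "case 85:" line is the family-6 "display model" taken from CPUID leaf 1. A minimal sketch of how that number is derived, with hypothetical helper names: for family 6 (and 0xF) the extended-model field (EAX bits 19:16) is prepended to the model field (bits 7:4), so a signature such as 0x00050654 decodes to 0x55 = 85 and 0x000506E3 to 0x5E = 94.

```c
#include <stdint.h>

/* Family from CPUID.1:EAX; the extended family only applies to family 0xF. */
static unsigned int cpuid_family(uint32_t eax)
{
    unsigned int family = (eax >> 8) & 0xf;

    if (family == 0xf)
        family += (eax >> 20) & 0xff;        /* extended family */
    return family;
}

/* Display model: extended model (bits 19:16) becomes the high nibble
 * for families 6 and 0xF, matching the numbers used in intel_pmu_init(). */
static unsigned int cpuid_model(uint32_t eax)
{
    unsigned int family = (eax >> 8) & 0xf;
    unsigned int model = (eax >> 4) & 0xf;

    if (family == 0x6 || family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;   /* extended model */
    return model;
}
```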

[tip:perf/core] perf/x86/intel/rapl: Add missing Haswell model

2016-04-23 Thread tip-bot for Srinivas Pandruvada

Commit-ID:  e1089602a3bf3efd13d0ffc575f3e22213f009da
Gitweb: http://git.kernel.org/tip/e1089602a3bf3efd13d0ffc575f3e22213f009da
Author: Srinivas Pandruvada 
AuthorDate: Sun, 17 Apr 2016 08:43:29 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 13:46:45 +0200

perf/x86/intel/rapl: Add missing Haswell model

Add one missing Haswell model (70, Haswell GT3e).

Signed-off-by: Srinivas Pandruvada 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: b...@alien8.de
Cc: h...@zytor.com
Link: 
http://lkml.kernel.org/r/1460907809-11897-1-git-send-email-srinivas.pandruv...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/rapl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 70c93f9..1705c9d 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -718,6 +718,7 @@ static int __init rapl_pmu_init(void)
break;
case 60: /* Haswell */
case 69: /* Haswell-Celeron */
+   case 70: /* Haswell GT3e */
case 61: /* Broadwell */
case 71: /* Broadwell-H */
rapl_cntr_mask = RAPL_IDX_HSW;

[tip:locking/urgent] locking/lockdep: Fix ->irq_context calculation

2016-04-23 Thread tip-bot for Boqun Feng

Commit-ID:  c24697566298df04cac9913e0601501b5ee2b3f5
Gitweb: http://git.kernel.org/tip/c24697566298df04cac9913e0601501b5ee2b3f5
Author: Boqun Feng 
AuthorDate: Tue, 16 Feb 2016 13:57:40 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 13:53:03 +0200

locking/lockdep: Fix ->irq_context calculation

task_irq_context() returns the encoded irq_context of the task; the
return value is encoded in the same way as ->irq_context of held_lock.

Always return 0 if !(CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING).

Signed-off-by: Boqun Feng 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Josh Triplett 
Cc: Lai Jiangshan 
Cc: Linus Torvalds 
Cc: Mathieu Desnoyers 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: sasha.le...@oracle.com
Link: 
http://lkml.kernel.org/r/1455602265-16490-2-git-send-email-boqun.f...@gmail.com
Signed-off-by: Ingo Molnar 
---
 kernel/locking/lockdep.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index ed94109..beb06f6 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2932,6 +2932,11 @@ static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
return 1;
 }
 
+static inline unsigned int task_irq_context(struct task_struct *task)
+{
+   return 2 * !!task->hardirq_context + !!task->softirq_context;
+}
+
 static int separate_irq_context(struct task_struct *curr,
struct held_lock *hlock)
 {
@@ -2940,8 +2945,6 @@ static int separate_irq_context(struct task_struct *curr,
/*
 * Keep track of points where we cross into an interrupt context:
 */
-   hlock->irq_context = 2*(curr->hardirq_context ? 1 : 0) +
-   curr->softirq_context;
if (depth) {
struct held_lock *prev_hlock;
 
@@ -2973,6 +2976,11 @@ static inline int mark_irqflags(struct task_struct *curr,
return 1;
 }
 
+static inline unsigned int task_irq_context(struct task_struct *task)
+{
+   return 0;
+}
+
 static inline int separate_irq_context(struct task_struct *curr,
struct held_lock *hlock)
 {
@@ -3241,6 +3249,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
hlock->acquire_ip = ip;
hlock->instance = lock;
hlock->nest_lock = nest_lock;
+   hlock->irq_context = task_irq_context(curr);
hlock->trylock = trylock;
hlock->read = read;
hlock->check = check;
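A small user-space sketch of the encoding task_irq_context() produces. The struct is a hypothetical stand-in for the two task_struct fields; bit 1 records hardirq context and bit 0 softirq context, so the result is always in the range 0..3. The removed expression is included for comparison: assuming softirq_context could ever hold a value greater than 1 (it is a counter), the old form could alias the hardirq encoding, while the !! normalization cannot.

```c
/* Hypothetical stand-in for the task_struct fields that lockdep reads. */
struct fake_task {
    unsigned int hardirq_context;
    unsigned int softirq_context;
};

/* Same expression as the patch: bit 1 = hardirq, bit 0 = softirq. */
static inline unsigned int task_irq_context(const struct fake_task *task)
{
    return 2 * !!task->hardirq_context + !!task->softirq_context;
}

/* The expression the patch removes from separate_irq_context(). */
static inline unsigned int old_irq_context(const struct fake_task *task)
{
    return 2 * (task->hardirq_context ? 1 : 0) + task->softirq_context;
}
```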

[tip:locking/urgent] lockdep: Fix lock_chain::base size

2016-04-23 Thread tip-bot for Peter Zijlstra

Commit-ID:  75dd602a5198a6e5f75534db52b6e6fbaabb33d1
Gitweb: http://git.kernel.org/tip/75dd602a5198a6e5f75534db52b6e6fbaabb33d1
Author: Peter Zijlstra 
AuthorDate: Wed, 30 Mar 2016 11:36:59 +0200
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 13:53:03 +0200

lockdep: Fix lock_chain::base size

lock_chain::base is used to store an index into the chain_hlocks[]
array; however, that array contains more elements than a u16 can
index.

Change the lock_chain structure to use a bitfield to encode the data
it needs and add BUILD_BUG_ON() assertions to check the fields are
wide enough.

Also, for DEBUG_LOCKDEP, assert that we don't run out of elements of
that array, as that would wreck the collision detection.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alfredo Alvarez Fernandez 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Sedat Dilek 
Cc: Theodore Ts'o 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/20160330093659.gs3...@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
---
 include/linux/lockdep.h   |  8 +---
 kernel/locking/lockdep.c  | 24 +++-
 kernel/locking/lockdep_proc.c |  2 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index d026b19..d10ef06 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -196,9 +196,11 @@ struct lock_list {
  * We record lock dependency chains, so that we can cache them:
  */
 struct lock_chain {
-   u8  irq_context;
-   u8  depth;
-   u16 base;
+   /* see BUILD_BUG_ON()s in lookup_chain_cache() */
+   unsigned int    irq_context :  2,
+                   depth       :  6,
+                   base        : 24;
+   /* 4 byte hole */
struct hlist_node   entry;
u64 chain_key;
 };
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index beb06f6..78c1c0e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2176,15 +2176,37 @@ cache_hit:
chain->irq_context = hlock->irq_context;
i = get_first_held_lock(curr, hlock);
chain->depth = curr->lockdep_depth + 1 - i;
+
+   BUILD_BUG_ON((1UL << 24) <= ARRAY_SIZE(chain_hlocks));
+   BUILD_BUG_ON((1UL << 6)  <= ARRAY_SIZE(curr->held_locks));
+   BUILD_BUG_ON((1UL << 8*sizeof(chain_hlocks[0])) <= ARRAY_SIZE(lock_classes));
+
if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
chain->base = nr_chain_hlocks;
-   nr_chain_hlocks += chain->depth;
for (j = 0; j < chain->depth - 1; j++, i++) {
int lock_id = curr->held_locks[i].class_idx - 1;
chain_hlocks[chain->base + j] = lock_id;
}
chain_hlocks[chain->base + j] = class - lock_classes;
}
+
+   if (nr_chain_hlocks < MAX_LOCKDEP_CHAIN_HLOCKS)
+   nr_chain_hlocks += chain->depth;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+   /*
+* Important for check_no_collision().
+*/
+   if (unlikely(nr_chain_hlocks > MAX_LOCKDEP_CHAIN_HLOCKS)) {
+   if (debug_locks_off_graph_unlock())
+   return 0;
+
+   print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
+   dump_stack();
+   return 0;
+   }
+#endif
+
hlist_add_head_rcu(&chain->entry, hash_head);
debug_atomic_inc(chain_lookup_misses);
inc_chains();
diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index dbb61a3..a0f61ef 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -141,6 +141,8 @@ static int lc_show(struct seq_file *m, void *v)
int i;
 
if (v == SEQ_START_TOKEN) {
+   if (nr_chain_hlocks > MAX_LOCKDEP_CHAIN_HLOCKS)
+   seq_printf(m, "(buggered) ");
seq_printf(m, "all lock chains:\n");
return 0;
}
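A user-space sketch of why the u16 base overflowed and the 2/6/24-bit layout does not, using C11 _Static_assert in place of BUILD_BUG_ON(). The sizes are assumptions for illustration: MAX_LOCKDEP_CHAIN_HLOCKS is taken as 32768 * 5 = 163840 and MAX_LOCK_DEPTH as 48, which is what trees of this era appear to use; check lockdep_internals.h and sched.h in your own tree.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define DEMO_MAX_CHAIN_HLOCKS   (32768UL * 5)   /* 163840 entries */
#define DEMO_MAX_HELD_LOCKS     48UL            /* assumed MAX_LOCK_DEPTH */

/* New layout from the patch: 2 + 6 + 24 bits in one unsigned int. */
struct demo_lock_chain {
    unsigned int irq_context :  2,
                 depth       :  6,
                 base        : 24;
};

/* Old layout: a u16 index cannot reach every chain_hlocks[] slot. */
struct demo_old_lock_chain {
    uint8_t  irq_context;
    uint8_t  depth;
    uint16_t base;
};

/* C11 counterparts of the BUILD_BUG_ON()s added in lookup_chain_cache(). */
_Static_assert((1UL << 24) > DEMO_MAX_CHAIN_HLOCKS, "base too narrow");
_Static_assert((1UL << 6) > DEMO_MAX_HELD_LOCKS, "depth too narrow");
/* Documents the bug: 2^16 slots are fewer than the array holds. */
_Static_assert((1UL << 16) <= DEMO_MAX_CHAIN_HLOCKS, "u16 would suffice");
```

Storing the last valid index, 163839, in the old u16 field silently wraps it to 32767, which is exactly the kind of mis-indexing the patch removes.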

[tip:perf/core] perf/x86/intel: Add Goldmont CPU support

2016-04-23 Thread tip-bot for Kan Liang

Commit-ID:  8b92c3a78d40fb220dc5ab122e3274d1b126bfbb
Gitweb: http://git.kernel.org/tip/8b92c3a78d40fb220dc5ab122e3274d1b126bfbb
Author: Kan Liang 
AuthorDate: Fri, 15 Apr 2016 00:42:47 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:12:27 +0200

perf/x86/intel: Add Goldmont CPU support

Add perf core PMU support for Intel Goldmont CPU cores:

 - The init code is based on Silvermont.

 - There is a new cache event list, based on the Silvermont cache event list.

 - Goldmont has 32 LBR entries. It also uses the new LBRv6 format, which
   reports the cycle information in the upper 16 bits of LBR_TO.

 - It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS for precise cycles.
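A minimal sketch of splitting an LBRv6 LBR_TO value as the list above describes: the cycle count comes from the upper 16 bits, and the branch target from the lower 48 bits, sign-extended back to a canonical 64-bit address. The struct and function names are hypothetical, and the sign extension relies on the two's-complement conversion and arithmetic right shift that GCC and Clang provide (both are implementation-defined in ISO C).

```c
#include <stdint.h>

struct lbr_to_info {
    uint64_t target;    /* canonical branch target address */
    uint16_t cycles;    /* cycle count from bits 63:48 */
};

static struct lbr_to_info decode_lbr_to_v6(uint64_t to)
{
    struct lbr_to_info info;

    info.cycles = (uint16_t)(to >> 48);
    /* Move the 48-bit address to the top, then shift it back arithmetically
     * so bit 47 is replicated into bits 63:48. */
    info.target = (uint64_t)((int64_t)(to << 16) >> 16);
    return info;
}
```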

For details, please refer to the latest SDM058:

 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/1460706167-45320-1-git-send-email-kan.li...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/core.c | 157 +++
 arch/x86/events/intel/ds.c   |   6 ++
 arch/x86/events/intel/lbr.c  |  13 +++-
 arch/x86/events/perf_event.h |   2 +
 4 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index aff7988..92fda6b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1465,6 +1465,140 @@ static __initconst const u64 slm_hw_cache_event_ids
  },
 };
 
+static struct extra_reg intel_glm_extra_regs[] __read_mostly = {
+   /* must define OFFCORE_RSP_X first, see intel_fixup_er() */
+   INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x760005ffbfull, RSP_0),
+   INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x360005ffbfull, RSP_1),
+   EVENT_EXTRA_END
+};
+
+#define GLM_DEMAND_DATA_RD      BIT_ULL(0)
+#define GLM_DEMAND_RFO          BIT_ULL(1)
+#define GLM_ANY_RESPONSE        BIT_ULL(16)
+#define GLM_SNP_NONE_OR_MISS    BIT_ULL(33)
+#define GLM_DEMAND_READ         GLM_DEMAND_DATA_RD
+#define GLM_DEMAND_WRITE        GLM_DEMAND_RFO
+#define GLM_DEMAND_PREFETCH     (SNB_PF_DATA_RD|SNB_PF_RFO)
+#define GLM_LLC_ACCESS          GLM_ANY_RESPONSE
+#define GLM_SNP_ANY             (GLM_SNP_NONE_OR_MISS|SNB_NO_FWD|SNB_HITM)
+#define GLM_LLC_MISS            (GLM_SNP_ANY|SNB_NON_DRAM)
+
+static __initconst const u64 glm_hw_cache_event_ids
+   [PERF_COUNT_HW_CACHE_MAX]
+   [PERF_COUNT_HW_CACHE_OP_MAX]
+   [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+   [C(L1D)] = {
+       [C(OP_READ)] = {
+           [C(RESULT_ACCESS)]  = 0x81d0,   /* MEM_UOPS_RETIRED.ALL_LOADS */
+           [C(RESULT_MISS)]    = 0x0,
+       },
+       [C(OP_WRITE)] = {
+           [C(RESULT_ACCESS)]  = 0x82d0,   /* MEM_UOPS_RETIRED.ALL_STORES */
+           [C(RESULT_MISS)]    = 0x0,
+       },
+       [C(OP_PREFETCH)] = {
+           [C(RESULT_ACCESS)]  = 0x0,
+           [C(RESULT_MISS)]    = 0x0,
+       },
+   },
+   [C(L1I)] = {
+       [C(OP_READ)] = {
+           [C(RESULT_ACCESS)]  = 0x0380,   /* ICACHE.ACCESSES */
+           [C(RESULT_MISS)]    = 0x0280,   /* ICACHE.MISSES */
+       },
+       [C(OP_WRITE)] = {
+           [C(RESULT_ACCESS)]  = -1,
+           [C(RESULT_MISS)]    = -1,
+       },
+       [C(OP_PREFETCH)] = {
+           [C(RESULT_ACCESS)]  = 0x0,
+           [C(RESULT_MISS)]    = 0x0,
+       },
+   },
+   [C(LL)] = {
+       [C(OP_READ)] = {
+           [C(RESULT_ACCESS)]  = 0x1b7,    /* OFFCORE_RESPONSE */
+           [C(RESULT_MISS)]    = 0x1b7,    /* OFFCORE_RESPONSE */
+       },
+       [C(OP_WRITE)] = {
+           [C(RESULT_ACCESS)]  = 0x1b7,    /* OFFCORE_RESPONSE */
+           [C(RESULT_MISS)]    = 0x1b7,    /* OFFCORE_RESPONSE */
+       },
+       [C(OP_PREFETCH)] = {
+           [C(RESULT_ACCESS)]  = 0x1b7,    /* OFFCORE_RESPONSE */
+           [C(RESULT_MISS)]    = 0x1b7,    /* OFFCORE_RESPONSE */
+       },
+   },
+   [C(DTLB)] = {
+       [C(OP_READ)] = {
+           [C(RESULT_ACCESS)]  = 0x81d0,   /* MEM_UOPS_RETIRED.ALL_LOADS */

[tip:perf/core] perf/x86/intel: Add LBR filter support for Silvermont and Airmont CPUs

2016-04-23 Thread tip-bot for Kan Liang

Commit-ID:  f21d5adceb7f2660e5227569faed278f6fb2072e
Gitweb: http://git.kernel.org/tip/f21d5adceb7f2660e5227569faed278f6fb2072e
Author: Kan Liang 
AuthorDate: Fri, 15 Apr 2016 00:53:45 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:12:31 +0200

perf/x86/intel: Add LBR filter support for Silvermont and Airmont CPUs

LBR filtering is also supported on the Silvermont and Airmont
microarchitectures. The layout of MSR_LBR_SELECT is the same as Nehalem.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/1460706825-46163-1-git-send-email-kan.li...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/lbr.c  | 18 ++
 arch/x86/events/perf_event.h |  2 ++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 92fda6b..79b5943 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3581,7 +3581,7 @@ __init int intel_pmu_init(void)
memcpy(hw_cache_extra_regs, slm_hw_cache_extra_regs,
   sizeof(hw_cache_extra_regs));
 
-   intel_pmu_lbr_init_atom();
+   intel_pmu_lbr_init_slm();
 
x86_pmu.event_constraints = intel_slm_event_constraints;
x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index ad26ca7..317e29e 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1058,6 +1058,24 @@ void __init intel_pmu_lbr_init_atom(void)
pr_cont("8-deep LBR, ");
 }
 
+/* slm */
+void __init intel_pmu_lbr_init_slm(void)
+{
+   x86_pmu.lbr_nr     = 8;
+   x86_pmu.lbr_tos    = MSR_LBR_TOS;
+   x86_pmu.lbr_from   = MSR_LBR_CORE_FROM;
+   x86_pmu.lbr_to     = MSR_LBR_CORE_TO;
+
+   x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
+   x86_pmu.lbr_sel_map  = nhm_lbr_sel_map;
+
+   /*
+* SW branch filter usage:
+* - compensate for lack of HW filter
+*/
+   pr_cont("8-deep LBR, ");
+}
+
 /* Knights Landing */
 void intel_pmu_lbr_init_knl(void)
 {
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 8b78481..7d62a02 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -909,6 +909,8 @@ void intel_pmu_lbr_init_nhm(void);
 
 void intel_pmu_lbr_init_atom(void);
 
+void intel_pmu_lbr_init_slm(void);
+
 void intel_pmu_lbr_init_snb(void);
 
 void intel_pmu_lbr_init_hsw(void);

[tip:perf/core] perf/core: Add ::write_backward attribute to perf event

2016-04-23 Thread tip-bot for Wang Nan

Commit-ID:  9ecda41acb971ebd07c8fb35faf24005c0baea12
Gitweb: http://git.kernel.org/tip/9ecda41acb971ebd07c8fb35faf24005c0baea12
Author: Wang Nan 
AuthorDate: Tue, 5 Apr 2016 14:11:18 +
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:12:39 +0200

perf/core: Add ::write_backward attribute to perf event

This patch introduces the 'write_backward' bit to perf_event_attr, which
controls the direction of a ring buffer. When it is set, the corresponding
ring buffer is written from end to beginning. This feature is designed to
support reading from an overwritable ring buffer.

A ring buffer can be created by mapping a perf event fd. The kernel puts
event records into the ring buffer, and user tooling like perf fetches them
from the address returned by mmap(). To prevent races between the kernel and
tooling, they communicate with each other through 'head' and 'tail' pointers.
The kernel maintains the 'head' pointer, pointing it to the next free area
(the tail of the last record). Tooling maintains the 'tail' pointer, pointing
it to the tail of the last consumed record (a record that has already been
fetched). The kernel uses these two pointers to determine the available space
in the ring buffer, so that it does not overwrite unfetched records.

Mapping without 'PROT_WRITE' creates an overwritable ring buffer. Unlike a
normal ring buffer, tooling cannot maintain the 'tail' pointer because
writing is forbidden. Therefore, for this type of ring buffer, the kernel
overwrites old records unconditionally, working like a flight recorder. This
feature would be useful if reading from an overwritable ring buffer were as
easy as reading from a normal ring buffer. However, there is an obscure
problem.

The following figure shows a full overwritable ring buffer. In this
figure, the 'head' pointer points to the end of the last record, and a
long record 'E' is pending. For a normal ring buffer, a 'tail' pointer
would point to position (X), so the kernel would know there is no more
space in the ring buffer. However, for an overwritable ring buffer, the
kernel ignores the 'tail' pointer.

(X)         head
.             |
.             V
+--+---+--+--+---+
|AA|B.B|CC|DD|   |
+--+---+--+--+---+

Record 'A' is overwritten by event 'E':

  head
   |
   V
+--+---+---+--+--+---+
|.E|..A|B.B|CC|DD|E..|
+--+---+---+--+--+---+

Now tooling decides to read from this ring buffer. However, neither of the
two natural positions, 'head' and the start of the ring buffer, points to
the beginning of a record. Even though tooling can access the whole ring
buffer, it cannot find a position from which to start decoding.

The first attempt to solve this problem that I am aware of can be found in
[1]. It makes the kernel maintain the 'tail' pointer, updating it when the
ring buffer is half full. However, this approach adds overhead to the fast
path: test results show a 1% overhead [2]. In addition, this method
utilizes no more than 50% of the buffer for records.

Another attempt can be found in [3], which puts the size of an event at
the end of each record. This allows tooling to find records backward from
the 'head' pointer by reading the size of each record from its tail.
However, because of alignment requirements, it needs 8 bytes to record the
size of a record, which is a huge waste. Its performance is also poor,
because more data needs to be written. This approach also adds some extra
branch instructions to the fast path.

'write_backward' is a better solution to this problem.

The following figure shows the state of the overwritable ring buffer
when 'write_backward' is set, before overwriting:

   head
    |
    V
+---+--+--+---+--+
|   |DD|CC|B.B|AA|
+---+--+--+---+--+

and after overwriting:
                head
                  |
                  V
+---+--+--+---+---+--+
|..E|DD|CC|B.B|A..|E.|
+---+--+--+---+---+--+

In both situations, 'head' points to the beginning of the newest record.
From this record, tooling can iterate over the full ring buffer and fetch
records one by one.
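The reading scheme above can be modelled in a few lines of user-space C. This is a toy sketch under stated assumptions, not the perf mmap ABI: the buffer is 64 bytes, each record starts with a hypothetical 4-byte header {u16 size, u16 id}, and 'head' is a counter that only decreases. The reader starts at 'head' and walks toward older records, stopping at an unwritten slot or at the first record that would extend more than one buffer length past 'head', because that record's tail has already been overwritten.

```c
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE  64u    /* toy buffer size, must be a power of two */
#define HDR_SIZE 4u     /* hypothetical header: u16 size + u16 id */

struct rb {
    unsigned char data[RB_SIZE];
    uint64_t head;      /* only decreases; head & (RB_SIZE-1) = newest record */
};

static void rb_copy_in(struct rb *rb, uint64_t pos, const void *src, size_t len)
{
    const unsigned char *p = src;

    for (size_t i = 0; i < len; i++)
        rb->data[(pos + i) & (RB_SIZE - 1)] = p[i];
}

static void rb_copy_out(const struct rb *rb, uint64_t pos, void *dst, size_t len)
{
    unsigned char *p = dst;

    for (size_t i = 0; i < len; i++)
        p[i] = rb->data[(pos + i) & (RB_SIZE - 1)];
}

/* Writer: move head back by 'size' and store the record there, overwriting
 * old data unconditionally. Assumes HDR_SIZE <= size <= RB_SIZE. */
static void rb_write(struct rb *rb, uint16_t id, uint16_t size)
{
    uint16_t hdr[2] = { size, id };
    unsigned char fill = (unsigned char)id;

    rb->head -= size;
    rb_copy_in(rb, rb->head, hdr, sizeof(hdr));
    for (uint16_t i = HDR_SIZE; i < size; i++)
        rb_copy_in(rb, rb->head + i, &fill, 1);
}

/* Reader: newest record first, following each record's size field. */
static size_t rb_read_newest_first(const struct rb *rb, uint16_t *ids, size_t max)
{
    uint64_t off = 0;
    size_t n = 0;

    while (n < max && off + HDR_SIZE <= RB_SIZE) {
        uint16_t hdr[2];

        rb_copy_out(rb, rb->head + off, hdr, sizeof(hdr));
        /* size 0: never written; off + size > RB_SIZE: tail overwritten */
        if (hdr[0] < HDR_SIZE || off + hdr[0] > RB_SIZE)
            break;
        ids[n++] = hdr[1];
        off += hdr[0];
    }
    return n;
}
```

For example, writing records 1 to 4 (16, 24, 12 and 8 bytes) into an empty buffer and reading back yields 4, 3, 2, 1; once later writes wrap around, the oldest records drop off the end of the walk instead of corrupting it.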

The only limitation that needs to be considered is back-to-back reading.
Because user programs are non-deterministic, it is impossible to ensure
that the ring buffer stays stable during reading. Consider an extreme
situation: tooling is scheduled out after reading record 'D', then a burst
of events arrives and eats up the whole ring buffer (one or multiple
rounds). When the tooling process comes back, reading after 'D' is no
longer correct.

To prevent this problem, we need a way to ensure that the ring buffer
is stable during reading. ioctl(PERF_EVENT_IOC_PAUSE_OUTPUT) is
suggested because its overhead is lower than
ioctl(PERF_EVENT_IOC_ENABL

[tip:perf/core] perf/x86/intel/rapl: Support Skylake RAPL domains

2016-04-23 Thread tip-bot for Srinivas Pandruvada

Commit-ID:  dcee75b3b7f025cc6765e6c92ba0a4e59a4d25f4
Gitweb: http://git.kernel.org/tip/dcee75b3b7f025cc6765e6c92ba0a4e59a4d25f4
Author: Srinivas Pandruvada 
AuthorDate: Sun, 17 Apr 2016 15:03:00 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:13:36 +0200

perf/x86/intel/rapl: Support Skylake RAPL domains

Add Skylake client support for RAPL domains. In addition to the RAPL domains
in Broadwell clients, it supports the platform domain (aka PSys). The
PSys domain covers the entire SoC instead of just a CPU package. Unlike the
package domain, PSys support requires more than a processor-level
implementation: the other parts of the system need additional HW-level
signaling, which OEMs need to support. When it is not supported, the energy
counter register in the PSys domain returns 0.

Also correct an error in the comment for the GPU counter, which previously
labelled it as the DRAM counter.
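The comment in rapl.c describes these as free-running, read-only counters. A minimal sketch of turning two raw samples into joules, assuming the architectural layout (32-bit wrapping energy-status counters, and an energy unit of 1/2^ESU joules with ESU in bits 12:8 of MSR_RAPL_POWER_UNIT). The helper names are hypothetical and this is not the perf RAPL driver's code. A PSys counter that the platform does not support reads 0, so its delta, and therefore its reported energy, simply stays 0.

```c
#include <stdint.h>

/* Unsigned 32-bit subtraction absorbs a single counter wrap-around. */
static uint32_t rapl_delta(uint32_t prev, uint32_t now)
{
    return now - prev;
}

/* Energy Status Units: bits 12:8 of the (assumed) power-unit MSR value. */
static unsigned int rapl_esu(uint64_t power_unit_msr)
{
    return (unsigned int)((power_unit_msr >> 8) & 0x1f);
}

/* Raw counter ticks to joules: each tick is 1 / 2^esu J. */
static double rapl_joules(uint32_t delta, unsigned int esu)
{
    return (double)delta / (double)(1ULL << esu);
}
```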

Signed-off-by: Srinivas Pandruvada 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: b...@alien8.de
Cc: h...@zytor.com
Cc: jacob.jun@linux.intel.com
Cc: r...@rjwysocki.net
Link: 
http://lkml.kernel.org/r/1460930581-29748-2-git-send-email-srinivas.pandruv...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/rapl.c | 54 ++--
 arch/x86/include/asm/msr-index.h |  2 ++
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index c9b7489..26c7d7d 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -27,10 +27,14 @@
  *   event: rapl_energy_dram
  *perf code: 0x3
  *
- * dram counter: consumption of the builtin-gpu domain (client only)
+ * gpu counter: consumption of the builtin-gpu domain (client only)
  *   event: rapl_energy_gpu
  *perf code: 0x4
  *
+ *  psys counter: consumption of the builtin-psys domain (client only)
+ *   event: rapl_energy_psys
+ *perf code: 0x5
+ *
  * We manage those counters as free running (read-only). They may be
  * use simultaneously by other tools, such as turbostat.
  *
@@ -66,13 +70,16 @@ MODULE_LICENSE("GPL");
 #define INTEL_RAPL_RAM 0x3 /* pseudo-encoding */
 #define RAPL_IDX_PP1_NRG_STAT  3   /* gpu */
 #define INTEL_RAPL_PP1 0x4 /* pseudo-encoding */
+#define RAPL_IDX_PSYS_NRG_STAT 4   /* psys */
+#define INTEL_RAPL_PSYS 0x5 /* pseudo-encoding */
 
-#define NR_RAPL_DOMAINS 0x4
+#define NR_RAPL_DOMAINS 0x5
 static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
"pp0-core",
"package",
"dram",
"pp1-gpu",
+   "psys",
 };
 
 /* Clients have PP0, PKG */
@@ -91,6 +98,13 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
 1<

[tip:sched/core] sched/fair: Fix asym packing to select correct CPU

2016-04-23 Thread tip-bot for Srikar Dronamraju

Commit-ID:  1f621e028baf391f6684003e32e009bc934b750f
Gitweb: http://git.kernel.org/tip/1f621e028baf391f6684003e32e009bc934b750f
Author: Srikar Dronamraju 
AuthorDate: Wed, 6 Apr 2016 18:47:40 +0530
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:35 +0200

sched/fair: Fix asym packing to select correct CPU

When asymmetric packing is set in the sched_domain and the target CPU is
busy, update_sd_pick_busiest() may not select the busiest runqueue.
When the target CPU is busy, find_busiest_group() ignores the checks for
asym packing and may continue to load balance using the currently
selected, not-the-busiest runqueue as the source runqueue.
Selecting the busiest runqueue as the source when the target CPU is busy
should result in much better load balancing.

Also, when the target CPU is not busy and asymmetric packing is set in sd,
select the higher-numbered CPU as the source CPU for load balancing.

While doing this change, move the check to see if target CPU is busy
into check_asym_packing().
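To make the direction of the new comparison concrete, here is a simplified, hypothetical model of the SD_ASYM_PACKING decisions after this patch. Groups are reduced to their first CPU number and a busy flag; the real code works on struct sched_group and struct lb_env. The sketch also assumes the existing dst_cpu check in check_asym_packing(), which refuses to pull when the destination CPU is higher-numbered than the chosen source group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of a sched_group to what this decision needs. */
struct demo_group {
    int first_cpu;
    bool busy;
};

/* Patched tie-break: prefer the group whose first CPU is higher, so work
 * is moved off the highest-numbered CPUs first. */
static bool prefer_as_source(const struct demo_group *busiest,
                             const struct demo_group *sg)
{
    return busiest == NULL || busiest->first_cpu < sg->first_cpu;
}

/* Index of the group to pull from, or -1 when asym packing does not apply. */
static int asym_pick_source(const struct demo_group *groups, size_t n,
                            int dst_cpu, bool dst_busy)
{
    const struct demo_group *busiest = NULL;
    int idx = -1;

    if (dst_busy)
        return -1;      /* no ASYM_PACKING if the target CPU is already busy */

    for (size_t i = 0; i < n; i++) {
        if (!groups[i].busy)
            continue;
        if (prefer_as_source(busiest, &groups[i])) {
            busiest = &groups[i];
            idx = (int)i;
        }
    }
    if (busiest == NULL || dst_cpu > busiest->first_cpu)
        return -1;      /* only pack toward lower-numbered CPUs */
    return idx;
}
```

With four busy groups starting at CPUs 0, 8, 16 and 24 and an idle CPU 0 as the destination, the model picks the group at CPU 24; if the destination is busy, it returns -1.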

The performance benefit from this change decreases with
increasing load. However, there is a benefit in undercommit as well as
overcommit conditions.

1. Records per second, ebizzy (32 threads) on a 64-CPU POWER7 box (5 iterations)
4.6.0-rc2
Testcase: Min Max Avg  StdDev
  ebizzy:  5223767.00 10368236.00  7946971.00  1753094.76

4.6.0-rc2+asym-changes
Testcase: Min Max Avg  StdDev %Change
  ebizzy:  8617191.00 13872356.00 11383980.00  1783400.89 +24.78%

2. Records per second, ebizzy (64 threads) on a 64-CPU POWER7 box (5 iterations)
4.6.0-rc2
Testcase: Min Max Avg  StdDev
  ebizzy:  6497666.00 18399783.00 10818093.20  4051452.08

4.6.0-rc2+asym-changes
Testcase: Min Max Avg  StdDev %Change
  ebizzy:  7567365.00 19456937.00 11674063.60  4295407.48  +4.40%

3. Records per second, ebizzy (128 threads) on a 64-CPU POWER7 box (5 iterations)
4.6.0-rc2
Testcase: Min Max Avg  StdDev
  ebizzy: 37073983.00 40341911.00 38776241.80  1259766.82

4.6.0-rc2+asym-changes
Testcase: Min Max Avg  StdDev %Change
  ebizzy: 38030399.00 4178.00 39827404.40  1255001.86  +2.54%

Signed-off-by: Srikar Dronamraju 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Gautham R Shenoy 
Cc: Michael Neuling 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vaidyanathan Srinivasan 
Link: 
http://lkml.kernel.org/r/1459948660-16073-1-git-send-email-sri...@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8cc1c3..6e371f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6679,6 +6679,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
if (!(env->sd->flags & SD_ASYM_PACKING))
return true;
 
+   /* No ASYM_PACKING if target cpu is already busy */
+   if (env->idle == CPU_NOT_IDLE)
+   return true;
/*
 * ASYM_PACKING needs to move all the work to the lowest
 * numbered CPUs in the group, therefore mark all groups
@@ -6688,7 +6691,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
if (!sds->busiest)
return true;
 
-   if (group_first_cpu(sds->busiest) > group_first_cpu(sg))
+   /* Prefer to move from highest possible cpu's work */
+   if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
return true;
}
 
@@ -6834,6 +6838,9 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
if (!(env->sd->flags & SD_ASYM_PACKING))
return 0;
 
+   if (env->idle == CPU_NOT_IDLE)
+   return 0;
+
if (!sds->busiest)
return 0;
 
@@ -7026,8 +7033,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
busiest = &sds.busiest_stat;
 
/* ASYM feature bypasses nice load balance check */
-   if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
-   check_asym_packing(env, &sds))
+   if (check_asym_packing(env, &sds))
return sds.busiest;
 
/* There is no busy sibling group to pull tasks from */

[tip:sched/core] sched/fair: Call cpufreq hook in additional paths

2016-04-23 Thread tip-bot for Steve Muckle

Commit-ID:  a2c6c91f98247fef0fe75216d607812485aeb0df
Gitweb: http://git.kernel.org/tip/a2c6c91f98247fef0fe75216d607812485aeb0df
Author: Steve Muckle 
AuthorDate: Thu, 24 Mar 2016 15:26:07 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:40 +0200

sched/fair: Call cpufreq hook in additional paths

The cpufreq hook should be called any time the root CFS rq utilization
changes. This can occur when a task is switched to or from the fair
class, or a task moves between groups or CPUs, but these paths
currently do not call the cpufreq hook.

Fix this by adding the hook to attach_entity_load_avg() and
detach_entity_load_avg().

Suggested-by: Vincent Guittot 
Signed-off-by: Steve Muckle 
[ Added the .update_freq argument to update_cfs_rq_load_avg() to avoid a double cpufreq call. ]
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Byungchul Park 
Cc: Dietmar Eggemann 
Cc: Juri Lelli 
Cc: Michael Turquette 
Cc: Mike Galbraith 
Cc: Morten Rasmussen 
Cc: Patrick Bellasi 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1458858367-2831-1-git-send-email-smuc...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 73 ++---
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8155281..c328bd7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2874,13 +2874,41 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force) {}
 
 static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 
+static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
+{
+   struct rq *rq = rq_of(cfs_rq);
+   int cpu = cpu_of(rq);
+
+   if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
+   unsigned long max = rq->cpu_capacity_orig;
+
+   /*
+* There are a few boundary cases this might miss but it should
+* get called often enough that that should (hopefully) not be
+* a real problem -- added to that it only calls on the local
+* CPU, so if we enqueue remotely we'll miss an update, but
+* the next tick/schedule should update.
+*
+* It will not get called when we go idle, because the idle
+* thread is a different class (!fair), nor will the utilization
+* number include things like RT tasks.
+*
+* As is, the util number is not freq-invariant (we'd have to
+* implement arch_scale_freq_capacity() for that).
+*
+* See cpu_util().
+*/
+   cpufreq_update_util(rq_clock(rq),
+   min(cfs_rq->avg.util_avg, max), max);
+   }
+}
+
 /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
-static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
+static inline int
+update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 {
struct sched_avg *sa = &cfs_rq->avg;
-   struct rq *rq = rq_of(cfs_rq);
int decayed, removed_load = 0, removed_util = 0;
-   int cpu = cpu_of(rq);
 
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
@@ -2896,7 +2924,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
removed_util = 1;
}
 
-   decayed = __update_load_avg(now, cpu, sa,
+   decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
 
 #ifndef CONFIG_64BIT
@@ -2904,29 +2932,8 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
cfs_rq->load_last_update_time_copy = sa->last_update_time;
 #endif
 
-   if (cpu == smp_processor_id() && &rq->cfs == cfs_rq &&
-   (decayed || removed_util)) {
-   unsigned long max = rq->cpu_capacity_orig;
-
-   /*
-* There are a few boundary cases this might miss but it should
-* get called often enough that that should (hopefully) not be
-* a real problem -- added to that it only calls on the local
-* CPU, so if we enqueue remotely we'll miss an update, but
-* the next tick/schedule should update.
-*
-* It will not get called when we go idle, because the idle
-* thread is a different class (!fair), nor will the utilization
-* number include things like RT tasks.
-*
-* As is, the util number is not freq-invariant (we'd have to
-* implement arch_scale_freq_capacity() for that).
-*
-* See cpu_util().
-*/
-

[tip:sched/core] sched/fair: Move cpufreq hook to update_cfs_rq_load_avg()

2016-04-23 Thread tip-bot for Steve Muckle

Commit-ID:  21e96f88776deead303ecd30a17d1d7c2a1776e3
Gitweb: http://git.kernel.org/tip/21e96f88776deead303ecd30a17d1d7c2a1776e3
Author: Steve Muckle 
AuthorDate: Mon, 21 Mar 2016 17:21:07 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:35 +0200

sched/fair: Move cpufreq hook to update_cfs_rq_load_avg()

The cpufreq hook should be called whenever the root cfs_rq
utilization changes, so update_cfs_rq_load_avg() is a better
place for it: the current location is not reached from the
enqueue_entity() or update_blocked_averages() paths.
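A minimal sketch of the guard the hook sits behind, modelled with hypothetical toy_* types rather than the real struct rq/cfs_rq. The hook only fires for the root cfs_rq of the local CPU, and the reported utilization is clamped to cpu_capacity_orig, as in min(util_avg, max).

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for struct rq and struct cfs_rq. */
struct toy_rq {
	int cpu;
	unsigned long cpu_capacity_orig;   /* "max" handed to the hook */
};

struct toy_cfs_rq {
	struct toy_rq *rq;
	bool is_root;                      /* stands in for &rq->cfs == cfs_rq */
	unsigned long util_avg;
};

/*
 * Returns the utilization that would be reported to cpufreq, or -1 if the
 * hook is skipped (remote CPU or non-root group cfs_rq).
 */
static long toy_util_to_report(const struct toy_cfs_rq *cfs_rq, int this_cpu)
{
	unsigned long max = cfs_rq->rq->cpu_capacity_orig;

	if (cfs_rq->rq->cpu != this_cpu || !cfs_rq->is_root)
		return -1;
	/* min(util_avg, max): never report more than the CPU's capacity */
	return (long)(cfs_rq->util_avg < max ? cfs_rq->util_avg : max);
}
```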

Suggested-by: Vincent Guittot 
Signed-off-by: Steve Muckle 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Dietmar Eggemann 
Cc: Juri Lelli 
Cc: Michael Turquette 
Cc: Mike Galbraith 
Cc: Morten Rasmussen 
Cc: Patrick Bellasi 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1458606068-7476-1-git-send-email-smuc...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 50 ++
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e371f4..6df80d4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2878,7 +2878,9 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
struct sched_avg *sa = &cfs_rq->avg;
+   struct rq *rq = rq_of(cfs_rq);
int decayed, removed = 0;
+   int cpu = cpu_of(rq);
 
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
@@ -2893,7 +2895,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
}
 
-   decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
+   decayed = __update_load_avg(now, cpu, sa,
scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
 
 #ifndef CONFIG_64BIT
@@ -2901,28 +2903,6 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
cfs_rq->load_last_update_time_copy = sa->last_update_time;
 #endif
 
-   return decayed || removed;
-}
-
-/* Update task and its cfs_rq load average */
-static inline void update_load_avg(struct sched_entity *se, int update_tg)
-{
-   struct cfs_rq *cfs_rq = cfs_rq_of(se);
-   u64 now = cfs_rq_clock_task(cfs_rq);
-   struct rq *rq = rq_of(cfs_rq);
-   int cpu = cpu_of(rq);
-
-   /*
-* Track task load average for carrying it to new CPU after migrated, and
-* track group sched_entity load average for task_h_load calc in migration
-*/
-   __update_load_avg(now, cpu, &se->avg,
- se->on_rq * scale_load_down(se->load.weight),
- cfs_rq->curr == se, NULL);
-
-   if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
-   update_tg_load_avg(cfs_rq, 0);
-
if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
unsigned long max = rq->cpu_capacity_orig;
 
@@ -2943,8 +2923,30 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 * See cpu_util().
 */
cpufreq_update_util(rq_clock(rq),
-   min(cfs_rq->avg.util_avg, max), max);
+   min(sa->util_avg, max), max);
}
+
+   return decayed || removed;
+}
+
+/* Update task and its cfs_rq load average */
+static inline void update_load_avg(struct sched_entity *se, int update_tg)
+{
+   struct cfs_rq *cfs_rq = cfs_rq_of(se);
+   u64 now = cfs_rq_clock_task(cfs_rq);
+   struct rq *rq = rq_of(cfs_rq);
+   int cpu = cpu_of(rq);
+
+   /*
+* Track task load average for carrying it to new CPU after migrated, and
+* track group sched_entity load average for task_h_load calc in migration
+*/
+   __update_load_avg(now, cpu, &se->avg,
+ se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
+
+   if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
+   update_tg_load_avg(cfs_rq, 0);
 }
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)

[tip:sched/core] sched/fair: Do not call cpufreq hook unless util changed

2016-04-23 Thread tip-bot for Steve Muckle

Commit-ID:  41e0d37f7ac81297c07ba311e4ad39465b8c8295
Gitweb: http://git.kernel.org/tip/41e0d37f7ac81297c07ba311e4ad39465b8c8295
Author: Steve Muckle 
AuthorDate: Mon, 21 Mar 2016 17:21:08 -0700
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:36 +0200

sched/fair: Do not call cpufreq hook unless util changed

There's no reason to call the cpufreq hook if the root cfs_rq
utilization has not been modified.
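After this change the two conditions are decoupled. A minimal sketch using a hypothetical toy_update summary struct: the hook fires on (decayed || removed_util), while the return value that callers use to decide on update_tg_load_avg() stays (decayed || removed_load).

```c
#include <stdbool.h>

/* Hypothetical summary of what update_cfs_rq_load_avg() observed. */
struct toy_update {
	bool decayed;        /* __update_load_avg() reported a decay step */
	bool removed_load;   /* removed_load_avg was folded in */
	bool removed_util;   /* removed_util_avg was folded in */
};

/* Utilization may have changed: only then is the cpufreq hook worth calling. */
static bool toy_should_call_hook(struct toy_update u)
{
	return u.decayed || u.removed_util;
}

/* Return value that callers use to decide on update_tg_load_avg(). */
static bool toy_load_changed(struct toy_update u)
{
	return u.decayed || u.removed_load;
}
```

So removing only load no longer pokes cpufreq, and removing only util does not force a task-group load update.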

Signed-off-by: Steve Muckle 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Dietmar Eggemann 
Cc: Juri Lelli 
Cc: Michael Turquette 
Cc: Mike Galbraith 
Cc: Morten Rasmussen 
Cc: Patrick Bellasi 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Thomas Gleixner 
Cc: Vincent Guittot 
Link: http://lkml.kernel.org/r/1458606068-7476-2-git-send-email-smuc...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6df80d4..8155281 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2879,20 +2879,21 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
struct sched_avg *sa = &cfs_rq->avg;
struct rq *rq = rq_of(cfs_rq);
-   int decayed, removed = 0;
+   int decayed, removed_load = 0, removed_util = 0;
int cpu = cpu_of(rq);
 
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
sa->load_avg = max_t(long, sa->load_avg - r, 0);
sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
-   removed = 1;
+   removed_load = 1;
}
 
if (atomic_long_read(&cfs_rq->removed_util_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
sa->util_avg = max_t(long, sa->util_avg - r, 0);
sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+   removed_util = 1;
}
 
decayed = __update_load_avg(now, cpu, sa,
@@ -2903,7 +2904,8 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
cfs_rq->load_last_update_time_copy = sa->last_update_time;
 #endif
 
-   if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
+   if (cpu == smp_processor_id() && &rq->cfs == cfs_rq &&
+   (decayed || removed_util)) {
unsigned long max = rq->cpu_capacity_orig;
 
/*
@@ -2926,7 +2928,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
min(sa->util_avg, max), max);
}
 
-   return decayed || removed;
+   return decayed || removed_load;
 }
 
 /* Update task and its cfs_rq load average */

[tip:sched/core] sched/fair: Gather CPU load functions under a more conventional namespace

2016-04-23 Thread tip-bot for Frederic Weisbecker

Commit-ID:  cee1afce3053e7aa0793fbd5f2e845fa2cef9e33
Gitweb: http://git.kernel.org/tip/cee1afce3053e7aa0793fbd5f2e845fa2cef9e33
Author: Frederic Weisbecker 
AuthorDate: Wed, 13 Apr 2016 15:56:50 +0200
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:41 +0200

sched/fair: Gather CPU load functions under a more conventional namespace

The CPU load update related functions currently follow a weak naming
convention: they start with update_cpu_load_*(), which isn't ideal as
"update" is a very generic concept.

Since two of these functions are already public (and a third is to come),
that's reason enough to introduce a more conventional naming scheme. So
let's do the following rename instead:

update_cpu_load_*() -> cpu_load_update_*()

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E . McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1460555812-25375-2-git-send-email-fweis...@gmail.com
Signed-off-by: Ingo Molnar 
---
 Documentation/trace/ftrace.txt | 10 +-
 include/linux/sched.h  |  4 ++--
 kernel/sched/core.c|  2 +-
 kernel/sched/fair.c| 24 
 kernel/sched/sched.h   |  4 ++--
 kernel/time/tick-sched.c   |  2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297..9857606 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set.
   -0   3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
   -0   3dN.1   12us : ktime_get <-tick_nohz_idle_exit
   -0   3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
-  -0   3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
-  -0   3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
+  -0   3dN.1   13us : cpu_load_update_nohz <-tick_nohz_idle_exit
+  -0   3dN.1   13us : _raw_spin_lock <-cpu_load_update_nohz
   -0   3dN.1   13us : add_preempt_count <-_raw_spin_lock
-  -0   3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
-  -0   3dN.2   14us : sched_avg_update <-__update_cpu_load
-  -0   3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
+  -0   3dN.2   13us : __cpu_load_update <-cpu_load_update_nohz
+  -0   3dN.2   14us : sched_avg_update <-__cpu_load_update
+  -0   3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   -0   3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
   -0   3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
   -0   3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 13c1c1d..0b7f602 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void update_cpu_load_nohz(int active);
+extern void cpu_load_update_nohz(int active);
 #else
-static inline void update_cpu_load_nohz(int active) { }
+static inline void cpu_load_update_nohz(int active) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06efbb9..c98a268 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2917,7 +2917,7 @@ void scheduler_tick(void)
raw_spin_lock(&rq->lock);
update_rq_clock(rq);
curr->sched_class->task_tick(rq, curr, 0);
-   update_cpu_load_active(rq);
+   cpu_load_update_active(rq);
calc_global_load_tick(rq);
raw_spin_unlock(&rq->lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c328bd7..ecd81c4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4559,7 +4559,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 }
 
 /**
- * __update_cpu_load - update the rq->cpu_load[] statistics
+ * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
@@ -4594,7 +4594,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
  unsigned long pending_updates, int active)
 {
unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
@@ -4642,7 +4642,7 @@ static unsigned long weighted_cpuload(

[tip:sched/core] sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU load updates

2016-04-23 Thread tip-bot for Frederic Weisbecker

Commit-ID:  9fd81dd5ce0b12341c9f83346f8d32ac68bd3841
Gitweb: http://git.kernel.org/tip/9fd81dd5ce0b12341c9f83346f8d32ac68bd3841
Author: Frederic Weisbecker 
AuthorDate: Tue, 19 Apr 2016 17:36:51 +0200
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:42 +0200

sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU load updates

Some of the CPU load update code only concerns NO_HZ configs, but it is
built in all configurations. When NO_HZ isn't built, that code is harmless
but needlessly consumes some CPU and memory resources:

1) one useless field in struct rq
2) a jiffies timestamp recorded on every tick that is never used
   (cpu_load_update_periodic)
3) decay_load_missed() is called twice on every tick only to return
   immediately without doing anything, so in that configuration it is
   effectively dead code.

For pure optimization purposes, let's build the NO_HZ related code
conditionally.
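The pattern here is plain conditional compilation. A minimal sketch, with a hypothetical TOY_NO_HZ_COMMON switch standing in for CONFIG_NO_HZ_COMMON and a trimmed-down toy_rq that is not the real struct rq:

```c
/* Normally provided by the kernel's compiler headers; defined here so the sketch stands alone. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/* Hypothetical switch: defined models CONFIG_NO_HZ_COMMON=y; remove it for a periodic-tick build. */
#define TOY_NO_HZ_COMMON 1

/* Hypothetical, trimmed-down rq (not the real struct rq layout). */
struct toy_rq {
	unsigned long cpu_load;
#ifdef TOY_NO_HZ_COMMON
	unsigned long last_load_update_tick;  /* only the tickless catch-up code reads this */
#endif
};

/* Stand-in for jiffies; unused when TOY_NO_HZ_COMMON is not defined. */
static unsigned long __maybe_unused toy_jiffies = 1000;

static void toy_cpu_load_update_periodic(struct toy_rq *rq, unsigned long load)
{
#ifdef TOY_NO_HZ_COMMON
	/* Only NO_HZ builds need to remember when the last update happened. */
	rq->last_load_update_tick = toy_jiffies;
#endif
	rq->cpu_load = load;
}
```

In a build without the switch, the field and the per-tick store both disappear, and __maybe_unused keeps the now-unread variable from triggering a warning.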

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E . McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1461080211-16271-1-git-send-email-fweis...@gmail.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c  | 5 ++---
 kernel/sched/fair.c  | 9 +++--
 kernel/sched/sched.h | 6 --
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c98a268..71dffbb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7381,8 +7381,6 @@ void __init sched_init(void)
for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
rq->cpu_load[j] = 0;
 
-   rq->last_load_update_tick = jiffies;
-
 #ifdef CONFIG_SMP
rq->sd = NULL;
rq->rd = NULL;
@@ -7401,12 +7399,13 @@ void __init sched_init(void)
 
rq_attach_root(rq, &def_root_domain);
 #ifdef CONFIG_NO_HZ_COMMON
+   rq->last_load_update_tick = jiffies;
rq->nohz_flags = 0;
 #endif
 #ifdef CONFIG_NO_HZ_FULL
rq->last_sched_tick = 0;
 #endif
-#endif
+#endif /* CONFIG_SMP */
init_rq_hrtick(rq);
atomic_set(&rq->nr_iowait, 0);
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b70367a..b8a33ab 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4491,7 +4491,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
-
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * per rq 'load' arrray crap; XXX kill this.
  */
@@ -4557,6 +4557,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
}
return load;
 }
+#endif /* CONFIG_NO_HZ_COMMON */
 
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
@@ -4596,7 +4597,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
unsigned long pending_updates)
 {
-   unsigned long tickless_load = this_rq->cpu_load[0];
+   unsigned long __maybe_unused tickless_load = this_rq->cpu_load[0];
int i, scale;
 
this_rq->nr_load_updates++;
@@ -4609,6 +4610,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
/* scale is effectively 1 << i now, and >> i divides by scale */
 
old_load = this_rq->cpu_load[i];
+#ifdef CONFIG_NO_HZ_COMMON
old_load = decay_load_missed(old_load, pending_updates - 1, i);
if (tickless_load) {
old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
@@ -4619,6 +4621,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 */
old_load += tickless_load;
}
+#endif
new_load = this_load;
/*
 * Round up the averaging division if load is increasing. This
@@ -4731,8 +4734,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
/* See the mess around cpu_load_update_nohz(). */
this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 32d9e22..69da6fc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
#define CPU_LOAD_IDX_MAX 5
unsigned long cpu_load[CPU_LOAD_IDX_MAX];
-   unsigned long last_load_update_tick;
 #ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
+   unsigned long last_load_update_tick;
+#endif /* CONFIG_SMP */
u64 nohz_stamp;
unsigned long nohz_flags;
-#endif
+#endif /* CONFIG_NO_HZ_COMMON

[tip:sched/core] sched/fair: Correctly handle nohz ticks CPU load accounting

2016-04-23 Thread tip-bot for Frederic Weisbecker

Commit-ID:  1f41906a6fda1114debd3898668bd7ab6470ee41
Gitweb: http://git.kernel.org/tip/1f41906a6fda1114debd3898668bd7ab6470ee41
Author: Frederic Weisbecker 
AuthorDate: Wed, 13 Apr 2016 15:56:51 +0200
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:42 +0200

sched/fair: Correctly handle nohz ticks CPU load accounting

Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact, "nohz" or "dynticks" only means that we leave periodic
mode and try to minimize ticks as much as possible. The nohz subsystem
uses confusing terminology with the internal state "ts->tick_stopped",
which is also exposed through its public interface as
tick_nohz_tick_stopped(). This is a misnomer: the tick is reduced on a
best-effort basis rather than stopped. In the best case the tick is
indeed stopped, but there is no guarantee of that. If a timer needs to
fire one second later, a tick will fire while the CPU is in nohz mode,
and this is a very common scenario.

Now this confusion happens to be a problem for CPU load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When it is
called there anyway, the whole previous tickless load is ignored and the
function just records the load for the current tick, ignoring a
potentially long preceding idle period.

To solve this, we could account the current load for the whole previous
nohz period, but then we risk charging the load of a freshly enqueued
task for the entire nohz period.

So instead, let's record the dynticks load on nohz frame entry, so we
know what to account on nohz ticks, and then use that record to account
the tickless load on nohz ticks and at the end of the nohz frame.
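A simplified sketch of the arithmetic. It assumes the tickless load is passed in explicitly (in the kernel it is read from cpu_load[0]) and replaces the table-driven decay_load_missed() with a plain loop. Decaying cpu_load[i] over the missed updates treats them as zero-load samples. Adding back the undecayed tickless load and subtracting its decayed value instead treats them, approximately, as samples of the recorded tickless load. All toy_* names are hypothetical.

```c
/*
 * Hypothetical, simplified model of the per-rq cpu_load[] update with the
 * tickless-load correction. The real decay_load_missed() uses precomputed
 * tables; this loop applies the (1 - 1/2^idx) factor once per missed update.
 */
#define TOY_CPU_LOAD_IDX_MAX 5

static unsigned long toy_decay_load_missed(unsigned long load,
					   unsigned long missed, int idx)
{
	while (missed--)
		load -= load >> idx;  /* load *= (1 - 1/2^idx), rounding up */
	return load;
}

/*
 * tickless_load: the load recorded when the CPU entered nohz mode.
 * pending_updates: ticks since the last update; must be >= 1.
 */
static void toy_cpu_load_update(unsigned long cpu_load[TOY_CPU_LOAD_IDX_MAX],
				unsigned long this_load,
				unsigned long tickless_load,
				unsigned long pending_updates)
{
	unsigned long scale = 2;
	int i;

	cpu_load[0] = this_load;
	for (i = 1; i < TOY_CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load, new_load;

		old_load = toy_decay_load_missed(cpu_load[i], pending_updates - 1, i);
		if (tickless_load) {
			/*
			 * Account the missed updates as if each had sampled
			 * tickless_load instead of 0. Adding before subtracting
			 * avoids unsigned underflow, since the decayed value
			 * never exceeds tickless_load.
			 */
			old_load += tickless_load;
			old_load -= toy_decay_load_missed(tickless_load,
							  pending_updates - 1, i);
		}
		new_load = this_load;
		if (new_load > old_load)
			new_load += scale - 1;  /* round up when load rises */
		cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}
}
```

With a steady load of 100 recorded at nohz entry and 4 missed ticks, every entry stays at 100. With tickless_load = 0, cpu_load[1] drops to 54 instead.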

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E . McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1460555812-25375-3-git-send-email-fweis...@gmail.com
Signed-off-by: Ingo Molnar 
---
 include/linux/sched.h|  6 ++-
 kernel/sched/fair.c  | 97 +++-
 kernel/time/tick-sched.c |  9 +++--
 3 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0b7f602..d894f2d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void cpu_load_update_nohz(int active);
+extern void cpu_load_update_nohz_start(void);
+extern void cpu_load_update_nohz_stop(void);
 #else
-static inline void cpu_load_update_nohz(int active) { }
+static inline void cpu_load_update_nohz_start(void) { }
+static inline void cpu_load_update_nohz_stop(void) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ecd81c4..b70367a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4563,7 +4563,6 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
- * @active: !0 for NOHZ_FULL
  *
  * Update rq->cpu_load[] statistics. This function is usually called every
  * scheduler tick (TICK_NSEC).
@@ -4592,12 +4591,12 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  *   load[i]_n = (1 - 1/2^i)^n * load[i]_0
  *
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
- * term. See the @active paramter.
+ * term.
  */
-static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
- unsigned long pending_updates, int active)
+static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
+   unsigned long pending_updates)
 {
-   unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
+   unsigned long tickless_load = this_rq->cpu_load[0];
int i, scale;
 
this_rq->nr_load_updates++;
@@ -4642,10 +4641,23 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __cpu_load_update_nohz(struct rq *this_rq,
-  unsigned long curr_jiffies,
-  unsigned long load,
-  int active)
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *

[tip:perf/core] perf trace: Fix build when DWARF unwind isn't available

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  ccd62a896ffe3dbd60f3b7570a2b74e4fe030ed6
Gitweb: http://git.kernel.org/tip/ccd62a896ffe3dbd60f3b7570a2b74e4fe030ed6
Author: Arnaldo Carvalho de Melo 
AuthorDate: Sat, 16 Apr 2016 09:36:32 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Sat, 16 Apr 2016 09:44:28 -0300

perf trace: Fix build when DWARF unwind isn't available

The variable is initialized and then conditionally set to a different
value, but it is not used when DWARF unwinding is not available. Bummer,
write 1000 times: "Run make -C tools/perf build-test"...

  builtin-trace.c: In function ‘cmd_trace’:
  builtin-trace.c:3112:6: error: variable ‘max_stack_user_set’ set but not used [-Werror=unused-but-set-variable]
    bool max_stack_user_set = true;
         ^
  cc1: all warnings being treated as errors

Fix it by marking it as __maybe_unused.
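A minimal sketch of the warning and the fix. A hypothetical TOY_HAVE_DWARF_UNWIND switch stands in for the build's DWARF unwind support, and the default stack depth is made up. With the switch undefined, the flag is set but never read, which is exactly what -Wunused-but-set-variable reports. __maybe_unused is defined here as GCC's unused attribute, as the tools headers do, and silences the warning.

```c
#include <stdbool.h>

/* The tools/kernel headers provide this; defined here so the sketch stands alone. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/* Hypothetical feature switch; leave undefined to model a build without DWARF unwind. */
/* #define TOY_HAVE_DWARF_UNWIND 1 */

static int toy_pick_max_stack(int user_value, bool *use_dwarf)
{
	/* Without __maybe_unused, -Werror would fail here when the switch is off. */
	bool __maybe_unused max_stack_user_set = true;
	int max_stack = user_value;

	*use_dwarf = false;
	if (user_value < 0) {            /* no --max-stack given */
		max_stack_user_set = false;
		max_stack = 127;             /* hypothetical default */
	}
#ifdef TOY_HAVE_DWARF_UNWIND
	/* Only this path reads the flag. */
	*use_dwarf = max_stack_user_set;
#endif
	return max_stack;
}
```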

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Fixes: 056149932602 ("perf trace: Make --(min,max}-stack imply "--call-graph dwarf"")
Link: http://lkml.kernel.org/n/tip-85r40c5hhv6jnmph77l1h...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 026ec0c..0e3c1ce 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3109,7 +3109,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
"per thread proc mmap processing timeout in ms"),
OPT_END()
};
-   bool max_stack_user_set = true;
+   bool __maybe_unused max_stack_user_set = true;
bool mmap_pages_user_set = true;
const char * const trace_subcommands[] = { "record", NULL };
int err;

[tip:sched/core] sched/deadline: Fix a bug in dl_overflow()

2016-04-23 Thread tip-bot for Xunlei Pang

Commit-ID:  fec148c000d0f9ac21679601722811eb60b4cc52
Gitweb: http://git.kernel.org/tip/fec148c000d0f9ac21679601722811eb60b4cc52
Author: Xunlei Pang 
AuthorDate: Thu, 14 Apr 2016 20:19:28 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 23 Apr 2016 14:20:43 +0200

sched/deadline: Fix a bug in dl_overflow()

I got a large negative dl_b->total_bw during my deadline tests.

# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw   : -97900

Something unusual must have happened.

After some digging, I finally noticed that when a deadline task is
changed to normal (CFS) and then immediately changed back to deadline,
we get the wrong dl_bw->total_bw after the task dies.

The root cause is in dl_overflow(), which has:
if (new_bw == p->dl.dl_bw)
return 0;

1) When a deadline task is changed to a !deadline task, it starts the
   dl timer in switched_from_dl() and retains its previous deadline
   parameters until the timer expires.

2) If we change it back to deadline with the same bandwidth parameters
   before the timer expires, it still carries the old bandwidth even
   though it is not a deadline task, so dl_overflow() simply returns
   success without updating the accounting, and dl_bw->total_bw ends up
   wrong.

The solution is simple: if @p is not a deadline task, don't return early.
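A toy replay of the sequence above, under heavy simplification: admission control (__dl_overflow()) and the timer are left out, the bandwidth is an abstract number, and all toy_* names are hypothetical. The fixed flag toggles the added task_has_dl_policy() check.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the accounting in dl_overflow(). */
struct toy_task {
	bool has_dl_policy;      /* stands in for task_has_dl_policy(p) */
	unsigned long dl_bw;     /* stands in for p->dl.dl_bw */
};

/* Returns 0 on success; admission control is omitted. */
static int toy_dl_overflow(long *total_bw, const struct toy_task *p,
			   bool new_is_dl, unsigned long new_bw, bool fixed)
{
	if (!new_is_dl)
		new_bw = 0;
	/* The fix: only skip the update if @p really is a deadline task. */
	if (new_bw == p->dl_bw && (!fixed || p->has_dl_policy))
		return 0;
	if (new_is_dl && !p->has_dl_policy) {
		*total_bw += (long)new_bw;
		return 0;
	}
	if (new_is_dl && p->has_dl_policy) {
		*total_bw -= (long)p->dl_bw;
		*total_bw += (long)new_bw;
		return 0;
	}
	if (!new_is_dl && p->has_dl_policy) {
		*total_bw -= (long)p->dl_bw;
		return 0;
	}
	return -1;
}

static long toy_run_scenario(bool fixed)
{
	long total_bw = 0;
	struct toy_task p = { false, 0 };

	/* 1) become a deadline task with bandwidth 100 */
	toy_dl_overflow(&total_bw, &p, true, 100, fixed);
	p.has_dl_policy = true;
	p.dl_bw = 100;

	/* 2) switch to CFS; dl_bw is retained until the (unmodelled) timer fires */
	toy_dl_overflow(&total_bw, &p, false, 0, fixed);
	p.has_dl_policy = false;

	/* 3) switch straight back with the same bandwidth */
	toy_dl_overflow(&total_bw, &p, true, 100, fixed);
	p.has_dl_policy = true;
	p.dl_bw = 100;

	/* 4) the task exits and its bandwidth is released */
	total_bw -= (long)p.dl_bw;
	return total_bw;
}
```

Without the check, step 3 returns early and step 4 drives total_bw negative, matching the symptom in /proc/sched_debug.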

Signed-off-by: Xunlei Pang 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Juri Lelli 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1460636368-1993-1-git-send-email-xlp...@redhat.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 71dffbb..9d84d60 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2378,7 +2378,8 @@ static int dl_overflow(struct task_struct *p, int policy,
u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
int cpus, err = -1;
 
-   if (new_bw == p->dl.dl_bw)
+   /* !deadline task may carry old deadline bandwidth */
+   if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
return 0;
 
/*

[tip:perf/core] perf script: Check sample->callchain before using it

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  922315210b8007a26374e30712813b714af71cac
Gitweb: http://git.kernel.org/tip/922315210b8007a26374e30712813b714af71cac
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 18 Apr 2016 11:31:46 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 18 Apr 2016 11:31:46 -0300

perf script: Check sample->callchain before using it

Found by code inspection while looking at thread__resolve_callchain()
callsites: one checked sample->callchain, the other didn't.
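A minimal sketch of why the added operand matters. It uses hypothetical toy_* stand-ins and assumes, as the fix implies, that the resolver dereferences sample->callchain. Short-circuit evaluation skips the resolver when no callchain was sampled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for perf's sample structures. */
struct toy_ip_callchain {
	unsigned long nr;
	unsigned long ips[8];
};

struct toy_sample {
	struct toy_ip_callchain *callchain;  /* NULL when no callchain was sampled */
};

/* Stand-in for thread__resolve_callchain(): it dereferences the callchain. */
static int toy_resolve_callchain(const struct toy_sample *sample)
{
	return sample->callchain->nr > 0 ? 0 : -1;
}

/* True if a resolved callchain cursor would be used for printing. */
static bool toy_use_cursor(bool use_callchain, const struct toy_sample *sample)
{
	/* && stops at the NULL check, so the resolver never sees NULL. */
	return use_callchain && sample->callchain != NULL &&
	       toy_resolve_callchain(sample) == 0;
}
```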

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-6r8i2afd3523thuuaxl39...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 0e93282b..5099740 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -791,7 +791,7 @@ static void process_event(struct perf_script *script,
if (PRINT_FIELD(IP)) {
struct callchain_cursor *cursor = NULL, cursor_callchain;
 
-   if (symbol_conf.use_callchain &&
+   if (symbol_conf.use_callchain && sample->callchain &&
thread__resolve_callchain(al->thread, &cursor_callchain, evsel,
  sample, NULL, NULL, scripting_max_stack) == 0)
cursor = &cursor_callchain;

[tip:perf/core] perf evsel: Add missing class prefix to has_branch_stack method

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  acf2abbd0b7fcc6325e9690a8a32ee924c827f70
Gitweb: http://git.kernel.org/tip/acf2abbd0b7fcc6325e9690a8a32ee924c827f70
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 18 Apr 2016 10:35:03 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 18 Apr 2016 11:17:09 -0300

perf evsel: Add missing class prefix to has_branch_stack method

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-5i07ivw1yjsweb7gztr25...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/evsel.h   | 2 +-
 tools/perf/util/machine.c | 2 +-
 tools/perf/util/session.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index b993218..8a644fe 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -420,7 +420,7 @@ for ((_evsel) = list_entry((_leader)->node.next, struct perf_evsel, node);  \
  (_evsel) && (_evsel)->leader == (_leader);  \
  (_evsel) = list_entry((_evsel)->node.next, struct perf_evsel, node))
 
-static inline bool has_branch_callstack(struct perf_evsel *evsel)
+static inline bool perf_evsel__has_branch_callstack(const struct perf_evsel *evsel)
 {
return evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
 }
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0c4dabc..52b51e0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1808,7 +1808,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
callchain_cursor_reset(cursor);
 
-   if (has_branch_callstack(evsel)) {
+   if (perf_evsel__has_branch_callstack(evsel)) {
err = resolve_lbr_callchain_sample(thread, cursor, sample, parent,
   root_al, max_stack);
if (err)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ca1827c..2335b28 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -907,7 +907,7 @@ static void callchain__printf(struct perf_evsel *evsel,
unsigned int i;
struct ip_callchain *callchain = sample->callchain;
 
-   if (has_branch_callstack(evsel))
+   if (perf_evsel__has_branch_callstack(evsel))
callchain__lbr_callstack_printf(sample);
 
printf("... FP chain: nr:%" PRIu64 "\n", callchain->nr);
@@ -1081,7 +1081,7 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
if (sample_type & PERF_SAMPLE_CALLCHAIN)
callchain__printf(evsel, sample);
 
-   if ((sample_type & PERF_SAMPLE_BRANCH_STACK) && !has_branch_callstack(evsel))
+   if ((sample_type & PERF_SAMPLE_BRANCH_STACK) && !perf_evsel__has_branch_callstack(evsel))
branch_stack__printf(sample);
 
if (sample_type & PERF_SAMPLE_REGS_USER)

[tip:perf/core] perf report: Use callchain_param.enabled instead of tool specific knob

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  1cc83815d5fdb40a7d06c3f9871134a10e5ffa79
Gitweb: http://git.kernel.org/tip/1cc83815d5fdb40a7d06c3f9871134a10e5ffa79
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 18 Apr 2016 11:54:31 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 18 Apr 2016 12:26:25 -0300

perf report: Use callchain_param.enabled instead of tool specific knob

We have callchain_param.enabled, so no need to have something just for
'perf report' to do the same thing.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-wbeisubpualwogwi5u8ut...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-report.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 160ea23..1d5be0b 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -47,7 +47,6 @@ struct report {
struct perf_tool    tool;
struct perf_session *session;
bool    use_tui, use_gtk, use_stdio;
-   bool    dont_use_callchains;
bool    show_full_info;
bool    show_threads;
bool    inverted_callchain;
@@ -247,7 +246,7 @@ static int report__setup_sample_type(struct report *rep)
  "you call 'perf record' without -g?\n");
return -1;
}
-   } else if (!rep->dont_use_callchains &&
+   } else if (!callchain_param.enabled &&
   callchain_param.mode != CHAIN_NONE &&
   !symbol_conf.use_callchain) {
symbol_conf.use_callchain = true;
@@ -599,13 +598,15 @@ static int __cmd_report(struct report *rep)
 static int
 report_parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 {
-   struct report *rep = (struct report *)opt->value;
+   struct callchain_param *callchain = opt->value;
 
+   callchain->enabled = !unset;
/*
 * --no-call-graph
 */
if (unset) {
-   rep->dont_use_callchains = true;
+   symbol_conf.use_callchain = false;
+   callchain->mode = CHAIN_NONE;
return 0;
}
 
@@ -734,7 +735,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
   "regex filter to identify parent, see: '--sort parent'"),
OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
"Only display entries with parent-match"),
-   OPT_CALLBACK_DEFAULT('g', "call-graph", &report,
+   OPT_CALLBACK_DEFAULT('g', "call-graph", &callchain_param,
             "print_type,threshold[,print_limit],order,sort_key[,branch],value",
 report_callchain_help, &report_parse_callchain_opt,
 callchain_default_opt),
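After this change the -g/--call-graph callback receives &callchain_param through opt->value and records --no-call-graph as enabled = false, instead of setting a report-specific flag. A minimal sketch of that callback shape follows; struct sketch_option and struct sketch_callchain_param are hypothetical stand-ins for perf's struct option and struct callchain_param, and the mode picked when the option is given is an assumption of the sketch (perf parses arg at that point).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins; perf's real structs have more fields. */
enum sketch_chain_mode { SKETCH_CHAIN_NONE, SKETCH_CHAIN_GRAPH_REL };

struct sketch_callchain_param {
	bool enabled;
	enum sketch_chain_mode mode;
};

struct sketch_option {
	void *value;	/* object the callback updates, e.g. &callchain_param */
};

/*
 * Simplified -g/--call-graph callback. 'unset' is nonzero for
 * --no-call-graph. Parsing of 'arg' is omitted in this sketch.
 */
static int sketch_parse_callchain_opt(const struct sketch_option *opt,
				      const char *arg, int unset)
{
	struct sketch_callchain_param *callchain = opt->value;

	(void)arg;
	callchain->enabled = !unset;
	if (unset) {
		callchain->mode = SKETCH_CHAIN_NONE;
		return 0;
	}
	/* Assumed default for the sketch; perf parses 'arg' here instead. */
	callchain->mode = SKETCH_CHAIN_GRAPH_REL;
	return 0;
}
```

Because the option table hands the callback a pointer to the parameters themselves, the callback no longer needs to know about struct report at all.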

[tip:perf/core] perf callchain: Set callchain_param.enabled when parsing --call-graph

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  30234f0925c1deeb472b579b57a9f50791157c58
Gitweb: http://git.kernel.org/tip/30234f0925c1deeb472b579b57a9f50791157c58
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 18 Apr 2016 11:53:07 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 18 Apr 2016 11:53:07 -0300

perf callchain: Set callchain_param.enabled when parsing --call-graph

Trying to move in the direction of using callchain_param for all
callchain parameters, eventually ditching them from symbol_conf.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-kixllia6r26mz45ng056z...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/callchain.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 2b4ceaf..aa248dc 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -109,6 +109,7 @@ __parse_callchain_report_opt(const char *arg, bool allow_record_opt)
bool record_opt_set = false;
bool try_stack_size = false;
 
+   callchain_param.enabled = true;
symbol_conf.use_callchain = true;
 
if (!arg)
@@ -117,6 +118,7 @@ __parse_callchain_report_opt(const char *arg, bool allow_record_opt)
while ((tok = strtok((char *)arg, ",")) != NULL) {
if (!strncmp(tok, "none", strlen(tok))) {
callchain_param.mode = CHAIN_NONE;
+   callchain_param.enabled = false;
symbol_conf.use_callchain = false;
return 0;
}
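__parse_callchain_report_opt() now sets callchain_param.enabled before walking the comma-separated tokens and clears it again when it sees "none". A sketch of that flow with a hypothetical stand-in struct; unlike the original, which casts away const from arg, it takes a writable buffer because strtok() modifies its input. As in the original, strncmp(tok, "none", strlen(tok)) accepts any prefix of "none", so "no" also disables callchains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two callchain_param effects sketched here. */
struct sketch_callchain_param {
	bool enabled;
	bool mode_none;	/* stands in for mode == CHAIN_NONE */
};

/* 'buf' must be writable: strtok() replaces each ',' with '\0'. */
static int sketch_parse_report_opt(struct sketch_callchain_param *p, char *buf)
{
	char *tok;

	p->enabled = true;	/* any --call-graph argument enables callchains */
	p->mode_none = false;
	if (!buf)
		return 0;

	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		/* Prefix match, as in perf: "n", "no", "non", "none" all match. */
		if (!strncmp(tok, "none", strlen(tok))) {
			p->mode_none = true;
			p->enabled = false;
			return 0;
		}
		/* Other tokens (print_type, threshold, order, ...) omitted. */
	}
	return 0;
}
```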

[tip:perf/core] perf tools: Ditch record_opts.callgraph_set

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  2ddd5c049e71dd8551c268e7386fefeb7495e988
Gitweb: http://git.kernel.org/tip/2ddd5c049e71dd8551c268e7386fefeb7495e988
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 18 Apr 2016 12:09:08 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 18 Apr 2016 12:26:27 -0300

perf tools: Ditch record_opts.callgraph_set

We have callchain_param.enabled for that.
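With callgraph_set gone, a plain -g only has to set callchain_param.enabled and, if no unwinding mode was chosen yet, fall back to frame pointers, which is what record_callchain_opt() does in the diff that follows. A sketch of that logic using hypothetical stand-in types; the enum constant names mirror perf's, but these are not perf's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins mirroring the names used in the patch. */
enum sketch_record_mode { CALLCHAIN_NONE, CALLCHAIN_FP, CALLCHAIN_DWARF };

struct sketch_callchain_param {
	bool enabled;
	enum sketch_record_mode record_mode;
};

/* Plain -g: enable callchains; default to frame pointers if no mode was set. */
static int sketch_record_callchain_opt(struct sketch_callchain_param *callchain)
{
	callchain->enabled = true;
	if (callchain->record_mode == CALLCHAIN_NONE)
		callchain->record_mode = CALLCHAIN_FP;
	return 0;
}
```

A mode chosen earlier (for example via --call-graph dwarf) is left untouched.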

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-silwqjc2t25ls42dsvg28...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-record.c | 14 ++
 tools/perf/builtin-top.c    | 13 ++---
 tools/perf/builtin-trace.c  |  8 
 tools/perf/perf.h           |  1 -
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5b4758a..bd95933 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -946,7 +946,6 @@ int record_opts__parse_callchain(struct record_opts *record,
 const char *arg, bool unset)
 {
int ret;
-   record->callgraph_set = true;
callchain->enabled = !unset;
 
/* --no-call-graph */
@@ -978,15 +977,14 @@ int record_callchain_opt(const struct option *opt,
 const char *arg __maybe_unused,
 int unset __maybe_unused)
 {
-   struct record_opts *record = (struct record_opts *)opt->value;
+   struct callchain_param *callchain = opt->value;
 
-   record->callgraph_set = true;
-   callchain_param.enabled = true;
+   callchain->enabled = true;
 
-   if (callchain_param.record_mode == CALLCHAIN_NONE)
-   callchain_param.record_mode = CALLCHAIN_FP;
+   if (callchain->record_mode == CALLCHAIN_NONE)
+   callchain->record_mode = CALLCHAIN_FP;
 
-   callchain_debug(&callchain_param);
+   callchain_debug(callchain);
return 0;
 }
 
@@ -1224,7 +1222,7 @@ struct option __record_options[] = {
 record__parse_mmap_pages),
OPT_BOOLEAN(0, "group", &record.opts.group,
"put the counters into a counter group"),
-   OPT_CALLBACK_NOOPT('g', NULL, &record.opts,
+   OPT_CALLBACK_NOOPT('g', NULL, &callchain_param,
   NULL, "enables call-graph recording" ,
   &record_callchain_opt),
OPT_CALLBACK(0, "call-graph", &record.opts,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 8846df0..f0cfdf3 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1045,18 +1045,17 @@ callchain_opt(const struct option *opt, const char *arg, int unset)
 static int
 parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 {
-   struct record_opts *record = (struct record_opts *)opt->value;
+   struct callchain_param *callchain = opt->value;
 
-   record->callgraph_set = true;
-   callchain_param.enabled = !unset;
-   callchain_param.record_mode = CALLCHAIN_FP;
+   callchain->enabled = !unset;
+   callchain->record_mode = CALLCHAIN_FP;
 
/*
 * --no-call-graph
 */
if (unset) {
symbol_conf.use_callchain = false;
-   callchain_param.record_mode = CALLCHAIN_NONE;
+   callchain->record_mode = CALLCHAIN_NONE;
return 0;
}
 
@@ -1162,10 +1161,10 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
   "output field(s): overhead, period, sample plus all of sort keys"),
OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
"Show a column with the number of samples"),
-   OPT_CALLBACK_NOOPT('g', NULL, &top.record_opts,
+   OPT_CALLBACK_NOOPT('g', NULL, &callchain_param,
   NULL, "enables call-graph recording and display",
   &callchain_opt),
-   OPT_CALLBACK(0, "call-graph", &top.record_opts,
+   OPT_CALLBACK(0, "call-graph", &callchain_param,
             "record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]",
 top_callchain_help, &parse_callchain_opt),
OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 0e3c1ce..5e2614b 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2457,7 +2457,7 @@ static int trace__add_syscall_newtp(struct trace *trace)
perf_evlist__add(evlist, sys_enter);
perf_evlist__add(evlist, sys_exit);
 
-   if (trace->opts.callgraph_set && !trace->kernel_syscallchains) {
+   if (callchain_param.enabled && !trace->kernel_syscallchains) {
/*
 * We're interested only in the user space callchain
 *

[tip:perf/core] perf hists browser: Fold two consecutive symbol_conf.use_callchain ifs

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  1b6b678ecfb73724914a8b12d57909a4c514a9bd
Gitweb: http://git.kernel.org/tip/1b6b678ecfb73724914a8b12d57909a4c514a9bd
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 18 Apr 2016 12:24:41 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 18 Apr 2016 12:26:27 -0300

perf hists browser: Fold two consecutive symbol_conf.use_callchain ifs

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-u701i6qpecgm9jiat52i8...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/ui/browsers/hists.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e70df2e..6a46819 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1896,11 +1896,10 @@ static int hist_browser__fprintf_entry(struct hist_browser *browser,
bool first = true;
int ret;
 
-   if (symbol_conf.use_callchain)
+   if (symbol_conf.use_callchain) {
folded_sign = hist_entry__folded(he);
-
-   if (symbol_conf.use_callchain)
printed += fprintf(fp, "%c ", folded_sign);
+   }
 
hists__for_each_format(browser->hists, fmt) {
if (perf_hpp__should_skip(fmt, he->hists))

[tip:perf/core] perf top: Use callchain_param.enabled instead of symbol_conf.use_callchain

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  e3815264a6c57147f8b5639536b1df3c98244642
Gitweb: http://git.kernel.org/tip/e3815264a6c57147f8b5639536b1df3c98244642
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 18 Apr 2016 12:30:16 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 18 Apr 2016 12:30:16 -0300

perf top: Use callchain_param.enabled instead of symbol_conf.use_callchain

One more step in the direction of using just callchain_param for
callchain parameters.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-3b1o9kb2dc94zldz0klck...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-top.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index f0cfdf3..c130a11 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -917,15 +917,15 @@ out_err:
return -1;
 }
 
-static int perf_top__setup_sample_type(struct perf_top *top __maybe_unused)
+static int callchain_param__setup_sample_type(struct callchain_param *callchain)
 {
if (!sort__has_sym) {
-   if (symbol_conf.use_callchain) {
+   if (callchain->enabled) {
ui__error("Selected -g but \"sym\" not present in --sort/-s.");
return -EINVAL;
}
-   } else if (callchain_param.mode != CHAIN_NONE) {
-   if (callchain_register_param(&callchain_param) < 0) {
+   } else if (callchain->mode != CHAIN_NONE) {
+   if (callchain_register_param(callchain) < 0) {
ui__error("Can't register callchain params.\n");
return -EINVAL;
}
@@ -952,7 +952,7 @@ static int __cmd_top(struct perf_top *top)
goto out_delete;
}
 
-   ret = perf_top__setup_sample_type(top);
+   ret = callchain_param__setup_sample_type(&callchain_param);
if (ret)
goto out_delete;
 
@@ -1311,7 +1311,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 
top.sym_evsel = perf_evlist__first(top.evlist);
 
-   if (!symbol_conf.use_callchain) {
+   if (!callchain_param.enabled) {
symbol_conf.cumulate_callchain = false;
perf_hpp__cancel_cumulate();
}
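callchain_param__setup_sample_type() now receives the callchain parameters explicitly and checks callchain->enabled instead of symbol_conf.use_callchain. A minimal sketch of the same decision logic; the sort__has_sym global is passed in as a parameter here, and sketch_register_param() is a hypothetical stub standing in for callchain_register_param().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in; mode_none models callchain->mode == CHAIN_NONE. */
struct sketch_callchain_param {
	bool enabled;
	bool mode_none;
};

/* Hypothetical stub for callchain_register_param(); always succeeds here. */
static int sketch_register_param(struct sketch_callchain_param *cc)
{
	(void)cc;
	return 0;
}

static int sketch_setup_sample_type(struct sketch_callchain_param *cc,
				    bool sort_has_sym)
{
	if (!sort_has_sym) {
		/* -g is meaningless without "sym" in --sort/-s. */
		if (cc->enabled)
			return -EINVAL;
	} else if (!cc->mode_none) {
		if (sketch_register_param(cc) < 0)
			return -EINVAL;
	}
	return 0;
}
```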

[tip:perf/core] perf script: Fix postgresql ubuntu install instructions

2016-04-23 Thread tip-bot for Chris Phlipot

Commit-ID:  d6632dd59b66c89724ef28e2723586d1429382aa
Gitweb: http://git.kernel.org/tip/d6632dd59b66c89724ef28e2723586d1429382aa
Author: Chris Phlipot 
AuthorDate: Tue, 19 Apr 2016 01:56:02 -0700
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 19 Apr 2016 12:36:54 -0300

perf script: Fix postgresql ubuntu install instructions

The current instructions for setting up an Ubuntu system for using the
export-to-postgresql.py script are incorrect.

The instructions in the script have been updated to work on newer
versions of Ubuntu:

- Add missing dependencies to the apt-get command:
  python-pyside.qtsql, libqt4-sql-psql
- Add the '-s' option to the createuser command to force the user to be a
  superuser, since the command doesn't prompt as indicated in the
  current instructions.

Tested on: Ubuntu 14.04, Ubuntu 16.04 (beta)

Signed-off-by: Chris Phlipot 
Cc: Adrian Hunter 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1461056164-14914-3-git-send-email-cphlip...@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/scripts/python/export-to-postgresql.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 1b02cdc..6f0ca68 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -34,10 +34,9 @@ import datetime
 #
 # ubuntu:
 #
-#  $ sudo apt-get install postgresql
+#  $ sudo apt-get install postgresql python-pyside.qtsql libqt4-sql-psql
 #  $ sudo su - postgres
-#  $ createuser 
-#  Shall the new role be a superuser? (y/n) y
+#  $ createuser -s 
 #
 # An example of using this script with Intel PT:
 #

[tip:perf/core] perf jit: memset() variable 'st' using the correct size

2016-04-23 Thread tip-bot for Colin Ian King

Commit-ID:  f56ebf20d0f535f5da7cfcfab3e0af133f81
Gitweb: http://git.kernel.org/tip/f56ebf20d0f535f5da7cfcfab3e0af133f81
Author: Colin Ian King 
AuthorDate: Tue, 19 Apr 2016 00:07:18 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 19 Apr 2016 12:37:01 -0300

perf jit: memset() variable 'st' using the correct size

The current code memsets the 'struct stat' variable 'st' with the size of
the function 'stat' (which turns out to be 1 byte) rather than the size of
the variable 'st'.

Committer notes:

sizeof(function) isn't valid, and the result depends on the compiler used.
With gcc, enabling pedantic warnings, we get:

  $ cat sizeof_function.c
  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int main(void)
  {
  printf("sizeof(stat)=%zd, stat=%p\n", sizeof(stat), stat);
  return 0;
  }
  $ readelf -sW sizeof_function | grep -w stat
  49: 00400630    16 FUNC    WEAK   HIDDEN    13 stat
  $ cc -pedantic sizeof_function.c   -o sizeof_function
  sizeof_function.c: In function ‘main’:
  sizeof_function.c:8:46: warning: invalid application of ‘sizeof’ to a function type [-Wpointer-arith]
printf("sizeof(stat)=%zd, stat=%p\n", sizeof(stat), stat);
  ^
  $ ./sizeof_function
  sizeof(stat)=1, stat=0x400630
  $

  Standard C, section 6.5.3.4:

  "The sizeof operator shall not be applied to an expression that has function
   type or an incomplete type, to the parenthesized name of such a type,
   or to an expression that designates a bit-field member."

  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
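A minimal sketch of the corrected pattern: size the memset() by the object being cleared (sizeof(st) for a local, sizeof(*st) through a pointer), never by the name of a function. The helper name stat_or_zero() is hypothetical; it only mirrors the "if (stat(filename, &st)) memset(...)" lines in jitdump.c.

```c
#include <string.h>
#include <sys/stat.h>

/*
 * Fill *st from stat(); on failure, zero the whole struct so callers can
 * still read st_ino, st_dev, etc. sizeof(*st) is sizeof(struct stat).
 */
static int stat_or_zero(const char *path, struct stat *st)
{
	if (stat(path, st) != 0) {
		memset(st, 0, sizeof(*st));
		return -1;
	}
	return 0;
}
```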

Signed-off-by: Colin Ian King 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Fixes: 9b07e27f88b9 ("perf inject: Add jitdump mmap injection support")
Link: http://lkml.kernel.org/r/1461020838-9260-1-git-send-email-colin.k...@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/jitdump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/jitdump.c b/tools/perf/util/jitdump.c
index 52fcef3..86afe96 100644
--- a/tools/perf/util/jitdump.c
+++ b/tools/perf/util/jitdump.c
@@ -412,7 +412,7 @@ static int jit_repipe_code_load(struct jit_buf_desc *jd, union jr_entry *jr)
return -1;
}
if (stat(filename, &st))
-   memset(&st, 0, sizeof(stat));
+   memset(&st, 0, sizeof(st));
 
event->mmap2.header.type = PERF_RECORD_MMAP2;
event->mmap2.header.misc = PERF_RECORD_MISC_USER;
@@ -500,7 +500,7 @@ static int jit_repipe_code_move(struct jit_buf_desc *jd, union jr_entry *jr)
size++; /* for \0 */
 
if (stat(filename, &st))
-   memset(&st, 0, sizeof(stat));
+   memset(&st, 0, sizeof(st));
 
size = PERF_ALIGN(size, sizeof(u64));

[tip:perf/core] perf symbols: Allow loading kallsyms without considering kcore files

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  e02092b9a922f17e951b2df5f12f4aafe7383a21
Gitweb: http://git.kernel.org/tip/e02092b9a922f17e951b2df5f12f4aafe7383a21
Author: Arnaldo Carvalho de Melo 
AuthorDate: Tue, 19 Apr 2016 12:12:49 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 19 Apr 2016 12:38:56 -0300

perf symbols: Allow loading kallsyms without considering kcore files

Before support for using /proc/kcore was introduced, the kallsyms
routines used /proc/modules, and the first 'perf test' entry expected
to find maps for each module in the system, which is not the case with
the kcore code. Provide a way to ignore kcore files so that the test can
have its expectations met.

The test still needs to be improved to cover kcore files as well.
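The patch uses a common perf idiom: a double-underscore variant that takes the extra knob (no_kcore), plus a wrapper that keeps the old signature and default behaviour, so existing callers are unaffected. A sketch of that shape with hypothetical stubs in place of dso__load_kcore() and the two split functions; like dso__load_kcore(), the stub returns 0 on success.

```c
#include <stdbool.h>

/* Which path was taken; stands in for the two dso__split_kallsyms*() calls. */
enum sketch_path { SKETCH_PATH_KCORE = 1, SKETCH_PATH_KALLSYMS = 2 };

/* Hypothetical stub for dso__load_kcore(): returns 0 on success. */
static int sketch_load_kcore(void)
{
	return 0;
}

/* Variant with the extra knob, shaped like __dso__load_kallsyms(). */
static int __sketch_load_kallsyms(bool no_kcore)
{
	if (!no_kcore && !sketch_load_kcore())
		return SKETCH_PATH_KCORE;	/* would split for kcore */
	return SKETCH_PATH_KALLSYMS;		/* would use plain kallsyms */
}

/* Old entry point: same signature, kcore still preferred. */
static int sketch_load_kallsyms(void)
{
	return __sketch_load_kallsyms(false);
}
```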

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-ek5urnu103dlhfk4l6pcw...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 12 +---
 tools/perf/util/machine.h |  2 ++
 tools/perf/util/symbol.c  | 12 +---
 tools/perf/util/symbol.h  |  2 ++
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 52b51e0..656c1d7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -908,11 +908,11 @@ int machines__create_kernel_maps(struct machines *machines, pid_t pid)
return machine__create_kernel_maps(machine);
 }
 
-int machine__load_kallsyms(struct machine *machine, const char *filename,
-  enum map_type type, symbol_filter_t filter)
+int __machine__load_kallsyms(struct machine *machine, const char *filename,
+enum map_type type, bool no_kcore, symbol_filter_t filter)
 {
struct map *map = machine__kernel_map(machine);
-   int ret = dso__load_kallsyms(map->dso, filename, map, filter);
+   int ret = __dso__load_kallsyms(map->dso, filename, map, no_kcore, filter);
 
if (ret > 0) {
dso__set_loaded(map->dso, type);
@@ -927,6 +927,12 @@ int machine__load_kallsyms(struct machine *machine, const char *filename,
return ret;
 }
 
+int machine__load_kallsyms(struct machine *machine, const char *filename,
+  enum map_type type, symbol_filter_t filter)
+{
+   return __machine__load_kallsyms(machine, filename, type, false, filter);
+}
+
 int machine__load_vmlinux_path(struct machine *machine, enum map_type type,
   symbol_filter_t filter)
 {
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 382873b..4822de5 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -215,6 +215,8 @@ struct symbol *machine__find_kernel_function_by_name(struct machine *machine,
 struct map *machine__findnew_module_map(struct machine *machine, u64 start,
const char *filename);
 
+int __machine__load_kallsyms(struct machine *machine, const char *filename,
+enum map_type type, bool no_kcore, symbol_filter_t filter);
 int machine__load_kallsyms(struct machine *machine, const char *filename,
   enum map_type type, symbol_filter_t filter);
 int machine__load_vmlinux_path(struct machine *machine, enum map_type type,
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a36823c..415c4f6 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1208,8 +1208,8 @@ static int kallsyms__delta(struct map *map, const char *filename, u64 *delta)
return 0;
 }
 
-int dso__load_kallsyms(struct dso *dso, const char *filename,
-  struct map *map, symbol_filter_t filter)
+int __dso__load_kallsyms(struct dso *dso, const char *filename,
+struct map *map, bool no_kcore, symbol_filter_t filter)
 {
u64 delta = 0;
 
@@ -1230,12 +1230,18 @@ int dso__load_kallsyms(struct dso *dso, const char *filename,
else
dso->symtab_type = DSO_BINARY_TYPE__KALLSYMS;
 
-   if (!dso__load_kcore(dso, map, filename))
+   if (!no_kcore && !dso__load_kcore(dso, map, filename))
return dso__split_kallsyms_for_kcore(dso, map, filter);
else
return dso__split_kallsyms(dso, map, delta, filter);
 }
 
+int dso__load_kallsyms(struct dso *dso, const char *filename,
+  struct map *map, symbol_filter_t filter)
+{
+   return __dso__load_kallsyms(dso, filename, map, false, filter);
+}
+
 static int dso__load_perf_map(struct dso *dso, struct map *map,
  symbol_filter_t filter)
 {
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 1da7b10..c8e4397 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -240,6 +240,8 @@ int dso__load_vmlinux(struct dso *dso, struct map *map,
  symbol_filter_t filter);
 int dso__load_

[tip:perf/core] perf build: Remove x86 references from arch-neutral Build

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  2cc4666927402ec748122cac15ceac35a5e298a3
Gitweb: http://git.kernel.org/tip/2cc4666927402ec748122cac15ceac35a5e298a3
Author: Arnaldo Carvalho de Melo 
AuthorDate: Tue, 19 Apr 2016 12:01:51 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 19 Apr 2016 12:37:02 -0300

perf build: Remove x86 references from arch-neutral Build

Generating the syscalltbl.c file is already dealt with in the x86
arch-specific Build files, namely via 'archheaders'.

This fixes the build on !x86 arches, as reported for powerpcle.

Reported-by: Stephen Rothwell 
Tested-by: Jiri Olsa 
Cc: Adrian Hunter 
Cc: David Ahern 
Cc: "H. Peter Anvin" 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Wang Nan 
Fixes: 1b700c997500 ("perf tools: Build syscall table .c header from kernel's syscall_64.tbl")
Link: http://lkml.kernel.org/r/20160415212831.gt9...@kernel.org
[ Removed the syscalltbl.o altogether, as per Jiri's suggestion ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/Build | 4 
 1 file changed, 4 deletions(-)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 85a9ab6..90229a8 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -150,10 +150,6 @@ CFLAGS_libstring.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ET
 CFLAGS_hweight.o   += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
 CFLAGS_parse-events.o  += -Wno-redundant-decls
 
-$(OUTPUT)util/syscalltbl.o: util/syscalltbl.c arch/x86/entry/syscalls/syscall_64.tbl $(OUTPUT)arch/x86/include/generated/asm/syscalls_64.c FORCE
-   $(call rule_mkdir)
-   $(call if_changed_dep,cc_o_c)
-
 $(OUTPUT)util/kallsyms.o: ../lib/symbol/kallsyms.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_o_c)

[tip:perf/core] perf test: Add missing verbose output explaining the reason for failure

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  6566feafb4dba4eef30a9c0b25e6f49f996178b6
Gitweb: http://git.kernel.org/tip/6566feafb4dba4eef30a9c0b25e6f49f996178b6
Author: Arnaldo Carvalho de Melo 
AuthorDate: Tue, 19 Apr 2016 12:22:25 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 19 Apr 2016 12:39:36 -0300

perf test: Add missing verbose output explaining the reason for failure

One of the branches leading to an error emitted no debug message;
fix it. The new lines are:

  # perf test -v kallsyms

  0x81001000: diff name v: xen_hypercall_set_trap_table k: hypercall_page
  0x810691f0: diff name v: try_to_free_pud_page k: try_to_free_pmd_page

  0x8150bb20: diff name v: wakeup_expire_count_show.part.5 k: wakeup_active_count_show.part.7
  0x816bc7f0: diff name v: phys_switch_id_show.part.11 k: phys_port_name_show.part.12
  0x817bbb90: diff name v: __do_softirq k: __softirqentry_text_start
  0x817bbb90: diff name v: __do_softirq k: __softirqentry_text_start


This in turn exercises another bug, still under investigation, because those
aliases _are_ in kallsyms, with the same name...

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Fixes: ab414dcda8fa ("perf test: Fixup aliases checking in the 'vmlinux matches kallsyms' test")
Link: http://lkml.kernel.org/n/tip-5fhea7a54a54gsmagu9ob...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/vmlinux-kallsyms.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index c05f1bd..e63abab 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -163,6 +163,9 @@ next_pair:
 
pr_debug("%#" PRIx64 ": diff name v: %s k: %s\n",
 mem_start, sym->name, pair->name);
+   } else {
+   pr_debug("%#" PRIx64 ": diff name v: %s k: %s\n",
+mem_start, sym->name, first_pair->name);
}
}
} else

[tip:perf/core] perf test: Ignore kcore files in the "vmlinux matches kallsyms" test

2016-04-23 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  53d0fe68275dbdaf6a532bb4e87f00db5d36c140
Gitweb: http://git.kernel.org/tip/53d0fe68275dbdaf6a532bb4e87f00db5d36c140
Author: Arnaldo Carvalho de Melo 
AuthorDate: Tue, 19 Apr 2016 12:16:55 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 19 Apr 2016 12:39:35 -0300

perf test: Ignore kcore files in the "vmlinux matches kallsyms" test

Before:

  # perf test -v kallsyms

  Maps only in vmlinux:
   81d5e000-81ec3ac8 115e000 [kernel].init.text
   81ec3ac8-a000 12c3ac8 [kernel].exit.text
   a000-a000c000 0 [fjes]
   a000c000-a0017000 0 [video]
   a0017000-a001c000 0 [grace]

   a0a7f000-a0ba5000 0 [xfs]
   a0ba5000- 0 [veth]
  Maps in vmlinux with a different name in kallsyms:
  Maps only in kallsyms:
   8810-88001000b000 8103000 [kernel.kallsyms]
   88001000b000-8801 8001000e000 [kernel.kallsyms]
   8801-c900 8013000 [kernel.kallsyms]

   a000-ff60 7fffa0003000 [kernel.kallsyms]
   ff60- 7f603000 [kernel.kallsyms]
  test child finished with -1
   end 
  vmlinux symtab matches kallsyms: FAILED!
  #

After:

  # perf test -v 1
   1: vmlinux symtab matches kallsyms  :
  --- start ---
  test child forked, pid 7058
  Looking at the vmlinux_path (8 entries long)
  Using /lib/modules/4.6.0-rc1+/build/vmlinux for symbols
  0x81076870: diff end addr for aesni_gcm_dec v: 0x810791f2 k: 0x81076902
  0x81079200: diff end addr for aesni_gcm_enc v: 0x8107bb03 k: 0x81079292
  0x8107e8d0: diff end addr for aesni_gcm_enc_avx_gen2 v: 0x81083e76 k: 0x8107e943
  0x81083e80: diff end addr for aesni_gcm_dec_avx_gen2 v: 0x81089611 k: 0x81083ef3
  0x81089990: diff end addr for aesni_gcm_enc_avx_gen4 v: 0x8108e7c4 k: 0x81089a03
  0x8108e7d0: diff end addr for aesni_gcm_dec_avx_gen4 v: 0x810937ef k: 0x8108e843
  Maps only in vmlinux:
   81d5e000-81ec3ac8 115e000 [kernel].init.text
   81ec3ac8-a000 12c3ac8 [kernel].exit.text
  Maps in vmlinux with a different name in kallsyms:
  Maps only in kallsyms:
  test child finished with -1
   end 
 vmlinux symtab matches kallsyms: FAILED!
  #

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Fixes: 8e0cf965f95e ("perf symbols: Add support for reading from /proc/kcore")
Link: http://lkml.kernel.org/n/tip-n6vrwt9t89w8k769y349g...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/vmlinux-kallsyms.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 630b0b4..c05f1bd 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -54,8 +54,14 @@ int test__vmlinux_matches_kallsyms(int subtest __maybe_unused)
 * Step 3:
 *
 * Load and split /proc/kallsyms into multiple maps, one per module.
+* Do not use kcore, as this test was designed before kcore support
+* and has parts that only make sense if using the non-kcore code.
+* XXX: extend it to stress the kcore code as well, hint: the list
+* of modules extracted from /proc/kcore, in its current form, can't
+* be compacted against the list of modules found in the "vmlinux"
+* code and with the one got from /proc/modules from the "kallsyms" code.
 */
-   if (machine__load_kallsyms(&kallsyms, "/proc/kallsyms", type, NULL) <= 0) {
+   if (__machine__load_kallsyms(&kallsyms, "/proc/kallsyms", type, true, NULL) <= 0) {
pr_debug("dso__load_kallsyms ");
goto out;
}

Personal and business loan offer

2016-04-23 Thread Eric Charles

Hello,

   I am Dr. Eric CHARLES, a legitimate and reliable loan lender of Skye
Financial Services. We offer loans in a clear and understandable way, on
terms at an interest rate of 3%, from $5,000.00 to $450,000,000.00, in USD
and Euros only. We offer business loans, personal loans, student loans,
car loans and loans to pay bills; BG/SBLC at low rates are also available.
Contact us by email now.

sound: deadlock involving snd_hrtimer_callback

2016-04-23 Thread Dmitry Vyukov

Hi Takashi,

I've incorporated your hrtimer fixes (but also updated to
ddce192106e4f984123884f8e878f66ace94b573) and now I am seeing lots of
the following deadlock messages:


[ INFO: possible circular locking dependency detected ]
4.6.0-rc4+ #351 Not tainted
---
swapper/0/0 is trying to acquire lock:
 (&(&timer->lock)->rlock){-.-...}, at: []
snd_timer_interrupt+0xa9/0xd30 sound/core/timer.c:701

but task is already holding lock:
 (&(&stime->lock)->rlock){-.}, at: []
snd_hrtimer_callback+0x4f/0x2b0 sound/core/hrtimer.c:54

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&(&stime->lock)->rlock){-.}:
   [] lock_acquire+0x1e3/0x460
kernel/locking/lockdep.c:3677
   [< inline >] __raw_spin_lock_irqsave
include/linux/spinlock_api_smp.h:112
   [] _raw_spin_lock_irqsave+0x9f/0xd0
kernel/locking/spinlock.c:159
   [] snd_hrtimer_start+0x4a/0xf0 sound/core/hrtimer.c:112
   [] snd_timer_start1+0x2b4/0x5a0 sound/core/timer.c:457
   [] snd_timer_start+0x5d/0xa0 sound/core/timer.c:571
   [< inline >] seq_timer_start sound/core/seq/seq_timer.c:393
   [] snd_seq_timer_start+0x1a0/0x2b0
sound/core/seq/seq_timer.c:405
   [< inline >] snd_seq_queue_process_event
sound/core/seq/seq_queue.c:687
   [] snd_seq_control_queue+0x304/0x8b0
sound/core/seq/seq_queue.c:748
   [] event_input_timer+0x25/0x30
sound/core/seq/seq_system.c:118
   []
snd_seq_deliver_single_event.constprop.11+0x3f4/0x740
sound/core/seq/seq_clientmgr.c:636
   [] snd_seq_deliver_event+0x118/0x800
sound/core/seq/seq_clientmgr.c:833
   [] snd_seq_kernel_client_dispatch+0x126/0x170
sound/core/seq/seq_clientmgr.c:2418
   [] send_timer_event.isra.0+0x10b/0x150
sound/core/seq/oss/seq_oss_timer.c:153
   [] snd_seq_oss_timer_start+0x1ca/0x310
sound/core/seq/oss/seq_oss_timer.c:174
   [< inline >] old_event sound/core/seq/oss/seq_oss_event.c:125
   [] snd_seq_oss_process_event+0xa1f/0x2ce0
sound/core/seq/oss/seq_oss_event.c:100
   [< inline >] insert_queue sound/core/seq/oss/seq_oss_rw.c:179
   [] snd_seq_oss_write+0x321/0x810
sound/core/seq/oss/seq_oss_rw.c:148
   [] odev_write+0x59/0xa0
sound/core/seq/oss/seq_oss.c:177
   [] __vfs_write+0x113/0x4b0 fs/read_write.c:529
   [] vfs_write+0x167/0x4a0 fs/read_write.c:578
   [< inline >] SYSC_write fs/read_write.c:625
   [] SyS_write+0x111/0x220 fs/read_write.c:617
   [] entry_SYSCALL_64_fastpath+0x23/0xc1
arch/x86/entry/entry_64.S:207

-> #0 (&(&timer->lock)->rlock){-.-...}:
   [< inline >] check_prev_add kernel/locking/lockdep.c:1823
   [< inline >] check_prevs_add kernel/locking/lockdep.c:1933
   [< inline >] validate_chain kernel/locking/lockdep.c:2238
   [] __lock_acquire+0x3625/0x4d00 kernel/locking/lockdep.c:3298
   [] lock_acquire+0x1e3/0x460 kernel/locking/lockdep.c:3677
   [< inline >] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:112
   [] _raw_spin_lock_irqsave+0x9f/0xd0 kernel/locking/spinlock.c:159
   [] snd_timer_interrupt+0xa9/0xd30 sound/core/timer.c:701
   [] snd_hrtimer_callback+0x185/0x2b0 sound/core/hrtimer.c:59
   [< inline >] __run_hrtimer kernel/time/hrtimer.c:1242
   [] __hrtimer_run_queues+0x331/0xe90 kernel/time/hrtimer.c:1306
   [] hrtimer_interrupt+0x182/0x430 kernel/time/hrtimer.c:1340
   [] local_apic_timer_interrupt+0x72/0xe0 arch/x86/kernel/apic/apic.c:907
   [] smp_apic_timer_interrupt+0x79/0xa0 arch/x86/kernel/apic/apic.c:931
   [] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:454
   [< inline >] arch_safe_halt ./arch/x86/include/asm/paravirt.h:118
   [] default_idle+0x52/0x370 arch/x86/kernel/process.c:307
   [] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:298
   [] default_idle_call+0x48/0xa0 kernel/sched/idle.c:93
   [< inline >] cpuidle_idle_call kernel/sched/idle.c:151
   [< inline >] cpu_idle_loop kernel/sched/idle.c:242
   [] cpu_startup_entry+0x58f/0x7b0 kernel/sched/idle.c:291
   [] rest_init+0x18d/0x1a0 init/main.c:408
   [] start_kernel+0x63a/0x660 init/main.c:661
   [] x86_64_start_reservations+0x38/0x3a arch/x86/kernel/head64.c:195
   [] x86_64_start_kernel+0x158/0x167 arch/x86/kernel/head64.c:176

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&stime->lock)->rlock);
                               lock(&(&timer->lock)->rlock);
                               lock(&(&stime->lock)->rlock);
  lock(&(&timer->lock)->rlock);

 *** DEADLOCK ***

1 lock held by swapper/0/0:
 #0:  (&(&stime->lock)->rlock){-.}, at: [] snd_hrtimer_callback+0x4f/0x2b0 sound/core/hrtimer.c:54

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0
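
The two stacks above record opposite orders: snd_timer_start1() takes stime->lock (via snd_hrtimer_start()) while holding timer->lock, and snd_hrtimer_callback() takes timer->lock (via snd_timer_interrupt()) while holding stime->lock. Lockdep reports this from history alone, without an actual deadlock. The following is a toy, single-threaded sketch of that idea, assuming only direct two-lock inversions matter; real lockdep (check_prev_add() and friends) also searches transitive dependency chains and tracks IRQ-context usage, which this does not. The class ids and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CLASSES 8
#define MAX_HELD    4

/* Hypothetical lock-class ids named after the two locks in the report. */
enum { TIMER_LOCK = 0, STIME_LOCK = 1 };

/* dep[a][b] is set once class b was acquired while class a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];
static int held[MAX_HELD];
static int nheld;

/* Returns false (and refuses the acquisition) if any currently held class
 * was previously taken *after* cls, i.e. the pair is used in both orders. */
static bool toy_acquire(int cls)
{
	assert(cls >= 0 && cls < MAX_CLASSES && nheld < MAX_HELD);
	for (int i = 0; i < nheld; i++) {
		if (dep[cls][held[i]]) {
			printf("possible ABBA deadlock: %d -> %d vs %d -> %d\n",
			       held[i], cls, cls, held[i]);
			return false;
		}
	}
	/* No inversion: record "held -> cls" edges and push cls. */
	for (int i = 0; i < nheld; i++)
		dep[held[i]][cls] = true;
	held[nheld++] = cls;
	return true;
}

/* Drop cls from the held stack (releases need not be strictly LIFO). */
static void toy_release(int cls)
{
	for (int i = nheld - 1; i >= 0; i--) {
		if (held[i] == cls) {
			for (int j = i; j < nheld - 1; j++)
				held[j] = held[j + 1];
			nheld--;
			return;
		}
	}
}
```

Replaying the snd_timer_start() path (timer, then stime) and then the hrtimer callback path (stime, then timer) makes the second toy_acquire(TIMER_LOCK) fail, which is the same inversion lockdep flags above.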

[PATCH] perf tools: replace assignment with comparison on assert check

2016-04-23 Thread Colin King

From: Colin Ian King 

The current assert check is checking an assignment, which will always
be true.  Instead, the assert should be checking if scale is equal
to 0.123.

Signed-off-by: Colin Ian King 
---
 tools/perf/tests/event_update.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/event_update.c b/tools/perf/tests/event_update.c
index 012eab5..63ecf21 100644
--- a/tools/perf/tests/event_update.c
+++ b/tools/perf/tests/event_update.c
@@ -30,7 +30,7 @@ static int process_event_scale(struct perf_tool *tool __maybe_unused,
 
TEST_ASSERT_VAL("wrong id", ev->id == 123);
TEST_ASSERT_VAL("wrong id", ev->type == PERF_EVENT_UPDATE__SCALE);
-   TEST_ASSERT_VAL("wrong scale", ev_data->scale = 0.123);
+   TEST_ASSERT_VAL("wrong scale", ev_data->scale == 0.123);
return 0;
 }
 
-- 
2.7.4
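
Why the original check could never fail: the value of `x = 0.123` is 0.123, which is non-zero, so an assertion macro that tests its condition for truth always passes, and it also overwrites the value under test. A minimal sketch, assuming a simplified stand-in for perf's TEST_ASSERT_VAL() (CHECK_VAL and struct scale_ev are hypothetical, not perf code):

```c
#include <stdio.h>

/* Simplified stand-in for an assertion macro like TEST_ASSERT_VAL():
 * if cond is false, report and make the enclosing test return -1. */
#define CHECK_VAL(text, cond)						\
	do {								\
		if (!(cond)) {						\
			fprintf(stderr, "FAILED %s:%d %s\n",		\
				__FILE__, __LINE__, text);		\
			return -1;					\
		}							\
	} while (0)

struct scale_ev { double scale; };	/* hypothetical stand-in for ev_data */

/* Buggy form: '=' stores 0.123, and the expression's value (0.123) is
 * non-zero, so the check passes no matter what scale was before. */
static int check_scale_buggy(struct scale_ev *ev)
{
	CHECK_VAL("wrong scale", ev->scale = 0.123);
	return 0;
}

/* Fixed form: compare instead of assign. Exact '==' on a double is only
 * reliable when the value round-trips unchanged (e.g. it was stored from
 * the same literal); computed values generally need a tolerance. */
static int check_scale_fixed(const struct scale_ev *ev)
{
	CHECK_VAL("wrong scale", ev->scale == 0.123);
	return 0;
}
```

With scale set to 0.5, check_scale_buggy() returns 0 and leaves scale at 0.123, while check_scale_fixed() returns -1 as intended.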

[PATCH] printk: make printk.synchronous param rw

2016-04-23 Thread Sergey Senozhatsky

Change `synchronous' printk param to be RW, so user space
can change printk mode back and forth to/from sync mode
(which is considered to be more reliable).

Signed-off-by: Sergey Senozhatsky 
Reviewed-by: Jan Kara 
---

-- added Jan's Reviewed-by
-- factored out async printk checks to can_printk_async()

 kernel/printk/printk.c | 56 --
 1 file changed, 45 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 89f5441..9345a29 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -288,14 +288,16 @@ static u32 log_buf_len = __LOG_BUF_LEN;
 
 /* Control whether printing to console must be synchronous. */
 static bool __read_mostly printk_sync = true;
-module_param_named(synchronous, printk_sync, bool, S_IRUGO);
-MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
-
 /* Printing kthread for async printk */
 static struct task_struct *printk_kthread;
 /* When `true' printing thread has messages to print */
 static bool printk_kthread_need_flush_console;
 
+static inline bool can_printk_async(void)
+{
+   return !printk_sync && printk_kthread;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1785,7 +1787,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 * operate in sync mode once panic() occurred.
 */
if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH &&
-   printk_kthread) {
+   can_printk_async()) {
/* Offload printing to a schedulable context. */
printk_kthread_need_flush_console = true;
wake_up_process(printk_kthread);
@@ -2757,6 +2759,13 @@ static int __init printk_late_init(void)
 late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
+/*
+ * Prevent starting printk_kthread from start_kernel()->parse_args().
+ * It's not possible at this stage. Instead, do it via the initcall
+ * or a sysfs knob.
+ */
+static bool printk_kthread_can_run;
+
 static int printk_kthread_func(void *data)
 {
while (1) {
@@ -2780,18 +2789,14 @@ static int printk_kthread_func(void *data)
return 0;
 }
 
-/*
- * Init async printk via late_initcall, after core/arch/device/etc.
- * initialization.
- */
-static int __init init_printk_kthread(void)
+static int __init_printk_kthread(void)
 {
struct task_struct *thread;
struct sched_param param = {
.sched_priority = MAX_RT_PRIO - 1,
};
 
-   if (printk_sync)
+   if (!printk_kthread_can_run || printk_sync || printk_kthread)
return 0;
 
thread = kthread_run(printk_kthread_func, NULL, "printk");
@@ -2805,6 +2810,35 @@ static int __init init_printk_kthread(void)
printk_kthread = thread;
return 0;
 }
+
+static int printk_sync_set(const char *val, const struct kernel_param *kp)
+{
+   int ret;
+
+   ret = param_set_bool(val, kp);
+   if (ret)
+   return ret;
+   return __init_printk_kthread();
+}
+
+static const struct kernel_param_ops param_ops_printk_sync = {
+   .set = printk_sync_set,
+   .get = param_get_bool,
+};
+
+module_param_cb(synchronous, &param_ops_printk_sync, &printk_sync,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+/*
+ * Init async printk via late_initcall, after core/arch/etc.
+ * initialization.
+ */
+static __init int init_printk_kthread(void)
+{
+   printk_kthread_can_run = true;
+   return __init_printk_kthread();
+}
 late_initcall(init_printk_kthread);
 
 /*
@@ -2820,7 +2854,7 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
int pending = __this_cpu_xchg(printk_pending, 0);
 
if (pending & PRINTK_PENDING_OUTPUT) {
-   if (printk_kthread) {
+   if (can_printk_async()) {
wake_up_process(printk_kthread);
} else {
/*
-- 
2.8.0
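
The setter in this patch parses the new value and then may start the printing kthread, but __init_printk_kthread() refuses until the late initcall has set printk_kthread_can_run, and never starts a second thread. A userspace model of that gating, assuming the kthread can be replaced by a flag; every *_model name is hypothetical, and parse_bool_model() is only a simplified stand-in for param_set_bool().

```c
#include <errno.h>
#include <stdbool.h>

/* Model state; printk_kthread_started stands in for printk_kthread != NULL. */
static bool printk_sync = true;
static bool printk_kthread_can_run;	/* set by the late initcall */
static bool printk_kthread_started;

static bool can_printk_async(void)
{
	return !printk_sync && printk_kthread_started;
}

/* Mirrors the guard in __init_printk_kthread(): start at most once, only
 * after the initcall, and only when async printing is requested. */
static int init_printk_kthread_model(void)
{
	if (!printk_kthread_can_run || printk_sync || printk_kthread_started)
		return 0;
	printk_kthread_started = true;	/* kthread_run() in the real patch */
	return 0;
}

/* Simplified bool parser: looks at the first character only and leaves
 * *out untouched on error. */
static int parse_bool_model(const char *val, bool *out)
{
	if (!val)
		return -EINVAL;
	switch (val[0]) {
	case 'y': case 'Y': case '1':
		*out = true;
		return 0;
	case 'n': case 'N': case '0':
		*out = false;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Model of printk_sync_set(): store the value, then maybe start the thread. */
static int printk_sync_set_model(const char *val)
{
	int ret = parse_bool_model(val, &printk_sync);

	if (ret)
		return ret;
	return init_printk_kthread_model();
}

/* Model of the late_initcall init_printk_kthread(). */
static int init_printk_kthread_late_model(void)
{
	printk_kthread_can_run = true;
	return init_printk_kthread_model();
}
```

In this model, setting the parameter to N during boot argument parsing only records the value; the initcall starts the thread, and later writes just toggle can_printk_async().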

Re: [PATCH 4/8] firmware: qcom: scm: Add support for ARM64 SoCs

2016-04-23 Thread Bjorn Andersson

On Fri 22 Apr 21:52 PDT 2016, Andy Gross wrote:

> On Fri, Apr 22, 2016 at 04:41:05PM -0700, Bjorn Andersson wrote:
> > On Fri 22 Apr 15:17 PDT 2016, Andy Gross wrote:
[..]
> > > diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
> > > index 8e1eeb8..7d7b12b 100644
> > [..]
> > >  
> > > +static void qcom_scm_init(void)
> > > +{
> > > + __qcom_scm_init();
> > > +}
> > > +
> > >  static int qcom_scm_probe(struct platform_device *pdev)
> > >  {
> > >   struct qcom_scm *scm;
> > > @@ -208,6 +213,8 @@ static int qcom_scm_probe(struct platform_device *pdev)
> > >   __scm = scm;
> > >   __scm->dev = &pdev->dev;
> > >  
> > > + qcom_scm_init();
> > > +
> > 
> > Why don't you call __qcom_scm_init() directly here?
> 
> Yeah that would save some stack ops.
> 
> As a side note, what do you think about just making the first transaction on
> the scm-64 side do this init to figure out 32/64 calling convention?
> 
> That would eliminate this mess.
> 

We will have quite a bunch of entry points in this API, so it will
probably be messier to have them all call some potential-init function.

Perhaps if it's possible to push it to the __qcom_scm_call{,_atomic}.
But I'm not sure we want those to be more complicated just to save this
one call...

> > >   return 0;
> > >  }
> > >  
> > > diff --git a/drivers/firmware/qcom_scm.h b/drivers/firmware/qcom_scm.h
> > [..]
> > > +#define QCOM_SCM_V2_EBUSY  -12
> > >  #define QCOM_SCM_ENOMEM  -5
> > >  #define QCOM_SCM_EOPNOTSUPP  -4
> > >  #define QCOM_SCM_EINVAL_ADDR -3
> > > @@ -56,6 +58,8 @@ static inline int qcom_scm_remap_error(int err)
> > >   return -EOPNOTSUPP;
> > >   case QCOM_SCM_ENOMEM:
> > >   return -ENOMEM;
> > > + case QCOM_SCM_V2_EBUSY:
> > > + return err;
> > 
> > I don't think return -ENOMEM is the right thing to do here.
> 
> -EBUSY?
> 

That seems better.

> > >   return -EINVAL;
> > >  }

Regards,
Bjorn
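
The problem with `return err;` for QCOM_SCM_V2_EBUSY is that the firmware code -12 is passed through unchanged, and on Linux ENOMEM is 12, so callers would see it as -ENOMEM. A sketch of the remap with the -EBUSY suggestion applied; only the firmware codes visible in the quoted hunk are modeled (others fall through to -EINVAL, as in the hunk), and the function name is hypothetical.

```c
#include <errno.h>

/* Firmware return codes as quoted in the review; the rest are elided there. */
#define QCOM_SCM_V2_EBUSY	(-12)
#define QCOM_SCM_ENOMEM		(-5)
#define QCOM_SCM_EOPNOTSUPP	(-4)

/* Translate firmware codes into kernel errno values instead of passing
 * raw firmware numbers through, which may collide with unrelated errnos. */
static int scm_remap_error_sketch(int err)
{
	switch (err) {
	case QCOM_SCM_EOPNOTSUPP:
		return -EOPNOTSUPP;
	case QCOM_SCM_ENOMEM:
		return -ENOMEM;
	case QCOM_SCM_V2_EBUSY:
		return -EBUSY;		/* not the raw -12 */
	default:
		return -EINVAL;
	}
}
```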

Re: [PATCH 1/4] pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS

2016-04-23 Thread William Breathitt Gray

On Sat, Apr 23, 2016 at 01:51:19AM +0200, Rafael J. Wysocki wrote:
>On 4/11/2016 3:25 PM, William Breathitt Gray wrote:
>> The PNPBIOS driver requires preprocessor defines (located in
>> include/asm/segment.h) only declared if the architecture is set to
>> X86_32. If the architecture is set to X86_64, the PNPBIOS driver will
>> not build properly. The X86 dependency for the PNPBIOS configuration
>> option is changed to an explicit X86_32 dependency in order to prevent
>> an attempt to build for an unsupported architecture.
>>
>> Cc: Rafael J. Wysocki 
>> Signed-off-by: William Breathitt Gray 
>
>Has anyone taken care of this already?
>
>If not, can you possibly resend this patch with a CC to 
>linux-a...@vger.kernel.org so I can pick it up via Patchwork more easily?

Greg K-H,

Will this patch appear in the driver-core repository along with the
other patches in this set?

Thanks,

William Breathitt Gray
