Re: [PATCH] powerpc: ignore the pkey system calls for now
On 18/10/16 09:33, Stephen Rothwell wrote:
> Eliminates warning messages:
>
> :1316:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp]
> :1319:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp]
> :1322:2: warning: #warning syscall pkey_free not implemented [-Wcpp]
>
> Hopefully we will remember to revert this commit if we ever implement
> them.
>
> Signed-off-by: Stephen Rothwell

That makes sense for now.

Acked-by: Balbir Singh
[PATCH] Setup per-cpu cpu<->node binding early
Michael Ellerman debugged an issue w.r.t. workqueue changes (see
https://lkml.org/lkml/2016/10/17/352) down to the fact that we don't set up
our per-cpu (cpu to node) binding early enough (in setup_per_cpu_areas,
like x86 does). This led to a problem with the workqueue changes, where the
cpus seen by for_each_node() in workqueue_init_early() were different from
their binding seen later in:

	for_each_possible_cpu(cpu) {
		node = cpu_to_node(cpu)
		...
	}

In setup_arch()->initmem_init() we have access to the binding in
numa_cpu_lookup_table.

This patch implements Michael's suggestion of setting up the per-cpu node
binding inside setup_per_cpu_areas(). I did not remove the original setting
of these values from smp_prepare_cpus(). I've also not set up per-cpu mems
via set_cpu_numa_mem(), since zonelists are not yet built by the time we do
the per-cpu setup.

Reported-by: Michael Ellerman
Signed-off-by: Balbir Singh
---
 arch/powerpc/kernel/setup_64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index c3e1290..842415a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -625,6 +625,8 @@ void __init setup_per_cpu_areas(void)
 	for_each_possible_cpu(cpu) {
 		__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
 		paca[cpu].data_offset = __per_cpu_offset[cpu];
+
+		set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
 	}
 }
 #endif
-- 
2.5.5
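A minimal userspace model of the ordering problem the patch fixes (all names here are illustrative stand-ins, not the real kernel symbols): before the firmware-provided binding is published, every consumer of cpu_to_node() sees node 0, so code that runs early disagrees with code that runs late.

```c
#include <assert.h>

#define NR_CPUS 4

/* Node for each CPU as reported by firmware (illustrative values). */
static int numa_cpu_lookup_table[NR_CPUS] = { 0, 0, 1, 1 };

/* The generic binding everyone queries; zero-initialised, so before it
 * is populated every CPU appears to live on node 0. */
static int cpu_node_map[NR_CPUS];

static void set_cpu_numa_node(int cpu, int node)
{
	cpu_node_map[cpu] = node;
}

static int cpu_to_node(int cpu)
{
	return cpu_node_map[cpu];
}

/* Models setup_per_cpu_areas() with the fix applied: publish the
 * firmware-provided binding before any early consumer looks at it. */
static void setup_per_cpu_areas_model(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
}
```

Anything that calls cpu_to_node() before setup_per_cpu_areas_model() runs gets the stale node 0 answer, which is the disagreement workqueue_init_early() tripped over.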
[RFC PATCH 1/3] cxl: Split _CXL_LOOP_HCALL9 out into a separate macro
In a subsequent patch we want to change the type of retbuf between plpar_hcall() and plpar_hcall9(), but we can't because _CXL_LOOP_HCALL expects the same type for retbuf regardless of the hcall type. So duplicate the logic in a separate macro for plpar_hcall9(). Signed-off-by: Michael Ellerman --- drivers/misc/cxl/hcalls.c | 31 ++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/drivers/misc/cxl/hcalls.c b/drivers/misc/cxl/hcalls.c index d6d11f4056d7..01a8d917631c 100644 --- a/drivers/misc/cxl/hcalls.c +++ b/drivers/misc/cxl/hcalls.c @@ -77,8 +77,37 @@ msleep(delay); \ } \ } + +#define _CXL_LOOP_HCALL9(call, rc, retbuf, fn, ...)\ + { \ + unsigned int delay, total_delay = 0;\ + u64 token = 0; \ + \ + memset(retbuf, 0, sizeof(retbuf)); \ + while (1) { \ + rc = call(fn, retbuf, __VA_ARGS__, token); \ + token = retbuf[0]; \ + if (rc != H_BUSY && !H_IS_LONG_BUSY(rc))\ + break; \ + \ + if (rc == H_BUSY) \ + delay = 10; \ + else\ + delay = get_longbusy_msecs(rc); \ + \ + total_delay += delay; \ + if (total_delay > CXL_HCALL_TIMEOUT) { \ + WARN(1, "Warning: Giving up waiting for CXL hcall " \ +"%#x after %u msec\n", fn, total_delay); \ + rc = H_BUSY;\ + break; \ + } \ + msleep(delay); \ + } \ + } + #define CXL_H_WAIT_UNTIL_DONE(...) _CXL_LOOP_HCALL(plpar_hcall, __VA_ARGS__) -#define CXL_H9_WAIT_UNTIL_DONE(...) _CXL_LOOP_HCALL(plpar_hcall9, __VA_ARGS__) +#define CXL_H9_WAIT_UNTIL_DONE(...) _CXL_LOOP_HCALL9(plpar_hcall9, __VA_ARGS__) #define _PRINT_MSG(rc, format, ...)\ { \ -- 2.7.4
[RFC PATCH 2/3] powerpc/pseries: Define & use a type for the plpar_hcall() retvals
We have now had two nasty stack corruption bugs caused by incorrect sizing of the return buffer for plpar_hcall()/plpar_hcall9(). To avoid any more such bugs, define a type which encodes the size of the return buffer, and change the argument of plpar_hcall() to be of that type, meaning the compiler will check for us that we passed the right size buffer. There isn't an easy way to do this incrementally, without introducing a new function name, eg. plpar_hcall_with_struct(), which is ugly as hell. So just do it in one tree-wide change. Signed-off-by: Michael Ellerman --- If anyone can think of a tricky way of doing this without requiring us to update all usages of retbuf[x], then let me know. arch/powerpc/include/asm/hvcall.h | 16 +- arch/powerpc/include/asm/plpar_wrappers.h | 44 +- arch/powerpc/kernel/rtas.c | 6 ++-- arch/powerpc/platforms/pseries/hvconsole.c | 10 +++--- arch/powerpc/platforms/pseries/lparcfg.c | 14 - arch/powerpc/platforms/pseries/rng.c | 6 ++-- arch/powerpc/platforms/pseries/suspend.c | 6 ++-- arch/powerpc/sysdev/xics/icp-hv.c | 6 ++-- drivers/char/hw_random/pseries-rng.c | 6 ++-- drivers/misc/cxl/hcalls.c | 50 +++--- drivers/net/ethernet/ibm/ibmveth.h | 6 ++-- drivers/net/ethernet/ibm/ibmvnic.c | 8 ++--- 12 files changed, 90 insertions(+), 88 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 708edebcf147..b3a6c6ec6b6f 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -318,32 +318,34 @@ */ long plpar_hcall_norets(unsigned long opcode, ...); +struct plpar_hcall_retvals +{ + unsigned long v[4]; +}; + /** * plpar_hcall: - Make a pseries hypervisor call * @opcode: The hypervisor call to make. * @retbuf: Buffer to store up to 4 return arguments in. * - * This call supports up to 6 arguments and 4 return arguments. Use - * PLPAR_HCALL_BUFSIZE to size the return argument buffer. + * This call supports up to 6 arguments and 4 return arguments. 
* * Used for all but the craziest of phyp interfaces (see plpar_hcall9) */ -#define PLPAR_HCALL_BUFSIZE 4 -long plpar_hcall(unsigned long opcode, unsigned long *retbuf, ...); +long plpar_hcall(unsigned long opcode, struct plpar_hcall_retvals *retvals, ...); /** * plpar_hcall_raw: - Make a hypervisor call without calculating hcall stats * @opcode: The hypervisor call to make. * @retbuf: Buffer to store up to 4 return arguments in. * - * This call supports up to 6 arguments and 4 return arguments. Use - * PLPAR_HCALL_BUFSIZE to size the return argument buffer. + * This call supports up to 6 arguments and 4 return arguments. * * Used when phyp interface needs to be called in real mode. Similar to * plpar_hcall, but plpar_hcall_raw works in real mode and does not * calculate hypervisor call statistics. */ -long plpar_hcall_raw(unsigned long opcode, unsigned long *retbuf, ...); +long plpar_hcall_raw(unsigned long opcode, struct plpar_hcall_retvals *retvals, ...); /** * plpar_hcall9: - Make a pseries hypervisor call with up to 9 return arguments diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index 1b394247afc2..17885cd60fb9 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -131,12 +131,12 @@ static inline long plpar_pte_enter(unsigned long flags, unsigned long hpte_group, unsigned long hpte_v, unsigned long hpte_r, unsigned long *slot) { + struct plpar_hcall_retvals retvals; long rc; - unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; - rc = plpar_hcall(H_ENTER, retbuf, flags, hpte_group, hpte_v, hpte_r); + rc = plpar_hcall(H_ENTER, &retvals, flags, hpte_group, hpte_v, hpte_r); - *slot = retbuf[0]; + *slot = retvals.v[0]; return rc; } @@ -145,13 +145,13 @@ static inline long plpar_pte_remove(unsigned long flags, unsigned long ptex, unsigned long avpn, unsigned long *old_pteh_ret, unsigned long *old_ptel_ret) { + struct plpar_hcall_retvals retvals; long rc; - unsigned long 
retbuf[PLPAR_HCALL_BUFSIZE]; - rc = plpar_hcall(H_REMOVE, retbuf, flags, ptex, avpn); + rc = plpar_hcall(H_REMOVE, &retvals, flags, ptex, avpn); - *old_pteh_ret = retbuf[0]; - *old_ptel_ret = retbuf[1]; + *old_pteh_ret = retvals.v[0]; + *old_ptel_ret = retvals.v[1]; return rc; } @@ -161,13 +161,13 @@ static inline long plpar_pte_remove_raw(unsigned long flags, unsigned long ptex, unsigned long avpn, unsigned long *old_pteh_ret, unsigned long *old_ptel_ret) { + struct plpar_hcall_retvals retvals; long rc; - unsigned long retbuf[PLPAR_
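The payoff of the struct-typed return buffer proposed in this series can be seen in a small standalone C sketch (hypothetical names, not the kernel code): an array function parameter decays to a plain pointer, so a callee can't recover the buffer size, whereas a struct pointer carries it, and passing a pointer to the wrong struct type becomes a compile-time error.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 4-entry return buffer type, mirroring the idea of the
 * patch's struct plpar_hcall_retvals. */
struct hcall_retvals {
	unsigned long v[4];
};

/* An array parameter silently decays to a pointer: inside the callee,
 * sizeof(retbuf) is the size of an unsigned long *, so this memset
 * clears only the first few bytes of the caller's buffer. */
static size_t clear_via_array(unsigned long retbuf[4])
{
	memset(retbuf, 0, sizeof(retbuf));
	return sizeof(retbuf);
}

/* With a struct pointer, sizeof(*r) is always the full buffer size. */
static size_t clear_via_struct(struct hcall_retvals *r)
{
	memset(r, 0, sizeof(*r));
	return sizeof(*r);
}
```

The same decay is why a caller-side buffer sized with the wrong constant went unnoticed: the prototype only saw `unsigned long *`, never the length.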
[RFC PATCH 3/3] powerpc/pseries: Define & use a type for the plpar_hcall9() retvals
We have now had two nasty stack corruption bugs caused by incorrect sizing of the return buffer for plpar_hcall()/plpar_hcall9(). To avoid any more such bugs, define a type which encodes the size of the return buffer, and change the argument of plpar_hcall9() to be of that type, meaning the compiler will check for us that we passed the right size buffer. There isn't an easy way to do this incrementally, without introducing a new function name, eg. plpar_hcall9_with_struct(), which is ugly as hell. So just do it in one tree-wide change. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/hvcall.h | 15 ++-- arch/powerpc/include/asm/plpar_wrappers.h | 12 +-- arch/powerpc/mm/numa.c | 6 +- arch/powerpc/platforms/pseries/lpar.c | 46 +- arch/powerpc/platforms/pseries/lparcfg.c| 28 +++ arch/powerpc/platforms/pseries/pseries_energy.c | 12 +-- drivers/misc/cxl/hcalls.c | 32 +++ drivers/net/ethernet/ibm/ehea/ehea_phyp.c | 107 drivers/net/ethernet/ibm/ibmveth.h | 8 +- 9 files changed, 136 insertions(+), 130 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index b3a6c6ec6b6f..c2728ab84b4f 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -347,17 +347,20 @@ long plpar_hcall(unsigned long opcode, struct plpar_hcall_retvals *retvals, ...) */ long plpar_hcall_raw(unsigned long opcode, struct plpar_hcall_retvals *retvals, ...); +struct plpar_hcall9_retvals +{ + unsigned long v[9]; +}; + /** * plpar_hcall9: - Make a pseries hypervisor call with up to 9 return arguments * @opcode: The hypervisor call to make. - * @retbuf: Buffer to store up to 9 return arguments in. + * @retvals: Buffer to store up to 9 return arguments in. * - * This call supports up to 9 arguments and 9 return arguments. Use - * PLPAR_HCALL9_BUFSIZE to size the return argument buffer. + * This call supports up to 9 arguments and 9 return arguments. 
*/ -#define PLPAR_HCALL9_BUFSIZE 9 -long plpar_hcall9(unsigned long opcode, unsigned long *retbuf, ...); -long plpar_hcall9_raw(unsigned long opcode, unsigned long *retbuf, ...); +long plpar_hcall9(unsigned long opcode, struct plpar_hcall9_retvals *retvals, ...); +long plpar_hcall9_raw(unsigned long opcode, struct plpar_hcall9_retvals *retvals, ...); /* For hcall instrumentation. One structure per-hcall, per-CPU */ struct hcall_stats { diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index 17885cd60fb9..865bc9a726e4 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -209,11 +209,11 @@ static inline long plpar_pte_read_4(unsigned long flags, unsigned long ptex, { long rc; - unsigned long retbuf[PLPAR_HCALL9_BUFSIZE]; + struct plpar_hcall9_retvals retvals; - rc = plpar_hcall9(H_READ, retbuf, flags | H_READ_4, ptex); + rc = plpar_hcall9(H_READ, &retvals, flags | H_READ_4, ptex); - memcpy(ptes, retbuf, 8*sizeof(unsigned long)); + memcpy(ptes, &retvals.v, 8*sizeof(unsigned long)); return rc; } @@ -227,11 +227,11 @@ static inline long plpar_pte_read_4_raw(unsigned long flags, unsigned long ptex, { long rc; - unsigned long retbuf[PLPAR_HCALL9_BUFSIZE]; + struct plpar_hcall9_retvals retvals; - rc = plpar_hcall9_raw(H_READ, retbuf, flags | H_READ_4, ptex); + rc = plpar_hcall9_raw(H_READ, &retvals, flags | H_READ_4, ptex); - memcpy(ptes, retbuf, 8*sizeof(unsigned long)); + memcpy(ptes, &retvals.v, 8*sizeof(unsigned long)); return rc; } diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 75b9cd6150cc..aeaf1b1cb893 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1277,12 +1277,12 @@ static int update_cpu_associativity_changes_mask(void) static long hcall_vphn(unsigned long cpu, __be32 *associativity) { long rc; - long retbuf[PLPAR_HCALL9_BUFSIZE] = {0}; + struct plpar_hcall9_retvals retvals = { 0 }; u64 flags = 1; int hwcpu = 
get_hard_smp_processor_id(cpu); - rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, hwcpu); - vphn_unpack_associativity(retbuf, associativity); + rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, &retvals, flags, hwcpu); + vphn_unpack_associativity(&retvals.v[0], associativity); return rc; } diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 86707e67843f..d191df0c1535 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -394,6 +394,7 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot, int psize, int ssize) { u
[PATCH] powerpc: Fix numa topology console print
With a recent update to printk, we get console output like below:

[0.550639] Brought up 160 CPUs
[0.550718] Node 0 CPUs:
[0.550721]  0
[0.550754] -39
[0.550794] Node 1 CPUs:
[0.550798]  40
[0.550817] -79
[0.550856] Node 16 CPUs:
[0.550860]  80
[0.550880] -119
[0.550917] Node 17 CPUs:
[0.550923]  120
[0.550942] -159

Fix this by using the proper printk arguments (pr_info()/pr_cont()).

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/mm/numa.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 75b9cd6150cc..c8fd54ec40e4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -845,7 +845,7 @@ void __init dump_numa_cpu_topology(void)
 		return;
 
 	for_each_online_node(node) {
-		printk(KERN_DEBUG "Node %d CPUs:", node);
+		pr_info("Node %d CPUs:", node);
 
 		count = 0;
 		/*
@@ -856,18 +856,18 @@ void __init dump_numa_cpu_topology(void)
 			if (cpumask_test_cpu(cpu,
 					node_to_cpumask_map[node])) {
 				if (count == 0)
-					printk(" %u", cpu);
+					pr_cont(" %u", cpu);
 				++count;
 			} else {
 				if (count > 1)
-					printk("-%u", cpu - 1);
+					pr_cont("-%u", cpu - 1);
 				count = 0;
 			}
 		}
 
 		if (count > 1)
-			printk("-%u", nr_cpu_ids - 1);
-		printk("\n");
+			pr_cont("-%u", nr_cpu_ids - 1);
+		pr_cont("\n");
 	}
 }
 
@@ -882,7 +882,7 @@ static void __init dump_numa_memory_topology(void)
 	for_each_online_node(node) {
 		unsigned long i;
 
-		printk(KERN_DEBUG "Node %d Memory:", node);
+		pr_info("Node %d Memory:", node);
 
 		count = 0;
 
@@ -890,18 +890,18 @@ static void __init dump_numa_memory_topology(void)
 				i += (1 << SECTION_SIZE_BITS)) {
 			if (early_pfn_to_nid(i >> PAGE_SHIFT) == node) {
 				if (count == 0)
-					printk(" 0x%lx", i);
+					pr_cont(" 0x%lx", i);
 				++count;
 			} else {
 				if (count > 0)
-					printk("-0x%lx", i);
+					pr_cont("-0x%lx", i);
 				count = 0;
 			}
 		}
 
 		if (count > 0)
-			printk("-0x%lx", i);
-		printk("\n");
+			pr_cont("-0x%lx", i);
+		pr_cont("\n");
 	}
 }
-- 
2.10.1
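A simplified userspace model of why the output fragmented: each unmarked log call starts a new record (new line, new timestamp), while a continuation call appends to the record being built. This is a sketch of the semantics only, not the kernel's ring-buffer implementation.

```c
#include <assert.h>
#include <string.h>

static char log_buf[512];

/* Models pr_info()/printk without KERN_CONT: every call starts a new
 * record, i.e. a new line with its own timestamp. */
static void log_record(const char *s)
{
	strcat(log_buf, "\n[ts] ");
	strcat(log_buf, s);
}

/* Models pr_cont(): append to the record currently being built. */
static void log_cont(const char *s)
{
	strcat(log_buf, s);
}
```

Printing "Node 0 CPUs:", " 0", "-39" as three unmarked calls yields three timestamped lines; marking the continuations keeps them on one line, which is what the patch restores.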
Re: [PATCH v2] console: Don't prefer first registered if DT specifies stdout-path
On Monday, 17 October 2016 19:39:57 BST Andreas Schwab wrote:
> On Okt 17 2016, Paul Burton wrote:
> > Could you share the device tree from your system?
>
> This is the contents of chosen/linux,stdout-path on the systems I have:
>
> chosen/linux,stdout-path
> 	"/pci@f000/ATY,SnowyParent@10/ATY,Snowy_A@0"
>
> chosen/linux,stdout-path
> 	"/pci@0,f000/NVDA,Parent@10/NVDA,Display-B@1"
>
> Is that what you need? There is also chosen/stdout, but no
> aliases/stdout.
>
> Andreas.

Hi Andreas,

I think I see the problem & I'm hoping this patch will fix it:
https://lkml.org/lkml/2016/10/18/142

Could you give it a try & let me know?

Thanks,
Paul
[PATCH] console: use first console if stdout-path device doesn't appear
If a device tree specified a preferred device for kernel console output via the stdout-path or linux,stdout-path chosen node properties there's no guarantee that it will have specified a device for which we have a driver. It may also be the case that we do have a driver but it doesn't call of_console_check() to register as a preferred console (eg. offb driver as used on powermac systems). In these cases try to ensure that we provide some console output by enabling the first console in the console_drivers list. As I don't have access to an affected system this has only been build tested - testing would be most appreciated. Signed-off-by: Paul Burton Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path") Reported-by: Andreas Schwab Cc: Andreas Schwab Cc: Andrew Morton Cc: Petr Mladek Cc: Sergey Senozhatsky Cc: Borislav Petkov Cc: Tejun Heo Cc: linux-ker...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org --- A potential alternative to this might be to have the affected offb driver call of_check_console(), and perhaps that should happen anyway, but doing so seems non-trivial since the offb driver doesn't know the index of the framebuffer console device it may be about to register & the fbdev core doesn't know the associated device tree node. This also wouldn't catch the case of us not having a driver for the device specified by stdout-path, so this fallback seems worthwhile anyway. --- kernel/printk/printk.c | 37 - 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index d5e3973..7091e2f 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2835,10 +2835,45 @@ EXPORT_SYMBOL(unregister_console); * intersects with the init section. Note that code exists elsewhere to get * rid of the boot console as soon as the proper console shows up, so there * won't be side-effects from postponing the removal. 
+ * + * Additionally we may be using a device tree which specifies valid + * stdout-path referencing a device for which we don't have a driver, or for + * which we have a driver that doesn't register itself as preferred console + * using of_console_check(). In these cases we attempt here to enable the + * first registered console. */ static int __init printk_late_init(void) { - struct console *con; + struct console *con, *enabled; + + if (of_specified_console) { + console_lock(); + + /* Find the enabled console, if there is one */ + enabled = NULL; + for_each_console(con) { + if (!(con->flags & CON_ENABLED)) + continue; + + enabled = con; + break; + } + + /* Enable the first console if none were already enabled */ + con = console_drivers; + if (!enabled && con) { + if (con->index < 0) + con->index = 0; + if (con->setup == NULL || + con->setup(con, NULL) == 0) { + con->flags |= CON_ENABLED; + if (con->device) + con->flags |= CON_CONSDEV; + } + } + + console_unlock(); + } for_each_console(con) { if (!keep_bootcon && con->flags & CON_BOOT) { -- 2.10.0
Re: [v12, 0/8] Fix eSDHC host version register bug
On 21 September 2016 at 08:57, Yangbo Lu wrote: > This patchset is used to fix a host version register bug in the > T4240-R1.0-R2.0 > eSDHC controller. To match the SoC version and revision, 10 previous version > patchsets had tried many methods but all of them were rejected by reviewers. > Such as > - dts compatible method > - syscon method > - ifdef PPC method > - GUTS driver getting SVR method > Anrd suggested a soc_device_match method in v10, and this is the only > available > method left now. This v11 patchset introduces the soc_device_match interface > in > soc driver. > > The first six patches of Yangbo are to add the GUTS driver. This is used to > register a soc device which contain soc version and revision information. > The other two patches introduce the soc_device_match method in soc driver > and apply it on esdhc driver to fix this bug. > > Arnd Bergmann (1): > base: soc: introduce soc_device_match() interface > > Yangbo Lu (7): > dt: bindings: update Freescale DCFG compatible > ARM64: dts: ls2080a: add device configuration node > dt: bindings: move guts devicetree doc out of powerpc directory > powerpc/fsl: move mpc85xx.h to include/linux/fsl > soc: fsl: add GUTS driver for QorIQ platforms > MAINTAINERS: add entry for Freescale SoC drivers > mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0 > > Documentation/devicetree/bindings/arm/fsl.txt | 6 +- > .../bindings/{powerpc => soc}/fsl/guts.txt | 3 + > MAINTAINERS| 11 +- > arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 6 + > arch/powerpc/kernel/cpu_setup_fsl_booke.S | 2 +- > arch/powerpc/sysdev/fsl_pci.c | 2 +- > drivers/base/Kconfig | 1 + > drivers/base/soc.c | 66 ++ > drivers/clk/clk-qoriq.c| 3 +- > drivers/i2c/busses/i2c-mpc.c | 2 +- > drivers/iommu/fsl_pamu.c | 3 +- > drivers/mmc/host/Kconfig | 1 + > drivers/mmc/host/sdhci-of-esdhc.c | 20 ++ > drivers/net/ethernet/freescale/gianfar.c | 2 +- > drivers/soc/Kconfig| 2 +- > drivers/soc/fsl/Kconfig| 19 ++ > drivers/soc/fsl/Makefile | 1 + > 
drivers/soc/fsl/guts.c | 257 > + > include/linux/fsl/guts.h | 125 ++ > .../asm/mpc85xx.h => include/linux/fsl/svr.h | 4 +- > include/linux/sys_soc.h| 3 + > 21 files changed, 478 insertions(+), 61 deletions(-) > rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%) > create mode 100644 drivers/soc/fsl/Kconfig > create mode 100644 drivers/soc/fsl/guts.c > rename arch/powerpc/include/asm/mpc85xx.h => include/linux/fsl/svr.h (97%) > > -- > 2.1.0.27.g96db324 > This looks good to me! I am not sure which tree you want this to be picked up through, but unless no other volunteers I can take it through my mmc tree. Although, before considering to apply, I need an ack from Scott/Arnd for the guts driver in patch 5/8 and I need an ack from Greg for patch 7/8, where the soc_device_match() interface is added (seems like you didn't add him on cc/to). Kind regards Uffe
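For context on the interface this series introduces, here is a loose userspace model of the soc_device_match() idea: a driver supplies a terminated table of SoC attributes and gets back the first entry the running SoC satisfies. The struct and function names echo the kernel's, but the fields shown and the plain strcmp matching are illustrative simplifications, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct soc_device_attribute {
	const char *family;
	const char *revision;
};

/* What the SoC bus layer registered for the running system. */
static const struct soc_device_attribute *running_soc;

/* Walk the table until the all-NULL terminator; NULL fields in an
 * entry act as wildcards. Returns the first matching entry, or NULL. */
static const struct soc_device_attribute *
soc_device_match(const struct soc_device_attribute *matches)
{
	for (; matches->family || matches->revision; matches++) {
		if (matches->family &&
		    strcmp(matches->family, running_soc->family))
			continue;
		if (matches->revision &&
		    strcmp(matches->revision, running_soc->revision))
			continue;
		return matches;
	}
	return NULL;
}
```

The esdhc fix then reduces to: declare a table listing the affected T4240 revisions, and apply the quirk only when soc_device_match() returns non-NULL.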
Re: [PATCH 01/10] mm: remove write/force parameters from __get_user_pages_locked()
On Thu 13-10-16 01:20:11, Lorenzo Stoakes wrote: > This patch removes the write and force parameters from > __get_user_pages_locked() > to make the use of FOLL_FORCE explicit in callers as use of this flag can > result > in surprising behaviour (and hence bugs) within the mm subsystem. > > Signed-off-by: Lorenzo Stoakes Looks good. You can add: Reviewed-by: Jan Kara Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 02/10] mm: remove write/force parameters from __get_user_pages_unlocked()
On Thu 13-10-16 01:20:12, Lorenzo Stoakes wrote: > This patch removes the write and force parameters from > __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in callers > as > use of this flag can result in surprising behaviour (and hence bugs) within > the > mm subsystem. > > Signed-off-by: Lorenzo Stoakes The patch looks good. You can add: Reviewed-by: Jan Kara Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 03/10] mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
On Thu 13-10-16 01:20:13, Lorenzo Stoakes wrote: > This patch removes the write and force parameters from > get_user_pages_unlocked() > and replaces them with a gup_flags parameter to make the use of FOLL_FORCE > explicit in callers as use of this flag can result in surprising behaviour > (and > hence bugs) within the mm subsystem. > > Signed-off-by: Lorenzo Stoakes Looks good. You can add: Reviewed-by: Jan Kara Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 04/10] mm: replace get_user_pages_locked() write/force parameters with gup_flags
On Thu 13-10-16 01:20:14, Lorenzo Stoakes wrote: > This patch removes the write and force parameters from get_user_pages_locked() > and replaces them with a gup_flags parameter to make the use of FOLL_FORCE > explicit in callers as use of this flag can result in surprising behaviour > (and > hence bugs) within the mm subsystem. > > Signed-off-by: Lorenzo Stoakes > --- > include/linux/mm.h | 2 +- > mm/frame_vector.c | 8 +++- > mm/gup.c | 12 +++- > mm/nommu.c | 5 - > 4 files changed, 15 insertions(+), 12 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 6adc4bc..27ab538 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1282,7 +1282,7 @@ long get_user_pages(unsigned long start, unsigned long > nr_pages, > int write, int force, struct page **pages, > struct vm_area_struct **vmas); > long get_user_pages_locked(unsigned long start, unsigned long nr_pages, > - int write, int force, struct page **pages, int *locked); > + unsigned int gup_flags, struct page **pages, int *locked); Hum, the prototype is inconsistent with e.g. __get_user_pages_unlocked() where gup_flags come after **pages argument. Actually it makes more sense to have it before **pages so that input arguments come first and output arguments second but I don't care that much. But it definitely should be consistent... Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH 04/10] mm: replace get_user_pages_locked() write/force parameters with gup_flags
On Tue, Oct 18, 2016 at 02:54:25PM +0200, Jan Kara wrote:
> > @@ -1282,7 +1282,7 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
> > 			    int write, int force, struct page **pages,
> > 			    struct vm_area_struct **vmas);
> > long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
> > -		    int write, int force, struct page **pages, int *locked);
> > +		    unsigned int gup_flags, struct page **pages, int *locked);
>
> Hum, the prototype is inconsistent with e.g. __get_user_pages_unlocked()
> where gup_flags come after **pages argument. Actually it makes more sense
> to have it before **pages so that input arguments come first and output
> arguments second but I don't care that much. But it definitely should be
> consistent...

It was difficult to decide quite how to arrange parameters as there was
inconsistency with regard to parameter ordering already - for example
__get_user_pages() places its flags argument before pages whereas, as you
note, __get_user_pages_unlocked() puts them afterwards.

I ended up compromising by trying to match the existing ordering of each
function as much as I could, replacing write, force pairs with gup_flags in
the same location (with the exception of get_user_pages_unlocked(), which I
felt should match __get_user_pages_unlocked() in signature); where there
was already a gup_flags parameter, as in the case of
__get_user_pages_unlocked(), I simply removed the write, force pair and
left the flags as the last parameter.

I am happy to rearrange parameters as needed, however I am not sure it'd
be worthwhile to do so (I am keen to avoid adding too much noise here :)

If we were to rearrange parameters for consistency, I'd suggest adjusting
__get_user_pages_unlocked() to put gup_flags before pages and doing the
same with get_user_pages_unlocked(). Let me know what you think.
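As a standalone illustration of the conversion this series applies at each call site: the old positional (write, force) int pair becomes one self-describing flags word, so a reader of `lookup(start, n, FOLL_WRITE | FOLL_FORCE, pages)` no longer has to decode what `1, 1` meant. The flag values below mirror that era's FOLL_* constants but should be treated as illustrative here.

```c
#include <assert.h>

#define FOLL_WRITE 0x01
#define FOLL_FORCE 0x10

/* The mechanical translation performed in each patched caller:
 * two anonymous booleans in, one explicit bitmask out. */
static unsigned int gup_flags_from_bools(int write, int force)
{
	unsigned int gup_flags = 0;

	if (write)
		gup_flags |= FOLL_WRITE;
	if (force)
		gup_flags |= FOLL_FORCE;
	return gup_flags;
}
```

Besides readability, this makes FOLL_FORCE trivially greppable, which is the stated goal of the series.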
Re: [RFC PATCH 2/3] powerpc/pseries: Define & use a type for the plpar_hcall() retvals
On 18/10/16 19:40, Michael Ellerman wrote:
> We have now had two nasty stack corruption bugs caused by incorrect
> sizing of the return buffer for plpar_hcall()/plpar_hcall9().
>
> To avoid any more such bugs, define a type which encodes the size of the
> return buffer, and change the argument of plpar_hcall() to be of that
> type, meaning the compiler will check for us that we passed the right
> size buffer.
>
> There isn't an easy way to do this incrementally, without introducing a
> new function name, eg. plpar_hcall_with_struct(), which is ugly as hell.
> So just do it in one tree-wide change.

Conceptually looks good, but I think we need to abstract the return values
as well. I'll test and see if I can send you something on top of this.

Balbir
Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags
On Thu 13-10-16 01:20:10, Lorenzo Stoakes wrote: > This patch series adjusts functions in the get_user_pages* family such that > desired FOLL_* flags are passed as an argument rather than implied by flags. > > The purpose of this change is to make the use of FOLL_FORCE explicit so it is > easier to grep for and clearer to callers that this flag is being used. The > use > of FOLL_FORCE is an issue as it overrides missing VM_READ/VM_WRITE flags for > the > VMA whose pages we are reading from/writing to, which can result in surprising > behaviour. > > The patch series came out of the discussion around commit 38e0885, which > addressed a BUG_ON() being triggered when a page was faulted in with PROT_NONE > set but having been overridden by FOLL_FORCE. do_numa_page() was run on the > assumption the page _must_ be one marked for NUMA node migration as an actual > PROT_NONE page would have been dealt with prior to this code path, however > FOLL_FORCE introduced a situation where this assumption did not hold. > > See https://marc.info/?l=linux-mm&m=147585445805166 for the patch proposal. I like this cleanup. Tracking FOLL_FORCE users was always a nightmare and the flag behavior is really subtle so we should better be explicit about it. I haven't gone through each patch separately but rather applied the whole series and checked the resulting diff. This all seems OK to me and feel free to add Acked-by: Michal Hocko I am wondering whether we can go further. E.g. it is not really clear to me whether we need an explicit FOLL_REMOTE when we can in fact check mm != current->mm and imply that. Maybe there are some contexts which wouldn't work, I haven't checked. Then I am also wondering about FOLL_TOUCH behavior. __get_user_pages_unlocked has only few callers which used to be get_user_pages_unlocked before 1e9877902dc7e ("mm/gup: Introduce get_user_pages_remote()"). To me a dropped FOLL_TOUCH seems unintentional. 
Now that get_user_pages_unlocked() has a gup_flags argument, I guess we
might want to get rid of the __g-u-p-u version altogether, no?

__get_user_pages() is quite low level and imho shouldn't be exported. Its
only user - kvm - should rather pull those two functions into gup instead
and export them. There is nothing really KVM specific in them.

I also cannot say I would be entirely thrilled about
get_user_pages_locked(); we only have one user, which can simply do lock,
g-u-p, unlock AFAICS.

I guess there is more work in that area and I do not want to impose all
that work on you, but I couldn't resist once I saw you playing in that
area ;) Definitely a good start!
-- 
Michal Hocko
SUSE Labs
[PATCH] mm/hugetlb: Use the right pte val for compare in hugetlb_cow
We cannot use the pte value passed to set_pte_at() for the pte_same()
comparison, because archs like ppc64 filter/add new pte flags in
set_pte_at(). Instead, fetch the pte value inside hugetlb_cow(). We are
comparing the pte value to make sure the pte didn't change since we
dropped the page table lock. hugetlb_cow() gets called with the page table
lock held, so we can take a copy of the pte value before we drop the lock.

With hugetlbfs, we optimize the MAP_PRIVATE write fault path with no
previous mapping (huge_pte_none entries) by forcing a cow in the fault
path. This avoids taking an additional fault to convert a read-only
mapping to read/write. Here we were comparing a recently instantiated pte
(via set_pte_at) to the pte values from the linux page table. As explained
above, on ppc64 such a pte_same check returned the wrong result, resulting
in us taking an additional fault on ppc64.

Fixes: 6a119eae942c ("powerpc/mm: Add a _PAGE_PTE bit")
Reported-by: Jan Stancek
Signed-off-by: Aneesh Kumar K.V
---
 mm/hugetlb.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ec49d9ef1eef..da8fbd02b92e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3386,15 +3386,17 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
  * Keep the pte_same checks anyway to make transition from the mutex easier.
  */
 static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, pte_t *ptep, pte_t pte,
-			struct page *pagecache_page, spinlock_t *ptl)
+			unsigned long address, pte_t *ptep,
+			struct page *pagecache_page, spinlock_t *ptl)
 {
+	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
 	struct page *old_page, *new_page;
 	int ret = 0, outside_reserve = 0;
 	unsigned long mmun_start;	/* For mmu_notifiers */
 	unsigned long mmun_end;		/* For mmu_notifiers */
 
+	pte = huge_ptep_get(ptep);
 	old_page = pte_page(pte);
 
 retry_avoidcopy:
@@ -3668,7 +3670,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		hugetlb_count_add(pages_per_huge_page(h), mm);
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page, ptl);
+		ret = hugetlb_cow(mm, vma, address, ptep, page, ptl);
 	}
 
 	spin_unlock(ptl);
@@ -3822,8 +3824,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_cow(mm, vma, address, ptep, entry,
-						pagecache_page, ptl);
+			ret = hugetlb_cow(mm, vma, address, ptep,
+						pagecache_page, ptl);
 			goto out_put_page;
 		}
 		entry = huge_pte_mkdirty(entry);
-- 
2.10.1
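The heart of the fix, as a standalone userspace sketch: a value remembered from before set_pte_at() can never compare equal to the page table contents if the arch adds flags on the way in, so the comparison snapshot must come from re-reading the slot under the lock. MODEL_PAGE_PTE and the *_model names are illustrative stand-ins, not the real kernel interfaces.

```c
#include <assert.h>

typedef unsigned long pte_t;

/* Illustrative stand-in for a ppc64-style flag that set_pte_at()
 * adds behind the caller's back (the real bit is _PAGE_PTE). */
#define MODEL_PAGE_PTE 0x1UL

static pte_t page_table_slot;

/* Models set_pte_at(): the arch may filter/add flags on the way in. */
static void set_pte_at_model(pte_t val)
{
	page_table_slot = val | MODEL_PAGE_PTE;
}

/* Models huge_ptep_get(): read back what is really in the page table. */
static pte_t huge_ptep_get_model(void)
{
	return page_table_slot;
}

static int pte_same(pte_t a, pte_t b)
{
	return a == b;
}
```

The old code compared the pre-set_pte_at() value (spurious mismatch, hence the extra fault on ppc64); the patched code snapshots via huge_ptep_get() while holding the lock, so the later pte_same() check only fails if the pte genuinely changed.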
[PATCH v6 0/3] powerpc/pseries: Implement indexed-count memory hotplug
Indexed-count memory management allows addition and removal of contiguous lmb blocks with a single command. When compared to the series of calls previously required to manage contiguous blocks, indexed-count decreases command frequency and reduces risk of buffer overflow. -Nathan --- Changes in v2: -- -[PATCH 1/2]: -remove potential memory leak when parsing command -use u32s drc_index and count instead of u32 ic[] in dlpar_memory -[PATCH 2/2]: -use u32s drc_index and count instead of u32 ic[] in dlpar_memory Changes in v3: -- -[PATCH 1/2]: -add logic to handle invalid drc_index input -update indexed-count trigger to follow naming convention -update dlpar_memory to follow kernel if-else style -[PATCH 2/2]: -add logic to handle invalid drc_index input Changes in v4: -- -[PATCH 1/2]: -add logic to handle kstrdup failure -add logic to handle invalid command format Changes in v5: -- -[PATCH 2/2]: -update for() loop to use start_index Changes in v6: - -[PATCH 1/3]: -Added new patch 1 that corrects reading past the end of the sysfs buffer when parsing sysfs input. -[PATCH 2/3]: -Updated sysfs buffer parsing Nathan Fontenot (1): powerpc/pseries: Correct possible read beyond dlpar sysfs buffer Sahil Mehta (2): powerpc/pseries: Implement indexed-count hotplug memory add powerpc/pseries: Implement indexed-count hotplug memory remove arch/powerpc/include/asm/rtas.h |2 arch/powerpc/platforms/pseries/dlpar.c | 177 +++- arch/powerpc/platforms/pseries/hotplug-memory.c | 200 ++- 3 files changed, 323 insertions(+), 56 deletions(-)
[PATCH v6 1/3] powerpc/pseries: Correct possible read beyond dlpar sysfs buffer
The parsing of data written to the dlpar file in sysfs does not correctly account for the possibility of reading past the end of the buffer. Correct this by updating the buffer parsing code to make a local copy and use the strsep() and sysfs_streq() routines to parse the buffer. This also separates the parsing code into subroutines to make it cleaner. Signed-off-by: Nathan Fontenot --- arch/powerpc/platforms/pseries/dlpar.c | 141 ++-- 1 file changed, 96 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index 423e450..ec5d677 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -418,84 +418,135 @@ void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog, } } -static ssize_t dlpar_store(struct class *class, struct class_attribute *attr, - const char *buf, size_t count) +static int dlpar_parse_resource(char **cmd, struct pseries_hp_errorlog *hp_elog) { - struct pseries_hp_errorlog *hp_elog; - struct completion hotplug_done; - const char *arg; - int rc; + char *arg; - hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL); - if (!hp_elog) { - rc = -ENOMEM; - goto dlpar_store_out; - } + arg = strsep(cmd, " "); + if (!arg) + return -EINVAL; - /* Parse out the request from the user, this will be in the form -* -*/ - arg = buf; - if (!strncmp(arg, "memory", 6)) { + if (sysfs_streq(arg, "memory")) { hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_MEM; - arg += strlen("memory "); - } else if (!strncmp(arg, "cpu", 3)) { + } else if (sysfs_streq(arg, "cpu")) { hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_CPU; - arg += strlen("cpu "); } else { - pr_err("Invalid resource specified: \"%s\"\n", buf); - rc = -EINVAL; - goto dlpar_store_out; + pr_err("Invalid resource specified.\n"); + return -EINVAL; } - if (!strncmp(arg, "add", 3)) { + return 0; +} + +static int dlpar_parse_action(char **cmd, struct pseries_hp_errorlog *hp_elog) +{ + char *arg; + + arg = strsep(cmd, " 
"); + if (!arg) + return -EINVAL; + + if (sysfs_streq(arg, "add")) { hp_elog->action = PSERIES_HP_ELOG_ACTION_ADD; - arg += strlen("add "); - } else if (!strncmp(arg, "remove", 6)) { + } else if (sysfs_streq(arg, "remove")) { hp_elog->action = PSERIES_HP_ELOG_ACTION_REMOVE; - arg += strlen("remove "); } else { - pr_err("Invalid action specified: \"%s\"\n", buf); - rc = -EINVAL; - goto dlpar_store_out; + pr_err("Invalid action specified.\n"); + return -EINVAL; } - if (!strncmp(arg, "index", 5)) { - u32 index; + return 0; +} +static int dlpar_parse_id_type(char **cmd, struct pseries_hp_errorlog *hp_elog) +{ + char *arg; + u32 count, index; + + arg = strsep(cmd, " "); + if (!arg) + return -EINVAL; + + if (sysfs_streq(arg, "index")) { hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX; - arg += strlen("index "); + arg = strsep(cmd, " "); + if (!arg) { + pr_err("No DRC Index specified.\n"); + return -EINVAL; + } + if (kstrtou32(arg, 0, &index)) { - rc = -EINVAL; - pr_err("Invalid drc_index specified: \"%s\"\n", buf); - goto dlpar_store_out; + pr_err("Invalid DRC Index specified.\n"); + return -EINVAL; } hp_elog->_drc_u.drc_index = cpu_to_be32(index); - } else if (!strncmp(arg, "count", 5)) { - u32 count; - + } else if (sysfs_streq(arg, "count")) { hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_COUNT; - arg += strlen("count "); + arg = strsep(cmd, " "); + if (!arg) { + pr_err("No DRC count specified.\n"); + return -EINVAL; + } + if (kstrtou32(arg, 0, &count)) { - rc = -EINVAL; - pr_err("Invalid count specified: \"%s\"\n", buf); - goto dlpar_store_out; + pr_err("Invalid DRC count specified.\n"); + return -EINVAL; } hp_elog->_drc_u.drc_count = cpu_to_be32(count); } else { - pr_err("Invalid id_type specified: \"%s\"\n",
[PATCH v6 2/3] powerpc/pseries: Implement indexed-count hotplug memory add
From: Sahil Mehta Indexed-count add for memory hotplug guarantees that a contiguous block of <count> lmbs beginning at a specified <drc index> will be assigned (NOT that <count> lmbs will be added). Because of Qemu's per-DIMM memory management, the addition of a contiguous block of memory currently requires a series of individual calls. Indexed-count add reduces this series into a single call. Signed-off-by: Sahil Mehta Signed-off-by: Nathan Fontenot --- v2: -remove potential memory leak when parsing command -use u32s drc_index and count instead of u32 ic[] in dlpar_memory v3: -add logic to handle invalid drc_index input -update indexed-count trigger to follow naming convention -update dlpar_memory to follow kernel if-else style v4: -add logic to handle kstrdup failure -add logic to handle invalid command format v5: -none v6: -Corrected the sysfs buffer parsing to align with the updated parsing code in the new patch 1 of the series. arch/powerpc/include/asm/rtas.h | 2 arch/powerpc/platforms/pseries/dlpar.c | 38 +++- arch/powerpc/platforms/pseries/hotplug-memory.c | 110 +-- 3 files changed, 138 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 9c23baa..5d65de7 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -307,6 +307,7 @@ struct pseries_hp_errorlog { union { __be32 drc_index; __be32 drc_count; + __be32 indexed_count[2]; char drc_name[1]; } _drc_u; }; @@ -322,6 +323,7 @@ struct pseries_hp_errorlog { #define PSERIES_HP_ELOG_ID_DRC_NAME 1 #define PSERIES_HP_ELOG_ID_DRC_INDEX 2 #define PSERIES_HP_ELOG_ID_DRC_COUNT 3 +#define PSERIES_HP_ELOG_ID_DRC_IC 4 struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log, uint16_t section_id); diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index ec5d677..2e01b1d 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -354,11 +354,17 @@ static int 
handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog) switch (hp_elog->id_type) { case PSERIES_HP_ELOG_ID_DRC_COUNT: hp_elog->_drc_u.drc_count = - be32_to_cpu(hp_elog->_drc_u.drc_count); + be32_to_cpu(hp_elog->_drc_u.drc_count); break; case PSERIES_HP_ELOG_ID_DRC_INDEX: hp_elog->_drc_u.drc_index = - be32_to_cpu(hp_elog->_drc_u.drc_index); + be32_to_cpu(hp_elog->_drc_u.drc_index); + break; + case PSERIES_HP_ELOG_ID_DRC_IC: + hp_elog->_drc_u.indexed_count[0] = + be32_to_cpu(hp_elog->_drc_u.indexed_count[0]); + hp_elog->_drc_u.indexed_count[1] = + be32_to_cpu(hp_elog->_drc_u.indexed_count[1]); } switch (hp_elog->resource) { @@ -467,7 +473,33 @@ static int dlpar_parse_id_type(char **cmd, struct pseries_hp_errorlog *hp_elog) if (!arg) return -EINVAL; - if (sysfs_streq(arg, "index")) { + if (sysfs_streq(arg, "indexed-count")) { + hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_IC; + arg = strsep(cmd, " "); + if (!arg) { + pr_err("No DRC count specified.\n"); + return -EINVAL; + } + + if (kstrtou32(arg, 0, &count)) { + pr_err("Invalid DRC count specified.\n"); + return -EINVAL; + } + + arg = strsep(cmd, " "); + if (!arg) { + pr_err("No DRC Index specified.\n"); + return -EINVAL; + } + + if (kstrtou32(arg, 0, &index)) { + pr_err("Invalid DRC Index specified.\n"); + return -EINVAL; + } + + hp_elog->_drc_u.indexed_count[0] = cpu_to_be32(count); + hp_elog->_drc_u.indexed_count[1] = cpu_to_be32(index); + } else if (sysfs_streq(arg, "index")) { hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX; arg = strsep(cmd, " "); if (!arg) { diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 76ec104..badc66d 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -725,6 +725,89 @@ static int dlpar_memory_add_by_index(u32 drc_index, struct property *prop) return rc; } +static int dlpar_
[PATCH v6 3/3] powerpc/pseries: Implement indexed-count hotplug memory remove
From: Sahil Mehta Indexed-count remove for memory hotplug guarantees that a contiguous block of <count> lmbs beginning at a specified <drc index> will be unassigned (NOT that <count> lmbs will be removed). Because of Qemu's per-DIMM memory management, the removal of a contiguous block of memory currently requires a series of individual calls. Indexed-count remove reduces this series into a single call. Signed-off-by: Sahil Mehta Signed-off-by: Nathan Fontenot --- v2: -use u32s drc_index and count instead of u32 ic[] in dlpar_memory v3: -add logic to handle invalid drc_index input v4: -none v5: -Update for() loop to start at start_index v6: -none arch/powerpc/platforms/pseries/hotplug-memory.c | 90 +++ 1 file changed, 90 insertions(+) diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index badc66d..19ad081 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -558,6 +558,92 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop) return rc; } +static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index, +struct property *prop) +{ + struct of_drconf_cell *lmbs; + u32 num_lmbs, *p; + int i, rc, start_lmb_found; + int lmbs_available = 0, start_index = 0, end_index; + + pr_info("Attempting to hot-remove %u LMB(s) at %x\n", + lmbs_to_remove, drc_index); + + if (lmbs_to_remove == 0) + return -EINVAL; + + p = prop->value; + num_lmbs = *p++; + lmbs = (struct of_drconf_cell *)p; + start_lmb_found = 0; + + /* Navigate to drc_index */ + while (start_index < num_lmbs) { + if (lmbs[start_index].drc_index == drc_index) { + start_lmb_found = 1; + break; + } + + start_index++; + } + + if (!start_lmb_found) + return -EINVAL; + + end_index = start_index + lmbs_to_remove; + + /* Validate that there are enough LMBs to satisfy the request */ + for (i = start_index; i < end_index; i++) { + if (lmbs[i].flags & DRCONF_MEM_RESERVED) + break; + 
lmbs_available++; + } + + if (lmbs_available < lmbs_to_remove) + return -EINVAL; + + for (i = start_index; i < end_index; i++) { + if (!(lmbs[i].flags & DRCONF_MEM_ASSIGNED)) + continue; + + rc = dlpar_remove_lmb(&lmbs[i]); + if (rc) + break; + + lmbs[i].reserved = 1; + } + + if (rc) { + pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n"); + + for (i = start_index; i < end_index; i++) { + if (!lmbs[i].reserved) + continue; + + rc = dlpar_add_lmb(&lmbs[i]); + if (rc) + pr_err("Failed to add LMB, drc index %x\n", + be32_to_cpu(lmbs[i].drc_index)); + + lmbs[i].reserved = 0; + } + rc = -EINVAL; + } else { + for (i = start_index; i < end_index; i++) { + if (!lmbs[i].reserved) + continue; + + pr_info("Memory at %llx (drc index %x) was hot-removed\n", + lmbs[i].base_addr, lmbs[i].drc_index); + + lmbs[i].reserved = 0; + } + } + + return rc; +} + #else static inline int pseries_remove_memblock(unsigned long base, unsigned int memblock_size) @@ -853,6 +939,10 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog) } else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) { drc_index = hp_elog->_drc_u.drc_index; rc = dlpar_memory_remove_by_index(drc_index, prop); + } else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_IC) { + count = hp_elog->_drc_u.indexed_count[0]; + drc_index = hp_elog->_drc_u.indexed_count[1]; + rc = dlpar_memory_remove_by_ic(count, drc_index, prop); } else { rc = -EINVAL; }
Re: [PATCH] mm/hugetlb: Use the right pte val for compare in hugetlb_cow
On Tue, 18 Oct 2016 21:12:45 +0530 "Aneesh Kumar K.V" wrote: > We cannot use the pte value used in set_pte_at for pte_same comparison, > because archs like ppc64 filter/add new pte flags in set_pte_at. Instead, > fetch the pte value inside hugetlb_cow. We are comparing the pte value to > make sure the pte didn't change since we dropped the page table lock. > hugetlb_cow gets called with the page table lock held, and we can take a copy > of the pte value before we drop the page table lock. > > With hugetlbfs, we optimize the MAP_PRIVATE write fault path with no > previous mapping (huge_pte_none entries), by forcing a cow in the fault > path. This avoids taking an additional fault to convert a read-only mapping > to read/write. Here we were comparing a recently instantiated pte (via > set_pte_at) to the pte values from the linux page table. As explained above, > on ppc64 such a pte_same check returned the wrong result, resulting in us > taking an additional fault on ppc64. From my reading this is a minor performance improvement and a -stable backport isn't needed. But it is unclear whether the impact warrants a 4.9 merge. Please be careful about describing end-user visible impacts when fixing bugs, thanks.
Re: Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue available early during boot)
Hello, Michael. On Tue, Oct 18, 2016 at 03:37:42PM +1100, Michael Ellerman wrote: > That doesn't compile, wq doesn't exist. > > I guessed that you meant: > > + wq_numa_init(); > + list_for_each_entry(wq, &workqueues, list) > + wq_update_unbound_numa(wq, smp_processor_id(), true); Yeap, sorry about that. > And that does boot. > > The sysrq-t output is below, it's rather large. Didn't expect that many cpus but it looks good. I'll post proper patches soon. Thanks! -- tejun
Re: [PATCH] console: use first console if stdout-path device doesn't appear
On Oct 18 2016, Paul Burton wrote: > If a device tree specified a preferred device for kernel console output > via the stdout-path or linux,stdout-path chosen node properties there's > no guarantee that it will have specified a device for which we have a > driver. It may also be the case that we do have a driver but it doesn't > call of_console_check() to register as a preferred console (eg. offb > driver as used on powermac systems). In these cases try to ensure that > we provide some console output by enabling the first console in the > console_drivers list. > > As I don't have access to an affected system this has only been build > tested - testing would be most appreciated. Unfortunately that doesn't work. The initial console still cannot be opened. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: powerpc/64: Fix incorrect return value from __copy_tofrom_user
On Tue, 2016-11-10 at 11:25:47 UTC, Paul Mackerras wrote: > Debugging a data corruption issue with virtio-net/vhost-net led to > the observation that __copy_tofrom_user was occasionally returning > a value 16 larger than it should. Since the return value from > __copy_tofrom_user is the number of bytes not copied, this means > that __copy_tofrom_user can occasionally return a value larger > than the number of bytes it was asked to copy. In turn this can > cause higher-level copy functions such as copy_page_to_iter_iovec > to corrupt memory by copying data into the wrong memory locations. > > It turns out that the failing case involves a fault on the store > at label 79, and at that point the first unmodified byte of the > destination is at R3 + 16. Consequently the exception handler > for that store needs to add 16 to R3 before using it to work out > how many bytes were not copied, but in this one case it was not > adding the offset to R3. To fix it, this moves the label 179 to > the point where we add 16 to R3. I have checked manually all the > exception handlers for the loads and stores in this code and the > rest of them are correct (it would be excellent to have an > automated test of all the exception cases). > > This bug has been present since this code was initially > committed in May 2002 to Linux version 2.5.20. > > Cc: sta...@vger.kernel.org > Signed-off-by: Paul Mackerras Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/1a34439e5a0b2235e43f96816dbb15 cheers
Re: powerpc/mm/hash64: Fix might_have_hea() check
On Tue, 2016-11-10 at 10:15:04 UTC, Michael Ellerman wrote: > In commit 2b4e3ad8f579 ("powerpc/mm/hash64: Don't test for machine type > to detect HEA special case") we changed the logic in might_have_hea() > to check FW_FEATURE_SPLPAR rather than machine_is(pseries). > > However the check was incorrectly negated, leading to crashes on > machines with HEA adapters, such as: > > mm: Hashing failure ! EA=0xd80080004040 access=0x800e > current=NetworkManager > trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 > pte=0xc0003cc033e701ae > Unable to handle kernel paging request for data at address > 0xd80080004040 > Call Trace: > .ehea_create_cq+0x148/0x340 [ehea] (unreliable) > .ehea_up+0x258/0x1200 [ehea] > .ehea_open+0x44/0x1a0 [ehea] > ... > > Fix it by removing the negation. > > Fixes: 2b4e3ad8f579 ("powerpc/mm/hash64: Don't test for machine type to > detect HEA special case") > Cc: sta...@vger.kernel.org # v4.8+ > Reported-by: Denis Kirjanov > Reported-by: Jan Stancek > Signed-off-by: Michael Ellerman Applied to powerpc next. https://git.kernel.org/powerpc/c/08bf75ba852ef8304a84b6a030466b cheers
Re: powerpc/pseries: Fix stack corruption in hpte code
On Thu, 2016-06-10 at 13:33:21 UTC, Laurent Dufour wrote: > This commit fixes a stack corruption in the pseries specific code dealing > with the huge pages. > > In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments > to the hypervisor is not large enough. This leads to a stack corruption > where a previously saved register could be corrupted leading to unexpected > result in the caller, like the following panic: > > Oops: Kernel access of bad area, sig: 11 [#1] > SMP NR_CPUS=2048 NUMA pSeries > Modules linked in: virtio_balloon ip_tables x_tables autofs4 > virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio > CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76 > task: c5394880 task.stack: c557 > NIP: c027bf6c LR: c027bf64 CTR: > REGS: c5573820 TRAP: 0300 Not tainted (4.8.0) > MSR: 80009033 CR: 84822884 XER: > 2000 > CFAR: c010a924 DAR: 4214e5e0 DSISR: 4000 SOFTE: 1 > GPR00: c027bf64 c5573aa0 c0e02800 c4447964 > GPR04: c404de18 c4d38810 042100f5 f5002104 > GPR08: e000f5002104 0001 042100f500e0 042100f5 > GPR12: 2200 cfe02c00 c404de18 > GPR16: c1ffe7ff 3fff6200 4214e5e0 3fff6300 > GPR20: 0008 c000f7014800 0405e6e0 0001 > GPR24: c4d38810 c4447c10 c404de18 c4447964 > GPR28: c5573b10 c4d38810 3fff6200 4214e5e0 > NIP [c027bf6c] zap_huge_pmd+0x4c/0x470 > LR [c027bf64] zap_huge_pmd+0x44/0x470 > Call Trace: > [c5573aa0] [c027bf64] zap_huge_pmd+0x44/0x470 (unreliable) > [c5573af0] [c022bbd8] unmap_page_range+0xcf8/0xed0 > [c5573c30] [c022c2d4] unmap_vmas+0x84/0x120 > [c5573c80] [c0235448] unmap_region+0xd8/0x1b0 > [c5573d80] [c02378f0] do_munmap+0x2d0/0x4c0 > [c5573df0] [c0237be4] SyS_munmap+0x64/0xb0 > [c5573e30] [c0009560] system_call+0x38/0x108 > Instruction dump: > fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1 > 7cde3378 4bfffea9 7c7b1b79 41820298 48000130 7fa5eb78 7fc4f378 > > Most of the time, the bug is surfacing in a caller up in the stack from > __pSeries_lpar_hugepage_invalidate() which is quite confusing. 
> > This bug has been present since v3.11 but was hidden if a caller of the > caller of __pSeries_lpar_hugepage_invalidate() has pushed the corrupted > register (r18 in this case) onto the stack and is not using it until > restoring it. GCC 6.2.0 seems to raise it more frequently. > > This commit also changes the definition of the parameter buffer in > pSeries_lpar_flush_hash_range() to rely on the global define > PLPAR_HCALL9_BUFSIZE (no functional change here). > > Fixes: 1a5272866f87 ("powerpc: Optimize hugepage invalidate") > Cc: > Cc: Aneesh Kumar K.V > Signed-off-by: Laurent Dufour > Reviewed-by: Aneesh Kumar K.V > Acked-by: Balbir Singh Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/05af40e885955065aee8bb7425058e cheers
Re: MAINTAINERS: Drop separate pseries entry
On Wed, 2016-05-10 at 05:04:00 UTC, Michael Ellerman wrote: > Paul is no longer acting as a separate maintainer for pseries, it is > handled along with the rest of powerpc. The URL no longer links anywhere > meaningful, so drop it also. > > Signed-off-by: Michael Ellerman Applied to powerpc next. https://git.kernel.org/powerpc/c/55099115ca523b5edf5c542bd5c328 cheers
Re: ppc64 qemu test failure since commit f9aa67142 ("powerpc/64s: Consolidate Alignment 0x600 interrupt")
On Tue, 2016-11-10 at 07:47:56 UTC, Nicholas Piggin wrote: > On Mon, 10 Oct 2016 07:15:11 -0700 > Guenter Roeck wrote: > > > On 10/09/2016 10:49 PM, Nicholas Piggin wrote: > > > On Sun, 9 Oct 2016 08:21:21 -0700 > > > Guenter Roeck wrote: > > > > > >> Nicholas, > > >> > > >> some of my qemu tests for ppc64 started failing on mainline (and -next). > > >> You can find a test log at > > >> http://kerneltests.org/builders/qemu-ppc64-master/builds/580/steps/qemubuildcommand/logs/stdio > > >> > > >> The scripts to run the test are available at > > >> https://github.com/groeck/linux-build-test/tree/master/rootfs/ppc64 > > >> > > >> Bisect points to commit f9aa67142ef26 ("powerpc/64s: Consolidate > > >> Alignment 0x600 > > >> interrupt"). Bisect log is attached. > > >> > > >> Since I don't have the means to run the code on a real system, I have no > > >> idea > > >> if the problem is caused by qemu or by the code. It is interesting, > > >> though, that > > >> only the 'mac99' tests are affected. > > >> > > >> Please let me know if there is anything I can do to help tracking down > > >> the > > >> problem. > > > > > > Thanks for this. That patch just moves a small amount of code, so it's > > > likely > > > that it's caused something to get placed out of range of its caller, or > > > the > > > linker started generating a stub for some reason. I can't immediately see > > > the > > > problem, but it could be specific to your exact toolchain. > > > > > > Something that might help, would you be able to put the compiled vmlinux > > > binaries > > > from before/after the bad patch somewhere I can grab them? > > > > > > > http://server.roeck-us.net/qemu/ppc64/mac99/ > > > > 'bad' is at f9aa67142ef26, 'good' is one commit earlier, 'tot' is from top > > of tree > > (b66484cd7470, more specifically). 
> > > > Key difference in System.map, from the bad case: > > > > c0005c00 T __end_interrupts > > c0007000 t end_virt_trampolines > > c0008000 t 0010.long_branch.power4_fixup_nap+0 > > c0008100 t fs_label > > c0008100 t start_text > > > > 0010.long_branch.power4_fixup_nap+0 does not exist in the good case, > > and fs_label/start_text are at c0008000. > > > > Toolchain is from poky 1.5.1, which uses gcc 4.8.1 and binutils 2.23.2. > > I also tried with the toolchain from poky 1.6, using gcc 4.8.2 and binutils > > 2.24, > > with the same result. > > Thank you for the quick response, this points to the exact problem. > I've attached a patch which should fix the bug. There are some checks > I've got planned that will catch this type of thing at build time and > be much easier to track down. > > Thanks, > Nick > > From: Nicholas Piggin > Date: Tue, 11 Oct 2016 18:33:26 +1100 > Subject: [PATCH] powerpc/64s: fix power4_fixup_nap placement > > power4_fixup_nap is called from the "common" handlers, not the virt/real > handlers, therefore it should itself be a common handler. Placing it > down in the trampoline space caused it to go out of reach of its > callers, requiring a trampoline inserted at the start of the text > section, which breaks the fixed section address calculations. > > Reported-by: Guenter Roeck > Signed-off-by: Nicholas Piggin Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/7c8cb4b50f3cc6f4a8f7bfddad6fb5 cheers
Re: MAINTAINERS: Update powerpc website & add selftests
On Wed, 2016-05-10 at 05:08:16 UTC, Michael Ellerman wrote: > The selftests under tools/testing/selftests/powerpc are maintained by > us, so add a file pattern for them. > > Also drop the www.penguinppc.org link, it's not dead, but the site is > dead (database error). Instead link to the wiki attached to our github, > there is some info there which may be useful, which is better than none. > > Signed-off-by: Michael Ellerman Applied to powerpc next. https://git.kernel.org/powerpc/c/ad654f25a12d1cbac890d33b7bcc87 cheers
Re: selftests/powerpc: Fix build break caused by EXPORT_SYMBOL changes
On Wed, 2016-05-10 at 05:22:57 UTC, Michael Ellerman wrote: > The changes to make EXPORT_SYMBOL work in asm, specifically commit > 9445aa1a3062 ("ppc: move exports to definitions"), in the kbuild tree, > breaks some of our selftests. > > That is because we symlink the kernel code into the selftest, and shim > the required headers, and we are now missing asm/export.h > > So create a minimal export.h to keep the tests building once powerpc and > the kbuild trees are merged. > > Signed-off-by: Michael Ellerman Applied to powerpc next. https://git.kernel.org/powerpc/c/8321564a11bbeadffcc7d6335bcf3c cheers
RE: [v12, 0/8] Fix eSDHC host version register bug
> -Original Message- > From: Ulf Hansson [mailto:ulf.hans...@linaro.org] > Sent: Tuesday, October 18, 2016 6:48 PM > To: Y.B. Lu > Cc: linux-mmc; Scott Wood; Arnd Bergmann; linuxppc-dev@lists.ozlabs.org; > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux- > ker...@vger.kernel.org; linux-clk; linux-...@vger.kernel.org; > io...@lists.linux-foundation.org; net...@vger.kernel.org; Mark Rutland; > Rob Herring; Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; > Bhupesh Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. > Xie; M.H. Lian > Subject: Re: [v12, 0/8] Fix eSDHC host version register bug > > On 21 September 2016 at 08:57, Yangbo Lu wrote: > > This patchset is used to fix a host version register bug in the > > T4240-R1.0-R2.0 eSDHC controller. To match the SoC version and > > revision, 10 previous version patchsets had tried many methods but all > of them were rejected by reviewers. > > Such as > > - dts compatible method > > - syscon method > > - ifdef PPC method > > - GUTS driver getting SVR method Anrd suggested a > > soc_device_match method in v10, and this is the only available method > > left now. This v11 patchset introduces the soc_device_match interface > > in soc driver. > > > > The first six patches of Yangbo are to add the GUTS driver. This is > > used to register a soc device which contain soc version and revision > information. > > The other two patches introduce the soc_device_match method in soc > > driver and apply it on esdhc driver to fix this bug. 
> > > > Arnd Bergmann (1): > > base: soc: introduce soc_device_match() interface > > > > Yangbo Lu (7): > > dt: bindings: update Freescale DCFG compatible > > ARM64: dts: ls2080a: add device configuration node > > dt: bindings: move guts devicetree doc out of powerpc directory > > powerpc/fsl: move mpc85xx.h to include/linux/fsl > > soc: fsl: add GUTS driver for QorIQ platforms > > MAINTAINERS: add entry for Freescale SoC drivers > > mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0 > > > > Documentation/devicetree/bindings/arm/fsl.txt | 6 +- > > .../bindings/{powerpc => soc}/fsl/guts.txt | 3 + > > MAINTAINERS| 11 +- > > arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 6 + > > arch/powerpc/kernel/cpu_setup_fsl_booke.S | 2 +- > > arch/powerpc/sysdev/fsl_pci.c | 2 +- > > drivers/base/Kconfig | 1 + > > drivers/base/soc.c | 66 ++ > > drivers/clk/clk-qoriq.c| 3 +- > > drivers/i2c/busses/i2c-mpc.c | 2 +- > > drivers/iommu/fsl_pamu.c | 3 +- > > drivers/mmc/host/Kconfig | 1 + > > drivers/mmc/host/sdhci-of-esdhc.c | 20 ++ > > drivers/net/ethernet/freescale/gianfar.c | 2 +- > > drivers/soc/Kconfig| 2 +- > > drivers/soc/fsl/Kconfig| 19 ++ > > drivers/soc/fsl/Makefile | 1 + > > drivers/soc/fsl/guts.c | 257 > + > > include/linux/fsl/guts.h | 125 ++ > > .../asm/mpc85xx.h => include/linux/fsl/svr.h | 4 +- > > include/linux/sys_soc.h| 3 + > > 21 files changed, 478 insertions(+), 61 deletions(-) rename > > Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%) > > create mode 100644 drivers/soc/fsl/Kconfig create mode 100644 > > drivers/soc/fsl/guts.c rename arch/powerpc/include/asm/mpc85xx.h => > > include/linux/fsl/svr.h (97%) > > > > -- > > 2.1.0.27.g96db324 > > > > This looks good to me! I am not sure which tree you want this to be > picked up through, but unless no other volunteers I can take it through > my mmc tree. 
> > Although, before considering to apply, I need an ack from Scott/Arnd for > the guts driver in patch 5/8 and I need an ack from Greg for patch 7/8, > where the soc_device_match() interface is added (seems like you didn't > add him on cc/to). > [Lu Yangbo-B47093] Thanks a lot for clarifying, Uffe. This patchset was based on the mmc tree and needs you to pick it up. But I think it needs to be rebased now, since I saw the qbman driver is in drivers/soc/fsl/ now. I will do that after collecting others' ACKs or comments. Hi Scott and Arnd, could I get your ACKs if there are no other changes needed? Thanks a lot. > Kind regards > Uffe
RE: [v12, 0/8] Fix eSDHC host version register bug
+ Greg Hi Greg, I submitted this patchset for a MMC bug fix, and introduce the below patch which needs your ACK. > > Arnd Bergmann (1): > > base: soc: introduce soc_device_match() interface https://patchwork.kernel.org/patch/9342913/ Could you help to review it and give some comments or ACK. Thank you very much. Best regards, Yangbo Lu > -Original Message- > From: Ulf Hansson [mailto:ulf.hans...@linaro.org] > Sent: Tuesday, October 18, 2016 6:48 PM > To: Y.B. Lu > Cc: linux-mmc; Scott Wood; Arnd Bergmann; linuxppc-dev@lists.ozlabs.org; > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux- > ker...@vger.kernel.org; linux-clk; linux-...@vger.kernel.org; > io...@lists.linux-foundation.org; net...@vger.kernel.org; Mark Rutland; > Rob Herring; Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; > Bhupesh Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. > Xie; M.H. Lian > Subject: Re: [v12, 0/8] Fix eSDHC host version register bug > > On 21 September 2016 at 08:57, Yangbo Lu wrote: > > This patchset is used to fix a host version register bug in the > > T4240-R1.0-R2.0 eSDHC controller. To match the SoC version and > > revision, 10 previous version patchsets had tried many methods but all > of them were rejected by reviewers. > > Such as > > - dts compatible method > > - syscon method > > - ifdef PPC method > > - GUTS driver getting SVR method Anrd suggested a > > soc_device_match method in v10, and this is the only available method > > left now. This v11 patchset introduces the soc_device_match interface > > in soc driver. > > > > The first six patches of Yangbo are to add the GUTS driver. This is > > used to register a soc device which contain soc version and revision > information. > > The other two patches introduce the soc_device_match method in soc > > driver and apply it on esdhc driver to fix this bug. 
> > > > Arnd Bergmann (1): > > base: soc: introduce soc_device_match() interface > > > > Yangbo Lu (7): > > dt: bindings: update Freescale DCFG compatible > > ARM64: dts: ls2080a: add device configuration node > > dt: bindings: move guts devicetree doc out of powerpc directory > > powerpc/fsl: move mpc85xx.h to include/linux/fsl > > soc: fsl: add GUTS driver for QorIQ platforms > > MAINTAINERS: add entry for Freescale SoC drivers > > mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0 > > > > Documentation/devicetree/bindings/arm/fsl.txt | 6 +- > > .../bindings/{powerpc => soc}/fsl/guts.txt | 3 + > > MAINTAINERS| 11 +- > > arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 6 + > > arch/powerpc/kernel/cpu_setup_fsl_booke.S | 2 +- > > arch/powerpc/sysdev/fsl_pci.c | 2 +- > > drivers/base/Kconfig | 1 + > > drivers/base/soc.c | 66 ++ > > drivers/clk/clk-qoriq.c| 3 +- > > drivers/i2c/busses/i2c-mpc.c | 2 +- > > drivers/iommu/fsl_pamu.c | 3 +- > > drivers/mmc/host/Kconfig | 1 + > > drivers/mmc/host/sdhci-of-esdhc.c | 20 ++ > > drivers/net/ethernet/freescale/gianfar.c | 2 +- > > drivers/soc/Kconfig| 2 +- > > drivers/soc/fsl/Kconfig| 19 ++ > > drivers/soc/fsl/Makefile | 1 + > > drivers/soc/fsl/guts.c | 257 > + > > include/linux/fsl/guts.h | 125 ++ > > .../asm/mpc85xx.h => include/linux/fsl/svr.h | 4 +- > > include/linux/sys_soc.h| 3 + > > 21 files changed, 478 insertions(+), 61 deletions(-) rename > > Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%) > > create mode 100644 drivers/soc/fsl/Kconfig create mode 100644 > > drivers/soc/fsl/guts.c rename arch/powerpc/include/asm/mpc85xx.h => > > include/linux/fsl/svr.h (97%) > > > > -- > > 2.1.0.27.g96db324 > > > > This looks good to me! I am not sure which tree you want this to be > picked up through, but unless no other volunteers I can take it through > my mmc tree. 
> > Although, before considering to apply, I need an ack from Scott/Arnd for > the guts driver in patch 5/8 and I need an ack from Greg for patch 7/8, > where the soc_device_match() interface is added (seems like you didn't > add him on cc/to). > > Kind regards > Uffe
[PATCH 0/7] build updates
Hi, I was hoping to get these posted earlier, but they had a dependency on the kbuild tree which took a while to merge. Hopefully we can get at least the first 4 patches in, which are just sanity checks on the final binary. With this series applied, I'm able to build[*] and boot a 64-bit book3s allyesconfig kernel, a 200MB vmlinux (although today's tree breaks in some random driver initcall in late boot). That's largely a culmination of Stephen Rothwell's work (including the kbuild changes that were merged earlier in 4.9), I just picked up some of his patches so attributions may not be 100% reflective of reality. [*] Building allyesconfig still requires KALLSYMS_EXTRA_PASS=1, which I'm yet to look into. Thanks, Nick Nicholas Piggin (7): powerpc: use the new post-link pass to check relocations powerpc: add arch/powerpc/tools directory powerpc/64s: tool to flag direct branches from unrelocated interrupt vectors powerpc/64: tool to check head sections location sanity powerpc/64: handle linker stubs in low .text code powerpc: switch to using thin archives if COMPILE_TEST is set powerpc/64: allow CONFIG_RELOCATABLE if COMPILE_TEST arch/powerpc/Kconfig | 5 +- arch/powerpc/Makefile | 17 + arch/powerpc/Makefile.postlink | 48 ++ arch/powerpc/include/asm/head-64.h | 20 -- arch/powerpc/kernel/vmlinux.lds.S | 28 ++-- .../gcc-check-mprofile-kernel.sh | 0 arch/powerpc/tools/head_check.sh | 74 ++ arch/powerpc/{ => tools}/relocs_check.sh | 0 arch/powerpc/tools/unrel_branch_check.sh | 56 9 files changed, 203 insertions(+), 45 deletions(-) create mode 100644 arch/powerpc/Makefile.postlink rename arch/powerpc/{scripts => tools}/gcc-check-mprofile-kernel.sh (100%) create mode 100755 arch/powerpc/tools/head_check.sh rename arch/powerpc/{ => tools}/relocs_check.sh (100%) create mode 100755 arch/powerpc/tools/unrel_branch_check.sh -- 2.9.3
[PATCH 1/7] powerpc: use the new post-link pass to check relocations
Currently powerpc has to introduce a dependency on its default build target zImage in order to run a relocation check pass over the linked vmlinux. This is deficient because the check is not run if the plain vmlinux target is built, or if one of the other boot targets is built. Switch to using the kbuild post-link pass in order to run this check. Signed-off-by: Nicholas Piggin --- arch/powerpc/Makefile | 11 --- arch/powerpc/Makefile.postlink | 34 ++ 2 files changed, 34 insertions(+), 11 deletions(-) create mode 100644 arch/powerpc/Makefile.postlink diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 617dece..314dd77 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -263,17 +263,6 @@ PHONY += $(BOOT_TARGETS1) $(BOOT_TARGETS2) boot := arch/$(ARCH)/boot -ifeq ($(CONFIG_RELOCATABLE),y) -quiet_cmd_relocs_check = CALL$< - cmd_relocs_check = $(CONFIG_SHELL) $< "$(OBJDUMP)" "$(obj)/vmlinux" - -PHONY += relocs_check -relocs_check: arch/powerpc/relocs_check.sh vmlinux - $(call cmd,relocs_check) - -zImage: relocs_check -endif - $(BOOT_TARGETS1): vmlinux $(Q)$(MAKE) ARCH=ppc64 $(build)=$(boot) $(patsubst %,$(boot)/%,$@) $(BOOT_TARGETS2): vmlinux diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink new file mode 100644 index 000..f90bdc0 --- /dev/null +++ b/arch/powerpc/Makefile.postlink @@ -0,0 +1,34 @@ +# === +# Post-link powerpc pass +# === +# +# 1. Check that vmlinux relocations look sane + +PHONY := __archpost +__archpost: + +include include/config/auto.conf +include scripts/Kbuild.include + +quiet_cmd_relocs_check = CHKREL $@ + cmd_relocs_check = $(CONFIG_SHELL) $(srctree)/arch/powerpc/relocs_check.sh "$(OBJDUMP)" "$@" + +# `@true` prevents complaint when there is nothing to be done + +vmlinux: FORCE + @true +ifeq ($(CONFIG_RELOCATABLE),y) + $(call if_changed,relocs_check) +endif + +%.ko: FORCE + @true + +clean: + @true + +PHONY += FORCE clean + +FORCE: + +.PHONY: $(PHONY) -- 2.9.3
[PATCH 2/7] powerpc: add arch/powerpc/tools directory
Move a couple of existing scripts there, and remove scripts directory: a script is a tool, a tool is not a script. I plan to add some tools that require compilation (though not in this series), so this matches other architectures more closely. Signed-off-by: Nick Piggin --- arch/powerpc/Makefile| 2 +- arch/powerpc/Makefile.postlink | 2 +- arch/powerpc/{scripts => tools}/gcc-check-mprofile-kernel.sh | 0 arch/powerpc/{ => tools}/relocs_check.sh | 0 4 files changed, 2 insertions(+), 2 deletions(-) rename arch/powerpc/{scripts => tools}/gcc-check-mprofile-kernel.sh (100%) rename arch/powerpc/{ => tools}/relocs_check.sh (100%) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 314dd77..0fcb47c 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -126,7 +126,7 @@ CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64 endif ifdef CONFIG_MPROFILE_KERNEL -ifeq ($(shell $(srctree)/arch/powerpc/scripts/gcc-check-mprofile-kernel.sh $(CC) -I$(srctree)/include -D__KERNEL__),OK) +ifeq ($(shell $(srctree)/arch/powerpc/tools/gcc-check-mprofile-kernel.sh $(CC) -I$(srctree)/include -D__KERNEL__),OK) CC_FLAGS_FTRACE := -pg -mprofile-kernel KBUILD_CPPFLAGS += -DCC_USING_MPROFILE_KERNEL else diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink index f90bdc0..29c0a3a 100644 --- a/arch/powerpc/Makefile.postlink +++ b/arch/powerpc/Makefile.postlink @@ -11,7 +11,7 @@ include include/config/auto.conf include scripts/Kbuild.include quiet_cmd_relocs_check = CHKREL $@ - cmd_relocs_check = $(CONFIG_SHELL) $(srctree)/arch/powerpc/relocs_check.sh "$(OBJDUMP)" "$@" + cmd_relocs_check = $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/relocs_check.sh "$(OBJDUMP)" "$@" # `@true` prevents complaint when there is nothing to be done diff --git a/arch/powerpc/scripts/gcc-check-mprofile-kernel.sh b/arch/powerpc/tools/gcc-check-mprofile-kernel.sh similarity index 100% rename from arch/powerpc/scripts/gcc-check-mprofile-kernel.sh rename to 
arch/powerpc/tools/gcc-check-mprofile-kernel.sh diff --git a/arch/powerpc/relocs_check.sh b/arch/powerpc/tools/relocs_check.sh similarity index 100% rename from arch/powerpc/relocs_check.sh rename to arch/powerpc/tools/relocs_check.sh -- 2.9.3
[PATCH 3/7] powerpc/64s: tool to flag direct branches from unrelocated interrupt vectors
Direct branches from code below __end_interrupts to code above __end_interrupts when built with CONFIG_RELOCATABLE are disallowed because they will break when the kernel is not located at 0. Sample output: WARNING: Unrelocated relative branches c118 bl-> 0xc0038fb8 c13c b-> 0xc01068a4 c148 b-> 0xc003919c c14c b-> 0xc003923c c5a4 b-> 0xc0106ffc c0001af0 b-> 0xc0106ffc c0001b24 b-> 0xc0106ffc c0001b58 b-> 0xc0106ffc Signed-off-by: Nicholas Piggin --- arch/powerpc/Makefile.postlink | 9 - arch/powerpc/tools/unrel_branch_check.sh | 56 2 files changed, 64 insertions(+), 1 deletion(-) create mode 100755 arch/powerpc/tools/unrel_branch_check.sh diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink index 29c0a3a..1725e64 100644 --- a/arch/powerpc/Makefile.postlink +++ b/arch/powerpc/Makefile.postlink @@ -11,7 +11,14 @@ include include/config/auto.conf include scripts/Kbuild.include quiet_cmd_relocs_check = CHKREL $@ - cmd_relocs_check = $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/relocs_check.sh "$(OBJDUMP)" "$@" +ifeq ($(CONFIG_PPC_BOOK3S_64),y) + cmd_relocs_check = \ + $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/relocs_check.sh "$(OBJDUMP)" "$@" ; \ + $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/unrel_branch_check.sh "$(OBJDUMP)" "$@" +else + cmd_relocs_check = \ + $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/relocs_check.sh "$(OBJDUMP)" "$@" +endif # `@true` prevents complaint when there is nothing to be done diff --git a/arch/powerpc/tools/unrel_branch_check.sh b/arch/powerpc/tools/unrel_branch_check.sh new file mode 100755 index 000..524a9e2 --- /dev/null +++ b/arch/powerpc/tools/unrel_branch_check.sh @@ -0,0 +1,56 @@ +#!/bin/bash +# +# Copyright © 2016 IBM Corporation +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version +# 2 of the License, or (at your option) any later version. 
+# +# This script checks the relocations of a vmlinux for "suspicious" +# branches from unrelocated code (head_64.S code). + +# Have Kbuild supply the path to objdump so we handle cross compilation. +objdump="$1" +vmlinux="$2" + +#__end_interrupts should be located within the first 64K + +end_intr=0x$( +"$objdump" -R "$vmlinux" -d --start-address=0xc000 \ +--stop-address=0xc001 | +grep '\<__end_interrupts>:' | +awk '{print $1}' +) + +BRANCHES=$( +"$objdump" -R "$vmlinux" -D --start-address=0xc000 \ + --stop-address=${end_intr} | +grep -e "^c[0-9a-f]*:[[:space:]]*\([0-9a-f][0-9a-f][[:space:]]\)\{4\}[[:space:]]*b" | +grep -v '\<__start_initialization_multiplatform>' | +grep -v -e 'b.\?.\?ctr' | +grep -v -e 'b.\?.\?lr' | +sed 's/://' | +awk '{ print $1 ":" $6 ":0x" $7 ":" $8 " "}' +) + +for tuple in $BRANCHES +do + from=`echo $tuple | cut -d':' -f1` + branch=`echo $tuple | cut -d':' -f2` + to=`echo $tuple | cut -d':' -f3 | sed 's/cr[0-7],//'` + sym=`echo $tuple | cut -d':' -f4` + + if (( $to > $end_intr )) + then + if [ -z "$bad_branches" ]; then + echo "WARNING: Unrelocated relative branches" + bad_branches="yes" + fi + echo "$from $branch-> $to $sym" + fi +done + +if [ -z "$bad_branches" ]; then + exit 0 +fi -- 2.9.3
[PATCH 4/7] powerpc/64: tool to check head sections location sanity
Use a tool to check that the "fixed sections" are located where we expect them to be, which catches cases the linker script can't (stubs being added to the start of the .text section), and which ends up being neater. Sample output: ERROR: start_text address is c0008100, should be c0008000 ERROR: see comments in arch/powerpc/tools/head_check.sh Signed-off-by: Nicholas Piggin --- arch/powerpc/Makefile.postlink | 7 arch/powerpc/include/asm/head-64.h | 4 +-- arch/powerpc/kernel/vmlinux.lds.S | 26 ++ arch/powerpc/tools/head_check.sh | 69 ++ 4 files changed, 80 insertions(+), 26 deletions(-) create mode 100755 arch/powerpc/tools/head_check.sh diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink index 1725e64..b8fe12b 100644 --- a/arch/powerpc/Makefile.postlink +++ b/arch/powerpc/Makefile.postlink @@ -10,6 +10,9 @@ __archpost: include include/config/auto.conf include scripts/Kbuild.include +quiet_cmd_head_check = CHKHEAD $@ + cmd_head_check = $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/head_check.sh "$(NM)" "$@" + quiet_cmd_relocs_check = CHKREL $@ ifeq ($(CONFIG_PPC_BOOK3S_64),y) cmd_relocs_check = \ @@ -24,6 +27,9 @@ endif vmlinux: FORCE @true +ifeq ($(CONFIG_PPC64),y) + $(call cmd,head_check) +endif ifeq ($(CONFIG_RELOCATABLE),y) $(call if_changed,relocs_check) endif @@ -32,6 +38,7 @@ endif @true clean: + rm -f .tmp_symbols.txt @true PHONY += FORCE clean diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index ab90c2f..492ebe7 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -49,8 +49,8 @@ * CLOSE_FIXED_SECTION() or elsewhere, there may be something * unexpected being added there. Remove the '. = x_len' line, rebuild, and * check what is pushing the section down. - * - If the build dies in linking, check arch/powerpc/kernel/vmlinux.lds.S - * for instructions. + * - If the build dies in linking, check arch/powerpc/tools/head_check.sh + * comments. 
* - If the kernel crashes or hangs in very early boot, it could be linker * stubs at the start of the main text. */ diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 8295f51..7de0b05 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -58,7 +58,6 @@ SECTIONS #ifdef CONFIG_PPC64 KEEP(*(.head.text.first_256B)); #ifdef CONFIG_PPC_BOOK3E -# define END_FIXED 0x100 #else KEEP(*(.head.text.real_vectors)); *(.head.text.real_trampolines); @@ -66,38 +65,17 @@ SECTIONS *(.head.text.virt_trampolines); # if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) KEEP(*(.head.data.fwnmi_page)); -# define END_FIXED0x8000 -# else -# define END_FIXED0x7000 # endif #endif - ASSERT((. == END_FIXED), "vmlinux.lds.S: fixed section overflow error"); #else /* !CONFIG_PPC64 */ HEAD_TEXT #endif } :kernel - /* -* If the build dies here, it's likely code in head_64.S is referencing -* labels it can't reach, and the linker inserting stubs without the -* assembler's knowledge. To debug, remove the above assert and -* rebuild. Look for branch stubs in the fixed section region. -* -* Linker stub generation could be allowed in "trampoline" -* sections if absolutely necessary, but this would require -* some rework of the fixed sections. Before resorting to this, -* consider references that have sufficient addressing range, -* (e.g., hand coded trampolines) so the linker does not have -* to add stubs. -* -* Linker stubs at the top of the main text section are currently not -* detected, and will result in a crash at boot due to offsets being -* wrong. -*/ .text : AT(ADDR(.text) - LOAD_OFFSET) { - ALIGN_FUNCTION(); /* careful! 
__ftr_alt_* sections need to be close to .text */ - *(.text .fixup __ftr_alt_* .ref.text) + ALIGN_FUNCTION(); + *(.text .fixup __ftr_alt_* .ref.text); SCHED_TEXT CPUIDLE_TEXT LOCK_TEXT diff --git a/arch/powerpc/tools/head_check.sh b/arch/powerpc/tools/head_check.sh new file mode 100755 index 000..9635fe7 --- /dev/null +++ b/arch/powerpc/tools/head_check.sh @@ -0,0 +1,69 @@ +#!/bin/sh + +# Copyright © 2016 IBM Corporation + +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version +# 2 of the License, or (at your option) any later version. + +# Thi
[PATCH 6/7] powerpc: switch to using thin archives if COMPILE_TEST is set
Enable thin archives build for powerpc when COMPILE_TEST is set. Thin archives are explained in this commit: a5967db9af51a84f5e181600954714a9e4c69f1f This is a gradual way to introduce the option to testers. Some change to the way we invoke ar is required so it can be used by scripts/link-vmlinux.sh. Signed-off-by: Stephen Rothwell Signed-off-by: Nicholas Piggin --- arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 65fba4c..00d9e31 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -80,6 +80,7 @@ config ARCH_HAS_DMA_SET_COHERENT_MASK config PPC bool default y + select THIN_ARCHIVES if COMPILE_TEST select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO select BINFMT_ELF diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 0fcb47c..fe76cfe 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -23,7 +23,7 @@ CROSS32AR := $(CROSS32_COMPILE)ar ifeq ($(HAS_BIARCH),y) ifeq ($(CROSS32_COMPILE),) CROSS32CC := $(CC) -m32 -CROSS32AR := GNUTARGET=elf32-powerpc $(AR) +KBUILD_ARFLAGS += --target=elf32-powerpc endif endif @@ -85,7 +85,7 @@ ifeq ($(HAS_BIARCH),y) override AS+= -a$(BITS) override LD+= -m elf$(BITS)$(LDEMULATION) override CC+= -m$(BITS) -override AR:= GNUTARGET=elf$(BITS)-$(GNUTARGET) $(AR) +KBUILD_ARFLAGS += --target=elf$(BITS)-$(GNUTARGET) endif LDFLAGS_vmlinux-y := -Bstatic -- 2.9.3
[PATCH 7/7] powerpc/64: allow CONFIG_RELOCATABLE if COMPILE_TEST
This is cruft to work around allmodconfig build breakage. Signed-off-by: Nicholas Piggin --- arch/powerpc/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 00d9e31..f48d2eb 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -458,7 +458,7 @@ config KEXEC config RELOCATABLE bool "Build a relocatable kernel" - depends on (PPC64 && !COMPILE_TEST) || (FLATMEM && (44x || FSL_BOOKE)) + depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE)) select NONSTATIC_KERNEL help This builds a kernel image that is capable of running at the @@ -482,7 +482,7 @@ config RELOCATABLE config CRASH_DUMP bool "Build a kdump crash kernel" depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP) - select RELOCATABLE if (PPC64 && !COMPILE_TEST) || 44x || FSL_BOOKE + select RELOCATABLE if PPC64 || 44x || FSL_BOOKE help Build a kernel suitable for use as a kdump capture kernel. The same kernel binary can be used as production kernel and dump -- 2.9.3
[PATCH 5/7] powerpc/64: handle linker stubs in low .text code
Very large kernels require linker stubs, but the linker tends to place them in ways that make it very difficult to detect programmatically with the assembler when taking absolute real (physical) addresses. This breaks the early boot code. Create a small section just before the .text section with an empty 256 - 4 bytes, and adjust the start of the .text section to match. The linker will tend to put stubs in that section and not break our start-of-.text section label. This is a tiny waste of space on common kernels, but allows large kernels to build and boot, which is convenient enough to outweigh the cost. This is a sad hack, which I will improve on if I can find out how to achieve it in a better way. Until then, it seems to work. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/head-64.h | 16 +--- arch/powerpc/kernel/vmlinux.lds.S | 2 ++ arch/powerpc/tools/head_check.sh | 5 + 3 files changed, 20 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index 492ebe7..f7131cf 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -63,11 +63,21 @@ . = 0x0;\ start_##sname: +/* + * .linker_stub_catch section is used to catch linker stubs from being + * inserted in our .text section, above the start_text label (which breaks + * the ABS_ADDR calculation). See kernel/vmlinux.lds.S and tools/head_check.sh + * for more details. We would prefer to just keep a cacheline (0x80), but + * 0x100 seems to be how the linker aligns branch stub groups. + */ #define OPEN_TEXT_SECTION(start) \ - text_start = (start); \ + .section ".linker_stub_catch","ax",@progbits; \ +linker_stub_catch: \ + . = 0x4;\ .section ".text","ax",@progbits;\ - . 
= 0x0;\ -start_text: + text_start = (start) + 0x100; \ + .balign 0x100; \ +start_text:\ #define ZERO_FIXED_SECTION(sname, start, end) \ sname##_start = (start);\ diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 7de0b05..a09c666 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -73,6 +73,8 @@ SECTIONS } :kernel .text : AT(ADDR(.text) - LOAD_OFFSET) { + *(.linker_stub_catch); + . = . ; /* careful! __ftr_alt_* sections need to be close to .text */ ALIGN_FUNCTION(); *(.text .fixup __ftr_alt_* .ref.text); diff --git a/arch/powerpc/tools/head_check.sh b/arch/powerpc/tools/head_check.sh index 9635fe7..ae9faeb 100755 --- a/arch/powerpc/tools/head_check.sh +++ b/arch/powerpc/tools/head_check.sh @@ -23,6 +23,11 @@ # etc) in the fixed section region (0 - 0x8000ish). Check what places are # calling those stubs. # +# A ".linker_stub_catch" section is used to catch some stubs generated by +# early .text code, which tend to get placed at the start of the section. +# If there are too many such stubs, they can overflow this section. Expanding +# it may help (or reducing the number of stub branches). +# # Linker stubs use the TOC pointer, so even if fixed section code could # tolerate them being inserted into head code, they can't be allowed in low # level entry code (boot, interrupt vectors, etc) until r2 is set up. This -- 2.9.3
Re: [PATCH 3/7] powerpc/64s: tool to flag direct branches from unrelocated interrupt vectors
On 19/10/16 14:15, Nicholas Piggin wrote: > Direct branches from code below __end_interrupts to code above > __end_interrupts when built with CONFIG_RELOCATABLE are disallowed > because they will break when the kernel is not located at 0. > > Sample output: > > WARNING: Unrelocated relative branches > c118 bl-> 0xc0038fb8 > c13c b-> 0xc01068a4 > c148 b-> 0xc003919c > c14c b-> 0xc003923c > c5a4 b-> 0xc0106ffc > c0001af0 b-> 0xc0106ffc > c0001b24 b-> 0xc0106ffc > c0001b58 b-> 0xc0106ffc > > Signed-off-by: Nicholas Piggin > --- If this is the same script I gave you, you can add my SOB line as well Balbir
Re: [PATCH] mm/hugetlb: Use the right pte val for compare in hugetlb_cow
Andrew Morton writes: > On Tue, 18 Oct 2016 21:12:45 +0530 "Aneesh Kumar K.V" > wrote: > >> We cannot use the pte value used in set_pte_at for pte_same comparison, >> because archs like ppc64 filter/add new pte flags in set_pte_at. Instead >> fetch the pte value inside hugetlb_cow. We are comparing the pte value to >> make sure the pte didn't change since we dropped the page table lock. >> hugetlb_cow gets called with the page table lock held, and we can take a copy >> of the pte value before we drop the page table lock. >> >> With hugetlbfs, we optimize the MAP_PRIVATE write fault path with no >> previous mapping (huge_pte_none entries), by forcing a cow in the fault >> path. This avoids taking an additional fault to convert a read-only mapping >> to read/write. Here we were comparing a recently instantiated pte (via >> set_pte_at) to the pte values from the linux page table. As explained above, >> on ppc64 such a pte_same check returned a wrong result, resulting in us >> taking an additional fault on ppc64. > > From my reading this is a minor performance improvement and a -stable > backport isn't needed. But it is unclear whether the impact warrants a > 4.9 merge. This patch works around the issue reported at https://lkml.kernel.org/r/57ff7bb4.1070...@redhat.com The reason for that OOM was a reserve count accounting issue which happens in the error path of hugetlb_cow. Now this patch avoids us taking the error path, and hence we don't hit the reported OOM. An actual fix for that issue is being worked on by Mike Kravetz. > > Please be careful about describing end-user visible impacts when fixing > bugs, thanks. -aneesh
Re: [PATCH] mm/hugetlb: Use the right pte val for compare in hugetlb_cow
On Tuesday, October 18, 2016 11:43 PM Aneesh Kumar K.V wrote: > > We cannot use the pte value used in set_pte_at for pte_same comparison, > because archs like ppc64 filter/add new pte flags in set_pte_at. Instead > fetch the pte value inside hugetlb_cow. We are comparing the pte value to > make sure the pte didn't change since we dropped the page table lock. > hugetlb_cow gets called with the page table lock held, and we can take a copy > of the pte value before we drop the page table lock. > > With hugetlbfs, we optimize the MAP_PRIVATE write fault path with no > previous mapping (huge_pte_none entries), by forcing a cow in the fault > path. This avoids taking an additional fault to convert a read-only mapping > to read/write. Here we were comparing a recently instantiated pte (via > set_pte_at) to the pte values from the linux page table. As explained above, > on ppc64 such a pte_same check returned a wrong result, resulting in us > taking an additional fault on ppc64. > > Fixes: 6a119eae942c ("powerpc/mm: Add a _PAGE_PTE bit") > > Reported-by: Jan Stancek > Signed-off-by: Aneesh Kumar K.V > --- Acked-by: Hillf Danton > mm/hugetlb.c | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index ec49d9ef1eef..da8fbd02b92e 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -3386,15 +3386,17 @@ static void unmap_ref_private(struct mm_struct *mm, > struct vm_area_struct *vma, > * Keep the pte_same checks anyway to make transition from the mutex easier. 
> */ > static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, > - unsigned long address, pte_t *ptep, pte_t pte, > - struct page *pagecache_page, spinlock_t *ptl) > +unsigned long address, pte_t *ptep, > +struct page *pagecache_page, spinlock_t *ptl) > { > + pte_t pte; > struct hstate *h = hstate_vma(vma); > struct page *old_page, *new_page; > int ret = 0, outside_reserve = 0; > unsigned long mmun_start; /* For mmu_notifiers */ > unsigned long mmun_end; /* For mmu_notifiers */ > > + pte = huge_ptep_get(ptep); > old_page = pte_page(pte); > > retry_avoidcopy: > @@ -3668,7 +3670,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct > vm_area_struct *vma, > hugetlb_count_add(pages_per_huge_page(h), mm); > if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { > /* Optimization, do the COW without a second fault */ > - ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page, ptl); > + ret = hugetlb_cow(mm, vma, address, ptep, page, ptl); > } > > spin_unlock(ptl); > @@ -3822,8 +3824,8 @@ int hugetlb_fault(struct mm_struct *mm, struct > vm_area_struct *vma, > > if (flags & FAULT_FLAG_WRITE) { > if (!huge_pte_write(entry)) { > - ret = hugetlb_cow(mm, vma, address, ptep, entry, > - pagecache_page, ptl); > + ret = hugetlb_cow(mm, vma, address, ptep, > + pagecache_page, ptl); > goto out_put_page; > } > entry = huge_pte_mkdirty(entry); > -- > 2.10.1
[PATCH v2] powerpc/hash64: Be more careful when generating tlbiel
From: Balbir Singh

In ISA v2.05, the tlbiel instruction takes two arguments, RB and L:

 tlbiel RB,L

 +---------+---------+----+---------+---------+--------+---+
 |   31    |    /    | L  |    /    |   RB    |  274   | / |
 | 31 - 26 | 25 - 22 | 21 | 20 - 16 | 15 - 11 | 10 - 1 | 0 |
 +---------+---------+----+---------+---------+--------+---+

In ISA v2.06 tlbiel takes only one argument, RB:

 tlbiel RB

 +---------+---------+---------+---------+--------+---+
 |   31    |    /    |    /    |   RB    |  274   | / |
 | 31 - 26 | 25 - 21 | 20 - 16 | 15 - 11 | 10 - 1 | 0 |
 +---------+---------+---------+---------+--------+---+

And in ISA v3.00 tlbiel takes five arguments:

 tlbiel RB,RS,RIC,PRS,R

 +---------+---------+----+---------+-----+----+---------+--------+---+
 |   31    |   RS    | /  |   RIC   | PRS | R  |   RB    |  274   | / |
 | 31 - 26 | 25 - 21 | 20 | 19 - 18 | 17  | 16 | 15 - 11 | 10 - 1 | 0 |
 +---------+---------+----+---------+-----+----+---------+--------+---+

However the assembler also accepts "tlbiel RB", and generates "tlbiel RB,r0,0,0,0".

As you can see above, the L field of the v2.05 encoding overlaps with the reserved field of the v2.06 encoding, and with the low bit of the RS field of the v3.00 encoding.

Currently in __tlbiel() we generate two tlbiel instructions manually using hex constants. In the first case, for MMU_PAGE_4K, we generate "tlbiel RB,0", which is safe in all cases, because the L bit is zero. However in the default case we generate "tlbiel RB,1", thereby setting bit 21 to 1.

This is not an actual bug on v2.06 processors, because the CPU ignores the value of the reserved field. However, software is supposed to encode reserved fields as zero to enable forward compatibility.

On v3.00 processors, setting bit 21 to 1 and no other bits of RS means we are using r1 for the value of RS. Although it's not obvious, the code sets the IS field (bits 10-11) to 0 (by omission), and L=1, in the va value, which is passed as RB. We also pass R=0 in the instruction. The combination of IS=0, L=1 and R=0 means the value of RS is not used, so even on ISA v3.00 there is no actual bug.

We should still fix it, as setting a reserved bit on v2.06 is naughty, and we are only avoiding a bug on v3.00 by accident rather than design.
Use ASM_FTR_IFSET() to generate the single argument form on ISA v2.06 and later, and the two argument form on pre v2.06. Although there may be very old toolchains which don't understand tlbiel, we have other code in the tree which has been using tlbiel for over five years, and no one has reported any build failures, so just let the assembler generate the instructions. Signed-off-by: Balbir Singh [mpe: Rewrite change log, use IFSET instead of IFCLR] Signed-off-by: Michael Ellerman --- arch/powerpc/mm/hash_native_64.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) v2: Rewrite change log, use IFSET instead of IFCLR. diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c index 83ddc0e171b0..9d9b3eff123e 100644 --- a/arch/powerpc/mm/hash_native_64.c +++ b/arch/powerpc/mm/hash_native_64.c @@ -123,8 +123,9 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize) va |= ssize << 8; sllp = get_sllp_encoding(apsize); va |= sllp << 5; - asm volatile(".long 0x7c000224 | (%0 << 11) | (0 << 21)" -: : "r"(va) : "memory"); + asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,0", %1) +: : "r" (va), "i" (CPU_FTR_ARCH_206) +: "memory"); break; default: /* We need 14 to 14 + i bits of va */ @@ -141,8 +142,9 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize) */ va |= (vpn & 0xfe); va |= 1; /* L */ - asm volatile(".long 0x7c000224 | (%0 << 11) | (1 << 21)" -: : "r"(va) : "memory"); + asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,1", %1) +: : "r" (va), "i" (CPU_FTR_ARCH_206) +: "memory"); break; } -- 2.7.4
Re: [PATCH 6/7] powerpc: switch to using thin archives if COMPILE_TEST is set
Hi Nick, On Wed, 19 Oct 2016 14:15:59 +1100 Nicholas Piggin wrote: > > Enable thin archives build for powerpc when COMPILE_TEST is set. > Thin archives are explained in this commit: > > a5967db9af51a84f5e181600954714a9e4c69f1f This reference should be like this: a5967db9af51 ("kbuild: allow architectures to use thin archives instead of ld -r") -- Cheers, Stephen Rothwell
Re: [PATCH v2] powernv: Simplify searching for compatible device nodes
On Wed, 2016-08-10 at 19:32 -0500, Jack Miller wrote:
> This condenses the opal node searching into a single function that finds
> all compatible nodes, instead of just searching the ibm,opal children,
> for ipmi, flash, and prd similar to how opal-i2c nodes are found.

Hi Michael,

It seems this patch hasn't made it in. Without it Linux cannot find the
flash on p9 as skiboot has moved the node to a more appropriate place.

Thanks,
Cyril

Cc: sta...@vger.kernel.org # v4.6+

> Signed-off-by: Jack Miller
> ---
>  arch/powerpc/platforms/powernv/opal.c | 24 +++-----------
>  1 file changed, 7 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 8b4fc68..9db12ce 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -631,21 +631,11 @@ static void __init opal_dump_region_init(void)
>  			"rc = %d\n", rc);
>  }
>
> -static void opal_pdev_init(struct device_node *opal_node,
> -			   const char *compatible)
> +static void opal_pdev_init(const char *compatible)
>  {
>  	struct device_node *np;
>
> -	for_each_child_of_node(opal_node, np)
> -		if (of_device_is_compatible(np, compatible))
> -			of_platform_device_create(np, NULL, NULL);
> -}
> -
> -static void opal_i2c_create_devs(void)
> -{
> -	struct device_node *np;
> -
> -	for_each_compatible_node(np, NULL, "ibm,opal-i2c")
> +	for_each_compatible_node(np, NULL, compatible)
>  		of_platform_device_create(np, NULL, NULL);
>  }
>
> @@ -717,7 +707,7 @@ static int __init opal_init(void)
>  	opal_hmi_handler_init();
>
>  	/* Create i2c platform devices */
> -	opal_i2c_create_devs();
> +	opal_pdev_init("ibm,opal-i2c");
>
>  	/* Setup a heatbeat thread if requested by OPAL */
>  	opal_init_heartbeat();
>
> @@ -752,12 +742,12 @@ static int __init opal_init(void)
>  	}
>
>  	/* Initialize platform devices: IPMI backend, PRD & flash interface */
> -	opal_pdev_init(opal_node, "ibm,opal-ipmi");
> -	opal_pdev_init(opal_node, "ibm,opal-flash");
> -	opal_pdev_init(opal_node, "ibm,opal-prd");
> +	opal_pdev_init("ibm,opal-ipmi");
> +	opal_pdev_init("ibm,opal-flash");
> +	opal_pdev_init("ibm,opal-prd");
>
>  	/* Initialise platform device: oppanel interface */
> -	opal_pdev_init(opal_node, "ibm,opal-oppanel");
> +	opal_pdev_init("ibm,opal-oppanel");
>
>  	/* Initialise OPAL kmsg dumper for flushing console on panic */
>  	opal_kmsg_init();
Re: [PATCH 3/7] powerpc/64s: tool to flag direct branches from unrelocated interrupt vectors
On Wed, 19 Oct 2016 15:28:40 +1100 Balbir Singh wrote:
> On 19/10/16 14:15, Nicholas Piggin wrote:
> > Direct branches from code below __end_interrupts to code above
> > __end_interrupts when built with CONFIG_RELOCATABLE are disallowed
> > because they will break when the kernel is not located at 0.
> >
> > Sample output:
> >
> > WARNING: Unrelocated relative branches
> > c118 bl-> 0xc0038fb8
> > c13c b-> 0xc01068a4
> > c148 b-> 0xc003919c
> > c14c b-> 0xc003923c
> > c5a4 b-> 0xc0106ffc
> > c0001af0 b-> 0xc0106ffc
> > c0001b24 b-> 0xc0106ffc
> > c0001b58 b-> 0xc0106ffc
> >
> > Signed-off-by: Nicholas Piggin
> > ---
>
> If this is the same script I gave you, you can add my SOB line as well

I can't remember now, but I know you had something... I'm usually careful
to get attributions right. I'll add your SOB.
[PATCH v4 1/5] kernel/sched: introduce vcpu preempted check interface
This patch supports fixing the lock holder preemption issue. Kernel users
can call bool vcpu_is_preempted(int cpu) to detect whether a vCPU is
preempted or not. The default implementation is a macro defined as false,
so the compiler can optimize it out if the arch does not support such a
vCPU preempted check.

Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Pan Xinhui
---
 include/linux/sched.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..44c1ce7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3506,6 +3506,18 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 
 #endif /* CONFIG_SMP */
 
+/*
+ * In order to deal with various lock holder preemption issues, provide
+ * an interface to see if a vCPU is currently running or not.
+ *
+ * This allows us to terminate optimistic spin loops and block, analogous
+ * to the native optimistic spin heuristic of testing if the lock owner
+ * task is running or not.
+ */
+#ifndef vcpu_is_preempted
+#define vcpu_is_preempted(cpu)	false
+#endif
+
 extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
 extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
-- 
2.4.11
[PATCH v4 0/5] implement vcpu preempted check
change from v3:
	add x86 vcpu preempted check patch
change from v2:
	no code change, fix typos, update some comments
change from v1:
	a simpler definition of default vcpu_is_preempted
	skip machine type check on ppc, and add config. remove dedicated macro.
	add one patch to drop overload of rwsem_spin_on_owner and
	mutex_spin_on_owner.
	add more comments
	thanks to Boqun's and Peter's suggestions.

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
 2.49% sched-messaging [kernel.vmlinux] [k] system_call

We introduce the interface bool vcpu_is_preempted(int cpu) and use it in
some spin loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_owner variants also caused rcu stalls before we applied
this patch set.

We also have observed some performance improvements.
PPC test result:

 1 copy - 0.94%
 2 copy - 7.17%
 4 copy - 11.9%
 8 copy - 3.04%
16 copy - 15.11%

details below:
Without patch:

 1 copy - File Write 4096 bufsize 8000 maxblocks 2188223.0 KBps (30.0 s, 1 samples)
 2 copy - File Write 4096 bufsize 8000 maxblocks 1804433.0 KBps (30.0 s, 1 samples)
 4 copy - File Write 4096 bufsize 8000 maxblocks 1237257.0 KBps (30.0 s, 1 samples)
 8 copy - File Write 4096 bufsize 8000 maxblocks 1032658.0 KBps (30.0 s, 1 samples)
16 copy - File Write 4096 bufsize 8000 maxblocks  768000.0 KBps (30.1 s, 1 samples)

With patch:

 1 copy - File Write 4096 bufsize 8000 maxblocks 2209189.0 KBps (30.0 s, 1 samples)
 2 copy - File Write 4096 bufsize 8000 maxblocks 1943816.0 KBps (30.0 s, 1 samples)
 4 copy - File Write 4096 bufsize 8000 maxblocks 1405591.0 KBps (30.0 s, 1 samples)
 8 copy - File Write 4096 bufsize 8000 maxblocks 1065080.0 KBps (30.0 s, 1 samples)
16 copy - File Write 4096 bufsize 8000 maxblocks  904762.0 KBps (30.0 s, 1 samples)

X86 test result:
test-case                               after-patch       before-patch
Execl Throughput                       |    18307.9 lps  |    11701.6 lps
File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps
Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps
Process Creation                       |    29881.2 lps  |    28572.8 lps
Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm
Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm
System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps

Pan Xinhui (5):
  kernel/sched: introduce vcpu preempted check interface
  locking/osq: Drop the overload of osq_lock()
  kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
  powerpc/spinlock: support vcpu preempted check
  x86, kvm: support vcpu preempted check

 arch/powerpc/include/asm/spinlock.h   |  8 ++++++++
 arch/x86/include/asm/paravirt_types.h |  6 ++++++
 arch/x86/include/asm/spinlock.h       |  8 ++++++++
 arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
 arch/x86/kernel/kvm.c                 | 11 +++++++++++
 arch/x86/kernel/paravirt.c            | 11 +++++++++++
 arch/x86/kvm/x86.c                    | 12 ++++++++++++
 include/linux/sched.h                 | 12 ++++++++++++
 kernel/locking/mutex.c                | 15 +++++++++++++--
 kernel/locking/osq_lock.c             | 10 +++++++++-
 kernel/locking/rwsem-xadd.c           | 16 +++++++++++++---
 11 files changed, 105 insertions(+), 7 deletions(-)

-- 
2.4.11
[PATCH v4 2/5] locking/osq: Drop the overload of osq_lock()
An over-committed guest with more vCPUs than pCPUs has heavy overload in
osq_lock(). This is because vCPU A holds the osq lock and yields out while
vCPU B waits for the per_cpu node->locked to be set. IOW, vCPU B waits
for vCPU A to run and unlock the osq lock.

The kernel has an interface, bool vcpu_is_preempted(int cpu), to see if a
vCPU is currently running or not. So break the spin loops when it returns
true.

test case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
 2.49% sched-messaging [kernel.vmlinux] [k] system_call

after patch:
20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
 8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
 4.12% sched-messaging [kernel.vmlinux] [k] system_call
 3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
 2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
 2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
 2.00% sched-messaging [kernel.vmlinux] [k] osq_lock

Suggested-by: Boqun Feng
Signed-off-by: Pan Xinhui
---
 kernel/locking/osq_lock.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..39d1385 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
 	return cpu_nr + 1;
 }
 
+static inline int node_cpu(struct optimistic_spin_node *node)
+{
+	return node->cpu - 1;
+}
+
 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
 {
 	int cpu_nr = encoded_cpu_val - 1;
@@ -118,8 +123,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	while (!READ_ONCE(node->locked)) {
 		/*
 		 * If we need to reschedule bail... so we can block.
+		 * Use vcpu_is_preempted to detect lock holder preemption
+		 * and break. vcpu_is_preempted is a macro defined as false
+		 * if the arch does not support the vcpu preempted check.
 		 */
-		if (need_resched())
+		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
 			goto unqueue;
 
 		cpu_relax_lowlatency();
-- 
2.4.11
[PATCH v4 3/5] kernel/locking: Drop the overload of {mutex, rwsem}_spin_on_owner
An over-committed guest with more vCPUs than pCPUs has heavy overload in
the two spin_on_owner loops. This is caused by the lock holder preemption
issue.

The kernel has an interface, bool vcpu_is_preempted(int cpu), to see if a
vCPU is currently running or not. So break the spin loops when it returns
true.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
 8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
 4.12% sched-messaging [kernel.vmlinux] [k] system_call
 3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
 2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
 2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
 2.00% sched-messaging [kernel.vmlinux] [k] osq_lock

after patch:
 9.99% sched-messaging [kernel.vmlinux] [k] mutex_unlock
 5.28% sched-messaging [unknown]        [H] 0xc00768e0
 4.27% sched-messaging [kernel.vmlinux] [k] __copy_tofrom_user_power7
 3.77% sched-messaging [kernel.vmlinux] [k] copypage_power7
 3.24% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
 3.02% sched-messaging [kernel.vmlinux] [k] system_call
 2.69% sched-messaging [kernel.vmlinux] [k] wait_consider_task

Signed-off-by: Pan Xinhui
---
 kernel/locking/mutex.c      | 15 +++++++++++++--
 kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..8927e96 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -236,7 +236,13 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 		 */
 		barrier();
 
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * Use vcpu_is_preempted to detect lock holder preemption
+		 * and break. vcpu_is_preempted is a macro defined as false
+		 * if the arch does not support the vcpu preempted check.
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			ret = false;
 			break;
 		}
@@ -261,8 +267,13 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 
 	rcu_read_lock();
 	owner = READ_ONCE(lock->owner);
+
+	/*
+	 * Due to the lock holder preemption issue, we skip spinning both if
+	 * the task is not on a cpu and if its cpu is preempted.
+	 */
 	if (owner)
-		retval = owner->on_cpu;
+		retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 	rcu_read_unlock();
 	/*
 	 * if lock->owner is not set, the mutex owner may have just acquired
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..ad0b5bb 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -336,7 +336,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 		goto done;
 	}
 
-	ret = owner->on_cpu;
+	/*
+	 * Due to the lock holder preemption issue, we skip spinning both if
+	 * the task is not on a cpu and if its cpu is preempted.
+	 */
+	ret = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 done:
 	rcu_read_unlock();
 	return ret;
@@ -362,8 +366,14 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		 */
 		barrier();
 
-		/* abort spinning when need_resched or owner is not running */
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * Abort spinning when need_resched, the owner is not running,
+		 * or the owner's cpu is preempted. vcpu_is_preempted is a
+		 * macro defined as false if the arch does not support the
+		 * vcpu preempted check.
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			rcu_read_unlock();
 			return false;
 		}
-- 
2.4.11
[PATCH v4 4/5] powerpc/spinlock: support vcpu preempted check
This is to fix some lock holder preemption issues. Some other lock
implementations do a spin loop before acquiring the lock itself. The
kernel now has an interface, bool vcpu_is_preempted(int cpu). It takes
a cpu as its parameter and returns true if that cpu is preempted, so the
kernel can break spin loops based on its return value.

Since the kernel uses this interface, let's support it here. Only pSeries
needs it, but powerNV is built into the same kernel image as pSeries, so
we would have to return false when running on powerNV. However,
lppaca->yield_count stays zero on powerNV, so we can simply skip the
machine type check.

Suggested-by: Boqun Feng
Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Pan Xinhui
---
 arch/powerpc/include/asm/spinlock.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index abb6b0f..af4285b 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -52,6 +52,14 @@
 #define SYNC_IO
 #endif
 
+#ifdef CONFIG_PPC_PSERIES
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
+}
+#endif
+
 #if defined(CONFIG_PPC_SPLPAR)
 /* We only yield to the hypervisor if we are in shared processor mode */
 #define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
-- 
2.4.11
[PATCH v4 5/5] x86, kvm: support vcpu preempted check
This is to fix some lock holder preemption issues. Some other lock
implementations do a spin loop before acquiring the lock itself. The
kernel now has an interface, bool vcpu_is_preempted(int cpu). It takes
a cpu as its parameter and returns true if that cpu is preempted, so the
kernel can break spin loops based on its return value.

Since the kernel uses this interface, let's support it here. We use one
field of struct kvm_steal_time to indicate whether a vcpu is running or
not.

unix benchmark result:
host: kernel 4.8.1, i5-4570, 4 cpus
guest: kernel 4.8.1, 8 vcpus

test-case                               after-patch       before-patch
Execl Throughput                       |    18307.9 lps  |    11701.6 lps
File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps
Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps
Process Creation                       |    29881.2 lps  |    28572.8 lps
Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm
Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm
System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps

Signed-off-by: Pan Xinhui
---
 arch/x86/include/asm/paravirt_types.h |  6 ++++++
 arch/x86/include/asm/spinlock.h       |  8 ++++++++
 arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
 arch/x86/kernel/kvm.c                 | 11 +++++++++++
 arch/x86/kernel/paravirt.c            | 11 +++++++++++
 arch/x86/kvm/x86.c                    | 12 ++++++++++++
 6 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0f400c0..b1c7937 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -98,6 +98,10 @@ struct pv_time_ops {
 	unsigned long long (*steal_clock)(int cpu);
 };
 
+struct pv_vcpu_ops {
+	bool (*vcpu_is_preempted)(int cpu);
+};
+
 struct pv_cpu_ops {
 	/* hooks for various privileged instructions */
 	unsigned long (*get_debugreg)(int regno);
@@ -318,6 +322,7 @@ struct pv_lock_ops {
 struct paravirt_patch_template {
 	struct pv_init_ops pv_init_ops;
 	struct pv_time_ops pv_time_ops;
+	struct pv_vcpu_ops pv_vcpu_ops;
 	struct pv_cpu_ops pv_cpu_ops;
 	struct pv_irq_ops pv_irq_ops;
 	struct pv_mmu_ops pv_mmu_ops;
@@ -327,6 +332,7 @@ struct paravirt_patch_template {
 extern struct pv_info pv_info;
 extern struct pv_init_ops pv_init_ops;
 extern struct pv_time_ops pv_time_ops;
+extern struct pv_vcpu_ops pv_vcpu_ops;
 extern struct pv_cpu_ops pv_cpu_ops;
 extern struct pv_irq_ops pv_irq_ops;
 extern struct pv_mmu_ops pv_mmu_ops;
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 921bea7..52fd942 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -26,6 +26,14 @@ extern struct static_key paravirt_ticketlocks_enabled;
 static __always_inline bool static_key_false(struct static_key *key);
 
+#ifdef CONFIG_PARAVIRT
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return pv_vcpu_ops.vcpu_is_preempted(cpu);
+}
+#endif
+
 #include 
 
 /*
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e9c12a1 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -45,7 +45,8 @@ struct kvm_steal_time {
 	__u64 steal;
 	__u32 version;
 	__u32 flags;
-	__u32 pad[12];
+	__u32 preempted;
+	__u32 pad[11];
 };
 
 #define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index edbbfc8..0011bef 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static bool kvm_vcpu_is_preempted(int cpu)
+{
+	struct kvm_steal_time *src;
+
+	src = &per_cpu(steal_time, cpu);
+
+	return !!src->preempted;
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -488,6 +497,8 @@ void __init kvm_guest_init(void)
 	kvm_guest_cpu_init();
 #endif
 
+	pv_vcpu_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
+
 	/*
 	 * Hard lockup detection is enabled by default. Disable it, as guests
 	 * can get false positives too easily, for example if the host is
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index bbf3d59..7adb7e9 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -122,6 +122,7 @@ static void *get_call_destination(u8 type)
 	struct paravirt_patch_template tmpl = {
Re: [PATCH v4 0/5] implement vcpu preempted check
On 10/19/2016 12:20 PM, Pan Xinhui wrote:
> change from v3:
> 	add x86 vcpu preempted check patch

If you want you could add the s390 patch that I provided for your last
version. I also gave my Acked-by for all previous patches.

> change from v2:
> 	no code change, fix typos, update some comments
> change from v1:
> 	a simpler definition of default vcpu_is_preempted
> 	skip machine type check on ppc, and add config. remove dedicated macro.
> 	add one patch to drop overload of rwsem_spin_on_owner and
> 	mutex_spin_on_owner.
> 	add more comments
> 	thanks to Boqun's and Peter's suggestions.
>
> This patch set aims to fix lock holder preemption issues.
>
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
>
> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>  5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>  3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>  3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>  3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>  2.49% sched-messaging [kernel.vmlinux] [k] system_call
>
> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in
> some spin loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_owner variants also caused rcu stalls before we applied
> this patch set.
>
> We also have observed some performance improvements.
>
> PPC test result:
>
>  1 copy - 0.94%
>  2 copy - 7.17%
>  4 copy - 11.9%
>  8 copy - 3.04%
> 16 copy - 15.11%
>
> details below:
> Without patch:
>
>  1 copy - File Write 4096 bufsize 8000 maxblocks 2188223.0 KBps (30.0 s, 1 samples)
>  2 copy - File Write 4096 bufsize 8000 maxblocks 1804433.0 KBps (30.0 s, 1 samples)
>  4 copy - File Write 4096 bufsize 8000 maxblocks 1237257.0 KBps (30.0 s, 1 samples)
>  8 copy - File Write 4096 bufsize 8000 maxblocks 1032658.0 KBps (30.0 s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks  768000.0 KBps (30.1 s, 1 samples)
>
> With patch:
>
>  1 copy - File Write 4096 bufsize 8000 maxblocks 2209189.0 KBps (30.0 s, 1 samples)
>  2 copy - File Write 4096 bufsize 8000 maxblocks 1943816.0 KBps (30.0 s, 1 samples)
>  4 copy - File Write 4096 bufsize 8000 maxblocks 1405591.0 KBps (30.0 s, 1 samples)
>  8 copy - File Write 4096 bufsize 8000 maxblocks 1065080.0 KBps (30.0 s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks  904762.0 KBps (30.0 s, 1 samples)
>
> X86 test result:
> test-case                               after-patch       before-patch
> Execl Throughput                       |    18307.9 lps  |    11701.6 lps
> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
> File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
> Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps
> Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps
> Process Creation                       |    29881.2 lps  |    28572.8 lps
> Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm
> Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm
> System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps
>
> Pan Xinhui (5):
>   kernel/sched: introduce vcpu preempted check interface
>   locking/osq: Drop the overload of osq_lock()
>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>   powerpc/spinlock: support vcpu preempted check
>   x86, kvm: support vcpu preempted check
>
>  arch/powerpc/include/asm/spinlock.h   |  8 ++++++++
>  arch/x86/include/asm/paravirt_types.h |  6 ++++++
>  arch/x86/include/asm/spinlock.h       |  8 ++++++++
>  arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
>  arch/x86/kernel/kvm.c                 | 11 +++++++++++
>  arch/x86/kernel/paravirt.c            | 11 +++++++++++
>  arch/x86/kvm/x86.c                    | 12 ++++++++++++
>  include/linux/sched.h                 | 12 ++++++++++++
>  kernel/locking/mutex.c                | 15 +++++++++++++--
>  kernel/locking/osq_lock.c             | 10 +++++++++-
>  kernel/locking/rwsem-xadd.c           | 16 +++++++++++++---
>  11 files changed, 105 insertions(+), 7 deletions(-)
>