Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparison
On Wed, Sep 27, 2017 at 01:38:09PM +1000, Michael Ellerman wrote:
> Segher Boessenkool writes:
> > A lot of userspace code uses V*X, more and more with newer CPUs and newer
> > compiler versions. If you already paid the price for using vector
> > registers you do not need to again :-)
>
> True, but you don't know if you've paid the price already.
>
> You also pay the price on every context switch (more state to switch),
> so it's not free even once enabled. Which is why the kernel will
> eventually turn it off if it's unused again.

Yup. But my point is that because user space code uses vector registers
more and more, the penalty for user space code to use vector registers
even more keeps shrinking.

> But now that I've actually looked at the glibc version, it does do some
> checks for minimum length before doing any vector instructions, so
> that's probably all we want. The exact trade off between checking some
> bytes without vector vs turning on vector depends on your input data, so
> it's tricky to tune in general.

You also need nasty code to deal with the start and end of strings, with
conditional branches and whatnot, which quickly overwhelms the benefit
of using vector registers at all. This tradeoff also changes with newer
ISA versions.

Things have to become *really* cheap before it will be good to often use
vector registers in the kernel though.


Segher
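[Editorial note: to make the length-threshold idea discussed in this thread concrete, here is a minimal sketch — not code from any of the patches. VMX_CMP_THRESHOLD and vmx_memcmp_body() are hypothetical names; enable_kernel_altivec()/disable_kernel_altivec() are the existing powerpc entry points for using AltiVec from kernel context, used here the way the in-tree VMX copy helpers bracket them with preemption disabled.]

```c
/*
 * Illustrative sketch only -- not the patch under discussion.
 * vmx_memcmp_body() and VMX_CMP_THRESHOLD are hypothetical.
 */
#include <linux/kernel.h>
#include <linux/preempt.h>
#include <asm/switch_to.h>

#define VMX_CMP_THRESHOLD	4096	/* the tuning point this thread debates */

int vmx_memcmp_body(const void *a, const void *b, size_t n);	/* hypothetical VMX inner loop */

static int scalar_memcmp(const void *a, const void *b, size_t n)
{
	const unsigned char *p = a, *q = b;

	for (; n; n--, p++, q++)
		if (*p != *q)
			return *p < *q ? -1 : 1;
	return 0;
}

int sketch_memcmp(const void *a, const void *b, size_t n)
{
	int ret;

	/* Short buffers (or atomic context) never pay the MSR_VEC enable cost. */
	if (n < VMX_CMP_THRESHOLD || in_interrupt())
		return scalar_memcmp(a, b, n);

	preempt_disable();
	enable_kernel_altivec();	/* pay the facility-enable price once */
	ret = vmx_memcmp_body(a, b, n);
	disable_kernel_altivec();
	preempt_enable();

	return ret;
}
```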
RE: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparison
From: Segher Boessenkool
> Sent: 27 September 2017 10:28
...
> You also need nasty code to deal with the start and end of strings, with
> conditional branches and whatnot, which quickly overwhelms the benefit
> of using vector registers at all. This tradeoff also changes with newer
> ISA versions.

The goal posts keep moving.
For instance with modern intel x86 cpus 'rep movsb' is by far the fastest
way to copy data (from cached memory).

> Things have to become *really* cheap before it will be good to often use
> vector registers in the kernel though.

I've had thoughts about this in the past.
If the vector registers belong to the current process then you might
get away with just saving the ones you want to use.
If they belong to a different process then you also need to tell the
FPU save code where you've saved the registers.
Then the IPI code can recover all the correct values.

On X86 all the AVX registers are caller saved, the system call
entry could issue the instruction that invalidates them all.
Kernel code running in the context of a user process could then
use the registers without saving them.
It would only need to set a mark to ensure they are invalidated
again on return to user (might be cheap enough to do anyway).
Dunno about PPC though.

	David
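[Editorial note: David's idea above is speculative; the sketch below only shows its shape. Every name in it is hypothetical — no such hooks exist in the kernel. The point is that the syscall path would mark the user vector state as clobbered instead of saving it, and the return-to-user path would invalidate the registers.]

```c
#include <stdbool.h>

/* Hypothetical per-task flag: kernel code clobbered the user vector regs. */
struct vec_ctx {
	bool kernel_clobbered_vec;
};

/* Hypothetical stub standing in for whatever sequence invalidates the regs. */
static void invalidate_user_vec_regs(void) { }

/*
 * Kernel code that wants vector regs: no save, just mark them clobbered,
 * relying on the syscall ABI treating them as caller-saved.
 */
static void kernel_vec_begin(struct vec_ctx *ctx)
{
	ctx->kernel_clobbered_vec = true;
}

/* Return-to-user path: make sure the user never sees stale kernel values. */
static void return_to_user_fixup(struct vec_ctx *ctx)
{
	if (ctx->kernel_clobbered_vec) {
		invalidate_user_vec_regs();
		ctx->kernel_clobbered_vec = false;
	}
}
```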
[PATCH 2/2] powerpc/strict_rwx: fixup region splitting
We were aggressive with splitting regions and missed the case when _stext
and __init_begin completely overlap addr and addr+mapping. This patch fixes
that case and allows us to keep the largest possible mapping through the
overlap. The patch also simplifies the check by moving it into a function.

Fixes: 7614ff3 ("powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix")
Signed-off-by: Balbir Singh
---
 arch/powerpc/mm/pgtable-radix.c | 39 ++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index c2a2b46..ea0da3b 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -210,17 +210,36 @@ static inline void __meminit print_mapping(unsigned long start,
 	pr_info("Mapped 0x%016lx-0x%016lx with %s pages\n", start, end, buf);
 }
 
+static inline int __meminit
+should_split_mapping_size(unsigned long addr, unsigned long mapping_size)
+{
+#ifdef CONFIG_STRICT_KERNEL_RWX
+	/*
+	 * If addr, addr+mapping overlap the text region and
+	 * _stext and __init_begin end up lying between
+	 * addr, addr+mapping split the mapping size down
+	 * further
+	 */
+	if ((addr < __pa_symbol(__init_begin)) &&
+	    (addr + mapping_size) > __pa_symbol(_stext)) {
+
+		if (((addr + mapping_size) <= __pa_symbol(__init_begin)) &&
+		    (addr >= __pa_symbol(_stext)))
+			return 0;
+
+		return 1;
+	}
+#endif
+	return 0;
+}
+
+
 static int __meminit create_physical_mapping(unsigned long start,
 					     unsigned long end)
 {
 	unsigned long vaddr, addr, mapping_size = 0;
 	pgprot_t prot;
 	unsigned long max_mapping_size;
-#ifdef CONFIG_STRICT_KERNEL_RWX
-	int split_text_mapping = 1;
-#else
-	int split_text_mapping = 0;
-#endif
 
 	start = _ALIGN_UP(start, PAGE_SIZE);
 	for (addr = start; addr < end; addr += mapping_size) {
@@ -242,16 +261,14 @@ static int __meminit create_physical_mapping(unsigned long start,
 		else
 			mapping_size = PAGE_SIZE;
 
-		if (split_text_mapping && (mapping_size == PUD_SIZE) &&
-			(addr <= __pa_symbol(__init_begin)) &&
-			(addr + mapping_size) >= __pa_symbol(_stext)) {
+		if ((mapping_size == PUD_SIZE) &&
+		    should_split_mapping_size(addr, mapping_size)) {
 			max_mapping_size = PMD_SIZE;
 			goto retry;
 		}
 
-		if (split_text_mapping && (mapping_size == PMD_SIZE) &&
-			(addr <= __pa_symbol(__init_begin)) &&
-			(addr + mapping_size) >= __pa_symbol(_stext))
+		if ((mapping_size == PMD_SIZE) &&
+		    should_split_mapping_size(addr, mapping_size))
 			mapping_size = PAGE_SIZE;
 
 		if (mapping_size != previous_size) {
-- 
2.9.5
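[Editorial note: the split condition is easier to see with concrete numbers. Below is a small, self-contained sketch in plain C — not the kernel code itself — of the same interval logic, using made-up addresses in place of __pa_symbol(_stext)/__pa_symbol(__init_begin): split a large mapping only when it straddles a text-region boundary, not when it lies entirely inside (or entirely outside) the region.]

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical physical addresses standing in for the kernel text region. */
#define TEXT_START 0x00000000UL
#define TEXT_END   0x01400000UL		/* say, 20 MB of text/rodata */

/* Same shape as should_split_mapping_size(): split only if the mapping
 * overlaps the text region without being fully contained in it. */
static bool should_split(unsigned long addr, unsigned long size)
{
	bool overlaps  = addr < TEXT_END && addr + size > TEXT_START;
	bool contained = addr >= TEXT_START && addr + size <= TEXT_END;

	return overlaps && !contained;
}

int main(void)
{
	/* 1 GB mapping straddles the end of the text region -> split */
	printf("%d\n", should_split(0x00000000UL, 0x40000000UL));	/* 1 */
	/* 2 MB mapping entirely inside the text region -> keep it large */
	printf("%d\n", should_split(0x00200000UL, 0x00200000UL));	/* 0 */
	/* 2 MB mapping entirely outside the text region -> keep it large */
	printf("%d\n", should_split(0x40000000UL, 0x00200000UL));	/* 0 */
	return 0;
}
```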
[PATCH 1/2] powerpc/strict_rwx: quirks for STRICT_RWX patches
This patch disables STRICT_RWX for POWER9 DD1 machines where, due to some
limitations with the way we do TLB updates, we clear the TLB entry of the
text that is doing the update to kernel text, and that leads to a crash.

Fixes: 7614ff3 ("powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix")
Cc: sta...@vger.kernel.org
Reported-by: Andrew Jeffery
Signed-off-by: Balbir Singh
---
 arch/powerpc/mm/pgtable-radix.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 39c252b..c2a2b46 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -169,6 +169,18 @@ void radix__mark_rodata_ro(void)
 {
 	unsigned long start, end;
 
+	/*
+	 * mark_rodata_ro() will mark itself as !writable at some point;
+	 * due to the workaround in radix__pte_update(), we'll end up with
+	 * an invalid pte and the system will crash quite severely.
+	 * The alternatives are quite cumbersome and leaving out
+	 * the page containing the flush code is not very nice.
+	 */
+	if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+		pr_warn("Warning: Unable to mark rodata read only on P9 DD1\n");
+		return;
+	}
+
 	start = (unsigned long)_stext;
 	end = (unsigned long)__init_begin;
 
-- 
2.9.5
Re: [2/2] powerpc/powernv: Rework EEH initialization on powernv
On Thu, 2017-09-07 at 06:35:44 UTC, Benjamin Herrenschmidt wrote:
> Remove the post_init callback which is only used
> by powernv, we can just call it explicitly from
> the powernv code.
>
> This partially kills the ability to "disable" eeh at
> runtime via debugfs as this was calling that same
> callback again, but this is both unused and broken
> in several ways. If we want to revive it, we need
> to create a dedicated enable/disable callback on the
> backend that does the right thing.
>
> Let the bulk of eeh initialize normally at
> core_initcall() like it does on pseries by removing
> the hack in eeh_init() that delays it.
>
> Instead we make sure our eeh->probe cleanly bails
> out if the PEs haven't been created yet and we force
> a re-probe where we used to call eeh_init() again.
>
> Signed-off-by: Benjamin Herrenschmidt
> Acked-by: Russell Currey

Patch 2 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b9fde58db7e5738cacb740b0ec5479

cheers
Re: [1/2] powerpc: Add workaround for P9 vector CI load issue
On Fri, 2017-09-15 at 05:25:48 UTC, Michael Neuling wrote:
> POWER9 DD2.1 and earlier has an issue where some cache inhibited
> vector load will return bad data. The workaround is two part, one
> firmware/microcode part triggers HMI interrupts when hitting such
> loads, the other part is this patch which then emulates the
> instructions in Linux.
>
> The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
> lxvh8x.
>
> When an instruction triggers the HMI, all threads in the core will be
> sent to the HMI handler, not just the one running the vector load.
>
> In general, these spurious HMIs are detected by the emulation code and
> we just return back to the running process. Unfortunately, if a
> spurious interrupt occurs on a vector load that's to normal memory we
> have no way to detect that it's spurious (unless we walk the page
> tables, which is very expensive). In this case we emulate the load but
> we need to do so using a vector load itself to ensure 128bit atomicity
> is preserved.
>
> Some additional debugfs emulated instruction counters are added also.
>
> Signed-off-by: Michael Neuling
> Signed-off-by: Benjamin Herrenschmidt

Patch 1 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5080332c2c893118dbc18755f35c8b

cheers
Re: FSL serial 16550 errata broken?
On Mon, 2017-09-25 at 17:26 +, York Sun wrote:
> On 09/25/2017 09:55 AM, Joakim Tjernlund wrote:
> > We got some "broken" boards (mpx8321) where UART RX is held low (BREAK).
> > There we get a few:
> >    serial8250: too much work for irq18
> > and the board freezes.
> >
> > Looking into the driver/CPU there is an erratum (A-004737) w.r.t. BREAK handling
> > and I can see we are hitting the irq function fsl8250_handle_irq() added
> > in commit: 9deaa53ac7fa373623123aa4f18828dd62292b1a
> >    "serial: add irq handler for Freescale 16550 errata."
> > For all I can tell this workaround is broken and I cannot figure out why.
> > Any pointers?
> >
>
> Jocke,
>
> Did you mean MPC8321?
>
> I personally don't have experience debugging this erratum. Have you
> tried to contact the patch author Paul Gortmaker to see how he managed
> to get it to work?

No, but I just found out it is u-boot related. If I use a very old u-boot it
works.
Between then and now we did a massive upgrade of u-boot and now it breaks.
And no, bisect not possible due to local u-boot mods :)

Any idea what could be different? I cannot see and I have tested
what I could see/think of but now I am out of ideas.

 Jocke
Re: [PATCH v8 1/6] powerpc/fadump: reduce memory consumption for capture kernel
On Tuesday 12 September 2017 09:31 PM, Michal Suchanek wrote:

With fadump (dump capture) kernel booting like a regular kernel, it needs
almost the same amount of memory to boot as the production kernel, which is
unwarranted for a dump capture kernel. But with no option to disable some of
the unnecessary subsystems in fadump kernel, that much memory is wasted on
fadump, depriving the production kernel of that memory.

Introduce kernel parameter 'fadump_extra_args=' that would take regular
parameters as a space separated quoted string, to be enforced when fadump
is active. This 'fadump_extra_args=' parameter can be leveraged to pass
parameters like nr_cpus=1, cgroup_disable=memory and numa=off, to disable
unwarranted resources/subsystems.

Also, ensure the log "Firmware-assisted dump is active" is printed early
in the boot process to put the subsequent fadump messages in context.

Suggested-by: Michael Ellerman
Signed-off-by: Hari Bathini
Signed-off-by: Michal Suchanek

For the series...

Tested-by: Hari Bathini

---
Changes from v6:
Correct and simplify quote handling. Ideally I would like to extend
parse_args to give the length of the original quoted value to callback.
However, parse_args removes at most one double-quote from the start and
one from the end so that is easy to detect. Otherwise all other users
will have to be updated to trash the new argument.

Changes from v7:
Handle leading quote in parameter name.
---
 arch/powerpc/include/asm/fadump.h |   2 +
 arch/powerpc/kernel/fadump.c      | 122 +++++++++++++++++++++++++++++++++-
 arch/powerpc/kernel/prom.c        |   7 +++
 3 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5a23010af600..41b50b317a67 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -208,12 +208,14 @@ extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
+extern void enforce_fadump_extra_args(char *cmdline);
 extern int is_fadump_active(void);
 extern int should_fadump_crash(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
 
 #else	/* CONFIG_FA_DUMP */
+static inline void enforce_fadump_extra_args(char *cmdline) { }
 static inline int is_fadump_active(void) { return 0; }
 static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index e1431800bfb9..0e08f1a80af2 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 	 * dump data waiting for us.
 	 */
 	fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
-	if (fdm_active)
+	if (fdm_active) {
+		pr_info("Firmware-assisted dump is active.\n");
 		fw_dump.dump_active = 1;
+	}
 
 	/* Get the sizes required to store dump data for the firmware provided
 	 * dump sections.
@@ -339,8 +341,11 @@ int __init fadump_reserve_mem(void)
 {
 	unsigned long base, size, memory_boundary;
 
-	if (!fw_dump.fadump_enabled)
+	if (!fw_dump.fadump_enabled) {
+		if (fw_dump.dump_active)
+			pr_warn("Firmware-assisted dump was active but kernel booted with fadump disabled!\n");
 		return 0;
+	}
 
 	if (!fw_dump.fadump_supported) {
 		printk(KERN_INFO "Firmware-assisted dump is not supported on"
@@ -380,7 +385,6 @@ int __init fadump_reserve_mem(void)
 		memory_boundary = memblock_end_of_DRAM();
 
 	if (fw_dump.dump_active) {
-		printk(KERN_INFO "Firmware-assisted dump is active.\n");
 		/*
 		 * If last boot has crashed then reserve all the memory
 		 * above boot_memory_size so that we don't touch it until
@@ -467,6 +471,118 @@ static int __init early_fadump_reserve_mem(char *p)
 }
 early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
+#define FADUMP_EXTRA_ARGS_PARAM		"fadump_extra_args="
+#define FADUMP_EXTRA_ARGS_LEN		(strlen(FADUMP_EXTRA_ARGS_PARAM) - 1)
+
+struct param_info {
+	char		*cmdline;
+	char		*tmp_cmdline;
+	int		shortening;
+};
+
+static void __init fadump_update_params(struct param_info *param_info,
+					char *param, char *val)
+{
+	ptrdiff_t param_offset = param - param_info->tmp_cmdline;
+	size_t vallen = val ? strlen(val) : 0;
+	char *tgt = param_info->cmdline + param_offset
+			- param_info->shortening;
+	int shortening = 0;
+	int quoted = 0;
+
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
On Tue, 26 Sep 2017 20:43:28 + (UTC)
Mathieu Desnoyers wrote:

> - On Sep 26, 2017, at 1:51 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:
>
> > Provide a new command allowing processes to register their intent to use
> > the private expedited command.
> >
>
> I missed a few maintainers that should have been CC'd. Adding them now.
> This patch is aimed to go through Paul E. McKenney's tree.

Honestly this is a pretty ugly new user API and a fairly large amount of
complexity just to avoid the powerpc barrier. And you end up with arch
specific hooks anyway!

So my plan was to add an arch-overridable loop primitive that iterates
over all running threads for an mm. powerpc will use its mm_cpumask for
iterating and use runqueue locks to avoid the barrier. x86 will most
likely want to use its mm_cpumask to iterate.

For the powerpc approach, yes there is some controversy about using
runqueue locks even for cpus that we already can interfere with, but I
think we have a lot of options we could look at *after* it ever shows up
as a problem.

Thanks,
Nick
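[Editorial note: a rough sketch of the kind of arch-overridable walk Nick describes might look like the following. The function name and callback are illustrative only — no such interface exists in the tree; mm_cpumask() and for_each_cpu() are the existing kernel primitives it builds on.]

```c
/*
 * Hypothetical sketch: "for each CPU that may be running this mm".
 * The walk itself and its callers are invented names for illustration.
 */
#include <linux/cpumask.h>
#include <linux/mm_types.h>

static void for_each_mm_cpu(struct mm_struct *mm,
			    void (*fn)(int cpu, void *arg), void *arg)
{
	int cpu;

	/*
	 * On powerpc, mm_cpumask() tracks CPUs the mm has run on, so the
	 * walk can be restricted to those.  A generic fallback would use
	 * for_each_online_cpu() instead; taking runqueue locks to avoid
	 * the full barrier (as discussed above) is left out of this sketch.
	 */
	for_each_cpu(cpu, mm_cpumask(mm))
		fn(cpu, arg);
}
```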
[Part1 PATCH v5 09/17] resource: Provide resource struct in resource walk callback
From: Tom Lendacky

In prep for a new function that will need additional resource information
during the resource walk, update the resource walk callback to pass the
resource structure. Since the current callback start and end arguments are
pulled from the resource structure, the callback functions can obtain them
from the resource structure directly.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Cc: linux-ker...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Benjamin Herrenschmidt
Reviewed-by: Kees Cook
Reviewed-by: Borislav Petkov
Signed-off-by: Tom Lendacky
Signed-off-by: Brijesh Singh
---
 arch/powerpc/kernel/machine_kexec_file_64.c | 12 +---
 arch/x86/kernel/crash.c                     | 18 +-
 arch/x86/kernel/pmem.c                      |  2 +-
 include/linux/ioport.h                      |  4 ++--
 include/linux/kexec.h                       |  2 +-
 kernel/kexec_file.c                         |  5 +++--
 kernel/resource.c                           |  9 +
 7 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c b/arch/powerpc/kernel/machine_kexec_file_64.c
index 992c0d258e5d..e4395f937d63 100644
--- a/arch/powerpc/kernel/machine_kexec_file_64.c
+++ b/arch/powerpc/kernel/machine_kexec_file_64.c
@@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
  * and that value will be returned. If all free regions are visited without
  * func returning non-zero, then zero will be returned.
  */
-int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
+int arch_kexec_walk_mem(struct kexec_buf *kbuf,
+			int (*func)(struct resource *, void *))
 {
 	int ret = 0;
 	u64 i;
 	phys_addr_t mstart, mend;
+	struct resource res = { };
 
 	if (kbuf->top_down) {
 		for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
@@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
 			 * range while in kexec, end points to the last byte
 			 * in the range.
 			 */
-			ret = func(mstart, mend - 1, kbuf);
+			res.start = mstart;
+			res.end = mend - 1;
+			ret = func(&res, kbuf);
 			if (ret)
 				break;
 		}
@@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
 			 * range while in kexec, end points to the last byte
 			 * in the range.
 			 */
-			ret = func(mstart, mend - 1, kbuf);
+			res.start = mstart;
+			res.end = mend - 1;
+			ret = func(&res, kbuf);
 			if (ret)
 				break;
 		}
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 44404e2307bb..815008c9ca18 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
 	unsigned int *nr_ranges = arg;
 
@@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data *ced,
 	return ret;
 }
 
-static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
+static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
 	struct crash_elf_data *ced = arg;
 	Elf64_Ehdr *ehdr;
@@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
 	ehdr = ced->ehdr;
 
 	/* Exclude unwanted mem ranges */
-	ret = elf_header_exclude_ranges(ced, start, end);
+	ret = elf_header_exclude_ranges(ced, res->start, res->end);
 	if (ret)
 		return ret;
 
@@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
 	return 0;
 }
 
-static int memmap_entry_callback(u64 start, u64 end, void *arg)
+static int memmap_entry_callback(struct resource *res, void *arg)
 {
 	struct crash_memmap_data *cmd = arg;
 	struct boot_params *params = cmd->params;
 	struct e820_entry ei;
 
-	ei.addr = start;
-	ei.size = end - start + 1;
+	ei.addr = res->start;
+	ei.size = res->end - res->start + 1;
 	ei.type = cmd->type;
 	add_e820_entry(params, &ei);
 
@@ -619,12 +619,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	return ret;
 }
 
-static int determine_backup_region(u64 start, u64 end, void *arg)
+static int determine_backup_regi
Re: FSL serial 16550 errata broken?
On 09/27/2017 04:03 AM, Joakim Tjernlund wrote:
> On Mon, 2017-09-25 at 17:26 +, York Sun wrote:
>> On 09/25/2017 09:55 AM, Joakim Tjernlund wrote:
>>> We got some "broken" boards (mpx8321) where UART RX is held low (BREAK).
>>> There we get a few:
>>>    serial8250: too much work for irq18
>>> and the board freezes.
>>>
>>> Looking into the driver/CPU there is an erratum (A-004737) w.r.t. BREAK handling
>>> and I can see we are hitting the irq function fsl8250_handle_irq() added
>>> in commit: 9deaa53ac7fa373623123aa4f18828dd62292b1a
>>>    "serial: add irq handler for Freescale 16550 errata."
>>> For all I can tell this workaround is broken and I cannot figure out why.
>>> Any pointers?
>>>
>>
>> Jocke,
>>
>> Did you mean MPC8321?
>>
>> I personally don't have experience debugging this erratum. Have you
>> tried to contact the patch author Paul Gortmaker to see how he managed
>> to get it to work?
>
> No, but I just found out it is u-boot related. If I use a very old u-boot it
> works.
> Between then and now we did a massive upgrade of u-boot and now it breaks.
> And no, bisect not possible due to local u-boot mods :)

How old? It is a good sign that an old U-Boot works. Git bisect would be
a good tool if practical. Sometimes I have to reapply some local changes
for every step of bisect to make it useful. Your mileage may vary though.

>
> Any idea what could be different? I cannot see and I have tested
> what I could see/think of but now I am out of ideas.

I don't believe we have had much change for MPC8321 for a long time. If any
change has an impact on the kernel but not U-Boot itself, it may be other
errata workarounds. I presume this happens on your own board. If I were in
your shoes, I would try to reduce the number of local commits and reapply
them to various U-Boot tags to further narrow down the search scope. In the
meantime, getting register dumps to compare the good and bad cases may be
another way to go.

York
Re: [RFC][PATCH v2 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers
On (09/27/17 16:26), Michael Ellerman wrote:
[..]
> > Tested-by: Santosh Sivaraj
>
> Thanks Santosh.
>
> I also gave it a quick spin. I'll give you an ack for the powerpc changes.
>
> Acked-by: Michael Ellerman (powerpc)
>
>
> Thanks for cleaning this up Sergey.

thanks!

	-ss
Re: [PATCH 1/1] KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()
On Tue, Sep 26, 2017 at 04:47:04PM +1000, Sam Bobroff wrote:
> In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the
> value of state->guest_server as "server". However, this value is not
> set by its counterpart kvmppc_xive_set_xive(). When the guest uses
> this interface to migrate interrupts away from a CPU that is going
> offline, it sees all interrupts as belonging to CPU 0, so they are
> left assigned to (now) offline CPUs.
>
> This patch removes the guest_server field from the state, and returns
> act_server in its place (that is, the CPU actually handling the
> interrupt, which may differ from the one requested).
>
> Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE
> interrupt controller")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Sam Bobroff
> ---
> The other obvious way to patch this would be to set state->guest_server in
> kvmppc_xive_set_xive() and that does also work because act_server is usually
> equal to guest_server.
>
> However, in the cases where guest_server differed from act_server, the guest
> would only move IRQs correctly if it got act_server (the CPU actually handling
> the interrupt) here. So, that approach seemed better.

Paolo, again this is a pretty urgent fix for KVM on Power and Paulus is
away. We're hoping BenH will ack shortly (he's the logical technical
reviewer), after which can you merge this direct into the KVM staging
tree? (RHBZ 1477391, and we suspect several more are related).

> arch/powerpc/kvm/book3s_xive.c | 5 ++---
> arch/powerpc/kvm/book3s_xive.h | 1 -
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 13304622ab1c..bf457843e032 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -622,7 +622,7 @@ int kvmppc_xive_get_xive(struct kvm *kvm, u32 irq, u32 *server,
> 		return -EINVAL;
> 	state = &sb->irq_state[idx];
> 	arch_spin_lock(&sb->lock);
> -	*server = state->guest_server;
> +	*server = state->act_server;
> 	*priority = state->guest_priority;
> 	arch_spin_unlock(&sb->lock);
>
> @@ -1331,7 +1331,7 @@ static int xive_get_source(struct kvmppc_xive *xive, long irq, u64 addr)
> 	xive->saved_src_count++;
>
> 	/* Convert saved state into something compatible with xics */
> -	val = state->guest_server;
> +	val = state->act_server;
> 	prio = state->saved_scan_prio;
>
> 	if (prio == MASKED) {
> @@ -1507,7 +1507,6 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr)
> 	/* First convert prio and mark interrupt as untargetted */
> 	act_prio = xive_prio_from_guest(guest_prio);
> 	state->act_priority = MASKED;
> -	state->guest_server = server;
>
> 	/*
> 	 * We need to drop the lock due to the mutex below. Hopefully
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index 5938f7644dc1..6ba63f8e8a61 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -35,7 +35,6 @@ struct kvmppc_xive_irq_state {
> 	struct xive_irq_data *pt_data;	/* XIVE Pass-through associated data */
>
> 	/* Targetting as set by guest */
> -	u32 guest_server;		/* Current guest selected target */
> 	u8 guest_priority;		/* Guest set priority */
> 	u8 saved_priority;		/* Saved priority when masking */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

signature.asc
Description: PGP signature
Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparison
Hi Michael,

On Wed, Sep 27, 2017 at 01:38:09PM +1000, Michael Ellerman wrote:
> Segher Boessenkool writes:
>
> > On Tue, Sep 26, 2017 at 03:34:36PM +1000, Michael Ellerman wrote:
> >> Cyril Bur writes:
> >> > This was written for userspace which doesn't have to explicitly enable
> >> > VMX in order to use it - we need to be smarter in the kernel.
> >>
> >> Well the kernel has to do it for them after a trap, which is actually
> >> even more expensive, so arguably the glibc code should be smarter too
> >> and the threshold before using VMX should probably be higher than in the
> >> kernel (to cover the cost of the trap).
> >
> > A lot of userspace code uses V*X, more and more with newer CPUs and newer
> > compiler versions. If you already paid the price for using vector
> > registers you do not need to again :-)
>
> True, but you don't know if you've paid the price already.
>
> You also pay the price on every context switch (more state to switch),
> so it's not free even once enabled. Which is why the kernel will
> eventually turn it off if it's unused again.
>
> But now that I've actually looked at the glibc version, it does do some
> checks for minimum length before doing any vector instructions, so

It looks like the glibc version will use a VSX instruction and lead to a
trap for a 9-byte-size cmp with src/dst 16 bytes aligned:

132 	/* Now both rSTR1 and rSTR2 are aligned to QW.  */
133 	.align  4
134 L(qw_align):
135 	vspltisb	v0, 0
136 	srdi.	r6, rN, 6
137 	li	r8, 16
138 	li	r10, 32
139 	li	r11, 48
140 	ble	cr0, L(lessthan64)
141 	mtctr	r6
142 	vspltisb	v8, 0
143 	vspltisb	v6, 0
144 	/* Aligned vector loop.  */
145 	.align  4

Line 135 is the VSX instruction causing the trap. Did I miss anything?

> that's probably all we want. The exact trade off between checking some
> bytes without vector vs turning on vector depends on your input data, so
> it's tricky to tune in general.

Discussed offline with Cyril. The plan is to use (>= 4KB) as the minimum
len before vector regs step in at v3. Cyril will consolidate his existing
work on KSM optimization later, which is probably making a 64-byte
comparison-ahead to determine whether it is an early or late matching
pattern.

Cyril also has some other valuable comments and I will rework on v3.
Is that OK for you?

Thanks,
- Simon
Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparison
On Wed, Sep 27, 2017 at 09:43:44AM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 27 September 2017 10:28
> ...
> > You also need nasty code to deal with the start and end of strings, with
> > conditional branches and whatnot, which quickly overwhelms the benefit
> > of using vector registers at all. This tradeoff also changes with newer
> > ISA versions.
>
> The goal posts keep moving.
> For instance with modern intel x86 cpus 'rep movsb' is by far the fastest
> way to copy data (from cached memory).
>
> > Things have to become *really* cheap before it will be good to often use
> > vector registers in the kernel though.
>
> I've had thoughts about this in the past.
> If the vector registers belong to the current process then you might
> get away with just saving the ones you want to use.
> If they belong to a different process then you also need to tell the
> FPU save code where you've saved the registers.
> Then the IPI code can recover all the correct values.
>
> On X86 all the AVX registers are caller saved, the system call
> entry could issue the instruction that invalidates them all.
> Kernel code running in the context of a user process could then
> use the registers without saving them.
> It would only need to set a mark to ensure they are invalidated
> again on return to user (might be cheap enough to do anyway).
> Dunno about PPC though.

I am not aware of any ppc instruction which can set such a "mark" or
provide a fine-grained validity flag for a single vector register or a
subgroup of them. But ppc experts may want to correct me.

Thanks,
- Simon
[PATCH kernel v2] vfio/spapr: Add cond_resched() for huge updates
Clearing very big IOMMU tables can trigger soft lockups. This adds
cond_resched() for every second spent in the loop.

Signed-off-by: Alexey Kardashevskiy
---

The testcase is a POWER9 box with a 264GB guest, 4 VFIO devices from
independent IOMMU groups, and 64K IOMMU pages. This configuration produces
4325376 TCE entries, and each entry update incurs 4 OPAL calls to update an
individual PE TCE cache; this produced lockups for more than 20s.

Reducing the table size to 4194304 (i.e. a 256GB guest) or removing one of
the 4 VFIO devices makes the problem go away.
---
Changes:
v2:
* replaced with time based solution
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 63112c36ab2d..b7a317520f2a 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -505,8 +505,15 @@ static int tce_iommu_clear(struct tce_container *container,
 	unsigned long oldhpa;
 	long ret;
 	enum dma_data_direction direction;
+	unsigned long time_limit = jiffies + HZ;
 
 	for ( ; pages; --pages, ++entry) {
+
+		if (time_after(jiffies, time_limit)) {
+			cond_resched();
+			time_limit = jiffies + HZ;
+		}
+
 		direction = DMA_NONE;
 		oldhpa = 0;
 		ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);
-- 
2.11.0