Re: [PATCH] macintosh: move mac_hid driver to input/mouse.
On Tue, 9 May 2017 17:43:27 -0700 Dmitry Torokhov wrote:
> Hi Michal,
>
> On Tue, May 09, 2017 at 09:14:18PM +0200, Michal Suchanek wrote:
> > There is nothing mac-specific about this driver. Non-mac hardware
> > with suboptimal built-in pointer devices exists.
> >
> > This makes it possible to use this emulation not only on x86 and ppc
> > notebooks but also on arm and mips.
>
> I'd rather we did not promote from drivers/macintosh to other
> platforms, but rather removed it. The same functionality can be done
> from userspace.

What is the status of this?

Do you reply to every patch to drivers/input that is not core
infrastructure to say that you would rather drop the driver because it
can be done in userspace?

It sure can be done. Remove everything but the bus drivers and uinput
from drivers/input, and the rest can be done in userspace.

The question is who does it?

Are you saying that you will implement the userspace equivalent?

If not, then please do your job as maintainer and accept trivial patches
for the perfectly working drivers we have now.

If you want to move drivers/input into userspace I am not against it,
but I am not willing to do that for you either.

> > What hardware do you believe would benefit from this and why?

Any touchpad hardware where you cannot press two buttons at once to
emulate the third button due to hardware design. And any touchpad
hardware on which some of the buttons are broken, when it comes to it.

It is built into a notebook and works fine for moving the cursor, but
due to the lack of usable buttons you still need a mouse to use the
notebook.

Thanks

Michal

> Thanks.
> >
> > Signed-off-by: Michal Suchanek
> > ---
> >  drivers/input/mouse/Kconfig                  | 20
> >  drivers/input/mouse/Makefile                 |  1 +
> >  drivers/{macintosh => input/mouse}/mac_hid.c |  0
> >  drivers/macintosh/Kconfig                    | 17 -
> >  drivers/macintosh/Makefile                   |  1 -
> >  5 files changed, 21 insertions(+), 18 deletions(-)
> >  rename drivers/{macintosh => input/mouse}/mac_hid.c (100%)
> >
> > diff --git a/drivers/input/mouse/Kconfig b/drivers/input/mouse/Kconfig
> > index 89ebb8f39fee..5533fd3a113f 100644
> > --- a/drivers/input/mouse/Kconfig
> > +++ b/drivers/input/mouse/Kconfig
> > @@ -12,6 +12,26 @@ menuconfig INPUT_MOUSE
> >
> >  if INPUT_MOUSE
> >
> > +config MAC_EMUMOUSEBTN
> > +	tristate "Support for mouse button 2+3 emulation"
> > +	depends on SYSCTL && INPUT
> > +	help
> > +	  This provides generic support for emulating the 2nd and 3rd mouse
> > +	  button with keypresses. If you say Y here, the emulation is still
> > +	  disabled by default. The emulation is controlled by these sysctl
> > +	  entries:
> > +	  /proc/sys/dev/mac_hid/mouse_button_emulation
> > +	  /proc/sys/dev/mac_hid/mouse_button2_keycode
> > +	  /proc/sys/dev/mac_hid/mouse_button3_keycode
> > +
> > +	  If you have an Apple machine with a 1-button mouse, say Y here.
> > +
> > +	  This emulation can be useful on notebooks with suboptimal touchpad
> > +	  hardware as well.
> > +
> > +	  To compile this driver as a module, choose M here: the
> > +	  module will be called mac_hid.
> > +
> >  config MOUSE_PS2
> >  	tristate "PS/2 mouse"
> >  	default y
> > diff --git a/drivers/input/mouse/Makefile b/drivers/input/mouse/Makefile
> > index 56bf0ad877c6..dfaad1dd8857 100644
> > --- a/drivers/input/mouse/Makefile
> > +++ b/drivers/input/mouse/Makefile
> > @@ -4,6 +4,7 @@
> >
> >  # Each configuration option enables a list of files.
> >
> > +obj-$(CONFIG_MAC_EMUMOUSEBTN)	+= mac_hid.o
> >  obj-$(CONFIG_MOUSE_AMIGA)	+= amimouse.o
> >  obj-$(CONFIG_MOUSE_APPLETOUCH)	+= appletouch.o
> >  obj-$(CONFIG_MOUSE_ATARI)	+= atarimouse.o
> > diff --git a/drivers/macintosh/mac_hid.c b/drivers/input/mouse/mac_hid.c
> > similarity index 100%
> > rename from drivers/macintosh/mac_hid.c
> > rename to drivers/input/mouse/mac_hid.c
> > diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig
> > index 97a420c11eed..011df09c5167 100644
> > --- a/drivers/macintosh/Kconfig
> > +++ b/drivers/macintosh/Kconfig
> > @@ -159,23 +159,6 @@ config INPUT_ADBHID
> >
> >  	  If unsure, say Y.
> >
> > -config MAC_EMUMOUSEBTN
> > -	tristate "Support for mouse button 2+3 emulation"
> > -	depends on SYSCTL && INPUT
> > -	help
> > -	  This provides generic support for emulating the 2nd and 3rd mouse
> > -	  button with keypresses. If you say Y here, the emulation is still
> > -	  disabled by default. The emulation is controlled by these sysctl
> > -	  entries:
> > -	  /proc/sys/dev/mac_hid/mouse_button_emulation
> > -	  /proc/sys/dev/mac_hid/mouse_button2_keycode
> > -	  /proc/sys/dev/mac_hid/mouse_button3_keycode
> > -
> > -	  If you have an Apple machine with a 1-button mouse, say Y here.
> > -
> > -	  To compile this driver as a module, [...]
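[Editor's note: for context, the sysctl entries named in the Kconfig help above are the runtime interface to this emulation. A minimal sketch of how a user would switch it on follows; the keycode values 87 and 88 (KEY_F11/KEY_F12) are arbitrary examples chosen for illustration, not driver defaults.]

```shell
# Turn the emulation on (the driver compiles in with it disabled by default).
echo 1 > /proc/sys/dev/mac_hid/mouse_button_emulation

# Choose which keys stand in for mouse buttons 2 and 3. The values are
# Linux input keycodes; 87 and 88 (KEY_F11/KEY_F12) are example choices.
echo 87 > /proc/sys/dev/mac_hid/mouse_button2_keycode
echo 88 > /proc/sys/dev/mac_hid/mouse_button3_keycode
```

These writes require root and only take effect on a kernel with MAC_EMUMOUSEBTN built in or loaded.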
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
Le 25/05/2017 à 05:36, Balbir Singh a écrit :
> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering implementation of CONFIG_STRICT_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. We can do per-cpu stuff as well. The
> downside of these patches is that patch_instruction is
> now synchronized using a lock. Other arches do similar
> things, but use fixmaps. The reason for not using
> fixmaps is to make use of any randomization in the
> future. The code also relies on set_pte_at and pte_clear
> to do the appropriate tlb flushing.
>
> Signed-off-by: Balbir Singh

[...]

> +static int kernel_map_addr(void *addr)
> +{
> +	unsigned long pfn;
> 	int err;
>
> -	__put_user_size(instr, addr, 4, err);
> +	if (is_vmalloc_addr(addr))
> +		pfn = vmalloc_to_pfn(addr);
> +	else
> +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +
> +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);

map_kernel_page() doesn't exist on powerpc32, so compilation fails.
However a similar function exists and is called map_page()

Maybe the below modification could help (not tested yet)

Christophe

---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++
 arch/powerpc/mm/8xx_mmu.c                    | 2 +-
 arch/powerpc/mm/dma-noncoherent.c            | 2 +-
 arch/powerpc/mm/mem.c                        | 4 ++--
 arch/powerpc/mm/mmu_decl.h                   | 1 -
 arch/powerpc/mm/pgtable_32.c                 | 8
 7 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 26ed228..7fb7558 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -297,6 +297,8 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
 		      pmd_t **pmdp);
 
+int map_kernel_page(unsigned long va, phys_addr_t pa, int flags);
+
 /* Generic accessors to PTE bits */
 static inline int pte_write(pte_t pte)	{ return !!(pte_val(pte) & _PAGE_RW);}
 static inline int pte_dirty(pte_t pte)	{ return !!(pte_val(pte) & _PAGE_DIRTY); }
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 5134ade..9131426 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -340,6 +340,8 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
 		      pmd_t **pmdp);
 
+int map_kernel_page(unsigned long va, phys_addr_t pa, int flags);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_POWERPC_NOHASH_32_PGTABLE_H */
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 6c5025e..f4c6472 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -88,7 +88,7 @@ static void mmu_mapin_immr(void)
 	int offset;
 
 	for (offset = 0; offset < IMMR_SIZE; offset += PAGE_SIZE)
-		map_page(v + offset, p + offset, f);
+		map_kernel_page(v + offset, p + offset, f);
 }
 
 /* Address of instructions to patch */
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 2dc74e5..3825284 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -227,7 +227,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
 
 	do {
 		SetPageReserved(page);
-		map_page(vaddr, page_to_phys(page),
+		map_kernel_page(vaddr, page_to_phys(page),
 			 pgprot_val(pgprot_noncached(PAGE_KERNEL)));
 		page++;
 		vaddr += PAGE_SIZE;
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9ee536e..04f4c98 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -313,11 +313,11 @@ void __init paging_init(void)
 	unsigned long end = __fix_to_virt(FIX_HOLE);
 
 	for (; v < end; v += PAGE_SIZE)
-		map_page(v, 0, 0); /* XXX gross */
+		map_kernel_page(v, 0, 0); /* XXX gross */
 #endif
 
 #ifdef CONFIG_HIGHMEM
-	map_page(PKMAP_BASE, 0, 0);	/* XXX gross */
+	map_kernel_page(PKMAP_BASE, 0, 0);	/* XXX gross */
 	pkmap_page_table = virt_to_kpte(PKMAP_BASE);
 
 	kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
diff --git a/arch/powerpc/mm/mmu_decl.h b/[...]
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
Le 25/05/2017 à 05:36, Balbir Singh a écrit :
> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering implementation of CONFIG_STRICT_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. We can do per-cpu stuff as well. The
> downside of these patches is that patch_instruction is
> now synchronized using a lock. Other arches do similar
> things, but use fixmaps. The reason for not using
> fixmaps is to make use of any randomization in the
> future. The code also relies on set_pte_at and pte_clear
> to do the appropriate tlb flushing.

Isn't it overkill to remap the text in another area?

Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64,
parisc, s390, x86/32, x86/64):
- arm, x86/32 and x86/64 set the text RW during the modification
- s390 seems to use a special instruction which bypasses write protection
- parisc doesn't seem to implement any function which modifies kernel text.

Therefore it seems only arm64 does it via another mapping.

Wouldn't it be lighter to just unprotect the memory during the
modification, as done on arm and x86?

Or another alternative could be to disable the DMMU and do the write at
the physical address?
Christophe

> Signed-off-by: Balbir Singh
> ---
>  arch/powerpc/lib/code-patching.c | 88 ++--
>  1 file changed, 84 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 500b0f6..0a16b2f 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -16,19 +16,98 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
> +struct vm_struct *text_poke_area;
> +static DEFINE_RAW_SPINLOCK(text_poke_lock);
>
> -int patch_instruction(unsigned int *addr, unsigned int instr)
> +/*
> + * This is an early_initcall and early_initcalls happen at the right time
> + * for us, after slab is enabled and before we mark ro pages R/O. In the
> + * future if get_vm_area is randomized, this will be more flexible than
> + * fixmap
> + */
> +static int __init setup_text_poke_area(void)
>  {
> +	text_poke_area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> +	if (!text_poke_area) {
> +		WARN_ONCE(1, "could not create area for mapping kernel addrs"
> +				" which allow for patching kernel code\n");
> +		return 0;
> +	}
> +	pr_info("text_poke area ready...\n");
> +	raw_spin_lock_init(&text_poke_lock);
> +	return 0;
> +}
> +
> +/*
> + * This can be called for kernel text or a module.
> + */
> +static int kernel_map_addr(void *addr)
> +{
> +	unsigned long pfn;
> 	int err;
>
> -	__put_user_size(instr, addr, 4, err);
> +	if (is_vmalloc_addr(addr))
> +		pfn = vmalloc_to_pfn(addr);
> +	else
> +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +
> +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
> +	pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
> 	if (err)
> -		return err;
> -	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
> +		return -1;
> 	return 0;
>  }
>
> +static inline void kernel_unmap_addr(void *addr)
> +{
> +	pte_t *pte;
> +	unsigned long kaddr = (unsigned long)addr;
> +
> +	pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
> +				kaddr), kaddr), kaddr);
> +	pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
> +	pte_clear(&init_mm, kaddr, pte);
> +}
> +
> +int patch_instruction(unsigned int *addr, unsigned int instr)
> +{
> +	int err;
> +	unsigned int *dest = NULL;
> +	unsigned long flags;
> +	unsigned long kaddr = (unsigned long)addr;
> +
> +	/*
> +	 * During early early boot patch_instruction is called
> +	 * when text_poke_area is not ready, but we still need
> +	 * to allow patching. We just do the plain old patching
> +	 */
> +	if (!text_poke_area) {
> +		__put_user_size(instr, addr, 4, err);
> +		asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
> +		return 0;
> +	}
> +
> +	raw_spin_lock_irqsave(&text_poke_lock, flags);
> +	if (kernel_map_addr(addr)) {
> +		err = -1;
> +		goto out;
> +	}
> +
> +	dest = (unsigned int *)(text_poke_area->addr) +
> +		((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
> +	__put_user_size(instr, dest, 4, err);
> +	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (dest));
> +	kernel_unmap_addr(text_poke_area->addr);
> +out:
> +	raw_spin_unlock_irqrestore(&text_poke_lock, flags);
> +	return err;
> +}
> +NOKPROBE_SYMBOL(patch_instruction);
> +
>  int patch_branch(unsigned [...]
Re: [PATCH] macintosh: move mac_hid driver to input/mouse.
On Sun, May 28, 2017 at 11:47:58AM +0200, Michal Suchanek wrote:
> On Tue, 9 May 2017 17:43:27 -0700
> Dmitry Torokhov wrote:
>
> > Hi Michal,
> >
> > On Tue, May 09, 2017 at 09:14:18PM +0200, Michal Suchanek wrote:
> > > There is nothing mac-specific about this driver. Non-mac hardware
> > > with suboptimal built-in pointer devices exists.
> > >
> > > This makes it possible to use this emulation not only on x86 and ppc
> > > notebooks but also on arm and mips.
> >
> > I'd rather we did not promote from drivers/macintosh to other
> > platforms, but rather removed it. The same functionality can be done
> > from userspace.
>
> What is the status of this?

The same as in the above paragraph.

> Do you reply to every patch to drivers/input that is not core
> infrastructure to say that you would rather drop the driver because it
> can be done in userspace?
>
> It sure can be done. Remove everything but the bus drivers and uinput
> from drivers/input, and the rest can be done in userspace.
>
> The question is who does it?
>
> Are you saying that you will implement the userspace equivalent?

No, I spend my time mostly with the kernel.

> If not, then please do your job as maintainer and accept trivial patches
> for the perfectly working drivers we have now.

I am doing my job as a maintainer right now. The driver might have been
beneficial 15 years ago, when we did not have better options, but I
would rather not continue expanding its use. The main problem with the
driver is that its functionality is not easily discoverable by end
users. And once you plumb it through userspace to present users with
options, you might as well handle it all in userspace.

> If you want to move drivers/input into userspace I am not against it,
> but I am not willing to do that for you either.

Then we are at an impasse.

> > What hardware do you believe would benefit from this and why?
>
> Any touchpad hardware where you cannot press two buttons at once to
> emulate the third button due to hardware design. And any touchpad
> hardware on which some of the buttons are broken, when it comes to it.
>
> It is built into a notebook and works fine for moving the cursor, but
> due to the lack of usable buttons you still need a mouse to use the
> notebook.

Have you tried simply redefining the keymap of your keyboard to emit
BTN_RIGHT/BTN_MIDDLE? Both atkbd and HID keyboards support keymap
updates from userspace/udev/hwdb, and if there is a driver that does
not support it I will take patches fixing that.

Thanks.

--
Dmitry
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
Le 25/05/2017 à 05:36, Balbir Singh a écrit :
> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering implementation of CONFIG_STRICT_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. We can do per-cpu stuff as well. The
> downside of these patches is that patch_instruction is
> now synchronized using a lock. Other arches do similar
> things, but use fixmaps. The reason for not using
> fixmaps is to make use of any randomization in the
> future. The code also relies on set_pte_at and pte_clear
> to do the appropriate tlb flushing.
>
> Signed-off-by: Balbir Singh
> ---
>  arch/powerpc/lib/code-patching.c | 88 ++--
>  1 file changed, 84 insertions(+), 4 deletions(-)

[...]

> +static int kernel_map_addr(void *addr)
> +{
> +	unsigned long pfn;
> 	int err;
>
> -	__put_user_size(instr, addr, 4, err);
> +	if (is_vmalloc_addr(addr))
> +		pfn = vmalloc_to_pfn(addr);
> +	else
> +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +
> +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);

Why not use PAGE_KERNEL instead of _PAGE_KERNEL_RW | _PAGE_PRESENT?

From asm/pte-common.h:

#define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
#define _PAGE_BASE	(_PAGE_BASE_NC)
#define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)

Also, in pte-common.h, maybe the following defines could/should be
reworked once your series is applied, shouldn't they?

/* Protection used for kernel text. We want the debuggers to be able to
 * set breakpoints anywhere, so don't write protect the kernel text
 * on platforms where such control is possible.
 */
#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
#else
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
#endif

Christophe
Re: [PATCH] macintosh: move mac_hid driver to input/mouse.
On Sun, 2017-05-28 at 11:47 +0200, Michal Suchanek wrote:
> On Tue, 9 May 2017 17:43:27 -0700
> Dmitry Torokhov wrote:
>
> > Hi Michal,
> >
> > On Tue, May 09, 2017 at 09:14:18PM +0200, Michal Suchanek wrote:
> > > There is nothing mac-specific about this driver. Non-mac hardware
> > > with suboptimal built-in pointer devices exists.
> > >
> > > This makes it possible to use this emulation not only on x86 and
> > > ppc notebooks but also on arm and mips.
> >
> > I'd rather we did not promote from drivers/macintosh to other
> > platforms, but rather removed it. The same functionality can be
> > done from userspace.
>
> What is the status of this?
>
> Do you reply to every patch to drivers/input that is not core
> infrastructure to say that you would rather drop the driver because
> it can be done in userspace?
>
> It sure can be done. Remove everything but the bus drivers and uinput
> from drivers/input, and the rest can be done in userspace.
>
> The question is who does it?
>
> Are you saying that you will implement the userspace equivalent?
>
> If not, then please do your job as maintainer and accept trivial
> patches for the perfectly working drivers we have now.
>
> If you want to move drivers/input into userspace I am not against it,
> but I am not willing to do that for you either.

I'd advise you to take it down a notch. We don't go yelling at each
other on this mailing list.
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
On Sun, 2017-05-28 at 20:00 +0200, christophe leroy wrote:
> Le 25/05/2017 à 05:36, Balbir Singh a écrit :
> > Today our patching happens via direct copy and
> > patch_instruction. The patching code is well
> > contained in the sense that copying bits are limited.
> >
> > While considering implementation of CONFIG_STRICT_RWX,
> > the first requirement is to create another mapping
> > that will allow for patching. We create the window using
> > text_poke_area, allocated via get_vm_area(), which might
> > be overkill. We can do per-cpu stuff as well. The
> > downside of these patches is that patch_instruction is
> > now synchronized using a lock. Other arches do similar
> > things, but use fixmaps. The reason for not using
> > fixmaps is to make use of any randomization in the
> > future. The code also relies on set_pte_at and pte_clear
> > to do the appropriate tlb flushing.
> >
> > Signed-off-by: Balbir Singh
> > ---
> >  arch/powerpc/lib/code-patching.c | 88 ++--
> >  1 file changed, 84 insertions(+), 4 deletions(-)
>
> [...]
>
> > +static int kernel_map_addr(void *addr)
> > +{
> > +	unsigned long pfn;
> > 	int err;
> >
> > -	__put_user_size(instr, addr, 4, err);
> > +	if (is_vmalloc_addr(addr))
> > +		pfn = vmalloc_to_pfn(addr);
> > +	else
> > +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > +
> > +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> > +		(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
>
> Why not use PAGE_KERNEL instead of _PAGE_KERNEL_RW | _PAGE_PRESENT?

Will do.

> From asm/pte-common.h:
>
> #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> #define _PAGE_BASE	(_PAGE_BASE_NC)
> #define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
>
> Also, in pte-common.h, maybe the following defines could/should be
> reworked once your series is applied, shouldn't they?
>
> /* Protection used for kernel text. We want the debuggers to be able to
>  * set breakpoints anywhere, so don't write protect the kernel text
>  * on platforms where such control is possible.
>  */
> #if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
> 	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
> #define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
> #else
> #define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
> #endif

Yes, I did see them and I want to rework them.

Thanks,
Balbir Singh.
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
On Sun, 2017-05-28 at 17:59 +0200, christophe leroy wrote:
> Le 25/05/2017 à 05:36, Balbir Singh a écrit :
> > Today our patching happens via direct copy and
> > patch_instruction. The patching code is well
> > contained in the sense that copying bits are limited.
> >
> > While considering implementation of CONFIG_STRICT_RWX,
> > the first requirement is to create another mapping
> > that will allow for patching. We create the window using
> > text_poke_area, allocated via get_vm_area(), which might
> > be overkill. We can do per-cpu stuff as well. The
> > downside of these patches is that patch_instruction is
> > now synchronized using a lock. Other arches do similar
> > things, but use fixmaps. The reason for not using
> > fixmaps is to make use of any randomization in the
> > future. The code also relies on set_pte_at and pte_clear
> > to do the appropriate tlb flushing.
>
> Isn't it overkill to remap the text in another area?
>
> Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64,
> parisc, s390, x86/32, x86/64):
> - arm, x86/32 and x86/64 set the text RW during the modification

x86 uses set_fixmap() in text_poke(), am I missing something?

> - s390 seems to use a special instruction which bypasses write protection
> - parisc doesn't seem to implement any function which modifies kernel text.
>
> Therefore it seems only arm64 does it via another mapping.
>
> Wouldn't it be lighter to just unprotect the memory during the
> modification, as done on arm and x86?

I am not sure the trade-off is quite that simple. For security, I thought:

1. It would be better to randomize text_poke_area(), which is why I
   dynamically allocated it. If we start randomizing get_vm_area(), we
   get the benefit.
2. text_poke_area() is RW and the normal text is RX; for any attack to
   succeed, it would need to find text_poke_area() at the time of
   patching, patch the kernel in that small window, and use the normal
   mapping for execution.

Generally patch_instruction() is not a fast path, except for ftrace and
tracing. In my tests I did not find the slowdown noticeable.

> Or another alternative could be to disable the DMMU and do the write at
> the physical address?

That would be worse off, I think, but we were discussing doing something
like that for xmon. For other cases, I think it opens up a bigger
window.

> Christophe

Balbir Singh
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
On Sun, 2017-05-28 at 16:29 +0200, christophe leroy wrote:
> Le 25/05/2017 à 05:36, Balbir Singh a écrit :
> > Today our patching happens via direct copy and
> > patch_instruction. The patching code is well
> > contained in the sense that copying bits are limited.
> >
> > While considering implementation of CONFIG_STRICT_RWX,
> > the first requirement is to create another mapping
> > that will allow for patching. We create the window using
> > text_poke_area, allocated via get_vm_area(), which might
> > be overkill. We can do per-cpu stuff as well. The
> > downside of these patches is that patch_instruction is
> > now synchronized using a lock. Other arches do similar
> > things, but use fixmaps. The reason for not using
> > fixmaps is to make use of any randomization in the
> > future. The code also relies on set_pte_at and pte_clear
> > to do the appropriate tlb flushing.
> >
> > Signed-off-by: Balbir Singh
>
> [...]
>
> > +static int kernel_map_addr(void *addr)
> > +{
> > +	unsigned long pfn;
> > 	int err;
> >
> > -	__put_user_size(instr, addr, 4, err);
> > +	if (is_vmalloc_addr(addr))
> > +		pfn = vmalloc_to_pfn(addr);
> > +	else
> > +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > +
> > +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> > +		(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
>
> map_kernel_page() doesn't exist on powerpc32, so compilation fails.
>
> However a similar function exists and is called map_page()
>
> Maybe the below modification could help (not tested yet)
>
> Christophe

Thanks, I'll try and get a compile. As an alternative, how about:

#ifdef CONFIG_PPC32
#define map_kernel_page map_page
#endif

Balbir Singh.
[PATCH V2 2/2] KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9
This allows userspace (e.g. QEMU) to enable large decrementer mode for
the guest, by setting the LPCR_LD bit in the guest LPCR value. With
this, the guest exit code saves 64 bits of the guest DEC value on exit.
Other places that use the guest DEC value check the LPCR_LD bit in the
guest LPCR value, and if it is set, omit the 32-bit sign extension that
would otherwise be done.

This doesn't change the DEC emulation used by PR KVM, because PR KVM is
not supported on POWER9 yet.

This is partly based on an earlier patch by Oliver O'Halloran.

Signed-off-by: Paul Mackerras
---
 arch/powerpc/include/asm/kvm_host.h     |  2 +-
 arch/powerpc/kvm/book3s_hv.c            |  2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 28
 arch/powerpc/kvm/emulate.c              |  4 ++--
 4 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9c51ac4..3f879c8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -579,7 +579,7 @@ struct kvm_vcpu_arch {
 	ulong mcsrr0;
 	ulong mcsrr1;
 	ulong mcsr;
-	u32 dec;
+	ulong dec;
 #ifdef CONFIG_BOOKE
 	u32 decar;
 #endif
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 42b7a4f..1f9c0ee 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1143,6 +1143,8 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 	mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
 		mask |= LPCR_AIL;
+	if (cpu_has_feature(CPU_FTR_LARGE_DEC))
+		mask |= LPCR_LD;
 
 	/* Broken 32-bit version of LPCR must not clear top bits */
 	if (preserve_top32)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bcb5401..e7a2c89 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -916,7 +916,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	mftb	r7
 	subf	r3,r7,r8
 	mtspr	SPRN_DEC,r3
-	stw	r3,VCPU_DEC(r4)
+	std	r3,VCPU_DEC(r4)
 
 	ld	r5, VCPU_SPRG0(r4)
 	ld	r6, VCPU_SPRG1(r4)
@@ -1030,7 +1030,13 @@ kvmppc_cede_reentry:		/* r4 = vcpu, r13 = paca */
 	li	r0, BOOK3S_INTERRUPT_EXTERNAL
 	bne	cr1, 12f
 	mfspr	r0, SPRN_DEC
-	cmpwi	r0, 0
+BEGIN_FTR_SECTION
+	/* On POWER9 check whether the guest has large decrementer enabled */
+	andis.	r8, r8, LPCR_LD@h
+	bne	15f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+	extsw	r0, r0
+15:	cmpdi	r0, 0
 	li	r0, BOOK3S_INTERRUPT_DECREMENTER
 	bge	5f
 
@@ -1457,12 +1463,18 @@ mc_cont:
 	mtspr	SPRN_SPURR,r4
 
 	/* Save DEC */
+	ld	r3, HSTATE_KVM_VCORE(r13)
 	mfspr	r5,SPRN_DEC
 	mftb	r6
+	/* On P9, if the guest has large decr enabled, don't sign extend */
+BEGIN_FTR_SECTION
+	ld	r4, VCORE_LPCR(r3)
+	andis.	r4, r4, LPCR_LD@h
+	bne	16f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	extsw	r5,r5
-	add	r5,r5,r6
+16:	add	r5,r5,r6
 	/* r5 is a guest timebase value here, convert to host TB */
-	ld	r3,HSTATE_KVM_VCORE(r13)
 	ld	r4,VCORE_TB_OFFSET(r3)
 	subf	r5,r4,r5
 	std	r5,VCPU_DEC_EXPIRES(r9)
@@ -2374,7 +2386,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	mfspr	r3, SPRN_DEC
 	mfspr	r4, SPRN_HDEC
 	mftb	r5
+BEGIN_FTR_SECTION
+	/* On P9 check whether the guest has large decrementer mode enabled */
+	ld	r6, HSTATE_KVM_VCORE(r13)
+	ld	r6, VCORE_LPCR(r6)
+	andis.	r6, r6, LPCR_LD@h
+	bne	68f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	extsw	r3, r3
+68:
 BEGIN_FTR_SECTION
 	extsw	r4, r4
 END_FTR_SECTION_IFSET(CPU_FTR_LARGE_DEC)
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index c873ffe..4d8b4d6 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -39,7 +39,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 	unsigned long dec_nsec;
 	unsigned long long dec_time;
 
-	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
+	pr_debug("mtDEC: %lx\n", vcpu->arch.dec);
 	hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
 
 #ifdef CONFIG_PPC_BOOK3S
@@ -109,7 +109,7 @@ static int kvmppc_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
 	case SPRN_TBWU: break;
 
 	case SPRN_DEC:
-		vcpu->arch.dec = spr_val;
+		vcpu->arch.dec = (u32) spr_val;
 		kvmppc_emulate_dec(vcpu);
 		break;
-- 
2.7.4
[PATCH V2 0/2] KVM: PPC: Book3S HV: Support POWER9's large decrementer mode
One of the new features of POWER9 is that the decrementer (the facility
that provides an interrupt after a programmable length of time) has been
increased in size from 32 bits to 56 bits, allowing time intervals of up
to about 814 days, compared to about 4 seconds previously.

This patch series adds support for the large decrementer mode to HV
KVM. There is already code in the host kernel to enable large
decrementer mode for the host, which means that some of the KVM
entry/exit code is currently incorrect; the first patch fixes that. The
second patch allows userspace to enable large decrementer mode for the
guest, by setting the appropriate bit in the guest LPCR value.

Changes in v2: use the presence of the ibm,dec-bits property to set the
CPU_FTR_LARGE_DEC bit rather than the ibm,pa-features property, because
QEMU already sets the large decrementer bit in the ibm,pa-features
property (correctly, since ibm,pa-features describes the capabilities of
the CPU hardware, not the settings established by the host) but does not
currently enable large decrementer mode for the guest.

Paul.

---
 arch/powerpc/include/asm/cputable.h     |  4 ++-
 arch/powerpc/include/asm/kvm_host.h     |  2 +-
 arch/powerpc/kernel/prom.c              |  1 +
 arch/powerpc/kernel/time.c              |  7 ++---
 arch/powerpc/kvm/book3s_hv.c            |  2 ++
 arch/powerpc/kvm/book3s_hv_interrupts.S |  2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 51 ++---
 arch/powerpc/kvm/emulate.c              |  4 +--
 8 files changed, 54 insertions(+), 19 deletions(-)
[PATCH V2 1/2] KVM: PPC: Book3S HV: Cope with host using large decrementer mode
POWER9 introduces a new mode for the decrementer register, called large decrementer mode, in which the decrementer counter is 56 bits wide rather than 32, and reads are sign-extended rather than zero-extended. Since KVM code reads and writes the host decrementer value in a few places, it needs to be aware of the need to treat the decrementer value as a 64-bit quantity, and only do a 32-bit sign extension when large decrementer mode is not in effect. To enable the sign extension to be removed in large decrementer mode, we use a CPU feature bit to indicate that large decrementer mode is in effect. This CPU feature bit is derived from the presence of the ibm,dec-bits property in the cpu nodes of the firmware device tree. This property is already set by firmware in the device tree that the kernel uses when running as a host. We change the kernel timer code to use this bit and enable large decrementer mode whenever it is set (even if firmware tells us that the large decrementer mode only gives us 32 bits) so that we get the sign extension in hardware. This is partly based on an earlier patch by Oliver O'Halloran. 
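The fixup being described can be sketched like this (a hypothetical helper, not the kernel's actual code): in large decrementer mode the hardware already sign-extends the value on a read, so the explicit 32-bit sign extension (the `extsw`) must only be applied in legacy mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a raw decrementer read to a signed 64-bit value. */
static int64_t dec_to_s64(uint64_t raw, bool large_dec_mode)
{
	if (large_dec_mode)
		return (int64_t)raw;	/* hardware already sign-extended */

	/*
	 * Legacy mode: only the low 32 bits are valid; sign-extend
	 * them, which is what the extsw instruction does.
	 */
	return (int64_t)(int32_t)(uint32_t)raw;
}
```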
Cc: sta...@vger.kernel.org # v4.10+ Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/cputable.h | 4 +++- arch/powerpc/kernel/prom.c | 1 + arch/powerpc/kernel/time.c | 7 ++- arch/powerpc/kvm/book3s_hv_interrupts.S | 2 ++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 +-- 5 files changed, 25 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index c2d5095..99c3c56 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -216,6 +216,7 @@ enum { #define CPU_FTR_PMAO_BUG LONG_ASM_CONST(0x1000) #define CPU_FTR_SUBCORE LONG_ASM_CONST(0x2000) #define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000) +#define CPU_FTR_LARGE_DEC LONG_ASM_CONST(0x8000) #ifndef __ASSEMBLY__ @@ -496,7 +497,8 @@ enum { (CPU_FTRS_POWER4 | CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \ -CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | CPU_FTRS_POWER9_DD1) +CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | \ +CPU_FTRS_POWER9_DD1 | CPU_FTR_LARGE_DEC) #endif #else enum { diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 40c4887..987fcc5 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -259,6 +259,7 @@ static struct feature_property { {"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP}, {"ibm,purr", 1, CPU_FTR_PURR, 0}, {"ibm,spurr", 1, CPU_FTR_SPURR, 0}, + {"ibm,dec-bits", 32, CPU_FTR_LARGE_DEC, 0}, #endif /* CONFIG_PPC64 */ }; diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index 2b33cfa..5d13f06 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -946,10 +946,7 @@ static void register_decrementer_clockevent(int cpu) static void enable_large_decrementer(void) { - if (!cpu_has_feature(CPU_FTR_ARCH_300)) - return; - - if (decrementer_max <= DECREMENTER_DEFAULT_MAX) + if (!cpu_has_feature(CPU_FTR_LARGE_DEC)) return; 
/* @@ -966,7 +963,7 @@ static void __init set_decrementer_max(void) u32 bits = 32; /* Prior to ISAv3 the decrementer is always 32 bit */ - if (!cpu_has_feature(CPU_FTR_ARCH_300)) + if (!cpu_has_feature(CPU_FTR_LARGE_DEC)) return; cpu = of_find_node_by_type(NULL, "cpu"); diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S index 0fdc4a2..6e1d75f 100644 --- a/arch/powerpc/kvm/book3s_hv_interrupts.S +++ b/arch/powerpc/kvm/book3s_hv_interrupts.S @@ -124,7 +124,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mfspr r8,SPRN_DEC mftbr7 mtspr SPRN_HDEC,r8 +BEGIN_FTR_SECTION extsw r8,r8 +END_FTR_SECTION_IFCLR(CPU_FTR_LARGE_DEC) add r8,r8,r7 std r8,HSTATE_DECEXP(r13) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index bdb3f76..bcb5401 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -214,6 +214,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) kvmppc_primary_no_guest: /* We handle this much like a ceded vcpu */ /* put the HDEC into the DEC, since HDEC interrupts don't wake us */ + /* HDEC may be larger than DEC for arch >= v3.00, but since the */ + /* HDEC value came from DEC in the first place, it will fit */ mfspr r3, SPRN_HDEC mtspr SPRN_DEC, r3 /* @@ -295,8 +297,11 @@ kvm_nov
[PATCH] powerpc/64: Reclaim CPU_FTR_SUBCORE
We are running low on CPU feature bits, so we only want to use them when it's really necessary. CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't need it in order to make asm patching work. It can only be set on "Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL. There are no plans to implement it on future CPUs, but if there ever were we could retrofit it then. Although KVM uses subcores, it never looks at the CPU feature, it either looks at the ISA level or the threads_per_subcore value. So drop the CPU feature and do a PVR check instead. Drop the device tree "subcore" feature as we no longer support doing anything with it, and we will drop it from skiboot too. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/cputable.h | 3 +-- arch/powerpc/kernel/dt_cpu_ftrs.c| 1 - arch/powerpc/platforms/powernv/subcore.c | 8 +++- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index c2d509584a98..d02ad93bf708 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -214,7 +214,6 @@ enum { #define CPU_FTR_DAWR LONG_ASM_CONST(0x0400) #define CPU_FTR_DABRX LONG_ASM_CONST(0x0800) #define CPU_FTR_PMAO_BUG LONG_ASM_CONST(0x1000) -#define CPU_FTR_SUBCORE LONG_ASM_CONST(0x2000) #define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000) #ifndef __ASSEMBLY__ @@ -463,7 +462,7 @@ enum { CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \ - CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_SUBCORE) + CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP) #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG) #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL) #define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c index 050925b5b451..d6f05e4dc328 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -642,7 +642,6 @@ static struct dt_cpu_feature_match __initdata {"processor-control-facility", feat_enable_dbell, CPU_FTR_DBELL}, {"processor-control-facility-v3", feat_enable_dbell, CPU_FTR_DBELL}, {"processor-utilization-of-resources-register", feat_enable_purr, 0}, - {"subcore", feat_enable, CPU_FTR_SUBCORE}, {"no-execute", feat_enable, 0}, {"strong-access-ordering", feat_enable, CPU_FTR_SAO}, {"cache-inhibited-large-page", feat_enable_large_ci, 0}, diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c index 0babef11136f..8c6119280c13 100644 --- a/arch/powerpc/platforms/powernv/subcore.c +++ b/arch/powerpc/platforms/powernv/subcore.c @@ -407,7 +407,13 @@ static DEVICE_ATTR(subcores_per_core, 0644, static int subcore_init(void) { - if (!cpu_has_feature(CPU_FTR_SUBCORE)) + unsigned pvr_ver; + + pvr_ver = PVR_VER(mfspr(SPRN_PVR)); + + if (pvr_ver != PVR_POWER8 && + pvr_ver != PVR_POWER8E && + pvr_ver != PVR_POWER8NVL) return 0; /* -- 2.7.4
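The shape of the replacement check is simple. The PVR version values below mirror the kernel's definitions (0x004B for POWER8E, 0x004C for POWER8NVL, 0x004D for POWER8), redefined here only so the sketch is self-contained:

```c
#include <assert.h>

#define PVR_POWER8E	0x004B
#define PVR_POWER8NVL	0x004C
#define PVR_POWER8	0x004D

/*
 * Equivalent of the subcore_init() test after the patch: subcores
 * exist only on the POWER8 family, identified by the PVR version field.
 */
static int has_subcores(unsigned int pvr_ver)
{
	return pvr_ver == PVR_POWER8 ||
	       pvr_ver == PVR_POWER8E ||
	       pvr_ver == PVR_POWER8NVL;
}
```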
[PATCH v2] spin loop primitives for busy waiting
Current busy-wait loops are implemented by repeatedly calling cpu_relax(), which gives the arch a low-latency way to improve power consumption and/or SMT resource contention. This poses some difficulties for powerpc, which has SMT priority setting instructions (priorities determine how ifetch cycles are apportioned). powerpc's cpu_relax() is implemented by setting a low priority then setting normal priority. This has several problems:
- Changing thread priority can have some execution cost and a potential impact on other threads in the core. It's inefficient to execute these instructions every time around a busy-wait loop.
- Depending on implementation details, a `low ; medium` sequence may not have much, if any, effect. Some software with a similar pattern actually inserts a lot of nops in between, in order to cause a few fetch cycles at the low priority.
- The busy-wait loop itself runs at regular priority. This might only be a few fetch cycles, but if there are several threads running such loops, they could cause a noticeable impact on a non-idle thread.
Implement spin_begin and spin_end primitives that can be used around busy-wait loops; they default to no-ops. Also implement spin_cpu_relax, which defaults to cpu_relax(). This will allow architectures to hook the entry and exit of busy-wait loops, and will allow powerpc to set low SMT priority at entry and normal priority at exit. Suggested-by: Linus Torvalds Signed-off-by: Nicholas Piggin --- Since last time:
- Fixed spin_do_cond with an initial test, as suggested by Linus.
- Renamed it to spin_until_cond, which reads a little better.
include/linux/processor.h | 70 +++ 1 file changed, 70 insertions(+) create mode 100644 include/linux/processor.h diff --git a/include/linux/processor.h b/include/linux/processor.h new file mode 100644 index ..da0c5e56ca02 --- /dev/null +++ b/include/linux/processor.h @@ -0,0 +1,70 @@ +/* Misc low level processor primitives */ +#ifndef _LINUX_PROCESSOR_H +#define _LINUX_PROCESSOR_H + +#include + +/* + * spin_begin is used before beginning a busy-wait loop, and must be paired + * with spin_end when the loop is exited. spin_cpu_relax must be called + * within the loop. + * + * The loop body should be as small and fast as possible, on the order of + * tens of instructions/cycles as a guide. It should avoid calling + * cpu_relax, or any "spin" or sleep type of primitive including nested uses + * of these primitives. It should not lock or take any other resource. + * Violations of these guidelines will not cause a bug, but may cause + * suboptimal performance. + * + * These loops are optimized to be used where wait times are expected to be + * less than the cost of a context switch (and associated overhead). + * + * Detection of resource owner and decision to spin or sleep or guest-yield + * (e.g., spin lock holder vcpu preempted, or mutex owner not on CPU) can be + * tested within the loop body. + */ +#ifndef spin_begin +#define spin_begin() +#endif + +#ifndef spin_cpu_relax +#define spin_cpu_relax() cpu_relax() +#endif + +/* + * spin_cpu_yield may be called to yield (undirected) to the hypervisor if + * necessary. This should be used if the wait is expected to take longer + * than context switch overhead, but we can't sleep or do a directed yield. + */ +#ifndef spin_cpu_yield +#define spin_cpu_yield() cpu_relax_yield() +#endif + +#ifndef spin_end +#define spin_end() +#endif + +/* + * spin_until_cond can be used to wait for a condition to become true.
It + * may be expected that the first iteration will be true in the common case + * (no spinning), so that callers should not require a first "likely" test + * for the uncontended case before using this primitive. + * + * Usage and implementation guidelines are the same as for the spin_begin + * primitives, above. + */ +#ifndef spin_until_cond +#define spin_until_cond(cond) \ +do { \ + if (unlikely(!(cond))) {\ + spin_begin(); \ + do {\ + spin_cpu_relax(); \ + } while (!(cond)); \ + spin_end(); \ + } \ +} while (0) + +#endif + +#endif /* _LINUX_PROCESSOR_H */ -- 2.11.0
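A userspace mock showing the intended call pattern of the new primitives (the fallback definitions follow the patch's generic defaults, with unlikely() dropped and cpu_relax() stubbed out since this is not kernel code):

```c
#include <assert.h>

#define cpu_relax()		do { } while (0)	/* stub for this sketch */

#define spin_begin()
#define spin_cpu_relax()	cpu_relax()
#define spin_end()

#define spin_until_cond(cond)				\
do {							\
	if (!(cond)) {					\
		spin_begin();				\
		do {					\
			spin_cpu_relax();		\
		} while (!(cond));			\
		spin_end();				\
	}						\
} while (0)

/* Busy-wait until *flag becomes non-zero, then return its value. */
static int wait_for_flag(volatile int *flag)
{
	spin_until_cond(*flag != 0);
	return *flag;
}

/*
 * Single-threaded demo: the condition is already true, so this takes
 * the no-spin fast path that the initial test in spin_until_cond
 * optimizes for.
 */
static int demo(void)
{
	volatile int flag = 7;

	return wait_for_flag(&flag);
}
```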
[PATCH V5] hwmon: (ibmpowernv) Add highest/lowest attributes to sensors
OCC provides historical minimum and maximum value for the sensor readings. This patch exports them as highest and lowest attributes for the inband sensors copied by OCC to main memory. Signed-off-by: Shilpasri G Bhat --- Changes from V4: - Got rid of 'len' variable in populate_attr_groups drivers/hwmon/ibmpowernv.c | 68 +- 1 file changed, 61 insertions(+), 7 deletions(-) diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c index 6d2e660..b562323 100644 --- a/drivers/hwmon/ibmpowernv.c +++ b/drivers/hwmon/ibmpowernv.c @@ -298,10 +298,14 @@ static int populate_attr_groups(struct platform_device *pdev) sensor_groups[type].attr_count++; /* -* add a new attribute for labels +* add attributes for labels, min and max */ if (!of_property_read_string(np, "label", &label)) sensor_groups[type].attr_count++; + if (of_find_property(np, "sensor-data-min", NULL)) + sensor_groups[type].attr_count++; + if (of_find_property(np, "sensor-data-max", NULL)) + sensor_groups[type].attr_count++; } of_node_put(opal); @@ -337,6 +341,41 @@ static void create_hwmon_attr(struct sensor_data *sdata, const char *attr_name, sdata->dev_attr.show = show; } +static void populate_sensor(struct sensor_data *sdata, int od, int hd, int sid, + const char *attr_name, enum sensors type, + const struct attribute_group *pgroup, + ssize_t (*show)(struct device *dev, + struct device_attribute *attr, + char *buf)) +{ + sdata->id = sid; + sdata->type = type; + sdata->opal_index = od; + sdata->hwmon_index = hd; + create_hwmon_attr(sdata, attr_name, show); + pgroup->attrs[sensor_groups[type].attr_count++] = &sdata->dev_attr.attr; +} + +static char *get_max_attr(enum sensors type) +{ + switch (type) { + case POWER_INPUT: + return "input_highest"; + default: + return "highest"; + } +} + +static char *get_min_attr(enum sensors type) +{ + switch (type) { + case POWER_INPUT: + return "input_lowest"; + default: + return "lowest"; + } +} + /* * Iterate through the device tree for each child of 'sensors' 
node, create * a sysfs attribute file, the file is named by translating the DT node name @@ -417,16 +456,31 @@ static int create_device_attrs(struct platform_device *pdev) * attribute. They are related to the same * sensor. */ - sdata[count].type = type; - sdata[count].opal_index = sdata[count - 1].opal_index; - sdata[count].hwmon_index = sdata[count - 1].hwmon_index; make_sensor_label(np, &sdata[count], label); + populate_sensor(&sdata[count], opal_index, + sdata[count - 1].hwmon_index, + sensor_id, "label", type, pgroups[type], + show_label); + count++; + } - create_hwmon_attr(&sdata[count], "label", show_label); + if (!of_property_read_u32(np, "sensor-data-max", &sensor_id)) { + attr_name = get_max_attr(type); + populate_sensor(&sdata[count], opal_index, + sdata[count - 1].hwmon_index, + sensor_id, attr_name, type, + pgroups[type], show_sensor); + count++; + } - pgroups[type]->attrs[sensor_groups[type].attr_count++] = - &sdata[count++].dev_attr.attr; + if (!of_property_read_u32(np, "sensor-data-min", &sensor_id)) { + attr_name = get_min_attr(type); + populate_sensor(&sdata[count], opal_index, + sdata[count - 1].hwmon_index, + sensor_id, attr_name, type, + pgroups[type], show_sensor); + count++; } } -- 1.8.3.1
Re: [linux-next] PPC Lpar fail to boot with error hid: module verification failed: signature and/or required key missing - tainting kernel
Rob Landley writes: > On 05/25/2017 04:24 PM, Stephen Rothwell wrote: >> Hi Michael, >> >> On Thu, 25 May 2017 23:02:06 +1000 Michael Ellerman >> wrote: >>> >>> It'll be: >>> >>> ee35011fd032 ("initramfs: make initramfs honor CONFIG_DEVTMPFS_MOUNT") >> >> And Andrew has asked me to drop that patch from linux-next which will >> happen today. > > What approach do the kernel developers suggest I take here? Well I'm just *a* kernel developer, but rule #1 is don't break userspace. > I would have thought letting it soak in linux-next for a release so > people could fix userspace bugs would be the next step, but this sounds > like that's not an option? You say they're userspace bugs, userspace will say it's a bug that the kernel has changed its behaviour. > Is the behavior the patch implements wrong? Yes, because it breaks existing setups for no particularly good reason. If CONFIG_DEVTMPFS_MOUNT had always meant devtmpfs was mounted in the initramfs then that would have been fine. But because it didn't, there are now systems out there that depend on the existing behaviour, and changing it is therefore wrong IMHO. As I said in another mail you can avoid breaking existing setups by adding a new config option to control mounting devtmpfs in the initramfs. It's a pity to need yet another config option, but such is life. cheers
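The suggested fix could look something like this (the option name and wording are invented here, purely to illustrate gating the new behaviour behind a default-off Kconfig option):

```
config DEVTMPFS_MOUNT_INITRAMFS
	bool "Automount devtmpfs when booting via an initramfs"
	depends on DEVTMPFS_MOUNT
	help
	  Mount devtmpfs at /dev inside the initramfs as well.
	  Defaults to off so that existing initramfs setups keep
	  the behaviour they rely on today.
```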
RE: [PATCH 3/3] powerpc/8xx: xmon compile fix
David Laight writes: > From: Michael Ellerman >> Sent: 26 May 2017 08:24 >> Nicholas Piggin writes: >> > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c >> > index f11f65634aab..438fdb0fb142 100644 >> > --- a/arch/powerpc/xmon/xmon.c >> > +++ b/arch/powerpc/xmon/xmon.c >> > @@ -1242,14 +1242,16 @@ bpt_cmds(void) >> > { >> >int cmd; >> >unsigned long a; >> > - int mode, i; >> > + int i; >> >struct bpt *bp; >> > - const char badaddr[] = "Only kernel addresses are permitted " >> > - "for breakpoints\n"; >> > >> >cmd = inchar(); >> >switch (cmd) { >> > -#ifndef CONFIG_8xx >> > +#ifndef CONFIG_PPC_8xx >> > + int mode; >> > + const char badaddr[] = "Only kernel addresses are permitted " >> > + "for breakpoints\n"; >> > + >> >case 'd': /* bd - hardware data breakpoint */ >> >mode = 7; >> >cmd = inchar(); >> >> GCC 7 rejects this: >> >> arch/powerpc/xmon/xmon.c: In function bpt_cmds: >> arch/powerpc/xmon/xmon.c:1252:13: error: statement will never be executed >> [-Werror=switch- >> unreachable] >> const char badaddr[] = "Only kernel addresses are permitted for >> breakpoints\n"; >>^~~ > > Try 'static' ? Yep that works, will rebase this again ... O_o cheers
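The warning is a general C gotcha worth spelling out: an automatic variable declared after the switch's opening brace is in scope for all the cases, but its run-time initializer sits before the first case label and is jumped over, which GCC 7's -Wswitch-unreachable flags. Making the variable static moves initialization to compile time, so there is nothing left to skip. A reduced illustration (not the xmon code itself):

```c
#include <assert.h>
#include <string.h>

static const char *bpt_msg(int cmd)
{
	switch (cmd) {
		/*
		 * static storage: initialized at build time, so the
		 * switch jumping past this declaration is harmless.
		 */
		static const char badaddr[] =
			"Only kernel addresses are permitted for breakpoints\n";

	case 'd':
		return badaddr;
	default:
		return "";
	}
}
```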
Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc
Reza Arbab writes: > On Fri, May 26, 2017 at 01:46:58PM +1000, Michael Ellerman wrote: >>Reza Arbab writes: >> >>> On Thu, May 25, 2017 at 04:19:53PM +1000, Michael Ellerman wrote: The commit message for 3af229f2071f says: In practice, we never see a system with 256 NUMA nodes, and in fact, we do not support node hotplug on power in the first place, so the nodes ^^^ that are online when we come up are the nodes that will be present for the lifetime of this kernel. Is that no longer true? >>> >>> I don't know what the reasoning behind that statement was at the time, >>> but as far as I can tell, the only thing missing for node hotplug now is >>> Balbir's patchset [1]. He fixes the resource issue which motivated >>> 3af229f2071f and reverts it. >>> >>> With that set, I can instantiate a new numa node just by doing >>> add_memory(nid, ...) where nid doesn't currently exist. >> >>But does that actually happen on any real system? > > I don't know if anything currently tries to do this. My interest in > having this working is so that in the future, our coherent gpu memory > could be added as a distinct node by the device driver. Sure. If/when that happens, we would hopefully still have some way to limit the size of the possible map. That would ideally be a firmware property that tells us the maximum number of GPUs that might be hot-added, or we punt and cap it at some "sane" maximum number. But until that happens it's silly to say we can have up to 256 nodes when in practice most of our systems have 8 or less. So I'm still waiting for an explanation from Michael B on how he's seeing this bug in practice. cheers
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
On 29/05/2017 at 00:50, Balbir Singh wrote: On Sun, 2017-05-28 at 17:59 +0200, christophe leroy wrote: On 25/05/2017 at 05:36, Balbir Singh wrote: Today our patching happens via direct copy and patch_instruction. The patching code is well contained in the sense that copying bits are limited. While considering implementation of CONFIG_STRICT_RWX, the first requirement is to create another mapping that will allow for patching. We create the window using text_poke_area, allocated via get_vm_area(), which might be overkill. We can do per-cpu stuff as well. The downside of these patches is that patch_instruction is now synchronized using a lock. Other arches do similar things, but use fixmaps. The reason for not using fixmaps is to make use of any randomization in the future. The code also relies on set_pte_at and pte_clear to do the appropriate tlb flushing. Isn't it overkill to remap the text in another area? Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64, parisc, s390, x86/32, x86/64): - arm, x86/32 and x86/64 set text RW during the modification x86 uses set_fixmap() in text_poke(), am I missing something? Indeed, I looked at how it is done in ftrace. On x86, text modifications are done using ftrace_write(), which calls probe_kernel_write(), which doesn't remap anything. It first calls ftrace_arch_code_modify_prepare(), which sets the kernel text to rw. Indeed you are right, text_poke() remaps via fixmap. However it looks like text_poke() is used only for kgdb and kprobes. Christophe
[PATCH] Documentation: networking: add DPAA Ethernet document
Signed-off-by: Madalin Bucur Signed-off-by: Camelia Groza --- Documentation/networking/dpaa.txt | 194 ++ 1 file changed, 194 insertions(+) create mode 100644 Documentation/networking/dpaa.txt diff --git a/Documentation/networking/dpaa.txt b/Documentation/networking/dpaa.txt new file mode 100644 index 000..76e016d --- /dev/null +++ b/Documentation/networking/dpaa.txt @@ -0,0 +1,194 @@ +The QorIQ DPAA Ethernet Driver +== + +Authors: +Madalin Bucur +Camelia Groza + +Contents + + + - DPAA Ethernet Overview + - DPAA Ethernet Supported SoCs + - Configuring DPAA Ethernet in your kernel + - DPAA Ethernet Frame Processing + - DPAA Ethernet Features + - Debugging + +DPAA Ethernet Overview +== + +DPAA stands for Data Path Acceleration Architecture and it is a +set of networking acceleration IPs that are available on several +generations of SoCs, both on PowerPC and ARM64. + +The Freescale DPAA architecture consists of a series of hardware blocks +that support Ethernet connectivity. The Ethernet driver depends upon the +following drivers in the Linux kernel: + + - Peripheral Access Memory Unit (PAMU) (* needed only for PPC platforms) +drivers/iommu/fsl_* + - Frame Manager (FMan) +drivers/net/ethernet/freescale/fman + - Queue Manager (QMan), Buffer Manager (BMan) +drivers/soc/fsl/qbman + +A simplified view of the dpaa_eth interfaces mapped to FMan MACs: + + dpaa_eth /eth0\ ... /ethN\ + driver| | | | + - --- - + -Ports / Tx Rx \.../ Tx Rx \ + FMan| | | | + -MACs | MAC0 | | MACN | + / dtsec0 \ ... 
/ dtsecN \ (or tgec) +/ \ / \(or memac) + - -- --- -- - + FMan, FMan Port, FMan SP, FMan MURAM drivers + - + FMan HW blocks: MURAM, MACs, Ports, SP + - + +The dpaa_eth relation to the QMan, BMan and FMan: + + dpaa_eth /eth0\ + driver/ \ + - -^- -^- -^- ---- + QMan driver / \ / \ / \ \ / | BMan| + |Rx | |Rx | |Tx | |Tx | | driver | + - |Dfl| |Err| |Cnf| |FQs| | | + QMan HW|FQ | |FQ | |FQs| | | | | + / \ / \ / \ \ / | | + - --- --- --- -v-- +|FMan QMI | | +| FMan HW FMan BMI | BMan HW | + --- + +where the acronyms used above (and in the code) are: +DPAA = Data Path Acceleration Architecture +FMan = DPAA Frame Manager +QMan = DPAA Queue Manager +BMan = DPAA Buffers Manager +QMI = QMan interface in FMan +BMI = BMan interface in FMan +FMan SP = FMan Storage Profiles +MURAM = Multi-user RAM in FMan +FQ = QMan Frame Queue +Rx Dfl FQ = default reception FQ +Rx Err FQ = Rx error frames FQ +Tx Cnf FQ = Tx confirmation FQs +Tx FQs = transmission frame queues +dtsec = datapath three speed Ethernet controller (10/100/1000 Mbps) +tgec = ten gigabit Ethernet controller (10 Gbps) +memac = multirate Ethernet MAC (10/100/1000/1) + +DPAA Ethernet Supported SoCs + + +The DPAA drivers enable the Ethernet controllers present on the following SoCs: + +# PPC +P1023 +P2041 +P3041 +P4080 +P5020 +P5040 +T1023 +T1024 +T1040 +T1042 +T2080 +T4240 +B4860 + +# ARM +LS1043A +LS1046A + +Configuring DPAA Ethernet in your kernel + + +To enable the DPAA Ethernet driver, the following Kconfig options are required: + +# common for arch/arm64 and arch/powerpc platforms +CONFIG_FSL_DPAA=y +CONFIG_FSL_FMAN=y +CONFIG_FSL_DPAA_ETH=y +CONFIG_FSL_XGMAC_MDIO=y + +# for arch/powerpc only +CONFIG_FSL_PAMU=y + +# common options needed for the PHYs used on the RDBs +CONFIG_VITESSE_PHY=y +CONFIG_REALTEK_PHY=y +CONFIG_AQUANTIA_PHY=y + +DPAA Ethernet Frame Processing +== + +On Rx, buffers for the incoming frames are retrieved from one of the three +existing buffers pools. 
The driver initializes and seeds these, each with +buffers of different sizes: 1KB, 2KB and 4KB. + +On Tx, all transmitted frames are returned to the driver through Tx +confirmation frame queues. The driver is then responsible for freeing the +buffers. In order to do this properly, a backpointer is added to the buffer +before transmission that points to the skb. When the buffer returns to the +driver on a confirmation FQ, the skb can be correctly consumed. + +DPAA Ethernet Features +== + +C
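The confirmation-path bookkeeping described above can be sketched as follows (invented minimal types, not the driver's real structures): before transmission the driver writes a backpointer to the skb into the buffer's headroom, and when the buffer comes back on a Tx confirmation FQ it reads the pointer back so the skb can be freed.

```c
#include <assert.h>

struct fake_skb {
	int len;
};

/* Stash a backpointer to the skb at the start of the buffer headroom. */
static void stash_skb(void *headroom, struct fake_skb *skb)
{
	*(struct fake_skb **)headroom = skb;
}

/* On Tx confirmation, recover the skb so it can be consumed/freed. */
static struct fake_skb *recover_skb(void *headroom)
{
	return *(struct fake_skb **)headroom;
}

static int roundtrip(void)
{
	void *headroom_slot;	/* stands in for the buffer headroom */
	struct fake_skb skb = { .len = 128 };

	stash_skb(&headroom_slot, &skb);
	return recover_skb(&headroom_slot)->len;
}
```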
[PATCH] powerpc/64s: machine check handle ifetch from foreign real address for POWER9
The i-side 0111b case was missed by 7b9f71f974 ("powerpc/64s: POWER9 machine check handler"). It is possible to trigger this exception by branching to a foreign real address (bits [8:12] != 0) with instruction relocation off, and verify the exception cause is found after this patch. Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") Reported-by: Mahesh Salgaonkar Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/mce.h | 15 --- arch/powerpc/kernel/mce.c | 1 + arch/powerpc/kernel/mce_power.c | 3 +++ 3 files changed, 12 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index 81eff8631434..190d69a7f701 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -90,13 +90,14 @@ enum MCE_UserErrorType { enum MCE_RaErrorType { MCE_RA_ERROR_INDETERMINATE = 0, MCE_RA_ERROR_IFETCH = 1, - MCE_RA_ERROR_PAGE_TABLE_WALK_IFETCH = 2, - MCE_RA_ERROR_PAGE_TABLE_WALK_IFETCH_FOREIGN = 3, - MCE_RA_ERROR_LOAD = 4, - MCE_RA_ERROR_STORE = 5, - MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE = 6, - MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE_FOREIGN = 7, - MCE_RA_ERROR_LOAD_STORE_FOREIGN = 8, + MCE_RA_ERROR_IFETCH_FOREIGN = 2, + MCE_RA_ERROR_PAGE_TABLE_WALK_IFETCH = 3, + MCE_RA_ERROR_PAGE_TABLE_WALK_IFETCH_FOREIGN = 4, + MCE_RA_ERROR_LOAD = 5, + MCE_RA_ERROR_STORE = 6, + MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE = 7, + MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE_FOREIGN = 8, + MCE_RA_ERROR_LOAD_STORE_FOREIGN = 9, }; enum MCE_LinkErrorType { diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 5f9eada3519b..92f185875694 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -268,6 +268,7 @@ void machine_check_print_event_info(struct machine_check_event *evt, static const char *mc_ra_types[] = { "Indeterminate", "Instruction fetch (bad)", + "Instruction fetch (foreign)", "Page table walk ifetch (bad)", "Page table walk ifetch (foreign)", "Load (bad)", diff --git 
a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index f913139bb0c2..d24e689e893f 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -236,6 +236,9 @@ static const struct mce_ierror_table mce_p9_ierror_table[] = { { 0x081c, 0x0018, true, MCE_ERROR_TYPE_UE, MCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH, MCE_INITIATOR_CPU, MCE_SEV_ERROR_SYNC, }, +{ 0x081c, 0x001c, true, + MCE_ERROR_TYPE_RA, MCE_RA_ERROR_IFETCH_FOREIGN, + MCE_INITIATOR_CPU, MCE_SEV_ERROR_SYNC, }, { 0x081c, 0x0800, true, MCE_ERROR_TYPE_LINK,MCE_LINK_ERROR_IFETCH_TIMEOUT, MCE_INITIATOR_CPU, MCE_SEV_ERROR_SYNC, }, -- 2.11.0
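One thing the fix illustrates is that the MCE_RA_ERROR_* enum and the mc_ra_types[] string table must be renumbered together. A reduced model of that invariant (shortened names, not the kernel's actual definitions), with a compile-time size check guarding against the two drifting apart:

```c
#include <assert.h>
#include <string.h>

enum ra_error {
	RA_ERROR_INDETERMINATE = 0,
	RA_ERROR_IFETCH,
	RA_ERROR_IFETCH_FOREIGN,	/* the newly inserted 0111b case */
	RA_ERROR_COUNT
};

static const char *ra_types[] = {
	"Indeterminate",
	"Instruction fetch (bad)",
	"Instruction fetch (foreign)",
};

/* Fails the build if the enum and the string table fall out of sync. */
_Static_assert(sizeof(ra_types) / sizeof(ra_types[0]) == RA_ERROR_COUNT,
	       "ra_types[] out of sync with enum ra_error");

static const char *ra_type_str(enum ra_error e)
{
	return ra_types[e];
}
```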
[RFC] powerpc/powernv: machine check use kernel crash path
Use the normal kernel crash path in more cases (whenever we're not the init task), because it generally leads to much better Linux crash information. POWER9 has introduced more machine check conditions that can be triggered by programming errors (as opposed to hardware errors), which need to be debugged in Linux. It's unclear what the best way is to do this. Do we need to base the behaviour on the type of error? That might be impossible to do really well because some types of errors (e.g., translation multi hits) can be caused by software or hardware failures. Best would be to do something that works well for both. So what does BMC/OCC need here? Should we plumb OPAL_REBOOT_PLATFORM_ERROR into the generic crash path somehow (to be triggered by a special case of die()/panic()? This patch is just an RFC only, but when I test triggering a 0111b error from (kernel) process context after the previous patch, this patch changes the result from taking down the system with: w8l login: Severe Machine check interrupt [Not recovered] NIP []: 0x Initiator: CPU Error type: Real address [Instruction fetch (foreign)] [ 127.426651616,0] OPAL: Reboot requested due to Platform error. Effective[ 127.426693712,3] OPAL: Reboot requested due to Platform error. address: opal: Reboot type 1 not supported Kernel panic - not syncing: PowerNV Unrecovered Machine Check CPU: 56 PID: 4425 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #35 Call Trace: [ 128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to buggy/mising code in OPAL for this BMCRebooting in 10 seconds.. Trying to free IRQ 496 from IRQ context! 
To killing the process and continuing with: w8l login: Severe Machine check interrupt [Not recovered] NIP []: 0x Initiator: CPU Error type: Real address [Instruction fetch (foreign)] Effective address: Oops: Machine check, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm_hv kvm iptable_filter binfmt_misc vmx_crypto ip_tables x_tables autofs4 crc32c_vpmsum CPU: 22 PID: 4436 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #36 task: c0093230 task.stack: c0093238 NIP: LR: 217706a4 CTR: REGS: cfc8fd80 TRAP: 0200 Tainted: G M (4.12.0-rc1-13857-ga4700a261072-dirty) MSR: 901c1003 CR: 24000484 XER: 2000 CFAR: c0004c80 DAR: 21770a90 DSISR: 0a00 SOFTE: 1 GPR00: 1ebe 7fffce4818b0 21797f00 GPR04: 7fff8007ac24 44000484 4000 7fff801405e8 GPR08: 9280f033 24000484 0030 GPR12: 90001003 7fff801bc370 GPR16: GPR20: GPR24: GPR28: 7fff801b 217707a0 7fffce481918 NIP [] 0x LR [217706a4] 0x217706a4 Call Trace: Instruction dump: ---[ end trace 32ae1dabb4f8dae6 ]--- --- arch/powerpc/platforms/powernv/opal.c | 29 + 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c index 59684b4af4d1..67df76ac1fba 100644 --- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -30,6 +30,7 @@ #include #include #include +#include #include "powernv.h" @@ -407,16 +408,28 @@ static int opal_recover_mce(struct pt_regs *regs, /* Fatal machine check */ pr_err("Machine check interrupt is fatal\n"); recovered = 0; - } else if ((evt->severity == MCE_SEV_ERROR_SYNC) && - (user_mode(regs) && !is_global_init(current))) { + } else if ((evt->severity == MCE_SEV_ERROR_SYNC) + && !is_global_init(current)) { /* -* For now, kill the task if we have received exception when -* in 
userspace. -* -* TODO: Queue up this address for hwpoisioning later. +* Try to kill processes if we get a synchronous machine check +* and are not "init" (see opal_machine_check() comment about +* not going via normal
Re: [PATCH v1 1/8] powerpc/lib/code-patching: Enhance code patching
On 29/05/2017 at 00:58, Balbir Singh wrote: On Sun, 2017-05-28 at 16:29 +0200, christophe leroy wrote: On 25/05/2017 at 05:36, Balbir Singh wrote: Today our patching happens via direct copy and patch_instruction. The patching code is well contained in the sense that copying bits are limited. While considering implementation of CONFIG_STRICT_RWX, the first requirement is to create another mapping that will allow for patching. We create the window using text_poke_area, allocated via get_vm_area(), which might be overkill. We can do per-cpu stuff as well. The downside of these patches is that patch_instruction is now synchronized using a lock. Other arches do similar things, but use fixmaps. The reason for not using fixmaps is to make use of any randomization in the future. The code also relies on set_pte_at and pte_clear to do the appropriate tlb flushing. Signed-off-by: Balbir Singh [...] +static int kernel_map_addr(void *addr) +{ + unsigned long pfn; int err; - __put_user_size(instr, addr, 4, err); + if (is_vmalloc_addr(addr)) + pfn = vmalloc_to_pfn(addr); + else + pfn = __pa_symbol(addr) >> PAGE_SHIFT; + + err = map_kernel_page((unsigned long)text_poke_area->addr, + (pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT); map_kernel_page() doesn't exist on powerpc32, so compilation fails. However, a similar function exists, called map_page(). Maybe the below modification could help (not tested yet). Christophe Thanks, I'll try to get a compile; as an alternative, how about #ifdef CONFIG_PPC32 #define map_kernel_page map_page #endif My preference goes to renaming the PPC32 function, first because the PPC64 name fits better, second because too many defines kill readability, third because two functions doing the same thing are worth being called the same, and fourth because we surely have an opportunity to merge both functions one day. Christophe