date:20180102

Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Michael Ellerman

Christoph Hellwig  writes:

> We want to use the dma_direct_ namespace for a generic implementation,
> so rename powerpc to the second best choice: dma_nommu_.

I'm not a fan of "nommu". Some of the users of direct ops *are* using an
IOMMU, they're just setting up a 1:1 mapping once at init time, rather
than mapping dynamically.

Though I don't have a good idea for a better name, maybe "1to1",
"linear", "premapped" ?

cheers

Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Geert Uytterhoeven

On Tue, Jan 2, 2018 at 10:45 AM, Michael Ellerman  wrote:
> Christoph Hellwig  writes:
>
>> We want to use the dma_direct_ namespace for a generic implementation,
>> so rename powerpc to the second best choice: dma_nommu_.
>
> I'm not a fan of "nommu". Some of the users of direct ops *are* using an
> IOMMU, they're just setting up a 1:1 mapping once at init time, rather
> than mapping dynamically.
>
> Though I don't have a good idea for a better name, maybe "1to1",
> "linear", "premapped" ?

"identity"?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH 29/67] dma-direct: use node local allocations for coherent memory

2018-01-02 Thread Geert Uytterhoeven

Missing patch description explaining why this change is desirable.

On Fri, Dec 29, 2017 at 9:18 AM, Christoph Hellwig  wrote:
> --- a/lib/dma-direct.c
> +++ b/lib/dma-direct.c
> @@ -39,7 +39,7 @@ static void *dma_direct_alloc(struct device *dev, size_t size,
> if (gfpflags_allow_blocking(gfp))
> page = dma_alloc_from_contiguous(dev, count, page_order, gfp);
> if (!page)
> -   page = alloc_pages(gfp, page_order);
> +   page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
> if (!page)
> return NULL;
>

Gr{oetje,eeting}s,

Geert

Re: [PATCH 02/67] alpha: mark jensen as broken

2018-01-02 Thread Geert Uytterhoeven

Hi Christoph,

On Fri, Dec 29, 2017 at 9:18 AM, Christoph Hellwig  wrote:
> CONFIG_ALPHA_JENSEN has failed to compile since commit aca05038
> ("alpha/dma: use common noop dma ops"), so mark it as broken.

unknown revision or path not in the working tree.
Ah, you dropped the leading "6":
6aca0503847f6329460b15b3ab2b0e30bb752793
is less than 2 years old, though.

>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/alpha/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
> index b31b974a03cb..e96adcbcab41 100644
> --- a/arch/alpha/Kconfig
> +++ b/arch/alpha/Kconfig
> @@ -209,6 +209,7 @@ config ALPHA_EIGER
>
>  config ALPHA_JENSEN
> bool "Jensen"
> +   depends on BROKEN
> help
>   DEC PC 150 AXP (aka Jensen): This is a very old Digital system - one
>   of the first-generation Alpha systems. A number of these systems

Gr{oetje,eeting}s,

Geert

Re: [PATCH 22/67] dma-mapping: clear harmful GFP_* flags in common code

2018-01-02 Thread Geert Uytterhoeven

On Fri, Dec 29, 2017 at 9:18 AM, Christoph Hellwig  wrote:
> Lift the code from x86 so that we behave consistently.  In the future we
> should probably warn if any of these is set.
>
> Signed-off-by: Christoph Hellwig 

For m68k:
Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

Re: [PATCH 05/67] dma-mapping: replace PCI_DMA_BUS_IS_PHYS with a flag in struct dma_map_ops

2018-01-02 Thread Geert Uytterhoeven

On Fri, Dec 29, 2017 at 9:18 AM, Christoph Hellwig  wrote:
> The current PCI_DMA_BUS_IS_PHYS decides if a dma implementation is bound
> by the dma mask in the device because it directly maps to a physical
> address range (modulo an offset in the device), or if it is virtualized
> by an iommu and can map any address (that includes virtual iommus like
> swiotlb).  The problem with this scheme is that it is per-architecture and
> not per dma_ops instance, and we are growing more and more setups that
> have multiple different dma operations in use on a single system, for
> which this scheme can't provide a correct answer.  Depending on the
> architecture that means we either get a false positive or false negative
> at the moment.
>
> This patch instead extends the is_phys flag in struct dma_map_ops that
> is currently only used by a few architectures to be used tree wide.
>
> Note that this means that we now need a struct device parent in the
> Scsi_Host or netdevice.  Every modern driver has these, but there might
> still be a few outdated legacy drivers out there, which now won't make
> an intelligent decision.
>
> Signed-off-by: Christoph Hellwig 

For m68k:
Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

Re: [PATCH RESEND 1/1] KVM: PPC: Book3S: Add MMIO emulation for VMX instructions

2018-01-02 Thread Michael Ellerman

Jose Ricardo Ziviani  writes:

> This patch provides the MMIO load/store vector indexed
> X-Form emulation.
>
> Instructions implemented: lvx, stvx
>
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  arch/powerpc/include/asm/kvm_host.h   |   2 +
>  arch/powerpc/include/asm/kvm_ppc.h    |   4 +
>  arch/powerpc/include/asm/ppc-opcode.h |   6 ++
>  arch/powerpc/kvm/emulate_loadstore.c  |  32 +++
>  arch/powerpc/kvm/powerpc.c            | 162 ++
>  5 files changed, 189 insertions(+), 17 deletions(-)

KVM patches should be Cc'ed to kvm-...@vger.kernel.org, that way Paul
will see them in his patchwork at:

  http://patchwork.ozlabs.org/project/kvm-ppc/list/

cheers

[PATCH] selftests/powerpc: Add a test of SEGV error behaviour

2018-01-02 Thread Michael Ellerman

Add a test case of the error code reported when we take a SEGV on a
mapped but inaccessible area. We broke this recently.

Based on a test case from John Sperbeck .

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/powerpc/mm/.gitignore|  3 +-
 tools/testing/selftests/powerpc/mm/Makefile  |  2 +-
 tools/testing/selftests/powerpc/mm/segv_errors.c | 78 
 3 files changed, 81 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/segv_errors.c

diff --git a/tools/testing/selftests/powerpc/mm/.gitignore b/tools/testing/selftests/powerpc/mm/.gitignore
index e715a3f2fbf4..7d7c42ed6de9 100644
--- a/tools/testing/selftests/powerpc/mm/.gitignore
+++ b/tools/testing/selftests/powerpc/mm/.gitignore
@@ -1,4 +1,5 @@
 hugetlb_vs_thp_test
 subpage_prot
 tempfile
-prot_sao
\ No newline at end of file
+prot_sao
+segv_errors
\ No newline at end of file
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index bf315bcbe663..8ebbe96d80a8 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -2,7 +2,7 @@
 noarg:
$(MAKE) -C ../
 
-TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao
+TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors
 TEST_GEN_FILES := tempfile
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/mm/segv_errors.c b/tools/testing/selftests/powerpc/mm/segv_errors.c
new file mode 100644
index ..06ae76ee3ea1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/segv_errors.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2017 John Sperbeck
+ *
+ * Test that an access to a mapped but inaccessible area causes a SEGV and
+ * reports si_code == SEGV_ACCERR.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+static bool faulted;
+static int si_code;
+
+static void segv_handler(int n, siginfo_t *info, void *ctxt_v)
+{
+   ucontext_t *ctxt = (ucontext_t *)ctxt_v;
+   struct pt_regs *regs = ctxt->uc_mcontext.regs;
+
+   faulted = true;
+   si_code = info->si_code;
+   regs->nip += 4;
+}
+
+int test_segv_errors(void)
+{
+   struct sigaction act = {
+   .sa_sigaction = segv_handler,
+   .sa_flags = SA_SIGINFO,
+   };
+   char c, *p = NULL;
+
+   p = mmap(NULL, getpagesize(), 0, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+   FAIL_IF(p == MAP_FAILED);
+
+   FAIL_IF(sigaction(SIGSEGV, &act, NULL) != 0);
+
+   faulted = false;
+   si_code = 0;
+
+   /*
+* We just need a compiler barrier, but mb() works and has the nice
+* property of being easy to spot in the disassembly.
+*/
+   mb();
+   c = *p;
+   mb();
+
+   FAIL_IF(!faulted);
+   FAIL_IF(si_code != SEGV_ACCERR);
+
+   faulted = false;
+   si_code = 0;
+
+   mb();
+   *p = c;
+   mb();
+
+   FAIL_IF(!faulted);
+   FAIL_IF(si_code != SEGV_ACCERR);
+
+   return 0;
+}
+
+int main(void)
+{
+   return test_harness(test_segv_errors, "segv_errors");
+}
-- 
2.14.3

Re: [PATCH v2 1/1] powerpc/pseries: increase pseries_dlpar_init initcall priority

2018-01-02 Thread Michael Ellerman

Jose Ricardo Ziviani  writes:

> The hotplug engine uses its own workqueue to handle IRQ requests, the
> problem is that such workqueue is initialized after init_ras_IRQ, which
> will cause a kernel panic if any hotplug interruption is issued in that
> period of time.
>
> This patch changes the dlpar initcall registration to make sure it will
> be initialized before init_ras_IRQ.

Sorry I know this is already v2, but I don't think this is the best fix.

There's a dependency between the registration of the IRQ in the RAS
code, and the creation of the work queue in the DLPAR code, but it's
currently not explicit. That's the bug. So it'd be better to just make
it explicit.

As a bonus we can add actual error checking of the workqueue allocation.

Something like below, can you test it please?

cheers

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 6e35780c5962..dd8b29e58a98 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -574,11 +574,26 @@ static ssize_t dlpar_show(struct class *class, struct class_attribute *attr,
 
 static CLASS_ATTR_RW(dlpar);
 
-static int __init pseries_dlpar_init(void)
+int __init dlpar_workqueue_init(void)
 {
+   if (pseries_hp_wq)
+   return 0;
+
pseries_hp_wq = alloc_workqueue("pseries hotplug workqueue",
WQ_UNBOUND, 1);
+
+   return pseries_hp_wq ? 0 : -ENOMEM;
+}
+
+static int __init dlpar_sysfs_init(void)
+{
+   int rc;
+
+   rc = dlpar_workqueue_init();
+   if (rc)
+   return rc;
+
return sysfs_create_file(kernel_kobj, &class_attr_dlpar.attr);
 }
-machine_device_initcall(pseries, pseries_dlpar_init);
+machine_device_initcall(pseries, dlpar_sysfs_init);
 
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 4470a3194311..1ae1d9f4dbe9 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -98,4 +98,6 @@ static inline unsigned long cmo_get_page_size(void)
return CMO_PageSize;
 }
 
+int dlpar_workqueue_init(void);
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 4923ffe230cf..879a92327010 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -69,8 +69,9 @@ static int __init init_ras_IRQ(void)
/* Hotplug Events */
np = of_find_node_by_path("/event-sources/hot-plug-events");
if (np != NULL) {
-   request_event_sources_irqs(np, ras_hotplug_interrupt,
-  "RAS_HOTPLUG");
+   if (dlpar_workqueue_init() == 0)
+   request_event_sources_irqs(np, ras_hotplug_interrupt,
+   "RAS_HOTPLUG");
of_node_put(np);
}

Re: [PATCH 25/67] dma-direct: rename dma_noop to dma_direct

2018-01-02 Thread Vladimir Murzin

On 29/12/17 08:18, Christoph Hellwig wrote:
> The trivial direct mapping implementation already does a virtual to
> physical translation which isn't strictly a noop, and will soon learn
> to do non-direct but linear physical to dma translations through the
> device offset and a few small tricks.  Rename it to a better fitting
> name.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  MAINTAINERS                        |  2 +-
>  arch/arm/Kconfig                   |  2 +-
>  arch/arm/include/asm/dma-mapping.h |  2 +-
>  arch/arm/mm/dma-mapping-nommu.c    |  8
>  arch/m32r/Kconfig                  |  2 +-
>  arch/riscv/Kconfig                 |  2 +-
>  arch/s390/Kconfig                  |  2 +-
>  include/asm-generic/dma-mapping.h  |  2 +-
>  include/linux/dma-mapping.h        |  2 +-
>  lib/Kconfig                        |  2 +-
>  lib/Makefile                       |  2 +-
>  lib/{dma-noop.c => dma-direct.c}   | 35 +++
>  12 files changed, 29 insertions(+), 34 deletions(-)
>  rename lib/{dma-noop.c => dma-direct.c} (53%)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a8b35d9f41b2..b4005fe06e4c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4336,7 +4336,7 @@ T:  git git://git.infradead.org/users/hch/dma-mapping.git
>  W:   http://git.infradead.org/users/hch/dma-mapping.git
>  S:   Supported
>  F:   lib/dma-debug.c
> -F:   lib/dma-noop.c
> +F:   lib/dma-direct.c
>  F:   lib/dma-virt.c
>  F:   drivers/base/dma-mapping.c
>  F:   drivers/base/dma-coherent.c
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 00d889a37965..430a0aa710d6 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -25,7 +25,7 @@ config ARM
>   select CLONE_BACKWARDS
>   select CPU_PM if (SUSPEND || CPU_IDLE)
>   select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
> - select DMA_NOOP_OPS if !MMU
> + select DMA_DIRECT_OPS if !MMU
>   select EDAC_SUPPORT
>   select EDAC_ATOMIC_SCRUB
>   select GENERIC_ALLOCATOR
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index e5d9020c9ee1..8436f6ade57d 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
> - return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_noop_ops;
> + return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
>  }
>  
>  #ifdef __arch_page_to_dma
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 1cced700e45a..49e9831dc0f1 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -22,7 +22,7 @@
>  #include "dma.h"
>  
>  /*
> - *  dma_noop_ops is used if
> + *  dma_direct_ops is used if
>   *   - MMU/MPU is off
>   *   - cpu is v7m w/o cache support
>   *   - device is coherent
> @@ -39,7 +39,7 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
>unsigned long attrs)
>  
>  {
> - const struct dma_map_ops *ops = &dma_noop_ops;
> + const struct dma_map_ops *ops = &dma_direct_ops;
>   void *ret;
>  
>   /*
> @@ -70,7 +70,7 @@ static void arm_nommu_dma_free(struct device *dev, size_t size,
>  void *cpu_addr, dma_addr_t dma_addr,
>  unsigned long attrs)
>  {
> - const struct dma_map_ops *ops = &dma_noop_ops;
> + const struct dma_map_ops *ops = &dma_direct_ops;
>  
>   if (attrs & DMA_ATTR_NON_CONSISTENT) {
>   ops->free(dev, size, cpu_addr, dma_addr, attrs);
> @@ -214,7 +214,7 @@ EXPORT_SYMBOL(arm_nommu_dma_ops);
>  
>  static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
>  {
> - return coherent ? &dma_noop_ops : &arm_nommu_dma_ops;
> + return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
>  }
>  
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> diff --git a/arch/m32r/Kconfig b/arch/m32r/Kconfig
> index 498398d915c1..dd84ee194579 100644
> --- a/arch/m32r/Kconfig
> +++ b/arch/m32r/Kconfig
> @@ -19,7 +19,7 @@ config M32R
>   select MODULES_USE_ELF_RELA
>   select HAVE_DEBUG_STACKOVERFLOW
>   select CPU_NO_EFFICIENT_FFS
> - select DMA_NOOP_OPS
> + select DMA_DIRECT_OPS
>   select ARCH_NO_COHERENT_DMA_MMAP if !MMU
>  
>  config SBUS
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 2c6adf12713a..865e14f50c14 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -83,7 +83,7 @@ config PGTABLE_LEVELS
>  config HAVE_KPROBES
>   def_bool n
>  
> -config DMA_NOOP_OPS
> +config DMA_DIRECT_OPS
>   def_bool y
>  
>  menu "Platform type"
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 829c67986db7..9376637229c9 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -140,7 +140

Re: [PATCH 26/67] dma-direct: use phys_to_dma

2018-01-02 Thread Vladimir Murzin

On 29/12/17 08:18, Christoph Hellwig wrote:
> This means that whatever linear remapping scheme the architecture
> provides is used in the generic dma_direct ops.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  lib/dma-direct.c | 18 +++---
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/dma-direct.c b/lib/dma-direct.c
> index 439db40854b7..0e087650e86b 100644
> --- a/lib/dma-direct.c
> +++ b/lib/dma-direct.c
> @@ -1,12 +1,11 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
> - *   lib/dma-noop.c
> - *
> - * DMA operations that map to physical addresses without flushing memory.
> + * DMA operations that map physical memory directly without using an IOMMU or
> + * flushing caches.
>   */
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  #include 
>  
> @@ -17,7 +16,7 @@ static void *dma_direct_alloc(struct device *dev, size_t size,
>  
>   ret = (void *)__get_free_pages(gfp, get_order(size));
>   if (ret)
> - *dma_handle = virt_to_phys(ret) - PFN_PHYS(dev->dma_pfn_offset);
> + *dma_handle = phys_to_dma(dev, virt_to_phys(ret));
>  
>   return ret;
>  }
> @@ -32,7 +31,7 @@ static dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
>   unsigned long offset, size_t size, enum dma_data_direction dir,
>   unsigned long attrs)
>  {
> - return page_to_phys(page) + offset - PFN_PHYS(dev->dma_pfn_offset);
> + return phys_to_dma(dev, page_to_phys(page)) + offset;
>  }
>  
>  static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
> @@ -42,12 +41,9 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
>   struct scatterlist *sg;
>  
>   for_each_sg(sgl, sg, nents, i) {
> - dma_addr_t offset = PFN_PHYS(dev->dma_pfn_offset);
> - void *va;
> -
>   BUG_ON(!sg_page(sg));
> - va = sg_virt(sg);
> - sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va) - offset;
> +
> + sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg));
>   sg_dma_len(sg) = sg->length;
>   }
>  
> 

From ARM NOMMU perspective

Reviewed-by: Vladimir Murzin 

Thanks
Vladimir
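
A userspace sketch of the equivalence the patch relies on, assuming the
generic phys_to_dma() simply subtracts the device's dma_pfn_offset
(converted from page frames to bytes). struct fake_device,
FAKE_PAGE_SHIFT and both function names are stand-ins for illustration,
and an architecture may supply a different translation.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_PAGE_SHIFT 12	/* assumes 4 KiB pages */

struct fake_device {
	uint64_t dma_pfn_offset;	/* offset in page frames */
};

/* Open-coded form that the patch removes from each call site. */
static uint64_t open_coded(const struct fake_device *dev, uint64_t phys)
{
	return phys - (dev->dma_pfn_offset << FAKE_PAGE_SHIFT);
}

/* Helper form: the translation lives in one place. */
static uint64_t fake_phys_to_dma(const struct fake_device *dev, uint64_t phys)
{
	uint64_t off = dev->dma_pfn_offset << FAKE_PAGE_SHIFT;

	assert(phys >= off);	/* memory below the offset is not reachable */
	return phys - off;
}
```

With dma_pfn_offset = 0x80000 both forms translate physical address
0x80001000 to bus address 0x1000; the helper just gives architectures a
single hook to override.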

Re: [PATCH 30/67] dma-direct: retry allocations using GFP_DMA for small masks

2018-01-02 Thread Vladimir Murzin

On 29/12/17 08:18, Christoph Hellwig wrote:
> If we got back an allocation that wasn't inside the supported coherent mask,
> retry the allocation using GFP_DMA.
> 
> Based on the x86 code.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  lib/dma-direct.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dma-direct.c b/lib/dma-direct.c
> index ab81de3ac1d3..f8467cb3d89a 100644
> --- a/lib/dma-direct.c
> +++ b/lib/dma-direct.c
> @@ -28,6 +28,11 @@ check_addr(struct device *dev, dma_addr_t dma_addr, size_t size,
>   return true;
>  }
>  
> +static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
> +{
> + return phys_to_dma(dev, phys) + size <= dev->coherent_dma_mask;

Shouldn't it be: phys_to_dma(dev, phys) + size - 1 <= dev->coherent_dma_mask ?
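
To make the off-by-one concrete, a minimal userspace sketch (not the
kernel code): coherent_ok_excl() mirrors the comparison in the hunk
above, coherent_ok_incl() the variant suggested here. The function
names, BIT_MASK_N and the plain uint64_t addresses are assumptions for
illustration, and phys_to_dma() is taken to be the identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's DMA_BIT_MASK(n): the n low bits set. */
#define BIT_MASK_N(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Comparison as in the hunk: one-past-the-end address vs. mask. */
static bool coherent_ok_excl(uint64_t dma_addr, size_t size, uint64_t mask)
{
	return dma_addr + size <= mask;
}

/* Suggested variant: last byte actually touched vs. mask. */
static bool coherent_ok_incl(uint64_t dma_addr, size_t size, uint64_t mask)
{
	assert(size > 0);	/* an empty buffer has no last byte */
	return dma_addr + size - 1 <= mask;
}
```

A 4 KiB buffer at 0xFFFFF000 ends at 0xFFFFFFFF, which a 32-bit mask can
address, yet the first form rejects it because 0x100000000 exceeds the
mask; the second form accepts it.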

> +}
> +
>  static void *dma_direct_alloc(struct device *dev, size_t size,
>   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
>  {
> @@ -35,11 +40,29 @@ static void *dma_direct_alloc(struct device *dev, size_t size,
>   int page_order = get_order(size);
>   struct page *page = NULL;
>  
> +again:
>   /* CMA can be used only in the context which permits sleeping */
> - if (gfpflags_allow_blocking(gfp))
> + if (gfpflags_allow_blocking(gfp)) {
>   page = dma_alloc_from_contiguous(dev, count, page_order, gfp);
> + if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> + dma_release_from_contiguous(dev, page, count);
> + page = NULL;
> + }
> + }
>   if (!page)
>   page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
> +
> + if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> + __free_pages(page, page_order);
> + page = NULL;
> +
> + if (dev->coherent_dma_mask < DMA_BIT_MASK(32) &&
> + !(gfp & GFP_DMA)) {
> + gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
> + goto again;

Shouldn't we limit number of attempts?
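
For reference, the control flow in question can be modeled in userspace
as below. Reading the hunk as written, the jump is only taken while
GFP_DMA is clear, and taking it sets GFP_DMA, so this model retries at
most once. The flag values and function name are made up for the sketch,
and it assumes the worst case where every allocation misses the mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit values; only their distinctness matters for the sketch. */
#define FAKE_GFP_DMA   0x1u
#define FAKE_GFP_DMA32 0x2u

/*
 * Model the retry logic for a device with a small coherent mask,
 * assuming every allocation lands outside the mask.
 * Returns how many allocation attempts were made.
 */
static int count_attempts(unsigned int gfp, bool small_mask)
{
	int attempts = 0;

again:
	attempts++;		/* stands in for allocation plus mask check */
	assert(attempts <= 2);	/* the property being illustrated */

	if (small_mask && !(gfp & FAKE_GFP_DMA)) {
		gfp = (gfp & ~FAKE_GFP_DMA32) | FAKE_GFP_DMA;
		goto again;
	}
	return attempts;
}
```

An explicit counter would still make the bound obvious to readers
without having to trace the flag manipulation.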

Thanks
Vladimir

Re: [PATCH 31/67] dma-direct: make dma_direct_{alloc, free} available to other implementations

2018-01-02 Thread Vladimir Murzin

On 29/12/17 08:18, Christoph Hellwig wrote:
> So that they don't need to indirect through the operation vector.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/mm/dma-mapping-nommu.c | 9 +++--
>  include/linux/dma-direct.h      | 5 +
>  lib/dma-direct.c                | 6 +++---
>  3 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 49e9831dc0f1..b4cf3e4e9d4a 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -11,7 +11,7 @@
>  
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  
>  #include 
> @@ -39,7 +39,6 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
>unsigned long attrs)
>  
>  {
> - const struct dma_map_ops *ops = &dma_direct_ops;
>   void *ret;
>  
>   /*
> @@ -48,7 +47,7 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
>*/
>  
>   if (attrs & DMA_ATTR_NON_CONSISTENT)
> - return ops->alloc(dev, size, dma_handle, gfp, attrs);
> + return dma_direct_alloc(dev, size, dma_handle, gfp, attrs);
>  
>   ret = dma_alloc_from_global_coherent(size, dma_handle);
>  
> @@ -70,10 +69,8 @@ static void arm_nommu_dma_free(struct device *dev, size_t size,
>  void *cpu_addr, dma_addr_t dma_addr,
>  unsigned long attrs)
>  {
> - const struct dma_map_ops *ops = &dma_direct_ops;
> -
>   if (attrs & DMA_ATTR_NON_CONSISTENT) {
> - ops->free(dev, size, cpu_addr, dma_addr, attrs);
> + dma_direct_free(dev, size, cpu_addr, dma_addr, attrs);
>   } else {
>   int ret = dma_release_from_global_coherent(get_order(size),
>  cpu_addr);
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index 10e924b7cba7..4788bf0bf683 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -38,4 +38,9 @@ static inline void dma_mark_clean(void *addr, size_t size)
>  }
>  #endif /* CONFIG_ARCH_HAS_DMA_MARK_CLEAN */
>  
> +void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> + gfp_t gfp, unsigned long attrs);
> +void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
> + dma_addr_t dma_addr, unsigned long attrs);
> +
>  #endif /* _LINUX_DMA_DIRECT_H */
> diff --git a/lib/dma-direct.c b/lib/dma-direct.c
> index f8467cb3d89a..7e913728e099 100644
> --- a/lib/dma-direct.c
> +++ b/lib/dma-direct.c
> @@ -33,8 +33,8 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
>   return phys_to_dma(dev, phys) + size <= dev->coherent_dma_mask;
>  }
>  
> -static void *dma_direct_alloc(struct device *dev, size_t size,
> - dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
> +void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> + gfp_t gfp, unsigned long attrs)
>  {
>   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
>   int page_order = get_order(size);
> @@ -71,7 +71,7 @@ static void *dma_direct_alloc(struct device *dev, size_t size,
>   return page_address(page);
>  }
>  
> -static void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
> +void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
>   dma_addr_t dma_addr, unsigned long attrs)
>  {
>   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> 

Reviewed-by: Vladimir Murzin 

Thanks
Vladimir

Re: [net] Revert "net: core: maybe return -EEXIST in __dev_alloc_name"

2018-01-02 Thread David Miller

From: Michael Ellerman 
Date: Fri, 22 Dec 2017 15:22:22 +1100

>> On Tue, Dec 19 2017, Michael Ellerman wrote:
>>> This revert seems to have broken networking on one of my powerpc
>>> machines, according to git bisect.
>>>
>>> The symptom is DHCP fails and I don't get a link, I didn't dig any
>>> further than that. I can if it's helpful.
>>>
>>> I think the problem is that 87c320e51519 ("net: core: dev_get_valid_name
>>> is now the same as dev_alloc_name_ns") only makes sense while
>>> d6f295e9def0 remains in the tree.
>>
>> I'm sorry about all of this, I really didn't think there would be such
>> consequences of changing an errno return. Indeed, d6f29 was preparation
>> for unifying the two functions that do the exact same thing (and how we
>> ever got into that situation is somewhat unclear), except for
>> their behaviour in the case the requested name already exists. So one of
>> the two interfaces had to change its return value, and as I wrote, I
>> thought EEXIST was the saner choice when an explicit name (no %d) had
>> been requested.
> 
> No worries.
> 
>>> ie. before the entire series, dev_get_valid_name() would return EEXIST,
>>> and that was retained when 87c320e51519 was merged, but now that
>>> d6f295e9def0 has been reverted dev_get_valid_name() is returning ENFILE.
>>>
>>> I can get the network up again if I also revert 87c320e51519 ("net:
>>> core: dev_get_valid_name is now the same as dev_alloc_name_ns"), or with
>>> the gross patch below.
>>
>> I don't think changing -ENFILE to -EEXIST would be right either, since
>> dev_get_valid_name() used to be able to return both (-EEXIST in the case
>> where there's no %d, -ENFILE in the case where we end up calling
>> dev_alloc_name_ns()). If anything, we could do the check for the old
>> -EEXIST condition first, and then call dev_alloc_name_ns(). But I'm also
>> fine with reverting.
> 
> Yeah I think a revert would be best, given it's nearly rc5.
> 
> My userspace is not exotic AFAIK, just debian something, so presumably
> this will affect other people too.

I've just queued up the following revert, thanks!


From 5047543928139184f060c8f3bccb788b3df4c1ea Mon Sep 17 00:00:00 2001
From: "David S. Miller" 
Date: Tue, 2 Jan 2018 11:45:07 -0500
Subject: [PATCH] Revert "net: core: dev_get_valid_name is now the same as
 dev_alloc_name_ns"

This reverts commit 87c320e51519a83c496ab7bfb4e96c8f9c001e89.

Changing the error return code in some situations turns out to
be harmful in practice.  In particular Michael Ellerman reports
that DHCP fails on his powerpc machines, and this revert gets
things working again.

Johannes Berg agrees that this revert is the best course of
action for now.

Fixes: 029b6d140550 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"")
Reported-by: Michael Ellerman 
Signed-off-by: David S. Miller 
---
 net/core/dev.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 01ee854454a8..0e0ba36eeac9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1146,7 +1146,19 @@ EXPORT_SYMBOL(dev_alloc_name);
 int dev_get_valid_name(struct net *net, struct net_device *dev,
   const char *name)
 {
-   return dev_alloc_name_ns(net, dev, name);
+   BUG_ON(!net);
+
+   if (!dev_valid_name(name))
+   return -EINVAL;
+
+   if (strchr(name, '%'))
+   return dev_alloc_name_ns(net, dev, name);
+   else if (__dev_get_by_name(net, name))
+   return -EEXIST;
+   else if (dev->name != name)
+   strlcpy(dev->name, name, IFNAMSIZ);
+
+   return 0;
 }
 EXPORT_SYMBOL(dev_get_valid_name);
 
-- 
2.14.3
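
The branch order the revert restores can be modeled in userspace as
follows. This is only a sketch: the name table and function names are
hypothetical, the dev_valid_name() check and the copy into dev->name are
left out, and a return of 1 merely marks the path that the real function
hands off to dev_alloc_name_ns().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the devices already in the namespace. */
static const char *const existing[] = { "eth0", "lo" };

static int name_in_use(const char *name)
{
	for (size_t i = 0; i < sizeof(existing) / sizeof(existing[0]); i++)
		if (strcmp(existing[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Mirrors the branch order of the restored function:
 *   template ('%' present) -> delegate to the template allocator
 *   explicit name in use   -> -EEXIST
 *   otherwise              -> accept the name
 */
static int model_get_valid_name(const char *name)
{
	assert(name != NULL);

	if (strchr(name, '%'))
		return 1;	/* delegated; result comes from the allocator */
	if (name_in_use(name))
		return -EEXIST;
	return 0;
}
```

So an explicit request for a taken name such as "eth0" reports -EEXIST
again, while templates such as "eth%d" keep whatever error the template
allocator produces.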

Re: [net] Revert "net: core: maybe return -EEXIST in __dev_alloc_name"

2018-01-02 Thread Johannes Berg

On Tue, 2018-01-02 at 11:50 -0500, David Miller wrote:
> From: Michael Ellerman 
> Date: Fri, 22 Dec 2017 15:22:22 +1100
> 
> >> On Tue, Dec 19 2017, Michael Ellerman wrote:
> >>> This revert seems to have broken networking on one of my powerpc
> >>> machines, according to git bisect.
> >>>
> >>> The symptom is DHCP fails and I don't get a link, I didn't dig any
> >>> further than that. I can if it's helpful.
> >>>
> >>> I think the problem is that 87c320e51519 ("net: core: dev_get_valid_name
> >>> is now the same as dev_alloc_name_ns") only makes sense while
> >>> d6f295e9def0 remains in the tree.
> >>
> >> I'm sorry about all of this, I really didn't think there would be such
> >> consequences of changing an errno return. Indeed, d6f29 was preparation
> >> for unifying the two functions that do the exact same thing (and how we
> >> ever got into that situation is somewhat unclear), except for
> >> their behaviour in the case the requested name already exists. So one of
> >> the two interfaces had to change its return value, and as I wrote, I
> >> thought EEXIST was the saner choice when an explicit name (no %d) had
> >> been requested.
> > 
> > No worries.
> > 
> >>> ie. before the entire series, dev_get_valid_name() would return EEXIST,
> >>> and that was retained when 87c320e51519 was merged, but now that
> >>> d6f295e9def0 has been reverted dev_get_valid_name() is returning ENFILE.
> >>>
> >>> I can get the network up again if I also revert 87c320e51519 ("net:
> >>> core: dev_get_valid_name is now the same as dev_alloc_name_ns"), or with
> >>> the gross patch below.
> >>
> >> I don't think changing -ENFILE to -EEXIST would be right either, since
> >> dev_get_valid_name() used to be able to return both (-EEXIST in the case
> >> where there's no %d, -ENFILE in the case where we end up calling
> >> dev_alloc_name_ns()). If anything, we could do the check for the old
> >> -EEXIST condition first, and then call dev_alloc_name_ns(). But I'm also
> >> fine with reverting.
> > 
> > Yeah I think a revert would be best, given it's nearly rc5.
> > 
> > My userspace is not exotic AFAIK, just debian something, so presumably
> > this will affect other people too.
> 
> I've just queued up the following revert, thanks!
> 
> 
> From 5047543928139184f060c8f3bccb788b3df4c1ea Mon Sep 17 00:00:00 2001
> From: "David S. Miller" 
> Date: Tue, 2 Jan 2018 11:45:07 -0500
> Subject: [PATCH] Revert "net: core: dev_get_valid_name is now the same as
>  dev_alloc_name_ns"
> 
> This reverts commit 87c320e51519a83c496ab7bfb4e96c8f9c001e89.
> 
> Changing the error return code in some situations turns out to
> be harmful in practice.  In particular Michael Ellerman reports
> that DHCP fails on his powerpc machines, and this revert gets
> things working again.
> 
> Johannes Berg agrees that this revert is the best course of
> action for now.

I'm not sure my voice matters much, I merely did the first revert of
these two patches ... :)

But I agree with Michael that you can't really salvage this without the
other patch, and that one caused problems in wifi ...

Thanks :)

johannes

Re: [PATCH v2 1/1] powerpc/pseries: increase pseries_dlpar_init initcall priority

2018-01-02 Thread joserz

On Tue, Jan 02, 2018 at 10:46:07PM +1100, Michael Ellerman wrote:
> Jose Ricardo Ziviani  writes:
> 
> > The hotplug engine uses its own workqueue to handle IRQ requests, the
> > problem is that such workqueue is initialized after init_ras_IRQ, which
> > will cause a kernel panic if any hotplug interruption is issued in that
> > period of time.
> >
> > This patch changes the dlpar initcall registration to make sure it will
> > be initialized before init_ras_IRQ.
> 
> Sorry I know this is already v2, but I don't think this is the best fix.
> 
> There's a dependency between the registration of the IRQ in the RAS
> code, and the creation of the work queue in the DLPAR code, but it's
> currently not explicit. That's the bug. So it'd be better to just make
> it explicit.
> 
> As a bonus we can add actual error checking of the workqueue allocation.
> 
> Something like below, can you test it please?

Sure, no problem. I'll do it.

Thanks for reviewing it!!!

> 
> cheers
> 
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 6e35780c5962..dd8b29e58a98 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -574,11 +574,26 @@ static ssize_t dlpar_show(struct class *class, struct class_attribute *attr,
> 
>  static CLASS_ATTR_RW(dlpar);
> 
> -static int __init pseries_dlpar_init(void)
> +int __init dlpar_workqueue_init(void)
>  {
> + if (pseries_hp_wq)
> + return 0;
> +
>   pseries_hp_wq = alloc_workqueue("pseries hotplug workqueue",
>   WQ_UNBOUND, 1);
> +
> + return pseries_hp_wq ? 0 : -ENOMEM;
> +}
> +
> +static int __init dlpar_sysfs_init(void)
> +{
> + int rc;
> +
> + rc = dlpar_workqueue_init();
> + if (rc)
> + return rc;
> +
>   return sysfs_create_file(kernel_kobj, &class_attr_dlpar.attr);
>  }
> -machine_device_initcall(pseries, pseries_dlpar_init);
> +machine_device_initcall(pseries, dlpar_sysfs_init);
> 
> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
> index 4470a3194311..1ae1d9f4dbe9 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -98,4 +98,6 @@ static inline unsigned long cmo_get_page_size(void)
>   return CMO_PageSize;
>  }
> 
> +int dlpar_workqueue_init(void);
> +
>  #endif /* _PSERIES_PSERIES_H */
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 4923ffe230cf..879a92327010 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -69,8 +69,9 @@ static int __init init_ras_IRQ(void)
>   /* Hotplug Events */
>   np = of_find_node_by_path("/event-sources/hot-plug-events");
>   if (np != NULL) {
> - request_event_sources_irqs(np, ras_hotplug_interrupt,
> -"RAS_HOTPLUG");
> + if (dlpar_workqueue_init() == 0)
> + request_event_sources_irqs(np, ras_hotplug_interrupt,
> + "RAS_HOTPLUG");
>   of_node_put(np);
>   }
> 
>
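
The shape of the proposed fix (an idempotent init helper that every dependent call site invokes, so the ordering dependency is explicit and the allocation is error-checked) can be sketched in plain userspace C. This is a minimal sketch under stated assumptions: struct work_queue, workqueue_init_once(), register_hotplug_irq() and sysfs_init() are hypothetical stand-ins, and calloc() merely plays the role of alloc_workqueue().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel workqueue. */
struct work_queue {
	int unused;
};

static struct work_queue *hp_wq;	/* plays the role of pseries_hp_wq */
static int alloc_calls;			/* counts real allocations, for the demo */

/* Safe to call from any init path, in any order, any number of times. */
int workqueue_init_once(void)
{
	if (hp_wq)
		return 0;

	alloc_calls++;
	hp_wq = calloc(1, sizeof(*hp_wq));

	return hp_wq ? 0 : -ENOMEM;
}

/* Stand-in for the RAS side: only hook up the IRQ if the queue exists. */
int register_hotplug_irq(void)
{
	int rc = workqueue_init_once();

	if (rc)
		return rc;
	/* ...the interrupt handler would be requested here... */
	return 0;
}

/* Stand-in for the DLPAR side: same dependency, stated the same way. */
int sysfs_init(void)
{
	int rc = workqueue_init_once();

	if (rc)
		return rc;
	/* ...the sysfs file would be created here... */
	return 0;
}
```

Whichever of the two callers runs first performs the allocation; the other sees the existing queue and returns immediately, so initcall ordering no longer matters.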

Re: [PATCH] selftests/powerpc: Add a test of SEGV error behaviour

2018-01-02 Thread John Sperbeck

On Tue, Jan 2, 2018 at 3:03 AM, Michael Ellerman  wrote:
> Add a test case of the error code reported when we take a SEGV on a
> mapped but inaccessible area. We broke this recently.
>
> Based on a test case from John Sperbeck.
>
> Signed-off-by: Michael Ellerman 
> ---
>  tools/testing/selftests/powerpc/mm/.gitignore|  3 +-
>  tools/testing/selftests/powerpc/mm/Makefile  |  2 +-
>  tools/testing/selftests/powerpc/mm/segv_errors.c | 78 
> 
>  3 files changed, 81 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/powerpc/mm/segv_errors.c
>
> diff --git a/tools/testing/selftests/powerpc/mm/.gitignore b/tools/testing/selftests/powerpc/mm/.gitignore
> index e715a3f2fbf4..7d7c42ed6de9 100644
> --- a/tools/testing/selftests/powerpc/mm/.gitignore
> +++ b/tools/testing/selftests/powerpc/mm/.gitignore
> @@ -1,4 +1,5 @@
>  hugetlb_vs_thp_test
>  subpage_prot
>  tempfile
> -prot_sao
> \ No newline at end of file
> +prot_sao
> +segv_errors
> \ No newline at end of file
> diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
> index bf315bcbe663..8ebbe96d80a8 100644
> --- a/tools/testing/selftests/powerpc/mm/Makefile
> +++ b/tools/testing/selftests/powerpc/mm/Makefile
> @@ -2,7 +2,7 @@
>  noarg:
> $(MAKE) -C ../
>
> -TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao
> +TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors
>  TEST_GEN_FILES := tempfile
>
>  include ../../lib.mk
> diff --git a/tools/testing/selftests/powerpc/mm/segv_errors.c b/tools/testing/selftests/powerpc/mm/segv_errors.c
> new file mode 100644
> index ..06ae76ee3ea1
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/mm/segv_errors.c
> @@ -0,0 +1,78 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright 2017 John Sperbeck
> + *
> + * Test that an access to a mapped but inaccessible area causes a SEGV and
> + * reports si_code == SEGV_ACCERR.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "utils.h"
> +
> +static bool faulted;
> +static int si_code;
> +
> +static void segv_handler(int n, siginfo_t *info, void *ctxt_v)
> +{
> +   ucontext_t *ctxt = (ucontext_t *)ctxt_v;
> +   struct pt_regs *regs = ctxt->uc_mcontext.regs;
> +
> +   faulted = true;
> +   si_code = info->si_code;
> +   regs->nip += 4;
> +}
> +
> +int test_segv_errors(void)
> +{
> +   struct sigaction act = {
> +   .sa_sigaction = segv_handler,
> +   .sa_flags = SA_SIGINFO,
> +   };
> +   char c, *p = NULL;
> +
> +   p = mmap(NULL, getpagesize(), 0, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> +   FAIL_IF(p == MAP_FAILED);
> +
> +   FAIL_IF(sigaction(SIGSEGV, &act, NULL) != 0);
> +
> +   faulted = false;
> +   si_code = 0;
> +
> +   /*
> +* We just need a compiler barrier, but mb() works and has the nice
> +* property of being easy to spot in the disassembly.
> +*/
> +   mb();
> +   c = *p;
> +   mb();
> +
> +   FAIL_IF(!faulted);
> +   FAIL_IF(si_code != SEGV_ACCERR);
> +
> +   faulted = false;
> +   si_code = 0;
> +
> +   mb();
> +   *p = c;
> +   mb();
> +
> +   FAIL_IF(!faulted);
> +   FAIL_IF(si_code != SEGV_ACCERR);
> +
> +   return 0;
> +}
> +
> +int main(void)
> +{
> +   return test_harness(test_segv_errors, "segv_errors");
> +}
> --
> 2.14.3
>

Looks good to me.

Acked-by: John Sperbeck
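
For anyone wanting to try the same check on another architecture, here is an architecture-neutral sketch of the idea. It is not the selftest from the patch: instead of stepping regs->nip past the faulting instruction (which is powerpc specific), it recovers with sigsetjmp()/siglongjmp(). The names fault_si_code() and on_segv() are made up for the example, and a Linux/glibc environment is assumed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf resume;
static volatile sig_atomic_t seen_code;

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	seen_code = info->si_code;
	/* Jump back instead of patching the saved instruction pointer. */
	siglongjmp(resume, 1);
}

/*
 * Touch a PROT_NONE page (read, or write if do_write is non-zero) and
 * return the si_code of the resulting SIGSEGV, or -1 on setup failure
 * or if no fault was taken.
 */
int fault_si_code(int do_write)
{
	struct sigaction act = {
		.sa_sigaction = on_segv,
		.sa_flags = SA_SIGINFO,
	};
	struct sigaction old;
	volatile char *p;
	int result = -1;

	p = mmap(NULL, getpagesize(), PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (sigaction(SIGSEGV, &act, &old) != 0) {
		munmap((void *)p, getpagesize());
		return -1;
	}

	if (sigsetjmp(resume, 1) == 0) {
		if (do_write)
			*p = 1;		/* expected to fault */
		else
			(void)*p;	/* volatile read, expected to fault */
	} else {
		result = seen_code;
	}

	sigaction(SIGSEGV, &old, NULL);
	munmap((void *)p, getpagesize());
	return result;
}
```

On a kernel that reports the error code correctly, both fault_si_code(0) and fault_si_code(1) should return SEGV_ACCERR, since the page is mapped but inaccessible.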

[PATCH v7 00/10] add support for relative references in special sections

2018-01-02 Thread Ard Biesheuvel

This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references. This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata
for these sections in relocatable kernels (e.g., for KASLR) that need
to fix up these absolute references at boot time. On arm64, this reduces
the vmlinux footprint of such a reference by 8x (8 byte absolute reference
+ 24 byte RELA entry vs 4 byte relative reference).

Patch #3 was sent out before as a single patch. This series supersedes
the previous submission. This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which architectures
it should be blacklisted.

Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.

Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections on the order
of 1000 entries on an arm64 defconfig kernel with tracing enabled. This
means we save about 28 KB of vmlinux space for each of these patches.
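
The 8x and 28 KB figures follow from simple arithmetic, spelled out below as a small C sketch. The function and constant names are made up for the example; the 24 byte figure is the size of an ELF64 RELA entry (three 8 byte fields), and 1000 is just the rough entry count quoted above.

```c
#include <stddef.h>

enum {
	ABS_REF_SIZE = 8,	/* one 64-bit absolute pointer */
	RELA_SIZE    = 24,	/* Elf64_Rela: r_offset, r_info, r_addend */
	REL_REF_SIZE = 4,	/* one 32-bit place-relative offset */
};

/* Bytes needed when each entry is an absolute pointer plus its RELA entry. */
size_t absolute_footprint(size_t entries)
{
	return entries * (ABS_REF_SIZE + RELA_SIZE);
}

/* Bytes needed when each entry is a 32-bit relative reference. */
size_t relative_footprint(size_t entries)
{
	return entries * REL_REF_SIZE;
}

size_t bytes_saved(size_t entries)
{
	return absolute_footprint(entries) - relative_footprint(entries);
}
```

Per entry this is 32 bytes versus 4 bytes, and bytes_saved(1000) comes to 28000 bytes, which is where the "about 28 KB" per section comes from.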

Patches #7 - #10 have been added in v5, and implement relative references
in jump tables for arm64 and x86. On arm64, this results in significant
space savings (650+ KB on a typical distro kernel). On x86, the savings
are not as impressive, but still worthwhile. (Note that these patches
do not rely on CONFIG_HAVE_ARCH_PREL32_RELOCATIONS, given that the
inline asm that is emitted is already per-arch.)

For the arm64 kernel, all patches combined reduce the memory footprint of
vmlinux by about 1.3 MB (using a config copied from Ubuntu that has KASLR
enabled), of which ~1 MB is the size reduction of the RELA section in .init,
and the remaining 300 KB is reduction of .text/.data.

Branch:
git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git relative-special-sections-v7

Changes since v6:
- drop S390 from patch #1 introducing HAVE_ARCH_PREL32_RELOCATIONS: kbuild
  robot threw me some s390 curveballs, and given that s390 does not define
  CONFIG_RELOCATABLE in the first place, it does not benefit as much from
  relative references as arm64, x86 and power do
- add patch to allow symbol exports to be disabled at compilation unit
  granularity (#2)
- get rid of arm64 vmlinux.lds.S hunk to ensure code generated by __ADDRESSABLE
  gets discarded from the EFI stub - it is no longer needed after adding #2 (#1)
- change _ADDRESSABLE() to emit a data reference, not a code reference - this
  is another simplification made possible by patch #2 (#3)
- add Steven's ack to #6
- split x86 jump_label patch into two (#9, #10)

Changes since v5:
- add missing jump_label prototypes to s390 jump_label.h (#6)
- fix inverted condition in call to jump_entry_is_module_init() (#6)

Changes since v4:
- add patches to convert x86 and arm64 to use relative references for jump
  tables (#6 - #8)
- rename PCI patch and add Bjorn's ack (#4)
- rebase onto v4.15-rc5

Changes since v3:
- fix module unload issue in patch #5 reported by Jessica, by reusing the
  updated routine for_each_tracepoint_range() for the quiescent check at
  module unload time; this requires this routine to be moved before
  tracepoint_module_going() in kernel/tracepoint.c
- add Jessica's ack to #2
- rebase onto v4.14-rc1

Changes since v2:
- Revert my slightly misguided attempt to appease checkpatch, which resulted
  in needless churn and worse code. This v3 is based on v1 with a few tweaks
  that were actually reasonable checkpatch warnings: unnecessary braces (as
  pointed out by Ingo) and other minor whitespace misdemeanors.

Changes since v1:
- Remove checkpatch errors to the extent feasible: in some cases, this
  involves moving extern declarations into C files, and switching to
  struct definitions rather than typedefs. Some errors are impossible
  to fix: please find the remaining ones after the diffstat.
- Used 'int' instead of 'signed int' for the various offset fields: there
  is no ambiguity between architectures regarding its signedness (unlike
  'char')
- Refactor the different patches to be more uniform in the way they define
  the section entry type and accessors in the .h file, and avoid the need to
  add #ifdefs to the C code.

Cc: "H. Peter Anvin" 
Cc: Ralf Baechle 
Cc: Arnd Bergmann 
Cc: Heiko Carstens 
Cc: Kees Cook 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Thomas Garnier 
Cc: Thomas Gleixner 
Cc: "Serge E. Hallyn" 
Cc: Bjorn Helgaas 
Cc: Benjamin Herrenschmidt 
Cc: Russell King 
Cc: Paul Mackerras 
Cc: Catalin Marinas 
Cc: "David S. Miller" 
Cc: Petr Mladek 
Cc: Ingo Molnar 
Cc: James Morris 
Cc: Andrew Morton 
Cc: Nicolas Pitre 
Cc: Josh Poimboeuf 
Cc: Steven Rostedt 
Cc: Mart

[PATCH v7 01/10] arch: enable relative relocations for arm64, power and x86

2018-01-02 Thread Ard Biesheuvel

Before updating certain subsystems to use place-relative 32-bit
relocations in special sections, to save space and reduce the
number of absolute relocations that need to be processed at runtime
by relocatable kernels, introduce the Kconfig symbol and define it
for some architectures that should be able to support and benefit
from it.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Signed-off-by: Ard Biesheuvel 
---
 arch/Kconfig | 10 ++
 arch/arm64/Kconfig   |  1 +
 arch/powerpc/Kconfig |  1 +
 arch/x86/Kconfig |  1 +
 4 files changed, 13 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 400b9e1b2f27..dbc036a7bd1b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -959,4 +959,14 @@ config REFCOUNT_FULL
  against various use-after-free conditions that can be used in
  security flaw exploits.
 
+config HAVE_ARCH_PREL32_RELOCATIONS
+   bool
+   help
+ May be selected by an architecture if it supports place-relative
+ 32-bit relocations, both in the toolchain and in the module loader,
+ in which case relative references can be used in special sections
+ for PCI fixup, initcalls etc which are only half the size on 64 bit
+ architectures, and don't require runtime relocation on relocatable
+ kernels.
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..66c7b9ab2a3d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -89,6 +89,7 @@ config ARM64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
+   select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c51e6ce42e7a..e172478e2ae7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -177,6 +177,7 @@ config PPC
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
+   select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d4fc98c50378..9f2bb853aedb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -115,6 +115,7 @@ config X86
select HAVE_ARCH_MMAP_RND_BITS  if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
+   select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
-- 
2.11.0

[PATCH v7 02/10] module: allow symbol exports to be disabled

2018-01-02 Thread Ard Biesheuvel

To allow existing C code to be incorporated into the decompressor or
the UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
declarations into nops, and #define it in places where such exports
are undesirable. Note that this gets rid of a rather dodgy redefine
of linux/export.h's header guard.
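
To illustrate the mechanism outside the kernel tree, here is a self-contained sketch with invented names. DEMO_DISABLE_EXPORTS, DEMO_EXPORT_SYMBOL() and struct demo_export are hypothetical and only mimic the idea; the real macro and section layout live in include/linux/export.h. The switch is defined in the file for the demo, whereas a real build would pass it with -D.

```c
/* Defined here for the demo; normally this would come from the compiler flags. */
#define DEMO_DISABLE_EXPORTS

/* Hypothetical export record: symbol name plus its address. */
struct demo_export {
	const char *name;
	const void *addr;
};

#ifdef DEMO_DISABLE_EXPORTS
/*
 * Exports disabled: expand to a bare struct declaration, which emits no
 * object and keeps the trailing semicolon at the use site valid.
 */
#define DEMO_EXPORT_SYMBOL(sym) struct demo_export
#define DEMO_EXPORTS_ENABLED 0
#else
/*
 * Exports enabled: emit one record per symbol. Casting a function
 * address to a data pointer relies on a common GCC/Clang extension.
 */
#define DEMO_EXPORT_SYMBOL(sym)					\
	static const struct demo_export demo_export_##sym	\
	__attribute__((used)) = { #sym, (const void *)&sym }
#define DEMO_EXPORTS_ENABLED 1
#endif

/* Shared helper that both a "kernel proper" and a "stub" build could reuse. */
int demo_parse_digit(char c)
{
	return (c >= '0' && c <= '9') ? c - '0' : -1;
}
DEMO_EXPORT_SYMBOL(demo_parse_digit);
```

The source file stays identical in both contexts; only the preprocessor symbol decides whether export records are generated.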

Cc: m...@codeblueprint.co.uk
Cc: keesc...@chromium.org
Cc: j...@kernel.org
Signed-off-by: Ard Biesheuvel 
---
 arch/x86/boot/compressed/kaslr.c  | 5 +
 drivers/firmware/efi/libstub/Makefile | 3 ++-
 include/linux/export.h| 9 +
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8199a6187251..3a2a6d7049e4 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -23,11 +23,8 @@
  * _ctype[] in lib/ctype.c is needed by isspace() of linux/ctype.h.
  * While both lib/ctype.c and lib/cmdline.c will bring EXPORT_SYMBOL
  * which is meaningless and will cause compiling error in some cases.
- * So do not include linux/export.h and define EXPORT_SYMBOL(sym)
- * as empty.
  */
-#define _LINUX_EXPORT_H
-#define EXPORT_SYMBOL(sym)
+#define __DISABLE_EXPORTS
 
 #include "misc.h"
 #include "error.h"
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index adaa4a964f0c..312bd0b64a61 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)  += -I$(srctree)/scripts/dtc/libfdt
 KBUILD_CFLAGS  := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
   -D__NO_FORTIFY \
   $(call cc-option,-ffreestanding) \
-  $(call cc-option,-fno-stack-protector)
+  $(call cc-option,-fno-stack-protector) \
+  -D__DISABLE_EXPORTS
 
 GCOV_PROFILE   := n
 KASAN_SANITIZE := n
diff --git a/include/linux/export.h b/include/linux/export.h
index 1a1dfdb2a5c6..6dba2fb08f77 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -83,6 +83,15 @@ extern struct module __this_module;
  */
 #define __EXPORT_SYMBOL(sym, sec)  === __KSYM_##sym ===
 
+#elif defined(__DISABLE_EXPORTS)
+
+/*
+ * Allow symbol exports to be disabled completely so that C code may
+ * be reused in other execution contexts such as the UEFI stub or the
+ * decompressor.
+ */
+#define __EXPORT_SYMBOL(sym, sec)
+
 #elif defined(CONFIG_TRIM_UNUSED_KSYMS)
 
 #include 
-- 
2.11.0

[PATCH v7 03/10] module: use relative references for __ksymtab entries

2018-01-02 Thread Ard Biesheuvel

An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab
entries, each consisting of two 64-bit fields containing absolute
references, to the symbol itself and to a char array containing
its name, respectively.

When we build the same configuration with KASLR enabled, we end
up with an additional ~192 KB of relocations in the .init section,
i.e., one 24 byte entry for each absolute reference, which all need
to be processed at boot time.

Given how the struct kernel_symbol that describes each entry is
completely local to module.c (except for the references emitted
by EXPORT_SYMBOL() itself), we can easily modify it to contain
two 32-bit relative references instead. This reduces the size of
the __ksymtab section by 50% for all 64-bit architectures, and
gets rid of the runtime relocations entirely for architectures
implementing KASLR, either via standard PIE linking (arm64) or
using custom host tools (x86).

Note that the binary search involving __ksymtab contents relies
on each section being sorted by symbol name. This is implemented
based on the input section names, not the names in the ksymtab
entries, so this patch does not interfere with that.

Given that the use of place-relative relocations requires support
both in the toolchain and in the module loader, we cannot enable
this feature for all architectures. So make it dependent on whether
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.
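
As a userspace illustration of the entry layout described above, the sketch below stores two 32-bit offsets, each relative to the address of its own field, and resolves them back to pointers. It is an assumption-laden demo and not the patch's code: struct rel_symbol, rel_store() and rel_load() are invented names, and the offsets are filled in at runtime here, whereas the kernel has the assembler and linker compute them.

```c
#include <limits.h>
#include <stdint.h>

/* Two 32-bit place-relative references: 8 bytes instead of 16. */
struct rel_symbol {
	int value_offset;
	int name_offset;
};

/* Store 'target' as an offset from the field itself; fail if out of range. */
static int rel_store(int *field, const void *target)
{
	intptr_t delta = (intptr_t)target - (intptr_t)field;

	if (delta < INT_MIN || delta > INT_MAX)
		return -1;
	*field = (int)delta;
	return 0;
}

/* Resolve a field back into a pointer: field address plus stored offset. */
static const void *rel_load(const int *field)
{
	return (const void *)((intptr_t)field + *field);
}

static int demo_value = 42;
static const char demo_name[] = "demo_value";
static struct rel_symbol demo_sym;

int demo_sym_init(void)
{
	if (rel_store(&demo_sym.value_offset, &demo_value))
		return -1;
	return rel_store(&demo_sym.name_offset, demo_name);
}

const int *demo_sym_value(void)
{
	return rel_load(&demo_sym.value_offset);
}

const char *demo_sym_name(void)
{
	return rel_load(&demo_sym.name_offset);
}
```

Because each offset is relative to its own location, the entry remains valid wherever the image is loaded, which is why no runtime relocation is needed.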

Cc: Arnd Bergmann 
Cc: Andrew Morton 
Cc: Ingo Molnar 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Nicolas Pitre 
Acked-by: Jessica Yu 
Signed-off-by: Ard Biesheuvel 
---
 arch/x86/include/asm/Kbuild   |  1 +
 arch/x86/include/asm/export.h |  5 ---
 include/asm-generic/export.h  | 12 -
 include/linux/compiler.h  | 10 +
 include/linux/export.h| 46 +++-
 kernel/module.c   | 33 +++---
 6 files changed, 83 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 5d6a53fd7521..3e8a88dcaa1d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -9,5 +9,6 @@ generated-y += xen-hypercalls.h
 generic-y += clkdev.h
 generic-y += dma-contiguous.h
 generic-y += early_ioremap.h
+generic-y += export.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
diff --git a/arch/x86/include/asm/export.h b/arch/x86/include/asm/export.h
deleted file mode 100644
index 2a51d66689c5..
--- a/arch/x86/include/asm/export.h
+++ /dev/null
@@ -1,5 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifdef CONFIG_64BIT
-#define KSYM_ALIGN 16
-#endif
-#include 
diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h
index 719db1968d81..97ce606459ae 100644
--- a/include/asm-generic/export.h
+++ b/include/asm-generic/export.h
@@ -5,12 +5,10 @@
 #define KSYM_FUNC(x) x
 #endif
 #ifdef CONFIG_64BIT
-#define __put .quad
 #ifndef KSYM_ALIGN
 #define KSYM_ALIGN 8
 #endif
 #else
-#define __put .long
 #ifndef KSYM_ALIGN
 #define KSYM_ALIGN 4
 #endif
@@ -25,6 +23,16 @@
 #define KSYM(name) name
 #endif
 
+.macro __put, val, name
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+   .long   \val - ., \name - .
+#elif defined(CONFIG_64BIT)
+   .quad   \val, \name
+#else
+   .long   \val, \name
+#endif
+.endm
+
 /*
  * note on .section use: @progbits vs %progbits nastiness doesn't matter,
  * since we immediately emit into those sections anyway.
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 52e611ab9a6c..79db4aa87d75 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -327,4 +327,14 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
compiletime_assert(__native_word(t),\
"Need native word sized stores/loads for atomicity.")
 
+/*
+ * Force the compiler to emit 'sym' as a symbol, so that we can reference
+ * it from inline assembler. Necessary in case 'sym' could be inlined
+ * otherwise, or eliminated entirely due to lack of references that are
+ * visible to the compiler.
+ */
+#define __ADDRESSABLE(sym) \
+   static void * const __attribute__((section(".discard"), used))  \
+   __PASTE(__addressable_##sym, __LINE__) = (void *)&sym;
+
 #endif /* __LINUX_COMPILER_H */
diff --git a/include/linux/export.h b/include/linux/export.h
index 6dba2fb08f77..4744cf4736b0 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -24,12 +24,6 @@
 #define VMLINUX_SYMBOL_STR(x) __VMLINUX_SYMBOL_STR(x)
 
 #ifndef __ASSEMBLY__
-struct kernel_symbol
-{
-   unsigned long value;
-   const char *name;
-};
-
 #ifdef MODULE
 extern struct module __this_module;
 #define THIS_MODULE (&__this_module)
@@ -60,17 +54,47 @@ extern struct module __this_module;
 #define __CRC_SYMBOL(sym, sec)
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#include 
+/*
+ * Emit the ksymtab entry as a pair of relative references: this reduces
+ * the size by half on 64-bit architect

[PATCH v7 04/10] init: allow initcall tables to be emitted using relative references

2018-01-02 Thread Ard Biesheuvel

Allow the initcall tables to be emitted using relative references that
are only half the size on 64-bit architectures and don't require fixups
at runtime on relocatable kernels.
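
The table walk this enables can be tried in userspace with the sketch below. initcall_from_entry() mirrors the helper in the patch, but everything else is an assumption made for the demo: initcall_to_entry(), demo_table and the two demo functions are invented, and the offsets are computed at runtime rather than emitted by the assembler via ".long fn - .". Converting between function pointers and integers relies on behaviour that GCC and Clang support on common platforms.

```c
#include <limits.h>
#include <stdint.h>

typedef int (*initcall_t)(void);
typedef int initcall_entry_t;	/* 32-bit offset instead of a full pointer */

/* Same arithmetic as in the patch: entry address plus stored offset. */
static initcall_t initcall_from_entry(const initcall_entry_t *entry)
{
	return (initcall_t)((intptr_t)entry + *entry);
}

/* Demo-only inverse: encode 'fn' relative to the slot that holds it. */
static int initcall_to_entry(initcall_entry_t *slot, initcall_t fn)
{
	intptr_t delta = (intptr_t)fn - (intptr_t)slot;

	if (delta < INT_MIN || delta > INT_MAX)
		return -1;
	*slot = (int)delta;
	return 0;
}

static int trace;	/* records the call order as decimal digits */

static int demo_early(void) { trace = trace * 10 + 1; return 0; }
static int demo_late(void)  { trace = trace * 10 + 2; return 0; }

static initcall_entry_t demo_table[2];

/* Fill the table, walk it like do_initcall_level() does, return the trace. */
int demo_run_initcalls(void)
{
	initcall_entry_t *fn;

	trace = 0;
	if (initcall_to_entry(&demo_table[0], demo_early) ||
	    initcall_to_entry(&demo_table[1], demo_late))
		return -1;

	for (fn = demo_table; fn < demo_table + 2; fn++)
		initcall_from_entry(fn)();

	return trace;
}
```

Note that the loop iterates over entries rather than function pointers; the only change at the call site is going through initcall_from_entry() instead of dereferencing directly.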

Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: James Morris 
Cc: "Serge E. Hallyn" 
Signed-off-by: Ard Biesheuvel 
---
 include/linux/init.h   | 44 +++-
 init/main.c| 32 +++---
 kernel/printk/printk.c |  4 +-
 security/security.c|  4 +-
 4 files changed, 53 insertions(+), 31 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index ea1b31101d9e..cef8e817e5a5 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -109,8 +109,24 @@
 typedef int (*initcall_t)(void);
 typedef void (*exitcall_t)(void);
 
-extern initcall_t __con_initcall_start[], __con_initcall_end[];
-extern initcall_t __security_initcall_start[], __security_initcall_end[];
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+typedef int initcall_entry_t;
+
+static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
+{
+   return (initcall_t)((unsigned long)entry + *entry);
+}
+#else
+typedef initcall_t initcall_entry_t;
+
+static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
+{
+   return *entry;
+}
+#endif
+
+extern initcall_entry_t __con_initcall_start[], __con_initcall_end[];
+extern initcall_entry_t __security_initcall_start[], __security_initcall_end[];
 
 /* Used for contructor calls. */
 typedef void (*ctor_fn_t)(void);
@@ -160,9 +176,20 @@ extern bool initcall_debug;
  * as KEEP() in the linker script.
  */
 
-#define __define_initcall(fn, id) \
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define ___define_initcall(fn, id, __sec)  \
+   __ADDRESSABLE(fn)   \
+   asm(".section   \"" #__sec ".init\", \"a\"  \n" \
+   "__initcall_" #fn #id ":\n" \
+   ".long "VMLINUX_SYMBOL_STR(fn) " - .\n" \
+   ".previous  \n");
+#else
+#define ___define_initcall(fn, id, __sec) \
static initcall_t __initcall_##fn##id __used \
-   __attribute__((__section__(".initcall" #id ".init"))) = fn;
+   __attribute__((__section__(#__sec ".init"))) = fn;
+#endif
+
+#define __define_initcall(fn, id) ___define_initcall(fn, id, .initcall##id)
 
 /*
  * Early initcalls run before initializing SMP.
@@ -201,13 +228,8 @@ extern bool initcall_debug;
 #define __exitcall(fn) \
static exitcall_t __exitcall_##fn __exit_call = fn
 
-#define console_initcall(fn)   \
-   static initcall_t __initcall_##fn   \
-   __used __section(.con_initcall.init) = fn
-
-#define security_initcall(fn)  \
-   static initcall_t __initcall_##fn   \
-   __used __section(.security_initcall.init) = fn
+#define console_initcall(fn)   ___define_initcall(fn,, .con_initcall)
+#define security_initcall(fn)  ___define_initcall(fn,, .security_initcall)
 
 struct obs_kernel_param {
const char *str;
diff --git a/init/main.c b/init/main.c
index a8100b954839..d81487cc126d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -848,18 +848,18 @@ int __init_or_module do_one_initcall(initcall_t fn)
 }
 
 
-extern initcall_t __initcall_start[];
-extern initcall_t __initcall0_start[];
-extern initcall_t __initcall1_start[];
-extern initcall_t __initcall2_start[];
-extern initcall_t __initcall3_start[];
-extern initcall_t __initcall4_start[];
-extern initcall_t __initcall5_start[];
-extern initcall_t __initcall6_start[];
-extern initcall_t __initcall7_start[];
-extern initcall_t __initcall_end[];
-
-static initcall_t *initcall_levels[] __initdata = {
+extern initcall_entry_t __initcall_start[];
+extern initcall_entry_t __initcall0_start[];
+extern initcall_entry_t __initcall1_start[];
+extern initcall_entry_t __initcall2_start[];
+extern initcall_entry_t __initcall3_start[];
+extern initcall_entry_t __initcall4_start[];
+extern initcall_entry_t __initcall5_start[];
+extern initcall_entry_t __initcall6_start[];
+extern initcall_entry_t __initcall7_start[];
+extern initcall_entry_t __initcall_end[];
+
+static initcall_entry_t *initcall_levels[] __initdata = {
__initcall0_start,
__initcall1_start,
__initcall2_start,
@@ -885,7 +885,7 @@ static char *initcall_level_names[] __initdata = {
 
 static void __init do_initcall_level(int level)
 {
-   initcall_t *fn;
+   initcall_entry_t *fn;
 
strcpy(initcall_command_line, saved_command_line);
parse_args(initcall_level_names[level],
@@ -895,7 +895,7 @@ static void __init do_initcall_level(int level)
   NULL, &repair_env_string);
 
for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
-   do_one_initcall(*fn);
+   do_one_initcall(i

[PATCH v7 06/10] kernel: tracepoints: add support for relative references

2018-01-02 Thread Ard Biesheuvel

To avoid the need for relocating absolute references to tracepoint
structures at boot time when running relocatable kernels (which may
take a disproportionate amount of space), add the option to emit
these tables as relative references instead.

Cc: Ingo Molnar 
Acked-by: Steven Rostedt (VMware) 
Signed-off-by: Ard Biesheuvel 
---
 include/linux/tracepoint.h | 19 ++--
 kernel/tracepoint.c| 50 +++-
 2 files changed, 42 insertions(+), 27 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index a26ffbe09e71..d02bf1a695e8 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -228,6 +228,19 @@ extern void syscall_unregfunc(void);
return static_key_false(&__tracepoint_##name.key);  \
}
 
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define __TRACEPOINT_ENTRY(name)\
+   asm("   .section \"__tracepoints_ptrs\", \"a\"   \n" \
+   "   .balign 4\n" \
+   "   .long " VMLINUX_SYMBOL_STR(__tracepoint_##name) " - .\n" \
+   "   .previous\n")
+#else
+#define __TRACEPOINT_ENTRY(name)\
+   static struct tracepoint * const __tracepoint_ptr_##name __used  \
+   __attribute__((section("__tracepoints_ptrs"))) = \
+   &__tracepoint_##name
+#endif
+
 /*
  * We have no guarantee that gcc and the linker won't up-align the tracepoint
  * structures, so we create an array of pointers that will be used for iteration
@@ -237,11 +250,9 @@ extern void syscall_unregfunc(void);
static const char __tpstrtab_##name[]\
__attribute__((section("__tracepoints_strings"))) = #name;   \
struct tracepoint __tracepoint_##name\
-   __attribute__((section("__tracepoints"))) =  \
+   __attribute__((section("__tracepoints"), used)) =\
{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\
-   static struct tracepoint * const __tracepoint_ptr_##name __used  \
-   __attribute__((section("__tracepoints_ptrs"))) = \
-   &__tracepoint_##name;
+   __TRACEPOINT_ENTRY(name);
 
 #define DEFINE_TRACE(name) \
DEFINE_TRACE_FN(name, NULL, NULL);
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 685c50ae6300..05649fef106c 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -327,6 +327,28 @@ int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data)
 }
 EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);
 
+static void for_each_tracepoint_range(struct tracepoint * const *begin,
+   struct tracepoint * const *end,
+   void (*fct)(struct tracepoint *tp, void *priv),
+   void *priv)
+{
+   if (!begin)
+   return;
+
+   if (IS_ENABLED(CONFIG_HAVE_ARCH_PREL32_RELOCATIONS)) {
+   const int *iter;
+
+   for (iter = (const int *)begin; iter < (const int *)end; iter++)
+   fct((struct tracepoint *)((unsigned long)iter + *iter),
+   priv);
+   } else {
+   struct tracepoint * const *iter;
+
+   for (iter = begin; iter < end; iter++)
+   fct(*iter, priv);
+   }
+}
+
 #ifdef CONFIG_MODULES
 bool trace_module_has_bad_taint(struct module *mod)
 {
@@ -391,15 +413,9 @@ EXPORT_SYMBOL_GPL(unregister_tracepoint_module_notifier);
  * Ensure the tracer unregistered the module's probes before the module
  * teardown is performed. Prevents leaks of probe and data pointers.
  */
-static void tp_module_going_check_quiescent(struct tracepoint * const *begin,
-   struct tracepoint * const *end)
+static void tp_module_going_check_quiescent(struct tracepoint *tp, void *priv)
 {
-   struct tracepoint * const *iter;
-
-   if (!begin)
-   return;
-   for (iter = begin; iter < end; iter++)
-   WARN_ON_ONCE((*iter)->funcs);
+   WARN_ON_ONCE(tp->funcs);
 }
 
 static int tracepoint_module_coming(struct module *mod)
@@ -450,8 +466,9 @@ static void tracepoint_module_going(struct module *mod)
 * Called the going notifier before checking for
 * quiescence.
 */
-   tp_module_going_check_quiescent(mod->tracepoints_ptrs,
-   mod->tracepoints_ptrs + mod->num_tracepoints);
+   for_each_tracepoint_range(mod->tracepoints_ptrs,
+   mod->tracepoints_ptrs + mod->num_tracepoints,
+   tp_module_going_check_quiescent, NULL);
break;
}

[PATCH v7 07/10] kernel/jump_label: abstract jump_entry member accessors

2018-01-02 Thread Ard Biesheuvel

In preparation for allowing architectures to use relative references
in jump_label entries [which can dramatically reduce the memory
footprint], introduce abstractions for references to the 'code' and
'key' members of struct jump_entry.
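
What the accessors hide is mainly the encoding of the 'key' field, whose low bit carries the branch flag. The userspace mock below shows that encoding in isolation. It reuses the accessor names from this patch, but struct static_key, the jump_label_t typedef and demo_make_entry() are simplified stand-ins invented for the demo, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock key: any object aligned to at least 2 bytes leaves bit 0 free. */
struct static_key {
	int enabled;
};

typedef uintptr_t jump_label_t;

struct jump_entry {
	jump_label_t code;
	jump_label_t target;
	jump_label_t key;	/* pointer to static_key, bit 0 = branch flag */
};

static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
{
	return entry->code;
}

/* Mask off the flag bit to recover the key pointer. */
static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
{
	return (struct static_key *)(entry->key & ~(jump_label_t)1);
}

static inline bool jump_entry_is_branch(const struct jump_entry *entry)
{
	return entry->key & 1;
}

/* A zeroed code address marks an entry from a discarded module init section. */
static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
{
	return entry->code == 0;
}

static inline void jump_entry_set_module_init(struct jump_entry *entry)
{
	entry->code = 0;
}

static struct static_key demo_key;

/* Demo-only constructor that packs the branch flag into the key field. */
struct jump_entry demo_make_entry(jump_label_t code, bool branch)
{
	struct jump_entry e = {
		.code = code,
		.target = code + 8,
		.key = (jump_label_t)&demo_key | (branch ? 1 : 0),
	};

	return e;
}
```

With all users going through these helpers, an architecture can later change how 'code' and 'key' are stored (for instance as relative offsets) without touching the generic code.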

Signed-off-by: Ard Biesheuvel 
---
 arch/arm/include/asm/jump_label.h | 27 ++
 arch/arm64/include/asm/jump_label.h   | 27 ++
 arch/mips/include/asm/jump_label.h| 27 ++
 arch/powerpc/include/asm/jump_label.h | 27 ++
 arch/s390/include/asm/jump_label.h| 27 ++
 arch/sparc/include/asm/jump_label.h   | 27 ++
 arch/tile/include/asm/jump_label.h| 27 ++
 arch/x86/include/asm/jump_label.h | 27 ++
 kernel/jump_label.c   | 38 +---
 9 files changed, 232 insertions(+), 22 deletions(-)

diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index e12d7d096fc0..7b05b404063a 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -45,5 +45,32 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap NULL
+
 #endif  /* __ASSEMBLY__ */
 #endif
diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
index 1b5e0e843c3a..9d6e46355c89 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -62,5 +62,32 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap NULL
+
 #endif  /* __ASSEMBLY__ */
 #endif /* __ASM_JUMP_LABEL_H */
diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h
index e77672539e8e..70df9293dc49 100644
--- a/arch/mips/include/asm/jump_label.h
+++ b/arch/mips/include/asm/jump_label.h
@@ -66,5 +66,32 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap NULL
+
 #endif  /* __ASSEMBLY__ */
 #endif /* _ASM_MIPS_JUMP_LABEL_H */
diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
index 9a287e0ac8b1..412b2699c9f6 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -59,6 +59,33 @@ struct jump_entry {
jump_label_t key;
 };
 
+static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_module_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_module_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#define jump_label_swap NULL
+
 #else
 #define ARCH_STATIC_BRANCH(LABEL, KEY) \
 1098:  nop;\
diff --git a/arch/s390/include/asm/jump_lab

[PATCH v7 05/10] PCI: Add support for relative addressing in quirk tables

2018-01-02 Thread Ard Biesheuvel

Allow the PCI quirk tables to be emitted in a way that avoids absolute
references to the hook functions. This reduces the size of the entries,
and, more importantly, makes them invariant under runtime relocation
(e.g., for KASLR).
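
As a rough user-space illustration of the place-relative idea (not the kernel code itself): each entry stores a signed 32-bit distance from the offset field to the hook, and the reader adds the field's own address back to recover the pointer. The names struct demo_fixup, demo_entry, demo_set_hook(), demo_get_hook() and demo_hook() are hypothetical. The sketch assumes the entry and the function sit within 2 GiB of each other (hence the static object) and that function pointers convert to intptr_t and back, as POSIX platforms allow.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*demo_hook_fn)(int *);

/* Hypothetical stand-in for a quirk table entry with a relative hook. */
struct demo_fixup {
	uint16_t vendor;
	uint16_t device;
	int32_t hook_offset;	/* hook address minus address of this field */
};

/* Static storage keeps the entry close to the code it refers to. */
static struct demo_fixup demo_entry = { 0x1234, 0x5678, 0 };

/* Store the hook as a signed 32-bit distance from the field itself. */
static void demo_set_hook(struct demo_fixup *f, demo_hook_fn hook)
{
	intptr_t delta = (intptr_t)hook - (intptr_t)&f->hook_offset;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	f->hook_offset = (int32_t)delta;
}

/* Recover the absolute address: field address plus stored offset. */
static demo_hook_fn demo_get_hook(const struct demo_fixup *f)
{
	return (demo_hook_fn)((intptr_t)&f->hook_offset + f->hook_offset);
}

/* Example hook: counts how many times it ran. */
static void demo_hook(int *counter)
{
	++*counter;
}
```

Because the stored value is a difference of two addresses in the same image, it does not change when the whole image is moved, which is why no relocation record is needed for it.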

Acked-by: Bjorn Helgaas 
Signed-off-by: Ard Biesheuvel 
---
 drivers/pci/quirks.c | 13 ++---
 include/linux/pci.h  | 20 
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 10684b17d0bd..b6d51b4d5ce1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3556,9 +3556,16 @@ static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
 f->vendor == (u16) PCI_ANY_ID) &&
(f->device == dev->device ||
 f->device == (u16) PCI_ANY_ID)) {
-   calltime = fixup_debug_start(dev, f->hook);
-   f->hook(dev);
-   fixup_debug_report(dev, calltime, f->hook);
+   void (*hook)(struct pci_dev *dev);
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+   hook = (void *)((unsigned long)&f->hook_offset +
+   f->hook_offset);
+#else
+   hook = f->hook;
+#endif
+   calltime = fixup_debug_start(dev, hook);
+   hook(dev);
+   fixup_debug_report(dev, calltime, hook);
}
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c170c9250c8b..086c3965710b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1792,7 +1792,11 @@ struct pci_fixup {
u16 device; /* You can use PCI_ANY_ID here of course */
u32 class;  /* You can use PCI_ANY_ID here too */
unsigned int class_shift;   /* should be 0, 8, 16 */
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+   int hook_offset;
+#else
void (*hook)(struct pci_dev *dev);
+#endif
 };
 
 enum pci_fixup_pass {
@@ -1806,12 +1810,28 @@ enum pci_fixup_pass {
pci_fixup_suspend_late, /* pci_device_suspend_late() */
 };
 
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,  \
+   class_shift, hook)  \
+   __ADDRESSABLE(hook) \
+   asm(".section " #sec ", \"a\"   \n" \
+   ".balign 16 \n" \
+   ".short "   #vendor ", " #device "  \n" \
+   ".long "#class ", " #class_shift "  \n" \
+   ".long "VMLINUX_SYMBOL_STR(hook) " - .  \n" \
+   ".previous  \n");
+#define DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,\
+ class_shift, hook)\
+   __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,   \
+ class_shift, hook)
+#else
 /* Anonymous variables would be nice... */
 #define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, class, \
   class_shift, hook) \
 static const struct pci_fixup __PASTE(__pci_fixup_##name,__LINE__) __used \
 __attribute__((__section__(#section), aligned((sizeof(void *))))) \
 = { vendor, device, class, class_shift, hook };
+#endif
 
 #define DECLARE_PCI_FIXUP_CLASS_EARLY(vendor, device, class,   \
 class_shift, hook) \
-- 
2.11.0

[PATCH v7 08/10] arm64/kernel: jump_label: use relative references

2018-01-02 Thread Ard Biesheuvel

On a randomly chosen distro kernel build for arm64, vmlinux.o shows the
following sections, containing jump label entries, and the associated
RELA relocation records, respectively:

  ...
  [38088] __jump_table  PROGBITS   00e19f30
   0002ea10    WA   0 0 8
  [38089] .rela__jump_table RELA   01fd8bb0
   0008be30  0018   I  38178   38088 8
  ...

In other words, we have 190 KB worth of 'struct jump_entry' instances,
and 573 KB worth of RELA entries to relocate each entry's code, target
and key members. This means the RELA section occupies 10% of the .init
segment, and the two sections combined represent 5% of vmlinux's entire
memory footprint.
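
The figures can be cross-checked from the section sizes. This is a small sketch under the assumption that 0x2ea10 and 0x8be30 are the sh_size values of the two sections and that 0x18 is the sh_entsize of the RELA section (the listing has lost some columns, so that reading is an assumption). The constant and function names are made up for the example.

```c
/* Values read from the section listing (assumed to be sh_size/sh_entsize). */
enum {
	DEMO_JUMP_TABLE_SIZE = 0x2ea10,	/* __jump_table, about 190 KB */
	DEMO_RELA_SIZE       = 0x8be30,	/* .rela__jump_table, about 573 KB */
	DEMO_RELA_ENTSIZE    = 0x18,	/* one RELA record */
	DEMO_ABS_ENTRY_SIZE  = 3 * 8,	/* code, target, key as 64-bit absolutes */
	DEMO_REL_ENTRY_SIZE  = 3 * 4	/* code, target, key as 32-bit relatives */
};

/* Number of struct jump_entry instances in the table. */
static unsigned long demo_jump_entries(void)
{
	return DEMO_JUMP_TABLE_SIZE / DEMO_ABS_ENTRY_SIZE;
}

/* Number of RELA records needed to relocate them. */
static unsigned long demo_rela_records(void)
{
	return DEMO_RELA_SIZE / DEMO_RELA_ENTSIZE;
}

/* Size of the same table once every field is a 32-bit relative offset. */
static unsigned long demo_relative_table_size(void)
{
	return demo_jump_entries() * DEMO_REL_ENTRY_SIZE;
}
```

With these inputs there are exactly three RELA records per entry, one for each of the code, target and key members, and the relative table comes out at half the original size.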

So let's switch from 64-bit absolute references to 32-bit relative
references: this reduces the size of the __jump_table by 50%, and gets
rid of the RELA section entirely.

Note that this requires some extra care in the sorting routine, given
that the offsets change when entries are moved around in the jump_entry
table.
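
To illustrate why the sort needs a custom swap, here is a user-space sketch (hypothetical names, not the arm64 implementation): when two entries trade places, each offset has to be corrected by the distance between the two slots so that it still resolves to the same absolute address. It assumes everything referenced lies within 2 GiB of the table.

```c
#include <stdint.h>

/* Hypothetical entry with three place-relative 32-bit fields. */
struct demo_rel_entry {
	int32_t code;
	int32_t target;
	int32_t key;
};

/* Absolute address a field refers to: its own address plus its value. */
static uintptr_t demo_rel_resolve(const int32_t *field)
{
	return (uintptr_t)((intptr_t)field + *field);
}

/* Point a field at addr by storing addr minus the field's address. */
static void demo_rel_set(int32_t *field, uintptr_t addr)
{
	*field = (int32_t)((intptr_t)addr - (intptr_t)field);
}

/* Swap two entries while keeping every field's resolved address intact. */
static void demo_rel_swap(struct demo_rel_entry *a, struct demo_rel_entry *b)
{
	int32_t delta = (int32_t)((intptr_t)a - (intptr_t)b);
	struct demo_rel_entry tmp = *a;

	/* b's contents move to a: the field address grows by delta. */
	a->code   = b->code - delta;
	a->target = b->target - delta;
	a->key    = b->key - delta;

	/* a's old contents move to b: the field address shrinks by delta. */
	b->code   = tmp.code + delta;
	b->target = tmp.target + delta;
	b->key    = tmp.key + delta;
}
```

A plain memcpy-style swap would leave the offsets unchanged and therefore make every field point delta bytes away from where it should.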

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/jump_label.h | 27 
 arch/arm64/kernel/jump_label.c  | 22 +---
 2 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
index 9d6e46355c89..8f82adeb7b0b 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -30,8 +30,8 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 {
asm goto("1: nop\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".align 3\n\t"
-".quad 1b, %l[l_yes], %c0\n\t"
+".align 2\n\t"
+".long 1b - ., %l[l_yes] - ., %c0 - .\n\t"
 ".popsection\n\t"
 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
 
@@ -44,8 +44,8 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 {
asm goto("1: b %l[l_yes]\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".align 3\n\t"
-".quad 1b, %l[l_yes], %c0\n\t"
+".align 2\n\t"
+".long 1b - ., %l[l_yes] - ., %c0 - .\n\t"
 ".popsection\n\t"
 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
 
@@ -57,19 +57,26 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 typedef u64 jump_label_t;
 
 struct jump_entry {
-   jump_label_t code;
-   jump_label_t target;
-   jump_label_t key;
+   s32 code;
+   s32 target;
+   s32 key;
 };
 
 static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
 {
-   return entry->code;
+   return (unsigned long)&entry->code + entry->code;
+}
+
+static inline jump_label_t jump_entry_target(const struct jump_entry *entry)
+{
+   return (unsigned long)&entry->target + entry->target;
 }
 
 static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
 {
-   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+   unsigned long key = (unsigned long)&entry->key + entry->key;
+
+   return (struct static_key *)(key & ~1UL);
 }
 
 static inline bool jump_entry_is_branch(const struct jump_entry *entry)
@@ -87,7 +94,7 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
entry->code = 0;
 }
 
-#define jump_label_swap NULL
+void jump_label_swap(void *a, void *b, int size);
 
 #endif  /* __ASSEMBLY__ */
 #endif /* __ASM_JUMP_LABEL_H */
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
index c2dd1ad3e648..2b8e459e91f7 100644
--- a/arch/arm64/kernel/jump_label.c
+++ b/arch/arm64/kernel/jump_label.c
@@ -25,12 +25,12 @@
 void arch_jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type)
 {
-   void *addr = (void *)entry->code;
+   void *addr = (void *)jump_entry_code(entry);
u32 insn;
 
if (type == JUMP_LABEL_JMP) {
-   insn = aarch64_insn_gen_branch_imm(entry->code,
-  entry->target,
+   insn = aarch64_insn_gen_branch_imm(jump_entry_code(entry),
+  jump_entry_target(entry),
   AARCH64_INSN_BRANCH_NOLINK);
} else {
insn = aarch64_insn_gen_nop();
@@ -50,4 +50,20 @@ void arch_jump_label_transform_static(struct jump_entry *entry,
 */
 }
 
+void jump_label_swap(void *a, void *b, int size)
+{
+   long delta = (unsigned long)a - (unsigned long)b;
+   struct jump_entry *jea = a;
+   struct jump_entry *jeb = b;
+   struct jump_entry tmp = *jea;
+
+   jea->code   = jeb->code - delta;
+   jea-

[PATCH v7 09/10] x86: jump_label: switch to jump_entry accessors

2018-01-02 Thread Ard Biesheuvel

In preparation of switching x86 to use place-relative references for
the code, target and key members of struct jump_entry, replace direct
references to the struct member with invocations of the new accessors.
This will allow us to make the switch by modifying the accessors only.
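
A minimal sketch of why this indirection pays off, using hypothetical demo_ names rather than the kernel's: the caller computes the jump displacement purely through accessors, so changing the entry layout from absolute to place-relative (selected here by an illustrative DEMO_RELATIVE define) does not touch the caller at all. DEMO_NOP_SIZE stands in for the 5-byte instruction size used in the diff.

```c
#include <stdint.h>

#define DEMO_NOP_SIZE 5	/* size of the 5-byte nop/jmp being patched */

#ifdef DEMO_RELATIVE
/* Place-relative layout: each field holds target minus its own address. */
struct demo_entry { int32_t code, target, key; };

static uintptr_t demo_entry_code(const struct demo_entry *e)
{
	return (uintptr_t)((intptr_t)&e->code + e->code);
}

static uintptr_t demo_entry_target(const struct demo_entry *e)
{
	return (uintptr_t)((intptr_t)&e->target + e->target);
}
#else
/* Absolute layout: each field holds the address itself. */
struct demo_entry { uintptr_t code, target, key; };

static uintptr_t demo_entry_code(const struct demo_entry *e)
{
	return e->code;
}

static uintptr_t demo_entry_target(const struct demo_entry *e)
{
	return e->target;
}
#endif

/* Caller: identical for both layouts, since it only uses the accessors. */
static int32_t demo_jmp_offset(const struct demo_entry *e)
{
	return (int32_t)(demo_entry_target(e) -
			 (demo_entry_code(e) + DEMO_NOP_SIZE));
}
```

The displacement is measured from the end of the patched instruction, which is why the instruction size is added to the code address before subtracting.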

Signed-off-by: Ard Biesheuvel 
---
 arch/x86/kernel/jump_label.c | 43 
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index e56c95be2808..d64296092ef5 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -52,22 +52,24 @@ static void __jump_label_transform(struct jump_entry *entry,
 * Jump label is enabled for the first time.
 * So we expect a default_nop...
 */
-   if (unlikely(memcmp((void *)entry->code, default_nop, 5)
-!= 0))
-   bug_at((void *)entry->code, __LINE__);
+   if (unlikely(memcmp((void *)jump_entry_code(entry),
+   default_nop, 5) != 0))
+   bug_at((void *)jump_entry_code(entry),
+  __LINE__);
} else {
/*
 * ...otherwise expect an ideal_nop. Otherwise
 * something went horribly wrong.
 */
-   if (unlikely(memcmp((void *)entry->code, ideal_nop, 5)
-!= 0))
-   bug_at((void *)entry->code, __LINE__);
+   if (unlikely(memcmp((void *)jump_entry_code(entry),
+   ideal_nop, 5) != 0))
+   bug_at((void *)jump_entry_code(entry),
+  __LINE__);
}
 
code.jump = 0xe9;
-   code.offset = entry->target -
-   (entry->code + JUMP_LABEL_NOP_SIZE);
+   code.offset = jump_entry_target(entry) -
+ (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
} else {
/*
 * We are disabling this jump label. If it is not what
@@ -76,14 +78,18 @@ static void __jump_label_transform(struct jump_entry *entry,
 * are converting the default nop to the ideal nop.
 */
if (init) {
-   if (unlikely(memcmp((void *)entry->code, default_nop, 5) != 0))
-   bug_at((void *)entry->code, __LINE__);
+   if (unlikely(memcmp((void *)jump_entry_code(entry),
+   default_nop, 5) != 0))
+   bug_at((void *)jump_entry_code(entry),
+  __LINE__);
} else {
code.jump = 0xe9;
-   code.offset = entry->target -
-   (entry->code + JUMP_LABEL_NOP_SIZE);
-   if (unlikely(memcmp((void *)entry->code, &code, 5) != 0))
-   bug_at((void *)entry->code, __LINE__);
+   code.offset = jump_entry_target(entry) -
+   (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
+   if (unlikely(memcmp((void *)jump_entry_code(entry),
+&code, 5) != 0))
+   bug_at((void *)jump_entry_code(entry),
+  __LINE__);
}
memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
}
@@ -97,10 +103,13 @@ static void __jump_label_transform(struct jump_entry *entry,
 *
 */
if (poker)
-   (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+   (*poker)((void *)jump_entry_code(entry), &code,
+JUMP_LABEL_NOP_SIZE);
else
-   text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
-(void *)entry->code + JUMP_LABEL_NOP_SIZE);
+   text_poke_bp((void *)jump_entry_code(entry), &code,
+JUMP_LABEL_NOP_SIZE,
+(void *)jump_entry_code(entry) +
+JUMP_LABEL_NOP_SIZE);
 }
 
 void arch_jump_label_transform(struct jump_entry *entry,
-- 
2.11.0

[PATCH v7 10/10] x86/kernel: jump_table: use relative references

2018-01-02 Thread Ard Biesheuvel

Similar to the arm64 case, 64-bit x86 can benefit from using 32-bit
relative references rather than 64-bit absolute ones when emitting
struct jump_entry instances. Not only does this reduce the memory
footprint of the entries themselves by 50%, it also removes the need
for carrying relocation metadata on relocatable builds (i.e., for KASLR)
which saves a fair chunk of .init space as well (although the savings
are not as dramatic as on arm64).

Signed-off-by: Ard Biesheuvel 
---
 arch/x86/include/asm/jump_label.h | 35 
 arch/x86/kernel/jump_label.c  | 16 +
 tools/objtool/special.c   |  4 +--
 3 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 009ff2699d07..35fc2c5ec846 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -36,8 +36,8 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:"
".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
-   _ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   ".balign 4\n\t"
+   ".long 1b - ., %l[l_yes] - ., %c0 + %c1 - .\n\t"
".popsection \n\t"
: :  "i" (key), "i" (branch) : : l_yes);
 
@@ -52,8 +52,8 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
"2:\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
-   _ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   ".balign 4\n\t"
+   ".long 1b - ., %l[l_yes] - ., %c0 + %c1 - .\n\t"
".popsection \n\t"
: :  "i" (key), "i" (branch) : : l_yes);
 
@@ -69,19 +69,26 @@ typedef u32 jump_label_t;
 #endif
 
 struct jump_entry {
-   jump_label_t code;
-   jump_label_t target;
-   jump_label_t key;
+   s32 code;
+   s32 target;
+   s32 key;
 };
 
 static inline jump_label_t jump_entry_code(const struct jump_entry *entry)
 {
-   return entry->code;
+   return (unsigned long)&entry->code + entry->code;
+}
+
+static inline jump_label_t jump_entry_target(const struct jump_entry *entry)
+{
+   return (unsigned long)&entry->target + entry->target;
 }
 
 static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
 {
-   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+   unsigned long key = (unsigned long)&entry->key + entry->key;
+
+   return (struct static_key *)(key & ~1UL);
 }
 
 static inline bool jump_entry_is_branch(const struct jump_entry *entry)
@@ -99,7 +106,7 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
entry->code = 0;
 }
 
-#define jump_label_swap NULL
+void jump_label_swap(void *a, void *b, int size);
 
 #else  /* __ASSEMBLY__ */
 
@@ -114,8 +121,8 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
.byte   STATIC_KEY_INIT_NOP
.endif
.pushsection __jump_table, "aw"
-   _ASM_ALIGN
-   _ASM_PTR        .Lstatic_jump_\@, \target, \key
+   .balign 4
+   .long   .Lstatic_jump_\@ - ., \target - ., \key - .
.popsection
 .endm
 
@@ -130,8 +137,8 @@ static inline void jump_entry_set_module_init(struct jump_entry *entry)
 .Lstatic_jump_after_\@:
.endif
.pushsection __jump_table, "aw"
-   _ASM_ALIGN
-   _ASM_PTR        .Lstatic_jump_\@, \target, \key + 1
+   .balign 4
+   .long   .Lstatic_jump_\@ - ., \target - ., \key + 1 - .
.popsection
 .endm
 
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index d64296092ef5..cc5034b42335 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -149,4 +149,20 @@ __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry,
__jump_label_transform(entry, type, text_poke_early, 1);
 }
 
+void jump_label_swap(void *a, void *b, int size)
+{
+   long delta = (unsigned long)a - (unsigned long)b;
+   struct jump_entry *jea = a;
+   struct jump_entry *jeb = b;
+   struct jump_entry tmp = *jea;
+
+   jea->code   = jeb->code - delta;
+   jea->target = jeb->target - delta;
+   jea->key    = jeb->key - delta;
+
+   jeb->code   = tmp.code + delta;
+   jeb->target = tmp.target + delta;
+   jeb->key    = tmp.key + delta;
+}
+
 #endif
diff --git a/tools/objtool/special.c b/tools/objtool/special.c
index 84f001d52322..98ae55b39037 100644
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -30,9 +30,9 @@
 #define EX_ORIG_OFFSET 0
 #define EX_NEW_OFFSET  4
 
-#define JUMP

Re: [RFC PATCH 2/2] KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9

2018-01-02 Thread Suraj Jitindar Singh

On Fri, 2017-12-08 at 17:11 +1100, Paul Mackerras wrote:
> POWER9 has hardware bugs relating to transactional memory and thread
> reconfiguration (changes to hardware SMT mode).  Specifically, the
> core
> does not have enough storage to store a complete checkpoint of all
> the
> architected state for all four threads.  The DD2.2 version of POWER9
> includes hardware modifications designed to allow hypervisor software
> to implement workarounds for these problems.  This patch implements
> those workarounds in KVM code so that KVM guests see a full, working
> transactional memory implementation.
> 
> The problems center around the use of TM suspended state, where the
> CPU has a checkpointed state but execution is not transactional.  The
> workaround is to implement a "fake suspend" state, which looks to the
> guest like suspended state but the CPU does not store a checkpoint.
> In this state, any instruction that would cause a transition to
> transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
> checkpointed state (treclaim) causes a "soft patch" interrupt (vector
> 0x1500) to the hypervisor so that it can be emulated.  The trechkpt
> instruction also causes a soft patch interrupt.
> 
> On POWER9 DD2.2, we avoid returning to the guest in any state which
> would require a checkpoint to be present.  The trechkpt in the guest
> entry path which would normally create that checkpoint is replaced by
> either a transition to fake suspend state, if the guest is in suspend
> state, or a rollback to the pre-transactional state if the guest is
> in
> transactional state.  Fake suspend state is indicated by a flag in
> the
> PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only
> and
> reads back as 0.
> 
> On exit from the guest, if the guest is in fake suspend state, we
> still
> do the treclaim instruction as we would in real suspend state, in
> order
> to get into non-transactional state, but we do not save the resulting
> register state since there was no checkpoint.
> 
> Emulation of the instructions that cause a soft patch interrupt is
> handled
> in two paths.  If the guest is in real suspend mode, we call
> kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
> transitioning to transactional state.  This is called before we do
> the treclaim in the guest exit path; because we haven't done
> treclaim,
> we can get back to the guest with the transaction still active.
> If the instruction is a case that kvmhv_p9_tm_emulation_early()
> doesn't
> handle, or if the guest is in fake suspend state, then we proceed to
> do the complete guest exit path and subsequently call
> kvmhv_p9_tm_emulation() in host context with the MMU on.  This
> handles all the cases including the cases that generate program
> interrupts (illegal instruction or TM Bad Thing) and facility
> unavailable interrupts.
> 
> The emulation is reasonably straightforward and is mostly concerned
> with checking for exception conditions and updating the state of
> registers such as MSR and CR0.  The treclaim emulation takes care to
> ensure that the TEXASR register gets updated as if it were the guest
> treclaim instruction that had done failure recording, not the
> treclaim
> done in hypervisor state in the guest exit path.
> 
> Signed-off-by: Paul Mackerras 
> 

With the following patch applied on top of the TM emulation code I was
able to get at least a basic test to run on the guest on real hardware.

[snip]

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c7fe377ff6bc..adf2da6b2211 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3049,6 +3049,7 @@ BEGIN_FTR_SECTION
li  r0, PSSCR_FAKE_SUSPEND
andc    r3, r3, r0
mtspr   SPRN_PSSCR, r3
+   ld  r9, HSTATE_KVM_VCPU(r13)
b   1f
 2:
 END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
@@ -3273,8 +3274,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL)
b   9b  /* and return */
 10:    stdu    r1, -PPC_MIN_STKFRM(r1)
/* guest is in transactional state, so simulate rollback */
+   mr  r3, r4
bl  kvmhv_emulate_tm_rollback
nop
+   ld  r4, HSTATE_KVM_VCPU(r13) /* our vcpu pointer has been trashed */
addi    r1, r1, PPC_MIN_STKFRM
b   9b
 #endif

Re: [PATCH v1 00/15] ASoC: fsl_ssi: Clean up - program flow level

2018-01-02 Thread Caleb Crome

On Tue, Dec 19, 2017 at 9:00 AM, Nicolin Chen  wrote:
>
> ==Background==
> The fsl_ssi driver was designed for PPC originally and then it has
> been updated to support different modes for i.MX Series, including
> SDMA, I2S Master mode, AC97 and older i.MXs with FIQ, by different
> contributors for different use cases in different coding styles.
>
> Additionally, in order to fix/work-around hardware bugs and design
> flaws, the driver made a lot of compromises, so now its program flow
> looks very complicated and it's getting hard to maintain or update.
>
> So I am going to clean up the driver on both coding style level and
> program flow level.
>
> ==Introduction==
> This series of patches is the second set to clean up fsl_ssi driver
> in the program flow level. Any patch here may impact a fundamental
> test case like playback or record.
>
> ==Verification==
> This series of patches requires full testing. I have done such tests
> on i.MX6SoloX with WM8962 using imx_v6_v7_defconfig as:
>  - Playback via I2S Master and Slave mode
>  - Record via I2S Master and Slave mode
>  - Simultaneous playback and record via I2S Master and Slave mode
>  - Background playback with foreground record (starting at different
>time) via I2S Master and Slave mode
>  - Background record with foreground playback (starting at different
>time) via I2S Master and Slave mode
>  * All tests above by hacking offline_config to true in imx51.
>
> Example of uncovered tests: TDM, AC97, PowerPC and FIQ.
>
> Nicolin Chen (15):
>   ASoC: fsl_ssi: Clean up set_dai_tdm_slot()
>   ASoC: fsl_ssi: Maintain a mask of active streams
>   ASoC: fsl_ssi: Rename fsl_ssi_disable_val macro
>   ASoC: fsl_ssi: Clear FIFO directly in fsl_ssi_config()
>   ASoC: fsl_ssi: Clean up helper functions of trigger()
>   ASoC: fsl_ssi: Add DAIFMT define for AC97
>   ASoC: fsl_ssi: Clean up fsl_ssi_setup_regvals()
>   ASoC: fsl_ssi: Set xFEN0 and xFEN1 together
>   ASoC: fsl_ssi: Use snd_soc_init_dma_data instead
>   ASoC: fsl_ssi: Move one-time configurations to dai_probe()
>   ASoC: fsl_ssi: Setup AC97 in dai_probe()
>   ASoC: fsl_ssi: Clean up _fsl_ssi_set_dai_fmt()
>   ASoC: fsl_ssi: Remove cpu_dai_drv from fsl_ssi structure
>   ASoC: fsl_ssi: Move DT related code to a separate probe()
>   ASoC: fsl_ssi: Use ssi->streams instead of reading register
>
>  sound/soc/fsl/fsl_ssi.c | 710 
> 
>  1 file changed, 348 insertions(+), 362 deletions(-)
>
> --
> 2.7.4
>

Tested this patch set on MX6 SSI against broonie for-next (4.15-rc5),
no problems.
Do I send a separate Tested-by for each patch, or just the 00/15 one?



Tested-by: Caleb Crome

Re: [PATCH v7 02/10] module: allow symbol exports to be disabled

2018-01-02 Thread Nicolas Pitre

On Tue, 2 Jan 2018, Ard Biesheuvel wrote:

> To allow existing C code to be incorporated into the decompressor or
> the UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
> declarations into nops, and #define it in places where such exports
> are undesirable. Note that this gets rid of a rather dodgy redefine
> of linux/export.h's header guard.
[...]

> --- a/include/linux/export.h
> +++ b/include/linux/export.h
> @@ -83,6 +83,15 @@ extern struct module __this_module;
>   */
>  #define __EXPORT_SYMBOL(sym, sec)=== __KSYM_##sym ===
>  
> +#elif defined(__DISABLE_EXPORTS)
> +
> +/*
> + * Allow symbol exports to be disabled completely so that C code may
> + * be reused in other execution contexts such as the UEFI stub or the
> + * decompressor.
> + */
> +#define __EXPORT_SYMBOL(sym, sec)
> +

I think you should rather put this first thing in the #if sequence so as to
override the defined(__KSYM_DEPS__) case too.  No need to create build 
dependencies for module symbols that you're going to stub out 
afterwards anyway.
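
To make the ordering point concrete, here is a tiny preprocessor sketch. The DEMO_ macros are hypothetical stand-ins for __DISABLE_EXPORTS and __KSYM_DEPS__, and the sketch does not reproduce the real layout of linux/export.h. It only shows that when both conditions hold, whichever test comes first in the #if chain wins.

```c
/* Hypothetical stand-ins: pretend both conditions are set for this build. */
#define DEMO_DISABLE_EXPORTS
#define DEMO_KSYM_DEPS

enum demo_export_kind {
	DEMO_KIND_DISABLED,	/* export expands to nothing */
	DEMO_KIND_KSYM_DEPS,	/* export only emits a dependency marker */
	DEMO_KIND_NORMAL	/* export emits a real symbol table entry */
};

#if defined(DEMO_DISABLE_EXPORTS)
/* Checked first, so it overrides the dependency-scanning case too. */
#define DEMO_EXPORT_KIND DEMO_KIND_DISABLED
#elif defined(DEMO_KSYM_DEPS)
#define DEMO_EXPORT_KIND DEMO_KIND_KSYM_DEPS
#else
#define DEMO_EXPORT_KIND DEMO_KIND_NORMAL
#endif

/* Report which branch of the #if chain was selected. */
static enum demo_export_kind demo_export_kind(void)
{
	return DEMO_EXPORT_KIND;
}
```

If the first two branches were swapped, the same build would select the dependency case instead, which is the situation the suggestion is trying to avoid.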


Nicolas

Re: [PATCH v7 02/10] module: allow symbol exports to be disabled

2018-01-02 Thread Ard Biesheuvel

On 2 January 2018 at 23:47, Nicolas Pitre  wrote:
> On Tue, 2 Jan 2018, Ard Biesheuvel wrote:
>
>> To allow existing C code to be incorporated into the decompressor or
>> the UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
>> declarations into nops, and #define it in places where such exports
>> are undesirable. Note that this gets rid of a rather dodgy redefine
>> of linux/export.h's header guard.
> [...]
>
>> --- a/include/linux/export.h
>> +++ b/include/linux/export.h
>> @@ -83,6 +83,15 @@ extern struct module __this_module;
>>   */
>>  #define __EXPORT_SYMBOL(sym, sec)=== __KSYM_##sym ===
>>
>> +#elif defined(__DISABLE_EXPORTS)
>> +
>> +/*
>> + * Allow symbol exports to be disabled completely so that C code may
>> + * be reused in other execution contexts such as the UEFI stub or the
>> + * decompressor.
>> + */
>> +#define __EXPORT_SYMBOL(sym, sec)
>> +
>
> I think you should rather put this first thing in the #if sequence so as to
> override the defined(__KSYM_DEPS__) case too.  No need to create build
> dependencies for module symbols that you're going to stub out
> afterwards anyway.
>

I wasn't sure, so thanks for clearing that up.

Re: [PATCH 2/2] powerpc/pseries,ps3: panic flush kernel messages before halting system

2018-01-02 Thread David Gibson

On Sun, Dec 24, 2017 at 02:49:23AM +1000, Nicholas Piggin wrote:
> Platforms with a panic handler that halts the system can have problems
> getting kernel messages out, because the panic notifiers are called
> before kernel/panic.c does its flushing of printk buffers and console
> etc.
> 
> This was attempted to be solved with commit a3b2cb30f252 ("powerpc: Do
> not call ppc_md.panic in fadump panic notifier"), but that wasn't the
> right approach and caused other problems, and was reverted by commit
> ab9dbf771ff9.
> 
> Instead, the powernv shutdown paths have already had a similar
> problem, fixed by taking the message flushing sequence from
> kernel/panic.c. That's a little bit ugly, but while we have the code
> duplicated, it will work for this case as well. So have ppc panic
> handlers do the same flushing before they terminate.
> 
> Without this patch, a qemu pseries_le_defconfig guest stops silently
> when issued the nmi command when xmon is off and no crash dumpers
> enabled. Afterwards, an oops is printed by each CPU as expected.
> 
> Fixes: ab9dbf771ff9 ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
> Signed-off-by: Nicholas Piggin 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/bug.h |  3 ++-
>  arch/powerpc/kernel/traps.c| 24 
>  arch/powerpc/platforms/powernv/opal.c  | 18 --
>  arch/powerpc/platforms/ps3/setup.c |  1 +
>  arch/powerpc/platforms/pseries/setup.c |  8 +++-
>  5 files changed, 38 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 3c04249bcf39..bca101ee1f32 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -135,7 +135,8 @@ extern void bad_page_fault(struct pt_regs *, unsigned long, int);
>  extern void _exception(int, struct pt_regs *, int, unsigned long);
>  extern void die(const char *, struct pt_regs *, long);
>  extern bool die_will_crash(void);
> -
> +extern void panic_flush_kmsg_start(void);
> +extern void panic_flush_kmsg_end(void);
>  #endif /* !__ASSEMBLY__ */
>  
>  #endif /* __KERNEL__ */
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 109989676776..37c1ea9b0642 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -38,6 +38,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -142,6 +144,28 @@ static int die_owner = -1;
>  static unsigned int die_nest_count;
>  static int die_counter;
>  
> +extern void panic_flush_kmsg_start(void)
> +{
> + /*
> +  * These are mostly taken from kernel/panic.c, but tries to do
> +  * relatively minimal work. Don't use delay functions (TB may
> +  * be broken), don't crash dump (need to set a firmware log),
> +  * don't run notifiers. We do want to get some information to
> +  * Linux console.
> +  */
> + console_verbose();
> + bust_spinlocks(1);
> +}
> +
> +extern void panic_flush_kmsg_end(void)
> +{
> + printk_safe_flush_on_panic();
> + kmsg_dump(KMSG_DUMP_PANIC);
> + bust_spinlocks(0);
> + debug_locks_off();
> + console_flush_on_panic();
> +}
> +
>  static unsigned long oops_begin(struct pt_regs *regs)
>  {
>   int cpu;
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 69b5263fc9e3..c15182765ff5 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -461,24 +461,14 @@ static int opal_recover_mce(struct pt_regs *regs,
>  
>  void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg)
>  {
> - /*
> -  * This is mostly taken from kernel/panic.c, but tries to do
> -  * relatively minimal work. Don't use delay functions (TB may
> -  * be broken), don't crash dump (need to set a firmware log),
> -  * don't run notifiers. We do want to get some information to
> -  * Linux console.
> -  */
> - console_verbose();
> - bust_spinlocks(1);
> + panic_flush_kmsg_start();
> +
>   pr_emerg("Hardware platform error: %s\n", msg);
>   if (regs)
>   show_regs(regs);
>   smp_send_stop();
> - printk_safe_flush_on_panic();
> - kmsg_dump(KMSG_DUMP_PANIC);
> - bust_spinlocks(0);
> - debug_locks_off();
> - console_flush_on_panic();
> +
> + panic_flush_kmsg_end();
>  
>   /*
>* Don't bother to shut things down because this will
> diff --git a/arch/powerpc/platforms/ps3/setup.c 
> b/arch/powerpc/platforms/ps3/setup.c
> index 6244bc849469..77a37520068d 100644
> --- a/arch/powerpc/platforms/ps3/setup.c
> +++ b/arch/powerpc/platforms/ps3/setup.c
> @@ -113,6 +113,7 @@ static void ps3_panic(char *str)
>   printk("   System does not reboot automatically.\n");
>   printk("   Please press POWER button.\n");
>   printk("\n");
> + panic_flush_kmsg_end();
>

Re: [PATCH 1/2] powerpc: System reset avoid interleaving oops using die synchronisation

2018-01-02 Thread David Gibson

On Sun, Dec 24, 2017 at 02:49:22AM +1000, Nicholas Piggin wrote:
> The die() oops path contains a serializing lock to prevent oops
> messages from being interleaved. In the case of a system reset
> initiated oops (e.g., qemu nmi command), __die was being called
> which lacks that synchronisation and oops reports could be
> interleaved across CPUs.
> 
> A recent patch 4388c9b3a6ee7 ("powerpc: Do not send system reset
> request through the oops path") changed this to __die to avoid
> the debugger() call, but there is no real harm to calling it twice
> if the first time fell through. So go back to using die() here.
> This was observed to fix the problem.
> 
> Fixes: 4388c9b3a6ee7 ("powerpc: Do not send system reset request through the 
> oops path")
> Signed-off-by: Nicholas Piggin 

Reviewed-by: David Gibson 
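
For context, here is a minimal user-space sketch of the owner-plus-nest-count
serialization that the commit message describes. It is an illustration under
assumptions, not the code in traps.c: report_owner, report_nest_count,
report_try_begin() and report_end() are hypothetical names, a C11 atomic
stands in for the kernel's die lock, and a CPU that loses the race gets
false back instead of spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the die_owner/die_nest_count pair. */
static atomic_int report_owner = -1;	/* CPU currently reporting, or -1 */
static int report_nest_count;		/* recursion depth on the owning CPU */

/* Try to become the reporting CPU; true means this CPU may print. */
static bool report_try_begin(int cpu)
{
	int expected = -1;

	if (atomic_load(&report_owner) == cpu) {
		report_nest_count++;	/* nested report on the same CPU */
		return true;
	}
	if (atomic_compare_exchange_strong(&report_owner, &expected, cpu)) {
		report_nest_count = 1;
		return true;
	}
	return false;	/* another CPU is mid-report; the kernel would spin */
}

/* Drop one nesting level; release ownership when the last one ends. */
static void report_end(int cpu)
{
	if (atomic_load(&report_owner) != cpu)
		return;
	if (--report_nest_count == 0)
		atomic_store(&report_owner, -1);
}
```

A caller that skips report_try_begin(), much as __die() skips the lock taken
by die(), gets no such exclusion, which is how reports from several CPUs can
end up interleaved.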

> ---
>  arch/powerpc/kernel/traps.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index f3eb61be0d30..109989676776 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -337,7 +337,7 @@ void system_reset_exception(struct pt_regs *regs)
>* No debugger or crash dump registered, print logs then
>* panic.
>*/
> - __die("System Reset", regs, SIGABRT);
> + die("System Reset", regs, SIGABRT);
>  
>   mdelay(2*MSEC_PER_SEC); /* Wait a little while for others to print */
>   add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH 01/13] powerpc/powernv: Introduce new PHB type for opencapi links

2018-01-02 Thread Andrew Donnellan


On 19/12/17 02:21, Frederic Barrat wrote:

The NPU was already abstracted by opal as a virtual PHB for nvlink,
but it helps to be able to differentiate between a nvlink or opencapi
PHB, as it's not completely transparent to linux. In particular, PE
assignment differs and we'll also need the information in later
patches.

So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
new type PNV_PHB_NPU_OCAPI.

Signed-off-by: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
---
  arch/powerpc/platforms/powernv/npu-dma.c  |  2 +-
  arch/powerpc/platforms/powernv/pci-ioda.c | 46 +++
  arch/powerpc/platforms/powernv/pci.c  |  4 +++
  arch/powerpc/platforms/powernv/pci.h  |  8 --
  4 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index f6cbc1a71472..c5899c107d59 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -277,7 +277,7 @@ static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
int64_t rc = 0;
phys_addr_t top = memblock_end_of_DRAM();

-   if (phb->type != PNV_PHB_NPU || !npe->pdev)
+   if (phb->type != PNV_PHB_NPU_NVLINK || !npe->pdev)
return -EINVAL;

rc = pnv_npu_unset_window(npe, 0);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 74903064..c37b5d288f9c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -54,7 +54,8 @@
  #define POWERNV_IOMMU_DEFAULT_LEVELS  1
  #define POWERNV_IOMMU_MAX_LEVELS  5

-static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU" };
+static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU_NVLINK",
+ "NPU_OCAPI" };
  static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);

  void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
@@ -924,7 +925,7 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, 
struct pnv_ioda_pe *pe)
 * Configure PELTV. NPUs don't have a PELTV table so skip
 * configuration on them.
 */
-   if (phb->type != PNV_PHB_NPU)
+   if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != PNV_PHB_NPU_OCAPI)
pnv_ioda_set_peltv(phb, pe, true);

/* Setup reverse map */
@@ -1260,12 +1261,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct 
pci_dev *npu_pdev)
return pe;
  }

-static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
+static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus,
+   struct pnv_ioda_pe *fn(struct pci_dev *npu_pdev))
  {
struct pci_dev *pdev;

list_for_each_entry(pdev, &bus->devices, bus_list)
-   pnv_ioda_setup_npu_PE(pdev);
+   fn(pdev);
  }


I think adding a function pointer here is rather ugly; at this point you
might as well just do this directly in pnv_pci_ioda_setup_PEs().
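
Roughly what I have in mind, as a self-contained sketch rather than a
replacement hunk. Everything here is a hypothetical stand-in (mock_dev,
mock_phb, setup_npu_pe(), setup_dev_pe(), setup_pes()); the real code would
walk bus->devices with list_for_each_entry() and call the existing PE setup
helpers directly from the per-type branch.

```c
#include <stddef.h>

/* Hypothetical mirror of the PHB types after the rename in this patch. */
enum mock_phb_type { PHB_IODA1, PHB_IODA2, PHB_NPU_NVLINK, PHB_NPU_OCAPI };

/* pe_kind records which helper configured the device: 0 none, 1 NPU, 2 dev. */
struct mock_dev {
	int pe_kind;
};

struct mock_phb {
	enum mock_phb_type type;
	struct mock_dev *devs;
	size_t ndevs;
};

static void setup_npu_pe(struct mock_dev *d) { d->pe_kind = 1; }
static void setup_dev_pe(struct mock_dev *d) { d->pe_kind = 2; }

/* Each PHB type iterates its own devices; no callback is passed around. */
static void setup_pes(struct mock_phb *phb)
{
	size_t i;

	switch (phb->type) {
	case PHB_NPU_NVLINK:
		for (i = 0; i < phb->ndevs; i++)
			setup_npu_pe(&phb->devs[i]);
		break;
	case PHB_NPU_OCAPI:
		for (i = 0; i < phb->ndevs; i++)
			setup_dev_pe(&phb->devs[i]);
		break;
	default:
		break;	/* IODA PHBs are handled elsewhere */
	}
}
```

The loop is duplicated, but each branch then reads top to bottom without
having to chase which function was passed in.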




  static void pnv_pci_ioda_setup_PEs(void)
@@ -1275,13 +1277,18 @@ static void pnv_pci_ioda_setup_PEs(void)

list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
phb = hose->private_data;
-   if (phb->type == PNV_PHB_NPU) {
+   if (phb->type == PNV_PHB_NPU_NVLINK) {
/* PE#0 is needed for error reporting */
pnv_ioda_reserve_pe(phb, 0);
-   pnv_ioda_setup_npu_PEs(hose->bus);
+   pnv_ioda_setup_npu_PEs(hose->bus,
+   pnv_ioda_setup_npu_PE);
if (phb->model == PNV_PHB_MODEL_NPU2)
pnv_npu2_init(phb);
}
+   if (phb->type == PNV_PHB_NPU_OCAPI) {
+   pnv_ioda_setup_npu_PEs(hose->bus,
+   pnv_ioda_setup_dev_PE);
+   }
}
  }



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

Re: [PATCH 10/13] ocxl: Add Makefile and Kconfig

2018-01-02 Thread Andrew Donnellan


On 19/12/17 02:21, Frederic Barrat wrote:

OCXL_BASE triggers the platform support needed by the driver.

Signed-off-by: Frederic Barrat 
---
  drivers/misc/Kconfig   |  1 +
  drivers/misc/Makefile  |  1 +
  drivers/misc/ocxl/Kconfig  | 25 +
  drivers/misc/ocxl/Makefile | 10 ++
  4 files changed, 37 insertions(+)
  create mode 100644 drivers/misc/ocxl/Kconfig
  create mode 100644 drivers/misc/ocxl/Makefile

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f1a5c2357b14..0534f338c84a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
  source "drivers/misc/genwqe/Kconfig"
  source "drivers/misc/echo/Kconfig"
  source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/ocxl/Kconfig"
  endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5ca5f64df478..73326d54e246 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_CXL_BASE)+= cxl/
  obj-$(CONFIG_ASPEED_LPC_CTRL) += aspeed-lpc-ctrl.o
  obj-$(CONFIG_ASPEED_LPC_SNOOP)+= aspeed-lpc-snoop.o
  obj-$(CONFIG_PCI_ENDPOINT_TEST)   += pci_endpoint_test.o
+obj-$(CONFIG_OCXL) += ocxl/

  lkdtm-$(CONFIG_LKDTM) += lkdtm_core.o
  lkdtm-$(CONFIG_LKDTM) += lkdtm_bugs.o
diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
new file mode 100644
index ..4496b61f48db
--- /dev/null
+++ b/drivers/misc/ocxl/Kconfig
@@ -0,0 +1,25 @@
+#
+# Open Coherent Accelerator (OCXL) compatible devices
+#
+
+config OCXL_BASE
+   bool
+   default n
+   select PPC_COPRO_BASE
+
+config OCXL
+   tristate "Support for Open Coherent Accelerators (OCXL)"
+   depends on PPC_POWERNV && PCI && EEH
+   select OCXL_BASE
+   default m
+   help
+
+ Select this option to enable driver support for Open
+ Coherent Accelerators (OCXL).  OCXL is otherwise known as
+ Open Coherent Accelerator Processor Interface (OCAPI).
+ OCAPI allows accelerators in FPGAs to be coherently attached
+ to a CPU through a Open CAPI link.  This driver enables
+ userspace programs to access these accelerators through
+ devices found in /dev/ocxl/


I'd prefer more consistency in how we refer to OpenCAPI. "ocxl" is a 
driver name that we have purely for historical reasons, it's not really 
the name of anything else. I know throughout the various specs and code, 
we use "OCAPI" a lot, but that's not really an abbreviation that should 
be "user-facing".


Something like:

config OCXL
 tristate "OpenCAPI coherent accelerator support"
 help

   Select this option to enable the ocxl driver for Open Coherent
   Accelerator Processor Interface (OpenCAPI) devices.

   OpenCAPI allows FPGA and ASIC accelerators to be coherently
   attached to a CPU over an OpenCAPI link.

   The ocxl driver enables userspace programs to access these
   accelerators through devices in /dev/ocxl/.

   For more information, see http://opencapi.org.

   If unsure, say N.


+
+ If unsure, say N.
diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
new file mode 100644
index ..f75853411cfd
--- /dev/null
+++ b/drivers/misc/ocxl/Makefile
@@ -0,0 +1,10 @@
+ccflags-$(CONFIG_PPC_WERROR)   += -Werror
+
+ocxl-y += main.o pci.o config.o file.o pasid.o
+ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
+obj-$(CONFIG_OCXL) += ocxl.o
+
+# For tracepoints to include our trace.h from tracepoint infrastructure:
+CFLAGS_trace.o := -I$(src)
+
+# ccflags-y += -DDEBUG



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

Re: [PATCH] selftests/powerpc: Add a test of SEGV error behaviour

2018-01-02 Thread Michael Ellerman

John Sperbeck  writes:

> On Tue, Jan 2, 2018 at 3:03 AM, Michael Ellerman  wrote:
>> Add a test case of the error code reported when we take a SEGV on a
>> mapped but inaccessible area. We broke this recently.
>>
>> Based on a test case from John Sperbeck .
>>
>> Signed-off-by: Michael Ellerman 
...
>
> Looks good to me.
>
> Acked-by: John Sperbeck 

Thanks.

cheers

Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Michael Ellerman

Geert Uytterhoeven  writes:

> On Tue, Jan 2, 2018 at 10:45 AM, Michael Ellerman  wrote:
>> Christoph Hellwig  writes:
>>
>>> We want to use the dma_direct_ namespace for a generic implementation,
>>> so rename powerpc to the second best choice: dma_nommu_.
>>
>> I'm not a fan of "nommu". Some of the users of direct ops *are* using an
>> IOMMU, they're just setting up a 1:1 mapping once at init time, rather
>> than mapping dynamically.
>>
>> Though I don't have a good idea for a better name, maybe "1to1",
>> "linear", "premapped" ?
>
> "identity"?

I think that would be wrong, but thanks for trying to help :)

The address on the device side is sometimes (often?) offset from the CPU
address. So eg. the device can DMA to RAM address 0x0 using address
0x800.

Identity would imply 0 == 0 etc.

I think "bijective" is the correct term, but that's probably a bit
esoteric.
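
To make the distinction concrete, here is a minimal sketch of a fixed-offset
mapping, assuming a single constant offset per device. The struct and helper
names (linear_dma_map, linear_phys_to_dma(), linear_dma_to_phys()) are
hypothetical and are not the powerpc or generic DMA API; the 0x800 offset is
just the example value from above.

```c
#include <stdint.h>

/* Hypothetical per-device state: one constant offset set up at init time. */
struct linear_dma_map {
	uint64_t dma_offset;
};

/* CPU physical address -> address the device puts on the bus. */
static uint64_t linear_phys_to_dma(const struct linear_dma_map *m,
				   uint64_t phys)
{
	return phys + m->dma_offset;
}

/* Inverse direction; the mapping is one-to-one, so nothing is lost. */
static uint64_t linear_dma_to_phys(const struct linear_dma_map *m,
				   uint64_t dma)
{
	return dma - m->dma_offset;
}
```

With dma_offset == 0 this degenerates to the identity mapping; with any other
offset it is still invertible, just not identity.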

cheers

Re: [PATCH v3] powerpc/64s: Improve local TLB flush for boot and MCE on POWER9

2018-01-02 Thread Aneesh Kumar K.V

Nicholas Piggin  writes:

> There are several cases outside the normal address space management
> where a CPU's entire local TLB is to be flushed:
>
>   1. Booting the kernel, in case something has left stale entries in
>  the TLB (e.g., kexec).
>
>   2. Machine check, to clean corrupted TLB entries.
>
> One other place where the TLB is flushed, is waking from deep idle
> states. The flush is a side-effect of calling ->cpu_restore with the
> intention of re-setting various SPRs. The flush itself is unnecessary
> because in the first case, the TLB should not acquire new corrupted
> TLB entries as part of sleep/wake (though they may be lost).
>
> This type of TLB flush is coded inflexibly, several times for each CPU
> type, and they have a number of problems with ISA v3.0B:
>
> - The current radix mode of the MMU is not taken into account, it is
>   always done as a hash flush. For IS=2 (LPID-matching flush from host)
>   and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
>   the R field does not match the current radix mode.
>
> - ISA v3.0B hash must flush the partition and process table caches as
>   well.
>
> - ISA v3.0B radix must flush partition and process scoped translations,
>   partition and process table caches, and also the page walk cache.
>
> So consolidate the flushing code and implement it in C and inline asm
> under the mm/ directory with the rest of the flush code. Add ISA v3.0B
> cases for radix and hash, and use the radix flush in radix environment.
>
> Provide a way for IS=2 (LPID flush) to specify the radix mode of the
> partition. Have KVM pass in the radix mode of the guest.
>
> Take out the flushes from early cputable/dt_cpu_ftrs detection hooks,
> and move them later in the boot process, after the MMU registers are set
> up and before relocation is first turned on.
>
> The TLB flush is no longer called when restoring from deep idle states.
> This was not done as a separate step because booting secondaries
> uses the same cpu_restore as idle restore, which needs the TLB flush.
>
> Signed-off-by: Nicholas Piggin 

..

> diff --git a/arch/powerpc/kvm/book3s_hv_ras.c 
> b/arch/powerpc/kvm/book3s_hv_ras.c
> index c356f9a40b24..e61066bb6725 100644
> --- a/arch/powerpc/kvm/book3s_hv_ras.c
> +++ b/arch/powerpc/kvm/book3s_hv_ras.c
> @@ -87,8 +87,7 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
>  DSISR_MC_SLB_PARITY | DSISR_MC_DERAT_MULTI);
>   }
>   if (dsisr & DSISR_MC_TLB_MULTI) {
> - if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
> - cur_cpu_spec->flush_tlb(TLB_INVAL_SCOPE_LPID);
> + tlbiel_all_lpid(vcpu->kvm->arch.radix);

Why use vcpu->kvm->arch.radix? Why not TLB_INVAL_SCOPE_LPID?


>   dsisr &= ~DSISR_MC_TLB_MULTI;
>   }
>   /* Any other errors we don't understand? */
> @@ -105,8 +104,7 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu 
> *vcpu)
>   reload_slb(vcpu);
>   break;
>   case SRR1_MC_IFETCH_TLBMULTI:
> - if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
> - cur_cpu_spec->flush_tlb(TLB_INVAL_SCOPE_LPID);
> + tlbiel_all_lpid(vcpu->kvm->arch.radix);
>   break;
>   default:
>   handled = 0;


-aneesh

Re: [PATCH 06/13] ocxl: Driver code for 'generic' opencapi devices

2018-01-02 Thread Andrew Donnellan


On 19/12/17 02:21, Frederic Barrat wrote:

Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver, any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, have translation faults handled
or allocate AFU interrupts, it should suffice.

The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.

The driver exposes the device AFUs as a char device in /dev/ocxl/

Note that the driver currently doesn't handle memory attached to the
opencapi device.

Signed-off-by: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Alastair D'Silva 


There are a bunch of sparse warnings we should look at (a few more
appear in later patches too).



---
  drivers/misc/ocxl/config.c| 718 ++
  drivers/misc/ocxl/context.c   | 237 +
  drivers/misc/ocxl/file.c  | 405 +
  drivers/misc/ocxl/link.c  | 610 
  drivers/misc/ocxl/main.c  |  40 +++
  drivers/misc/ocxl/ocxl_internal.h | 200 +++
  drivers/misc/ocxl/pasid.c | 114 ++
  drivers/misc/ocxl/pci.c   | 592 +++
  drivers/misc/ocxl/sysfs.c | 150 
  include/uapi/misc/ocxl.h  |  47 +++
  10 files changed, 3113 insertions(+)
  create mode 100644 drivers/misc/ocxl/config.c
  create mode 100644 drivers/misc/ocxl/context.c
  create mode 100644 drivers/misc/ocxl/file.c
  create mode 100644 drivers/misc/ocxl/link.c
  create mode 100644 drivers/misc/ocxl/main.c
  create mode 100644 drivers/misc/ocxl/ocxl_internal.h
  create mode 100644 drivers/misc/ocxl/pasid.c
  create mode 100644 drivers/misc/ocxl/pci.c
  create mode 100644 drivers/misc/ocxl/sysfs.c
  create mode 100644 include/uapi/misc/ocxl.h

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
new file mode 100644
index ..bb2fde5967e2
--- /dev/null
+++ b/drivers/misc/ocxl/config.c
@@ -0,0 +1,718 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include "ocxl_internal.h"
+
+#define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))
+#define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
+
+#define OCXL_DVSEC_AFU_IDX_MASK  GENMASK(5, 0)
+#define OCXL_DVSEC_ACTAG_MASKGENMASK(11, 0)
+#define OCXL_DVSEC_PASID_MASKGENMASK(19, 0)
+#define OCXL_DVSEC_PASID_LOG_MASKGENMASK(4, 0)
+
+#define OCXL_DVSEC_TEMPL_VERSION 0x0
+#define OCXL_DVSEC_TEMPL_NAME0x4
+#define OCXL_DVSEC_TEMPL_AFU_VERSION 0x1C
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL 0x20
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ  0x28
+#define OCXL_DVSEC_TEMPL_MMIO_PP 0x30
+#define OCXL_DVSEC_TEMPL_MMIO_PP_SZ  0x38
+#define OCXL_DVSEC_TEMPL_MEM_SZ  0x3C
+#define OCXL_DVSEC_TEMPL_WWID0x40
+
+#define OCXL_MAX_AFU_PER_FUNCTION 64
+#define OCXL_TEMPL_LEN0x58
+#define OCXL_TEMPL_NAME_LEN   24
+#define OCXL_CFG_TIMEOUT 3
+
+static int find_dvsec(struct pci_dev *dev, int dvsec_id)
+{
+   int vsec = 0;
+   u16 vendor, id;
+
+   while ((vsec = pci_find_next_ext_capability(dev, vsec,
+   OCXL_EXT_CAP_ID_DVSEC))) {
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+   &vendor);
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+   if (vendor == PCI_VENDOR_ID_IBM && id == dvsec_id)
+   return vsec;
+   }
+   return 0;
+}
+
+static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
+{
+   int vsec = 0;
+   u16 vendor, id;
+   u8 idx;
+
+   while ((vsec = pci_find_next_ext_capability(dev, vsec,
+   OCXL_EXT_CAP_ID_DVSEC))) {
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+   &vendor);
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+
+   if (vendor == PCI_VENDOR_ID_IBM &&
+   id == OCXL_DVSEC_AFU_CTRL_ID) {
+   pci_read_config_byte(dev,
+   vsec + OCXL_DVSEC_AFU_CTRL_AFU_IDX,
+   &idx);
+   if (idx == afu_idx)
+   return vsec;
+   }
+   }
+   return 0;
+}
+
+static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *f

Re: [PATCH 04/13] powerpc/powernv: Add platform-specific services for opencapi

2018-01-02 Thread Andrew Donnellan


On 19/12/17 02:21, Frederic Barrat wrote:

Implement a few platform-specific calls which can be used by drivers:

- provide the Transaction Layer capabilities of the host, so that the
   driver can find some common ground and configure the device and host
   appropriately.

- provide the hw interrupt to be used for translation faults raised by
   the NPU

- map/unmap some NPU mmio registers to get the fault context when the
   NPU raises an address translation fault

The rest are wrappers around the previously-introduced opal calls.


Signed-off-by: Frederic Barrat 
---
  arch/powerpc/include/asm/pnv-ocxl.h |  36 ++
  arch/powerpc/platforms/powernv/Makefile |   1 +
  arch/powerpc/platforms/powernv/ocxl.c   | 187 
  3 files changed, 224 insertions(+)
  create mode 100644 arch/powerpc/include/asm/pnv-ocxl.h
  create mode 100644 arch/powerpc/platforms/powernv/ocxl.c

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
new file mode 100644
index ..b9ab3f0a9634
--- /dev/null
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PVN_OCXL_H
+#define _ASM_PVN_OCXL_H


I assume you meant "PNV" here.


+
+#include 
+
+#define PNV_OCXL_TL_MAX_TEMPLATE63
+#define PNV_OCXL_TL_BITS_PER_RATE   4
+#define PNV_OCXL_TL_RATE_BUF_SIZE   ((PNV_OCXL_TL_MAX_TEMPLATE+1) * 
PNV_OCXL_TL_BITS_PER_RATE / 8)
+
+extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
+   char *rate_buf, int rate_buf_size);
+extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
+   uint64_t rate_buf_phys, int rate_buf_size);
+
+extern int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq);
+extern void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
+   void __iomem *tfc, void __iomem *pe_handle);
+extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
+   void __iomem **dar, void __iomem **tfc,
+   void __iomem **pe_handle);
+
+extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
+   void **platform_data);
+extern void pnv_ocxl_spa_release(void *platform_data);
+extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
+
+#endif /* _ASM_PVN_OCXL_H */


And here


diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 3732118a0482..6c9d5199a7e2 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
  obj-$(CONFIG_PPC_MEMTRACE)+= memtrace.o
  obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o
  obj-$(CONFIG_PPC_FTW) += nx-ftw.o
+obj-$(CONFIG_OCXL_BASE)+= ocxl.o
diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
new file mode 100644
index ..3378b75cf5e5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/ocxl.c
+int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq)
+{
+   int rc;
+
+   rc = of_property_read_u32(dev->dev.of_node, "ibm,opal-xsl-irq", hwirq);
+   if (rc) {
+   dev_err(&dev->dev,
+   "Can't translation xsl interrupt for device\n");


Can't get?


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Geert Uytterhoeven

Hi Michael,

On Wed, Jan 3, 2018 at 7:24 AM, Michael Ellerman  wrote:
> Geert Uytterhoeven  writes:
>
>> On Tue, Jan 2, 2018 at 10:45 AM, Michael Ellerman  wrote:
>>> Christoph Hellwig  writes:
>>>
>>>> We want to use the dma_direct_ namespace for a generic implementation,
>>>> so rename powerpc to the second best choice: dma_nommu_.
>>>
>>> I'm not a fan of "nommu". Some of the users of direct ops *are* using an
>>> IOMMU, they're just setting up a 1:1 mapping once at init time, rather
>>> than mapping dynamically.
>>>
>>> Though I don't have a good idea for a better name, maybe "1to1",
>>> "linear", "premapped" ?
>>
>> "identity"?
>
> I think that would be wrong, but thanks for trying to help :)
>
> The address on the device side is sometimes (often?) offset from the CPU
> address. So eg. the device can DMA to RAM address 0x0 using address
> 0x800.
>
> Identity would imply 0 == 0 etc.
>
> I think "bijective" is the correct term, but that's probably a bit
> esoteric.

OK, didn't know about the offset.
Then "linear" is what we tend to use, right?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
