Re: [PATCH V11 0/4] perf/powerpc: Add ability to sample intr machine state in powerpc

2016-03-07 Thread Anju T

Hi,

Any updates on this?

On Saturday 20 February 2016 10:32 AM, Anju T wrote:


This short patch series adds the ability to sample the interrupted
machine state for each hardware sample.

To test this patchset,
Eg:

$ perf record -I?   # list supported registers

output:
available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 
r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr orig_r3 ctr 
link xer ccr softe trap dar dsisr

  usage: perf record [<options>] [<command>]
     or: perf record [<options>] -- <command> [<options>]

    -I, --intr-regs[=<any register>]
                          sample selected machine registers on interrupt,
                          use -I ? to list register names


$ perf record -I ls   # record machine state at interrupt
$ perf script -D  # read the perf.data file

Sample output obtained with this patchset (the perf script -D output) looks as follows:

496768515470 0x1988 [0x188]: PERF_RECORD_SAMPLE(IP, 0x1): 4522/4522: 
0xc01e538c period: 1 addr: 0
... intr regs: mask 0x7ff ABI 64-bit
 r0    0xc01e5e34
 r1    0xc00fe733f9a0
 r2    0xc1523100
 r3    0xc00ffaadeb60
 r4    0xc3456800
 r5    0x73a9b5e000
 r6    0x1e00
 r7    0x0
 r8    0x0
 r9    0x0
 r10   0x1
 r11   0x0
 r12   0x24022822
 r13   0xcfeec180
 r14   0x0
 r15   0xc01e4be18800
 r16   0x0
 r17   0xc00ffaac5000
 r18   0xc00fe733f8a0
 r19   0xc1523100
 r20   0xc009fd1c
 r21   0xc00fcaa69000
 r22   0xc01e4968
 r23   0xc1523100
 r24   0xc00fe733f850
 r25   0xc00fcaa69000
 r26   0xc3b8fcf0
 r27   0xfead
 r28   0x0
 r29   0xc00fcaa69000
 r30   0x1
 r31   0x0
 nip   0xc01dd320
 msr   0x90009032
 orig_r3 0xc01e538c
 ctr   0xc009d550
 link  0xc01e5e34
 xer   0x0
 ccr   0x84022882
 softe 0x0
 trap  0xf01
 dar   0x0
 dsisr 0xf0004006004
  ... thread: :4522:4522
  .. dso: /root/.debug/.build-id/b0/ef11b1a1629e62ac9de75199117ee5ef9469e9
:4522  4522   496.768515:  1 cycles:  c01e538c 
.perf_event_context_sched_in (/boot/vmlinux)



Changes from v10:

- Included SOFTE as suggested by mpe
- The register names displayed are changed from gpr* to r*, and the macro
   names from PERF_REG_POWERPC_GPR* to PERF_REG_POWERPC_R*.
- The conflict in returning the ABI is resolved.
- #define PERF_REG_SP is changed back to PERF_REG_POWERPC_R1.
- Comment in tools/perf/config/Makefile is updated.
- Removed the "Reviewed-By" tag as the patch has logic changes.


Changes from V9:

- Changed the name displayed for link register from "lnk" to "link" in
   tools/perf/arch/powerpc/include/perf_regs.h

changes from V8:

- Corrected the indentation issue in the Makefile mentioned in 3rd patch

Changes from V7:

- Addressed the new line issue in 3rd patch.

Changes from V6:

- Corrected the typo in the patch "tools/perf: Map the ID values with register
   names", i.e. #define PERF_REG_SP PERF_REG_POWERPC_R1 should be
   #define PERF_REG_SP PERF_REG_POWERPC_GPR1


Changes from V5:

- Enabled perf_sample_regs_user also in this patch set. Functions added in
   arch/powerpc/perf/perf_regs.c
- Added Maddy's patch to this patchset, enabling the -I? option which lists
   the supported register names.


Changes from V4:

- Removed the softe and MQ from all patches
- Switch case is replaced with an array in the 3rd patch

Changes from V3:

- Addressed the comments by Sukadev regarding the nits in the descriptions.
- Modified the subject of first patch.
- Included the sample output in the 3rd patch also.

Changes from V2:

- tools/perf/config/Makefile is moved to the patch tools/perf.
- The patchset is reordered.
- The perf_regs_load() function is used for the dwarf unwind test. Since it is
   not required here, it is removed from
   tools/perf/arch/powerpc/include/perf_regs.h
- PERF_REGS_POWERPC_RESULT is removed.

Changes from V1:

- Solved the name mismatch issue in the From and Signed-off-by fields of the
   patch series.
- Added necessary comments in the 3rd patch, i.e. perf/powerpc, as suggested by
   Maddy.



Anju T (3):
   perf/powerpc: assign an id to each powerpc register
   perf/powerpc: add support for sampling intr machine state
   tools/perf: Map the ID values with register names

Madhavan Srinivasan (1):
   tool/perf: Add sample_reg_mask to include all perf_regs regs


  arch/powerpc/Kconfig|  1 +
  arch/powerpc/include/uapi/asm/perf_regs.h   | 50 
  arch/powerpc/perf/Makefile  |  1 +
  arch/powerpc/perf/perf_regs.c   | 91 +
  tools/perf/arch/powerpc/include/perf_regs.h | 69 ++
  tools/perf/arch/powerpc/util/Build  |  1 +
  tools/perf/arch/powerpc/util/perf_regs.c| 49 
  tools/per

Re: [PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of TCE table

2016-03-07 Thread Alexey Kardashevskiy

On 03/07/2016 05:25 PM, David Gibson wrote:

On Mon, Mar 07, 2016 at 02:41:14PM +1100, Alexey Kardashevskiy wrote:

The existing in-kernel TCE table for emulated devices contains
guest physical addresses which are accessed by emulated devices.
Since we need to keep this information for VFIO devices too
in order to implement H_GET_TCE, we are reusing it.

This adds an IOMMU group list to kvmppc_spapr_tce_table. Each group
entry will have an iommu_table pointer.

This adds kvm_spapr_tce_attach_iommu_group() helper and its detach
counterpart to manage the lists.

A group reference is put when:
- the guest copy of the TCE table is destroyed, i.e. when the TCE table fd is
closed;
- kvm_spapr_tce_detach_iommu_group() is called from the KVM_DEV_VFIO_GROUP_DEL
ioctl handler in the vfio-pci hot-unplug case (added in the following patch).

Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/include/asm/kvm_host.h |   8 +++
  arch/powerpc/include/asm/kvm_ppc.h  |   6 ++
  arch/powerpc/kvm/book3s_64_vio.c| 108 
  3 files changed, 122 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2e7c791..2c5c823 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -178,6 +178,13 @@ struct kvmppc_pginfo {
atomic_t refcnt;
  };

+struct kvmppc_spapr_tce_group {
+   struct list_head next;
+   struct rcu_head rcu;
+   struct iommu_group *refgrp;/* for reference counting only */
+   struct iommu_table *tbl;
+};
+
  struct kvmppc_spapr_tce_table {
struct list_head list;
struct kvm *kvm;
@@ -186,6 +193,7 @@ struct kvmppc_spapr_tce_table {
u32 page_shift;
u64 offset; /* in pages */
u64 size;   /* window size in pages */
+   struct list_head groups;
struct page *pages[0];
  };

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2544eda..d1482dc 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -164,6 +164,12 @@ extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot, unsigned long porder);
  extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);

+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
+   unsigned long liobn,
+   phys_addr_t start_addr,
+   struct iommu_group *grp);
+extern void kvm_spapr_tce_detach_iommu_group(struct kvm *kvm,
+   struct iommu_group *grp);
  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce_64 *args);
  extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 2c2d103..846d16d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 

  #include 
  #include 
@@ -95,10 +96,18 @@ static void release_spapr_tce_table(struct rcu_head *head)
struct kvmppc_spapr_tce_table *stt = container_of(head,
struct kvmppc_spapr_tce_table, rcu);
unsigned long i, npages = kvmppc_tce_pages(stt->size);
+   struct kvmppc_spapr_tce_group *kg;

for (i = 0; i < npages; i++)
__free_page(stt->pages[i]);

+   while (!list_empty(&stt->groups)) {
+   kg = list_first_entry(&stt->groups,
+   struct kvmppc_spapr_tce_group, next);
+   list_del(&kg->next);
+   kfree(kg);
+   }
+
kfree(stt);
  }

@@ -129,9 +138,15 @@ static int kvm_spapr_tce_mmap(struct file *file, struct 
vm_area_struct *vma)
  static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
  {
struct kvmppc_spapr_tce_table *stt = filp->private_data;
+   struct kvmppc_spapr_tce_group *kg;

list_del_rcu(&stt->list);

+   list_for_each_entry_rcu(kg, &stt->groups, next)  {
+   iommu_group_put(kg->refgrp);
+   kg->refgrp = NULL;
+   }


What's the reason for this kind of two-phase deletion? Dereffing the
group here and setting it to NULL, then actually removing it from the list above.


Well, this way I have only one RCU-delayed release_spapr_tce_table(). The 
other option would be to call for each @kg:

- list_del(&kg->next);
- call_rcu()

as release_spapr_tce_table() won't be able to delete them - they are not in 
the list anymore.


I suppose I can reuse kvm_spapr_tce_put_group(), this looks inaccurate...
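
For reference, a rough kernel-style sketch of that second option (per-entry
deferred freeing). The helper names here are made up for illustration and this
is not taken from the actual patch; it only assumes the struct layout quoted
above:

static void tce_group_free_rcu(struct rcu_head *head)
{
        struct kvmppc_spapr_tce_group *kg =
                container_of(head, struct kvmppc_spapr_tce_group, rcu);

        kfree(kg);
}

static void detach_all_groups(struct kvmppc_spapr_tce_table *stt)
{
        struct kvmppc_spapr_tce_group *kg, *tmp;

        list_for_each_entry_safe(kg, tmp, &stt->groups, next) {
                iommu_group_put(kg->refgrp);
                list_del_rcu(&kg->next);
                call_rcu(&kg->rcu, tce_group_free_rcu);
        }
}

With the approach in the patch, by contrast, the entries stay on the list and
are freed together once the single RCU-delayed release_spapr_tce_table() runs.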






kvm_put_kvm(stt->kvm);

kvmppc_account_memlimit(
@@ -146,6 +161,98 @@ static const struct file_operations kvm_spapr_tce_fops = {
.release= kvm_spapr_tce_release,
  };

+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *

[PATCH V2] powerpc/mm: Add validation for platform reserved memory ranges

2016-03-07 Thread Anshuman Khandual
For a partition running on PHYP, there can be an adjunct partition
which shares the virtual address range with the operating system.
Virtual address ranges which can be used by the adjunct partition
are communicated through the virtual device node of the device tree with
a property known as "ibm,reserved-virtual-addresses". This patch
introduces a new function named 'validate_reserved_va_range' which
is called during initialization to validate that these reserved
virtual address ranges do not overlap with the address ranges used
by the kernel for all supported memory contexts. This helps prevent
the possibility of getting return codes similar to H_RESOURCE for
H_PROTECT hcalls for conflicting HPTE entries.
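
To illustrate the check, here is a minimal userspace sketch of how one
reserved-VA record is reconstructed and compared against the Linux VA range.
The record values are made up and plain CPU-endian arithmetic is assumed for
simplicity; the actual patch operates on be32 device-tree data:

#include <stdint.h>
#include <stdio.h>

/* Mirrors the patch; 65 = CONTEXT_BITS + ESID_BITS + SID_SHIFT. */
#define RVA_LESS_BITS          24
#define LINUX_VA_BITS          65
#define PARTIAL_LINUX_VA_MASK  ((1ULL << (LINUX_VA_BITS - RVA_LESS_BITS)) - 1)

int main(void)
{
        /* One hypothetical "ibm,reserved-virtual-addresses" record. */
        uint32_t high_addr = 0x00000800, low_addr = 0x0, nr_pages_4K = 16;

        /* Abbreviated 64-bit VA; the full VA is this followed by 24 zero bits. */
        uint64_t abbrev = ((uint64_t)high_addr << 32) | low_addr;

        if (abbrev & ~PARTIAL_LINUX_VA_MASK)
                printf("record lies above the Linux-used VA range: no conflict\n");
        else
                printf("conflict: 0x%llx... (%u bytes) overlaps kernel VAs\n",
                       (unsigned long long)abbrev, nr_pages_4K * 4096);
        return 0;
}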

Signed-off-by: Anshuman Khandual 
---
- Tested on both POWER8 LE and BE platforms

Changes in V2:
- Added braces to the definition of LINUX_VA_BITS
- Adjusted tabs as spaces for the definition of PARTIAL_LINUX_VA_MASK

 arch/powerpc/mm/hash_utils_64.c | 77 +
 1 file changed, 77 insertions(+)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ba59d59..b47f667 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1564,3 +1564,80 @@ void setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
/* Finally limit subsequent allocations */
memblock_set_current_limit(ppc64_rma_size);
 }
+
+/*
+ * PAPR says that each reserved virtual address range record
+ * contains three be32 elements which is of total 12 bytes.
+ * First two be32 elements contain the abbreviated virtual
+ * address (high order 32 bits and low order 32 bits that
+ * generate the abbreviated virtual address of 64 bits which
+ * need to be concatenated with 24 bits of 0 at the end) and
+ * the third be32 element contains the size of the reserved
+ * virtual address range as number of consecutive 4K pages.
+ */
+struct reserved_va_record {
+   __be32  high_addr;
+   __be32  low_addr;
+   __be32  nr_pages_4K;
+};
+
+/*
+ * Linux uses 65 bits (CONTEXT_BITS + ESID_BITS + SID_SHIFT)
+ * of virtual address. As reserved virtual address comes in
+ * as an abbreviated form (64 bits) from the device tree, we
+ * will use a partial address bit mask (65 >> 24) to match it
+ * for simplicity.
+ */
+#define RVA_LESS_BITS  24
+#define LINUX_VA_BITS  (CONTEXT_BITS + ESID_BITS + SID_SHIFT)
+#define PARTIAL_LINUX_VA_MASK  ((1ULL << (LINUX_VA_BITS - RVA_LESS_BITS)) - 1)
+
+static int __init validate_reserved_va_range(void)
+{
+   struct reserved_va_record rva;
+   struct device_node *np;
+   int records, ret, i;
+   __be64 vaddr;
+
+   np = of_find_node_by_name(NULL, "vdevice");
+   if (!np)
+   return -ENODEV;
+
+   records = of_property_count_elems_of_size(np,
+   "ibm,reserved-virtual-addresses",
+   sizeof(struct reserved_va_record));
+   if (records < 0)
+   return records;
+
+   for (i = 0; i < records; i++) {
+   ret = of_property_read_u32_index(np,
+   "ibm,reserved-virtual-addresses",
+   3 * i, &rva.high_addr);
+   if (ret)
+   return ret;
+
+   ret = of_property_read_u32_index(np,
+   "ibm,reserved-virtual-addresses",
+   3 * i + 1, &rva.low_addr);
+   if (ret)
+   return ret;
+
+   ret = of_property_read_u32_index(np,
+   "ibm,reserved-virtual-addresses",
+   3 * i + 2, &rva.nr_pages_4K);
+   if (ret)
+   return ret;
+
+   vaddr =  rva.high_addr;
+   vaddr =  (vaddr << 32) | rva.low_addr;
+   if (vaddr & cpu_to_be64(~PARTIAL_LINUX_VA_MASK))
+   continue;
+
+   pr_err("RVA [0x%llx00 (0x%x in bytes)] overlapped\n",
+   vaddr, rva.nr_pages_4K * 4096);
+   BUG();
+   }
+   of_node_put(np);
+   return 0;
+}
+__initcall(validate_reserved_va_range);
-- 
2.1.0


Re: How to merge? (was Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc)

2016-03-07 Thread Michael Ellerman
On Fri, 2016-03-04 at 09:56 +0100, Jiri Kosina wrote:
> On Fri, 4 Mar 2016, Michael Ellerman wrote:
>
> > Obviously it depends heavily on the content of my series, which will go into
> > powerpc#next, so it would make sense if this went there too.
> >
> > I don't see any changes in linux-next for livepatch, so merging it via 
> > powerpc
> > would probably work fine and not cause any conflicts, unless there's some
> > livepatch changes pending for 4.6 that aren't in linux-next yet?
> >
> > The other option is that I put my ftrace changes and this in a topic branch
> > (based on v4.5-rc3), and then that can be merged into both powerpc#next and 
> > the
> > livepatch tree.
>
> This aligns with my usual workflow, so that'd be my preferred way of doing
> things; i.e. you put all the ftrace changes into a separate topic branch,
> and then
>
> - you pull that branch into powerpc#next
> - I pull that branch into livepatching tree
> - I apply the ppc livepatching support on top of that
> - I send a pull request to Linus only after powerpc#next gets merged to
>   Linus' tree
>
> Sounds good?

Yep, here it is:

  
https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/log/?h=topic/mprofile-kernel

aka:

  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/mprofile-kernel


I haven't merged it into my next yet, but I will tomorrow unless you tell me
there's something wrong with it.

cheers


[PATCH 02/14] powerpc/mm: use _PAGE_READ to indicate Read access

2016-03-07 Thread Aneesh Kumar K.V
This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also removes
the dependency on _PAGE_USER for implying read only. One thing to note
here is that read is implied with write and execute permission.
Hence we should always find _PAGE_READ set on a hash pte fault.

We still can't switch PROT_NONE to !(_PAGE_RWX). Automatic NUMA balancing
depends on marking a prot-none pte _PAGE_WRITE. (For more details look at
b191f9b106ea "mm: numa: preserve PTE write permissions across a NUMA hinting
fault")

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  4 +--
 arch/powerpc/include/asm/book3s/64/hash.h | 35 ---
 arch/powerpc/include/asm/pte-common.h |  5 
 arch/powerpc/mm/hash64_4k.c   |  2 +-
 arch/powerpc/mm/hash64_64k.c  |  4 +--
 arch/powerpc/mm/hash_utils_64.c   |  9 ---
 arch/powerpc/mm/hugepage-hash64.c |  2 +-
 arch/powerpc/mm/hugetlbpage-hash64.c  |  2 +-
 arch/powerpc/mm/hugetlbpage.c |  4 +--
 arch/powerpc/mm/pgtable.c |  4 +--
 arch/powerpc/mm/pgtable_64.c  |  5 ++--
 arch/powerpc/platforms/cell/spu_base.c|  2 +-
 arch/powerpc/platforms/cell/spufs/fault.c |  4 +--
 drivers/misc/cxl/fault.c  |  4 +--
 14 files changed, 49 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0a7956a80a08..279ded72f1db 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -291,10 +291,10 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm, unsigned long addr,
  pmd_t *pmdp)
 {
 
-   if ((pmd_val(*pmdp) & _PAGE_RW) == 0)
+   if ((pmd_val(*pmdp) & _PAGE_WRITE) == 0)
return;
 
-   pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW, 0);
+   pmd_hugepage_update(mm, addr, pmdp, _PAGE_WRITE, 0);
 }
 
 #endif /*  CONFIG_TRANSPARENT_HUGEPAGE */
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 2113de051824..f092d83fa623 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -16,8 +16,10 @@
 #define _PAGE_BIT_SWAP_TYPE0
 
 #define _PAGE_EXEC 0x1 /* execute permission */
-#define _PAGE_RW   0x2 /* read & write access allowed */
+#define _PAGE_WRITE0x2 /* write access allowed */
 #define _PAGE_READ 0x4 /* read access allowed */
+#define _PAGE_RW   (_PAGE_READ | _PAGE_WRITE)
+#define _PAGE_RWX  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 #define _PAGE_USER 0x8 /* page may be accessed by userspace */
 #define _PAGE_GUARDED  0x00010 /* G: guarded (side-effect) page */
 /* M (memory coherence) is always set in the HPTE, so we don't need it here */
@@ -147,8 +149,8 @@
  */
 #define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \
 _PAGE_WRITETHRU | _PAGE_4K_PFN | \
-_PAGE_USER | _PAGE_ACCESSED |  \
-_PAGE_RW |  _PAGE_DIRTY | _PAGE_EXEC | \
+_PAGE_USER | _PAGE_ACCESSED |  _PAGE_READ |\
+_PAGE_WRITE |  _PAGE_DIRTY | _PAGE_EXEC | \
 _PAGE_SOFT_DIRTY)
 /*
  * We define 2 sets of base prot bits, one for basic pages (ie,
@@ -173,10 +175,12 @@
 #define PAGE_SHARED__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
 #define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | \
 _PAGE_EXEC)
-#define PAGE_COPY  __pgprot(_PAGE_BASE | _PAGE_USER )
-#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
-#define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_USER )
-#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_COPY  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ)
+#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ| \
+_PAGE_EXEC)
+#define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ)
+#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ| \
+_PAGE_EXEC)
 
 #define __P000 PAGE_NONE
 #define __P001 PAGE_READONLY
@@ -300,19 +304,19 @@ static inline void ptep_set_wrprotect(struct mm_struct 
*mm, unsigned long addr,
  pte_t *ptep)
 {
 
-   if ((pte_val(*ptep) & _PAGE_RW) == 0)
+   if ((pte_val(*ptep) & _PAGE_WRITE) == 0)
return;
 
-   pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
+   pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 0);
 }
 
 static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long 

[PATCH 05/14] powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED

2016-03-07 Thread Aneesh Kumar K.V
_PAGE_PRIVILEGED means the page can be accessed only by the kernel. This is
done to keep the pte bits similar to the PowerISA 3.0 radix PTE format. User
pages are now marked by clearing the _PAGE_PRIVILEGED bit.

Previously we allowed the kernel to have a privileged page
in the lower address range (USER_REGION). With this patch such access
is denied.

We also prevent kernel access to a non-privileged page in the
higher address range (ie, REGION_ID != 0). Both of the above access
scenarios should never happen.
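
A small userspace sketch of the inverted test this implies; the helper name
pteval_is_user() is made up, but it shows roughly how a "user page" check looks
once the bit means privileged rather than user:

#include <stdio.h>

#define _PAGE_PRIVILEGED  0x8UL

/* With _PAGE_USER gone, "user page" now means "privileged bit clear". */
static int pteval_is_user(unsigned long pteval)
{
        return !(pteval & _PAGE_PRIVILEGED);
}

int main(void)
{
        printf("kernel pte -> user? %d\n", pteval_is_user(_PAGE_PRIVILEGED | 0x6));
        printf("user pte   -> user? %d\n", pteval_is_user(0x6));
        return 0;
}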

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 34 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h | 18 ++-
 arch/powerpc/mm/hash64_4k.c  |  2 +-
 arch/powerpc/mm/hash64_64k.c |  4 ++--
 arch/powerpc/mm/hash_utils_64.c  | 17 --
 arch/powerpc/mm/hugepage-hash64.c|  2 +-
 arch/powerpc/mm/hugetlbpage-hash64.c |  3 ++-
 arch/powerpc/mm/hugetlbpage.c|  2 +-
 arch/powerpc/mm/pgtable.c| 15 ++--
 arch/powerpc/mm/pgtable_64.c | 15 +---
 arch/powerpc/platforms/cell/spufs/fault.c|  2 +-
 drivers/misc/cxl/fault.c |  5 ++--
 12 files changed, 80 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index f092d83fa623..fbefbaa92736 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -20,7 +20,7 @@
 #define _PAGE_READ 0x4 /* read access allowed */
 #define _PAGE_RW   (_PAGE_READ | _PAGE_WRITE)
 #define _PAGE_RWX  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
-#define _PAGE_USER 0x8 /* page may be accessed by userspace */
+#define _PAGE_PRIVILEGED   0x8 /* page can only be access by kernel */
 #define _PAGE_GUARDED  0x00010 /* G: guarded (side-effect) page */
 /* M (memory coherence) is always set in the HPTE, so we don't need it here */
 #define _PAGE_COHERENT 0x0
@@ -114,10 +114,13 @@
 #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
 #endif /* CONFIG_PPC_MM_SLICES */
 
-/* No separate kernel read-only */
-#define _PAGE_KERNEL_RW(_PAGE_RW | _PAGE_DIRTY) /* user access 
blocked by key */
+/*
+ * No separate kernel read-only, user access blocked by key
+ */
+#define _PAGE_KERNEL_RW(_PAGE_PRIVILEGED | _PAGE_RW | 
_PAGE_DIRTY)
 #define _PAGE_KERNEL_RO _PAGE_KERNEL_RW
-#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
+#define _PAGE_KERNEL_RWX   (_PAGE_PRIVILEGED | _PAGE_DIRTY | \
+_PAGE_RW | _PAGE_EXEC)
 
 /* Strong Access Ordering */
 #define _PAGE_SAO  (_PAGE_WRITETHRU | _PAGE_NO_CACHE | 
_PAGE_COHERENT)
@@ -149,7 +152,7 @@
  */
 #define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \
 _PAGE_WRITETHRU | _PAGE_4K_PFN | \
-_PAGE_USER | _PAGE_ACCESSED |  _PAGE_READ |\
+_PAGE_PRIVILEGED | _PAGE_ACCESSED |  _PAGE_READ |\
 _PAGE_WRITE |  _PAGE_DIRTY | _PAGE_EXEC | \
 _PAGE_SOFT_DIRTY)
 /*
@@ -171,16 +174,13 @@
  *
  * Note due to the way vm flags are laid out, the bits are XWR
  */
-#define PAGE_NONE  __pgprot(_PAGE_BASE)
-#define PAGE_SHARED__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | \
-_PAGE_EXEC)
-#define PAGE_COPY  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ)
-#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ| \
-_PAGE_EXEC)
-#define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ)
-#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_READ| \
-_PAGE_EXEC)
+#define PAGE_NONE  __pgprot(_PAGE_BASE | _PAGE_PRIVILEGED)
+#define PAGE_SHARED__pgprot(_PAGE_BASE | _PAGE_RW)
+#define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_RW | _PAGE_EXEC)
+#define PAGE_COPY  __pgprot(_PAGE_BASE | _PAGE_READ)
+#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
+#define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_READ)
+#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
 
 #define __P000 PAGE_NONE
 #define __P001 PAGE_READONLY
@@ -421,8 +421,8 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
  */
 static inline int pte_protnone(pte_t pte)
 {
-   return (pte_val(pte) &
-   (_PAGE_PRESENT | _PAGE_USER)) == _PAGE_PRESENT;
+   return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PRIVILEGED)) ==
+   (_PAGE_PRESENT | _PAGE_PRIVILEGED);
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.

[PATCH 06/14] powerpc/mm: Remove RPN_SHIFT and RPN_SIZE

2016-03-07 Thread Aneesh Kumar K.V
They were page size dependent. Use PAGE_SHIFT instead. While there,
remove them and define PTE_RPN_MASK better.
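
A standalone sketch of the new mask arithmetic; 64K pages and the pfn/flag
values below are assumptions for illustration only:

#include <stdio.h>

#define PAGE_SHIFT   16                 /* assuming 64K pages */
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define PAGE_MASK    (~(PAGE_SIZE - 1))
#define PTE_RPN_MASK (((1UL << 57) - 1) & PAGE_MASK)

int main(void)
{
        unsigned long pfn  = 0x123456UL;   /* made-up page frame number */
        unsigned long prot = 0x107UL;      /* made-up low pte flag bits */
        unsigned long pte  = ((pfn << PAGE_SHIFT) & PTE_RPN_MASK) | prot;

        printf("pte     = 0x%lx\n", pte);
        printf("pte_pfn = 0x%lx\n", (pte & PTE_RPN_MASK) >> PAGE_SHIFT);
        return 0;
}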

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  4 
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 11 +--
 arch/powerpc/include/asm/book3s/64/hash.h | 10 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  4 ++--
 arch/powerpc/mm/pgtable_64.c  |  2 +-
 5 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 5f08a0832238..772850e517f3 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -51,10 +51,6 @@
 #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | \
 _PAGE_F_SECOND | _PAGE_F_GIX)
 
-/* shift to put page number into pte */
-#define PTE_RPN_SHIFT  (12)
-#define PTE_RPN_SIZE   (45)/* gives 57-bit real addresses */
-
 #define _PAGE_4K_PFN   0
 #ifndef __ASSEMBLY__
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 279ded72f1db..a053e8a1d0d1 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -40,15 +40,6 @@
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_F_SECOND | \
 _PAGE_F_GIX | _PAGE_HASHPTE | _PAGE_COMBO)
-
-/* Shift to put page number into pte.
- *
- * That gives us a max RPN of 41 bits, which means a max of 57 bits
- * of addressable physical space, or 53 bits for the special 4k PFNs.
- */
-#define PTE_RPN_SHIFT  (16)
-#define PTE_RPN_SIZE   (41)
-
 /*
  * we support 16 fragments per PTE page of 64K size.
  */
@@ -125,7 +116,7 @@ extern bool __rpte_sub_valid(real_pte_t rpte, unsigned long 
index);
(((pte) & _PAGE_COMBO)? MMU_PAGE_4K: MMU_PAGE_64K)
 
 #define remap_4k_pfn(vma, addr, pfn, prot) \
-   (WARN_ON(((pfn) >= (1UL << PTE_RPN_SIZE))) ? -EINVAL :  \
+   (WARN_ON(((pfn) > (PTE_RPN_MASK >> PAGE_SHIFT))) ? -EINVAL :\
remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE,\
__pgprot(pgprot_val((prot)) | _PAGE_4K_PFN)))
 
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index fbefbaa92736..8ccb2970f30f 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -136,10 +136,10 @@
 #define PTE_ATOMIC_UPDATES 1
 #define _PTE_NONE_MASK _PAGE_HPTEFLAGS
 /*
- * The mask convered by the RPN must be a ULL on 32-bit platforms with
- * 64-bit PTEs
+ * We support 57 bit real address in pte. Clear everything above 57, and
+ * everything below PAGE_SHIFT.
  */
-#define PTE_RPN_MASK   (((1UL << PTE_RPN_SIZE) - 1) << PTE_RPN_SHIFT)
+#define PTE_RPN_MASK   (((1UL << 57) - 1) & (PAGE_MASK))
 /*
  * _PAGE_CHG_MASK masks of bits that are to be preserved across
  * pgprot changes
@@ -439,13 +439,13 @@ static inline int pte_present(pte_t pte)
  */
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
 {
-   return __pte((((pte_basic_t)(pfn) << PTE_RPN_SHIFT) & PTE_RPN_MASK) |
+   return __pte((((pte_basic_t)(pfn) << PAGE_SHIFT) & PTE_RPN_MASK) |
 pgprot_val(pgprot));
 }
 
 static inline unsigned long pte_pfn(pte_t pte)
 {
-   return (pte_val(pte) & PTE_RPN_MASK) >> PTE_RPN_SHIFT;
+   return (pte_val(pte) & PTE_RPN_MASK) >> PAGE_SHIFT;
 }
 
 /* Generic modifiers for PTE bits */
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 97d06de8dbf6..144680382306 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -172,10 +172,10 @@ extern struct page *pgd_page(pgd_t pgd);
 #define SWP_TYPE_BITS 5
 #define __swp_type(x)  (((x).val >> _PAGE_BIT_SWAP_TYPE) \
& ((1UL << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)(((x).val & PTE_RPN_MASK) >> 
PTE_RPN_SHIFT)
+#define __swp_offset(x)(((x).val & PTE_RPN_MASK) >> PAGE_SHIFT)
 #define __swp_entry(type, offset)  ((swp_entry_t) { \
((type) << _PAGE_BIT_SWAP_TYPE) \
-   | (((offset) << PTE_RPN_SHIFT) & PTE_RPN_MASK)})
+   | (((offset) << PAGE_SHIFT) & PTE_RPN_MASK)})
 /*
  * swp_entry_t must be independent of pte bits. We build a swp_entry_t from
  * swap type and offset we get from swap and convert that to pte to find a
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 441905f7bba4..1254cf107871 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -762,7 +762,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
 {
unsi

[PATCH 07/14] powerpc/mm: Update _PAGE_KERNEL_RO

2016-03-07 Thread Aneesh Kumar K.V
PS3 used a PPP bit hack to implement a read-only mapping in the
kernel area. Since we are bolt-mapping the ioremap area, it used
the pte flags _PAGE_PRESENT | _PAGE_USER to get a PPP value of 0x3,
thereby resulting in a read-only mapping. This means the area
can be accessed by user space, but the kernel will never return such an
address to user space.

Fix this by doing a read-only kernel mapping using PPP bits 0b110.

This also allows us to do read-only kernel mappings for radix in later
patches.
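
To make the PPP selection easier to follow, here is a small userspace sketch of
the decision described above. It mirrors only the scheme in this commit
message; HPTE_R_PP0 handling and the other pte bits are omitted:

#include <stdio.h>

#define _PAGE_WRITE       0x2UL
#define _PAGE_READ        0x4UL
#define _PAGE_PRIVILEGED  0x8UL
#define _PAGE_DIRTY       0x80UL

/* 3-bit PPP value for a pte: kernel RW 0b000, kernel RO 0b110,
 * user RW 0b010, user RO (or clean) 0b011. */
static unsigned int ppp_for_pte(unsigned long pteflags)
{
        if (pteflags & _PAGE_PRIVILEGED)
                return (pteflags & _PAGE_WRITE) ? 0x0 : 0x6;

        if ((pteflags & _PAGE_WRITE) && (pteflags & _PAGE_DIRTY))
                return 0x2;
        return 0x3;
}

int main(void)
{
        printf("kernel RW     -> PPP %u\n",
               ppp_for_pte(_PAGE_PRIVILEGED | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY));
        printf("kernel RO     -> PPP %u\n",
               ppp_for_pte(_PAGE_PRIVILEGED | _PAGE_READ));
        printf("user RW dirty -> PPP %u\n",
               ppp_for_pte(_PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY));
        printf("user RO       -> PPP %u\n",
               ppp_for_pte(_PAGE_READ));
        return 0;
}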

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h |  4 ++--
 arch/powerpc/mm/hash_utils_64.c   | 17 +++--
 arch/powerpc/platforms/ps3/spu.c  |  2 +-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 8ccb2970f30f..c2b567456796 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -115,10 +115,10 @@
 #endif /* CONFIG_PPC_MM_SLICES */
 
 /*
- * No separate kernel read-only, user access blocked by key
+ * user access blocked by key
  */
 #define _PAGE_KERNEL_RW(_PAGE_PRIVILEGED | _PAGE_RW | 
_PAGE_DIRTY)
-#define _PAGE_KERNEL_RO _PAGE_KERNEL_RW
+#define _PAGE_KERNEL_RO (_PAGE_PRIVILEGED | _PAGE_READ)
 #define _PAGE_KERNEL_RWX   (_PAGE_PRIVILEGED | _PAGE_DIRTY | \
 _PAGE_RW | _PAGE_EXEC)
 
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 630603f74056..c81c08aaff0e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -167,14 +167,19 @@ unsigned long htab_convert_pte_flags(unsigned long 
pteflags)
if ((pteflags & _PAGE_EXEC) == 0)
rflags |= HPTE_R_N;
/*
-* PP bits:
+* PPP bits:
 * Linux uses slb key 0 for kernel and 1 for user.
-* kernel areas are mapped with PP=00
-* and there is no kernel RO (_PAGE_KERNEL_RO).
-* User area is mapped with PP=0x2 for read/write
-* or PP=0x3 for read-only (including writeable but clean pages).
+* kernel RW areas are mapped with PPP=0b000
+* User area is mapped with PPP=0b010 for read/write
+* or PPP=0b011 for read-only (including writeable but clean pages).
 */
-   if (!(pteflags & _PAGE_PRIVILEGED)) {
+   if (pteflags & _PAGE_PRIVILEGED) {
+   /*
+* Kernel read only mapped with ppp bits 0b110
+*/
+   if (!(pteflags & _PAGE_WRITE))
+   rflags |= (HPTE_R_PP0 | 0x2);
+   } else {
if (pteflags & _PAGE_RWX)
rflags |= 0x2;
if (!((pteflags & _PAGE_WRITE) && (pteflags & _PAGE_DIRTY)))
diff --git a/arch/powerpc/platforms/ps3/spu.c b/arch/powerpc/platforms/ps3/spu.c
index a0bca05e26b0..5e8a40f9739f 100644
--- a/arch/powerpc/platforms/ps3/spu.c
+++ b/arch/powerpc/platforms/ps3/spu.c
@@ -205,7 +205,7 @@ static void spu_unmap(struct spu *spu)
 static int __init setup_areas(struct spu *spu)
 {
struct table {char* name; unsigned long addr; unsigned long size;};
-   static const unsigned long shadow_flags = _PAGE_NO_CACHE | 3;
+   unsigned long shadow_flags = 
pgprot_val(pgprot_noncached_wc(PAGE_KERNEL_RO));
 
spu_pdata(spu)->shadow = __ioremap(spu_pdata(spu)->shadow_addr,
   sizeof(struct spe_shadow),
-- 
2.5.0


[PATCH 10/14] powerpc/mm: Use generic version of pmdp_clear_flush_young

2016-03-07 Thread Aneesh Kumar K.V
The radix variant is going to require flush_pmd_tlb_range(). With
flush_pmd_tlb_range() added, pmdp_clear_flush_young() is the same as the
generic version, so drop the powerpc-specific variant.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 ---
 arch/powerpc/mm/pgtable_64.c | 13 +++--
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 144680382306..e7171323884a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -298,9 +298,6 @@ extern int pmdp_set_access_flags(struct vm_area_struct *vma,
 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
 extern int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 unsigned long address, pmd_t *pmdp);
-#define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
-extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp);
 
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 extern pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index db924c54f370..98c91ad18ba7 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -593,22 +593,15 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, 
unsigned long address,
return pmd;
 }
 
-int pmdp_test_and_clear_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp)
-{
-   return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
-}
-
 /*
  * We currently remove entries from the hashtable regardless of whether
- * the entry was young or dirty. The generic routines only flush if the
- * entry was young or dirty which is not good enough.
+ * the entry was young or dirty.
  *
  * We should be more intelligent about this but for the moment we override
  * these functions and force a tlb flush unconditionally
  */
-int pmdp_clear_flush_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp)
+int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
 {
return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
 }
-- 
2.5.0


[PATCH 11/14] powerpc/mm: Use generic version of ptep_clear_flush_young

2016-03-07 Thread Aneesh Kumar K.V
The radix variant is going to require flush_tlb_range(). With
flush_tlb_range() added, ptep_clear_flush_young() is the same as the
generic version, so drop the powerpc-specific variant.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index edd3d47ef9a4..f04c7ae810b2 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -275,6 +275,13 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
return old;
 }
 
+/*
+ * We currently remove entries from the hashtable regardless of whether
+ * the entry was young or dirty.
+ *
+ * We should be more intelligent about this but for the moment we override
+ * these functions and force a tlb flush unconditionally
+ */
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pte_t *ptep)
 {
@@ -313,22 +320,6 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 1);
 }
 
-/*
- * We currently remove entries from the hashtable regardless of whether
- * the entry was young or dirty. The generic routines only flush if the
- * entry was young or dirty which is not good enough.
- *
- * We should be more intelligent about this but for the moment we override
- * these functions and force a tlb flush unconditionally
- */
-#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
-#define ptep_clear_flush_young(__vma, __address, __ptep)   \
-({ \
-   int __young = __ptep_test_and_clear_young((__vma)->vm_mm, __address, \
- __ptep);  \
-   __young;\
-})
-
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
-- 
2.5.0


[PATCH 12/14] powerpc/mm: Move common data structure between radix and hash to book3s 64 generic headers

2016-03-07 Thread Aneesh Kumar K.V
We want to use mmu_context_t for both radix and hash. Move it out of the
hash-specific mmu-hash.h into the generic book3s/64 mmu.h header.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  6 +--
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 61 ++-
 arch/powerpc/include/asm/book3s/64/mmu.h  | 72 +++
 arch/powerpc/include/asm/mmu.h| 11 ++--
 4 files changed, 85 insertions(+), 65 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/mmu.h

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 16f513e5cbd7..b82e063494dd 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_MMU_HASH32_H_
-#define _ASM_POWERPC_MMU_HASH32_H_
+#ifndef _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
+#define _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
 /*
  * 32-bit hash table MMU support
  */
@@ -90,4 +90,4 @@ typedef struct {
 #define mmu_virtual_psize  MMU_PAGE_4K
 #define mmu_linear_psize   MMU_PAGE_256M
 
-#endif /* _ASM_POWERPC_MMU_HASH32_H_ */
+#endif /* _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_ */
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 0cea4807e26f..ce73736b42db 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_MMU_HASH64_H_
-#define _ASM_POWERPC_MMU_HASH64_H_
+#ifndef _ASM_POWERPC_BOOK3S_64_MMU_HASH_H_
+#define _ASM_POWERPC_BOOK3S_64_MMU_HASH_H_
 /*
  * PowerPC64 memory management structures
  *
@@ -127,24 +127,6 @@ extern struct hash_pte *htab_address;
 extern unsigned long htab_size_bytes;
 extern unsigned long htab_hash_mask;
 
-/*
- * Page size definition
- *
- *shift : is the "PAGE_SHIFT" value for that page size
- *sllp  : is a bit mask with the value of SLB L || LP to be or'ed
- *directly to a slbmte "vsid" value
- *penc  : is the HPTE encoding mask for the "LP" field:
- *
- */
-struct mmu_psize_def
-{
-   unsigned intshift;  /* number of bits */
-   int penc[MMU_PAGE_COUNT];   /* HPTE encoding */
-   unsigned inttlbiel; /* tlbiel supported for that page size */
-   unsigned long   avpnm;  /* bits to mask out in AVPN in the HPTE */
-   unsigned long   sllp;   /* SLB L||LP (exact mask to use in slbmte) */
-};
-extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
 static inline int shift_to_mmu_psize(unsigned int shift)
 {
@@ -210,11 +192,6 @@ static inline int segment_shift(int ssize)
 /*
  * The current system page and segment sizes
  */
-extern int mmu_linear_psize;
-extern int mmu_virtual_psize;
-extern int mmu_vmalloc_psize;
-extern int mmu_vmemmap_psize;
-extern int mmu_io_psize;
 extern int mmu_kernel_ssize;
 extern int mmu_highuser_ssize;
 extern u16 mmu_slb_size;
@@ -512,38 +489,6 @@ static inline void subpage_prot_free(struct mm_struct *mm) 
{}
 static inline void subpage_prot_init_new_context(struct mm_struct *mm) { }
 #endif /* CONFIG_PPC_SUBPAGE_PROT */
 
-typedef unsigned long mm_context_id_t;
-struct spinlock;
-
-typedef struct {
-   mm_context_id_t id;
-   u16 user_psize; /* page size index */
-
-#ifdef CONFIG_PPC_MM_SLICES
-   u64 low_slices_psize;   /* SLB page size encodings */
-   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-#else
-   u16 sllp;   /* SLB page size encoding */
-#endif
-   unsigned long vdso_base;
-#ifdef CONFIG_PPC_SUBPAGE_PROT
-   struct subpage_prot_table spt;
-#endif /* CONFIG_PPC_SUBPAGE_PROT */
-#ifdef CONFIG_PPC_ICSWX
-   struct spinlock *cop_lockp; /* guard acop and cop_pid */
-   unsigned long acop; /* mask of enabled coprocessor types */
-   unsigned int cop_pid;   /* pid value used with coprocessors */
-#endif /* CONFIG_PPC_ICSWX */
-#ifdef CONFIG_PPC_64K_PAGES
-   /* for 4K PTE fragment support */
-   void *pte_frag;
-#endif
-#ifdef CONFIG_SPAPR_TCE_IOMMU
-   struct list_head iommu_group_mem_list;
-#endif
-} mm_context_t;
-
-
 #if 0
 /*
  * The code below is equivalent to this function for arguments
@@ -613,4 +558,4 @@ unsigned htab_shift_for_mem_size(unsigned long mem_size);
 
 #endif /* __ASSEMBLY__ */
 
-#endif /* _ASM_POWERPC_MMU_HASH64_H_ */
+#endif /* _ASM_POWERPC_BOOK3S_64_MMU_HASH_H_ */
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
new file mode 100644
index ..aadb0bbc5c71
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -0,0 +1,72 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_MMU_H_
+#define _ASM_POWERPC_BOOK3S_64_MMU_H_
+
+#ifndef __ASSEMBLY__
+/*
+ * Page size definition
+ *
+ *shift : is the "PAGE_SHIFT" value for that page size
+ *sllp  : is a bit mask with the value of SLB L || LP to be or'ed
+ *directly to a slbmte "vsid" value
+ *penc  : is the HPTE encoding mask

[PATCH 13/14] powerpc/mm/power9: Add partition table format

2016-03-07 Thread Aneesh Kumar K.V
We also add a machdep callback for updating the partition table entry.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 31 +--
 arch/powerpc/include/asm/machdep.h   |  1 +
 arch/powerpc/include/asm/reg.h   |  1 +
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index aadb0bbc5c71..b86786f2521c 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -21,12 +21,39 @@ struct mmu_psize_def {
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 #endif /* __ASSEMBLY__ */
 
-#ifdef CONFIG_PPC_STD_MMU_64
 /* 64-bit classic hash table MMU */
 #include 
-#endif
 
 #ifndef __ASSEMBLY__
+/*
+ * ISA 3.0 partition and process table entry format
+ */
+struct prtb_entry {
+   __be64 prtb0;
+   __be64 prtb1;
+};
+extern struct prtb_entry *process_tb;
+
+struct patb_entry {
+   __be64 patb0;
+   __be64 patb1;
+};
+extern struct patb_entry *partition_tb;
+
+#define PATB_HR    (1UL << 63)
+#define PATB_GR    (1UL << 63)
+#define RPDB_MASK  0x000fUL
+#define RPDB_SHIFT (1UL << 8)
+/*
+ * Limit process table to PAGE_SIZE table. This
+ * also limit the max pid we can support.
+ * MAX_USER_CONTEXT * 16 bytes of space.
+ */
+#define PRTB_SIZE_SHIFT(CONTEXT_BITS + 4)
+/*
+ * Power9 currently only support 64K partition table size.
+ */
+#define PATB_SIZE_SHIFT16
 
 typedef unsigned long mm_context_id_t;
 struct spinlock;
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index fd22442d30a9..6bdcd0da9e21 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -256,6 +256,7 @@ struct machdep_calls {
 #ifdef CONFIG_ARCH_RANDOM
int (*get_random_seed)(unsigned long *v);
 #endif
+   int (*update_partition_table)(u64);
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 52ed654d01ba..257251ada3a3 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -587,6 +587,7 @@
 #define SPRN_PIR   0x3FF   /* Processor Identification Register */
 #endif
 #define SPRN_TIR   0x1BE   /* Thread Identification Register */
+#define SPRN_PTCR  0x1D0   /* Partition table control Register */
 #define SPRN_PSPB  0x09F   /* Problem State Priority Boost reg */
 #define SPRN_PTEHI 0x3D5   /* 981 7450 PTE HI word (S/W TLB load) */
 #define SPRN_PTELO 0x3D6   /* 982 7450 PTE LO word (S/W TLB load) */
-- 
2.5.0


[PATCH 09/14] powerpc/mm: Drop WIMG in favour of new constants

2016-03-07 Thread Aneesh Kumar K.V
PowerISA 3.0 introduces three pte bits with the below meaning:
000 -> Normal memory
001 -> Strong Access Order
010 -> Non-idempotent I/O (also cache inhibited and guarded)
100 -> Tolerant I/O (cache inhibited)

We drop the existing WIMG bits in the linux page table in favour of the above
constants. We lose _PAGE_WRITETHRU with this conversion. We only use
write-through via pgprot_cached_wthru(), which is used by fbdev/controlfb.c
(the Apple control display driver) and also by PPC32.

With respect to _PAGE_COHERENCE, we have been marking hptes
always coherent for some time now. htab_convert_pte_flags always added
HPTE_R_M.

NOTE: KVM changes need closer review.
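
As a small illustration of the new encoding (bit values copied from the patch;
the classification helper below is a sketch, not a kernel function):

#include <stdio.h>

#define _PAGE_SAO             0x00010UL  /* strong access order */
#define _PAGE_NON_IDEMPOTENT  0x00020UL  /* non-idempotent I/O */
#define _PAGE_TOLERANT        0x00040UL  /* tolerant I/O, cache inhibited */
#define _PAGE_CACHE_CTL       (_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT)

static const char *memory_type(unsigned long pteval)
{
        switch (pteval & _PAGE_CACHE_CTL) {
        case 0:                     return "normal memory";
        case _PAGE_SAO:             return "strong access order";
        case _PAGE_NON_IDEMPOTENT:  return "non-idempotent I/O (CI + guarded)";
        case _PAGE_TOLERANT:        return "tolerant I/O (cache inhibited)";
        default:                    return "invalid combination";
        }
}

int main(void)
{
        printf("%s\n", memory_type(0));
        printf("%s\n", memory_type(_PAGE_TOLERANT));
        return 0;
}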

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h | 47 +--
 arch/powerpc/include/asm/kvm_book3s_64.h  | 29 ++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 11 
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 12 
 arch/powerpc/mm/hash64_64k.c  |  2 +-
 arch/powerpc/mm/hash_utils_64.c   | 14 -
 arch/powerpc/mm/pgtable.c |  2 +-
 arch/powerpc/mm/pgtable_64.c  |  4 ---
 arch/powerpc/platforms/pseries/lpar.c |  4 ---
 9 files changed, 48 insertions(+), 77 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index c2b567456796..edd3d47ef9a4 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -21,11 +21,9 @@
 #define _PAGE_RW   (_PAGE_READ | _PAGE_WRITE)
 #define _PAGE_RWX  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 #define _PAGE_PRIVILEGED   0x8 /* page can only be access by kernel */
-#define _PAGE_GUARDED  0x00010 /* G: guarded (side-effect) page */
-/* M (memory coherence) is always set in the HPTE, so we don't need it here */
-#define _PAGE_COHERENT 0x0
-#define _PAGE_NO_CACHE 0x00020 /* I: cache inhibit */
-#define _PAGE_WRITETHRU0x00040 /* W: cache write-through */
+#define _PAGE_SAO  0x00010 /* Strong access order */
+#define _PAGE_NON_IDEMPOTENT   0x00020 /* non idempotent memory */
+#define _PAGE_TOLERANT 0x00040 /* tolerant memory, cache inhibited */
 #define _PAGE_DIRTY0x00080 /* C: page changed */
 #define _PAGE_ACCESSED 0x00100 /* R: page referenced */
 #define _PAGE_SPECIAL  0x00400 /* software: special page */
@@ -122,9 +120,6 @@
 #define _PAGE_KERNEL_RWX   (_PAGE_PRIVILEGED | _PAGE_DIRTY | \
 _PAGE_RW | _PAGE_EXEC)
 
-/* Strong Access Ordering */
-#define _PAGE_SAO  (_PAGE_WRITETHRU | _PAGE_NO_CACHE | 
_PAGE_COHERENT)
-
 /* No page size encoding in the linux PTE */
 #define _PAGE_PSIZE0
 
@@ -150,10 +145,9 @@
 /*
  * Mask of bits returned by pte_pgprot()
  */
-#define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \
-_PAGE_WRITETHRU | _PAGE_4K_PFN | \
-_PAGE_PRIVILEGED | _PAGE_ACCESSED |  _PAGE_READ |\
-_PAGE_WRITE |  _PAGE_DIRTY | _PAGE_EXEC | \
+#define PAGE_PROT_BITS  (_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT | \
+_PAGE_4K_PFN | _PAGE_PRIVILEGED | _PAGE_ACCESSED | \
+_PAGE_READ | _PAGE_WRITE |  _PAGE_DIRTY | _PAGE_EXEC | 
\
 _PAGE_SOFT_DIRTY)
 /*
  * We define 2 sets of base prot bits, one for basic pages (ie,
@@ -162,7 +156,7 @@
  * the processor might need it for DMA coherency.
  */
 #define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
-#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT)
+#define _PAGE_BASE (_PAGE_BASE_NC)
 
 /* Permission masks used to generate the __P and __S table,
  *
@@ -203,9 +197,9 @@
 /* Permission masks used for kernel mappings */
 #define PAGE_KERNEL__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
 #define PAGE_KERNEL_NC __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | \
-_PAGE_NO_CACHE)
+_PAGE_TOLERANT)
 #define PAGE_KERNEL_NCG__pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | \
-_PAGE_NO_CACHE | _PAGE_GUARDED)
+_PAGE_NON_IDEMPOTENT)
 #define PAGE_KERNEL_X  __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX)
 #define PAGE_KERNEL_RO __pgprot(_PAGE_BASE | _PAGE_KERNEL_RO)
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
@@ -516,41 +510,26 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
  * Macro to mark a page protection value as "uncacheable".
  */
 
-#define _PAGE_CACHE_CTL(_PAGE_COHERENT | _PAGE_GUARDED | 
_PAGE_NO_CACHE | \
-_PAGE_WRITETHRU)
+#define _PAGE_CACHE_CTL(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | 
_PAGE_TOLERANT)
 
 #define pgprot_noncached pgprot_noncached
 static inline pgprot_t pgprot_noncached(pgpro

[PATCH 03/14] powerpc/mm/subpage: Clear RWX bit to indicate no access

2016-03-07 Thread Aneesh Kumar K.V
Subpage protection used to depend on the _PAGE_USER bit to implement no
access mode. This patch switches that to use _PAGE_RWX. We clear READ
and WRITE access from the pte instead of clearing _PAGE_USER now. This was
done to enable us to switch to _PAGE_PRIVILEGED. subpage_protection()
returns the pte bits that need to be cleared. Instead of updating the
interface to handle no-access in a separate way, it appears simpler to
clear RWX access to indicate no access.

We still don't insert hash ptes for these ptes, hence we should not
get PROT_FAULT with this change.
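
For clarity, here is a standalone sketch of how the packed 2-bit subpage fields
are extracted; the spp word value is made up, but the shift expression matches
the one in subpage_protection():

#include <stdio.h>

int main(void)
{
        unsigned int spp_word = 0x18000000;   /* hypothetical packed permissions */
        unsigned long ea;

        /* Sixteen 2-bit fields per word, one per 4K subpage, subpage 0 in the
         * top bits; look at the first four subpages. */
        for (ea = 0; ea < 4 * 0x1000; ea += 0x1000) {
                unsigned int spp = (spp_word >> (30 - 2 * ((ea >> 12) & 0xf))) & 3;

                printf("subpage %lu: %s\n", ea >> 12,
                       spp == 0 ? "full access" :
                       spp == 1 ? "read-only" : "no access");
        }
        return 0;
}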

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hash_utils_64.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ea23403b3fc0..ec37f4b0a8ff 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -917,7 +917,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long 
addr)
  * Userspace sets the subpage permissions using the subpage_prot system call.
  *
  * Result is 0: full permissions, _PAGE_RW: read-only,
- * _PAGE_USER or _PAGE_USER|_PAGE_RW: no access.
+ * _PAGE_RWX: no access.
  */
 static int subpage_protection(struct mm_struct *mm, unsigned long ea)
 {
@@ -943,8 +943,13 @@ static int subpage_protection(struct mm_struct *mm, 
unsigned long ea)
/* extract 2-bit bitfield for this 4k subpage */
spp >>= 30 - 2 * ((ea >> 12) & 0xf);
 
-   /* turn 0,1,2,3 into combination of _PAGE_USER and _PAGE_RW */
-   spp = ((spp & 2) ? _PAGE_USER : 0) | ((spp & 1) ? _PAGE_RW : 0);
+   /*
+* 0 -> full permission
+* 1 -> Read only
+* 2 -> no access.
+* We return the flags that need to be cleared.
+*/
+   spp = ((spp & 2) ? _PAGE_RWX : 0) | ((spp & 1) ? _PAGE_WRITE : 0);
return spp;
 }
 
-- 
2.5.0


[PATCH] powerpc: mm: fixup preempt underflow with huge pages

2016-03-07 Thread Sebastian Andrzej Siewior
hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
accessing the variable did not migrate from one CPU to another, and soon
this was noticed by Tiejun Chen in 94b09d755462 ("powerpc/hugetlb:
Replace __get_cpu_var with get_cpu_var"). So we had it fixed.

Christoph Lameter was doing his __get_cpu_var() replacements and forgot
PowerPC. Then he noticed this and sent his fixed-up batch again, which
got applied as 69111bac42f5 ("powerpc: Replace __get_cpu_var uses").

The careful reader will notice one little detail: get_cpu_var() got
replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
a preempt_enable() and nothing that does preempt_disable(), so we
underflow the preempt counter.
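
A minimal kernel-style sketch of the pairing problem (not part of the patch;
example_counter is a made-up per-CPU variable):

DEFINE_PER_CPU(int, example_counter);

static void balanced(void)
{
        int *p = get_cpu_ptr(&example_counter);  /* preempt_disable() */

        (*p)++;
        put_cpu_ptr(&example_counter);           /* preempt_enable() */
}

static void unbalanced(void)
{
        int *p = this_cpu_ptr(&example_counter); /* no preempt_disable() */

        (*p)++;
        put_cpu_var(example_counter);            /* preempt_enable(): underflow */
}

The fix below restores the pairing by switching hugepd_free() to
get_cpu_ptr(), presumably with a matching put_cpu_ptr() later in the function
(the rest of the diff is truncated here).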

Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Michael Ellerman 
Cc: Tiejun Chen 
Cc: 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/mm/hugetlbpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 744e24bcb85c..9e8919b39640 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -414,7 +414,7 @@ static void hugepd_free(struct mmu_gather *tlb, void 
*hugepte)
 {
struct hugepd_freelist **batchp;
 
-   batchp = this_cpu_ptr(&hugepd_freelist_cur);
+   batchp = get_cpu_ptr(&hugepd_freelist_cur);
 
if (atomic_read(&tlb->mm->mm_users) < 2 ||
cpumask_equal(mm_cpumask(tlb->mm),
-- 
2.7.0


[PATCH 01/14] powerpc/mm: Use big endian page table for book3s 64

2016-03-07 Thread Aneesh Kumar K.V
This enables us to share the same page table code for
both radix and hash. Radix uses a hardware-defined big-endian
page table.
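
To see why the flag constants need converting, here is a small userspace
sketch, using __builtin_bswap64 in place of cpu_to_be64 on a little-endian
host; the _PAGE_BUSY value is for illustration only:

#include <stdint.h>
#include <stdio.h>

#define _PAGE_BUSY 0x0800UL

int main(void)
{
        uint64_t pte_cpu = _PAGE_BUSY | 0x1234000000000000UL;
        uint64_t pte_be  = __builtin_bswap64(pte_cpu);   /* as stored in memory */

        /* Testing the big-endian word with a CPU-endian mask misses the bit... */
        printf("naive test:     %s\n", (pte_be & _PAGE_BUSY) ? "busy" : "not busy");

        /* ...so the mask itself is byte-swapped, as pte_update() now does. */
        printf("converted test: %s\n",
               (pte_be & __builtin_bswap64(_PAGE_BUSY)) ? "busy" : "not busy");
        return 0;
}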

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h   |  16 +++--
 arch/powerpc/include/asm/kvm_book3s_64.h|  13 ++--
 arch/powerpc/include/asm/page.h |   4 ++
 arch/powerpc/include/asm/pgtable-be-types.h | 104 
 arch/powerpc/mm/hash64_4k.c |   7 +-
 arch/powerpc/mm/hash64_64k.c|  14 ++--
 arch/powerpc/mm/hugepage-hash64.c   |   7 +-
 arch/powerpc/mm/hugetlbpage-hash64.c|   7 +-
 arch/powerpc/mm/pgtable_64.c|   9 ++-
 9 files changed, 159 insertions(+), 22 deletions(-)
 create mode 100644 arch/powerpc/include/asm/pgtable-be-types.h

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index d0ee6fcef823..2113de051824 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -250,22 +250,27 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
   int huge)
 {
unsigned long old, tmp;
+   unsigned long busy = cpu_to_be64(_PAGE_BUSY);
+
+   clr = cpu_to_be64(clr);
+   set = cpu_to_be64(set);
 
__asm__ __volatile__(
"1: ldarx   %0,0,%3 # pte_update\n\
-   andi.   %1,%0,%6\n\
+   and.    %1,%0,%6\n\
bne-1b \n\
andc%1,%0,%4 \n\
or  %1,%1,%7\n\
stdcx.  %1,0,%3 \n\
bne-1b"
: "=&r" (old), "=&r" (tmp), "=m" (*ptep)
-   : "r" (ptep), "r" (clr), "m" (*ptep), "i" (_PAGE_BUSY), "r" (set)
+   : "r" (ptep), "r" (clr), "m" (*ptep), "r" (busy), "r" (set)
: "cc" );
/* huge pages use the old page table lock */
if (!huge)
assert_pte_locked(mm, addr);
 
+   old = be64_to_cpu(old);
if (old & _PAGE_HASHPTE)
hpte_need_flush(mm, addr, ptep, old, huge);
 
@@ -351,16 +356,19 @@ static inline void __ptep_set_access_flags(pte_t *ptep, 
pte_t entry)
 _PAGE_SOFT_DIRTY);
 
unsigned long old, tmp;
+   unsigned long busy = cpu_to_be64(_PAGE_BUSY);
+
+   bits = cpu_to_be64(bits);
 
__asm__ __volatile__(
"1: ldarx   %0,0,%4\n\
-   andi.   %1,%0,%6\n\
+   and.    %1,%0,%6\n\
bne-1b \n\
or  %0,%3,%0\n\
stdcx.  %0,0,%4\n\
bne-1b"
:"=&r" (old), "=&r" (tmp), "=m" (*ptep)
-   :"r" (bits), "r" (ptep), "m" (*ptep), "i" (_PAGE_BUSY)
+   :"r" (bits), "r" (ptep), "m" (*ptep), "r" (busy)
:"cc");
 }
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2aa79c864e91..f9a7a89a3e4f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -299,6 +299,8 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, 
unsigned long io_type)
  */
 static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing)
 {
+   __be64 opte, npte;
+   unsigned long old_ptev;
pte_t old_pte, new_pte = __pte(0);
 
while (1) {
@@ -306,24 +308,25 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t 
*ptep, int writing)
 * Make sure we don't reload from ptep
 */
old_pte = READ_ONCE(*ptep);
+   old_ptev = pte_val(old_pte);
/*
 * wait until _PAGE_BUSY is clear then set it atomically
 */
-   if (unlikely(pte_val(old_pte) & _PAGE_BUSY)) {
+   if (unlikely(old_ptev & _PAGE_BUSY)) {
cpu_relax();
continue;
}
/* If pte is not present return None */
-   if (unlikely(!(pte_val(old_pte) & _PAGE_PRESENT)))
+   if (unlikely(!(old_ptev & _PAGE_PRESENT)))
return __pte(0);
 
new_pte = pte_mkyoung(old_pte);
if (writing && pte_write(old_pte))
new_pte = pte_mkdirty(new_pte);
 
-   if (pte_val(old_pte) == __cmpxchg_u64((unsigned long *)ptep,
- pte_val(old_pte),
- pte_val(new_pte))) {
+   npte = cpu_to_be64(pte_val(new_pte));
+   opte = cpu_to_be64(old_ptev);
+   if (opte == __cmpxchg_u64((unsigned long *)ptep, opte, npte)) {
break;
}
}
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index ab3d8977bacd..158574d2acf4 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -288,7 +288,11 @@ extern long long virt_phys_offset;
 
 #ifndef __ASSE

[PATCH 04/14] powerpc/mm: Use pte_user instead of opencoding

2016-03-07 Thread Aneesh Kumar K.V
We have a common declaration in pte-common.h. Add a book3s specific one
and switch to pte_user(). In a later patch we will be switching
_PAGE_USER to _PAGE_PRIVILEGED.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 5 +
 arch/powerpc/perf/callchain.c| 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 77d3ce05798e..4ac6221802ad 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -185,6 +185,11 @@ extern struct page *pgd_page(pgd_t pgd);
 #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val((pte)) & 
~_PAGE_PTE })
 #define __swp_entry_to_pte(x)  __pte((x).val | _PAGE_PTE)
 
+static inline bool pte_user(pte_t pte)
+{
+   return (pte_val(pte) & _PAGE_USER);
+}
+
 #ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
 #else
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index e04a6752b399..0071de76d776 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -137,7 +137,7 @@ static int read_user_stack_slow(void __user *ptr, void 
*buf, int nb)
offset = addr & ((1UL << shift) - 1);
 
pte = READ_ONCE(*ptep);
-   if (!pte_present(pte) || !(pte_val(pte) & _PAGE_USER))
+   if (!pte_present(pte) || !pte_user(pte))
goto err_out;
pfn = pte_pfn(pte);
if (!page_is_ram(pfn))
-- 
2.5.0


[PATCH 08/14] powerpc/mm: Use helper for finding pte bits mapping I/O area

2016-03-07 Thread Aneesh Kumar K.V
Use a helper instead of open-coding with constants. A later patch will
drop the WIMG bits and use the PowerISA 3.0 defines.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/btext.c  | 2 +-
 arch/powerpc/kernel/isa-bridge.c | 4 ++--
 arch/powerpc/kernel/pci_64.c | 2 +-
 arch/powerpc/mm/pgtable_64.c | 4 ++--
 arch/powerpc/platforms/ps3/spu.c | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index 41c011cb6070..8275858a434d 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -162,7 +162,7 @@ void btext_map(void)
offset = ((unsigned long) dispDeviceBase) - base;
size = dispDeviceRowBytes * dispDeviceRect[3] + offset
+ dispDeviceRect[0];
-   vbase = __ioremap(base, size, _PAGE_NO_CACHE);
+   vbase = __ioremap(base, size, pgprot_val(pgprot_noncached_wc(__pgprot(0))));
if (vbase == 0)
return;
logicalDisplayBase = vbase + offset;
diff --git a/arch/powerpc/kernel/isa-bridge.c b/arch/powerpc/kernel/isa-bridge.c
index 0f1997097960..ae1316106e2b 100644
--- a/arch/powerpc/kernel/isa-bridge.c
+++ b/arch/powerpc/kernel/isa-bridge.c
@@ -109,14 +109,14 @@ static void pci_process_ISA_OF_ranges(struct device_node 
*isa_node,
size = 0x1;
 
__ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
-size, _PAGE_NO_CACHE|_PAGE_GUARDED);
+size, pgprot_val(pgprot_noncached(__pgprot(0))));
return;
 
 inval_range:
printk(KERN_ERR "no ISA IO ranges or unexpected isa range, "
   "mapping 64k\n");
__ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
-0x1, _PAGE_NO_CACHE|_PAGE_GUARDED);
+0x1, pgprot_val(pgprot_noncached(__pgprot(0))));
 }
 
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 60bb187cb46a..41503d7d53a1 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -159,7 +159,7 @@ static int pcibios_map_phb_io_space(struct pci_controller 
*hose)
 
/* Establish the mapping */
if (__ioremap_at(phys_page, area->addr, size_page,
-_PAGE_NO_CACHE | _PAGE_GUARDED) == NULL)
+pgprot_val(pgprot_noncached(__pgprot(0)))) == NULL)
return -ENOMEM;
 
/* Fixup hose IO resource */
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 1254cf107871..6f1b7064f822 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -253,7 +253,7 @@ void __iomem * __ioremap(phys_addr_t addr, unsigned long 
size,
 
 void __iomem * ioremap(phys_addr_t addr, unsigned long size)
 {
-   unsigned long flags = _PAGE_NO_CACHE | _PAGE_GUARDED;
+   unsigned long flags = pgprot_val(pgprot_noncached(__pgprot(0)));
void *caller = __builtin_return_address(0);
 
if (ppc_md.ioremap)
@@ -263,7 +263,7 @@ void __iomem * ioremap(phys_addr_t addr, unsigned long size)
 
 void __iomem * ioremap_wc(phys_addr_t addr, unsigned long size)
 {
-   unsigned long flags = _PAGE_NO_CACHE;
+   unsigned long flags = pgprot_val(pgprot_noncached_wc(__pgprot(0)));
void *caller = __builtin_return_address(0);
 
if (ppc_md.ioremap)
diff --git a/arch/powerpc/platforms/ps3/spu.c b/arch/powerpc/platforms/ps3/spu.c
index 5e8a40f9739f..492b2575e0d2 100644
--- a/arch/powerpc/platforms/ps3/spu.c
+++ b/arch/powerpc/platforms/ps3/spu.c
@@ -216,7 +216,7 @@ static int __init setup_areas(struct spu *spu)
}
 
spu->local_store = (__force void *)ioremap_prot(spu->local_store_phys,
-   LS_SIZE, _PAGE_NO_CACHE);
+   LS_SIZE, pgprot_val(pgprot_noncached_wc(__pgprot(0))));
 
if (!spu->local_store) {
pr_debug("%s:%d: ioremap local_store failed\n",
-- 
2.5.0


[PATCH 14/14] powerpc/mm/hash: Add support for POWER9 hash

2016-03-07 Thread Aneesh Kumar K.V
This adds support for P9 hash with UPRT=0, i.e. we don't have
segment table support yet.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 13 +++--
 arch/powerpc/mm/hash_native_64.c  | 11 ++-
 arch/powerpc/mm/hash_utils_64.c   | 42 +--
 arch/powerpc/mm/pgtable_64.c  |  7 +
 arch/powerpc/platforms/ps3/htab.c |  2 +-
 arch/powerpc/platforms/pseries/lpar.c |  2 +-
 6 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index ce73736b42db..843b5d839904 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -78,6 +78,10 @@
 #define HPTE_V_SECONDARY   ASM_CONST(0x0002)
 #define HPTE_V_VALID   ASM_CONST(0x0001)
 
+/*
+ * ISA 3.0 have a different HPTE format.
+ */
+#define HPTE_R_3_0_SSIZE_SHIFT 58
 #define HPTE_R_PP0 ASM_CONST(0x8000)
 #define HPTE_R_TS  ASM_CONST(0x4000)
 #define HPTE_R_KEY_HI  ASM_CONST(0x3000)
@@ -224,7 +228,8 @@ static inline unsigned long hpte_encode_avpn(unsigned long 
vpn, int psize,
 */
v = (vpn >> (23 - VPN_SHIFT)) & ~(mmu_psize_defs[psize].avpnm);
v <<= HPTE_V_AVPN_SHIFT;
-   v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
+   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
return v;
 }
 
@@ -248,8 +253,12 @@ static inline unsigned long hpte_encode_v(unsigned long 
vpn, int base_psize,
  * aligned for the requested page size
  */
 static inline unsigned long hpte_encode_r(unsigned long pa, int base_psize,
- int actual_psize)
+ int actual_psize, int ssize)
 {
+
+   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   pa |= ((unsigned long) ssize) << HPTE_R_3_0_SSIZE_SHIFT;
+
/* A 4K page needs no special encoding */
if (actual_psize == MMU_PAGE_4K)
return pa & HPTE_R_RPN;
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 8eaac81347fd..d873f6507f72 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -221,7 +221,7 @@ static long native_hpte_insert(unsigned long hpte_group, 
unsigned long vpn,
return -1;
 
hpte_v = hpte_encode_v(vpn, psize, apsize, ssize) | vflags | 
HPTE_V_VALID;
-   hpte_r = hpte_encode_r(pa, psize, apsize) | rflags;
+   hpte_r = hpte_encode_r(pa, psize, apsize, ssize) | rflags;
 
if (!(vflags & HPTE_V_BOLTED)) {
DBG_LOW(" i=%x hpte_v=%016lx, hpte_r=%016lx\n",
@@ -719,6 +719,12 @@ static void native_flush_hash_range(unsigned long number, 
int local)
local_irq_restore(flags);
 }
 
+static int native_update_partition_table(u64 patb1)
+{
+   partition_tb->patb1 = cpu_to_be64(patb1);
+   return 0;
+}
+
 void __init hpte_init_native(void)
 {
ppc_md.hpte_invalidate  = native_hpte_invalidate;
@@ -729,4 +735,7 @@ void __init hpte_init_native(void)
ppc_md.hpte_clear_all   = native_hpte_clear;
ppc_md.flush_hash_range = native_flush_hash_range;
ppc_md.hugepage_invalidate   = native_hugepage_invalidate;
+
+   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   ppc_md.update_partition_table = native_update_partition_table;
 }
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 728acd17f2a6..0e25d2981e5e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -673,6 +673,41 @@ int remove_section_mapping(unsigned long start, unsigned 
long end)
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
+static void __init hash_init_partition_table(phys_addr_t hash_table,
+unsigned long pteg_count)
+{
+   unsigned long ps_field;
+   unsigned long htab_size;
+   unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
+
+   /*
+* slb llp encoding for the page size used in VPM real mode.
+* We can ignore that for lpid 0
+*/
+   ps_field = 0;
+   htab_size =  __ilog2(pteg_count) - 11;
+
+   BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too 
large.");
+   partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
+   MEMBLOCK_ALLOC_ANYWHERE));
+
+   /* Initialize the Partition Table with no entries */
+   memset((void *)partition_tb, 0, patb_size);
+   partition_tb->patb0 = cpu_to_be64(ps_field | hash_table | htab_size);
+   /*
+* FIXME!! This should be done via update_partition table
+* For now UPRT is 0 for us.
+*/
+   partition_tb->patb1 = 0;
+   DBG("Par

Re: [PATCH] powerpc: mm: fixup preempt underflow with huge pages

2016-03-07 Thread Aneesh Kumar K.V
Sebastian Andrzej Siewior  writes:

> hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
> accessing the variable did not migrate from one CPU to another and soon
> this was noticed by Tiejun Chen in 94b09d755462 ("powerpc/hugetlb:
> Replace __get_cpu_var with get_cpu_var"). So we had it fixed.
>
> Christoph Lameter was doing his __get_cpu_var() replaces and forgot
> PowerPC. Then he noticed this and sent his fixed up batch again which
> got applied as 69111bac42f5 ("powerpc: Replace __get_cpu_var uses").
>
> The careful reader will notice one little detail: get_cpu_var() got
> replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
> a preempt_enable() and nothing that does preempt_disable() so we
> underflow the preempt counter.
>
> Cc: Benjamin Herrenschmidt 
> Cc: Christoph Lameter 
> Cc: Michael Ellerman 
> Cc: Tiejun Chen 
> Cc: 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  arch/powerpc/mm/hugetlbpage.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 744e24bcb85c..9e8919b39640 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -414,7 +414,7 @@ static void hugepd_free(struct mmu_gather *tlb, void 
> *hugepte)
>  {
>   struct hugepd_freelist **batchp;
>
> - batchp = this_cpu_ptr(&hugepd_freelist_cur);
> + batchp = get_cpu_ptr(&hugepd_freelist_cur);
>
>   if (atomic_read(&tlb->mm->mm_users) < 2 ||
>   cpumask_equal(mm_cpumask(tlb->mm),

IMHO it would be better if we did

batchp = &get_cpu_var(hugepd_freelist_cur);

so that it matches the existing

put_cpu_var(hugepd_freelist_cur);

While you are there, can you also fix the wrong indentation on line 423
?
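
A minimal sketch of the suggested pairing, assuming hugepd_freelist_cur is
the per-cpu pointer used in hugetlbpage.c (surrounding code elided):

	/* get_cpu_var() disables preemption and yields the per-cpu variable,
	 * so taking its address gives the same pointer as this_cpu_ptr(). */
	batchp = &get_cpu_var(hugepd_freelist_cur);

	/* ... work on *batchp ... */

	/* The existing put_cpu_var() re-enables preemption, so the preempt
	 * count stays balanced. */
	put_cpu_var(hugepd_freelist_cur);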

-aneesh


Re: [PATCH v3 3/7] QE: Add uqe_serial document to bindings

2016-03-07 Thread Scott Wood
On Mon, 2016-03-07 at 02:35 +, Qiang Zhao wrote:
> On Tue, Mar 05, 2016 at 12:26PM, Rob Herring wrote:
> > -Original Message-
> > From: Rob Herring [mailto:r...@kernel.org]
> > Sent: Saturday, March 05, 2016 12:26 PM
> > To: Qiang Zhao 
> > Cc: o...@buserror.net; Yang-Leo Li ; Xiaobo Xie
> > ; linux-ker...@vger.kernel.org;
> > devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH v3 3/7] QE: Add uqe_serial document to bindings
> > 
> > On Tue, Mar 01, 2016 at 03:09:39PM +0800, Zhao Qiang wrote:
> > > Add uqe_serial document to
> > > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > > 
> > > Signed-off-by: Zhao Qiang 
> > > ---
> > > Changes for v2
> > >   - modify tx/rx-clock-name specification Changes for v2
> > >   - NA
> > > 
> > >  .../bindings/powerpc/fsl/cpm_qe/uqe_serial.txt| 19
> > +++
> > >  1 file changed, 19 insertions(+)
> > >  create mode 100644
> > > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > > 
> > > diff --git
> > > a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > > b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.txt
> > > new file mode 100644
> > > index 000..436c71c
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.
> > > +++ txt
> > > @@ -0,0 +1,19 @@
> > > +* Serial
> > > +
> > > +Currently defined compatibles:
> > > +- ucc_uart
> > 
> > I guess this is in use already and okay. However, looking at the driver
> > there
> > really should be SoC specific compatible strings here since the driver is
> > looking
> > up the SoC compatible string and composing the firmware filename from
> > that.
> 
> Ok, I will change both the driver and this compatible.

But don't break existing device trees while doing so.

-Scott


Re: [PATCH 09/14] powerpc/mm: Drop WIMG in favour of new constants

2016-03-07 Thread kbuild test robot
Hi Aneesh,

[auto build test ERROR on powerpc/next]
[also build test ERROR on next-20160307]
[cannot apply to v4.5-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-mm-Use-big-endian-page-table-for-book3s-64/20160307-232212
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/sysdev/axonram.c: In function 'axon_ram_probe':
>> arch/powerpc/sysdev/axonram.c:208:31: error: '_PAGE_NO_CACHE' undeclared 
>> (first use in this function)
   bank->ph_addr, bank->size, _PAGE_NO_CACHE);
  ^
   arch/powerpc/sysdev/axonram.c:208:31: note: each undeclared identifier is 
reported only once for each function it appears in
--
   drivers/pcmcia/electra_cf.c: In function 'electra_cf_probe':
>> drivers/pcmcia/electra_cf.c:231:3: error: '_PAGE_NO_CACHE' undeclared (first 
>> use in this function)
  _PAGE_NO_CACHE | _PAGE_GUARDED) == NULL)) {
  ^
   drivers/pcmcia/electra_cf.c:231:3: note: each undeclared identifier is 
reported only once for each function it appears in
>> drivers/pcmcia/electra_cf.c:231:20: error: '_PAGE_GUARDED' undeclared (first 
>> use in this function)
  _PAGE_NO_CACHE | _PAGE_GUARDED) == NULL)) {
   ^

vim +/_PAGE_NO_CACHE +208 arch/powerpc/sysdev/axonram.c

dbdf04c4 Maxim Shchetynin 2007-07-20  202  
dbdf04c4 Maxim Shchetynin 2007-07-20  203   dev_info(&device->dev, 
"Register DDR2 memory device %s%d with %luMB\n",
dbdf04c4 Maxim Shchetynin 2007-07-20  204   
AXON_RAM_DEVICE_NAME, axon_ram_bank_id, bank->size >> 20);
dbdf04c4 Maxim Shchetynin 2007-07-20  205  
dbdf04c4 Maxim Shchetynin 2007-07-20  206   bank->ph_addr = resource.start;
40f1ce7f Anton Blanchard  2011-05-08  207   bank->io_addr = (unsigned long) 
ioremap_prot(
dbdf04c4 Maxim Shchetynin 2007-07-20 @208   bank->ph_addr, 
bank->size, _PAGE_NO_CACHE);
dbdf04c4 Maxim Shchetynin 2007-07-20  209   if (bank->io_addr == 0) {
dbdf04c4 Maxim Shchetynin 2007-07-20  210   dev_err(&device->dev, 
"ioremap() failed\n");
dbdf04c4 Maxim Shchetynin 2007-07-20  211   rc = -EFAULT;

:: The code at line 208 was first introduced by commit
:: dbdf04c40161f81d74e27f04e201acb3a5dfad69 [CELL] driver for DDR2 memory 
on AXON

:: TO: Maxim Shchetynin 
:: CC: Arnd Bergmann 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[PATCH v2] powerpc: optimise csum_partial() call when len is constant

2016-03-07 Thread Christophe Leroy
csum_partial is often called for small fixed length packets
for which it is suboptimal to use the generic csum_partial()
function.

For instance, in my configuration, I got:
* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and
implements csum_partial() as a wrapper inline function which
* uses csum_add() for small 16bits multiple constant length
* uses ip_fast_csum() for other 32bits multiple constant
* uses __csum_partial() in all other cases
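
A minimal sketch of what this buys a caller with a compile-time constant
length (hypothetical caller; csum_add() and __csum_partial() as in the patch
below):

	/* For a constant 8-byte buffer the inline wrapper reduces to two
	 * csum_add() calls on the two 32-bit words, with no call into
	 * __csum_partial(). */
	static inline __wsum csum_of_8(const void *hdr)
	{
		return csum_partial(hdr, 8, 0);
	}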

Signed-off-by: Christophe Leroy 
---
 v2: Taken into account Scott's comments
 applies on top of scottwood/linux next

 arch/powerpc/include/asm/checksum.h | 79 ++---
 arch/powerpc/lib/checksum_32.S  |  4 +-
 arch/powerpc/lib/checksum_64.S  |  4 +-
 arch/powerpc/lib/ppc_ksyms.c|  2 +-
 4 files changed, 61 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/checksum.h 
b/arch/powerpc/include/asm/checksum.h
index 74cd8d8..ee655ed 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -13,20 +13,6 @@
 #include 
 #else
 /*
- * computes the checksum of a memory block at buff, length len,
- * and adds in "sum" (32-bit)
- *
- * returns a 32-bit number suitable for feeding into itself
- * or csum_tcpudp_magic
- *
- * this function must be called with even lengths, except
- * for the last fragment, which may be odd
- *
- * it's best to have buff aligned on a 32-bit boundary
- */
-extern __wsum csum_partial(const void *buff, int len, __wsum sum);
-
-/*
  * Computes the checksum of a memory block at src, length len,
  * and adds in "sum" (32-bit), while copying the block to dst.
  * If an access exception occurs on src or dst, it stores -EFAULT
@@ -67,15 +53,6 @@ static inline __sum16 csum_fold(__wsum sum)
return (__force __sum16)(~((__force u32)sum + tmp) >> 16);
 }
 
-/*
- * this routine is used for miscellaneous IP-like checksums, mainly
- * in icmp.c
- */
-static inline __sum16 ip_compute_csum(const void *buff, int len)
-{
-   return csum_fold(csum_partial(buff, len, 0));
-}
-
 static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
  unsigned short len,
  unsigned short proto,
@@ -174,6 +151,62 @@ static inline __sum16 ip_fast_csum(const void *iph, 
unsigned int ihl)
return csum_fold(ip_fast_csum_nofold(iph, ihl));
 }
 
+/*
+ * computes the checksum of a memory block at buff, length len,
+ * and adds in "sum" (32-bit)
+ *
+ * returns a 32-bit number suitable for feeding into itself
+ * or csum_tcpudp_magic
+ *
+ * this function must be called with even lengths, except
+ * for the last fragment, which may be odd
+ *
+ * it's best to have buff aligned on a 32-bit boundary
+ */
+__wsum __csum_partial(const void *buff, int len, __wsum sum);
+
+static inline __wsum csum_partial(const void *buff, int len, __wsum sum)
+{
+   if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
+   if (len == 2)
+   sum = csum_add(sum, (__force __wsum)*(const u16 *)buff);
+   if (len >= 4)
+   sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
+   if (len == 6)
+   sum = csum_add(sum, (__force __wsum)
+   *(const u16 *)(buff + 4));
+   if (len >= 8)
+   sum = csum_add(sum, (__force __wsum)
+   *(const u32 *)(buff + 4));
+   if (len == 10)
+   sum = csum_add(sum, (__force __wsum)
+   *(const u16 *)(buff + 8));
+   if (len >= 12)
+   sum = csum_add(sum, (__force __wsum)
+   *(const u32 *)(buff + 8));
+   if (len == 14)
+   sum = csum_add(sum, (__force __wsum)
+   *(const u16 *)(buff + 12));
+   if (len >= 16)
+   sum = csum_add(sum, (__force __wsum)
+   *(const u32 *)(buff + 12));
+   } else if (__builtin_constant_p(len) && (len & 3) == 0) {
+   sum = csum_add(sum, ip_fast_csum_nofold(buff, len >> 2));
+   } else {
+   sum = __csum_partial(buff, len, sum);
+   }
+   return sum;
+}
+
+/*
+ * this routine is used for miscellaneous IP-like checksums, mainly
+ * in icmp.c
+ */
+static inline __sum16 ip_compute_csum(const void *buff, int len)
+{
+   return csum_fold(csum_partial(buff, len, 0));
+}
+
 #endif
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/powerpc/lib/checksum_3

Re: [PATCH 1/7] cxlflash: Simplify PCI registration

2016-03-07 Thread Matthew R. Ochs
> On Mar 4, 2016, at 3:55 PM, Uma Krishnan  wrote:
> 
> From: "Manoj N. Kumar" 
> 
> The calls to pci_request_regions(), pci_resource_start(),
> pci_set_dma_mask(), pci_set_master() and pci_save_state() are all
> unnecessary for the IBM CXL flash adapter since data buffers
> are not required to be mapped to the device's memory.
> 
> The use of services such as pci_set_dma_mask() is problematic on
> hypervisor managed systems as the IBM CXL flash adapter is operating
> under a virtual PCI Host Bridge (virtual PHB) which does not support
> these services.
> 
> cxlflash 0001:00:00.0: init_pci: Failed to set PCI DMA mask rc=-5
> 
> The resolution is to simplify init_pci(), to a point where it does the
> bare minimum (pci_enable_device). Similarly, remove the call to
> pci_release_regions() from cxlflash_remove().
> 
> Signed-off-by: Manoj N. Kumar 
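
A sketch of what init_pci() reduces to under this description (structure and
field names are illustrative; the real driver keeps its own error reporting):

	static int init_pci(struct cxlflash_cfg *cfg)
	{
		struct pci_dev *pdev = cfg->dev;
		int rc;

		/* Bare minimum: enable the device; no BAR, DMA mask or
		 * bus-master setup, since data buffers are never mapped to
		 * the device's memory. */
		rc = pci_enable_device(pdev);
		if (rc)
			dev_err(&pdev->dev, "pci_enable_device failed rc=%d\n", rc);
		return rc;
	}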

Acked-by: Matthew R. Ochs 


Re: [PATCH 2/7] cxlflash: Unmap problem state area before detaching master context

2016-03-07 Thread Matthew R. Ochs
> On Mar 4, 2016, at 3:55 PM, Uma Krishnan  wrote:
> 
> When operating in the PowerVM environment, the cxlflash module can
> receive an error from the hypervisor indicating that there are
> existing mappings in the page table for the process MMIO space.
> 
> This issue exists because term_afu() currently invokes term_mc()
> before stop_afu(), allowing for the master context to be detached
> first and the problem state area to be unmapped second.
> 
> To resolve this issue, stop_afu() should be called before term_mc().
> 
> Signed-off-by: Uma Krishnan 
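
A sketch of the reordered teardown described above (function names follow the
commit message; the signatures and the undo level argument are illustrative):

	static void term_afu(struct cxlflash_cfg *cfg)
	{
		/* Unmap the problem state area first, so no mappings remain
		 * for the process MMIO space ... */
		stop_afu(cfg);
		/* ... then detach the master context. */
		term_mc(cfg, UNDO_START);
	}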

Acked-by: Matthew R. Ochs 


Re: [PATCH 5/7] cxlflash: Reorder user context initialization

2016-03-07 Thread Matthew R. Ochs
> On Mar 4, 2016, at 3:55 PM, Uma Krishnan  wrote:
> 
> In order to support cxlflash in the PowerVM environment, underlying
> hypervisor APIs have imposed a kernel API ordering change.
> 
> For the superpipe access to LUN, user applications need a context.
> The cxlflash module creates this context by making a sequence of
> cxl calls. In the current code, a context is initialized via
> cxl_dev_context_init() followed by cxl_process_element(), a function
> that obtains the process element id. Finally, cxl_start_work()
> is called to attach the process element.
> 
> In the PowerVM environment, a process element id cannot be obtained
> from the hypervisor until the process element is attached. The
> cxlflash module is unable to create contexts without a valid
> process element id.
> 
> To fix this problem, cxl_start_work() is called before obtaining
> the process element id.
> 
> Signed-off-by: Uma Krishnan 
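
The resulting ordering, roughly (error handling and the work structure setup
are elided):

	ctx = cxl_dev_context_init(pdev);

	/* Attach the process element first ... */
	rc = cxl_start_work(ctx, &work);

	/* ... so the hypervisor can hand back a valid process element id. */
	if (!rc)
		ctxid = cxl_process_element(ctx);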

Acked-by: Matthew R. Ochs 


Re: [PATCH 7/7] cxlflash: Increase cmd_per_lun for better throughput

2016-03-07 Thread Matthew R. Ochs
> On Mar 4, 2016, at 3:55 PM, Uma Krishnan  wrote:
> 
> From: "Manoj N. Kumar" 
> 
> With the current value of cmd_per_lun at 16, the throughput
> over a single adapter is limited to around 150kIOPS.
> 
> Increase the value of cmd_per_lun to 256 to improve
> throughput. With this change a single adapter is able to
> attain close to the maximum throughput (380kIOPS).
> Also change the number of RRQ entries that can be queued.
> 
> Signed-off-by: Manoj N. Kumar 

Acked-by: Matthew R. Ochs 


Re: [PATCH 6/7] cxlflash: Fix to avoid unnecessary scan with internal LUNs

2016-03-07 Thread Matthew R. Ochs
> On Mar 4, 2016, at 3:55 PM, Uma Krishnan  wrote:
> 
> From: "Manoj N. Kumar" 
> 
> When switching to the internal LUN defined on the
> IBM CXL flash adapter, there is an unnecessary
> scan occurring on the second port. This scan leads
> to the following extra lines in the log:
> 
> Dec 17 10:09:00 tul83p1 kernel: [ 3708.561134] cxlflash 0008:00:00.0: 
> cxlflash_queuecommand: (scp=c000fc1f0f00) 11/1/0/0 
> cdb=(A000--1000-)
> Dec 17 10:09:00 tul83p1 kernel: [ 3708.561147] process_cmd_err: cmd failed 
> afu_rc=32 scsi_rc=0 fc_rc=0 afu_extra=0xE, scsi_extra=0x0, fc_extra=0x0
> 
> By definition, both of the internal LUNs are on the first port/channel.
> 
> When the lun_mode is switched to internal LUN the
> same value for host->max_channel is retained. This
> causes an unnecessary scan over the second port/channel.
> 
> This fix alters the host->max_channel to 0 (1 port), if internal
> LUNs are configured and switches it back to 1 (2 ports) while
> going back to external LUNs.
> 
> Signed-off-by: Manoj N. Kumar 
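
A sketch of the max_channel toggle being described (field and constant names
are illustrative):

	if (afu->internal_lun)
		cfg->host->max_channel = 0;		/* port 0 only */
	else
		cfg->host->max_channel = NUM_FC_PORTS - 1;	/* both ports */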

Acked-by: Matthew R. Ochs 


[PATCH v2 1/2] cxl: Add mechanism for delivering AFU driver specific events

2016-03-07 Thread Ian Munsie
From: Ian Munsie 

This adds an afu_driver_ops structure with event_pending and
deliver_event callbacks. An AFU driver such as cxlflash can fill these
out and associate it with a context to enable passing custom AFU
specific events to userspace.

The cxl driver will call event_pending() during poll, select, read, etc.
calls to check if an AFU driver specific event is pending, and will call
deliver_event() to deliver that event. This way, the cxl driver takes
care of all the usual locking semantics around these calls and handles
all the generic cxl events, so that the AFU driver only needs to worry
about its own events.

The deliver_event() call is passed a struct cxl_event buffer to fill in.
The header will already be filled in for an AFU driver event, and the
AFU driver is expected to expand the header.size as necessary (up to
max_size, defined by struct cxl_event_afu_driver_reserved) and fill out
its own information.

Since AFU drivers provide their own means for userspace to obtain the
AFU file descriptor (i.e. cxlflash uses an ioctl on their scsi file
descriptor to obtain the AFU file descriptor) and the generic cxl driver
will never use this event, the ABI of the event is up to each individual
AFU driver.
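
As an illustration, a minimal sketch of how an AFU driver might wire this up
(the driver names and pending flag are hypothetical, and the deliver_event()
prototype is only sketched from the description above):

	static bool my_afu_event_pending(struct cxl_context *ctx)
	{
		struct my_afu *afu = cxl_get_priv(ctx);	/* added by the next patch */

		return afu && afu->event_pending;
	}

	static int my_afu_deliver_event(struct cxl_context *ctx,
					struct cxl_event *event)
	{
		/* Header is pre-filled; grow header.size and append the
		 * driver-specific payload here. */
		return 0;
	}

	static struct cxl_afu_driver_ops my_afu_ops = {
		.event_pending = my_afu_event_pending,
		.deliver_event = my_afu_deliver_event,
	};

	/* At context creation time: */
	cxl_set_driver_ops(ctx, &my_afu_ops);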

Signed-off-by: Ian Munsie 
---

Changes since v1:
- Rebased on upstream
- Bumped cxl api version to 3
- Addressed comments from mpe:
  - Clarified commit message & some comments
  - Mentioned 'cxlflash' as a possible user of this event
  - Check driver ops on registration and warn if missing calls
  - Remove redundant checks where driver ops is used
  - Simplified ctx_event_pending and removed underscore version
  - Changed deliver_event to take the context as the first argument

 drivers/misc/cxl/Kconfig |  5 +
 drivers/misc/cxl/api.c   |  8 
 drivers/misc/cxl/cxl.h   |  6 +-
 drivers/misc/cxl/file.c  | 36 +---
 include/misc/cxl.h   | 29 +
 include/uapi/misc/cxl.h  | 22 ++
 6 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 8756d06..560412c 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -15,12 +15,17 @@ config CXL_EEH
bool
default n
 
+config CXL_AFU_DRIVER_OPS
+   bool
+   default n
+
 config CXL
tristate "Support for IBM Coherent Accelerators (CXL)"
depends on PPC_POWERNV && PCI_MSI && EEH
select CXL_BASE
select CXL_KERNEL_API
select CXL_EEH
+   select CXL_AFU_DRIVER_OPS
default m
help
  Select this option to enable driver support for IBM Coherent
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index ea3eeb7..eebc9c3 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -296,6 +296,14 @@ struct cxl_context *cxl_fops_get_context(struct file *file)
 }
 EXPORT_SYMBOL_GPL(cxl_fops_get_context);
 
+void cxl_set_driver_ops(struct cxl_context *ctx,
+   struct cxl_afu_driver_ops *ops)
+{
+   WARN_ON(!ops->event_pending || !ops->deliver_event);
+   ctx->afu_driver_ops = ops;
+}
+EXPORT_SYMBOL_GPL(cxl_set_driver_ops);
+
 int cxl_start_work(struct cxl_context *ctx,
   struct cxl_ioctl_start_work *work)
 {
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index a521bc7..64e8e0a 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 extern uint cxl_verbose;
@@ -34,7 +35,7 @@ extern uint cxl_verbose;
  * Bump version each time a user API change is made, whether it is
  * backwards compatible ot not.
  */
-#define CXL_API_VERSION 2
+#define CXL_API_VERSION 3
 #define CXL_API_VERSION_COMPATIBLE 1
 
 /*
@@ -485,6 +486,9 @@ struct cxl_context {
bool pending_fault;
bool pending_afu_err;
 
+   /* Used by AFU drivers for driver specific event delivery */
+   struct cxl_afu_driver_ops *afu_driver_ops;
+
struct rcu_head rcu;
 };
 
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 783337d..d1cc297 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -295,6 +295,17 @@ int afu_mmap(struct file *file, struct vm_area_struct *vm)
return cxl_context_iomap(ctx, vm);
 }
 
+static inline bool ctx_event_pending(struct cxl_context *ctx)
+{
+   if (ctx->pending_irq || ctx->pending_fault || ctx->pending_afu_err)
+   return true;
+
+   if (ctx->afu_driver_ops)
+   return ctx->afu_driver_ops->event_pending(ctx);
+
+   return false;
+}
+
 unsigned int afu_poll(struct file *file, struct poll_table_struct *poll)
 {
struct cxl_context *ctx = file->private_data;
@@ -307,8 +318,7 @@ unsigned int afu_poll(struct file *file, struct 
poll_table_struct *poll)
pr_devel("afu_poll wait done pe: %i\n", ctx->pe);
 
spin_lock_irqsave(&ctx->lock, flags);
-   if (ctx->pendi

[PATCH v2 2/2] cxl: add set/get private data to context struct

2016-03-07 Thread Ian Munsie
From: Michael Neuling 

This provides AFU drivers a means to associate private data with a cxl
context. This is particularly intended to make the new callbacks for
driver specific events easier for AFU drivers to use, as they can easily
get back to any private data structures they may use.
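
A short usage sketch (the my_afu structure is hypothetical):

	/* At context creation, remember the driver's back pointer. */
	rc = cxl_set_priv(ctx, afu);

	/* Later, e.g. when flagging a driver specific event: */
	struct my_afu *afu = cxl_get_priv(ctx);

	if (!IS_ERR_OR_NULL(afu))
		afu->event_pending = true;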

Signed-off-by: Michael Neuling 
Signed-off-by: Ian Munsie 
---

No changes since v1

 drivers/misc/cxl/api.c | 21 +
 drivers/misc/cxl/cxl.h |  3 +++
 include/misc/cxl.h |  7 +++
 3 files changed, 31 insertions(+)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index eebc9c3..93b270c 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -91,6 +91,27 @@ int cxl_release_context(struct cxl_context *ctx)
 }
 EXPORT_SYMBOL_GPL(cxl_release_context);
 
+
+int cxl_set_priv(struct cxl_context *ctx, void *priv)
+{
+   if (!ctx)
+   return -EINVAL;
+
+   ctx->priv = priv;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_set_priv);
+
+void *cxl_get_priv(struct cxl_context *ctx)
+{
+   if (!ctx)
+   return ERR_PTR(-EINVAL);
+
+   return ctx->priv;
+}
+EXPORT_SYMBOL_GPL(cxl_get_priv);
+
 int cxl_allocate_afu_irqs(struct cxl_context *ctx, int num)
 {
if (num == 0)
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 64e8e0a..71f66e7 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -454,6 +454,9 @@ struct cxl_context {
/* Only used in PR mode */
u64 process_token;
 
+   /* driver private data */
+   void *priv;
+
unsigned long *irq_bitmap; /* Accessed from IRQ context */
struct cxl_irq_ranges irqs;
struct list_head irq_names;
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index f198a42..f99b383 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -89,6 +89,13 @@ struct cxl_context *cxl_dev_context_init(struct pci_dev 
*dev);
 int cxl_release_context(struct cxl_context *ctx);
 
 /*
+ * Set and get private data associated with a context. Allows drivers to have a
+ * back pointer to some useful structure.
+ */
+int cxl_set_priv(struct cxl_context *ctx, void *priv);
+void *cxl_get_priv(struct cxl_context *ctx);
+
+/*
  * Allocate AFU interrupts for this context. num=0 will allocate the default
  * for this AFU as given in the AFU descriptor. This number doesn't include the
  * interrupt 0 (CAIA defines AFU IRQ 0 for page faults). Each interrupt to be
-- 
2.1.4


[PATCH 6/7] iommu/fsl: PAMU power management support

2016-03-07 Thread Codrin Ciubotariu
From: Varun Sethi 

PAMU driver suspend and resume support.

Signed-off-by: Varun Sethi 
Signed-off-by: Codrin Ciubotariu 
---
 drivers/iommu/fsl_pamu.c | 155 +--
 1 file changed, 123 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index 181759e..290231a 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -35,10 +36,13 @@
 
 #define make64(high, low) (((u64)(high) << 32) | (low))
 
-struct pamu_isr_data {
+struct pamu_info {
void __iomem *pamu_reg_base;/* Base address of PAMU regs */
unsigned int count; /* The number of PAMUs */
-};
+} pamu_info_data;
+
+/* Pointer to the device configuration space */
+static struct ccsr_guts __iomem *guts_regs;
 
 static struct paace *ppaact;
 static struct paace *spaact;
@@ -104,6 +108,36 @@ static struct paace *pamu_get_ppaace(int liodn)
return &ppaact[liodn];
 }
 
+#ifdef CONFIG_SUSPEND
+/**
+ * set_dcfg_liodn() - set the device LIODN in DCFG
+ * @np: device tree node pointer
+ * @liodn: liodn value to program
+ *
+ * Returns 0 upon success else error code < 0 returned
+ */
+static int set_dcfg_liodn(struct device_node *np, int liodn)
+{
+   const __be32 *prop;
+   u32 liodn_reg_offset;
+   int len;
+   void __iomem *dcfg_region = (void *)guts_regs;
+
+   if (!dcfg_region)
+   return -ENODEV;
+
+   prop = of_get_property(np, "fsl,liodn-reg", &len);
+   if (!prop || len != 8)
+   return -EINVAL;
+
+   liodn_reg_offset = be32_to_cpup(&prop[1]);
+
+   out_be32((u32 *)(dcfg_region + liodn_reg_offset), liodn);
+
+   return 0;
+}
+#endif
+
 /**
  * pamu_enable_liodn() - Set valid bit of PACCE
  * @liodn: liodn PAACT index for desired PAACE
@@ -743,7 +777,7 @@ static void setup_omt(struct ome *omt)
  * Get the maximum number of PAACT table entries
  * and subwindows supported by PAMU
  */
-static void get_pamu_cap_values(unsigned long pamu_reg_base)
+static void get_pamu_cap_values(void *pamu_reg_base)
 {
u32 pc_val;
 
@@ -753,10 +787,8 @@ static void get_pamu_cap_values(unsigned long 
pamu_reg_base)
 }
 
 /* Setup PAMU registers pointing to PAACT, SPAACT and OMT */
-static int setup_one_pamu(unsigned long pamu_reg_base,
- unsigned long pamu_reg_size,
- phys_addr_t ppaact_phys, phys_addr_t spaact_phys,
- phys_addr_t omt_phys)
+static int setup_one_pamu(void *pamu_reg_base, phys_addr_t ppaact_phys,
+ phys_addr_t spaact_phys, phys_addr_t omt_phys)
 {
u32 *pc;
struct pamu_mmap_regs *pamu_regs;
@@ -846,7 +878,7 @@ static void setup_liodns(void)
 
 static irqreturn_t pamu_av_isr(int irq, void *arg)
 {
-   struct pamu_isr_data *data = arg;
+   struct pamu_info *data = arg;
phys_addr_t phys;
unsigned int i, j, ret;
 
@@ -1098,11 +1130,9 @@ static int fsl_pamu_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
void __iomem *pamu_regs = NULL;
-   struct ccsr_guts __iomem *guts_regs = NULL;
u32 pamubypenr, pamu_counter;
+   void __iomem *pamu_reg_base;
unsigned long pamu_reg_off;
-   unsigned long pamu_reg_base;
-   struct pamu_isr_data *data = NULL;
struct device_node *guts_node;
u64 size;
struct page *p;
@@ -1129,22 +1159,17 @@ static int fsl_pamu_probe(struct platform_device *pdev)
}
of_get_address(dev->of_node, 0, &size, NULL);
 
+   pamu_info_data.pamu_reg_base = pamu_regs;
+   pamu_info_data.count = size / PAMU_OFFSET;
+
irq = irq_of_parse_and_map(dev->of_node, 0);
if (irq == NO_IRQ) {
dev_warn(dev, "no interrupts listed in PAMU node\n");
goto error;
}
 
-   data = kzalloc(sizeof(*data), GFP_KERNEL);
-   if (!data) {
-   ret = -ENOMEM;
-   goto error;
-   }
-   data->pamu_reg_base = pamu_regs;
-   data->count = size / PAMU_OFFSET;
-
/* The ISR needs access to the regs, so we won't iounmap them */
-   ret = request_irq(irq, pamu_av_isr, 0, "pamu", data);
+   ret = request_irq(irq, pamu_av_isr, 0, "pamu", &pamu_info_data);
if (ret < 0) {
dev_err(dev, "error %i installing ISR for irq %i\n", ret, irq);
goto error;
@@ -1167,7 +1192,7 @@ static int fsl_pamu_probe(struct platform_device *pdev)
}
 
/* read in the PAMU capability registers */
-   get_pamu_cap_values((unsigned long)pamu_regs);
+   get_pamu_cap_values(pamu_regs);
/*
 * To simplify the allocation of a coherency domain, we allocate the
 * PAACT and the OMT in the same memory buffer. Unfortunately, this
@@ -1243,9 +1268,9 @@ static int fsl_pamu_probe(struct platform_device 

[PATCH 2/7] iommu/fsl: Work around erratum A-007907

2016-03-07 Thread Codrin Ciubotariu
Erratum A-007907 can cause a core hang under certain circumstances.
Part of the workaround involves not stashing to L1 Cache.  On affected
chips, stash to L2 when L1 is requested.

Signed-off-by: Scott Wood 
Signed-off-by: Varun Sethi 
Signed-off-by: Shengzhou Liu 
Signed-off-by: Codrin Ciubotariu 
---
 drivers/iommu/fsl_pamu.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index c64cdef..a00c473 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -25,6 +25,7 @@
 #include 
 
 #include 
+#include 
 
 /* define indexes for each operation mapping scenario */
 #define OMI_QMAN0x00
@@ -534,6 +535,16 @@ void get_ome_index(u32 *omi_index, struct device *dev)
*omi_index = OMI_QMAN_PRIV;
 }
 
+static bool has_erratum_a007907(void)
+{
+   u32 pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) == PVR_VER_E6500 && PVR_REV(pvr) <= 0x20)
+   return true;
+
+   return false;
+}
+
 /**
  * get_stash_id - Returns stash destination id corresponding to a
  *cache type and vcpu.
@@ -551,6 +562,9 @@ u32 get_stash_id(u32 stash_dest_hint, u32 vcpu)
int len, found = 0;
int i;
 
+   if (stash_dest_hint == PAMU_ATTR_CACHE_L1 && has_erratum_a007907())
+   stash_dest_hint = PAMU_ATTR_CACHE_L2;
+
/* Fastpath, exit early if L3/CPC cache is target for stashing */
if (stash_dest_hint == PAMU_ATTR_CACHE_L3) {
node = of_find_matching_node(NULL, l3_device_ids);
-- 
1.9.3


[PATCH 1/7] iommu/fsl: Fix most checkpatch warnings and typos

2016-03-07 Thread Codrin Ciubotariu
Signed-off-by: Codrin Ciubotariu 
---
 drivers/iommu/fsl_pamu.c| 92 +
 drivers/iommu/fsl_pamu.h| 29 +++--
 drivers/iommu/fsl_pamu_domain.c | 41 +++---
 drivers/iommu/fsl_pamu_domain.h |  2 +-
 4 files changed, 109 insertions(+), 55 deletions(-)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index a34355f..c64cdef 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -128,6 +128,10 @@ int pamu_enable_liodn(int liodn)
mb();
 
set_bf(ppaace->addr_bitfields, PAACE_AF_V, PAACE_V_VALID);
+   /*
+* Ensure that I/O devices use the new PAACE entry
+* right after this function returns
+*/
mb();
 
return 0;
@@ -150,6 +154,10 @@ int pamu_disable_liodn(int liodn)
}
 
set_bf(ppaace->addr_bitfields, PAACE_AF_V, PAACE_V_INVALID);
+   /*
+* Ensure that I/O devices no longer use this PAACE entry
+* right after this function returns
+*/
mb();
 
return 0;
@@ -226,16 +234,17 @@ static struct paace *pamu_get_spaace(struct paace *paace, 
u32 wnum)
  * function returns the index of the first SPAACE entry. The remaining
  * SPAACE entries are reserved contiguously from that index.
  *
- * Returns a valid fspi index in the range of 0 - SPAACE_NUMBER_ENTRIES on 
success.
- * If no SPAACE entry is available or the allocator can not reserve the 
required
- * number of contiguous entries function returns ULONG_MAX indicating a 
failure.
- *
+ * Returns a valid fspi index in the range of 0 - SPAACE_NUMBER_ENTRIES on
+ * success. If no SPAACE entry is available or the allocator can not reserve
+ * the required number of contiguous entries function returns ULONG_MAX
+ * indicating a failure.
  */
 static unsigned long pamu_get_fspi_and_allocate(u32 subwin_cnt)
 {
unsigned long spaace_addr;
 
-   spaace_addr = gen_pool_alloc(spaace_pool, subwin_cnt * sizeof(struct 
paace));
+   spaace_addr = gen_pool_alloc(spaace_pool, subwin_cnt *
+ sizeof(struct paace));
if (!spaace_addr)
return ULONG_MAX;
 
@@ -257,16 +266,17 @@ void pamu_free_subwins(int liodn)
if (get_bf(ppaace->addr_bitfields, PPAACE_AF_MW)) {
subwin_cnt = 1UL << (get_bf(ppaace->impl_attr, PAACE_IA_WCE) + 
1);
size = (subwin_cnt - 1) * sizeof(struct paace);
-   gen_pool_free(spaace_pool, (unsigned 
long)&spaact[ppaace->fspi], size);
+   gen_pool_free(spaace_pool,
+ (unsigned long)&spaact[ppaace->fspi], size);
set_bf(ppaace->addr_bitfields, PPAACE_AF_MW, 0);
}
 }
 
 /*
- * Function used for updating stash destination for the coressponding
+ * Function used for updating stash destination for the corresponding
  * LIODN.
  */
-int  pamu_update_paace_stash(int liodn, u32 subwin, u32 value)
+int pamu_update_paace_stash(int liodn, u32 subwin, u32 value)
 {
struct paace *paace;
 
@@ -282,6 +292,10 @@ int  pamu_update_paace_stash(int liodn, u32 subwin, u32 
value)
}
set_bf(paace->impl_attr, PAACE_IA_CID, value);
 
+   /*
+* Ensure that I/O devices see the new stash id
+* just after this function returns
+*/
mb();
 
return 0;
@@ -307,6 +321,10 @@ int pamu_disable_spaace(int liodn, u32 subwin)
   PAACE_AP_PERMS_DENIED);
}
 
+   /*
+* Ensure that I/O devices no longer use this PAACE entry
+* right after this function returns
+*/
mb();
 
return 0;
@@ -399,6 +417,10 @@ int pamu_config_ppaace(int liodn, phys_addr_t win_addr, 
phys_addr_t win_size,
set_bf(ppaace->impl_attr, PAACE_IA_WCE, 0);
set_bf(ppaace->addr_bitfields, PPAACE_AF_MW, 0);
}
+   /*
+* Ensure that I/O devices see the updated PPAACE entry
+* right after this function returns
+*/
mb();
 
return 0;
@@ -483,11 +505,16 @@ int pamu_config_spaace(int liodn, u32 subwin_cnt, u32 
subwin,
if (~stashid != 0)
set_bf(paace->impl_attr, PAACE_IA_CID, stashid);
 
+   /* Ensure that this SPAACE entry updates before we enable it */
smp_wmb();
 
if (enable)
set_bf(paace->addr_bitfields, PAACE_AF_V, PAACE_V_VALID);
 
+   /*
+* Ensure that I/O devices use this PAACE entry
+* right after this function returns
+*/
mb();
 
return 0;
@@ -553,7 +580,8 @@ u32 get_stash_id(u32 stash_dest_hint, u32 vcpu)
 found_cpu_node:
 
/* find the hwnode that represents the cache */
-   for (cache_level = PAMU_ATTR_CACHE_L1; (cache_level < 
PAMU_ATTR_CACHE_L3) && found; cache_level++) {
+   for (cache_level = PAMU_ATTR_CACHE_L1;
+(cache_level < PAMU_ATTR_CACHE_L3) && found; cache_level++) {
 

[PATCH 3/7] iommu/fsl: Enable OMT cache, before invalidating PAACT and SPAACT cache

2016-03-07 Thread Codrin Ciubotariu
From: Varun Sethi 

Enable the OMT cache before invalidating the PAACT and SPAACT cache. This
is a workaround for a PAMU hardware erratum.

Signed-off-by: Varun Sethi 
Signed-off-by: Codrin Ciubotariu 
---
 drivers/iommu/fsl_pamu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index a00c473..ce25084 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -731,6 +731,16 @@ static int setup_one_pamu(unsigned long pamu_reg_base,
pamu_regs = (struct pamu_mmap_regs *)
(pamu_reg_base + PAMU_MMAP_REGS_BASE);
 
+   /*
+* As per PAMU errata A-005982, writing the PAACT and SPAACT
+* base address registers wouldn't invalidate the corresponding
+* caches if the OMT cache is disabled. The workaround is to
+* enable the OMT cache before setting the base registers.
+* This can be done without actually enabling PAMU.
+*/
+
+   out_be32(pc, PAMU_PC_OCE);
+
/* set up pointers to corenet control blocks */
 
out_be32(&pamu_regs->ppbah, upper_32_bits(ppaact_phys));
-- 
1.9.3


[PATCH 7/7] iommu/fsl: Added cache controller compatible strings for SOCs

2016-03-07 Thread Codrin Ciubotariu
From: Varun Sethi 

Added cache controller compatible strings for T2080, B4420, T1040
and T1024. PAMU driver searches for a matching string while setting
up L3 cache stashing.

Signed-off-by: Varun Sethi 
Signed-off-by: Codrin Ciubotariu 
---
 drivers/iommu/fsl_pamu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index 290231a..47f54fe 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -63,12 +63,20 @@ static const struct of_device_id guts_device_ids[] = {
 /*
  * Table for matching compatible strings, for device tree
  * L3 cache controller node.
+ * "fsl,t1024-l3-cache-controller" corresponds to T1024,
+ * "fsl,t1040-l3-cache-controller" corresponds to T1040,
+ * "fsl,b4420-l3-cache-controller" corresponds to B4420,
+ * "fsl,t2080-l3-cache-controller" corresponds to T2080,
  * "fsl,t4240-l3-cache-controller" corresponds to T4,
  * "fsl,b4860-l3-cache-controller" corresponds to B4 &
  * "fsl,p4080-l3-cache-controller" corresponds to other,
  * SOCs.
  */
 static const struct of_device_id l3_device_ids[] = {
+   { .compatible = "fsl,t1024-l3-cache-controller", },
+   { .compatible = "fsl,t1040-l3-cache-controller", },
+   { .compatible = "fsl,b4420-l3-cache-controller", },
+   { .compatible = "fsl,t2080-l3-cache-controller", },
{ .compatible = "fsl,t4240-l3-cache-controller", },
{ .compatible = "fsl,b4860-l3-cache-controller", },
{ .compatible = "fsl,p4080-l3-cache-controller", },
@@ -647,6 +655,8 @@ u32 get_stash_id(u32 stash_dest_hint, u32 vcpu)
of_node_put(node);
return be32_to_cpup(prop);
}
+   pr_err("%s: Failed to get L3 cache controller information\n",
+  __func__);
return ~(u32)0;
}
 
-- 
1.9.3


[PATCH 0/7] PAMU driver update

2016-03-07 Thread Codrin Ciubotariu
This patchset addresses a few issues found in the PAMU IOMMU
and adds small changes to enable power management and to support the
L3 cache controller on some newer boards.

The series starts with a clean-up patch, followed by two
errata fixes: A-007907 and A-005982. It continues with
two fixes for PCIe support. The last two patches add support
for power management and compatible strings for new L3 controller
device-tree nodes.

Codrin Ciubotariu (2):
  iommu/fsl: Fix most checkpatch warnings and typos
  iommu/fsl: Work around erratum A-007907

Varun Sethi (5):
  iommu/fsl: Enable OMT cache, before invalidating PAACT and SPAACT
cache
  iommu/fsl: Factor out PCI specific code
  iommu/fsl: Enable default DMA window for PCIe devices once detached
from domain
  iommu/fsl: PAMU power management support
  iommu/fsl: Added cache controller compatible strings for SOCs

 drivers/iommu/fsl_pamu.c| 322 
 drivers/iommu/fsl_pamu.h|  30 ++--
 drivers/iommu/fsl_pamu_domain.c | 160 +---
 drivers/iommu/fsl_pamu_domain.h |   2 +-
 4 files changed, 381 insertions(+), 133 deletions(-)

-- 
1.9.3


[PATCH 5/7] iommu/fsl: Enable default DMA window for PCIe devices once detached from domain

2016-03-07 Thread Codrin Ciubotariu
From: Varun Sethi 

Once the PCIe device assigned to a guest VM (via VFIO) gets detached
from the iommu domain (when the guest terminates), its PAMU table entry
is disabled. So, this would prevent the device from being used once
it's assigned back to the host.

This patch allows for creation of a default DMA window corresponding
to the device and subsequently enabling the PAMU table entry. Before
we enable the entry, we ensure that the device's bus master capability
is disabled (device quiesced).

Signed-off-by: Varun Sethi 
Signed-off-by: Codrin Ciubotariu 
---
 drivers/iommu/fsl_pamu.c| 45 -
 drivers/iommu/fsl_pamu.h|  1 +
 drivers/iommu/fsl_pamu_domain.c | 42 +++---
 3 files changed, 76 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index ce25084..181759e 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -302,6 +302,40 @@ int pamu_update_paace_stash(int liodn, u32 subwin, u32 
value)
return 0;
 }
 
+/* Default PPAACE settings for an LIODN */
+static void setup_default_ppaace(struct paace *ppaace)
+{
+   pamu_init_ppaace(ppaace);
+   /* window size is 2^(WSE+1) bytes */
+   set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
+   ppaace->wbah = 0;
+   set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
+   set_bf(ppaace->impl_attr, PAACE_IA_ATM,
+   PAACE_ATM_NO_XLATE);
+   set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
+   PAACE_AP_PERMS_ALL);
+}
+
+/* Reset the PAACE entry to the default state */
+void enable_default_dma_window(int liodn)
+{
+   struct paace *ppaace;
+
+   ppaace = pamu_get_ppaace(liodn);
+   if (!ppaace) {
+   pr_debug("Invalid liodn entry\n");
+   return;
+   }
+
+   memset(ppaace, 0, sizeof(struct paace));
+
+   setup_default_ppaace(ppaace);
+
+   /* Ensure that all other stores to the ppaace complete first */
+   mb();
+   pamu_enable_liodn(liodn);
+}
+
 /* Disable a subwindow corresponding to the LIODN */
 int pamu_disable_spaace(int liodn, u32 subwin)
 {
@@ -792,15 +826,8 @@ static void setup_liodns(void)
continue;
}
ppaace = pamu_get_ppaace(liodn);
-   pamu_init_ppaace(ppaace);
-   /* window size is 2^(WSE+1) bytes */
-   set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
-   ppaace->wbah = 0;
-   set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
-   set_bf(ppaace->impl_attr, PAACE_IA_ATM,
-  PAACE_ATM_NO_XLATE);
-   set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
-  PAACE_AP_PERMS_ALL);
+   setup_default_ppaace(ppaace);
+
if (of_device_is_compatible(node, "fsl,qman-portal"))
setup_qbman_paace(ppaace, QMAN_PORTAL_PAACE);
if (of_device_is_compatible(node, "fsl,qman"))
diff --git a/drivers/iommu/fsl_pamu.h b/drivers/iommu/fsl_pamu.h
index bebc2e3..3bd0434 100644
--- a/drivers/iommu/fsl_pamu.h
+++ b/drivers/iommu/fsl_pamu.h
@@ -412,5 +412,6 @@ void get_ome_index(u32 *omi_index, struct device *dev);
 int  pamu_update_paace_stash(int liodn, u32 subwin, u32 value);
 int pamu_disable_spaace(int liodn, u32 subwin);
 u32 pamu_get_max_subwin_cnt(void);
+void enable_default_dma_window(int liodn);
 
 #endif  /* __FSL_PAMU_H */
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 37f95d3..ba2f97b 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -327,17 +327,53 @@ static struct fsl_dma_domain *iommu_alloc_dma_domain(void)
return domain;
 }
 
+/* Disable device DMA capability and enable default DMA window */
+static void disable_device_dma(struct device_domain_info *info,
+  int enable_dma_window)
+{
+#ifdef CONFIG_PCI
+   if (dev_is_pci(info->dev))
+   pci_clear_master(to_pci_dev(info->dev));
+#endif
+
+   if (enable_dma_window)
+   enable_default_dma_window(info->liodn);
+}
+
+static int check_for_shared_liodn(struct device_domain_info *info)
+{
+   struct device_domain_info *tmp;
+
+   /*
+* Sanity check, to ensure that this is not a
+* shared LIODN. In case of a PCIe controller
+* it's possible that all PCIe devices share
+* the same LIODN.
+*/
+   list_for_each_entry(tmp, &info->domain->devices, link) {
+   if (info->liodn == tmp->liodn)
+   return 1;
+   }
+
+   return 0;
+}
+
 static void remove_device_ref(struct device_domain_info *info, u32 win_cnt)
 {
unsigned long flags;
+   int enable_dma_window = 0;
 
list

[PATCH 4/7] iommu/fsl: Factor out PCI specific code

2016-03-07 Thread Codrin Ciubotariu
From: Varun Sethi 

Factor out PCI specific code in the PAMU driver.

Signed-off-by: Varun Sethi 
Signed-off-by: Codrin Ciubotariu 
---
 drivers/iommu/fsl_pamu_domain.c | 77 ++---
 1 file changed, 41 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 869e55e..37f95d3 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -661,21 +661,14 @@ static int handle_attach_device(struct fsl_dma_domain 
*dma_domain,
return ret;
 }
 
-static int fsl_pamu_attach_device(struct iommu_domain *domain,
- struct device *dev)
+static struct device *get_dma_device(struct device *dev)
 {
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
-   const u32 *liodn;
-   u32 liodn_cnt;
-   int len, ret = 0;
-   struct pci_dev *pdev = NULL;
-   struct pci_controller *pci_ctl;
-
-   /*
-* Use LIODN of the PCI controller while attaching a
-* PCI device.
-*/
+   struct device *dma_dev = dev;
+#ifdef CONFIG_PCI
if (dev_is_pci(dev)) {
+   struct pci_controller *pci_ctl;
+   struct pci_dev *pdev;
+
pdev = to_pci_dev(dev);
pci_ctl = pci_bus_to_host(pdev->bus);
/*
@@ -683,16 +676,30 @@ static int fsl_pamu_attach_device(struct iommu_domain 
*domain,
 * so we can get the LIODN programmed by
 * u-boot.
 */
-   dev = pci_ctl->parent;
+   dma_dev = pci_ctl->parent;
}
+#endif
+   return dma_dev;
+}
+
+static int fsl_pamu_attach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
+   struct device *dma_dev;
+   const u32 *liodn;
+   u32 liodn_cnt;
+   int len, ret = 0;
+
+   dma_dev = get_dma_device(dev);
 
-   liodn = of_get_property(dev->of_node, "fsl,liodn", &len);
+   liodn = of_get_property(dma_dev->of_node, "fsl,liodn", &len);
if (liodn) {
liodn_cnt = len / sizeof(u32);
ret = handle_attach_device(dma_domain, dev, liodn, liodn_cnt);
} else {
pr_debug("missing fsl,liodn property at %s\n",
-dev->of_node->full_name);
+dma_dev->of_node->full_name);
ret = -EINVAL;
}
 
@@ -703,27 +710,13 @@ static void fsl_pamu_detach_device(struct iommu_domain 
*domain,
   struct device *dev)
 {
struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
+   struct device *dma_dev;
const u32 *prop;
int len;
-   struct pci_dev *pdev = NULL;
-   struct pci_controller *pci_ctl;
 
-   /*
-* Use LIODN of the PCI controller while detaching a
-* PCI device.
-*/
-   if (dev_is_pci(dev)) {
-   pdev = to_pci_dev(dev);
-   pci_ctl = pci_bus_to_host(pdev->bus);
-   /*
-* make dev point to pci controller device
-* so we can get the LIODN programmed by
-* u-boot.
-*/
-   dev = pci_ctl->parent;
-   }
+   dma_dev = get_dma_device(dev);
 
-   prop = of_get_property(dev->of_node, "fsl,liodn", &len);
+   prop = of_get_property(dma_dev->of_node, "fsl,liodn", &len);
if (prop)
detach_device(dev, dma_domain);
else
@@ -884,6 +877,7 @@ static struct iommu_group *get_device_iommu_group(struct 
device *dev)
return group;
 }
 
+#ifdef CONFIG_PCI
 static  bool check_pci_ctl_endpt_part(struct pci_controller *pci_ctl)
 {
u32 version;
@@ -927,6 +921,10 @@ static struct iommu_group *get_pci_device_group(struct 
pci_dev *pdev)
bool pci_endpt_partioning;
struct iommu_group *group = NULL;
 
+   /* Don't create device groups for virtual PCI bridges */
+   if (pdev->subordinate)
+   return NULL;
+
pci_ctl = pci_bus_to_host(pdev->bus);
pci_endpt_partioning = check_pci_ctl_endpt_part(pci_ctl);
/* We can partition PCIe devices so assign device group to the device */
@@ -963,6 +961,7 @@ static struct iommu_group *get_pci_device_group(struct 
pci_dev *pdev)
 
return group;
 }
+#endif
 
 static struct iommu_group *fsl_pamu_device_group(struct device *dev)
 {
@@ -973,10 +972,14 @@ static struct iommu_group *fsl_pamu_device_group(struct 
device *dev)
 * For platform devices we allocate a separate group for
 * each of the devices.
 */
-   if (dev_is_pci(dev))
+   if (!dev_is_pci(dev)) {
+   if (of_get_property(dev->of_node, "fsl, liodn", &len))
+   group = get_device_iommu_group(dev);
+#ifdef CONFIG_PCI
+   } else {
gr

[PATCH] powerpc/process: fix altivec SPR not being saved

2016-03-07 Thread Oliver O'Halloran
save_sprs() in process.c contains the following test:

if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
t->vrsave = mfspr(SPRN_VRSAVE);

The CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE, so the test
is equivalent to:

if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
cpu_has_feature(CPU_FTR_COHERENT_ICACHE))

On CPUs without support for both (i.e. the G5) this results in vrsave not being
saved between context switches. The vector register save/restore code
doesn't use VRSAVE to determine which registers to save/restore,
but the value of VRSAVE is used to determine if altivec is being used
in several code paths.

Signed-off-by: Oliver O'Halloran 
Signed-off-by: Anton Blanchard 
Fixes: 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and 
restore_sprs()")
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dccc87e8fee5..bc6aa87a3b12 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -854,7 +854,7 @@ void restore_tm_state(struct pt_regs *regs)
 static inline void save_sprs(struct thread_struct *t)
 {
 #ifdef CONFIG_ALTIVEC
-   if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC)))
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
t->vrsave = mfspr(SPRN_VRSAVE);
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
-- 
2.5.0


Re: How to merge? (was Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc)

2016-03-07 Thread Jiri Kosina
On Mon, 7 Mar 2016, Michael Ellerman wrote:

> > This aligns with my usual workflow, so that'd be my preferred way of doing
> > things; i.e. you put all the ftrace changes into a separate topic branch,
> > and then
> >
> > - you pull that branch into powerpc#next
> > - I pull that branch into livepatching tree
> > - I apply the ppc livepatching support on top of that
> > - I send a pull request to Linus only after powerpc#next gets merged to
> >   Linus' tree
> >
> > Sounds good?
> 
> Yep, here it is:
> 
>   
> https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/log/?h=topic/mprofile-kernel
> 
> aka:
> 
>   git fetch git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> topic/mprofile-kernel

Excellent, thanks.

> I haven't merged it into my next yet, but I will tomorrow unless you tell me
> there's something wrong with it.

There is one remaining issue which I think would be really nice to 
have(TM), and that's Steven's Ack for the whole thing :)

For the livepatching part, I don't think we are quite there yet (so maybe 
it'll miss the upcoming merge window anyway).

My primary worry there is what Torsten pointed out, i.e. functions with 
either varargs or more than 8 args needing special care.

Also, I'd like to have this positively reviewed by at least one more 
livepatching maintainer (I am currently looking into it myself, but my 
understanding of powerpc arch is rather low, so the more eyes, the 
better).

Thanks!

-- 
Jiri Kosina
SUSE Labs


Re: How to merge? (was Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc)

2016-03-07 Thread Michael Ellerman
On Mon, 2016-03-07 at 23:52 +0100, Jiri Kosina wrote:
> On Mon, 7 Mar 2016, Michael Ellerman wrote:
>
> > > This aligns with my usual workflow, so that'd be my preferred way of doing
> > > things; i.e. you put all the ftrace changes into a separate topic branch,
> > > and then
> > >
> > > - you pull that branch into powerpc#next
> > > - I pull that branch into livepatching tree
> > > - I apply the ppc livepatching support on top of that
> > > - I send a pull request to Linus only after powerpc#next gets merged to
> > >   Linus' tree
> > >
> > > Sounds good?
> >
> > Yep, here it is:
> >
> >   
> > https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/log/?h=topic/mprofile-kernel
> >
> > aka:
> >
> >   git fetch git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> > topic/mprofile-kernel
>
> Excellent, thanks.
>
> > I haven't merged it into my next yet, but I will tomorrow unless you tell me
> > there's something wrong with it.
>
> There is one remaining issue which I think would be really nice to
> have(TM), and that's Steven's Ack for the whole thing :)

Yeah. He's been on CC the whole time, but he's probably getting a bit sick of
it all, as we're up to about version 15. So I figure if he really hated it he'd
have said so by now :) - but an Ack would still be good.

> For the livepatching part, I don't think we are quite there yet (so maybe
> it'll miss the upcoming merge window anyway).
>
> My primary worry there is what Torsten pointed out, i.e. functions with
> either varargs or more than 8 args needing special care.

Yeah true. My preference would be to merge it, but mark LIVEPATCH as
experimental on powerpc. I think having it in the tree would help it get more
testing, and probably find other bugs too. But it's up to you guys.

> Also, I'd like to have this positively reviewed by at least one more
> livepatching maintainer (I am currently looking into it myself, but my
> understanding of powerpc arch is rather low, so the more eyes, the
> better).

Sure. I can answer powerpc questions, though Torsten is probably the person who
has the best understanding of (livepatching && powerpc).

cheers


Re: How to merge? (was Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc)

2016-03-07 Thread Josh Poimboeuf
On Mon, Mar 07, 2016 at 11:52:31PM +0100, Jiri Kosina wrote:
> On Mon, 7 Mar 2016, Michael Ellerman wrote:
> 
> > > This aligns with my usual workflow, so that'd be my preferred way of doing
> > > things; i.e. you put all the ftrace changes into a separate topic branch,
> > > and then
> > >
> > > - you pull that branch into powerpc#next
> > > - I pull that branch into livepatching tree
> > > - I apply the ppc livepatching support on top of that
> > > - I send a pull request to Linus only after powerpc#next gets merged to
> > >   Linus' tree
> > >
> > > Sounds good?
> > 
> > Yep, here it is:
> > 
> >   
> > https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/log/?h=topic/mprofile-kernel
> > 
> > aka:
> > 
> >   git fetch git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> > topic/mprofile-kernel
> 
> Excellent, thanks.
> 
> > I haven't merged it into my next yet, but I will tomorrow unless you tell me
> > there's something wrong with it.
> 
> There is one remaining issue which I think would be really nice to 
> have(TM), and that's Steven's Ack for the whole thing :)
> 
> For the livepatching part, I don't think we are quite there yet (so maybe 
> it'll miss the upcoming merge window anyway).
> 
> My primary worry there is what Torsten pointed out, i.e. functions with 
> either varargs or more than 8 args needing special care.
> 
> Also, I'd like to have this positively reviewed by at least one more 
> livepatching maintainer (I am currently looking into it myself, but my 
> understanding of powerpc arch is rather low, so the more eyes, the 
> better).

It's been a few years but I'll try to dust off my powerpc chops and give
it a proper review.

-- 
Josh

Re: [PATCH] powerpc: mm: fixup preempt undefflow with huge pages

2016-03-07 Thread Benjamin Herrenschmidt
On Mon, 2016-03-07 at 21:04 +0530, Aneesh Kumar K.V wrote:
> Sebastian Andrzej Siewior  writes:
> 
> While you are there, can you also fix the wrong indentation on line
> 423
> ?

 .../...

Also this looks like stable material no ?

Cheers,
Ben.


Re: [PATCH] dma driver: fix potential oom issue of fsldma

2016-03-07 Thread Li Yang
The change looks to be correct.  But we need better formatting and description.

Make the title something like:

dmaengine: fsldma: fix memory leak

On Thu, Dec 24, 2015 at 1:26 AM, Xuelin Shi  wrote:
> From: Xuelin Shi 
>
> missing unmap sources and destinations while doing dequeue.

How can this describe your change?

Regards,
Leo
>
> Signed-off-by: Xuelin Shi 
> ---
>  drivers/dma/fsldma.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 2209f75..aac85c3 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -522,6 +522,8 @@ static dma_cookie_t fsldma_run_tx_complete_actions(struct 
> fsldma_chan *chan,
> chan_dbg(chan, "LD %p callback\n", desc);
> txd->callback(txd->callback_param);
> }
> +
> +   dma_descriptor_unmap(txd);
> }
>
> /* Run any dependencies */
> --
> 1.8.4
>

Re: [PATCH] dma driver: fix potential oom issue of fsldma

2016-03-07 Thread Li Yang
On Thu, Dec 24, 2015 at 1:26 AM, Xuelin Shi  wrote:
> From: Xuelin Shi 

And please cc dmaengine maintainers and its mailing list when you send
next version.

>
> missing unmap sources and destinations while doing dequeue.
>
> Signed-off-by: Xuelin Shi 
> ---
>  drivers/dma/fsldma.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 2209f75..aac85c3 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -522,6 +522,8 @@ static dma_cookie_t fsldma_run_tx_complete_actions(struct 
> fsldma_chan *chan,
> chan_dbg(chan, "LD %p callback\n", desc);
> txd->callback(txd->callback_param);
> }
> +
> +   dma_descriptor_unmap(txd);
> }
>
> /* Run any dependencies */
> --
> 1.8.4
>

Re: [PATCH v2 1/2] cxl: Add mechanism for delivering AFU driver specific events

2016-03-07 Thread Matt Ochs
A couple of minor nits below...

> On Mar 7, 2016, at 12:59 PM, Ian Munsie  wrote:
> 
> @@ -346,7 +350,7 @@ ssize_t afu_read(struct file *file, char __user *buf, 
> size_t count,
> 
>   for (;;) {
>   prepare_to_wait(&ctx->wq, &wait, TASK_INTERRUPTIBLE);
> - if (ctx_event_pending(ctx))
> + if (ctx_event_pending(ctx) || (ctx->status == CLOSED))
>   break;
> 
>   if (!cxl_adapter_link_ok(ctx->afu->adapter)) {
> @@ -376,7 +380,14 @@ ssize_t afu_read(struct file *file, char __user *buf, 
> size_t count,
>   memset(&event, 0, sizeof(event));
>   event.header.process_element = ctx->pe;
>   event.header.size = sizeof(struct cxl_event_header);
> - if (ctx->pending_irq) {
> +
> + if (ctx->afu_driver_ops && ctx->afu_driver_ops->event_pending(ctx)) {
> + pr_devel("afu_read delivering AFU driver specific event\n");
> + event.header.type = CXL_EVENT_AFU_DRIVER;
> + ctx->afu_driver_ops->deliver_event(ctx, &event, sizeof(event));
> + WARN_ON(event.header.size > sizeof(event));
> +
> + } else if (ctx->pending_irq) {
>   pr_devel("afu_read delivering AFU interrupt\n");
>   event.header.size += sizeof(struct cxl_event_afu_interrupt);
>   event.header.type = CXL_EVENT_AFU_INTERRUPT;
> @@ -384,6 +395,7 @@ ssize_t afu_read(struct file *file, char __user *buf, 
> size_t count,
>   clear_bit(event.irq.irq - 1, ctx->irq_bitmap);
>   if (bitmap_empty(ctx->irq_bitmap, ctx->irq_count))
>   ctx->pending_irq = false;
> +
>   } else if (ctx->pending_fault) {
>   pr_devel("afu_read delivering data storage fault\n");
>   event.header.size += sizeof(struct cxl_event_data_storage);
> @@ -391,12 +403,14 @@ ssize_t afu_read(struct file *file, char __user *buf, 
> size_t count,
>   event.fault.addr = ctx->fault_addr;
>   event.fault.dsisr = ctx->fault_dsisr;
>   ctx->pending_fault = false;
> +
>   } else if (ctx->pending_afu_err) {
>   pr_devel("afu_read delivering afu error\n");
>   event.header.size += sizeof(struct cxl_event_afu_error);
>   event.header.type = CXL_EVENT_AFU_ERROR;
>   event.afu_error.error = ctx->afu_err;
>   ctx->pending_afu_err = false;
> +

Any reason for adding these extra lines as part of this commit?

> diff --git a/include/misc/cxl.h b/include/misc/cxl.h
> index f2ffe5b..f198a42 100644
> --- a/include/misc/cxl.h
> +++ b/include/misc/cxl.h
> @@ -210,4 +210,33 @@ ssize_t cxl_fd_read(struct file *file, char __user *buf, 
> size_t count,
> void cxl_perst_reloads_same_image(struct cxl_afu *afu,
> bool perst_reloads_same_image);
> 
> +/*
> + * AFU driver ops allows an AFU driver to create their own events to pass to
> + * userspace through the file descriptor as a simpler alternative to 
> overriding
> + * the read() and poll() calls that works with the generic cxl events. These
> + * events are given priority over the generic cxl events, so will be 
> delivered

so _they_ will be delivered

> + * first if multiple types of events are pending.
> + *
> + * even_pending() will be called by the cxl driver to check if an event is
> + * pending (e.g. in select/poll/read calls).

event_pending() <- missing 't'

> + *
> + * deliver_event() will be called to fill out a cxl_event structure with the
> + * driver specific event. The header will already have the type and
> + * process_element fields filled in, and header.size will be set to
> + * sizeof(struct cxl_event_header). The AFU driver can extend that size up to
> + * max_size (if an afu driver requires more space, they should submit a patch
> + * increasing the size in the struct cxl_event_afu_driver_reserved 
> definition).
> + *
> + * Both of these calls are made with a spin lock held, so they must not 
> sleep.
> + */
> +struct cxl_afu_driver_ops {
> + bool (*event_pending) (struct cxl_context *ctx);
> + void (*deliver_event) (struct cxl_context *ctx,
> + struct cxl_event *event, size_t max_size);
> +};
> +
> +/* Associate the above driver ops with a specific context */
> +void cxl_set_driver_ops(struct cxl_context *ctx,
> + struct cxl_afu_driver_ops *ops);
> +
> #endif /* _MISC_CXL_H */
> diff --git a/include/uapi/misc/cxl.h b/include/uapi/misc/cxl.h
> index 1e889aa..8b097db 100644
> --- a/include/uapi/misc/cxl.h
> +++ b/include/uapi/misc/cxl.h
> @@ -69,6 +69,7 @@ enum cxl_event_type {
>   CXL_EVENT_AFU_INTERRUPT = 1,
>   CXL_EVENT_DATA_STORAGE  = 2,
>   CXL_EVENT_AFU_ERROR = 3,
> + CXL_EVENT_AFU_DRIVER= 4,
> };
> 
> struct cxl_event_header {
> @@ -100,12 +101,33 @@ struct cxl_event_afu_error {
>   __u64 error;
> };
> 
> +struct cxl_event_afu_driver_reserved {
> + /*
> +  * Reserves space for AFU driver specific

Re: [PATCH v2 2/2] cxl: add set/get private data to context struct

2016-03-07 Thread Matt Ochs
> On Mar 7, 2016, at 12:59 PM, Ian Munsie  wrote:
> 
> From: Michael Neuling 
> 
> This provides AFU drivers a means to associate private data with a cxl
> context. This is particularly intended to make the new callbacks for
> driver specific events easier for AFU drivers to use, as they can easily
> get back to any private data structures they may use.
> 
> Signed-off-by: Michael Neuling 
> Signed-off-by: Ian Munsie 

Reviewed-by: Matthew R. Ochs 


Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()

2016-03-07 Thread Gavin Shan
On Tue, Mar 01, 2016 at 08:40:12PM -0600, Rob Herring wrote:
>On Thu, Feb 18, 2016 at 9:16 PM, Gavin Shan  wrote:
>> On Wed, Feb 17, 2016 at 08:59:53AM -0600, Rob Herring wrote:
>>>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan  
>>>wrote:
 This renames unflatten_dt_node() to unflatten_dt_nodes() as it
 populates multiple device nodes from FDT blob. No logical changes
 introduced.

 Signed-off-by: Gavin Shan 
 ---
  drivers/of/fdt.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)
>>>
>>>Acked-by: Rob Herring 
>>>
>>>I'm happy to take patches 40-42 for 4.6 if the rest of the series
>>>doesn't go in given they fix a separate problem. I just need to know
>>>soon (or at least they need to go into -next soon).
>>>
>>
>> Thanks for quick response, Rob. It depends how much comments I will
>> receive for the powerpc/powernv part. Except that, all parts including
>> this one have been ack'ed. I can discuss it with Michael Ellerman.
>> By the way, how soon you need the decision to merge 40-42? If that's
>> one or two weeks later, I don't think the review of the whole series
>> can be done.
>
>Well, it's been 2 weeks now. I need to know this week.
>
>> Also, I think you probably can merge 40-44 as they're all about
>> fdt.c. If they can be merged at one time, I needn't bother (cc)
>> you again if I need send a updated revision. Thanks for your
>> review.
>
>I did not include 43 and 44 as they are only needed for the rest of your 
>series.
>

Rob, sorry for the late response. I was really hoping for this series to be
merged for 4.6 and was checking reviewers' bandwidth to review it.
Unfortunately, I haven't received any comments other than yours so far, which
means this series has to miss 4.6. Please pick/merge 41 and 42 if nobody
objects. Thanks again for your time on this.

Thanks,
Gavin


RE: [PATCH v3 3/7] QE: Add uqe_serial document to bindings

2016-03-07 Thread Qiang Zhao
On Tue, Mar 08, 2016 at 1:28AM, Scott Wood wrote:
> -Original Message-
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: Tuesday, March 08, 2016 1:28 AM
> To: Qiang Zhao ; Rob Herring 
> Cc: Yang-Leo Li ; Xiaobo Xie ;
> linux-ker...@vger.kernel.org; devicet...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Subject: Re: [PATCH v3 3/7] QE: Add uqe_serial document to bindings
> 
> On Mon, 2016-03-07 at 02:35 +, Qiang Zhao wrote:
> > On Tue, Mar 05, 2016 at 12:26PM, Rob Herring wrote:
> > > -Original Message-
> > > From: Rob Herring [mailto:r...@kernel.org]
> > > Sent: Saturday, March 05, 2016 12:26 PM
> > > To: Qiang Zhao 
> > > Cc: o...@buserror.net; Yang-Leo Li ; Xiaobo Xie
> > > ; linux-ker...@vger.kernel.org;
> > > devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> > > Subject: Re: [PATCH v3 3/7] QE: Add uqe_serial document to bindings
> > >
> > > On Tue, Mar 01, 2016 at 03:09:39PM +0800, Zhao Qiang wrote:
> > > > Add uqe_serial document to
> > > > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.tx
> > > > t
> > > >
> > > > Signed-off-by: Zhao Qiang 
> > > > ---
> > > > Changes for v2
> > > > - modify tx/rx-clock-name specification Changes for v2
> > > > - NA
> > > >
> > > >  .../bindings/powerpc/fsl/cpm_qe/uqe_serial.txt| 19
> > > +++
> > > >  1 file changed, 19 insertions(+)
> > > >  create mode 100644
> > > > Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.tx
> > > > t
> > > >
> > > > diff --git
> > > > a/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.
> > > > txt
> > > > b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.
> > > > txt
> > > > new file mode 100644
> > > > index 000..436c71c
> > > > --- /dev/null
> > > > +++
> b/Documentation/devicetree/bindings/powerpc/fsl/cpm_qe/uqe_serial.
> > > > +++ txt
> > > > @@ -0,0 +1,19 @@
> > > > +* Serial
> > > > +
> > > > +Currently defined compatibles:
> > > > +- ucc_uart
> > >
> > > I guess this is in use already and okay. However, looking at the
> > > driver there really should be SoC specific compatible strings here
> > > since the driver is looking up the SoC compatible string and
> > > composing the firmware filename from that.
> >
> > OK, I will change both the driver and this compatible string.
> 
> But don't break existing device trees while doing so.

Do I need to send the driver patch in this patchset or as an individual patch?

BR
-Zhao Qiang


Re: [PATCH v2 1/2] cxl: Add mechanism for delivering AFU driver specific events

2016-03-07 Thread Ian Munsie
Excerpts from Matt Ochs's message of 2016-03-08 11:26:55 +1100:
> Any reason for adding these extra lines as part of this commit?

mpe asked for some newlines here in the v1 submission, and it only
really made sense to do so if all the related sections had consistent
whitespace as well.

> > +/*
> > + * AFU driver ops allows an AFU driver to create their own events to pass 
> > to
> > + * userspace through the file descriptor as a simpler alternative to 
> > overriding
> > + * the read() and poll() calls that works with the generic cxl events. 
> > These
> > + * events are given priority over the generic cxl events, so will be 
> > delivered
> 
> so _they_ will be delivered

thanks for spotting that...

> > + *
> > + * even_pending() will be called by the cxl driver to check if an event is
> > + * pending (e.g. in select/poll/read calls).
> 
> event_pending() <- missing 't'

...and that.

Cheers,
-Ian


[PATCH v3 1/2] cxl: Add mechanism for delivering AFU driver specific events

2016-03-07 Thread Ian Munsie
From: Ian Munsie 

This adds an afu_driver_ops structure with event_pending and
deliver_event callbacks. An AFU driver such as cxlflash can fill these
out and associate it with a context to enable passing custom AFU
specific events to userspace.

The cxl driver will call event_pending() during poll, select, read, etc.
calls to check if an AFU driver specific event is pending, and will call
deliver_event() to deliver that event. This way, the cxl driver takes
care of all the usual locking semantics around these calls and handles
all the generic cxl events, so that the AFU driver only needs to worry
about its own events.

The deliver_event() call is passed a struct cxl_event buffer to fill in.
The header will already be filled in for an AFU driver event, and the
AFU driver is expected to expand the header.size as necessary (up to
max_size, defined by struct cxl_event_afu_driver_reserved) and fill out
its own information.

Since AFU drivers provide their own means for userspace to obtain the
AFU file descriptor (e.g. cxlflash uses an ioctl on its SCSI file
descriptor to obtain the AFU file descriptor) and the generic cxl driver
will never use this event, the ABI of the event is up to each individual
AFU driver.
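
As a rough sketch of the intended usage (hypothetical AFU driver; all
"my_afu" names are invented for illustration and are not part of this
patch):

struct my_afu {                                 /* hypothetical driver state */
        bool event_queued;
};

static bool my_afu_event_pending(struct cxl_context *ctx)
{
        struct my_afu *afu = cxl_get_priv(ctx); /* see patch 2/2 */

        return afu && afu->event_queued;        /* hypothetical flag */
}

static void my_afu_deliver_event(struct cxl_context *ctx,
                                 struct cxl_event *event, size_t max_size)
{
        /*
         * header.type and header.process_element are already filled in;
         * extend header.size (up to max_size) and copy in the driver
         * specific payload here.
         */
}

static struct cxl_afu_driver_ops my_afu_ops = {
        .event_pending = my_afu_event_pending,
        .deliver_event = my_afu_deliver_event,
};

static void my_afu_init_context(struct cxl_context *ctx)
{
        /* ctx obtained earlier, e.g. from cxl_dev_context_init() */
        cxl_set_driver_ops(ctx, &my_afu_ops);
}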

Signed-off-by: Ian Munsie 
---

Changes since v2:
- Fixed some typos spotted by Matt Ochs

Changes since v1:
- Rebased on upstream
- Bumped cxl api version to 3
- Addressed comments from mpe:
  - Clarified commit message & some comments
  - Mentioned 'cxlflash' as a possible user of this event
  - Check driver ops on registration and warn if missing calls
  - Remove redundant checks where driver ops is used
  - Simplified ctx_event_pending and removed underscore version
  - Changed deliver_event to take the context as the first argument

 drivers/misc/cxl/Kconfig |  5 +
 drivers/misc/cxl/api.c   |  8 
 drivers/misc/cxl/cxl.h   |  6 +-
 drivers/misc/cxl/file.c  | 36 +---
 include/misc/cxl.h   | 29 +
 include/uapi/misc/cxl.h  | 22 ++
 6 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 8756d06..560412c 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -15,12 +15,17 @@ config CXL_EEH
bool
default n
 
+config CXL_AFU_DRIVER_OPS
+   bool
+   default n
+
 config CXL
tristate "Support for IBM Coherent Accelerators (CXL)"
depends on PPC_POWERNV && PCI_MSI && EEH
select CXL_BASE
select CXL_KERNEL_API
select CXL_EEH
+   select CXL_AFU_DRIVER_OPS
default m
help
  Select this option to enable driver support for IBM Coherent
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index ea3eeb7..eebc9c3 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -296,6 +296,14 @@ struct cxl_context *cxl_fops_get_context(struct file *file)
 }
 EXPORT_SYMBOL_GPL(cxl_fops_get_context);
 
+void cxl_set_driver_ops(struct cxl_context *ctx,
+   struct cxl_afu_driver_ops *ops)
+{
+   WARN_ON(!ops->event_pending || !ops->deliver_event);
+   ctx->afu_driver_ops = ops;
+}
+EXPORT_SYMBOL_GPL(cxl_set_driver_ops);
+
 int cxl_start_work(struct cxl_context *ctx,
   struct cxl_ioctl_start_work *work)
 {
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index a521bc7..64e8e0a 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 extern uint cxl_verbose;
@@ -34,7 +35,7 @@ extern uint cxl_verbose;
  * Bump version each time a user API change is made, whether it is
  * backwards compatible ot not.
  */
-#define CXL_API_VERSION 2
+#define CXL_API_VERSION 3
 #define CXL_API_VERSION_COMPATIBLE 1
 
 /*
@@ -485,6 +486,9 @@ struct cxl_context {
bool pending_fault;
bool pending_afu_err;
 
+   /* Used by AFU drivers for driver specific event delivery */
+   struct cxl_afu_driver_ops *afu_driver_ops;
+
struct rcu_head rcu;
 };
 
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 783337d..d1cc297 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -295,6 +295,17 @@ int afu_mmap(struct file *file, struct vm_area_struct *vm)
return cxl_context_iomap(ctx, vm);
 }
 
+static inline bool ctx_event_pending(struct cxl_context *ctx)
+{
+   if (ctx->pending_irq || ctx->pending_fault || ctx->pending_afu_err)
+   return true;
+
+   if (ctx->afu_driver_ops)
+   return ctx->afu_driver_ops->event_pending(ctx);
+
+   return false;
+}
+
 unsigned int afu_poll(struct file *file, struct poll_table_struct *poll)
 {
struct cxl_context *ctx = file->private_data;
@@ -307,8 +318,7 @@ unsigned int afu_poll(struct file *file, struct 
poll_table_struct *poll)
pr_devel("afu_poll wait done pe: %i\n", ctx->pe);
 
s

[PATCH v3 2/2] cxl: add set/get private data to context struct

2016-03-07 Thread Ian Munsie
From: Michael Neuling 

This provides AFU drivers a means to associate private data with a cxl
context. This is particularly intended to make the new callbacks for
driver specific events easier for AFU drivers to use, as they can easily
get back to any private data structures they may use.
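
A minimal sketch of the intended pairing ("my_afu" is a hypothetical
structure, not part of this patch):

static int my_afu_init(struct cxl_context *ctx, struct my_afu *afu)
{
        return cxl_set_priv(ctx, afu);          /* stash the back pointer */
}

static void my_afu_handle_event(struct cxl_context *ctx)
{
        struct my_afu *afu = cxl_get_priv(ctx); /* retrieve it later, e.g.
                                                   in an afu_driver_ops
                                                   callback */
        /* ... use afu ... */
}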

Signed-off-by: Michael Neuling 
Signed-off-by: Ian Munsie 
Reviewed-by: Matthew R. Ochs 
---

No changes since v1, added Matt Ochs reviewed-by tag.

 drivers/misc/cxl/api.c | 21 +
 drivers/misc/cxl/cxl.h |  3 +++
 include/misc/cxl.h |  7 +++
 3 files changed, 31 insertions(+)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index eebc9c3..93b270c 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -91,6 +91,27 @@ int cxl_release_context(struct cxl_context *ctx)
 }
 EXPORT_SYMBOL_GPL(cxl_release_context);
 
+
+int cxl_set_priv(struct cxl_context *ctx, void *priv)
+{
+   if (!ctx)
+   return -EINVAL;
+
+   ctx->priv = priv;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_set_priv);
+
+void *cxl_get_priv(struct cxl_context *ctx)
+{
+   if (!ctx)
+   return ERR_PTR(-EINVAL);
+
+   return ctx->priv;
+}
+EXPORT_SYMBOL_GPL(cxl_get_priv);
+
 int cxl_allocate_afu_irqs(struct cxl_context *ctx, int num)
 {
if (num == 0)
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 64e8e0a..71f66e7 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -454,6 +454,9 @@ struct cxl_context {
/* Only used in PR mode */
u64 process_token;
 
+   /* driver private data */
+   void *priv;
+
unsigned long *irq_bitmap; /* Accessed from IRQ context */
struct cxl_irq_ranges irqs;
struct list_head irq_names;
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index 01d66a3..76c08cb 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -89,6 +89,13 @@ struct cxl_context *cxl_dev_context_init(struct pci_dev 
*dev);
 int cxl_release_context(struct cxl_context *ctx);
 
 /*
+ * Set and get private data associated with a context. Allows drivers to have a
+ * back pointer to some useful structure.
+ */
+int cxl_set_priv(struct cxl_context *ctx, void *priv);
+void *cxl_get_priv(struct cxl_context *ctx);
+
+/*
  * Allocate AFU interrupts for this context. num=0 will allocate the default
  * for this AFU as given in the AFU descriptor. This number doesn't include the
  * interrupt 0 (CAIA defines AFU IRQ 0 for page faults). Each interrupt to be
-- 
2.1.4


Re: How to merge? (was Re: [PATCH][v4] livepatch/ppc: Enable livepatching on powerpc)

2016-03-07 Thread Steven Rostedt
On Tue, 08 Mar 2016 10:20:22 +1100
Michael Ellerman  wrote:

> >
> > There is one remaining issue which I think would be really nice to
> > have(TM), and that's Steven's Ack for the whole thing :)  
> 
> Yeah. He's been on CC the whole time, but he's probably getting a bit sick of
> it all, as we're up to about version 15. So I figure if he really hated it 
> he'd
> have said so by now :) - but an Ack would still be good.
> 

I figured this is all powerpc work, and you can crash what you like :-)

If you want, I can try to get some time tomorrow and take a quick look
at the patches. But yeah, I haven't been paying too much attention to
this.

-- Steve

[RFCv2 01/25] powerpc/mm: Clean up error handling for htab_remove_mapping

2016-03-07 Thread David Gibson
Currently, the only error that htab_remove_mapping() can report is -EINVAL,
if removal of bolted HPTEs isn't implemented for this platform.  We make
a few cleanups to the handling of this:

 * EINVAL isn't really the right code - there's nothing wrong with the
   function's arguments - use ENODEV instead
 * We were also printing a warning message, but that's a decision better
   left up to the callers, so remove it
 * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
   error, making the warning message redundant, so no change is needed
   there.
 * The other caller is remove_section_mapping().  This is called in the
   memory hot remove path at a point after vmemmap_remove_mapping() so
   if hpte_removebolted isn't implemented, we'd expect to have already
   BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk() since this
   really shouldn't be happening.

Signed-off-by: David Gibson 
---
 arch/powerpc/mm/hash_utils_64.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ba59d59..9f7d727 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -273,11 +273,8 @@ int htab_remove_mapping(unsigned long vstart, unsigned 
long vend,
shift = mmu_psize_defs[psize].shift;
step = 1 << shift;
 
-   if (!ppc_md.hpte_removebolted) {
-   printk(KERN_WARNING "Platform doesn't implement "
-   "hpte_removebolted\n");
-   return -EINVAL;
-   }
+   if (!ppc_md.hpte_removebolted)
+   return -ENODEV;
 
for (vaddr = vstart; vaddr < vend; vaddr += step)
ppc_md.hpte_removebolted(vaddr, psize, ssize);
@@ -641,8 +638,10 @@ int create_section_mapping(unsigned long start, unsigned 
long end)
 
 int remove_section_mapping(unsigned long start, unsigned long end)
 {
-   return htab_remove_mapping(start, end, mmu_linear_psize,
-   mmu_kernel_ssize);
+   int rc = htab_remove_mapping(start, end, mmu_linear_psize,
+mmu_kernel_ssize);
+   WARN_ON(rc < 0);
+   return rc;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-- 
2.5.0


[RFCv2 02/25] powerpc/mm: Handle removing maybe-present bolted HPTEs

2016-03-07 Thread David Gibson
At the moment the hpte_removebolted callback in ppc_md returns void and
will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
place.  This is awkward for the case of cleaning up a mapping which was
partially made before failing.

So, we add a return value to hpte_removebolted, and have it return ENOENT
in the case that the HPTE to remove didn't exist in the first place.

In the (sole) caller, we propagate errors in hpte_removebolted to its
caller to handle.  However, we handle ENOENT specially, continuing to
complete the unmapping over the specified range before returning the error
to the caller.

This means that htab_remove_mapping() will work sanely on a partially
present mapping, removing any HPTEs which are present, while also returning
ENOENT to its caller in case it's important there.

There are two callers of htab_remove_mapping():
   - In remove_section_mapping() we already WARN_ON() any error return,
 which is reasonable - in this case the mapping should be fully
 present
   - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
 just a WARN_ON() in the case of ENOENT, since failing to remove a
 mapping that wasn't there in the first place probably shouldn't be
 fatal.

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/machdep.h|  2 +-
 arch/powerpc/mm/hash_utils_64.c   | 15 ---
 arch/powerpc/mm/init_64.c |  9 +
 arch/powerpc/platforms/pseries/lpar.c |  9 ++---
 4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 3f191f5..fa25643 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -54,7 +54,7 @@ struct machdep_calls {
   int psize, int apsize,
   int ssize);
long(*hpte_remove)(unsigned long hpte_group);
-   void(*hpte_removebolted)(unsigned long ea,
+   int (*hpte_removebolted)(unsigned long ea,
 int psize, int ssize);
void(*flush_hash_range)(unsigned long number, int local);
void(*hugepage_invalidate)(unsigned long vsid,
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 9f7d727..99fbee0 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -269,6 +269,8 @@ int htab_remove_mapping(unsigned long vstart, unsigned long 
vend,
 {
unsigned long vaddr;
unsigned int step, shift;
+   int rc;
+   int ret = 0;
 
shift = mmu_psize_defs[psize].shift;
step = 1 << shift;
@@ -276,10 +278,17 @@ int htab_remove_mapping(unsigned long vstart, unsigned 
long vend,
if (!ppc_md.hpte_removebolted)
return -ENODEV;
 
-   for (vaddr = vstart; vaddr < vend; vaddr += step)
-   ppc_md.hpte_removebolted(vaddr, psize, ssize);
+   for (vaddr = vstart; vaddr < vend; vaddr += step) {
+   rc = ppc_md.hpte_removebolted(vaddr, psize, ssize);
+   if (rc == -ENOENT) {
+   ret = -ENOENT;
+   continue;
+   }
+   if (rc < 0)
+   return rc;
+   }
 
-   return 0;
+   return ret;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 379a6a9..baa1a23 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -232,10 +232,11 @@ static void __meminit vmemmap_create_mapping(unsigned 
long start,
 static void vmemmap_remove_mapping(unsigned long start,
   unsigned long page_size)
 {
-   int mapped = htab_remove_mapping(start, start + page_size,
-mmu_vmemmap_psize,
-mmu_kernel_ssize);
-   BUG_ON(mapped < 0);
+   int rc = htab_remove_mapping(start, start + page_size,
+mmu_vmemmap_psize,
+mmu_kernel_ssize);
+   BUG_ON((rc < 0) && (rc != -ENOENT));
+   WARN_ON(rc == -ENOENT);
 }
 #endif
 
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 477290a..2415a0d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -505,8 +505,8 @@ static void pSeries_lpar_hugepage_invalidate(unsigned long 
vsid,
 }
 #endif
 
-static void pSeries_lpar_hpte_removebolted(unsigned long ea,
-  int psize, int ssize)
+static int pSeries_lpar_hpte_removebolted(unsigned long ea,
+ int psize, int ssize)
 {
unsigned long vpn;
unsigned long slot, vsid;
@@ -515,11 +515,14 @@ static void pSeries_lpar_hpte_removebolted(unsi

[RFCv2 03/25] powerpc/mm: Clean up memory hotplug failure paths

2016-03-07 Thread David Gibson
This makes a number of cleanups to handling of mapping failures during
memory hotplug on Power:

For errors creating the linear mapping for the hot-added region:
  * This is now reported with EFAULT which is more appropriate than the
previous EINVAL (the failure is unlikely to be related to the
function's parameters)
  * An error in this path now prints a warning message, rather than just
silently failing to add the extra memory.
  * Previously a failure here could result in the region being partially
mapped.  We now clean up any partial mapping before failing.

For errors creating the vmemmap for the hot-added region:
   * This is now reported with EFAULT instead of causing a BUG() - this
 could happen for external reasons (e.g. a full hash table) so it's better
 to handle this non-fatally
   * An error message is also printed, so the failure won't be silent
   * As above a failure could cause a partially mapped region, we now
 clean this up.

Signed-off-by: David Gibson 
Reviewed-by: Paul Mackerras 
---
 arch/powerpc/mm/hash_utils_64.c | 13 ++---
 arch/powerpc/mm/init_64.c   | 38 ++
 arch/powerpc/mm/mem.c   | 10 --
 3 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 99fbee0..fdcf9d1 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -640,9 +640,16 @@ static unsigned long __init htab_get_table_size(void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int create_section_mapping(unsigned long start, unsigned long end)
 {
-   return htab_bolt_mapping(start, end, __pa(start),
-pgprot_val(PAGE_KERNEL), mmu_linear_psize,
-mmu_kernel_ssize);
+   int rc = htab_bolt_mapping(start, end, __pa(start),
+  pgprot_val(PAGE_KERNEL), mmu_linear_psize,
+  mmu_kernel_ssize);
+
+   if (rc < 0) {
+   int rc2 = htab_remove_mapping(start, end, mmu_linear_psize,
+ mmu_kernel_ssize);
+   BUG_ON(rc2 && (rc2 != -ENOENT));
+   }
+   return rc;
 }
 
 int remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index baa1a23..fbc9448 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -188,9 +188,9 @@ static int __meminit vmemmap_populated(unsigned long start, 
int page_size)
  */
 
 #ifdef CONFIG_PPC_BOOK3E
-static void __meminit vmemmap_create_mapping(unsigned long start,
-unsigned long page_size,
-unsigned long phys)
+static int __meminit vmemmap_create_mapping(unsigned long start,
+   unsigned long page_size,
+   unsigned long phys)
 {
/* Create a PTE encoding without page size */
unsigned long i, flags = _PAGE_PRESENT | _PAGE_ACCESSED |
@@ -208,6 +208,8 @@ static void __meminit vmemmap_create_mapping(unsigned long 
start,
 */
for (i = 0; i < page_size; i += PAGE_SIZE)
BUG_ON(map_kernel_page(start + i, phys, flags));
+
+   return 0;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -217,15 +219,20 @@ static void vmemmap_remove_mapping(unsigned long start,
 }
 #endif
 #else /* CONFIG_PPC_BOOK3E */
-static void __meminit vmemmap_create_mapping(unsigned long start,
-unsigned long page_size,
-unsigned long phys)
+static int __meminit vmemmap_create_mapping(unsigned long start,
+   unsigned long page_size,
+   unsigned long phys)
 {
-   int  mapped = htab_bolt_mapping(start, start + page_size, phys,
-   pgprot_val(PAGE_KERNEL),
-   mmu_vmemmap_psize,
-   mmu_kernel_ssize);
-   BUG_ON(mapped < 0);
+   int rc = htab_bolt_mapping(start, start + page_size, phys,
+  pgprot_val(PAGE_KERNEL),
+  mmu_vmemmap_psize, mmu_kernel_ssize);
+   if (rc < 0) {
+   int rc2 = htab_remove_mapping(start, start + page_size,
+ mmu_vmemmap_psize,
+ mmu_kernel_ssize);
+   BUG_ON(rc2 && (rc2 != -ENOENT));
+   }
+   return rc;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -304,6 +311,7 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node)
 
for (; start < end; start += page_size) {
void *p;
+   int rc;
 
if (vmemmap_popula

[RFCv2 00/25] PAPR HPT resizing, guest side & host side preliminaries

2016-03-07 Thread David Gibson
This is an unfinished implementation of the kernel parts of the PAPR
hashed page table (HPT) resizing extension.

It contains a complete guest-side implementation - or as complete as
it can be until we have a final PAPR change.

It also contains a host side implementation for KVM HV (the KVM PR and
TCG host-side implementations live in qemu).  This is "complete" in
the sense that there's no specific piece I know still needs to be
done, but is still a fair way from actually working, with both guest
and host crashes commonplaces during and/or after an attempted resize.

I'm continuing to debug this, obviously, but any review I can get on
the basic approach would be helpful.  With the various failure and
cancellation paths the synchronization is rather hairier than I'd
like.

David Gibson (25):
  powerpc/mm: Clean up error handling for htab_remove_mapping
  powerpc/mm: Handle removing maybe-present bolted HPTEs
  powerpc/mm: Clean up memory hotplug failure paths
  powerpc/mm: Split hash page table sizing heuristic into a helper
  pseries: Add hypercall wrappers for hash page table resizing
  pseries: Add support for hash table resizing
  pseries: Advertise HPT resizing support via CAS
  pseries: Automatically resize HPT for memory hot add/remove
  powerpc/kvm: Correctly report KVM_CAP_PPC_ALLOC_HTAB
  powerpc/kvm: Add capability flag for hashed page table resizing
  powerpc/kvm: Rename kvm_alloc_hpt() for clarity
  powerpc/kvm: Gather HPT related variables into sub-structure
  powerpc/kvm: Don't store values derivable from HPT order
  powerpc/kvm: Split HPT allocation from activation
  powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  powerpc/kvm: HPT resizing stub implementation
  powerpc/kvm: Advertise availablity of HPT resizing on KVM HV
  powerpc/kvm: Outline of HPT resizing implementation
  powerpc/kvm: Allocations for HPT resizing
  powerpc/kvm: Make MMU notifier handlers more flexible
  powerpc/kvm: Make MMU notifiers HPT resize aware
  powerpc/kvm: Exclude HPT resizes when collecting the dirty log
  powerpc/kvm: Rehashing for HPT resizing
  powerpc/kvm: HPT resize pivot
  powerpc/kvm: Harvest RC bits from old HPT after HPT resize

 arch/powerpc/include/asm/firmware.h   |   5 +-
 arch/powerpc/include/asm/hvcall.h |   2 +
 arch/powerpc/include/asm/kvm_book3s.h |  12 +-
 arch/powerpc/include/asm/kvm_book3s_64.h  |  15 +
 arch/powerpc/include/asm/kvm_host.h   |  19 +-
 arch/powerpc/include/asm/kvm_ppc.h|  11 +-
 arch/powerpc/include/asm/machdep.h|   3 +-
 arch/powerpc/include/asm/mmu-hash64.h |   3 +
 arch/powerpc/include/asm/plpar_wrappers.h |  12 +
 arch/powerpc/include/asm/prom.h   |   1 +
 arch/powerpc/include/asm/sparsemem.h  |   1 +
 arch/powerpc/kernel/prom_init.c   |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 836 ++
 arch/powerpc/kvm/book3s_hv.c  |  39 +-
 arch/powerpc/kvm/book3s_hv_builtin.c  |   8 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |  68 +--
 arch/powerpc/kvm/powerpc.c|  17 +-
 arch/powerpc/mm/hash_utils_64.c   | 128 -
 arch/powerpc/mm/init_64.c |  47 +-
 arch/powerpc/mm/mem.c |  14 +-
 arch/powerpc/platforms/pseries/firmware.c |   1 +
 arch/powerpc/platforms/pseries/lpar.c | 119 -
 include/uapi/linux/kvm.h  |   1 +
 23 files changed, 1151 insertions(+), 213 deletions(-)

-- 
2.5.0


[RFCv2 04/25] powerpc/mm: Split hash page table sizing heuristic into a helper

2016-03-07 Thread David Gibson
htab_get_table_size() either retrieves the size of the hash page table (HPT)
from the device tree - if the HPT size is determined by firmware - or
uses a heuristic to determine a good size based on RAM size if the kernel
is responsible for allocating the HPT.

To support a PAPR extension allowing resizing of the HPT, we're going to
want the memory size -> HPT size logic elsewhere, so split it out into a
helper function.

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/mmu-hash64.h |  3 +++
 arch/powerpc/mm/hash_utils_64.c   | 32 +++-
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
b/arch/powerpc/include/asm/mmu-hash64.h
index 7352d3f..cf070fd 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -607,6 +607,9 @@ static inline unsigned long get_kernel_vsid(unsigned long 
ea, int ssize)
context = (MAX_USER_CONTEXT) + ((ea >> 60) - 0xc) + 1;
return get_vsid(context, ea, ssize);
 }
+
+unsigned htab_shift_for_mem_size(unsigned long mem_size);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_MMU_HASH64_H_ */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index fdcf9d1..da5d279 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -611,10 +611,26 @@ static int __init htab_dt_scan_pftsize(unsigned long node,
return 0;
 }
 
-static unsigned long __init htab_get_table_size(void)
+unsigned htab_shift_for_mem_size(unsigned long mem_size)
 {
-   unsigned long mem_size, rnd_mem_size, pteg_count, psize;
+   unsigned memshift = __ilog2(mem_size);
+   unsigned pshift = mmu_psize_defs[mmu_virtual_psize].shift;
+   unsigned pteg_shift;
+
+   /* round mem_size up to next power of 2 */
+   if ((1UL << memshift) < mem_size)
+   memshift += 1;
+
+   /* aim for 2 pages / pteg */
+   pteg_shift = memshift - (pshift + 1);
+
+   /* 2^11 PTEGS / 2^18 bytes is the minimum htab size permitted
+* by the architecture */
+   return max(pteg_shift + 7, 18U);
+}
 
+static unsigned long __init htab_get_table_size(void)
+{
/* If hash size isn't already provided by the platform, we try to
 * retrieve it from the device-tree. If it's not there neither, we
 * calculate it now based on the total RAM size
@@ -624,17 +640,7 @@ static unsigned long __init htab_get_table_size(void)
if (ppc64_pft_size)
return 1UL << ppc64_pft_size;
 
-   /* round mem_size up to next power of 2 */
-   mem_size = memblock_phys_mem_size();
-   rnd_mem_size = 1UL << __ilog2(mem_size);
-   if (rnd_mem_size < mem_size)
-   rnd_mem_size <<= 1;
-
-   /* # pages / 2 */
-   psize = mmu_psize_defs[mmu_virtual_psize].shift;
-   pteg_count = max(rnd_mem_size >> (psize + 1), 1UL << 11);
-
-   return pteg_count << 7;
+   return 1UL << htab_shift_for_mem_size(memblock_phys_mem_size());
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.5.0


[RFCv2 08/25] pseries: Automatically resize HPT for memory hot add/remove

2016-03-07 Thread David Gibson
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.

This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed.  This tries to always keep the HPT at
a reasonable size for our current memory size.
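
For example (illustrative, matching the hysteresis in the code below): with
a current shift of 24, hot-adding enough memory for a target shift of 25
grows the HPT immediately, but the HPT is only shrunk once the target shift
drops to 22 or below.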

Signed-off-by: David Gibson 
Reviewed-by: Paul Mackerras 
---
 arch/powerpc/include/asm/sparsemem.h |  1 +
 arch/powerpc/mm/hash_utils_64.c  | 29 +
 arch/powerpc/mm/mem.c|  4 
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/sparsemem.h 
b/arch/powerpc/include/asm/sparsemem.h
index f6fc0ee..737335c 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -16,6 +16,7 @@
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+extern void resize_hpt_for_hotplug(unsigned long new_mem_size);
 extern int create_section_mapping(unsigned long start, unsigned long end);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 #ifdef CONFIG_NUMA
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0809bea..6fbc27a 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -645,6 +645,35 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+void resize_hpt_for_hotplug(unsigned long new_mem_size)
+{
+   unsigned target_hpt_shift;
+
+   if (!ppc_md.resize_hpt)
+   return;
+
+   target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
+
+   /*
+* To avoid lots of HPT resizes if memory size is fluctuating
+* across a boundary, we deliberately have some hysteresis
+* here: we immediately increase the HPT size if the target
+* shift exceeds the current shift, but we won't attempt to
+* reduce unless the target shift is at least 2 below the
+* current shift
+*/
+   if ((target_hpt_shift > ppc64_pft_size)
+   || (target_hpt_shift < (ppc64_pft_size - 1))) {
+   int rc;
+
+   rc = ppc_md.resize_hpt(target_hpt_shift);
+   if (rc)
+   printk(KERN_WARNING
+  "Unable to resize hash page table to target 
order %d: %d\n",
+  target_hpt_shift, rc);
+   }
+}
+
 int create_section_mapping(unsigned long start, unsigned long end)
 {
int rc = htab_bolt_mapping(start, end, __pa(start),
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index f980da6..4938ee7 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -121,6 +121,8 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
for_device)
unsigned long nr_pages = size >> PAGE_SHIFT;
int rc;
 
+   resize_hpt_for_hotplug(memblock_phys_mem_size());
+
pgdata = NODE_DATA(nid);
 
start = (unsigned long)__va(start);
@@ -161,6 +163,8 @@ int arch_remove_memory(u64 start, u64 size)
 */
vm_unmap_aliases();
 
+   resize_hpt_for_hotplug(memblock_phys_mem_size());
+
return ret;
 }
 #endif
-- 
2.5.0


[RFCv2 07/25] pseries: Advertise HPT resizing support via CAS

2016-03-07 Thread David Gibson
The hypervisor needs to know a guest is capable of using the HPT resizing
PAPR extension in order to make full advantage of it for memory hotplug.

If the hypervisor knows the guest is HPT resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added.  Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expands to that maximum
size.

This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface.  Obviously, the actual
encoding in the CAS vector is tentative until the extension is officially
incorporated into PAPR.  For now we use bit 0 of (previously unused) byte 8
of option vector 5.

Signed-off-by: David Gibson 
Reviewed-by: Anshuman Khandual 
Reviewed-by: Paul Mackerras 
---
 arch/powerpc/include/asm/prom.h | 1 +
 arch/powerpc/kernel/prom_init.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 7f436ba..ef08208 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -151,6 +151,7 @@ struct of_drconf_cell {
 #define OV5_XCMO   0x0440  /* Page Coalescing */
 #define OV5_TYPE1_AFFINITY 0x0580  /* Type 1 NUMA affinity */
 #define OV5_PRRN   0x0540  /* Platform Resource Reassignment */
+#define OV5_HPT_RESIZE 0x0880  /* Hash Page Table resizing */
 #define OV5_PFO_HW_RNG 0x0E80  /* PFO Random Number Generator */
 #define OV5_PFO_HW_842 0x0E40  /* PFO Compression Accelerator */
 #define OV5_PFO_HW_ENCR0x0E20  /* PFO Encryption Accelerator */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index da51925..c6feafb 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -713,7 +713,7 @@ unsigned char ibm_architecture_vec[] = {
OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN),
0,
0,
-   0,
+   OV5_FEAT(OV5_HPT_RESIZE),
/* WARNING: The offset of the "number of cores" field below
 * must match by the macro below. Update the definition if
 * the structure layout changes.
-- 
2.5.0


[RFCv2 05/25] pseries: Add hypercall wrappers for hash page table resizing

2016-03-07 Thread David Gibson
This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.

These are experimental "platform specific" values for now, until we have a
formal PAPR update.

It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.

Signed-off-by: David Gibson 
Reviewed-by: Paul Mackerras 
---
 arch/powerpc/include/asm/firmware.h   |  5 +++--
 arch/powerpc/include/asm/hvcall.h |  2 ++
 arch/powerpc/include/asm/plpar_wrappers.h | 12 
 arch/powerpc/platforms/pseries/firmware.c |  1 +
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index b062924..32435d2 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -42,7 +42,7 @@
 #define FW_FEATURE_SPLPAR  ASM_CONST(0x0010)
 #define FW_FEATURE_LPARASM_CONST(0x0040)
 #define FW_FEATURE_PS3_LV1 ASM_CONST(0x0080)
-/* FreeASM_CONST(0x0100) */
+#define FW_FEATURE_HPT_RESIZE  ASM_CONST(0x0100)
 #define FW_FEATURE_CMO ASM_CONST(0x0200)
 #define FW_FEATURE_VPHNASM_CONST(0x0400)
 #define FW_FEATURE_XCMOASM_CONST(0x0800)
@@ -66,7 +66,8 @@ enum {
FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
-   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
+   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
+   FW_FEATURE_HPT_RESIZE,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e3b54dd..195e080 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -293,6 +293,8 @@
 
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS 0xf000
+#define H_RESIZE_HPT_PREPARE   0xf003
+#define H_RESIZE_HPT_COMMIT0xf004
 
 /* "Platform specific hcalls", provided by PHYP */
 #define H_GET_24X7_CATALOG_PAGE0xF078
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 1b39424..b7ee6d9 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -242,6 +242,18 @@ static inline long plpar_pte_protect(unsigned long flags, 
unsigned long ptex,
return plpar_hcall_norets(H_PROTECT, flags, ptex, avpn);
 }
 
+static inline long plpar_resize_hpt_prepare(unsigned long flags,
+   unsigned long shift)
+{
+   return plpar_hcall_norets(H_RESIZE_HPT_PREPARE, flags, shift);
+}
+
+static inline long plpar_resize_hpt_commit(unsigned long flags,
+  unsigned long shift)
+{
+   return plpar_hcall_norets(H_RESIZE_HPT_COMMIT, flags, shift);
+}
+
 static inline long plpar_tce_get(unsigned long liobn, unsigned long ioba,
unsigned long *tce_ret)
 {
diff --git a/arch/powerpc/platforms/pseries/firmware.c 
b/arch/powerpc/platforms/pseries/firmware.c
index 8c80588..7b287be 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -63,6 +63,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_VPHN,   "hcall-vphn"},
{FW_FEATURE_SET_MODE,   "hcall-set-mode"},
{FW_FEATURE_BEST_ENERGY,"hcall-best-energy-1*"},
+   {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.5.0


[RFCv2 09/25] powerpc/kvm: Correctly report KVM_CAP_PPC_ALLOC_HTAB

2016-03-07 Thread David Gibson
At present KVM on powerpc always reports KVM_CAP_PPC_ALLOC_HTAB as enabled.
However, the ioctl() it advertises (KVM_PPC_ALLOCATE_HTAB) only actually
works on KVM HV.  On KVM PR it will fail with ENOTTY.

qemu already has a workaround for this, so it's not breaking things in
practice, but it would be better to advertise this correctly.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/powerpc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a3b182d..2f21ab7 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -509,7 +509,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 
 #ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
-   case KVM_CAP_PPC_ALLOC_HTAB:
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
case KVM_CAP_PPC_ENABLE_HCALL:
@@ -518,6 +517,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #endif
r = 1;
break;
+
+   case KVM_CAP_PPC_ALLOC_HTAB:
+   r = hv_enabled;
+   break;
 #endif /* CONFIG_PPC_BOOK3S_64 */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_SMT:
-- 
2.5.0


[RFCv2 06/25] pseries: Add support for hash table resizing

2016-03-07 Thread David Gibson
This adds support for using experimental hypercalls to change the size
of the main hash page table while running as a PAPR guest.  For now these
hypercalls are only in experimental qemu versions.

The interface is two part: first H_RESIZE_HPT_PREPARE is used to allocate
and prepare the new hash table.  This may be slow, but can be done
asynchronously.  Then, H_RESIZE_HPT_COMMIT is used to switch to the new
hash table.  This requires that no CPUs be concurrently updating the HPT,
and so must be run under stop_machine().
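
In condensed form, the flow described above looks roughly like the sketch below
(assuming the helpers added by this patch and <linux/stop_machine.h>; the real
pseries_lpar_resize_hpt() in the hunk further down also handles H_IS_LONG_BUSY
retries, the timeout/cancel path and logging):

/* Simplified sketch of the two-phase resize, not the literal patch code. */
static int resize_hpt_flow_sketch(unsigned long shift)
{
	struct hpt_resize_state state = {
		.shift = shift,
		.commit_rc = H_FUNCTION,
	};
	long rc;

	/* Phase 1: may be slow, runs in normal (sleepable) context */
	rc = plpar_resize_hpt_prepare(0, shift);
	if (rc != H_SUCCESS)
		return -EIO;

	/* Phase 2: no CPU may update the HPT while the hypervisor switches */
	rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
	if (rc != 0 || state.commit_rc != H_SUCCESS)
		return -EIO;

	return 0;
}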

This also adds a debugfs file which can be used to manually control
HPT resizing for testing purposes.

Signed-off-by: David Gibson 
Reviewed-by: Paul Mackerras 
---
 arch/powerpc/include/asm/machdep.h|   1 +
 arch/powerpc/mm/hash_utils_64.c   |  28 +
 arch/powerpc/platforms/pseries/lpar.c | 110 ++
 3 files changed, 139 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index fa25643..1e23898 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -61,6 +61,7 @@ struct machdep_calls {
   unsigned long addr,
   unsigned char *hpte_slot_array,
   int psize, int ssize, int local);
+   int (*resize_hpt)(unsigned long shift);
/*
 * Special for kexec.
 * To be called in real mode with interrupts disabled. No locks are
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index da5d279..0809bea 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1585,3 +1586,30 @@ void setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
/* Finally limit subsequent allocations */
memblock_set_current_limit(ppc64_rma_size);
 }
+
+static int ppc64_pft_size_get(void *data, u64 *val)
+{
+   *val = ppc64_pft_size;
+   return 0;
+}
+
+static int ppc64_pft_size_set(void *data, u64 val)
+{
+   if (!ppc_md.resize_hpt)
+   return -ENODEV;
+   return ppc_md.resize_hpt(val);
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_ppc64_pft_size,
+   ppc64_pft_size_get, ppc64_pft_size_set, "%llu\n");
+
+static int __init hash64_debugfs(void)
+{
+   if (!debugfs_create_file("pft-size", 0600, powerpc_debugfs_root,
+NULL, &fops_ppc64_pft_size)) {
+   pr_err("lpar: unable to create ppc64_pft_size debugsfs file\n");
+   }
+
+   return 0;
+}
+machine_device_initcall(pseries, hash64_debugfs);
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 2415a0d..ed9738d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -603,6 +605,113 @@ static int __init disable_bulk_remove(char *str)
 
 __setup("bulk_remove=", disable_bulk_remove);
 
+#define HPT_RESIZE_TIMEOUT 1 /* ms */
+
+struct hpt_resize_state {
+   unsigned long shift;
+   int commit_rc;
+};
+
+static int pseries_lpar_resize_hpt_commit(void *data)
+{
+   struct hpt_resize_state *state = data;
+
+   state->commit_rc = plpar_resize_hpt_commit(0, state->shift);
+   if (state->commit_rc != H_SUCCESS)
+   return -EIO;
+
+   /* Hypervisor has transitioned the HTAB, update our globals */
+   ppc64_pft_size = state->shift;
+   htab_size_bytes = 1UL << ppc64_pft_size;
+   htab_hash_mask = (htab_size_bytes >> 7) - 1;
+
+   return 0;
+}
+
+/* Must be called in user context */
+static int pseries_lpar_resize_hpt(unsigned long shift)
+{
+   struct hpt_resize_state state = {
+   .shift = shift,
+   .commit_rc = H_FUNCTION,
+   };
+   unsigned int delay, total_delay = 0;
+   int rc;
+   ktime_t t0, t1, t2;
+
+   might_sleep();
+
+   if (!firmware_has_feature(FW_FEATURE_HPT_RESIZE))
+   return -ENODEV;
+
+   printk(KERN_INFO "lpar: Attempting to resize HPT to shift %lu\n",
+  shift);
+
+   t0 = ktime_get();
+
+   rc = plpar_resize_hpt_prepare(0, shift);
+   while (H_IS_LONG_BUSY(rc)) {
+   delay = get_longbusy_msecs(rc);
+   total_delay += delay;
+   if (total_delay > HPT_RESIZE_TIMEOUT) {
+   /* prepare call with shift==0 cancels an
+* in-progress resize */
+   rc = plpar_resize_hpt_prepare(0, 0);
+   if (rc != H_SUCCESS)
+   printk(KERN_WARNING
+  "lpar: Unexpected error %d cancelling 
timed out HPT resize\n",
+   

[RFCv2 10/25] powerpc/kvm: Add capability flag for hashed page table resizing

2016-03-07 Thread David Gibson
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime.

At present, HPT resizing is possible with KVM PR without kernel
modification, since the HPT is managed within qemu.  It's not possible yet
with KVM HV, because the HPT is managed by KVM.  At present, qemu has to
use other capabilities which (by accident) reveal whether PR or HV is in
use to know if it can advertise HPT resizing capability to the guest.

To avoid ambiguity with existing kernels, the encoding is a bit odd:
0 means "unknown", since that's what previous kernels will return.
1 means "HPT resize available if and only if the HPT is allocated in
  userspace, rather than in the kernel".  In practice this is the same
  test as userspace already uses, but this makes it explicit.
2 will mean "HPT resize available and implemented in-kernel".

For now we always return 1, but the intention is to return 2 once HPT
resize is implemented for KVM HV.
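
For illustration, a userspace consumer of this encoding might look like the
sketch below (the function and the hpt_in_userspace flag are hypothetical;
KVM_CHECK_EXTENSION and the capability constant are the real ones):

/* Sketch: decide whether HPT resizing can be advertised to the guest. */
static int can_offer_hpt_resize(int vm_fd, int hpt_in_userspace)
{
	int r = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_RESIZE_HPT);

	if (r >= 2)		/* in-kernel implementation (KVM HV, later) */
		return 1;
	if (r == 1)		/* only valid when the HPT lives in userspace */
		return hpt_in_userspace;
	return 0;		/* 0 or error: old kernel, capability unknown */
}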

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2f21ab7..a4250f1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -572,6 +572,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_PPC_GET_SMMU_INFO:
r = 1;
break;
+   case KVM_CAP_SPAPR_RESIZE_HPT:
+   r = 1; /* resize allowed only if HPT is outside kernel */
+   break;
 #endif
default:
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9da9051..7e7e0e3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -850,6 +850,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
 #define KVM_CAP_HYPERV_SYNIC 123
 #define KVM_CAP_S390_RI 124
+#define KVM_CAP_SPAPR_RESIZE_HPT 125
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.5.0


[RFCv2 13/25] powerpc/kvm: Don't store values derivable from HPT order

2016-03-07 Thread David Gibson
Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size.  The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.
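
As a concrete example, for the minimum HPT size of order 18 (256kB):
kvmppc_hpt_npte() gives 1 << (18 - 4) = 16384 HPTEs (16 bytes each), and
kvmppc_hpt_mask() gives (1 << (18 - 7)) - 1 = 2047, i.e. a hash mask covering
2048 HPTE groups of 128 bytes (8 HPTEs) each.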

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 12 
 arch/powerpc/include/asm/kvm_host.h  |  2 --
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 28 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 18 +-
 4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2aa79c8..75b2dee 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -437,6 +437,18 @@ extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
 extern void kvmhv_rm_send_ipi(int cpu);
 
+static inline unsigned long kvmppc_hpt_npte(struct kvm_hpt_info *hpt)
+{
+   /* HPTEs are 2**4 bytes long */
+   return 1UL << (hpt->order - 4);
+}
+
+static inline unsigned long kvmppc_hpt_mask(struct kvm_hpt_info *hpt)
+{
+   /* 128 (2**7) bytes in each HPTEG */
+   return (1UL << (hpt->order - 7)) - 1;
+}
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index c32413a..718dc56 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -226,8 +226,6 @@ struct kvm_arch_memory_slot {
 struct kvm_hpt_info {
unsigned long virt;
struct revmap_entry *rev;
-   unsigned long npte;
-   unsigned long mask;
u32 order;
int cma;
 };
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2ba9d99..679c292 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -83,13 +83,9 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
kvm->arch.hpt.virt = hpt;
kvm->arch.hpt.order = order;
-   /* HPTEs are 2**4 bytes long */
-   kvm->arch.hpt.npte = 1ul << (order - 4);
-   /* 128 (2**7) bytes in each HPTEG */
-   kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
 
/* Allocate reverse map array */
-   rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
+   rev = vmalloc(sizeof(struct revmap_entry) * 
kvmppc_hpt_npte(&kvm->arch.hpt));
if (!rev) {
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
goto out_freehpt;
@@ -192,8 +188,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
if (npages > 1ul << (40 - porder))
npages = 1ul << (40 - porder);
/* Can't use more than 1 HPTE per HPTEG */
-   if (npages > kvm->arch.hpt.mask + 1)
-   npages = kvm->arch.hpt.mask + 1;
+   if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1)
+   npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1;
 
hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
@@ -203,7 +199,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
for (i = 0; i < npages; ++i) {
addr = i << porder;
/* can't use hpt_hash since va > 64 bits */
-   hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & 
kvm->arch.hpt.mask;
+   hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
+   & kvmppc_hpt_mask(&kvm->arch.hpt);
/*
 * We assume that the hash table is empty and no
 * vcpus are using it at this stage.  Since we create
@@ -1268,7 +1265,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
__user *buf,
 
/* Skip uninteresting entries, i.e. clean on not-first pass */
if (!first_pass) {
-   while (i < kvm->arch.hpt.npte &&
+   while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
   !hpte_dirty(revp, hptp)) {
++i;
hptp += 2;
@@ -1278,7 +1275,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
__user *buf,
hdr.index = i;
 
/* Grab a series of valid entries */
-   while (i < kvm->arch.hpt.npte &&
+   while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
   hdr.n_valid < 0x &&
   nb + HPTE_SIZE < count &&
   record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
@@ -1294,7 +1291,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
__user *buf,
++revp;
}
/* Now skip invalid entries while we can */
-   while 

[RFCv2 12/25] powerpc/kvm: Gather HPT related variables into sub-structure

2016-03-07 Thread David Gibson
Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.

Signed-off-by: David Gibson 

# Conflicts:
#   arch/powerpc/kvm/book3s_64_mmu_hv.c
---
 arch/powerpc/include/asm/kvm_host.h | 16 ---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 ++---
 arch/powerpc/kvm/book3s_hv.c|  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 -
 4 files changed, 87 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 9d08d8c..c32413a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -223,11 +223,19 @@ struct kvm_arch_memory_slot {
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 };
 
+struct kvm_hpt_info {
+   unsigned long virt;
+   struct revmap_entry *rev;
+   unsigned long npte;
+   unsigned long mask;
+   u32 order;
+   int cma;
+};
+
 struct kvm_arch {
unsigned int lpid;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   unsigned long hpt_virt;
-   struct revmap_entry *revmap;
+   struct kvm_hpt_info hpt;
unsigned int host_lpid;
unsigned long host_lpcr;
unsigned long sdr1;
@@ -236,14 +244,10 @@ struct kvm_arch {
unsigned long lpcr;
unsigned long vrma_slb_v;
int hpte_setup_done;
-   u32 hpt_order;
atomic_t vcpus_running;
u32 online_vcores;
-   unsigned long hpt_npte;
-   unsigned long hpt_mask;
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
-   int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 157285b0..2ba9d99 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -61,12 +61,12 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
order = PPC_MIN_HPT_ORDER;
}
 
-   kvm->arch.hpt_cma_alloc = 0;
+   kvm->arch.hpt.cma = 0;
page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
-   kvm->arch.hpt_cma_alloc = 1;
+   kvm->arch.hpt.cma = 1;
}
 
/* Lastly try successively smaller sizes from the page allocator */
@@ -81,20 +81,20 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
if (!hpt)
return -ENOMEM;
 
-   kvm->arch.hpt_virt = hpt;
-   kvm->arch.hpt_order = order;
+   kvm->arch.hpt.virt = hpt;
+   kvm->arch.hpt.order = order;
/* HPTEs are 2**4 bytes long */
-   kvm->arch.hpt_npte = 1ul << (order - 4);
+   kvm->arch.hpt.npte = 1ul << (order - 4);
/* 128 (2**7) bytes in each HPTEG */
-   kvm->arch.hpt_mask = (1ul << (order - 7)) - 1;
+   kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
 
/* Allocate reverse map array */
-   rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte);
+   rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
if (!rev) {
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
goto out_freehpt;
}
-   kvm->arch.revmap = rev;
+   kvm->arch.hpt.rev = rev;
kvm->arch.sdr1 = __pa(hpt) | (order - 18);
 
pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
@@ -105,7 +105,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
return 0;
 
  out_freehpt:
-   if (kvm->arch.hpt_cma_alloc)
+   if (kvm->arch.hpt.cma)
kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
else
free_pages(hpt, order - PAGE_SHIFT);
@@ -127,10 +127,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
goto out;
}
}
-   if (kvm->arch.hpt_virt) {
-   order = kvm->arch.hpt_order;
+   if (kvm->arch.hpt.virt) {
+   order = kvm->arch.hpt.order;
/* Set the entire HPT to 0, i.e. invalid HPTEs */
-   memset((void *)kvm->arch.hpt_virt, 0, 1ul << order);
+   memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
/*
 * Reset all the reverse-mapping chains for all memslots
 */
@@ -151,13 +151,13 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
 void kvmppc_free_hpt(struct kvm *kvm)
 {
kvmppc_free_lpid(kvm->arch.lpid);
-   vfree(kvm->arch.revmap);
-   if (kvm->arch.hpt_cma_alloc)
- 

[RFCv2 14/25] powerpc/kvm: Split HPT allocation from activation

2016-03-07 Thread David Gibson
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM.  For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function.  Likewise we split
kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
which unsets it as an active HPT, then frees it.

We also move the logic to fall back to smaller HPT sizes if the first try
fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
that previously, if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator.  Now, we
try first CMA, then the page allocator at each size.  As far as I can tell
this change should be harmless.
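
The resulting fallback loop in kvmppc_hv_setup_htab_rma() is expected to look
roughly like this (a sketch of the behaviour described above, not the literal
hunk, which is not quoted here; kvm and info are the function's existing
struct kvm pointer and a local struct kvm_hpt_info respectively):

struct kvm_hpt_info info;
int order = KVM_DEFAULT_HPT_ORDER;
int err;

do {
	/* kvmppc_allocate_hpt() itself tries CMA first, then the page
	 * allocator, at this particular order */
	err = kvmppc_allocate_hpt(&info, order);
} while (err == -ENOMEM && --order >= PPC_MIN_HPT_ORDER);

if (!err)
	kvmppc_set_hpt(kvm, &info);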

To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
call to kvmppc_free_lpid() that was there, we move to the single caller.

Signed-off-by: David Gibson 

# Conflicts:
#   arch/powerpc/kvm/book3s_64_mmu_hv.c
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  3 ++
 arch/powerpc/include/asm/kvm_ppc.h   |  5 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 89 
 arch/powerpc/kvm/book3s_hv.c | 18 +--
 4 files changed, 65 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 75b2dee..f1b832c 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -20,6 +20,9 @@
 #ifndef __ASM_KVM_BOOK3S_64_H__
 #define __ASM_KVM_BOOK3S_64_H__
 
+/* Power architecture requires HPT is at least 256kB */
+#define PPC_MIN_HPT_ORDER  18
+
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu 
*vcpu)
 {
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index f25947a..f77d0a0 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
 extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
 extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
 
-extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
+extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
 extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
-extern void kvmppc_free_hpt(struct kvm *kvm);
+extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
 extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 679c292..eb1aa3a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,74 +40,69 @@
 
 #include "trace_hv.h"
 
-/* Power architecture requires HPT is at least 256kB */
-#define PPC_MIN_HPT_ORDER  18
-
 static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh,
unsigned long ptel, unsigned long *pte_idx_ret);
 static void kvmppc_rmap_reset(struct kvm *kvm);
 
-long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
+int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
 {
-   unsigned long hpt = 0;
-   struct revmap_entry *rev;
+   unsigned long hpt;
+   int cma;
struct page *page = NULL;
-   long order = KVM_DEFAULT_HPT_ORDER;
-
-   if (htab_orderp) {
-   order = *htab_orderp;
-   if (order < PPC_MIN_HPT_ORDER)
-   order = PPC_MIN_HPT_ORDER;
-   }
+   struct revmap_entry *rev;
+   unsigned long npte;
 
-   kvm->arch.hpt.cma = 0;
+   hpt = 0;
+   cma = 0;
page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
-   kvm->arch.hpt.cma = 1;
+   cma = 1;
}
 
-   /* Lastly try successively smaller sizes from the page allocator */
-   /* Only do this if userspace didn't specify a size via ioctl */
-   while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
-   hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
-  __GFP_NOWARN, order - PAGE_SHIFT);
-   if (!hpt)
-   --order;
-   }
+   if (!hpt)
+   hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|

[RFCv2 11/25] powerpc/kvm: Rename kvm_alloc_hpt() for clarity

2016-03-07 Thread David Gibson
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 8 
 arch/powerpc/kvm/book3s_hv_builtin.c | 8 
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2241d53..f25947a 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -170,8 +170,8 @@ extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, 
unsigned long liobn,
 unsigned long ioba, unsigned long tce);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba);
-extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
-extern void kvm_release_hpt(struct page *page, unsigned long nr_pages);
+extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages);
+extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
 extern int kvmppc_core_init_vm(struct kvm *kvm);
 extern void kvmppc_core_destroy_vm(struct kvm *kvm);
 extern void kvmppc_core_free_memslot(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fb37290..157285b0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
 
kvm->arch.hpt_cma_alloc = 0;
-   page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT));
+   page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
@@ -106,7 +106,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
  out_freehpt:
if (kvm->arch.hpt_cma_alloc)
-   kvm_release_hpt(page, 1 << (order - PAGE_SHIFT));
+   kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
else
free_pages(hpt, order - PAGE_SHIFT);
return -ENOMEM;
@@ -153,8 +153,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
kvmppc_free_lpid(kvm->arch.lpid);
vfree(kvm->arch.revmap);
if (kvm->arch.hpt_cma_alloc)
-   kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
-   1 << (kvm->arch.hpt_order - PAGE_SHIFT));
+   kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
+1 << (kvm->arch.hpt_order - PAGE_SHIFT));
else
free_pages(kvm->arch.hpt_virt,
   kvm->arch.hpt_order - PAGE_SHIFT);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index fd7006b..bcc00b7 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -49,19 +49,19 @@ static int __init early_parse_kvm_cma_resv(char *p)
 }
 early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv);
 
-struct page *kvm_alloc_hpt(unsigned long nr_pages)
+struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
 {
VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
 }
-EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
+EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
 
-void kvm_release_hpt(struct page *page, unsigned long nr_pages)
+void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
 {
cma_release(kvm_cma, page, nr_pages);
 }
-EXPORT_SYMBOL_GPL(kvm_release_hpt);
+EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
 
 /**
  * kvm_cma_reserve() - reserve area for kvm hash pagetable
-- 
2.5.0


[RFCv2 17/25] powerpc/kvm: Advertise availability of HPT resizing on KVM HV

2016-03-07 Thread David Gibson
This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the
presence of in-kernel HPT resizing on KVM HV.  In fact the HPT resizing
isn't fully implemented, but this allows us to experiment with what's
there.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/powerpc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index eeda4a8..2314059 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -573,7 +573,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
case KVM_CAP_SPAPR_RESIZE_HPT:
-   r = 1; /* resize allowed only if HPT is outside kernel */
+   if (hv_enabled)
+   r = 2; /* In-kernel resize implementation */
+   else
+   r = 1; /* outside kernel resize allowed */
break;
 #endif
default:
-- 
2.5.0


[RFCv2 16/25] powerpc/kvm: HPT resizing stub implementation

2016-03-07 Thread David Gibson
This patch adds a stub (always failing) implementation of the hypercalls
for the HPT resizing PAPR extension.

For now we include a hack which makes it safe for qemu to call ENABLE_HCALL
on these hypercalls, although it will have no effect.  That should go away
once the PAPR change is formalized and we can use "real" hcall numbers.
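
For context, the userspace side of that hack would look something like the
sketch below.  KVM_ENABLE_CAP and KVM_CAP_PPC_ENABLE_HCALL are the existing
interfaces; the H_RESIZE_HPT_* numbers (0xf003/0xf004 earlier in this series)
are experimental and userspace would carry its own copies of them:

/* Sketch: enable the experimental resize hcalls on a VM fd. */
static int enable_resize_hcalls(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_PPC_ENABLE_HCALL,
		.args = { H_RESIZE_HPT_PREPARE, 1 },	/* args[1] = 1: enable */
	};
	int ret;

	ret = ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	if (ret)
		return ret;

	cap.args[0] = H_RESIZE_HPT_COMMIT;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}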

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_book3s.h |  6 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 19 +++
 arch/powerpc/kvm/book3s_hv.c  |  8 
 arch/powerpc/kvm/powerpc.c|  6 ++
 4 files changed, 39 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 8f39796..81f2b77 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -191,6 +191,12 @@ extern void kvmppc_copy_to_svcpu(struct 
kvmppc_book3s_shadow_vcpu *svcpu,
 struct kvm_vcpu *vcpu);
 extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
   struct kvmppc_book3s_shadow_vcpu *svcpu);
+extern unsigned long do_h_resize_hpt_prepare(struct kvm_vcpu *vcpu,
+unsigned long flags,
+unsigned long shift);
+extern unsigned long do_h_resize_hpt_commit(struct kvm_vcpu *vcpu,
+   unsigned long flags,
+   unsigned long shift);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 4547b6e..b92384f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1118,6 +1118,25 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, 
unsigned long gpa,
 }
 
 /*
+ * HPT resizing
+ */
+
+unsigned long do_h_resize_hpt_prepare(struct kvm_vcpu *vcpu,
+ unsigned long flags,
+ unsigned long shift)
+{
+   return H_HARDWARE;
+}
+
+unsigned long do_h_resize_hpt_commit(struct kvm_vcpu *vcpu,
+unsigned long flags,
+unsigned long shift)
+{
+   return H_HARDWARE;
+}
+
+
+/*
  * Functions for reading and writing the hash table via reads and
  * writes on a file descriptor.
  *
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a2730ca..5a451f8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -726,6 +726,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
kvmppc_get_gpr(vcpu, 5),
kvmppc_get_gpr(vcpu, 6));
break;
+   case H_RESIZE_HPT_PREPARE:
+   ret = do_h_resize_hpt_prepare(vcpu, kvmppc_get_gpr(vcpu, 4),
+ kvmppc_get_gpr(vcpu, 5));
+   break;
+   case H_RESIZE_HPT_COMMIT:
+   ret = do_h_resize_hpt_commit(vcpu, kvmppc_get_gpr(vcpu, 4),
+kvmppc_get_gpr(vcpu, 5));
+   break;
case H_RTAS:
if (list_empty(&vcpu->kvm->arch.rtas_tokens))
return RESUME_HOST;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a4250f1..eeda4a8 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1287,6 +1287,12 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
unsigned long hcall = cap->args[0];
 
r = -EINVAL;
+   /* Hack: until we have proper hcall numbers allocated */
+   if ((hcall == H_RESIZE_HPT_PREPARE)
+   || (hcall == H_RESIZE_HPT_COMMIT)) {
+   r = 0;
+   break;
+   }
if (hcall > MAX_HCALL_OPCODE || (hcall & 3) ||
cap->args[1] > 1)
break;
-- 
2.5.0


[RFCv2 22/25] powerpc/kvm: Exclude HPT resizes when collecting the dirty log

2016-03-07 Thread David Gibson
While there is an active HPT resize in progress, working out which guest
pages are dirty is rather more complicated, because depending on exactly
the phase of the resize, the information could be in either the current,
tentative or previous HPT or reverse map of the guest.

To avoid this problem, for now we just exclude collecting the dirty map
while a resize is in progress, blocking the dirty map operation until the
resize is complete.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 5b84347..c4c1814 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1128,6 +1128,7 @@ long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct 
kvm_memory_slot *memslot,
unsigned long *rmapp;
struct kvm_vcpu *vcpu;
 
+   mutex_lock(&kvm->arch.resize_hpt_mutex); /* exclude a concurrent HPT 
resize */
preempt_disable();
rmapp = memslot->arch.rmap;
for (i = 0; i < memslot->npages; ++i) {
@@ -1152,6 +1153,7 @@ long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct 
kvm_memory_slot *memslot,
spin_unlock(&vcpu->arch.vpa_update_lock);
}
preempt_enable();
+   mutex_unlock(&kvm->arch.resize_hpt_mutex);
return 0;
 }
 
-- 
2.5.0


[RFCv2 24/25] powerpc/kvm: HPT resize pivot

2016-03-07 Thread David Gibson
This implements the code for HPT resizing to actually pivot from the
currently active HPT to the new HPT, which has previously been populated
by rehashing entries from the old HPT.

This only occurs while the guest is executing the H_RESIZE_HPT_COMMIT
hypercall, handling synchronization with the guest.  On the host side this
is executed under the kvm->mmu_lock to prevent races with host side MMU
notifiers.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d06aef6..45430fe 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1416,9 +1416,45 @@ static int resize_hpt_rehash(struct kvm_resize_hpt 
*resize)
return H_SUCCESS;
 }
 
+static void resize_hpt_pivot_cpu(void *opaque)
+{
+   /* Nothing to do, just force a KVM exit */
+}
+
 static void resize_hpt_pivot(struct kvm_resize_hpt *resize,
 struct kvm_memslots *slots)
 {
+   struct kvm *kvm = resize->kvm;
+   struct kvm_memory_slot *memslot;
+   struct kvm_hpt_info hpt_tmp;
+
+   /* Exchange the pending tables in the resize structure with
+* the active tables */
+
+   resize_hpt_debug(resize, "PIVOT!\n");
+
+   kvm_for_each_memslot(memslot, slots) {
+   unsigned long *tmp;
+
+   tmp = memslot->arch.rmap;
+   memslot->arch.rmap = resize->rmap[memslot->id];
+   resize->rmap[memslot->id] = tmp;
+   }
+
+   hpt_tmp = kvm->arch.hpt;
+   kvmppc_set_hpt(kvm, &resize->hpt);
+   resize->hpt = hpt_tmp;
+
+   spin_unlock(&kvm->mmu_lock);
+
+   synchronize_srcu_expedited(&kvm->srcu);
+
+   /* Force an exit on every vcpu, to make sure the real SDR1
+* gets updated */
+
+   on_each_cpu(resize_hpt_pivot_cpu, NULL, 1);
+
+   spin_lock(&kvm->mmu_lock);
 }
 
 static void resize_hpt_flush_rmaps(struct kvm_resize_hpt *resize,
-- 
2.5.0


[RFCv2 21/25] powerpc/kvm: Make MMU notifiers HPT resize aware

2016-03-07 Thread David Gibson
While an HPT resize operation is in progress (specifically, once a tentative
HPT has been allocated and we are possibly in the middle of populating it),
various host-side MM events need to be reflected in the tentative resized
HPT as well as in the currently active one.

This extends the powerpc KVM MMU notifiers to act on both the active and
tentative HPTs (and reverse maps) when there is an active resize in
progress.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index db070ad..5b84347 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -97,6 +97,17 @@ static void resize_hpt_set_state(struct kvm_resize_hpt 
*resize,
wake_up_all(&kvm->arch.resize_hpt_wq);
 }
 
+static struct kvm_resize_hpt *kvm_active_resize_hpt(struct kvm *kvm)
+{
+   struct kvm_resize_hpt *resize = kvm->arch.resize_hpt;
+
+   if (resize && (resize->state & RESIZE_HPT_PREPARED)
+   && !(resize->state & RESIZE_HPT_FAILED))
+   return resize;
+
+   return NULL;
+}
+
 static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh,
unsigned long ptel, unsigned long *pte_idx_ret);
@@ -755,6 +766,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
int retval = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
+   struct kvm_resize_hpt *resize = kvm_active_resize_hpt(kvm);
 
slots = kvm_memslots(kvm);
kvm_for_each_memslot(memslot, slots) {
@@ -776,6 +788,10 @@ static int kvm_handle_hva_range(struct kvm *kvm,
retval |= kvm_handle_hva_range_slot(kvm, &kvm->arch.hpt,
memslot, memslot->arch.rmap,
gfn, gfn_end, handler);
+   if (resize)
+   retval |= kvm_handle_hva_range_slot(kvm, &resize->hpt,
+   memslot, resize->rmap[memslot->id],
+   gfn, gfn_end, handler);
}
 
return retval;
-- 
2.5.0


[RFCv2 20/25] powerpc/kvm: Make MMU notifier handlers more flexible

2016-03-07 Thread David Gibson
KVM on powerpc uses several MMU notifiers to update guest page tables and
reverse mappings based on host MM events.  At present these always act on the
guest's main active hash table and reverse mappings.

However, for HPT resizing we're going to need these to sometimes operate
on a tentative hash table or reverse mapping for an in-progress or
recently completed resize.

To allow that, extend the MMU notifier helper functions to take extra
parameters for the HPT to operate on.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 65 +
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d2f04ee..db070ad 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -720,14 +720,38 @@ static void kvmppc_rmap_reset(struct kvm *kvm)
srcu_read_unlock(&kvm->srcu, srcu_idx);
 }
 
+static int kvm_handle_hva_range_slot(struct kvm *kvm,
+struct kvm_hpt_info *hpt,
+struct kvm_memory_slot *memslot,
+unsigned long *rmap,
+gfn_t gfn_start, gfn_t gfn_end,
+int (*handler)(struct kvm *kvm,
+   struct kvm_hpt_info *hpt,
+   unsigned long *rmapp,
+   unsigned long gfn))
+{
+   int ret;
+   int retval = 0;
+   gfn_t gfn;
+
+   for (gfn = gfn_start; gfn < gfn_end; ++gfn) {
+   gfn_t gfn_offset = gfn - memslot->base_gfn;
+
+   ret = handler(kvm, hpt, &rmap[gfn_offset], gfn);
+   retval |= ret;
+   }
+
+   return retval;
+}
+
 static int kvm_handle_hva_range(struct kvm *kvm,
unsigned long start,
unsigned long end,
int (*handler)(struct kvm *kvm,
+  struct kvm_hpt_info *hpt,
   unsigned long *rmapp,
   unsigned long gfn))
 {
-   int ret;
int retval = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
@@ -749,28 +773,27 @@ static int kvm_handle_hva_range(struct kvm *kvm,
gfn = hva_to_gfn_memslot(hva_start, memslot);
gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
 
-   for (; gfn < gfn_end; ++gfn) {
-   gfn_t gfn_offset = gfn - memslot->base_gfn;
-
-   ret = handler(kvm, &memslot->arch.rmap[gfn_offset], 
gfn);
-   retval |= ret;
-   }
+   retval |= kvm_handle_hva_range_slot(kvm, &kvm->arch.hpt,
+   memslot, memslot->arch.rmap,
+   gfn, gfn_end, handler);
}
 
return retval;
 }
 
 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
- int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+ int (*handler)(struct kvm *kvm,
+struct kvm_hpt_info *hpt,
+unsigned long *rmapp,
 unsigned long gfn))
 {
return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
 }
 
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
-  unsigned long gfn)
+static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_hpt_info *hpt,
+  unsigned long *rmapp, unsigned long gfn)
 {
-   struct revmap_entry *rev = kvm->arch.hpt.rev;
+   struct revmap_entry *rev = hpt->rev;
unsigned long h, i, j;
__be64 *hptep;
unsigned long ptel, psize, rcbits;
@@ -788,7 +811,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
 * rmap chain lock.
 */
i = *rmapp & KVMPPC_RMAP_INDEX;
-   hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
+   hptep = (__be64 *) (hpt->virt + (i << 4));
if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
/* unlock rmap before spinning on the HPTE lock */
unlock_rmap(rmapp);
@@ -861,16 +884,16 @@ void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
 * thus the present bit can't go from 0 to 1.
 */
if (*rmapp & KVMPPC_RMAP_PRESENT)
-   kvm_unmap_rmapp(kvm, rmapp, gfn);
+   kvm_unmap_rmapp(kvm, &kvm->arch.hpt, rmapp, gfn);
++rmapp;
++gfn;
}
 }
 
-static

[RFCv2 19/25] powerpc/kvm: Allocations for HPT resizing

2016-03-07 Thread David Gibson
This adds code to initialize an HPT resize operation, including allocating
a tentative new HPT and reverse maps.  It also includes corresponding code
to free things afterwards.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 42 +
 1 file changed, 42 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ee50e46..d2f04ee 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -68,6 +68,13 @@ struct kvm_resize_hpt {
/* Private to the work thread, until RESIZE_HPT_FAILED is set,
 * thereafter read-only */
int error;
+
+   /* Private to the work thread, until RESIZE_HPT_PREPARED, then
+* protected by kvm->mmu_lock until the resize struct is
+* unlinked from struct kvm, then private to the work thread
+* again */
+   struct kvm_hpt_info hpt;
+   unsigned long *rmap[KVM_USER_MEM_SLOTS];
 };
 
 #ifdef DEBUG_RESIZE_HPT
@@ -1173,6 +1180,31 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, 
unsigned long gpa,
 static int resize_hpt_allocate(struct kvm_resize_hpt *resize,
   struct kvm_memslots *slots)
 {
+   struct kvm_memory_slot *memslot;
+   int rc;
+
+   rc = kvmppc_allocate_hpt(&resize->hpt, resize->order);
+   if (rc == -ENOMEM)
+   return H_NO_MEM;
+   else if (rc < 0)
+   return H_HARDWARE;
+
+   resize_hpt_debug(resize, "HPT @ 0x%lx\n", resize->hpt.virt);
+
+   kvm_for_each_memslot(memslot, slots) {
+   unsigned long *rmap;
+
+   if (memslot->flags & KVM_MEMSLOT_INVALID)
+   continue;
+
+   rmap = vzalloc(memslot->npages * sizeof(*rmap));
+   if (!rmap)
+   return H_NO_MEM;
+   resize->rmap[memslot->id] = rmap;
+   resize_hpt_debug(resize, "Memslot %d (%lu pages): %p\n",
+memslot->id, memslot->npages, rmap);
+   }
+
return H_SUCCESS;
 }
 
@@ -1193,6 +1225,13 @@ static void resize_hpt_flush_rmaps(struct kvm_resize_hpt 
*resize,
 
 static void resize_hpt_free(struct kvm_resize_hpt *resize)
 {
+   int i;
+   if (resize->hpt.virt)
+   kvmppc_free_hpt(&resize->hpt);
+
+   for (i = 0; i < KVM_USER_MEM_SLOTS; i++)
+   if (resize->rmap[i])
+   vfree(resize->rmap[i]);
 }
 
 static void resize_hpt_work(struct work_struct *work)
@@ -1205,6 +1244,9 @@ static void resize_hpt_work(struct work_struct *work)
 
resize_hpt_debug(resize, "Starting work, order = %d\n", resize->order);
 
+   memset(&resize->hpt, 0, sizeof(resize->hpt));
+   memset(&resize->rmap, 0, sizeof(resize->rmap));
+
mutex_lock(&kvm->arch.resize_hpt_mutex);
 
/* Don't want to have memslots change under us */
-- 
2.5.0


[RFCv2 23/25] powerpc/kvm: Rehashing for HPT resizing

2016-03-07 Thread David Gibson
This adds code for the "guts" of an HPT resize operation: rehashing HPTEs
from the current HPT into the new resized HPT.

This is performed by the HPT resize work thread, but is gated to occur only
while the guest is executing the H_RESIZE_HPT_COMMIT hypercall.  The guest
is expected not to modify or use the hash table during this period which
simplifies things somewhat (Linux guests do this with stop_machine()).
However, there are still host processes active which could affect the guest
so there's still some hairy synchronization.

To reduce the amount of work we need to do (and thus the latency of the
operation) we only rehash bolted entries, expecting the guest to refault
other HPTEs after the resize is complete.
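
At a high level the rehash pass amounts to the sketch below, iterating every
slot of the old HPT and rehashing one entry at a time via
resize_hpt_rehash_hpte() from the hunk further down (which is where the
bolted-only filtering happens); this is the likely shape, not the literal
patch code:

static long resize_hpt_rehash_sketch(struct kvm_resize_hpt *resize)
{
	struct kvm *kvm = resize->kvm;
	unsigned long pteg, rc;
	int slot;

	/* Walk every slot of every group in the current (old) HPT */
	for (pteg = 0; pteg <= kvmppc_hpt_mask(&kvm->arch.hpt); pteg++) {
		for (slot = 0; slot < HPTES_PER_GROUP; slot++) {
			rc = resize_hpt_rehash_hpte(kvm, resize, pteg, slot);
			if (rc != H_SUCCESS)
				return rc;
		}
	}

	return H_SUCCESS;
}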

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_book3s.h |   6 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 166 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |  10 +-
 3 files changed, 173 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 81f2b77..935fbba 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -156,8 +156,10 @@ extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong 
msr);
 extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu 
*vcpu);
 extern kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa,
bool writing, bool *writable);
-extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
-   unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_add_revmap_chain(struct kvm_hpt_info *hpt,
+   struct revmap_entry *rev,
+   unsigned long *rmap,
+   long pte_index, int realmode);
 extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long 
psize);
 extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
unsigned long pte_index);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index c4c1814..d06aef6 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -681,7 +681,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
/* don't lose previous R and C bits */
r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
} else {
-   kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
+   kvmppc_add_revmap_chain(&kvm->arch.hpt, rev, rmap, index, 0);
}
 
hptep[1] = cpu_to_be64(r);
@@ -1249,9 +1249,171 @@ static int resize_hpt_allocate(struct kvm_resize_hpt 
*resize,
return H_SUCCESS;
 }
 
+static unsigned long resize_hpt_rehash_hpte(struct kvm *kvm,
+   struct kvm_resize_hpt *resize,
+   unsigned long pteg, int slot)
+{
+
+   struct kvm_hpt_info *old = &kvm->arch.hpt;
+   struct kvm_hpt_info *new = &resize->hpt;
+   unsigned long old_idx = pteg * HPTES_PER_GROUP + slot;
+   unsigned long new_idx;
+   __be64 *hptep, *new_hptep;
+   unsigned long old_hash_mask = (1ULL << (old->order - 7)) - 1;
+   unsigned long new_hash_mask = (1ULL << (new->order - 7)) - 1;
+   unsigned long pte0, pte1, guest_pte1;
+   unsigned long avpn;
+   unsigned long psize, a_psize;
+   unsigned long hash, new_pteg, replace_pte0;
+   unsigned long gpa, gfn;
+   struct kvm_memory_slot *memslot;
+   struct revmap_entry *new_rev;
+   unsigned long mmu_seq;
+
+   mmu_seq = kvm->mmu_notifier_seq;
+   smp_rmb();
+
+   hptep = (__be64 *)(old->virt + (old_idx << 4));
+   if (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+   return H_HARDWARE;
+
+   pte0 = be64_to_cpu(hptep[0]);
+   pte1 = be64_to_cpu(hptep[1]);
+   guest_pte1 = old->rev[old_idx].guest_rpte;
+
+   unlock_hpte(hptep, pte0);
+
+   if (!(pte0 & HPTE_V_VALID) && !(pte0 & HPTE_V_ABSENT))
+   /* Nothing to do */
+   return H_SUCCESS;
+
+   if (!(pte0 & HPTE_V_BOLTED))
+   /* Don't bother rehashing non-bolted HPTEs */
+   return H_SUCCESS;
+
+   pte1 = be64_to_cpu(hptep[1]);
+   psize = hpte_base_page_size(pte0, pte1);
+   if (WARN_ON(!psize))
+   return H_HARDWARE;
+
+   avpn = HPTE_V_AVPN_VAL(pte0) & ~((psize - 1) >> 23);
+
+   if (pte0 & HPTE_V_SECONDARY)
+   pteg = ~pteg;
+
+   if (!(pte0 & HPTE_V_1TB_SEG)) {
+   unsigned long offset, vsid;
+
+   /* We only have 28 - 23 bits of offset in avpn */
+   offset = (avpn & 0x1f) << 23;
+   vsid = avpn >> 5;
+   /* We can find more bits from the pteg value */
+   if (psize < (1U

[RFCv2 18/25] powerpc/kvm: Outline of HPT resizing implementation

2016-03-07 Thread David Gibson
This adds an outline (not yet working) of an implementation for the HPT
resizing PAPR extension.  Specifically it adds the work function which will
see the resizing workflow through, and adds the synchronization between
this and the HPT resizing hypercalls.

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_host.h |   5 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 276 +++-
 arch/powerpc/kvm/book3s_hv.c|   6 +
 3 files changed, 285 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 718dc56..ef5b444 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -230,6 +230,8 @@ struct kvm_hpt_info {
int cma;
 };
 
+struct kvm_resize_hpt;
+
 struct kvm_arch {
unsigned int lpid;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -248,6 +250,9 @@ struct kvm_arch {
cpumask_t need_tlb_flush;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   struct kvm_resize_hpt *resize_hpt; /* protected by kvm->mmu_lock */
+   struct mutex resize_hpt_mutex;
+   wait_queue_head_t resize_hpt_wq;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b92384f..ee50e46 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,6 +40,56 @@
 
 #include "trace_hv.h"
 
+#define DEBUG_RESIZE_HPT   1
+
+
+struct kvm_resize_hpt {
+   /* These fields are read-only after initialization */
+   struct kvm *kvm;
+   struct work_struct work;
+   u32 order;
+
+   /* These fields protected by kvm->mmu_lock */
+   unsigned long state;
+   /*  Prepare completed, or failed */
+#defineRESIZE_HPT_PREPARED (1UL << 1)
+   /*  Something failed in work thread */
+#defineRESIZE_HPT_FAILED   (1UL << 2)
+   /*  New HPT is active */
+#defineRESIZE_HPT_COMMITTED(1UL << 3)
+
+   /*  H_COMMIT hypercall has started */
+#defineRESIZE_HPT_COMMIT   (1UL << 16)
+   /*  Cancelled */
+#defineRESIZE_HPT_CANCEL   (1UL << 17)
+   /*  All done, state can be free()d */
+#defineRESIZE_HPT_FREE (1UL << 18)   
+
+   /* Private to the work thread, until RESIZE_HPT_FAILED is set,
+* thereafter read-only */
+   int error;
+};
+
+#ifdef DEBUG_RESIZE_HPT
+#define resize_hpt_debug(resize, ...)  \
+   do {\
+   printk(KERN_DEBUG "RESIZE HPT %p: ", resize);   \
+   printk(__VA_ARGS__);\
+   } while (0)
+#else
+#define resize_hpt_debug(resize, ...)  \
+   do { } while (0)
+#endif
+
+static void resize_hpt_set_state(struct kvm_resize_hpt *resize,
+  unsigned long newstate)
+{
+   struct kvm *kvm = resize->kvm;
+
+   resize->state |= newstate;
+   wake_up_all(&kvm->arch.resize_hpt_wq);
+}
+
 static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh,
unsigned long ptel, unsigned long *pte_idx_ret);
@@ -1120,19 +1170,241 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void 
*va, unsigned long gpa,
 /*
  * HPT resizing
  */
+static int resize_hpt_allocate(struct kvm_resize_hpt *resize,
+  struct kvm_memslots *slots)
+{
+   return H_SUCCESS;
+}
+
+static int resize_hpt_rehash(struct kvm_resize_hpt *resize)
+{
+   return H_HARDWARE;
+}
+
+static void resize_hpt_pivot(struct kvm_resize_hpt *resize,
+struct kvm_memslots *slots)
+{
+}
+
+static void resize_hpt_flush_rmaps(struct kvm_resize_hpt *resize,
+  struct kvm_memslots *slots)
+{
+}
+
+static void resize_hpt_free(struct kvm_resize_hpt *resize)
+{
+}
+
+static void resize_hpt_work(struct work_struct *work)
+{
+   struct kvm_resize_hpt *resize = container_of(work,
+struct kvm_resize_hpt,
+work);
+   struct kvm *kvm = resize->kvm;
+   struct kvm_memslots *slots;
+
+   resize_hpt_debug(resize, "Starting work, order = %d\n", resize->order);
+
+   mutex_lock(&kvm->arch.resize_hpt_mutex);
+
+   /* Don't want to have memslots change under us */
+   mutex_lock(&kvm->slots_lock);
+
+   slots = kvm_memslots(kvm);
+
+   resize->error = resize_hpt_allocate(resize, slots);
+   spin_lock(&kvm->mmu_lock);
+
+   if (resize->error || (resize->state & 

[RFCv2 15/25] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size

2016-03-07 Thread David Gibson
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
table (HPT) that userspace expects a guest VM to have, and is also used to
clear that HPT when necessary (e.g. guest reboot).

At present, once the ioctl() is called for the first time, the HPT size can
never be changed thereafter; it will be cleared, but always remains the size
set by the first call.

With the upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default size
if the guest changed it).

So, we need to allow this ioctl() to change the HPT size.
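
For reference, userspace drives this through the existing VM ioctl; a minimal
sketch (the order value of 24, i.e. a 16MB HPT, is only an example):

/* Sketch: request (or reset to) an HPT of order 24.  After this patch the
 * kernel will free and reallocate the HPT if the current order differs. */
__u32 htab_order = 24;

if (ioctl(vm_fd, KVM_PPC_ALLOCATE_HTAB, &htab_order) < 0)
	perror("KVM_PPC_ALLOCATE_HTAB");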

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 52 -
 arch/powerpc/kvm/book3s_hv.c|  5 +---
 3 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index f77d0a0..bc7a104 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
 extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
-extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
 extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index eb1aa3a..4547b6e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -103,10 +103,22 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info 
*info)
info->virt, (long)info->order, kvm->arch.lpid);
 }
 
-long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
+void kvmppc_free_hpt(struct kvm_hpt_info *info)
+{
+   vfree(info->rev);
+   if (info->cma)
+   kvm_free_hpt_cma(virt_to_page(info->virt),
+1 << (info->order - PAGE_SHIFT));
+   else
+   free_pages(info->virt, info->order - PAGE_SHIFT);
+   info->virt = 0;
+   info->order = 0;
+}
+
+long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
 {
long err = -EBUSY;
-   long order;
+   struct kvm_hpt_info info;
 
mutex_lock(&kvm->lock);
if (kvm->arch.hpte_setup_done) {
@@ -118,8 +130,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
goto out;
}
}
-   if (kvm->arch.hpt.virt) {
-   order = kvm->arch.hpt.order;
+   if (kvm->arch.hpt.order == order) {
+   /* We already have a suitable HPT */
+
/* Set the entire HPT to 0, i.e. invalid HPTEs */
memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
/*
@@ -128,33 +141,24 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
kvmppc_rmap_reset(kvm);
/* Ensure that each vcpu will flush its TLB on next entry. */
cpumask_setall(&kvm->arch.need_tlb_flush);
-   *htab_orderp = order;
err = 0;
-   } else {
-   struct kvm_hpt_info info;
-
-   err = kvmppc_allocate_hpt(&info, *htab_orderp);
-   if (err < 0)
-   goto out;
-   kvmppc_set_hpt(kvm, &info);
+   goto out;
}
+
+   if (kvm->arch.hpt.virt)
+   kvmppc_free_hpt(&kvm->arch.hpt);
+
+   
+   err = kvmppc_allocate_hpt(&info, order);
+   if (err < 0)
+   goto out;
+   kvmppc_set_hpt(kvm, &info);
+   
  out:
mutex_unlock(&kvm->lock);
return err;
 }
 
-void kvmppc_free_hpt(struct kvm_hpt_info *info)
-{
-   vfree(info->rev);
-   if (info->cma)
-   kvm_free_hpt_cma(virt_to_page(info->virt),
-1 << (info->order - PAGE_SHIFT));
-   else
-   free_pages(info->virt, info->order - PAGE_SHIFT);
-   info->virt = 0;
-   info->order = 0;
-}
-
 /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
 static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
 {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1199fb5..a2730ca 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3113,12 +3113,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
r = -EFAULT;
if (get_user(htab_order, (u32 __user *)argp))
break;
-   r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
+   r = kvmppc_alloc_reset_hpt(kvm, htab_order);
if (r)

[RFCv2 25/25] powerpc/kvm: Harvest RC bits from old HPT after HPT resize

2016-03-07 Thread David Gibson
During an HPT resize operation we have two HPTs and sets of reverse maps
for the guest: the active one, and the tentative resized one.  This means
that information about a host page's referenced / dirty state as affected
by the guest could end up in either HPT depending on exactly what moment
it happens at.

During the transition we handle this by having things which need this
information consult both the new and old HPTs.  However, in order to clean
things up after, we need to harvest any such information left over in the
old tables and store it in the new ones.

This patch implements that cleanup: first harvesting R & C bits from the old
HPT into the old rmaps, then folding that information into the new (now
current) rmaps.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 45430fe..f132f86 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1457,9 +1457,66 @@ static void resize_hpt_pivot(struct kvm_resize_hpt 
*resize,
spin_lock(&kvm->mmu_lock);
 }
 
+static void resize_hpt_harvest_rc(struct kvm_hpt_info *hpt,
+ unsigned long *rmapp)
+{
+   unsigned long idx;
+
+   if (!(*rmapp & KVMPPC_RMAP_PRESENT))
+   return;
+
+   idx = *rmapp & KVMPPC_RMAP_INDEX;
+   do {
+   struct revmap_entry *rev = &hpt->rev[idx];
+   __be64 *hptep = (__be64 *)(hpt->virt + (idx << 4));
+   unsigned long hpte0 = be64_to_cpu(hptep[0]);
+   unsigned long hpte1 = be64_to_cpu(hptep[1]);
+   unsigned long psize = hpte_page_size(hpte0, hpte1);
+   unsigned long rcbits = hpte1 & (HPTE_R_R | HPTE_R_C);
+
+   *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+   if (rcbits & HPTE_R_C)
+   kvmppc_update_rmap_change(rmapp, psize);
+
+   idx = rev->forw;
+   } while (idx != (*rmapp & KVMPPC_RMAP_INDEX));
+}
+
 static void resize_hpt_flush_rmaps(struct kvm_resize_hpt *resize,
   struct kvm_memslots *slots)
 {
+   struct kvm_memory_slot *memslot;
+
+   kvm_for_each_memslot(memslot, slots) {
+   unsigned long *old_rmap = resize->rmap[memslot->id];
+   unsigned long *new_rmap = memslot->arch.rmap;
+   unsigned long i;
+
+   resize_hpt_debug(resize, "Flushing RMAPS for memslot %d\n", 
memslot->id);
+
+   for (i = 0; i < memslot->npages; i++) {
+   lock_rmap(old_rmap);
+
+   resize_hpt_harvest_rc(&resize->hpt, old_rmap);
+
+   lock_rmap(new_rmap);
+
+   *new_rmap |= *old_rmap & (KVMPPC_RMAP_REFERENCED
+ | KVMPPC_RMAP_CHANGED);
+   if ((*old_rmap & KVMPPC_RMAP_CHG_ORDER)
+   > (*new_rmap & KVMPPC_RMAP_CHG_ORDER)) {
+   *new_rmap &= ~KVMPPC_RMAP_CHG_ORDER;
+   *new_rmap |= *old_rmap & KVMPPC_RMAP_CHG_ORDER;
+   }
+   unlock_rmap(new_rmap);
+   unlock_rmap(old_rmap);
+
+   old_rmap++;
+   new_rmap++;
+   }
+
+   resize_hpt_debug(resize, "Flushed RMAPS for memslot %d\n", 
memslot->id);
+   }
 }
 
 static void resize_hpt_free(struct kvm_resize_hpt *resize)
-- 
2.5.0


[PATCH 0/2] mm: Enable page parallel initialisation for Power

2016-03-07 Thread Li Zhang
From: Li Zhang 

Upstream has supported parallel page initialisation for X86, and the
boot time is improved greatly. Some tests have been done for Power.

Here are the results I obtained with different memory sizes.

* 4GB memory:
boot time is as follows:
with patch vs without patch: 10.4s vs 24.5s
boot time is improved by 57%
* 200GB memory: 
boot time looks the same with and without patches.
boot time is about 38s
* 32TB memory: 
boot time looks the same with and without patches 
boot time is about 160s.
The boot time is much shorter than on X86 with 24TB memory.
From community discussion, it takes about 694s for an X86 24TB system.

From a code point of view, parallel initialisation improves performance by
deferring memory initialisation to kswapd with N kthreads, so it should
improve performance theoretically.

From the test results: on X86, performance is improved greatly with huge
memory, but on the Power platform it is only improved greatly with less than
100GB of memory. For huge memory it is not improved greatly, but it still
saves time with several threads at least, as the following information
shows (32TB system log):

[   22.648169] node 9 initialised, 16607461 pages in 280ms
[   22.783772] node 3 initialised, 23937243 pages in 410ms
[   22.858877] node 6 initialised, 29179347 pages in 490ms
[   22.863252] node 2 initialised, 29179347 pages in 490ms
[   22.907545] node 0 initialised, 32049614 pages in 540ms
[   22.920891] node 15 initialised, 32212280 pages in 550ms
[   22.923236] node 4 initialised, 32306127 pages in 550ms
[   22.923384] node 12 initialised, 32314319 pages in 550ms
[   22.924754] node 8 initialised, 32314319 pages in 550ms
[   22.940780] node 13 initialised, 33353677 pages in 570ms
[   22.940796] node 11 initialised, 33353677 pages in 570ms
[   22.941700] node 5 initialised, 33353677 pages in 570ms
[   22.941721] node 10 initialised, 33353677 pages in 570ms
[   22.941876] node 7 initialised, 33353677 pages in 570ms
[   22.944946] node 14 initialised, 33353677 pages in 570ms
[   22.946063] node 1 initialised, 33345485 pages in 580ms

It saves at least about 550*16 ms (roughly 8.8 seconds of serial work),
although that is negligible compared with the overall boot time of about
160 seconds. What's more, even without the patches the boot time on Power
is much shorter than on x86 for huge-memory machines.

So it is still worthwhile to enable this for Power.

Li Zhang (2):
  mm: meminit: initialise more memory for inode/dentry hash tables in
early boot
  powerpc/mm: Enable page parallel initialisation

 arch/powerpc/Kconfig |  1 +
 mm/page_alloc.c  | 11 +--
 2 files changed, 10 insertions(+), 2 deletions(-)

-- 
2.1.0


[PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot

2016-03-07 Thread Li Zhang
From: Li Zhang 

This patch is based on Mel Gorman's old patch from the mailing list,
https://lkml.org/lkml/2015/5/5/280, which was discussed but superseded:
the issue was instead fixed with a completion in page_alloc_init_late()
that waits for all memory to be initialised. That addressed the OOM
problem on X86 with 24TB memory, which allocates memory in late
initialisation. But on a Power platform with 32TB memory there is still
a call trace in vfs_caches_init->inode_init(), because the inode hash
table needs more memory.
So this patch allocates 1GB per 0.25TB/node for large systems,
as mentioned in https://lkml.org/lkml/2015/5/1/627
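
To make the sizing concrete, here is a small stand-alone sketch of the
arithmetic (this is not kernel code; a 64K page size, i.e. PAGE_SHIFT = 16,
is assumed here purely for illustration):

#include <stdio.h>

int main(void)
{
        const unsigned long page_shift = 16;            /* 64K pages */
        const unsigned long node_bytes = 1UL << 38;     /* a 0.25TB node */
        const unsigned long node_pages = node_bytes >> page_shift;

        /* floor: at least 2GB worth of pages per node */
        unsigned long floor = 2UL << (30 - page_shift);
        /* extra: 1/256 of the node, i.e. 1GB for each 0.25TB */
        unsigned long extra = node_pages >> 8;

        printf("floor: %lu pages (%lu GB)\n",
               floor, (floor << page_shift) >> 30);
        printf("extra: %lu pages (%lu GB)\n",
               extra, (extra << page_shift) >> 30);
        return 0;
}

Taking the larger of the two values keeps today's 2GB floor for small
nodes and scales up by roughly 1GB per 0.25TB for large ones, which is
what the new max_initialise calculation in update_defer_init() below does.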

This call trace was seen on Power with 32TB memory, 1024 CPUs and 16 nodes.
Currently only 2GB*16 = 32GB is initialised early. But the dentry cache
hash table needs 16GB and the inode cache hash table needs another 16GB,
so the system does not have enough early memory for them.
The dmesg log is as follows:

Dentry cache hash table entries: 2147483648 (order: 18,17179869184 bytes)
vmalloc: allocation failure, allocated 16021913600 of 17179934720 bytes
swapper/0: page allocation failure: order:0,mode:0x2080020
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-0-ppc64
Call Trace:
[c12bfa00] [c07c4a50].dump_stack+0xb4/0xb664 (unreliable)
[c12bfa80] [c01f93d4].warn_alloc_failed+0x114/0x160
[c12bfb30] [c023c204].__vmalloc_area_node+0x1a4/0x2b0
[c12bfbf0] [c023c3f4].__vmalloc_node_range+0xe4/0x110
[c12bfc90] [c023c460].__vmalloc_node+0x40/0x50
[c12bfd10] [c0b67d60].alloc_large_system_hash+0x134/0x2a4
[c12bfdd0] [c0b70924].inode_init+0xa4/0xf0
[c12bfe60] [c0b706a0].vfs_caches_init+0x80/0x144
[c12bfef0] [c0b35208].start_kernel+0x40c/0x4e0
[c12bff90] [c0008cfc]start_here_common+0x20/0x4a4
Mem-Info:

Acked-by: Mel Gorman 
Signed-off-by: Li Zhang 
---
 * Fix a typo and reformat the dmesg output in the change log.
 * Fix a coding style issue in this patch.

 mm/page_alloc.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 838ca8bb..6f77f64 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -293,13 +293,20 @@ static inline bool update_defer_init(pg_data_t *pgdat,
unsigned long pfn, unsigned long zone_end,
unsigned long *nr_initialised)
 {
+   unsigned long max_initialise;
+
/* Always populate low zones for address-contrained allocations */
if (zone_end < pgdat_end_pfn(pgdat))
return true;
+   /*
+* Initialise at least 2G of a node but also take into account that
+* two large system hashes that can take up 1GB for 0.25TB/node.
+*/
+   max_initialise = max(2UL << (30 - PAGE_SHIFT),
+   (pgdat->node_spanned_pages >> 8));
 
-   /* Initialise at least 2G of the highest zone */
(*nr_initialised)++;
-   if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+   if ((*nr_initialised > max_initialise) &&
(pfn & (PAGES_PER_SECTION - 1)) == 0) {
pgdat->first_deferred_pfn = pfn;
return false;
-- 
2.1.0


[PATCH 2/2] powerpc/mm: Enable page parallel initialisation

2016-03-07 Thread Li Zhang
From: Li Zhang 

Parallel initialisation has already been enabled for X86, and boot time is
improved greatly there. On Power8, it is improved greatly for small
memory. Here are the results from my tests on a Power8 platform:

For 4GB memory: boot time is improved by 57%:
with patch: 10s, without patch: 24.5s

For 50GB memory: boot time is improved by 22%:
with patch: 43.8s, without patch: 56.8s

Acked-by: Mel Gorman 
Signed-off-by: Li Zhang 
---
 * Add boot time details in change log.
 * Please apply this patch after [PATCH 1/2] mm: meminit: initialise
more memory for inode/dentry hash tables in early boot, because
   [PATCH 1/2] is to fix a bug which can be reproduced on Power.

 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9faa18c..97d41ad 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -158,6 +158,7 @@ config PPC
select ARCH_HAS_DEVMEM_IS_ALLOWED
select HAVE_ARCH_SECCOMP_FILTER
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
-- 
2.1.0


Re: [PATCH v3 1/2] cxl: Add mechanism for delivering AFU driver specific events

2016-03-07 Thread Matt Ochs
> On Mar 7, 2016, at 7:48 PM, Ian Munsie  wrote:
> 
> From: Ian Munsie 
> 
> This adds an afu_driver_ops structure with event_pending and
> deliver_event callbacks. An AFU driver such as cxlflash can fill these
> out and associate them with a context to enable passing custom AFU
> specific events to userspace.
> 
> The cxl driver will call event_pending() during poll, select, read, etc.
> calls to check if an AFU driver specific event is pending, and will call
> deliver_event() to deliver that event. This way, the cxl driver takes
> care of all the usual locking semantics around these calls and handles
> all the generic cxl events, so that the AFU driver only needs to worry
> about its own events.
> 
> The deliver_event() call is passed a struct cxl_event buffer to fill in.
> The header will already be filled in for an AFU driver event, and the
> AFU driver is expected to expand the header.size as necessary (up to
> max_size, defined by struct cxl_event_afu_driver_reserved) and fill out
> its own information.
> 
> Since AFU drivers provide their own means for userspace to obtain the
> AFU file descriptor (i.e. cxlflash uses an ioctl on their scsi file
> descriptor to obtain the AFU file descriptor) and the generic cxl driver
> will never use this event, the ABI of the event is up to each individual
> AFU driver.
> 
> Signed-off-by: Ian Munsie 

Reviewed-by: Matthew R. Ochs 
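
To illustrate the shape of the interface described above, here is a rough,
self-contained sketch of an AFU driver filling in the two callbacks. Only
the names afu_driver_ops, event_pending and deliver_event come from the
patch description; the context and event types below are stand-ins
invented for illustration and are not the real cxl structures:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in types -- NOT the real cxl definitions. */
struct fake_event_header { size_t size; };
struct fake_event { struct fake_event_header header; char data[64]; };
struct fake_context { bool afu_event_ready; };

struct afu_driver_ops {
        bool (*event_pending)(struct fake_context *ctx);
        void (*deliver_event)(struct fake_context *ctx,
                              struct fake_event *event, size_t max_size);
};

static bool my_event_pending(struct fake_context *ctx)
{
        /* the core would call this from poll/select/read */
        return ctx->afu_event_ready;
}

static void my_deliver_event(struct fake_context *ctx,
                             struct fake_event *event, size_t max_size)
{
        const char payload[] = "afu-specific event";

        /* header is pre-filled by the core; grow size up to max_size */
        if (event->header.size + sizeof(payload) <= max_size) {
                memcpy(event->data, payload, sizeof(payload));
                event->header.size += sizeof(payload);
        }
        ctx->afu_event_ready = false;
}

static const struct afu_driver_ops my_afu_ops = {
        .event_pending = my_event_pending,
        .deliver_event = my_deliver_event,
};

int main(void)
{
        struct fake_context ctx = { .afu_event_ready = true };
        struct fake_event ev = { .header.size = sizeof(ev.header) };

        /* Mimic the core driver's poll -> deliver sequence. */
        if (my_afu_ops.event_pending(&ctx))
                my_afu_ops.deliver_event(&ctx, &ev, sizeof(ev));
        printf("delivered %zu bytes\n", ev.header.size);
        return 0;
}

The point, as the description says, is that the core driver owns the
locking and the generic events; the AFU driver only supplies its own
payload, which is why the event ABI can be left to each AFU driver.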


Re: [PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of TCE table

2016-03-07 Thread David Gibson
On Mon, Mar 07, 2016 at 08:38:13PM +1100, Alexey Kardashevskiy wrote:
> On 03/07/2016 05:25 PM, David Gibson wrote:
> >On Mon, Mar 07, 2016 at 02:41:14PM +1100, Alexey Kardashevskiy wrote:
> >>The existing in-kernel TCE table for emulated devices contains
> >>guest physical addresses which are accesses by emulated devices.
> >>Since we need to keep this information for VFIO devices too
> >>in order to implement H_GET_TCE, we are reusing it.
> >>
> >>This adds IOMMU group list to kvmppc_spapr_tce_table. Each group
> >>will have an iommu_table pointer.
> >>
> >>This adds kvm_spapr_tce_attach_iommu_group() helper and its detach
> >>counterpart to manage the lists.
> >>
> >>This puts a group when:
> >>- guest copy of TCE table is destroyed when TCE table fd is closed;
> >>- kvm_spapr_tce_detach_iommu_group() is called from
> >>the KVM_DEV_VFIO_GROUP_DEL ioctl handler in the case vfio-pci hotunplug
> >>(will be added in the following patch).
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >>---
> >>  arch/powerpc/include/asm/kvm_host.h |   8 +++
> >>  arch/powerpc/include/asm/kvm_ppc.h  |   6 ++
> >>  arch/powerpc/kvm/book3s_64_vio.c| 108 
> >> 
> >>  3 files changed, 122 insertions(+)
> >>
> >>diff --git a/arch/powerpc/include/asm/kvm_host.h 
> >>b/arch/powerpc/include/asm/kvm_host.h
> >>index 2e7c791..2c5c823 100644
> >>--- a/arch/powerpc/include/asm/kvm_host.h
> >>+++ b/arch/powerpc/include/asm/kvm_host.h
> >>@@ -178,6 +178,13 @@ struct kvmppc_pginfo {
> >>atomic_t refcnt;
> >>  };
> >>
> >>+struct kvmppc_spapr_tce_group {
> >>+   struct list_head next;
> >>+   struct rcu_head rcu;
> >>+   struct iommu_group *refgrp;/* for reference counting only */
> >>+   struct iommu_table *tbl;
> >>+};
> >>+
> >>  struct kvmppc_spapr_tce_table {
> >>struct list_head list;
> >>struct kvm *kvm;
> >>@@ -186,6 +193,7 @@ struct kvmppc_spapr_tce_table {
> >>u32 page_shift;
> >>u64 offset; /* in pages */
> >>u64 size;   /* window size in pages */
> >>+   struct list_head groups;
> >>struct page *pages[0];
> >>  };
> >>
> >>diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> >>b/arch/powerpc/include/asm/kvm_ppc.h
> >>index 2544eda..d1482dc 100644
> >>--- a/arch/powerpc/include/asm/kvm_ppc.h
> >>+++ b/arch/powerpc/include/asm/kvm_ppc.h
> >>@@ -164,6 +164,12 @@ extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
> >>struct kvm_memory_slot *memslot, unsigned long porder);
> >>  extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
> >>
> >>+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
> >>+   unsigned long liobn,
> >>+   phys_addr_t start_addr,
> >>+   struct iommu_group *grp);
> >>+extern void kvm_spapr_tce_detach_iommu_group(struct kvm *kvm,
> >>+   struct iommu_group *grp);
> >>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> >>struct kvm_create_spapr_tce_64 *args);
> >>  extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> >>diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> >>b/arch/powerpc/kvm/book3s_64_vio.c
> >>index 2c2d103..846d16d 100644
> >>--- a/arch/powerpc/kvm/book3s_64_vio.c
> >>+++ b/arch/powerpc/kvm/book3s_64_vio.c
> >>@@ -27,6 +27,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >>+#include 
> >>
> >>  #include 
> >>  #include 
> >>@@ -95,10 +96,18 @@ static void release_spapr_tce_table(struct rcu_head 
> >>*head)
> >>struct kvmppc_spapr_tce_table *stt = container_of(head,
> >>struct kvmppc_spapr_tce_table, rcu);
> >>unsigned long i, npages = kvmppc_tce_pages(stt->size);
> >>+   struct kvmppc_spapr_tce_group *kg;
> >>
> >>for (i = 0; i < npages; i++)
> >>__free_page(stt->pages[i]);
> >>
> >>+   while (!list_empty(&stt->groups)) {
> >>+   kg = list_first_entry(&stt->groups,
> >>+   struct kvmppc_spapr_tce_group, next);
> >>+   list_del(&kg->next);
> >>+   kfree(kg);
> >>+   }
> >>+
> >>kfree(stt);
> >>  }
> >>
> >>@@ -129,9 +138,15 @@ static int kvm_spapr_tce_mmap(struct file *file, 
> >>struct vm_area_struct *vma)
> >>  static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
> >>  {
> >>struct kvmppc_spapr_tce_table *stt = filp->private_data;
> >>+   struct kvmppc_spapr_tce_group *kg;
> >>
> >>list_del_rcu(&stt->list);
> >>
> >>+   list_for_each_entry_rcu(kg, &stt->groups, next) {
> >>+   iommu_group_put(kg->refgrp);
> >>+   kg->refgrp = NULL;
> >>+   }
> >
> >What's the reason for this kind of two-phase deletion?  Dereffing the
> >group here, and setting it to NULL, then actually removing it from the list above.
> 
> Well, this way I have only one RCU-delayed release_spapr_tce_table(). The
> other option would be to call for each @kg:
> - list_del(&kg->next);
> - call_rcu()
> 
> as release_spapr_tce_table() 

Re: [PATCH kernel 4/9] powerpc/powernv/iommu: Add real mode version of xchg()

2016-03-07 Thread David Gibson
On Mon, Mar 07, 2016 at 06:32:23PM +1100, Alexey Kardashevskiy wrote:
> On 03/07/2016 05:05 PM, David Gibson wrote:
> >On Mon, Mar 07, 2016 at 02:41:12PM +1100, Alexey Kardashevskiy wrote:
> >>In real mode, TCE tables are invalidated using different
> >>cache-inhibited store instructions which is different from
> >>the virtual mode.
> >>
> >>This defines and implements exchange_rm() callback. This does not
> >>define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
> >>exchange/exchange_rm are only to be used by KVM for VFIO.
> >>
> >>The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.
> >>
> >>This replaces list_for_each_entry_rcu with its lockless version as
> >>from now on pnv_pci_ioda2_tce_invalidate() can be called in
> >>the real mode too.
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >>---
> >>  arch/powerpc/include/asm/iommu.h  |  7 +++
> >>  arch/powerpc/kernel/iommu.c   | 15 +++
> >>  arch/powerpc/platforms/powernv/pci-ioda.c | 28 
> >> +++-
> >>  3 files changed, 49 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/arch/powerpc/include/asm/iommu.h 
> >>b/arch/powerpc/include/asm/iommu.h
> >>index 7b87bab..3ca877a 100644
> >>--- a/arch/powerpc/include/asm/iommu.h
> >>+++ b/arch/powerpc/include/asm/iommu.h
> >>@@ -64,6 +64,11 @@ struct iommu_table_ops {
> >>long index,
> >>unsigned long *hpa,
> >>enum dma_data_direction *direction);
> >>+   /* Real mode */
> >>+   int (*exchange_rm)(struct iommu_table *tbl,
> >>+   long index,
> >>+   unsigned long *hpa,
> >>+   enum dma_data_direction *direction);
> >>  #endif
> >>void (*clear)(struct iommu_table *tbl,
> >>long index, long npages);
> >>@@ -208,6 +213,8 @@ extern void iommu_del_device(struct device *dev);
> >>  extern int __init tce_iommu_bus_notifier_init(void);
> >>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
> >>unsigned long *hpa, enum dma_data_direction *direction);
> >>+extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> >>+   unsigned long *hpa, enum dma_data_direction *direction);
> >>  #else
> >>  static inline void iommu_register_group(struct iommu_table_group 
> >> *table_group,
> >>int pci_domain_number,
> >>diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> >>index a8e3490..2fcc48b 100644
> >>--- a/arch/powerpc/kernel/iommu.c
> >>+++ b/arch/powerpc/kernel/iommu.c
> >>@@ -1062,6 +1062,21 @@ void iommu_release_ownership(struct iommu_table *tbl)
> >>  }
> >>  EXPORT_SYMBOL_GPL(iommu_release_ownership);
> >>
> >>+long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> >>+   unsigned long *hpa, enum dma_data_direction *direction)
> >>+{
> >>+   long ret;
> >>+
> >>+   ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> >>+
> >>+   if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> >>+   (*direction == DMA_BIDIRECTIONAL)))
> >>+   SetPageDirty(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
> >>+
> >>+   return ret;
> >>+}
> >>+EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
> >
> >>  int iommu_add_device(struct device *dev)
> >>  {
> >>struct iommu_table *tbl;
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> >>b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index c5baaf3..bed1944 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1791,6 +1791,18 @@ static int pnv_ioda1_tce_xchg(struct iommu_table 
> >>*tbl, long index,
> >>
> >>return ret;
> >>  }
> >>+
> >>+static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
> >>+   unsigned long *hpa, enum dma_data_direction *direction)
> >>+{
> >>+   long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> >>+
> >>+   if (!ret && (tbl->it_type &
> >>+   (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
> >>+   pnv_pci_ioda1_tce_invalidate(tbl, index, 1, true);
> >>+
> >>+   return ret;
> >>+}
> >>  #endif
> >
> >Both your _rm variants are identical to the non _rm versions.  Why not
> >just set the function pointer to the same thing, rather than copying
> >the whole function.
> 
> 
> The last parameter - "rm" - to pnv_pci_ioda1_tce_invalidate() is
> different.

Ah, missed that, sorry.

> 
> 
> >
> >>  static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
> >>@@ -1806,6 +1818,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
> >>.set = pnv_ioda1_tce_build,
> >>  #ifdef CONFIG_IOMMU_API
> >>.exchange = pnv_ioda1_tce_xchg,
> >>+   .exchange_rm = pnv_ioda1_tce_xchg_rm,
> >>  #endif
> >>.clear = pnv_ioda1_tce_free,
> >>.get = pnv_tce_get,
> >>@@ -1866,7 +1879,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
> >>iommu_table *tbl,
> >>  

Re: [PATCH kernel 3/9] KVM: PPC: Use preregistered memory API to access TCE list

2016-03-07 Thread Alexey Kardashevskiy

On 03/07/2016 05:00 PM, David Gibson wrote:

On Mon, Mar 07, 2016 at 02:41:11PM +1100, Alexey Kardashevskiy wrote:

VFIO on sPAPR already implements guest memory pre-registration
when the entire guest RAM gets pinned. This can be used to translate
the physical address of a guest page containing the TCE list
from H_PUT_TCE_INDIRECT.

This makes use of the pre-registrered memory API to access TCE list
pages in order to avoid unnecessary locking on the KVM memory
reverse map.

Signed-off-by: Alexey Kardashevskiy 


Ok.. so, what's the benefit of not having to lock the rmap?


Less locking -> less racing == good, no?





---
  arch/powerpc/kvm/book3s_64_vio_hv.c | 86 ++---
  1 file changed, 70 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 44be73e..af155f6 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -180,6 +180,38 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
  EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);

  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+static mm_context_t *kvmppc_mm_context(struct kvm_vcpu *vcpu)
+{
+   struct task_struct *task;
+
+   task = vcpu->arch.run_task;
+   if (unlikely(!task || !task->mm))
+   return NULL;
+
+   return &task->mm->context;
+}
+
+static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
+{
+   mm_context_t *mm = kvmppc_mm_context(vcpu);
+
+   if (unlikely(!mm))
+   return false;
+
+   return mm_iommu_preregistered(mm);
+}
+
+static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
+   struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
+{
+   mm_context_t *mm = kvmppc_mm_context(vcpu);
+
+   if (unlikely(!mm))
+   return NULL;
+
+   return mm_iommu_lookup_rm(mm, ua, size);
+}
+
  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
  unsigned long ioba, unsigned long tce)
  {
@@ -261,23 +293,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if (ret != H_SUCCESS)
return ret;

-   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
-   return H_TOO_HARD;
+   if (kvmppc_preregistered(vcpu)) {
+   /*
+* We get here if guest memory was pre-registered which
+* is normally VFIO case and gpa->hpa translation does not
+* depend on hpt.
+*/
+   struct mm_iommu_table_group_mem_t *mem;

-   rmap = (void *) vmalloc_to_phys(rmap);
+   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
+   return H_TOO_HARD;

-   /*
-* Synchronize with the MMU notifier callbacks in
-* book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
-* While we have the rmap lock, code running on other CPUs
-* cannot finish unmapping the host real page that backs
-* this guest real page, so we are OK to access the host
-* real page.
-*/
-   lock_rmap(rmap);
-   if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
-   ret = H_TOO_HARD;
-   goto unlock_exit;
+   mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
+   if (!mem || mm_iommu_rm_ua_to_hpa(mem, ua, &tces))
+   return H_TOO_HARD;
+   } else {
+   /*
+* This is emulated devices case.
+* We do not require memory to be preregistered in this case
+* so lock rmap and do __find_linux_pte_or_hugepte().
+*/
+   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
+   return H_TOO_HARD;
+
+   rmap = (void *) vmalloc_to_phys(rmap);
+
+   /*
+* Synchronize with the MMU notifier callbacks in
+* book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
+* While we have the rmap lock, code running on other CPUs
+* cannot finish unmapping the host real page that backs
+* this guest real page, so we are OK to access the host
+* real page.
+*/
+   lock_rmap(rmap);
+   if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
+   ret = H_TOO_HARD;
+   goto unlock_exit;
+   }
}

for (i = 0; i < npages; ++i) {
@@ -291,7 +344,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
}

  unlock_exit:
-   unlock_rmap(rmap);
+   if (rmap)


I don't see where rmap is initialized to NULL in the case where it's
not being used.


@rmap is not new to this function, and it has always been initialized to 
NULL as it was returned via a pointer from kvmppc_gpa_to_ua().






+   unlock_rmap(rmap);

return ret;
  }





--
Alexey

Re: [PATCH kernel 7/9] KVM: PPC: Create a virtual-mode only TCE table handlers

2016-03-07 Thread David Gibson
On Mon, Mar 07, 2016 at 02:41:15PM +1100, Alexey Kardashevskiy wrote:
> In-kernel VFIO acceleration needs different handling in real and virtual
> modes which makes it hard to support both modes in the same handler.
> 
> This creates a copy of kvmppc_rm_h_stuff_tce and kvmppc_rm_h_put_tce
> in addition to the existing kvmppc_rm_h_put_tce_indirect.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_64_vio.c| 52 
> +
>  arch/powerpc/kvm/book3s_64_vio_hv.c |  8 ++---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  4 +--
>  3 files changed, 57 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> b/arch/powerpc/kvm/book3s_64_vio.c
> index 846d16d..7965fc7 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -317,6 +317,32 @@ fail:
>   return ret;
>  }
>  
> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> +   unsigned long ioba, unsigned long tce)
> +{
> + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> + long ret;
> +
> + /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
> + /*  liobn, ioba, tce); */
> +
> + if (!stt)
> + return H_TOO_HARD;
> +
> + ret = kvmppc_ioba_validate(stt, ioba, 1);
> + if (ret != H_SUCCESS)
> + return ret;
> +
> + ret = kvmppc_tce_validate(stt, tce);
> + if (ret != H_SUCCESS)
> + return ret;
> +
> + kvmppc_tce_put(stt, ioba >> stt->page_shift, tce);
> +
> + return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> +
>  long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   unsigned long liobn, unsigned long ioba,
>   unsigned long tce_list, unsigned long npages)
> @@ -372,3 +398,29 @@ unlock_exit:
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
> +
> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> + unsigned long liobn, unsigned long ioba,
> + unsigned long tce_value, unsigned long npages)
> +{
> + struct kvmppc_spapr_tce_table *stt;
> + long i, ret;
> +
> + stt = kvmppc_find_table(vcpu, liobn);
> + if (!stt)
> + return H_TOO_HARD;
> +
> + ret = kvmppc_ioba_validate(stt, ioba, npages);
> + if (ret != H_SUCCESS)
> + return ret;
> +
> + /* Check permission bits only to allow userspace poison TCE for debug */
> + if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> + return H_PARAMETER;
> +
> + for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift))
> + kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
> +
> + return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index af155f6..11163ae 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -212,8 +212,8 @@ static struct mm_iommu_table_group_mem_t 
> *kvmppc_rm_iommu_lookup(
>   return mm_iommu_lookup_rm(mm, ua, size);
>  }
>  
> -long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> -   unsigned long ioba, unsigned long tce)
> +long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> + unsigned long ioba, unsigned long tce)
>  {
>   struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>   long ret;
> @@ -236,7 +236,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned 
> long liobn,
>  
>   return H_SUCCESS;
>  }
> -EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>  
>  static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
>   unsigned long ua, unsigned long *phpa)
> @@ -350,7 +349,7 @@ unlock_exit:
>   return ret;
>  }
>  
> -long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>   unsigned long liobn, unsigned long ioba,
>   unsigned long tce_value, unsigned long npages)
>  {
> @@ -374,7 +373,6 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>  
>   return H_SUCCESS;
>  }
> -EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
>  
>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba)
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index ed16182..d6dad2c 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1928,7 +1928,7 @@ hcall_real_table:
>   .long   DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
>   .long   DOTSYM(kvmppc_h_protect) - hcall_real_table
>   .long   DOTSYM(kvmppc_h_get_tce) - hcall_real_table
> - .long   DOTSYM(kvmppc_h_put_tce) - hcall_real_table
> + .long   DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
>   .long

Re: [PATCH kernel 3/9] KVM: PPC: Use preregistered memory API to access TCE list

2016-03-07 Thread David Gibson
On Tue, Mar 08, 2016 at 04:47:20PM +1100, Alexey Kardashevskiy wrote:
> On 03/07/2016 05:00 PM, David Gibson wrote:
> >On Mon, Mar 07, 2016 at 02:41:11PM +1100, Alexey Kardashevskiy wrote:
> >>VFIO on sPAPR already implements guest memory pre-registration
> >>when the entire guest RAM gets pinned. This can be used to translate
> >>the physical address of a guest page containing the TCE list
> >>from H_PUT_TCE_INDIRECT.
> >>
> >>This makes use of the pre-registrered memory API to access TCE list
> >>pages in order to avoid unnecessary locking on the KVM memory
> >>reverse map.
> >>
> >>Signed-off-by: Alexey Kardashevskiy 
> >
> >Ok.. so, what's the benefit of not having to lock the rmap?
> 
> Less locking -> less racing == good, no?

Well.. maybe.  The increased difficulty in verifying that the code is
correct isn't always a good price to pay.

> >>---
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c | 86 
> >> ++---
> >>  1 file changed, 70 insertions(+), 16 deletions(-)
> >>
> >>diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> >>b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>index 44be73e..af155f6 100644
> >>--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>@@ -180,6 +180,38 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long 
> >>gpa,
> >>  EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
> >>
> >>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> >>+static mm_context_t *kvmppc_mm_context(struct kvm_vcpu *vcpu)
> >>+{
> >>+   struct task_struct *task;
> >>+
> >>+   task = vcpu->arch.run_task;
> >>+   if (unlikely(!task || !task->mm))
> >>+   return NULL;
> >>+
> >>+   return &task->mm->context;
> >>+}
> >>+
> >>+static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
> >>+{
> >>+   mm_context_t *mm = kvmppc_mm_context(vcpu);
> >>+
> >>+   if (unlikely(!mm))
> >>+   return false;
> >>+
> >>+   return mm_iommu_preregistered(mm);
> >>+}
> >>+
> >>+static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
> >>+   struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
> >>+{
> >>+   mm_context_t *mm = kvmppc_mm_context(vcpu);
> >>+
> >>+   if (unlikely(!mm))
> >>+   return NULL;
> >>+
> >>+   return mm_iommu_lookup_rm(mm, ua, size);
> >>+}
> >>+
> >>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  unsigned long ioba, unsigned long tce)
> >>  {
> >>@@ -261,23 +293,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu 
> >>*vcpu,
> >>if (ret != H_SUCCESS)
> >>return ret;
> >>
> >>-   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> >>-   return H_TOO_HARD;
> >>+   if (kvmppc_preregistered(vcpu)) {
> >>+   /*
> >>+* We get here if guest memory was pre-registered which
> >>+* is normally VFIO case and gpa->hpa translation does not
> >>+* depend on hpt.
> >>+*/
> >>+   struct mm_iommu_table_group_mem_t *mem;
> >>
> >>-   rmap = (void *) vmalloc_to_phys(rmap);
> >>+   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
> >>+   return H_TOO_HARD;
> >>
> >>-   /*
> >>-* Synchronize with the MMU notifier callbacks in
> >>-* book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> >>-* While we have the rmap lock, code running on other CPUs
> >>-* cannot finish unmapping the host real page that backs
> >>-* this guest real page, so we are OK to access the host
> >>-* real page.
> >>-*/
> >>-   lock_rmap(rmap);
> >>-   if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> >>-   ret = H_TOO_HARD;
> >>-   goto unlock_exit;
> >>+   mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
> >>+   if (!mem || mm_iommu_rm_ua_to_hpa(mem, ua, &tces))
> >>+   return H_TOO_HARD;
> >>+   } else {
> >>+   /*
> >>+* This is emulated devices case.
> >>+* We do not require memory to be preregistered in this case
> >>+* so lock rmap and do __find_linux_pte_or_hugepte().
> >>+*/
> >>+   if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> >>+   return H_TOO_HARD;
> >>+
> >>+   rmap = (void *) vmalloc_to_phys(rmap);
> >>+
> >>+   /*
> >>+* Synchronize with the MMU notifier callbacks in
> >>+* book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> >>+* While we have the rmap lock, code running on other CPUs
> >>+* cannot finish unmapping the host real page that backs
> >>+* this guest real page, so we are OK to access the host
> >>+* real page.
> >>+*/
> >>+   lock_rmap(rmap);
> >>+   if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> >>+   ret = H_TOO_HARD;
> >>+   goto unlock_exit;
> >>+   }
> >>}
> >>
> >>for (i = 0; i < npages; ++i) {
> >>@@ -291,7 +344,8 @@ long kvmppc_rm_h_put_tce_ind

[v5][PATCH] livepatch/ppc: Enable livepatching on powerpc

2016-03-07 Thread Balbir Singh
Changelog v5:
1. Removed the mini-stack frame created for klp_return_helper.
   As a result of the mini-stack frame, functions with > 8
   arguments could not be patched.
2. Removed camel casing in the comments
Changelog v4:
1. Renamed klp_matchaddr() to klp_get_ftrace_location()
   and used it just to convert the function address.
2. Synced klp_write_module_reloc() with s390(); made it
   inline, no error message, return -ENOSYS
3. Added an error message when including
   powerpc/include/asm/livepatch.h without HAVE_LIVEPATCH
4. Update some comments.
Changelog v3:
1. Moved -ENOSYS to -EINVAL in klp_write_module_reloc
2. Moved klp_matchaddr to use ftrace_location_range
Changelog v2:
1. Implement review comments by Michael
2. The previous version compared _NIP from the
   wrong location to check for whether we
   are going to a patched location

This patch enables live patching for powerpc. The current patch
is applied on top of topic/mprofile-kernel at
https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/

This patch builds on top of the ftrace-with-regs changes and the
-mprofile-kernel changes. It detects a change in the NIP after
the klp subsystem has potentially changed the NIP as a result
of a livepatch. In that case it saves the TOC in the parent's
stack frame and the offset of the return address from the TOC in
the reserved (CR+4) space. This hack allows us to present the
complete frame of the calling function, as is, to the patched
function without having to create a mini-frame.

Upon return from the patched function, the TOC and the correct
LR are restored.

I tested the livepatch sample module and an additional sample
that patches int_to_scsilun(). I'll post that sample later if there
is interest. I also tested ftrace functionality from the
command line to check for breakage.
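
For reference, the in-tree livepatch sample mentioned above is roughly the
following module (sketched from memory of samples/livepatch/livepatch-sample.c
around this kernel version; the klp_* calls are the generic livepatch API,
not something introduced by this patch, and details may differ):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include <linux/livepatch.h>

/* Replacement for the kernel's cmdline_proc_show() */
static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
        seq_printf(m, "%s\n", "this has been live patched");
        return 0;
}

static struct klp_func funcs[] = {
        {
                .old_name = "cmdline_proc_show",
                .new_func = livepatch_cmdline_proc_show,
        }, { }
};

static struct klp_object objs[] = {
        {
                /* name == NULL means "patch vmlinux itself" */
                .funcs = funcs,
        }, { }
};

static struct klp_patch patch = {
        .mod = THIS_MODULE,
        .objs = objs,
};

static int livepatch_init(void)
{
        int ret;

        ret = klp_register_patch(&patch);
        if (ret)
                return ret;
        ret = klp_enable_patch(&patch);
        if (ret) {
                WARN_ON(klp_unregister_patch(&patch));
                return ret;
        }
        return 0;
}

static void livepatch_exit(void)
{
        WARN_ON(klp_disable_patch(&patch));
        WARN_ON(klp_unregister_patch(&patch));
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");

With this patch applied, loading such a module on powerpc should redirect
cmdline_proc_show() through the klp machinery, which is where the NIP/TOC
handling described above comes into play.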

Signed-off-by: Torsten Duwe 
Signed-off-by: Balbir Singh 
Signed-off-by: Petr Mladek 
---
 arch/powerpc/Kconfig |  3 ++
 arch/powerpc/include/asm/livepatch.h | 47 
 arch/powerpc/kernel/Makefile |  1 +
 arch/powerpc/kernel/entry_64.S   | 60 
 arch/powerpc/kernel/livepatch.c  | 29 +
 include/linux/ftrace.h   |  1 +
 include/linux/livepatch.h|  2 ++
 kernel/livepatch/core.c  | 28 +++--
 kernel/trace/ftrace.c| 14 -
 9 files changed, 181 insertions(+), 4 deletions(-)
 create mode 100644 arch/powerpc/include/asm/livepatch.h
 create mode 100644 arch/powerpc/kernel/livepatch.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 91da283..926c0ea 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -159,6 +159,7 @@ config PPC
select ARCH_HAS_DEVMEM_IS_ALLOWED
select HAVE_ARCH_SECCOMP_FILTER
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select HAVE_LIVEPATCH if HAVE_DYNAMIC_FTRACE_WITH_REGS
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
@@ -1110,3 +,5 @@ config PPC_LIB_RHEAP
bool
 
 source "arch/powerpc/kvm/Kconfig"
+
+source "kernel/livepatch/Kconfig"
diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
new file mode 100644
index 000..b9856ce
--- /dev/null
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -0,0 +1,47 @@
+/*
+ * livepatch.h - powerpc-specific Kernel Live Patching Core
+ *
+ * Copyright (C) 2015 SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+#ifndef _ASM_POWERPC64_LIVEPATCH_H
+#define _ASM_POWERPC64_LIVEPATCH_H
+
+#include 
+
+#ifdef CONFIG_LIVEPATCH
+
+static inline int klp_check_compiler_support(void)
+{
+   return 0;
+}
+
+static inline int klp_write_module_reloc(struct module *mod, unsigned long
+   type, unsigned long loc, unsigned long value)
+{
+   /* This requires infrastructure changes; we need the loadinfos. */
+   return -ENOSYS;
+}
+
+static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
+{
+   regs->nip = ip;
+}
+
+#else /* CONFIG_LIVEPATCH */
+#error Include linux/livepatch.h, not asm/livepatch.h
+#endif /* CONFIG_LIVEPATCH */
+
+#endif /* _ASM_POWERPC64_LIVEPATCH_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..

Re: powerpc/eeh: eeh_pci_enable(): fix checking of post-request state

2016-03-07 Thread Andrew Donnellan

On 09/02/16 10:57, Andrew Donnellan wrote:

It is a fix - I'm a bit hazy on the details now but IIRC, Daniel Axtens
and I encountered this when doing some cxl debugging, though I think we
decided not to tag this for stable since it was a secondary issue to the
primary bug we were looking for. It probably could go to stable though?
(Daniel - thoughts?)

The line in question was last touched in 4d4f577e4b5e, but it looks like
the behaviour wasn't right even before that.


Ping :)

I think this patch should still go in; I don't really care whether it
goes to stable or not.


--
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)IBM Australia Limited


[PATCH 1/1] powerpc/embedded6xx: Make reboot works on MVME5100

2016-03-07 Thread Alessio Igor Bogani
The mtmsr() call hangs during restart. Make reboot work on
MVME5100 by removing that call.
---
 arch/powerpc/platforms/embedded6xx/mvme5100.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/mvme5100.c 
b/arch/powerpc/platforms/embedded6xx/mvme5100.c
index 8f65aa3..118cc33 100644
--- a/arch/powerpc/platforms/embedded6xx/mvme5100.c
+++ b/arch/powerpc/platforms/embedded6xx/mvme5100.c
@@ -179,9 +179,7 @@ static void mvme5100_show_cpuinfo(struct seq_file *m)
 
 static void mvme5100_restart(char *cmd)
 {
-
local_irq_disable();
-   mtmsr(mfmsr() | MSR_IP);
 
out_8((u_char *) restart, 0x01);
 
-- 
2.7.2
