Re: [PATCH kernel v2 6/6] powerpc/powernv/ioda: Allocate indirect TCE levels on demand

2018-06-21 Thread Alexey Kardashevskiy
On Thu, 21 Jun 2018 12:03:21 +1000
David Gibson  wrote:

> On Sun, Jun 17, 2018 at 09:14:28PM +1000, Alexey Kardashevskiy wrote:
> > At the moment we allocate the entire TCE table, twice (hardware part and
> > userspace translation cache). This normally works as we usually have
> > contiguous memory and the guest will map the entire RAM for 64bit DMA.
> > 
> > However if we have sparse RAM (one example is a memory device), then
> > we will allocate TCEs which will never be used as the guest only maps
> > actual memory for DMA. If it is a single level TCE table, there is nothing
> > we can really do, but if it is a multilevel table, we can skip allocating
> > TCEs we know we won't need.
> > 
> > This adds the ability to allocate only the first level, saving memory.
> > 
> > This changes iommu_table::free() to avoid allocating an extra level;
> > iommu_table::set() will do this when needed.
> > 
> > This adds an @alloc parameter to iommu_table::exchange() to tell the callback
> > if it can allocate an extra level; the flag is set to "false" for
> > the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
> > H_TOO_HARD.
> > 
> > This still requires the entire table to be counted in mm::locked_vm.
> > 
> > To be conservative, this only does on-demand allocation when
> > the userspace cache table is requested, which is the case of VFIO.
> > 
> > The example math for a system replicating a powernv setup with NVLink2
> > in a guest:
> > 16GB RAM mapped at 0x0
> > 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000
> > 
> > the table to cover that all with 64K pages takes:
> > (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB
> > 
> > If we allocate only necessary TCE levels, we will only need:
> > (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect levels).
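
A quick standalone check of the arithmetic above (our own sketch, not part of
the patch; it assumes 64K IOMMU pages and 8-byte TCEs as stated). For what
it's worth, the first expression evaluates to 4656MB, so the 4556MB figure
above looks like a small typo carried in the original changelog:

#include <stdio.h>

int main(void)
{
	unsigned long long full = 0x244000000000ULL + 0x2000000000ULL;
	unsigned long long sparse = 0x400000000ULL + 0x400000000ULL;

	/* one 8-byte TCE per 64K page, result shown in MB */
	printf("full window: %lluMB\n", ((full >> 16) * 8) >> 20);
	printf("sparse RAM:  %lluMB\n", ((sparse >> 16) * 8) >> 20);
	return 0;
}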
> > 
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> > Changes:
> > v2:
> > * fixed a bug in the cleanup path which forced the entire table to be
> > allocated right before destroying it
> > * added memory allocation error handling in pnv_tce()
> > ---
> >  arch/powerpc/include/asm/iommu.h  |  7 ++-
> >  arch/powerpc/platforms/powernv/pci.h  |  6 ++-
> >  arch/powerpc/kvm/book3s_64_vio_hv.c   |  4 +-
> >  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 69 ---
> >  arch/powerpc/platforms/powernv/pci-ioda.c |  8 ++--
> >  drivers/vfio/vfio_iommu_spapr_tce.c   |  2 +-
> >  6 files changed, 69 insertions(+), 27 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> > index 4bdcf22..daa3ee5 100644
> > --- a/arch/powerpc/include/asm/iommu.h
> > +++ b/arch/powerpc/include/asm/iommu.h
> > @@ -70,7 +70,7 @@ struct iommu_table_ops {
> > unsigned long *hpa,
> > enum dma_data_direction *direction);
> >  
> > -   __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
> > +   __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
> >  #endif
> > void (*clear)(struct iommu_table *tbl,
> > long index, long npages);
> > @@ -122,10 +122,13 @@ struct iommu_table {
> > __be64 *it_userspace; /* userspace view of the table */
> > struct iommu_table_ops *it_ops;
> > struct kref it_kref;
> > +   int it_nid;
> >  };
> >  
> > +#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
> > +   ((tbl)->it_ops->useraddrptr((tbl), (entry), false))
> >  #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> > -   ((tbl)->it_ops->useraddrptr((tbl), (entry)))
> > +   ((tbl)->it_ops->useraddrptr((tbl), (entry), true))
> >  
> >  /* Pure 2^n version of get_order */
> >  static inline __attribute_const__
> > diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> > index 5e02408..1fa5590 100644
> > --- a/arch/powerpc/platforms/powernv/pci.h
> > +++ b/arch/powerpc/platforms/powernv/pci.h
> > @@ -267,8 +267,10 @@ extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> > unsigned long attrs);
> >  extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
> >  extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
> > -   unsigned long *hpa, enum dma_data_direction *direction);
> > -extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
> > +   unsigned long *hpa, enum dma_data_direction *direction,
> > +   bool alloc);
> > +extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index,
> > +   bool alloc);
> >  extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
> >  
> >  extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> > diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> > index db0490c..05b4865 100644
> > --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> > +++ b/arch/powerpc/kvm/book3s

[PATCH 1/2] powerpc/mm: Check memblock_add against MAX_PHYSMEM_BITS range

2018-06-21 Thread Aneesh Kumar K.V
With the SPARSEMEM config enabled, we make sure that we don't add sections beyond
the MAX_PHYSMEM_BITS range. This results in not building the vmemmap mapping for
ranges beyond that limit. But our memblock layer looks at the device tree and
creates mappings for the full memory range. Prevent this by checking against
MAX_PHYSMEM_BITS when doing memblock_add.

We don't do a similar check for memblock_reserve_range. If a reserved range is
beyond MAX_PHYSMEM_BITS we expect it to be configured with 'nomap'. Any other
reserved range should come from existing memblock ranges which we already
filtered while adding.
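
A standalone illustration (ours, not kernel code) of the clamp this patch
adds, mirroring the validate_mem_limit() in the diff below: with
MAX_PHYSMEM_BITS = 46 the addressable limit is 64TB, so a 4TB LMB starting at
63TB is trimmed to 1TB, and one starting past 64TB is dropped entirely.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PHYSMEM_BITS 46

static bool validate_mem_limit(uint64_t base, uint64_t *size)
{
	uint64_t max_mem = 1ULL << MAX_PHYSMEM_BITS;

	if (base >= max_mem)
		return false;		/* entirely above the limit: skip */
	if (base + *size > max_mem)
		*size = max_mem - base;	/* trim the tail */
	return true;
}

int main(void)
{
	uint64_t size = 4ULL << 40;			/* a 4TB LMB */

	if (validate_mem_limit(63ULL << 40, &size))	/* base = 63TB */
		printf("added, trimmed to %lluTB\n",
		       (unsigned long long)(size >> 40));
	return 0;
}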

This avoids a crash like the one below when running on a system with a system
RAM config above MAX_PHYSMEM_BITS:

 Unable to handle kernel paging request for data at address 0xc00a00100440
 Faulting instruction address: 0xc1034118
 cpu 0x0: Vector: 300 (Data Access) at [c124fb30]
 pc: c1034118: __free_pages_bootmem+0xc0/0x1c0
 lr: c103b258: free_all_bootmem+0x19c/0x22c
 sp: c124fdb0
msr: 92001033
dar: c00a00100440
  dsisr: 4000
   current = 0xc120dd00
   paca= 0xc1f6   irqmask: 0x03   irq_happened: 0x01
 pid   = 0, comm = swapper
 [c124fe20] c103b258 free_all_bootmem+0x19c/0x22c
 [c124fee0] c1010a68 mem_init+0x3c/0x5c
 [c124ff00] c100401c start_kernel+0x298/0x5e4
 [c124ff90] c000b57c start_here_common+0x1c/0x520

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/prom.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 05e7fb47a7a4..8f32f14ba508 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -440,6 +440,29 @@ static int __init early_init_dt_scan_chosen_ppc(unsigned long node,
return 1;
 }
 
+/*
+ * Compare the range against the max mem limit and update the
+ * size if it crosses the limit.
+ */
+
+#ifdef CONFIG_SPARSEMEM
+static bool validate_mem_limit(u64 base, u64 *size)
+{
+   u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
+
+   if (base >= max_mem)
+   return false;
+   if ((base + *size) > max_mem)
+   *size = max_mem - base;
+   return true;
+}
+#else
+static bool validate_mem_limit(u64 base, u64 *size)
+{
+   return true;
+}
+#endif
+
 #ifdef CONFIG_PPC_PSERIES
 /*
  * Interpret the ibm dynamic reconfiguration memory LMBs.
@@ -494,7 +517,8 @@ static void __init early_init_drmem_lmb(struct drmem_lmb *lmb,
}
 
DBG("Adding: %llx -> %llx\n", base, size);
-   memblock_add(base, size);
+   if (validate_mem_limit(base, &size))
+   memblock_add(base, size);
} while (--rngs);
 }
 #endif /* CONFIG_PPC_PSERIES */
@@ -548,8 +572,10 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
}
 
/* Add the chunk to the MEMBLOCK list */
-   if (add_mem_to_memblock)
-   memblock_add(base, size);
+   if (add_mem_to_memblock) {
+   if (validate_mem_limit(base, &size))
+   memblock_add(base, size);
+   }
 }
 
 static void __init early_reserve_mem_dt(void)
-- 
2.17.1



[PATCH 2/2] powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config

2018-06-21 Thread Aneesh Kumar K.V
We do this only with the VMEMMAP config so that our page_to_[nid/section] etc.
are not impacted.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/sparsemem.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index bc66712bdc3c..28f5dae25db6 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -6,13 +6,20 @@
 #ifdef CONFIG_SPARSEMEM
 /*
  * SECTION_SIZE_BITS   2^N: how big each section will be
- * MAX_PHYSADDR_BITS   2^N: how much physical address space we have
  * MAX_PHYSMEM_BITS2^N: how much memory we can have in that space
  */
 #define SECTION_SIZE_BITS   24
-
-#define MAX_PHYSADDR_BITS   46
+/*
+ * If we store section details in page->flags we can't increase
+ * MAX_PHYSMEM_BITS. If we increase SECTIONS_WIDTH we will not store node
+ * details in page->flags, and page_to_nid does a page->section->node
+ * lookup instead. Hence only increase it for VMEMMAP.
+ */
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#define MAX_PHYSMEM_BITS47
+#else
 #define MAX_PHYSMEM_BITS46
+#endif
 
 #endif /* CONFIG_SPARSEMEM */
 
-- 
2.17.1
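
For context (our own arithmetic, not part of the patch): with
SECTION_SIZE_BITS = 24 each section is 16MB, so this bump doubles the maximum
addressable memory from 64TB to 128TB and the number of sections from 2^22 to
2^23:

#include <stdio.h>

int main(void)
{
	const int section_size_bits = 24;	/* 16MB sections */
	const int old_bits = 46, new_bits = 47;

	printf("max mem:  %lluTB -> %lluTB\n",
	       (1ULL << old_bits) >> 40, (1ULL << new_bits) >> 40);
	printf("sections: 2^%d -> 2^%d\n",
	       old_bits - section_size_bits, new_bits - section_size_bits);
	return 0;
}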



[RESEND PATCH 1/3] powerpc: dts: use 'atmel' as at24 manufacturer for pdm360ng

2018-06-21 Thread Bartosz Golaszewski
Using 'at' as the <manufacturer> part of the compatible string is now
deprecated. Use a correct string: 'atmel,<model>'.

Signed-off-by: Bartosz Golaszewski 
---
 arch/powerpc/boot/dts/pdm360ng.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/pdm360ng.dts b/arch/powerpc/boot/dts/pdm360ng.dts
index 445b88114009..df1283b63d9b 100644
--- a/arch/powerpc/boot/dts/pdm360ng.dts
+++ b/arch/powerpc/boot/dts/pdm360ng.dts
@@ -98,7 +98,7 @@
fsl,preserve-clocking;
 
eeprom@50 {
-   compatible = "at,24c01";
+   compatible = "atmel,24c01";
reg = <0x50>;
};
 
-- 
2.17.1



[RESEND PATCH 2/3] powerpc: dts: use 'atmel' as at24 manufacturer for kmcent2

2018-06-21 Thread Bartosz Golaszewski
Using compatible strings without the <manufacturer> part for at24 is
now deprecated. Use a correct 'atmel,<model>' value.

Signed-off-by: Bartosz Golaszewski 
---
 arch/powerpc/boot/dts/fsl/kmcent2.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/fsl/kmcent2.dts b/arch/powerpc/boot/dts/fsl/kmcent2.dts
index 5922c1ea0e96..3094df05f5ea 100644
--- a/arch/powerpc/boot/dts/fsl/kmcent2.dts
+++ b/arch/powerpc/boot/dts/fsl/kmcent2.dts
@@ -130,7 +130,7 @@
#size-cells = <0>;
 
eeprom@54 {
-   compatible = "24c02";
+   compatible = "atmel,24c02";
reg = <0x54>;
pagesize = <2>;
read-only;
-- 
2.17.1



[RESEND PATCH 3/3] powerpc: dts: use a correct at24 compatible fallback in ac14xx

2018-06-21 Thread Bartosz Golaszewski
Using 'at24' as the fallback is now deprecated - use the full
'atmel,<model>' string.

Signed-off-by: Bartosz Golaszewski 
---
 arch/powerpc/boot/dts/ac14xx.dts | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/boot/dts/ac14xx.dts b/arch/powerpc/boot/dts/ac14xx.dts
index 83bcfd865167..0be5c4f3265d 100644
--- a/arch/powerpc/boot/dts/ac14xx.dts
+++ b/arch/powerpc/boot/dts/ac14xx.dts
@@ -176,12 +176,12 @@
clock-frequency = <40>;
 
at24@30 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x30>;
};
 
at24@31 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x31>;
};
 
@@ -191,42 +191,42 @@
};
 
at24@50 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x50>;
};
 
at24@51 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x51>;
};
 
at24@52 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x52>;
};
 
at24@53 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x53>;
};
 
at24@54 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x54>;
};
 
at24@55 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x55>;
};
 
at24@56 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x56>;
};
 
at24@57 {
-   compatible = "at24,24c01";
+   compatible = "atmel,24c01";
reg = <0x57>;
};
 
-- 
2.17.1



[PATCH kernel v2 REPOST] powerpc/powernv/ioda: Allocate indirect TCE levels on demand

2018-06-21 Thread Alexey Kardashevskiy
At the moment we allocate the entire TCE table, twice (hardware part and
userspace translation cache). This normally works as we usually have
contiguous memory and the guest will map the entire RAM for 64bit DMA.

However if we have sparse RAM (one example is a memory device), then
we will allocate TCEs which will never be used as the guest only maps
actual memory for DMA. If it is a single level TCE table, there is nothing
we can really do, but if it is a multilevel table, we can skip allocating
TCEs we know we won't need.

This adds the ability to allocate only the first level, saving memory.

This changes iommu_table::free() to avoid allocating an extra level;
iommu_table::set() will do this when needed.

This adds an @alloc parameter to iommu_table::exchange() to tell the callback
if it can allocate an extra level; the flag is set to "false" for
the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
H_TOO_HARD.

This still requires the entire table to be counted in mm::locked_vm.

To be conservative, this only does on-demand allocation when
the userspace cache table is requested, which is the case of VFIO.

The example math for a system replicating a powernv setup with NVLink2
in a guest:
16GB RAM mapped at 0x0
128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000

the table to cover that all with 64K pages takes:
(((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB

If we allocate only necessary TCE levels, we will only need:
(((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect levels).

Signed-off-by: Alexey Kardashevskiy 
---


This is what I meant to post a few days ago, sorry for the noise.


---
Changes:
v2:
* fixed a bug in the cleanup path which forced the entire table to be
allocated right before destroying it
* added memory allocation error handling in pnv_tce()
---
 arch/powerpc/include/asm/iommu.h  |  7 ++-
 arch/powerpc/platforms/powernv/pci.h  |  6 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c   |  4 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 73 +--
 arch/powerpc/platforms/powernv/pci-ioda.c |  8 +--
 drivers/vfio/vfio_iommu_spapr_tce.c   |  2 +-
 6 files changed, 73 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 4bdcf22..daa3ee5 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -70,7 +70,7 @@ struct iommu_table_ops {
unsigned long *hpa,
enum dma_data_direction *direction);
 
-   __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
+   __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
 #endif
void (*clear)(struct iommu_table *tbl,
long index, long npages);
@@ -122,10 +122,13 @@ struct iommu_table {
__be64 *it_userspace; /* userspace view of the table */
struct iommu_table_ops *it_ops;
struct kref it_kref;
+   int it_nid;
 };
 
+#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
+   ((tbl)->it_ops->useraddrptr((tbl), (entry), false))
 #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
-   ((tbl)->it_ops->useraddrptr((tbl), (entry)))
+   ((tbl)->it_ops->useraddrptr((tbl), (entry), true))
 
 /* Pure 2^n version of get_order */
 static inline __attribute_const__
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 5e02408..1fa5590 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -267,8 +267,10 @@ extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
unsigned long attrs);
 extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
 extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
-   unsigned long *hpa, enum dma_data_direction *direction);
-extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
+   unsigned long *hpa, enum dma_data_direction *direction,
+   bool alloc);
+extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index,
+   bool alloc);
 extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
 
 extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 8cc1caf..efb90d8 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
 {
struct mm_iommu_table_group_mem_t *mem = NULL;
const unsigned long pgsize = 1ULL << tbl->it_page_shift;
-   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+   __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
 
if (!pua)
/* it_userspace allocation might be

Re: [PATCH v2 0/6] powerpc/pkeys: fixes to pkeys

2018-06-21 Thread Michael Ellerman
Florian Weimer  writes:

> On 06/19/2018 02:40 PM, Michael Ellerman wrote:
>>> I tested the whole series with the new selftests, with the printamr.c
>>> program I posted earlier, and the glibc test for pkey_alloc &c.  The
>>> latter required some test fixes, but now passes as well.  As far as I
>>> can tell, everything looks good now.
>>>
>>> Tested-By: Florian Weimer
>> Thanks. I'll add that to each patch I guess, if you're happy with that?
>
> Sure, but I only tested the whole series as a whole.

Yeah OK. We don't have a good way to express that, other than using a
merge which I'd prefer to avoid.

So I've tagged them all with your Tested-by. If any of them turn out to
have bugs you can blame me :)

cheers


[4.18.0-rc1][i40e][9b10df59] WARNING: CPU: 32 PID: 95246 at drivers/net/ethernet/intel/i40e/i40e_ethtool.c:1907 i40e_update_ethtool_fdir_entry.isra.12+0x2264/0x2270

2018-06-21 Thread Abdul Haleem
Greetings,

A WARN_ONCE is triggered when the interface is toggled while running the
'ethtool -S' command on a powerpc machine running today's mainline kernel.

The related commit looks to be:

commit 9b10df596bd4d38f2a58cf87e0780510acc53d8d
Author: Jacob Keller 
Date:   Thu May 17 01:08:36 2018 -0700

i40e: use WARN_ONCE to replace the commented BUG_ON size check

We don't really want to use BUG_ON here since that would completely
crash the kernel, thus the reason we commented it out. We *can't* use
BUILD_BUG_ON because at least now (a) the sizes aren't constant (we are
fixing this) and (b) not all compilers are smart enough to understand
that "p - data" is a constant.

Machine: Power 8 bare-metal
kernel: 4.18.0-rc1
ethtool: version 4.8
gcc:  4.8.5 
config: attached

Adapter: Ethernet controller [0200]: Intel Corporation Ethernet
Controller X710/X557-AT 10GBASE-T [8086:1589] (rev 02)

dmesg:
--
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Down
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Up, 10 Gbps Full Duplex, Flow 
Control: None
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Down
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Up, 10 Gbps Full Duplex, Flow 
Control: None
i40e 0002:01:00.0 net0: NIC Link is Down
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Down
i40e 0002:01:00.3 enP2p1s0f3: NIC Link is Down
i40e 0002:01:00.0 net0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: 
None 
i40e 0002:01:00.3 enP2p1s0f3: NIC Link is Up, 10 Gbps Full Duplex, Flow 
Control: None
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Up, 10 Gbps Full Duplex, Flow 
Control: None
i40e 0002:01:00.0 net0: NIC Link is Down
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Down
i40e 0002:01:00.3 enP2p1s0f3: NIC Link is Down
i40e 0002:01:00.0 net0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: 
None 
i40e 0002:01:00.2 enP2p1s0f2: NIC Link is Up, 10 Gbps Full Duplex, Flow 
Control: None
i40e 0002:01:00.3 enP2p1s0f3: NIC Link is Up, 10 Gbps Full Duplex, Flow 
Control: None
[ cut here ]
stat strings count mismatch! 
WARNING: CPU: 81 PID: 24268 at 
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:1907 
i40e_update_ethtool_fdir_entry.isra.12+0x2264/0x2270 [i40e]
Modules linked in: ipt_MASQUERADE tun bridge stp llc xt_tcpudp
ipt_REJECT nf_reject_ipv4 xt_conntrack nfnetlink iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_filter btrfs xor zstd_decompress zstd_compress
xxhash lzo_compress vmx_crypto raid6_pq powernv_rng rng_core kvm_hv kvm
nfsd ip_tables x_tables xfs libcrc32c qla2xxx mpt3sas nvme_fc
nvme_fabrics raid_class i40e nvme_core scsi_transport_sas
CPU: 81 PID: 24268 Comm: ethtool Not tainted 4.18.0-rc1-autotest-autotest #1
NIP:  d00010260634 LR: d00010260630 CTR: c09dc580
REGS: c07fe2c2f730 TRAP: 0700   Not tainted  (4.18.0-rc1-autotest-autotest)
MSR:  90029033   CR: 48028422  XER: 2000  
CFAR: c00f1d98 IRQMASK: 0 
GPR00: d00010260630 c07fe2c2f9b0 d0001029cc00 001c 
GPR04: 0001 000c 0001 c115be00 
GPR08: 0001 0007 0006 90001003 
GPR12: 8800 c07fff7fe580   
GPR16:     
GPR20: d0001028e500 d0001028e518 d0001028e530 d0001028e548 
GPR24: d0001028e598 d0001028e5b0 c03fec19d000 d0001028e5d0 
GPR28: 0008 54c0 ab40 d0001b4a54c0 
NIP [d00010260634] i40e_update_ethtool_fdir_entry.isra.12+0x2264/0x2270 
[i40e]
LR [d00010260630] i40e_update_ethtool_fdir_entry.isra.12+0x2260/0x2270 
[i40e]
Call Trace:
[c07fe2c2f9b0] [d00010260630] 
i40e_update_ethtool_fdir_entry.isra.12+0x2260/0x2270 [i40e] (unreliable)
[c07fe2c2fa70] [c08a3990] dev_ethtool+0x1880/0x27b0
[c07fe2c2fba0] [c08c86a0] dev_ioctl+0x560/0x7e0
[c07fe2c2fbf0] [c0869510] sock_do_ioctl+0xc0/0x1a0
[c07fe2c2fc60] [c0869ed0] sock_ioctl+0x1a0/0x460
[c07fe2c2fd20] [c0329dc8] do_vfs_ioctl+0xc8/0x8b0
[c07fe2c2fdc0] [c032a60c] ksys_ioctl+0x5c/0xe0
[c07fe2c2fe10] [c032a6b0] sys_ioctl+0x20/0x80
[c07fe2c2fe30] [c000b9e4] system_call+0x5c/0x70
Instruction dump:
41fefc1c 3d22 e9298e80 8949 2f8a 40fefc08 3c62 e8638e88
3941 9949 48022d45 e8410018 <0fe0> 4bfffbe8 6042 3c4c0004
---[ end trace 1a1b8331349cf7fd ]---

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre


[truncated .config attachment omitted]

Re: [PATCH v3] powerpc/32: Remove left over function prototypes

2018-06-21 Thread Michael Ellerman
Mathieu Malaterre  writes:

> In commit 4aea909eeba3 ("powerpc: Add missing prototypes in setup_32.c")

I don't have that commit ^ ?

That might be because I squashed some of your fixes together or something?

> diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
> index 35ca309848d7..829ed66f0a40 100644
> --- a/arch/powerpc/kernel/setup.h
> +++ b/arch/powerpc/kernel/setup.h
> @@ -19,9 +19,6 @@ void irqstack_early_init(void);
>  void setup_power_save(void);
>  unsigned long __init early_init(unsigned long dt_ptr);
>  void __init machine_init(u64 dt_ptr);
> -int __init ppc_setup_l2cr(char *str);
> -int __init ppc_setup_l3cr(char *str);
> -int __init ppc_init(void);
>  #else
>  static inline void setup_power_save(void) { };
>  #endif

I have:

#ifdef CONFIG_PPC32
void setup_power_save(void);
#else
static inline void setup_power_save(void) { };
#endif


cheers


Re: [PATCH 2/2] powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config

2018-06-21 Thread Balbir Singh
On Thu, Jun 21, 2018 at 6:31 PM, Aneesh Kumar K.V
 wrote:
>
> We do this only with VMEMMAP config so that our page_to_[nid/section] etc are 
> not
> impacted.
>
> Signed-off-by: Aneesh Kumar K.V 

Why 128TB, given that it's sparse_vmemmap_extreme by default, why not
1PB directly (50 bits)?

Balbir Singh


Re: [PATCH 0/7 v5] Support for fsl-mc bus and its devices in SMMU

2018-06-21 Thread Will Deacon
Hi Nipun,

On Thu, Jun 21, 2018 at 03:59:27AM +, Nipun Gupta wrote:
> Will this patch-set be taken for the next kernel release (and via which
> tree)?

I think you need Acks from Robin and Joerg in order for this to be queued.
Robin should be back at the beginning of next month, so there's still time
for 4.19.

Will


Re: [PATCH 2/2] powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config

2018-06-21 Thread Aneesh Kumar K.V

On 06/21/2018 05:02 PM, Balbir Singh wrote:

On Thu, Jun 21, 2018 at 6:31 PM, Aneesh Kumar K.V
 wrote:


We do this only with VMEMMAP config so that our page_to_[nid/section] etc are 
not
impacted.

Signed-off-by: Aneesh Kumar K.V 


Why 128TB, given that it's sparse_vmemmap_extreme by default, why not
1PB directly (50 bits)?



That will impact config with VMEMMAP_EXTREME with no real immediate 
benefit. We could possibly make MAX_PHYSMEM_BITS a Kconfig variable. 
s390 do that. Not sure we want to do that.


-aneesh



Re: [PATCH 2/2] powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config

2018-06-21 Thread Aneesh Kumar K.V

On 06/21/2018 05:02 PM, Balbir Singh wrote:

On Thu, Jun 21, 2018 at 6:31 PM, Aneesh Kumar K.V
 wrote:


We do this only with VMEMMAP config so that our page_to_[nid/section] etc are 
not
impacted.

Signed-off-by: Aneesh Kumar K.V 


Why 128TB, given that it's sparse_vmemmap_extreme by default, why not
1PB directly (50 bits)?



That will impact config with VMEMMAP_EXTREME disabled with no real 
immediate benefit. We could possibly make MAX_PHYSMEM_BITS a Kconfig 
variable. s390 do that. Not sure we want to do that.


-aneesh



Re: [PATCH v2 1/6] powerpc/pkeys: Enable all user-allocatable pkeys at init.

2018-06-21 Thread Ram Pai
On Thu, Jun 21, 2018 at 02:14:53PM +1000, Michael Ellerman wrote:
> Ram Pai  writes:
> > On Tue, Jun 19, 2018 at 10:39:52PM +1000, Michael Ellerman wrote:
> >> Ram Pai  writes:
> >> 
> >> > In a multithreaded application, a key allocated by one thread must
> >> > be activate and usable on all threads.
> >> >
> >> > Currently this is not the case, because the UAMOR bits for all keys are
> >> > disabled by default. When a new key is allocated in one thread, though
> >> > the corresponding UAMOR bits for that thread get enabled, the UAMOR bits
> >> > for all other existing threads continue to have their bits disabled.
> >> > Other threads have no way to set permissions on the key, effectively
> >> > making the key useless.
> >> 
> >> This all seems a bit strongly worded to me. It's arguable whether a key
> >> should be usable by the thread that allocated it or all threads.
> >> 
> >> You could conceivably have a design where threads are blocked from using
> >> a key until they're given permission to do so by the thread that
> >> allocated the key.
> >> 
> >> But we're changing the behaviour to match x86 and because we don't have
> >> an API to grant another thread access to a key. Right?
> >
> > correct. The other threads have no way to access or change the
> > permissions on the key.
> 
> OK.
> 
> Though prior to patch 6 all threads have read/write permissions for all
> keys, so they don't necessarily need to change permissions on a key
> allocated by another thread.
> 
> >> > Enable the UAMOR bits for all keys, at process creation. Since the
> >> > contents of UAMOR are inherited at fork, all threads are capable of
> >> > modifying the permissions on any key.
> >> >
> >> > BTW: changing the permission on unallocated keys has no effect until
> >> > those keys are associated with PTEs. The kernel will anyway disallow
> >> > association of unallocated keys with PTEs.
> >> 
> >> This is an ABI change, which is bad, but I guess we call it a bug fix
> >> because things didn't really work previously?
> >
> > Yes, it's a behavioural change for the better. There is no downside
> > to the change because no applications should break. Single threaded
> > apps will continue to just work fine. Multithreaded applications,
> > which were unable to consume the API/ABI, will now be able to do so.
> 
> Multi-threaded applications were able to use the API, as long as they
> were satisfied with the semantics it provided, ie. that restrictions on
> a key were only possible on the thread that allocated the key.
> 
> I'm not trying to argue for the sake of it, it's important that we
> understand the subtleties of what we're changing and how it affects
> existing software - even if we think there is essentially no existing
> software.
> 
> I'll try and massage the change log to capture it.
> 
> I ended up with what's below.
> 
> cheers
> 
>   powerpc/pkeys: Give all threads control of their key permissions
>   
>   Currently in a multithreaded application, a key allocated by one
>   thread is not usable by other threads. By "not usable" we mean that
>   other threads are unable to change the access permissions for that
>   key for themselves.
>   
>   When a new key is allocated in one thread, the corresponding UAMOR
>   bits for that thread get enabled, however the UAMOR bits for that key
>   for all other threads remain disabled.
>   
>   Other threads have no way to set permissions on the key, and the
>   current default permissions are that read/write is enabled for all
>   keys, which means the key has no effect for other threads. Although
>   that may be the desired behaviour in some circumstances, having all
>   threads able to control their permissions for the key is more
>   flexible.
>   
>   The current behaviour also differs from the x86 behaviour, which is
>   problematic for users.
>   
>   To fix this, enable the UAMOR bits for all keys, at process
>   creation (in start_thread(), ie exec time). Since the contents of
>   UAMOR are inherited at fork, all threads are capable of modifying the
>   permissions on any key.
>   
>   This is technically an ABI break on powerpc, but pkey support is
>   fairly new on powerpc and not widely used, and this brings us into
>   line with x86.

Wow! yes it crisply captures the subtle API change and the reasoning
behind it.

RP
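
A minimal userspace illustration of the semantics settled on above (our own
sketch, assuming the glibc pkey wrappers; on powerpc before this fix the
sibling thread's pkey_set() could not take effect because its UAMOR bits for
the key were still clear):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

static int key;

static void *sibling(void *arg)
{
	/* a thread that did not allocate the key restricting it for itself */
	if (pkey_set(key, PKEY_DISABLE_ACCESS))
		perror("pkey_set");
	return NULL;
}

int main(void)
{
	pthread_t t;

	key = pkey_alloc(0, 0);
	if (key < 0) {
		perror("pkey_alloc");
		return 1;
	}
	pthread_create(&t, NULL, sibling, NULL);
	pthread_join(t, NULL);
	pkey_free(key);
	return 0;
}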



Re: [PATCH v2 2/6] powerpc/pkeys: Save the pkey registers before fork

2018-06-21 Thread Ram Pai
On Thu, Jun 21, 2018 at 02:13:40PM +1000, Michael Ellerman wrote:
> Ram Pai  writes:
> 
> > On Tue, Jun 19, 2018 at 10:39:56PM +1000, Michael Ellerman wrote:
> >> Ram Pai  writes:
> >> 
> >> > When a thread forks the contents of AMR, IAMR, UAMOR registers in the
> >> > newly forked thread are not inherited.
> >> >
> >> > Save the registers before forking, for content of those
> >> > registers to be automatically copied into the new thread.
> >> >
> >> > CC: Michael Ellerman 
> >> > CC: Florian Weimer 
> >> > CC: Andy Lutomirski 
> >> > CC: Thiago Jung Bauermann 
> >> > Signed-off-by: Ram Pai 
> >> 
> >> Again this is an ABI change but we'll call it a bug fix I guess.
> >
> > Yes, the same defense here too. It's a behavioural change for the better.
> > Single threaded applications will not see any behavioural change.
> > Multithreaded apps, which were unable to consume the behavior, will now be
> > able to do so.
> 
> Well threads is one thing, but this also affects processes.
> 
> And actually without this fix it's possible that a child process could
> fault on a region protected in the parent, if the value in the AMR in
> the thread struct happens to block access at the time of fork(). The
> value in the thread struct would be whatever was in the AMR the last
> time the parent was scheduled in. I think?

Right. Child processes will see a stale value of AMR. Technically this
behavior is a bug, since existing applications, if any, cannot rely on
this stale AMR value.

RP
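
A sketch of the failure mode being discussed (ours, not from the patch;
assumes the glibc pkey wrappers): the child should inherit the parent's live
access rights, but before this fix it could start with whatever AMR value was
captured at the parent's last reschedule.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int key = pkey_alloc(0, 0);
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (key < 0 || p == MAP_FAILED)
		return 1;
	pkey_mprotect(p, 4096, PROT_READ | PROT_WRITE, key);
	pkey_set(key, 0);	/* parent: full access */
	p[0] = 1;

	if (fork() == 0) {
		p[0] = 2;	/* may fault if the child got a stale AMR */
		_exit(0);
	}
	wait(NULL);
	return 0;
}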



Re: [PATCH v2 0/6] powerpc/pkeys: fixes to pkeys

2018-06-21 Thread Ram Pai
On Thu, Jun 21, 2018 at 08:28:47PM +1000, Michael Ellerman wrote:
> Florian Weimer  writes:
> 
> > On 06/19/2018 02:40 PM, Michael Ellerman wrote:
> >>> I tested the whole series with the new selftests, with the printamr.c
> >>> program I posted earlier, and the glibc test for pkey_alloc &c.  The
> >>> latter required some test fixes, but now passes as well.  As far as I
> >>> can tell, everything looks good now.
> >>>
> >>> Tested-By: Florian Weimer
> >> Thanks. I'll add that to each patch I guess, if you're happy with that?
> >
> > Sure, but I only tested the whole series as a whole.
> 
> Yeah OK. We don't have a good way to express that, other than using a
> merge which I'd prefer to avoid.
> 
> So I've tagged them all with your Tested-by. If any of them turn out to
> have bugs you can blame me :)

I just tested the patches incrementally using the pkey selftests.

So I feel confident these patches are not buggy. I will take the blame
if the blame lands on Mpe  :)

RP



Re: [PATCH v04 9/9] hotplug/pmt: Update topology after PMT

2018-06-21 Thread Michael Bringmann
I posted the wrong copy of the file the first time.
That is what broke here.  I posted the correction almost
immediately to the list.  The correct one has

Message ID <8c437fe5-632c-a7ed-1f11-66c4578a1...@linux.vnet.ibm.com>

Sorry for the inconvenience.
Michael

On 06/20/2018 09:13 PM, kbuild test robot wrote:
> Hi Michael,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.18-rc1 next-20180620]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-hotplug-Update-affinity-for-migrated-CPUs/20180621-085543
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-defconfig (attached as .config)
> compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=powerpc 
> 
> All errors (new ones prefixed by >>):
> 
>arch/powerpc/platforms/pseries/dlpar.c: In function 'dlpar_pmt':
>>> arch/powerpc/platforms/pseries/dlpar.c:453:2: error: implicit declaration 
>>> of function 'rebuild_sched_domains' [-Werror=implicit-function-declaration]
>  rebuild_sched_domains();
>  ^
>cc1: all warnings being treated as errors
> 
> vim +/rebuild_sched_domains +453 arch/powerpc/platforms/pseries/dlpar.c
> 
>435
>436static int dlpar_pmt(struct pseries_hp_errorlog *work)
>437{
>438struct list_head *pos, *q;
>439
>440ssleep(15);
>441
>442list_for_each_safe(pos, q, &dlpar_delayed_list) {
>443struct pseries_hp_errorlog *tmp;
>444
>445tmp = list_entry(pos, struct pseries_hp_errorlog, list);
>446handle_dlpar_errorlog(tmp);
>447
>448list_del(pos);
>449kfree(tmp);
>450}
>451
>452ssleep(5);
>  > 453rebuild_sched_domains();
>454
>455return 0;
>456}
>457
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com
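
For what it's worth (our guess, not stated in the thread):
rebuild_sched_domains() is declared in <linux/cpuset.h>, so the
implicit-declaration error above suggests dlpar.c is simply missing:

#include <linux/cpuset.h>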



[PATCH 13/26] ppc: Convert mmu context allocation to new IDA API

2018-06-21 Thread Matthew Wilcox
ida_alloc_range is the perfect fit for this use case.  Eliminates
a custom spinlock, a call to ida_pre_get and a local check for the
allocated ID exceeding a maximum.

Signed-off-by: Matthew Wilcox 
---
 arch/powerpc/mm/mmu_context_book3s64.c | 44 +++---
 1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index f3d4b4a0e561..5a0cf2cc8ba0 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -26,48 +26,16 @@
 #include 
 #include 
 
-static DEFINE_SPINLOCK(mmu_context_lock);
 static DEFINE_IDA(mmu_context_ida);
 
 static int alloc_context_id(int min_id, int max_id)
 {
-   int index, err;
-
-again:
-   if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
-   return -ENOMEM;
-
-   spin_lock(&mmu_context_lock);
-   err = ida_get_new_above(&mmu_context_ida, min_id, &index);
-   spin_unlock(&mmu_context_lock);
-
-   if (err == -EAGAIN)
-   goto again;
-   else if (err)
-   return err;
-
-   if (index > max_id) {
-   spin_lock(&mmu_context_lock);
-   ida_remove(&mmu_context_ida, index);
-   spin_unlock(&mmu_context_lock);
-   return -ENOMEM;
-   }
-
-   return index;
+   return ida_alloc_range(&mmu_context_ida, min_id, max_id, GFP_KERNEL);
 }
 
 void hash__reserve_context_id(int id)
 {
-   int rc, result = 0;
-
-   do {
-   if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
-   break;
-
-   spin_lock(&mmu_context_lock);
-   rc = ida_get_new_above(&mmu_context_ida, id, &result);
-   spin_unlock(&mmu_context_lock);
-   } while (rc == -EAGAIN);
+   int result = ida_alloc_range(&mmu_context_ida, id, id, GFP_KERNEL);
 
	WARN(result != id, "mmu: Failed to reserve context id %d (rc %d)\n", id, result);
 }
@@ -172,9 +140,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 
 void __destroy_context(int context_id)
 {
-   spin_lock(&mmu_context_lock);
-   ida_remove(&mmu_context_ida, context_id);
-   spin_unlock(&mmu_context_lock);
+   ida_free(&mmu_context_ida, context_id);
 }
 EXPORT_SYMBOL_GPL(__destroy_context);
 
@@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
 {
int index, context_id;
 
-   spin_lock(&mmu_context_lock);
for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
context_id = ctx->extended_id[index];
if (context_id)
-   ida_remove(&mmu_context_ida, context_id);
+   ida_free(&mmu_context_ida, context_id);
}
-   spin_unlock(&mmu_context_lock);
 }
 
 static void pte_frag_destroy(void *pte_frag)
-- 
2.17.1
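
For readers unfamiliar with the new API, a minimal sketch of the usage
pattern the conversion relies on (ida_alloc_range()/ida_free() lock
internally and range-check, returning -ENOSPC when no ID in [min, max] is
free):

#include <linux/idr.h>

static DEFINE_IDA(example_ida);

int example_get_id(void)
{
	/* allocates the lowest free ID in [1, 127] */
	int id = ida_alloc_range(&example_ida, 1, 127, GFP_KERNEL);

	if (id < 0)
		return id;	/* -ENOMEM or -ENOSPC */
	return id;
}

void example_put_id(int id)
{
	ida_free(&example_ida, id);	/* no external lock needed */
}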



[PATCH 15/26] ppc: Convert vas ID allocation to new IDA API

2018-06-21 Thread Matthew Wilcox
Removes a custom spinlock and simplifies the code.

Signed-off-by: Matthew Wilcox 
---
 arch/powerpc/platforms/powernv/vas-window.c | 26 -
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index ff9f48812331..2a5e68a2664d 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -515,35 +515,17 @@ int init_winctx_regs(struct vas_window *window, struct vas_winctx *winctx)
return 0;
 }
 
-static DEFINE_SPINLOCK(vas_ida_lock);
-
 static void vas_release_window_id(struct ida *ida, int winid)
 {
-   spin_lock(&vas_ida_lock);
-   ida_remove(ida, winid);
-   spin_unlock(&vas_ida_lock);
+   ida_free(ida, winid);
 }
 
 static int vas_assign_window_id(struct ida *ida)
 {
-   int rc, winid;
-
-   do {
-   rc = ida_pre_get(ida, GFP_KERNEL);
-   if (!rc)
-   return -EAGAIN;
-
-   spin_lock(&vas_ida_lock);
-   rc = ida_get_new(ida, &winid);
-   spin_unlock(&vas_ida_lock);
-   } while (rc == -EAGAIN);
-
-   if (rc)
-   return rc;
+   int winid = ida_alloc_max(ida, VAS_WINDOWS_PER_CHIP, GFP_KERNEL);
 
-   if (winid > VAS_WINDOWS_PER_CHIP) {
-   pr_err("Too many (%d) open windows\n", winid);
-   vas_release_window_id(ida, winid);
+   if (winid == -ENOSPC) {
+   pr_err("Too many (%d) open windows\n", VAS_WINDOWS_PER_CHIP);
return -EAGAIN;
}
 
-- 
2.17.1



Re: [PATCH] selftests/powerpc: Fix strncpy usage

2018-06-21 Thread Segher Boessenkool
On Wed, Jun 20, 2018 at 07:51:11PM -0300, Breno Leitao wrote:
> - strncpy(prog, argv[0], strlen(argv[0]));
> + strncpy(prog, argv[0], sizeof(prog) - 1);

strncpy(prog, argv[0], sizeof prog);
if (prog[sizeof prog - 1])
scream_bloody_murder();

Silently using the wrong data is a worse habit than not checking for
overflows ;-)


Segher
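
A standalone version of the pattern Segher sketches, for reference
(scream_bloody_murder() is his placeholder; the check works because strncpy()
always writes exactly n bytes, so the last byte stays non-NUL only when the
source was truncated):

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
	char prog[32];

	if (argc < 1 || !argv[0])
		return 1;
	strncpy(prog, argv[0], sizeof prog);
	if (prog[sizeof prog - 1] != '\0') {
		/* truncated: the final byte kept a copied character */
		fprintf(stderr, "argv[0] too long\n");
		return 1;
	}
	printf("prog = %s\n", prog);
	return 0;
}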


Re: [PATCH kernel v2 REPOST] powerpc/powernv/ioda: Allocate indirect TCE levels on demand

2018-06-21 Thread David Gibson
On Thu, Jun 21, 2018 at 07:36:27PM +1000, Alexey Kardashevskiy wrote:
> At the moment we allocate the entire TCE table, twice (hardware part and
> userspace translation cache). This normally works as we usually have
> contiguous memory and the guest will map the entire RAM for 64bit DMA.
> 
> However if we have sparse RAM (one example is a memory device), then
> we will allocate TCEs which will never be used as the guest only maps
> actual memory for DMA. If it is a single level TCE table, there is nothing
> we can really do, but if it is a multilevel table, we can skip allocating
> TCEs we know we won't need.
> 
> This adds the ability to allocate only the first level, saving memory.
> 
> This changes iommu_table::free() to avoid allocating an extra level;
> iommu_table::set() will do this when needed.
> 
> This adds an @alloc parameter to iommu_table::exchange() to tell the callback
> if it can allocate an extra level; the flag is set to "false" for
> the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
> H_TOO_HARD.
> 
> This still requires the entire table to be counted in mm::locked_vm.
> 
> To be conservative, this only does on-demand allocation when
> the userspace cache table is requested, which is the case of VFIO.
> 
> The example math for a system replicating a powernv setup with NVLink2
> in a guest:
> 16GB RAM mapped at 0x0
> 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000
> 
> the table to cover that all with 64K pages takes:
> (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB
> 
> If we allocate only necessary TCE levels, we will only need:
> (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect levels).
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
> 
> 
> This is what I meant to post a few days ago, sorry for the noise.
> 
> 
> ---
> Changes:
> v2:
> * fixed a bug in the cleanup path which forced the entire table to be
> allocated right before destroying it
> * added memory allocation error handling in pnv_tce()
> ---
>  arch/powerpc/include/asm/iommu.h  |  7 ++-
>  arch/powerpc/platforms/powernv/pci.h  |  6 ++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c   |  4 +-
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 73 +--
>  arch/powerpc/platforms/powernv/pci-ioda.c |  8 +--
>  drivers/vfio/vfio_iommu_spapr_tce.c   |  2 +-
>  6 files changed, 73 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 4bdcf22..daa3ee5 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -70,7 +70,7 @@ struct iommu_table_ops {
>   unsigned long *hpa,
>   enum dma_data_direction *direction);
>  
> - __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
> + __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
>  #endif
>   void (*clear)(struct iommu_table *tbl,
>   long index, long npages);
> @@ -122,10 +122,13 @@ struct iommu_table {
>   __be64 *it_userspace; /* userspace view of the table */
>   struct iommu_table_ops *it_ops;
> struct kref it_kref;
> + int it_nid;
>  };
>  
> +#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
> + ((tbl)->it_ops->useraddrptr((tbl), (entry), false))
>  #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> - ((tbl)->it_ops->useraddrptr((tbl), (entry)))
> + ((tbl)->it_ops->useraddrptr((tbl), (entry), true))
>  
>  /* Pure 2^n version of get_order */
>  static inline __attribute_const__
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 5e02408..1fa5590 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -267,8 +267,10 @@ extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
>   unsigned long attrs);
>  extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
>  extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
> - unsigned long *hpa, enum dma_data_direction *direction);
> -extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
> + unsigned long *hpa, enum dma_data_direction *direction,
> + bool alloc);
> +extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index,
> + bool alloc);
>  extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>  
>  extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 8cc1caf..efb90d8 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>  {
>   struct mm_iomm

Re: [PATCH v2] powerpc/numa: Correct kernel message severity

2018-06-21 Thread Michael Ellerman
Vipin K Parashar  writes:

> printk() in unmap_cpu_from_node() uses KERN_ERR message severity,
> for a WARNING message. Change it to pr_warn().
>
> Signed-off-by: Vipin K Parashar 
> ---
>  arch/powerpc/mm/numa.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index edd8d0b..1632f4b 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -163,8 +163,7 @@ static void unmap_cpu_from_node(unsigned long cpu)
>   if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
>   cpumask_clear_cpu(cpu, node_to_cpumask_map[node]);
>   } else {
> - printk(KERN_ERR "WARNING: cpu %lu not found in node %d\n",
> -cpu, node);
> + pr_warn("WARNING: cpu %lu not found in node %d\n", cpu, node);
>   }
>  }

The full function is:

static void unmap_cpu_from_node(unsigned long cpu)
{
int node = numa_cpu_lookup_table[cpu];

dbg("removing cpu %lu from node %d\n", cpu, node);

if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
cpumask_clear_cpu(cpu, node_to_cpumask_map[node]);
} else {
printk(KERN_ERR "WARNING: cpu %lu not found in node %d\n",
   cpu, node);
}
}


So we look up what node the CPU is on, and then we lookup the set of
CPUs on that node, and they don't match.

That seems like a bug, not a warning.

Have you looked at why we're seeing this warning? It seems like maybe
something else is going wrong to get us into this situation to begin
with.

If there's some good reason why this is happening, and it's truly
harmless, then we can just remove the printk() entirely.

cheers


Re: [PATCH kernel] KVM: PPC: Fix hardware and emulated TCE tables matching

2018-06-21 Thread David Gibson
On Wed, Jun 20, 2018 at 06:42:58PM +1000, Alexey Kardashevskiy wrote:
> When attaching a hardware table to LIOBN in KVM, we match table parameters
> such as page size, table offset and table size. However the tables are
> created via very different paths - VFIO and KVM - and the VFIO path goes
> through the platform code which has minimum TCE page size requirement
> (which is 4K but since we allocate memory by pages and cannot avoid
> alignment anyway, we align to 64k pages for powernv_defconfig).
> 
> So when we match the tables, one might be bigger than the other, which
> means the hardware table cannot get attached to LIOBN and DMA mapping
> fails.
> 
> This removes the table size alignment from the guest visible table.
> This does not affect the memory allocation which is still aligned -
> kvmppc_tce_pages() takes care of this.
> 
> This relaxes the check we do when attaching tables to allow the hardware
> table be bigger than the guest visible table.
> 
> Ideally we want the KVM table to cover the same space as the hardware
> table does but since the hardware table may use multiple levels, and
> all levels must use the same table size (IODA2 design), the area it can
> actually cover might get very different from the window size which
> the guest requested, even though the guest won't map it all.
> 
> Fixes: ca1fc489cf "KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages"
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 8c456fa..8167ce8 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -180,7 +180,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>   if ((tbltmp->it_page_shift <= stt->page_shift) &&
>   (tbltmp->it_offset << tbltmp->it_page_shift ==
>stt->offset << stt->page_shift) &&
> - (tbltmp->it_size << tbltmp->it_page_shift ==
> + (tbltmp->it_size << tbltmp->it_page_shift >=
>stt->size << stt->page_shift)) {
>   /*
>* Reference the table to avoid races with
> @@ -296,7 +296,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  {
>   struct kvmppc_spapr_tce_table *stt = NULL;
>   struct kvmppc_spapr_tce_table *siter;
> - unsigned long npages, size;
> + unsigned long npages, size = args->size;
>   int ret = -ENOMEM;
>   int i;
>  
> @@ -304,7 +304,6 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>   (args->offset + args->size > (ULLONG_MAX >> args->page_shift)))
>   return -EINVAL;
>  
> - size = _ALIGN_UP(args->size, PAGE_SIZE >> 3);
>   npages = kvmppc_tce_pages(size);
>   ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
>   if (ret)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
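
To make the size mismatch concrete (illustrative numbers, ours, grounded in
the changelog's note that the platform allocates table memory in whole 64K
pages of 8-byte TCEs): the hardware it_size rounds up to a multiple of 8192
entries while the guest-visible size need not, so strict equality can never
be guaranteed - hence the >= above.

#include <stdio.h>

int main(void)
{
	unsigned long requested = 1000;			/* guest TCE count */
	unsigned long per_page = (64 * 1024) / 8;	/* 8192 TCEs per 64K page */
	unsigned long hw = (requested + per_page - 1) / per_page * per_page;

	printf("guest stt->size = %lu, hardware it_size = %lu\n",
	       requested, hw);
	return 0;
}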




Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API

2018-06-21 Thread Nicholas Piggin
On Thu, 21 Jun 2018 14:28:22 -0700
Matthew Wilcox  wrote:

> ida_alloc_range is the perfect fit for this use case.  Eliminates
> a custom spinlock, a call to ida_pre_get and a local check for the
> allocated ID exceeding a maximum.
> 
> Signed-off-by: Matthew Wilcox 
> ---
>  arch/powerpc/mm/mmu_context_book3s64.c | 44 +++---
>  1 file changed, 4 insertions(+), 40 deletions(-)
> 
> > diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index f3d4b4a0e561..5a0cf2cc8ba0 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -26,48 +26,16 @@
>  #include 
>  #include 
>  
> -static DEFINE_SPINLOCK(mmu_context_lock);
>  static DEFINE_IDA(mmu_context_ida);
>  
>  static int alloc_context_id(int min_id, int max_id)
>  {
> - int index, err;
> -
> -again:
> - if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
> - return -ENOMEM;
> -
> - spin_lock(&mmu_context_lock);
> - err = ida_get_new_above(&mmu_context_ida, min_id, &index);
> - spin_unlock(&mmu_context_lock);
> -
> - if (err == -EAGAIN)
> - goto again;
> - else if (err)
> - return err;
> -
> - if (index > max_id) {
> - spin_lock(&mmu_context_lock);
> - ida_remove(&mmu_context_ida, index);
> - spin_unlock(&mmu_context_lock);
> - return -ENOMEM;
> - }
> -
> - return index;
> + return ida_alloc_range(&mmu_context_ida, min_id, max_id, GFP_KERNEL);
>  }
>  
>  void hash__reserve_context_id(int id)
>  {
> - int rc, result = 0;
> -
> - do {
> - if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
> - break;
> -
> - spin_lock(&mmu_context_lock);
> - rc = ida_get_new_above(&mmu_context_ida, id, &result);
> - spin_unlock(&mmu_context_lock);
> - } while (rc == -EAGAIN);
> + int result = ida_alloc_range(&mmu_context_ida, id, id, GFP_KERNEL);
>  
>   WARN(result != id, "mmu: Failed to reserve context id %d (rc %d)\n", id, result);
>  }
> @@ -172,9 +140,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>  
>  void __destroy_context(int context_id)
>  {
> - spin_lock(&mmu_context_lock);
> - ida_remove(&mmu_context_ida, context_id);
> - spin_unlock(&mmu_context_lock);
> + ida_free(&mmu_context_ida, context_id);
>  }
>  EXPORT_SYMBOL_GPL(__destroy_context);
>  
> @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
>  {
>   int index, context_id;
>  
> - spin_lock(&mmu_context_lock);
>   for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
>   context_id = ctx->extended_id[index];
>   if (context_id)
> - ida_remove(&mmu_context_ida, context_id);
> + ida_free(&mmu_context_ida, context_id);
>   }
> - spin_unlock(&mmu_context_lock);
>  }
>  
>  static void pte_frag_destroy(void *pte_frag)

This hunk should be okay because the mmu_context_lock does not protect
the extended_id array, right Aneesh?

Reviewed-by: Nicholas Piggin 

Thanks,
Nick


Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API

2018-06-21 Thread Matthew Wilcox
On Fri, Jun 22, 2018 at 12:15:11PM +1000, Nicholas Piggin wrote:
> On Thu, 21 Jun 2018 14:28:22 -0700
> Matthew Wilcox  wrote:
> >  static int alloc_context_id(int min_id, int max_id)
...
> > -   spin_lock(&mmu_context_lock);
> > -   err = ida_get_new_above(&mmu_context_ida, min_id, &index);
> > -   spin_unlock(&mmu_context_lock);
...
> > @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
> >  {
> > int index, context_id;
> >  
> > -   spin_lock(&mmu_context_lock);
> > for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
> > context_id = ctx->extended_id[index];
> > if (context_id)
> > -   ida_remove(&mmu_context_ida, context_id);
> > +   ida_free(&mmu_context_ida, context_id);
> > }
> > -   spin_unlock(&mmu_context_lock);
> >  }
> >  
> >  static void pte_frag_destroy(void *pte_frag)
> 
> This hunk should be okay because the mmu_context_lock does not protect
> the extended_id array, right Aneesh?

That's my understanding.  The code today does this:

static inline int alloc_extended_context(struct mm_struct *mm,
 unsigned long ea)
{
int context_id;

int index = ea >> MAX_EA_BITS_PER_CONTEXT;

context_id = hash__alloc_context_id();
if (context_id < 0)
return context_id;

VM_WARN_ON(mm->context.extended_id[index]);
mm->context.extended_id[index] = context_id;

so it's not currently protected by this lock.  I suppose we are currently
protected from destroy_contexts() being called twice simultaneously, but
you'll notice that we don't zero the array elements in destroy_contexts(),
so if we somehow had a code path which could call it concurrently, we'd
be seeing warnings when the second caller tried to remove the context
IDs from the IDA.  I deduced that something else must be preventing
this situation from occurring (like, oh i don't know, this function only
being called on process exit, so implicitly only called once per context).

> Reviewed-by: Nicholas Piggin 

Thanks.


Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API

2018-06-21 Thread Nicholas Piggin
On Thu, 21 Jun 2018 21:38:15 -0700
Matthew Wilcox  wrote:

> On Fri, Jun 22, 2018 at 12:15:11PM +1000, Nicholas Piggin wrote:
> > On Thu, 21 Jun 2018 14:28:22 -0700
> > Matthew Wilcox  wrote:  
> > >  static int alloc_context_id(int min_id, int max_id)  
> ...
> > > - spin_lock(&mmu_context_lock);
> > > - err = ida_get_new_above(&mmu_context_ida, min_id, &index);
> > > - spin_unlock(&mmu_context_lock);  
> ...
> > > @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
> > >  {
> > >   int index, context_id;
> > >  
> > > - spin_lock(&mmu_context_lock);
> > >   for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
> > >   context_id = ctx->extended_id[index];
> > >   if (context_id)
> > > - ida_remove(&mmu_context_ida, context_id);
> > > + ida_free(&mmu_context_ida, context_id);
> > >   }
> > > - spin_unlock(&mmu_context_lock);
> > >  }
> > >  
> > >  static void pte_frag_destroy(void *pte_frag)  
> > 
> > This hunk should be okay because the mmu_context_lock does not protect
> > the extended_id array, right Aneesh?  
> 
> That's my understanding.  The code today does this:
> 
> static inline int alloc_extended_context(struct mm_struct *mm,
>  unsigned long ea)
> {
> int context_id;
> 
> int index = ea >> MAX_EA_BITS_PER_CONTEXT;
> 
> context_id = hash__alloc_context_id();
> if (context_id < 0)
> return context_id;
> 
> VM_WARN_ON(mm->context.extended_id[index]);
> mm->context.extended_id[index] = context_id;
> 
> so it's not currently protected by this lock.  I suppose we are currently
> protected from destroy_contexts() being called twice simultaneously, but
> you'll notice that we don't zero the array elements in destroy_contexts(),
> so if we somehow had a code path which could call it concurrently, we'd
> be seeing warnings when the second caller tried to remove the context

Yeah that'd be an existing bug.

> IDs from the IDA.  I deduced that something else must be preventing
> this situation from occurring (like, oh i don't know, this function only
> being called on process exit, so implicitly only called once per context).

I think that's exactly right.

Thanks,
Nick


Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling

2018-06-21 Thread Finn Thain
On Tue, 19 Jun 2018, Arnd Bergmann wrote:

> The real-time clock on m68k (and powerpc) mac systems uses an unsigned 
> 32-bit value starting in 1904, which overflows in 2040, about two years 
> later than everyone else, but this gets wrapped around in the Linux code 
> in 2038 already because of the deprecated usage of time_t and/or long in 
> the conversion.
> 
> Getting rid of the deprecated interfaces makes it work until 2040 as 
> documented, and it could be easily extended by reinterpreting the 
> resulting time64_t as a positive number. For the moment, I'm adding a 
> WARN_ON() that triggers if we encounter a time before 1970 or after 2040 
> (the two are indistinguishable).
> 

I really don't like the WARN_ON(), but I'd prefer to address that in a 
separate patch rather than impede the progress of this patch (or of this 
series, since 3/3 seems to be unrelated).

BTW, have you considered using the same wrap-around test (i.e. YY < 70) 
that we use for the year register in the other RTC chips?
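
That test pivots a two-digit year register onto the 1970..2069 window.
A sketch of the idea, with a function name of my own choosing:

/* Two-digit RTC years below 70 are read as 20xx, the rest as 19xx,
 * mirroring the wrap-around test used by the other RTC chips. */
static int rtc_full_year(unsigned int yy)	/* yy in 0..99 */
{
	return yy + (yy < 70 ? 2000 : 1900);
}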

> This brings it in line with the corresponding code that we have on 
> powerpc macintosh.
> 

Your recent patches to the Mac RTC routines (which are duplicated under 
arch/m68k and arch/powerpc) conflict with my recent patch that 
deduplicates the same code. So I will rebase and resubmit after someone 
merges these fixes.

Apparently the PowerMac routines work now, which is sufficient testing for 
me; the PowerMac routines will get tested on m68k Macs when that code gets 
deduplicated again.

BTW, Joshua tells me that he is not doing code review. We should probably 
drop the "M68K ON APPLE MACINTOSH" entry from the MAINTAINERS file, like 
the Amiga and Atari ports...

-- 

> Signed-off-by: Arnd Bergmann 
> ---
> v2: Fix varargs passing bug pointed out by Andreas Schwab
> Fix a typo that caused a build regression
> ---
>  arch/m68k/mac/misc.c | 62 
> +---
>  1 file changed, 39 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
> index c68054361615..0a2572a6bfe5 100644
> --- a/arch/m68k/mac/misc.c
> +++ b/arch/m68k/mac/misc.c
> @@ -26,33 +26,39 @@
>  
>  #include 
>  
> -/* Offset between Unix time (1970-based) and Mac time (1904-based) */
> +/* Offset between Unix time (1970-based) and Mac time (1904-based). Cuda and 
> PMU
> + * times wrap in 2040. If we need to handle later times, the read_time 
> functions
> + * need to be changed to interpret wrapped times as post-2040. */
>  
>  #define RTC_OFFSET 2082844800
>  
>  static void (*rom_reset)(void);
>  
>  #ifdef CONFIG_ADB_CUDA
> -static long cuda_read_time(void)
> +static time64_t cuda_read_time(void)
>  {
>   struct adb_request req;
> - long time;
> + time64_t time;
>  
>   if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0)
>   return 0;
>   while (!req.complete)
>   cuda_poll();
>  
> - time = (req.reply[3] << 24) | (req.reply[4] << 16) |
> -(req.reply[5] << 8) | req.reply[6];
> + time = (u32)((req.reply[3] << 24) | (req.reply[4] << 16) |
> +  (req.reply[5] << 8) | req.reply[6]);
> +
> + /* it's either after year 2040, or the RTC has gone backwards */
> + WARN_ON(time < RTC_OFFSET);
> +
>   return time - RTC_OFFSET;
>  }
>  
> -static void cuda_write_time(long data)
> +static void cuda_write_time(time64_t time)
>  {
>   struct adb_request req;
> + u32 data = lower_32_bits(time + RTC_OFFSET);
>  
> - data += RTC_OFFSET;
>   if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME,
>(data >> 24) & 0xFF, (data >> 16) & 0xFF,
>(data >> 8) & 0xFF, data & 0xFF) < 0)
> @@ -86,26 +92,30 @@ static void cuda_write_pram(int offset, __u8 data)
>  #endif /* CONFIG_ADB_CUDA */
>  
>  #ifdef CONFIG_ADB_PMU68K
> -static long pmu_read_time(void)
> +static time64_t pmu_read_time(void)
>  {
>   struct adb_request req;
> - long time;
> + time64_t time;
>  
>   if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
>   return 0;
>   while (!req.complete)
>   pmu_poll();
>  
> - time = (req.reply[1] << 24) | (req.reply[2] << 16) |
> -(req.reply[3] << 8) | req.reply[4];
> + time = (u32)((req.reply[1] << 24) | (req.reply[2] << 16) |
> +  (req.reply[3] << 8) | req.reply[4]);
> +
> + /* it's either after year 2040, or the RTC has gone backwards */
> + WARN_ON(time < RTC_OFFSET);
> +
>   return time - RTC_OFFSET;
>  }
>  
> -static void pmu_write_time(long data)
> +static void pmu_write_time(time64_t time)
>  {
>   struct adb_request req;
> + u32 data = lower_32_bits(time + RTC_OFFSET);
>  
> - data += RTC_OFFSET;
>   if (pmu_request(&req, NULL, 5, PMU_SET_RTC,
>   (data >> 24) & 0xFF, (data >> 16) & 0xFF,
>   (data >> 8) & 0xFF, data & 0xFF) < 0)
> @@ -269,8 +279,12 @@ static lon
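
The RTC_OFFSET arithmetic in the hunks above is easy to sanity-check in
isolation. A small user-space sketch (mine, not part of the patch) of
the 1904-based conversion and the 2040 wrap it now tolerates:

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Seconds between 1904-01-01 (Mac epoch) and 1970-01-01 (Unix epoch):
 * 66 years of 365 days plus 17 leap days, times 86400 seconds. */
#define RTC_OFFSET 2082844800LL

int main(void)
{
	/* The largest value the unsigned 32-bit Mac RTC can hold maps to
	 * Unix time 2212122495, i.e. early February 2040, two years past
	 * the signed-32-bit limit the old long-based code wrapped at. */
	uint32_t mac_max = UINT32_MAX;
	int64_t unix_max = (int64_t)mac_max - RTC_OFFSET;
	printf("Mac RTC wraps at Unix time %" PRId64 "\n", unix_max);

	/* Writing back truncates to the low 32 bits, as
	 * lower_32_bits(time + RTC_OFFSET) does in the patch. */
	int64_t now = 1529539200;	/* 2018-06-21 00:00:00 UTC */
	uint32_t mac_now = (uint32_t)(now + RTC_OFFSET);
	printf("Mac RTC value for 2018-06-21: %" PRIu32 "\n", mac_now);
	return 0;
}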

Re: [PATCH 2/3] migration/memory: Evaluate LMB assoc changes

2018-06-21 Thread kbuild test robot
Hi Michael,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.18-rc1 next-20180621]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-migration-Affinity-fix-for-memory/20180621-090151
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/pseries/hotplug-memory.c: In function 
'pseries_update_drconf_memory':
>> arch/powerpc/platforms/pseries/hotplug-memory.c:1032:4: error: implicit 
>> declaration of function 'dlpar_queue_action'; did you mean 
>> 'waitqueue_active'? [-Werror=implicit-function-declaration]
   dlpar_queue_action(
   ^~
   waitqueue_active
   cc1: some warnings being treated as errors

vim +1032 arch/powerpc/platforms/pseries/hotplug-memory.c

   999  
  1000  static int pseries_update_drconf_memory(struct drmem_lmb_info 
*new_dinfo)
  1001  {
  1002  struct drmem_lmb *old_lmb, *new_lmb;
  1003  unsigned long memblock_size;
  1004  int rc = 0;
  1005  
  1006  if (rtas_hp_event)
  1007  return 0;
  1008  
  1009  memblock_size = pseries_memory_block_size();
  1010  if (!memblock_size)
  1011  return -EINVAL;
  1012  
  1013  /* Arrays should have the same size and DRC indexes */
  1014  for_each_pair_dinfo_lmb(drmem_info, old_lmb, new_dinfo, 
new_lmb) {
  1015  
  1016  if (new_lmb->drc_index != old_lmb->drc_index)
  1017  continue;
  1018  
  1019  if ((old_lmb->flags & DRCONF_MEM_ASSIGNED) &&
  1020  (!(new_lmb->flags & DRCONF_MEM_ASSIGNED))) {
  1021  rc = pseries_remove_memblock(
  1022  old_lmb->base_addr, memblock_size);
  1023  break;
  1024  } else if ((!(old_lmb->flags & DRCONF_MEM_ASSIGNED)) &&
  1025 (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
  1026  rc = memblock_add(old_lmb->base_addr,
  1027  memblock_size);
  1028  rc = (rc < 0) ? -EINVAL : 0;
  1029  break;
  1030  } else if ((old_lmb->aa_index != new_lmb->aa_index) &&
  1031 (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
> 1032  dlpar_queue_action(
  1033  PSERIES_HP_ELOG_RESOURCE_MEM,
  1034  PSERIES_HP_ELOG_ACTION_READD,
  1035  new_lmb->drc_index);
  1036  }
  1037  }
  1038  return rc;
  1039  }
  1040  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API

2018-06-21 Thread Aneesh Kumar K.V
Nicholas Piggin  writes:

> On Thu, 21 Jun 2018 14:28:22 -0700
> Matthew Wilcox  wrote:
>
>> ida_alloc_range is the perfect fit for this use case.  Eliminates
>> a custom spinlock, a call to ida_pre_get and a local check for the
>> allocated ID exceeding a maximum.
>> 
>> Signed-off-by: Matthew Wilcox 
>> ---
>>  arch/powerpc/mm/mmu_context_book3s64.c | 44 +++---
>>  1 file changed, 4 insertions(+), 40 deletions(-)
>> 
>> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
>> b/arch/powerpc/mm/mmu_context_book3s64.c
>> index f3d4b4a0e561..5a0cf2cc8ba0 100644
>> --- a/arch/powerpc/mm/mmu_context_book3s64.c
>> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
>> @@ -26,48 +26,16 @@
>>  #include 
>>  #include 
>>  
>> -static DEFINE_SPINLOCK(mmu_context_lock);
>>  static DEFINE_IDA(mmu_context_ida);
>>  
>>  static int alloc_context_id(int min_id, int max_id)
>>  {
>> -int index, err;
>> -
>> -again:
>> -if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
>> -return -ENOMEM;
>> -
>> -spin_lock(&mmu_context_lock);
>> -err = ida_get_new_above(&mmu_context_ida, min_id, &index);
>> -spin_unlock(&mmu_context_lock);
>> -
>> -if (err == -EAGAIN)
>> -goto again;
>> -else if (err)
>> -return err;
>> -
>> -if (index > max_id) {
>> -spin_lock(&mmu_context_lock);
>> -ida_remove(&mmu_context_ida, index);
>> -spin_unlock(&mmu_context_lock);
>> -return -ENOMEM;
>> -}
>> -
>> -return index;
>> +return ida_alloc_range(&mmu_context_ida, min_id, max_id, GFP_KERNEL);
>>  }
>>  
>>  void hash__reserve_context_id(int id)
>>  {
>> -int rc, result = 0;
>> -
>> -do {
>> -if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
>> -break;
>> -
>> -spin_lock(&mmu_context_lock);
>> -rc = ida_get_new_above(&mmu_context_ida, id, &result);
>> -spin_unlock(&mmu_context_lock);
>> -} while (rc == -EAGAIN);
>> +int result = ida_alloc_range(&mmu_context_ida, id, id, GFP_KERNEL);
>>  
>>  WARN(result != id, "mmu: Failed to reserve context id %d (rc %d)\n", 
>> id, result);
>>  }
>> @@ -172,9 +140,7 @@ int init_new_context(struct task_struct *tsk, struct 
>> mm_struct *mm)
>>  
>>  void __destroy_context(int context_id)
>>  {
>> -spin_lock(&mmu_context_lock);
>> -ida_remove(&mmu_context_ida, context_id);
>> -spin_unlock(&mmu_context_lock);
>> +ida_free(&mmu_context_ida, context_id);
>>  }
>>  EXPORT_SYMBOL_GPL(__destroy_context);
>>  
>> @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
>>  {
>>  int index, context_id;
>>  
>> -spin_lock(&mmu_context_lock);
>>  for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
>>  context_id = ctx->extended_id[index];
>>  if (context_id)
>> -ida_remove(&mmu_context_ida, context_id);
>> +ida_free(&mmu_context_ida, context_id);
>>  }
>> -spin_unlock(&mmu_context_lock);
>>  }
>>  
>>  static void pte_frag_destroy(void *pte_frag)
>
> This hunk should be okay because the mmu_context_lock does not protect
> the extended_id array, right Aneesh?

Yes. This is called at process exit, so we should not find parallel
calls. On the allocation side, we are protected by mmap_sem. We do
allocate extended_id when doing mmap.

-aneesh



Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API

2018-06-21 Thread Aneesh Kumar K.V
Matthew Wilcox  writes:



> this situation from occurring (like, oh i don't know, this function only
> being called on process exit, so implicitly only called once per context).
>

That is correct.



[PATCH 2/2] powerpc: Document issues with TM on POWER9

2018-06-21 Thread Michael Neuling
Signed-off-by: Michael Neuling 
---
 .../powerpc/transactional_memory.txt  | 44 +++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/powerpc/transactional_memory.txt 
b/Documentation/powerpc/transactional_memory.txt
index e32fdbb4c9..b254eab517 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -198,3 +198,47 @@ presented).  The transaction cannot then be continued and 
will take the failure
 handler route.  Furthermore, the transactional 2nd register state will be
 inaccessible.  GDB can currently be used on programs using TM, but not sensibly
 in parts within transactions.
+
+POWER9
+==
+
+TM on POWER9 has issues with storing the complete register state. This
+is described in this commit:
+
+commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
+Author: Paul Mackerras 
+Date:   Wed Mar 21 21:32:01 2018 +1100
+KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
+
+To account for this, different POWER9 chips have TM enabled in
+different ways.
+
+On POWER9N DD2.01 and below, TM is disabled, ie.
+HWCAP2[PPC_FEATURE2_HTM] is not set.
+
+On POWER9N DD2.1, TM is configured by firmware to always abort a
+transaction when TM suspend occurs. So tsuspend will cause a
+transaction to be aborted and rolled back. Kernel exceptions will also
+cause the transaction to be aborted and rolled back, and the exception
+will not occur. If userspace constructs a sigcontext that enables TM
+suspend, the sigcontext will be rejected by the kernel. This mode is
+advertised to users with HWCAP2[PPC_FEATURE2_HTM_NO_SUSPEND] set.
+HWCAP2[PPC_FEATURE2_HTM] is not set in this mode.
+
+On POWER9N DD2.2 and above, KVM and POWERVM emulate TM for guests (as
+described in commit 4bb3c7a0208f), hence TM is enabled for guests,
+ie. HWCAP2[PPC_FEATURE2_HTM] is set for guest userspace. Guests that
+make heavy use of TM suspend (tsuspend or kernel suspend) will trap
+into the hypervisor and hence will suffer a performance
+degradation. Host userspace has TM disabled,
+ie. HWCAP2[PPC_FEATURE2_HTM] is not set (although we may enable it
+at some point in the future if we bring the emulation into host
+userspace context switching).
+
+POWER9C DD1.2 and above are only available with POWERVM and hence
+Linux only runs as a guest. On these systems TM is emulated like on
+POWER9N DD2.2.
+
+Guest migration from POWER8 to POWER9 will work with POWER9N DD2.2 and
+POWER9C DD1.2. Since earlier POWER9 processors don't support TM
+emulation, migration from POWER8 to POWER9 is not supported there.
-- 
2.17.1
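
Since all three configurations are distinguished purely by HWCAP2 bits,
user space can probe them at run time. A minimal sketch (assuming a
powerpc system with the kernel uapi header that defines the
PPC_FEATURE2_* bits):

#include <stdio.h>
#include <sys/auxv.h>
#include <asm/cputable.h>	/* PPC_FEATURE2_HTM{,_NO_SUSPEND} */

int main(void)
{
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	if (hwcap2 & PPC_FEATURE2_HTM)
		puts("TM available (suspend may be emulated by the hypervisor)");
	else if (hwcap2 & PPC_FEATURE2_HTM_NO_SUSPEND)
		puts("TM available, but tsuspend aborts the transaction");
	else
		puts("TM disabled");
	return 0;
}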



[PATCH 1/2] powerpc: Document issues with the DAWR on POWER9

2018-06-21 Thread Michael Neuling
Signed-off-by: Michael Neuling 
---
 Documentation/powerpc/DAWR-POWER9.txt | 58 +++
 1 file changed, 58 insertions(+)
 create mode 100644 Documentation/powerpc/DAWR-POWER9.txt

diff --git a/Documentation/powerpc/DAWR-POWER9.txt 
b/Documentation/powerpc/DAWR-POWER9.txt
new file mode 100644
index 00..cc6e69b69b
--- /dev/null
+++ b/Documentation/powerpc/DAWR-POWER9.txt
@@ -0,0 +1,58 @@
+DAWR issues on POWER9
+
+
+On POWER9 the DAWR can cause a checkstop if it points to cache
+inhibited (CI) memory. Currently Linux has no way to distinguish CI
+memory when configuring the DAWR, so (for now) the DAWR is disabled by
+this commit:
+
+commit 9654153158d3e0684a1bdb76dbababdb7111d5a0
+Author: Michael Neuling 
+Date:   Tue Mar 27 15:37:24 2018 +1100
+powerpc: Disable DAWR in the base POWER9 CPU features
+
+Technical Details:
+
+
+DAWR has 5 different ways of being set.
+1) ptrace
+2) h_set_mode(DAWR)
+3) h_set_dabr()
+4) kvmppc_set_one_reg()
+5) xmon
+
+For ptrace, we now advertise zero breakpoints on POWER9 via the
+PPC_PTRACE_GETHWDBGINFO call. This results in GDB falling back to
+software emulation of the watchpoint (which is slow).
+
+h_set_mode(DAWR) and h_set_dabr() will now return an error to the
+guest on a POWER9 host. Current Linux guests ignore this error, so
+they will silently not get the DAWR.
+
+kvmppc_set_one_reg() will store the value in the vcpu but won't
+actually set it on POWER9 hardware. This is done so we don't break
+migration from POWER8 to POWER9, at the cost of silently losing the
+DAWR on the migration.
+
+For xmon, the 'bd' command will return an error on P9.
+
+Consequences for users
+
+
+For GDB watchpoints (ie. the 'watch' command) on POWER9 bare metal, GDB
+will accept the command. Unfortunately, since there is no hardware
+support for the watchpoint, GDB will software-emulate the watchpoint,
+making it run very slowly.
+
+The same will also be true for any guests started on a POWER9
+host. The watchpoint will fail and GDB will fall back to software
+emulation.
+
+If a guest is started on a POWER8 host, GDB will accept the watchpoint
+and configure the hardware to use the DAWR. This will run at full
+speed since it can use the hardware. Unfortunately, if this
+guest is migrated to a POWER9 host, the watchpoint will be lost on the
+POWER9. Loads and stores to the watchpoint locations will not be
+trapped in GDB. The watchpoint is remembered, so if the guest is
+migrated back to the POWER8 host, it will start working again.
+
-- 
2.17.1
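
The ptrace side of this can be probed directly: PPC_PTRACE_GETHWDBGINFO
fills a struct ppc_debug_info whose num_data_bps field is what GDB
consults before deciding to software-emulate. A rough sketch (mine;
assumes powerpc uapi headers and an already-attached, stopped tracee):

#include <sys/ptrace.h>
#include <sys/types.h>
#include <asm/ptrace.h>	/* struct ppc_debug_info, PPC_PTRACE_GETHWDBGINFO */

/* Returns the number of hardware data breakpoints the kernel
 * advertises for the tracee, or -1 on error. On a POWER9 with the
 * DAWR disabled this reports 0, which is what pushes GDB into slow
 * software watchpoints. */
static int num_hw_watchpoints(pid_t pid)
{
	struct ppc_debug_info info;

	if (ptrace(PPC_PTRACE_GETHWDBGINFO, pid, NULL, &info) < 0)
		return -1;
	return (int)info.num_data_bps;
}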