Re: [PATCH net-next v2] net: sched: don't disable bh when accessing action idr

2018-05-23 Thread Vlad Buslov

On Wed 23 May 2018 at 01:10, Cong Wang  wrote:
> On Mon, May 21, 2018 at 1:03 PM, Vlad Buslov  wrote:
>> Initial net_device implementation used ingress_lock spinlock to synchronize
>> ingress path of device. This lock was used in both process and bh context.
>> In some code paths action map lock was obtained while holding ingress_lock.
>> Commit e1e992e52faa ("[NET_SCHED] protect action config/dump from irqs")
>> modified actions to always disable bh, while using action map lock, in
>> order to prevent deadlock on ingress_lock in softirq. This lock was removed
>> from net_device, so disabling bh, while accessing action map, is no longer
>> necessary.
>>
>> Replace all action idr spinlock usage with regular calls that do not
>> disable bh.
>
> While your patch is probably fine, the above justification seems not.

Sorry if I missed something. My justification is based on the description
of the commit that added the bh disable to the code in question.

>
> In the past, tc actions could be released in BH context because tc
> filters use call_rcu(). However, I moved them to a workqueue recently.
> So before my change I don't think you can remove the BH protection,
> otherwise race with idr_remove()...

Found commit series that you described. Will modify commit message
accordingly.

Thanks,
Vlad



Re: [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()

2018-05-23 Thread Christophe LEROY



On 23/05/2018 at 08:29, Nicholas Piggin wrote:

On Tue, 22 May 2018 16:50:55 +0200
Christophe LEROY  wrote:


On 22/05/2018 at 16:38, Nicholas Piggin wrote:

On Tue, 22 May 2018 16:02:56 +0200 (CEST)
Christophe Leroy  wrote:
   

Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
userspace instruction miss") has shown that limiting the read of
faulting instruction to likely cases improves performance.

This patch goes further into this direction by limiting the read
of the faulting instruction to the only cases where it is likely
needed.

On an MPC885, with the same benchmark app as in the commit referred
above, we see a reduction of about 3900 dTLB misses (approx 3%):

Before the patch:

   Performance counter stats for './fault 500' (10 runs):

   683033312  cpu-cycles         ( +-  0.03% )
      134538  dTLB-load-misses   ( +-  0.03% )
       46099  iTLB-load-misses   ( +-  0.02% )
       19681  faults             ( +-  0.02% )

 5.389747878 seconds time elapsed ( +-  0.06% )

With the patch:

   Performance counter stats for './fault 500' (10 runs):

   682112862  cpu-cycles         ( +-  0.03% )
      130619  dTLB-load-misses   ( +-  0.03% )
       46073  iTLB-load-misses   ( +-  0.05% )
       19681  faults             ( +-  0.01% )

 5.381342641 seconds time elapsed ( +-  0.07% )

The proper work of the huge stack expansion was tested with the
following app:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
char buf[1024 * 1025];

sprintf(buf, "Hello world !\n");
printf(buf);

exit(0);
}

Signed-off-by: Christophe Leroy 
---
   v7: Following the comment from Nicholas on v6 about the possibility of the page
   getting removed from the pagetables between the fault and the read, I have
   reworked the patch to do the get_user() in __do_page_fault() directly, to
   reduce complexity compared to v5


This is looking better, thanks.
   

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index fcbb34431da2..dc64b8e06477 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -450,9 +450,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 * can result in fault, which will cause a deadlock when called with
 * mmap_sem held
 */
-   if (is_write && is_user)
-   get_user(inst, (unsigned int __user *)regs->nip);
-
if (is_user)
flags |= FAULT_FLAG_USER;
if (is_write)
@@ -498,6 +495,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
return bad_area(regs, address);
   
+	if (unlikely(is_write && is_user && address + 0x100000 < vma->vm_end &&
+		     !inst)) {
+		unsigned int __user *nip = (unsigned int __user *)regs->nip;
+
+		if (likely(access_ok(VERIFY_READ, nip, sizeof(inst)))) {
+   int res;
+
+   pagefault_disable();
+   res = __get_user_inatomic(inst, nip);
+   pagefault_enable();
+   if (unlikely(res)) {
+   up_read(&mm->mmap_sem);
+   res = __get_user(inst, nip);
+   if (!res && inst)
+   goto retry;


You're handling error here but the previous code did not?


The previous code did, in store_updates_sp().

When I moved get_user() out of that function in the preceding patch, I did
consider that if get_user() fails, inst will remain 0, which means that
store_updates_sp() will return false if ever called.


Well, it handles it just by saying no, the store does not update SP.
Yours now segfaults it, doesn't it?


Yes, it segfaults the same way as before, as it tells the expansion is bad.



I don't think that's a bad idea, I think it should go in a patch by
itself though. In theory we can have execute but not read, I guess
that's not really going to work here either way and I don't know if
Linux exposes it ever.


I don't understand what you mean; that's not different from before, is it?





Now, as the semaphore has been released, we really need to do something,
because if we goto retry unconditionally, we may end up in an infinite
loop, and we can't let it continue either as the semaphore is not held
anymore.


Re: [PATCH rdma-next 2/5] RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust

2018-05-23 Thread Leon Romanovsky
On Wed, May 23, 2018 at 02:49:35PM +0800, Wei Hu (Xavier) wrote:
>
>
> On 2018/5/23 14:05, Leon Romanovsky wrote:
> > On Thu, May 17, 2018 at 04:02:50PM +0800, Wei Hu (Xavier) wrote:
> >> This patch modified uar allocation algorithm in hns_roce_uar_alloc
> >> function to avoid bitmap exhaust.
> >>
> >> Signed-off-by: Wei Hu (Xavier) 
> >> ---
> >>  drivers/infiniband/hw/hns/hns_roce_device.h |  1 +
> >>  drivers/infiniband/hw/hns/hns_roce_pd.c | 10 ++
> >>  2 files changed, 7 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
> >> b/drivers/infiniband/hw/hns/hns_roce_device.h
> >> index 53c2f1b..412297d4 100644
> >> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> >> @@ -214,6 +214,7 @@ enum {
> >>  struct hns_roce_uar {
> >>u64 pfn;
> >>unsigned long   index;
> >> +  unsigned long   logic_idx;
> >>  };
> >>
> >>  struct hns_roce_ucontext {
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c 
> >> b/drivers/infiniband/hw/hns/hns_roce_pd.c
> >> index 4b41e04..b9f2c87 100644
> >> --- a/drivers/infiniband/hw/hns/hns_roce_pd.c
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
> >> @@ -107,13 +107,15 @@ int hns_roce_uar_alloc(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
> >>int ret = 0;
> >>
> >>/* Using bitmap to manager UAR index */
> >> -  ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->index);
> >> +  ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->logic_idx);
> >>if (ret == -1)
> >>return -ENOMEM;
> >>
> >> -  if (uar->index > 0)
> >> -  uar->index = (uar->index - 1) %
> >> +  if (uar->logic_idx > 0 && hr_dev->caps.phy_num_uars > 1)
> >> +  uar->index = (uar->logic_idx - 1) %
> >> (hr_dev->caps.phy_num_uars - 1) + 1;
> >> +  else
> >> +  uar->index = 0;
> >>
> > Sorry, but maybe I didn't understand this change fully, but logic_idx is
> > not initialized at all and one of two (needs to check your uar
> > allocation): the logic_idx is always zero -> index will be zero too,
> > or logic_idx is random variable -> index will be random too.
> >
> > What did you want to do?
> >
> Hi, Leon
>
> The prototype of hns_roce_bitmap_alloc is as follows:
> int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap,
> unsigned long *obj);
> In this statement, we evaluate uar->logic_idx:
> ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap,
> &uar->logic_idx);
>
> In hip06, hr_dev->caps.phy_num_uars equals 8:
> if (uar->logic_idx > 0)
> uar->index = (uar->logic_idx - 1) %
> (hr_dev->caps.phy_num_uars - 1) + 1;
> else
> uar->index = 0;
> In hip08, hr_dev->caps.phy_num_uars equals 1, so uar->index = 0.
>
>Regards

Where did you change/set logic_idx?

Thanks


> Wei Hu
> >>if (!dev_is_pci(hr_dev->dev)) {
> >>res = platform_get_resource(hr_dev->pdev, IORESOURCE_MEM, 0);
> >> @@ -132,7 +134,7 @@ int hns_roce_uar_alloc(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
> >>
> >>  void hns_roce_uar_free(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
> >>  {
> >> -  hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->index,
> >> +  hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->logic_idx,
> >> BITMAP_NO_RR);
> >>  }
> >>
> >> --
> >> 1.9.1
> >>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




[PATCH v8] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()

2018-05-23 Thread Christophe Leroy
Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
userspace instruction miss") has shown that limiting the read of
faulting instruction to likely cases improves performance.

This patch goes further into this direction by limiting the read
of the faulting instruction to the only cases where it is likely
needed.

On an MPC885, with the same benchmark app as in the commit referred
above, we see a reduction of about 3900 dTLB misses (approx 3%):

Before the patch:

 Performance counter stats for './fault 500' (10 runs):

 683033312  cpu-cycles         ( +-  0.03% )
    134538  dTLB-load-misses   ( +-  0.03% )
     46099  iTLB-load-misses   ( +-  0.02% )
     19681  faults             ( +-  0.02% )

   5.389747878 seconds time elapsed ( +-  0.06% )

With the patch:

 Performance counter stats for './fault 500' (10 runs):

 682112862  cpu-cycles         ( +-  0.03% )
    130619  dTLB-load-misses   ( +-  0.03% )
     46073  iTLB-load-misses   ( +-  0.05% )
     19681  faults             ( +-  0.01% )

   5.381342641 seconds time elapsed ( +-  0.07% )

The proper work of the huge stack expansion was tested with the
following app:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
char buf[1024 * 1025];

sprintf(buf, "Hello world !\n");
printf(buf);

exit(0);
}

Signed-off-by: Christophe Leroy 
---
 v8: Back to a single patch as it now makes no sense to split the first part in
 two. The third patch has no dependencies on the ones before, so it will be
 resent independently. As suggested by Nicholas, the patch now does the
 get_user() stuff inside bad_stack_expansion(); that's a midway between v5 and v7.

 v7: Following the comment from Nicholas on v6 about the possibility of the page
 getting removed from the pagetables between the fault and the read, I have
 reworked the patch to do the get_user() in __do_page_fault() directly, to
 reduce complexity compared to v5

 v6: Rebased on latest powerpc/merge branch; using __get_user_inatomic() instead
 of get_user() in order to move it inside the semaphored area. That removes all
 the complexity of the patch.

 v5: Reworked to fit after Benh's do_fault improvement and rebased on top of
 powerpc/merge (65152902e43fef)

 v4: Rebased on top of powerpc/next (f718d426d7e42e) and doing access_ok()
 verification before __get_user_xxx()

 v3: Do a first try with pagefault disabled before releasing the semaphore

 v2: Changes 'if (cond1) if (cond2)' by 'if (cond1 && cond2)'

 arch/powerpc/mm/fault.c | 63 +++--
 1 file changed, 45 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 0c99f9b45e8f..7f9363879f4a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -66,15 +66,11 @@ static inline bool notify_page_fault(struct pt_regs *regs)
 }
 
 /*
- * Check whether the instruction at regs->nip is a store using
+ * Check whether the instruction inst is a store using
  * an update addressing form which will update r1.
  */
-static bool store_updates_sp(struct pt_regs *regs)
+static bool store_updates_sp(unsigned int inst)
 {
-   unsigned int inst;
-
-   if (get_user(inst, (unsigned int __user *)regs->nip))
-   return false;
/* check for 1 in the rA field */
if (((inst >> 16) & 0x1f) != 1)
return false;
@@ -233,9 +229,10 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
return is_exec || (address >= TASK_SIZE);
 }
 
-static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
-   struct vm_area_struct *vma,
-   bool store_update_sp)
+/* Return value is true if bad (sem. released), false if good, -1 for retry */
+static int bad_stack_expansion(struct pt_regs *regs, unsigned long address,
+   struct vm_area_struct *vma, unsigned int flags,
+   bool is_retry)
 {
/*
 * N.B. The POWER/Open ABI allows programs to access up to
@@ -247,10 +244,15 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
 * expand to 1MB without further checks.
 */
if (address + 0x100000 < vma->vm_end) {
+   struct mm_struct *mm = current->mm;
+   unsigned int __user *nip = (unsigned int __user *)regs->nip;
+  

[PATCH] netfilter: uapi: includes linux/types.h

2018-05-23 Thread YueHaibing
gcc-7.3.0 report following warning:
./usr/include/linux/netfilter/nf_osf.h:27: found __[us]{8,16,32,64} type 
without #include 

includes linux/types.h to fix it.

Signed-off-by: YueHaibing 
---
 include/uapi/linux/netfilter/nf_osf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_osf.h 
b/include/uapi/linux/netfilter/nf_osf.h
index 45376ea..d1dbe00 100644
--- a/include/uapi/linux/netfilter/nf_osf.h
+++ b/include/uapi/linux/netfilter/nf_osf.h
@@ -1,6 +1,8 @@
 #ifndef _NF_OSF_H
 #define _NF_OSF_H
 
+#include <linux/types.h>
+
 #define MAXGENRELEN 32
 
 #define NF_OSF_GENRE   (1 << 0)
-- 
2.7.0




[PATCH] powerpc/mm: Use instruction symbolic names in store_updates_sp()

2018-05-23 Thread Christophe Leroy
Use symbolic names defined in asm/ppc-opcode.h
instead of hardcoded values.

Signed-off-by: Christophe Leroy 
---
 Resending as independent of the do_page_fault() stuff

 arch/powerpc/include/asm/ppc-opcode.h |  1 +
 arch/powerpc/mm/fault.c   | 26 +-
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 18883b8a6dac..4436887bc415 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -162,6 +162,7 @@
 /* VMX Vector Store Instructions */
 #define OP_31_XOP_STVX  231
 
+#define OP_31   31
 #define OP_LWZ  32
 #define OP_STFS 52
 #define OP_STFSU 53
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c01d627e687a..0c99f9b45e8f 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -80,23 +80,23 @@ static bool store_updates_sp(struct pt_regs *regs)
return false;
/* check major opcode */
switch (inst >> 26) {
-   case 37:/* stwu */
-   case 39:/* stbu */
-   case 45:/* sthu */
-   case 53:/* stfsu */
-   case 55:/* stfdu */
+   case OP_STWU:
+   case OP_STBU:
+   case OP_STHU:
+   case OP_STFSU:
+   case OP_STFDU:
return true;
-   case 62:/* std or stdu */
+   case OP_STD:/* std or stdu */
return (inst & 3) == 1;
-   case 31:
+   case OP_31:
/* check minor opcode */
switch ((inst >> 1) & 0x3ff) {
-   case 181:   /* stdux */
-   case 183:   /* stwux */
-   case 247:   /* stbux */
-   case 439:   /* sthux */
-   case 695:   /* stfsux */
-   case 759:   /* stfdux */
+   case OP_31_XOP_STDUX:
+   case OP_31_XOP_STWUX:
+   case OP_31_XOP_STBUX:
+   case OP_31_XOP_STHUX:
+   case OP_31_XOP_STFSUX:
+   case OP_31_XOP_STFDUX:
return true;
}
}
-- 
2.13.3



Re: [PATCH v5 1/3] ARM: dts: tegra: Remove skeleton.dtsi and fix DTC warnings for /memory

2018-05-23 Thread Krzysztof Kozlowski
On Thu, May 17, 2018 at 1:39 PM, Stefan Agner  wrote:
> On 17.05.2018 09:45, Krzysztof Kozlowski wrote:
>> Remove the usage of skeleton.dtsi and add necessary properties to /memory
>> node to fix the DTC warnings:
>>
>> arch/arm/boot/dts/tegra20-harmony.dtb: Warning (unit_address_vs_reg):
>> /memory: node has a reg or ranges property, but no unit name
>>
>> The DTB after the change is the same as before except adding
>> unit-address to /memory node.
>>
>> Signed-off-by: Krzysztof Kozlowski 
>>
>> ---
>>
>> Changes since v4:
>> 1. None
>> ---
>>  arch/arm/boot/dts/tegra114-dalmore.dts  | 3 ++-
>>  arch/arm/boot/dts/tegra114-roth.dts | 3 ++-
>>  arch/arm/boot/dts/tegra114-tn7.dts  | 3 ++-
>>  arch/arm/boot/dts/tegra114.dtsi | 4 ++--
>>  arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi | 3 ++-
>>  arch/arm/boot/dts/tegra124-apalis.dtsi  | 3 ++-
>>  arch/arm/boot/dts/tegra124-jetson-tk1.dts   | 3 ++-
>>  arch/arm/boot/dts/tegra124-nyan.dtsi| 3 ++-
>>  arch/arm/boot/dts/tegra124-venice2.dts  | 3 ++-
>>  arch/arm/boot/dts/tegra124.dtsi | 2 --
>>  arch/arm/boot/dts/tegra20-colibri-512.dtsi  | 3 ++-
>>  arch/arm/boot/dts/tegra20-harmony.dts   | 3 ++-
>>  arch/arm/boot/dts/tegra20-paz00.dts | 3 ++-
>>  arch/arm/boot/dts/tegra20-seaboard.dts  | 3 ++-
>>  arch/arm/boot/dts/tegra20-tamonten.dtsi | 3 ++-
>>  arch/arm/boot/dts/tegra20-trimslice.dts | 3 ++-
>>  arch/arm/boot/dts/tegra20-ventana.dts   | 3 ++-
>>  arch/arm/boot/dts/tegra20.dtsi  | 7 +--
>>  arch/arm/boot/dts/tegra30-apalis.dtsi   | 5 +
>>  arch/arm/boot/dts/tegra30-beaver.dts| 3 ++-
>>  arch/arm/boot/dts/tegra30-cardhu.dtsi   | 3 ++-
>>  arch/arm/boot/dts/tegra30-colibri.dtsi  | 3 ++-
>>  arch/arm/boot/dts/tegra30.dtsi  | 7 +--
>>  23 files changed, 53 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/arm/boot/dts/tegra114-dalmore.dts
>> b/arch/arm/boot/dts/tegra114-dalmore.dts
>> index eafff16765b4..5cdcedfc19cb 100644
>> --- a/arch/arm/boot/dts/tegra114-dalmore.dts
>> +++ b/arch/arm/boot/dts/tegra114-dalmore.dts
>> @@ -23,7 +23,8 @@
>>   stdout-path = "serial0:115200n8";
>>   };
>>
>> - memory {
>> + memory@80000000 {
>> + device_type = "memory";
>>   reg = <0x80000000 0x40000000>;
>>   };
>>
>> diff --git a/arch/arm/boot/dts/tegra114-roth.dts
>> b/arch/arm/boot/dts/tegra114-roth.dts
>> index 7ed7370ee67a..b4f329a07c60 100644
>> --- a/arch/arm/boot/dts/tegra114-roth.dts
>> +++ b/arch/arm/boot/dts/tegra114-roth.dts
>> @@ -28,7 +28,8 @@
>>   };
>>   };
>>
>> - memory {
>> + memory@80000000 {
>> + device_type = "memory";
>>   /* memory >= 0x79600000 is reserved for firmware usage */
>>   reg = <0x80000000 0x79600000>;
>>   };
>> diff --git a/arch/arm/boot/dts/tegra114-tn7.dts
>> b/arch/arm/boot/dts/tegra114-tn7.dts
>> index 7fc4a8b31e45..12092d344ce8 100644
>> --- a/arch/arm/boot/dts/tegra114-tn7.dts
>> +++ b/arch/arm/boot/dts/tegra114-tn7.dts
>> @@ -28,7 +28,8 @@
>>   };
>>   };
>>
>> - memory {
>> + memory@80000000 {
>> + device_type = "memory";
>>   /* memory >= 0x37e00000 is reserved for firmware usage */
>>   reg = <0x80000000 0x37e00000>;
>>   };
>> diff --git a/arch/arm/boot/dts/tegra114.dtsi 
>> b/arch/arm/boot/dts/tegra114.dtsi
>> index 0e4a13295d8a..b917784d3f97 100644
>> --- a/arch/arm/boot/dts/tegra114.dtsi
>> +++ b/arch/arm/boot/dts/tegra114.dtsi
>> @@ -5,11 +5,11 @@
>>  #include 
>>  #include 
>>
>> -#include "skeleton.dtsi"
>> -
>>  / {
>>   compatible = "nvidia,tegra114";
>>   interrupt-parent = <&lic>;
>> + #address-cells = <1>;
>> + #size-cells = <1>;
>>
>>   host1x@50000000 {
>>   compatible = "nvidia,tegra114-host1x", "simple-bus";
>> diff --git a/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>> b/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>> index bb67edb016c5..80b52c612891 100644
>> --- a/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>> +++ b/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>> @@ -15,7 +15,8 @@
>>   compatible = "toradex,apalis-tk1-v1.2", "toradex,apalis-tk1",
>>"nvidia,tegra124";
>>
>> - memory {
>> + memory@0 {
>> + device_type = "memory";
>>   reg = <0x0 0x80000000 0x0 0x80000000>;
>>   };
>>
>> diff --git a/arch/arm/boot/dts/tegra124-apalis.dtsi
>> b/arch/arm/boot/dts/tegra124-apalis.dtsi
>> index 65a2161b9b8e..3ca7601cafe9 100644
>> --- a/arch/arm/boot/dts/tegra124-apalis.dtsi
>> +++ b/arch/arm/boot/dts/tegra124-apalis.dtsi
>> @@ -50,7 +50,8 @@
>>   model = "Toradex Apalis TK1";
>>   compatible = "toradex,apalis-tk1", "nvidia,tegra124";
>>
>> - memory {
>> + memory@0 {
>> + device_type = "memory";
>>   reg = <0x0 0x80000000 0x0 0x80000000>;
>>   };
>>
>> diff

Re: [PATCH] kdump: add default crashkernel reserve kernel config options

2018-05-23 Thread Dave Young
[snip]

> >  
> > +config CRASHKERNEL_DEFAULT_THRESHOLD_MB
> > +   int "System memory size threshold for kdump memory default reserving"
> > +   depends on CRASH_CORE
> > +   default 0
> > +   help
> > + CRASHKERNEL_DEFAULT_MB is used as default crashkernel value if
> > + the system memory size is equal or bigger than the threshold.
> 
> "the threshold" is rather vague.  Can it be clarified?
> 
> In fact I'm really struggling to understand the logic here
> 
> 
> > +config CRASHKERNEL_DEFAULT_MB
> > +   int "Default crashkernel memory size reserved for kdump"
> > +   depends on CRASH_CORE
> > +   default 0
> > +   help
> > + This is used as the default kdump reserved memory size in MB.
> > + crashkernel=X kernel cmdline can overwrite this value.
> > +
> >  config HAVE_IMA_KEXEC
> > bool
> >  
> > @@ -143,6 +144,24 @@ static int __init parse_crashkernel_simp
> > return 0;
> >  }
> >  
> > +static int __init get_crashkernel_default(unsigned long long system_ram,
> > + unsigned long long *size)
> > +{
> > +   unsigned long long sz = CONFIG_CRASHKERNEL_DEFAULT_MB;
> > +   unsigned long long thres = CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB;
> > +
> > +   thres *= SZ_1M;
> > +   sz *= SZ_1M;
> > +
> > +   if (sz >= system_ram || system_ram < thres) {
> > +   pr_debug("crashkernel default size can not be used.\n");
> > +   return -EINVAL;
> 
> In other words,
> 
>   if (system_ram <= CONFIG_CRASHKERNEL_DEFAULT_MB ||
>   system_ram < CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB)
>   fail;
> 
> yes?
> 
> How come?  What's happening here?  Perhaps a (good) explanatory comment
> is needed.  And clearer Kconfig text.
> 
> All confused :(

Andrew, I tuned it a bit and removed the check of sz >= system_ram, so if
the size is too large and the kernel can not find enough memory it will
still fail in later code.

Does the version below look clearer?
---

This is a rework of the crashkernel=auto patches from back in 2009, although
I'm not sure if below is the last version of the old effort:
https://lkml.org/lkml/2009/8/12/61
https://lwn.net/Articles/345344/

I changed the original design: instead of adding the auto-reserve logic
in code, this patch just introduces two kernel config options for
the default crashkernel value in MB and the threshold of system memory
in MB, so that the default is only reserved when system memory is equal to
or above the threshold.

Signed-off-by: Dave Young 
---
Another difference is that with the original design the crashkernel size
scales with system memory; according to tests, large machines may need more
memory in the kdump kernel because of several factors:
1. cpu numbers, because of the percpu memory allocated for cpus.
   (kdump can use nr_cpus=1 to work around this, but some
   arches do not support nr_cpus=X, for example powerpc)
2. IO devices, large system can have a lot of io devices, although we
   can try to only add those device drivers we needed, it is still a
   problem because of some built-in drivers, some stacked logical devices
   eg. device mapper devices, acpi etc.  Even if only considering the
   meta data for driver model it will still be a big number eg. sysfs
   files etc.
3. The minimum memory requirement for some device drivers is big, even
   if some of them have implemented a low memory profile.  It is usual to see
   10M of memory use for a storage driver.
4. user space initramfs size growing.  Busybox is not usable if we need
   to add udev support and some complicated storage support.  Using dracut
   with systemd, especially networking stuff, needs more memory.

So probably adding another kernel config option to scale the memory size,
e.g. CRASHKERNEL_DEFAULT_SCALE_RATIO, is also good to have; in RHEL we
use base_value + system_mem >> (2^14) for x86.  I'm still hesitating about
how to describe and add this option. Any suggestions will be appreciated.

 arch/Kconfig|   17 +
 kernel/crash_core.c |   19 ++-
 2 files changed, 35 insertions(+), 1 deletion(-)

--- linux-x86.orig/arch/Kconfig
+++ linux-x86/arch/Kconfig
@@ -10,6 +10,23 @@ config KEXEC_CORE
select CRASH_CORE
bool
 
+config CRASHKERNEL_DEFAULT_THRESHOLD_MB
+   int "System memory size threshold for using CRASHKERNEL_DEFAULT_MB"
+   depends on CRASH_CORE
+   default 0
+   help
+ CRASHKERNEL_DEFAULT_MB will be reserved for kdump if the system
+ memory is above or equal to CRASHKERNEL_DEFAULT_THRESHOLD_MB MB.
+ It is only effective in case no crashkernel=X parameter is used.
+
+config CRASHKERNEL_DEFAULT_MB
+   int "Default crashkernel memory size reserved for kdump"
+   depends on CRASH_CORE
+   default 0
+   help
+ This is used as the default kdump reserved memory size in MB.
+ crashkernel=X kernel cmdline can overwrite this value.
+
 config HAVE_IMA_KEXEC
bool
 
--- linux-x86.orig/kernel/crash_core.c
+++ linux-x86/kernel/crash_core.c
@@ 

Re: [PATCH rdma-next 2/5] RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust

2018-05-23 Thread Wei Hu (Xavier)


On 2018/5/23 15:00, Leon Romanovsky wrote:
> On Wed, May 23, 2018 at 02:49:35PM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/23 14:05, Leon Romanovsky wrote:
>>> On Thu, May 17, 2018 at 04:02:50PM +0800, Wei Hu (Xavier) wrote:
 This patch modified uar allocation algorithm in hns_roce_uar_alloc
 function to avoid bitmap exhaust.

 Signed-off-by: Wei Hu (Xavier) 
 ---
  drivers/infiniband/hw/hns/hns_roce_device.h |  1 +
  drivers/infiniband/hw/hns/hns_roce_pd.c | 10 ++
  2 files changed, 7 insertions(+), 4 deletions(-)

 diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
 b/drivers/infiniband/hw/hns/hns_roce_device.h
 index 53c2f1b..412297d4 100644
 --- a/drivers/infiniband/hw/hns/hns_roce_device.h
 +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
 @@ -214,6 +214,7 @@ enum {
  struct hns_roce_uar {
u64 pfn;
unsigned long   index;
 +  unsigned long   logic_idx;
  };

  struct hns_roce_ucontext {
 diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c 
 b/drivers/infiniband/hw/hns/hns_roce_pd.c
 index 4b41e04..b9f2c87 100644
 --- a/drivers/infiniband/hw/hns/hns_roce_pd.c
 +++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
 @@ -107,13 +107,15 @@ int hns_roce_uar_alloc(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
int ret = 0;

/* Using bitmap to manager UAR index */
 -  ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->index);
 +  ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->logic_idx);
if (ret == -1)
return -ENOMEM;

 -  if (uar->index > 0)
 -  uar->index = (uar->index - 1) %
 +  if (uar->logic_idx > 0 && hr_dev->caps.phy_num_uars > 1)
 +  uar->index = (uar->logic_idx - 1) %
 (hr_dev->caps.phy_num_uars - 1) + 1;
 +  else
 +  uar->index = 0;

>>> Sorry, but maybe I didn't understand this change fully, but logic_idx is
>>> not initialized at all and one of two (needs to check your uar
>>> allocation): the logic_idx is always zero -> index will be zero too,
>>> or logic_idx is random variable -> index will be random too.
>>>
>>> What did you want to do?
>>>
>> Hi, Leon
>>
>> The prototype of hns_roce_bitmap_alloc is as follows:
>> int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap,
>> unsigned long *obj);
>> In this statement, we evaluate uar->logic_idx:
>> ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap,
>> &uar->logic_idx);
>>
>> In hip06, hr_dev->caps.phy_num_uars equals 8:
>> if (uar->logic_idx > 0)
>> uar->index = (uar->logic_idx - 1) %
>> (hr_dev->caps.phy_num_uars - 1) + 1;
>> else
>> uar->index = 0;
>> In hip08, hr_dev->caps.phy_num_uars equals 1, so uar->index = 0.
>>
>>Regards
> Where did you change/set logic_idx?
In hns_roce_uar_alloc,
ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->logic_idx);
In hns_roce_uar_free,
hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->logic_idx,
BITMAP_NO_RR);

Thanks
> Thanks
>
>
>> Wei Hu
if (!dev_is_pci(hr_dev->dev)) {
res = platform_get_resource(hr_dev->pdev, IORESOURCE_MEM, 0);
 @@ -132,7 +134,7 @@ int hns_roce_uar_alloc(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)

  void hns_roce_uar_free(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
  {
 -  hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->index,
 +  hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->logic_idx,
 BITMAP_NO_RR);
  }

 --
 1.9.1

>>




Re: WARNING and PANIC in irq_matrix_free

2018-05-23 Thread Tariq Toukan



On 19/05/2018 2:20 PM, Thomas Gleixner wrote:

On Fri, 18 May 2018, Dmitry Safonov wrote:

I'm not entirely sure that it's the same fault, but at least backtrace
looks resembling.


Yes, it's similar, but not the same issue. I'll stare are the code ...

Thanks,

tglx



We still see the issue in our daily regression runs.
I have your patch merged into my internal branch; it prints the following:

[ 4898.226258] Trying to clear prev_vector: 0
[ 4898.226439] Trying to clear prev_vector: 0

i.e. vector(0) is lower than FIRST_EXTERNAL_VECTOR.


Re: [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-23 Thread Jan Beulich
>>> On 22.05.18 at 19:10,  wrote:
> On 05/22/2018 12:32 PM, Jan Beulich wrote:
> On 22.05.18 at 18:20,  wrote:
>>> We are loading the virtual address for $canary so we will always have EDX
>>> set to 0xffffffff. Isn't that what we want?
>> Oh, that's rather confusing - we're still running on the low 1:1
>> mapping when we're here. But yes, by the time we enter C code
>> (where the GS base starts to matter) we ought to be on the high
>> mappings - if only there wasn't xen_prepare_pvh().
> 
> xen_prepare_pvh() (and whatever it might call) is the only reason for
> this patch to exist. It's the only C call that we are making before
> jumping to startup_64, which I assume will have to set up GS itself
> before calling into C.
> 
> I didn't realize we are still on identity mapping. I'll clear EDX (and
> load $_pa(canary)) then.
> 
> BTW, don't we have the same issue in startup_xen()?

I don't think so, no - there we're on the high mappings already (the
ELF note specifies the virtual address of the entry point, after all).

Jan




Re: [RFC V4 PATCH 7/8] vhost: packed ring support

2018-05-23 Thread Wei Xu
On Wed, May 23, 2018 at 09:39:28AM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月23日 00:54, Wei Xu wrote:
> >On Wed, May 16, 2018 at 08:32:20PM +0800, Jason Wang wrote:
> >>Signed-off-by: Jason Wang 
> >>---
> >>  drivers/vhost/net.c   |   3 +-
> >>  drivers/vhost/vhost.c | 539 ++
> >>  drivers/vhost/vhost.h |   8 +-
> >>  3 files changed, 513 insertions(+), 37 deletions(-)
> >>
> >>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>index 8304c30..f2a0f5b 100644
> >>--- a/drivers/vhost/vhost.c
> >>+++ b/drivers/vhost/vhost.c
> >>@@ -1358,6 +1382,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> >>break;
> >>}
> >>vq->last_avail_idx = s.num;
> >>+   if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> >>+   vq->avail_wrap_counter = s.num >> 31;
> >>/* Forget the cached index value. */
> >>vq->avail_idx = vq->last_avail_idx;
> >>break;
> >>@@ -1366,6 +1392,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> >>s.num = vq->last_avail_idx;
> >>if (copy_to_user(argp, &s, sizeof s))
> >>r = -EFAULT;
> >>+   if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> >>+   s.num |= vq->avail_wrap_counter << 31;
> >>break;
> >>case VHOST_SET_VRING_ADDR:
> >>if (copy_from_user(&a, argp, sizeof a)) {
> >'last_used_idx' also needs to be saved/restored here.
> >
> >I have figured out the root cause of broken device after reloading
> >'virtio-net' module, all indices have been reset for a reloading but
> >'last_used_idx' is not properly reset in this case. This confuses
> >handle_rx()/tx().
> >
> >Wei
> >
> 
> Good catch, so we probably need a new ioctl to sync between qemu and vhost.
> 
> Something like VHOST_SET/GET_USED_BASE.

Sure, or can we expand 'vhost_vring_state' to keep them done in a bunch?

> 
> Thanks
> 


Re: [PATCH v8] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()

2018-05-23 Thread Nicholas Piggin
On Wed, 23 May 2018 09:01:19 +0200 (CEST)
Christophe Leroy  wrote:

> Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
> userspace instruction miss") has shown that limiting the read of
> faulting instruction to likely cases improves performance.
> 
> This patch goes further into this direction by limiting the read
> of the faulting instruction to the only cases where it is likely
> needed.
> 
> On an MPC885, with the same benchmark app as in the commit referred
> above, we see a reduction of about 3900 dTLB misses (approx 3%):
> 
> Before the patch:
>  Performance counter stats for './fault 500' (10 runs):
> 
>  683033312  cpu-cycles
> ( +-  0.03% )
> 134538  dTLB-load-misses  
> ( +-  0.03% )
>  46099  iTLB-load-misses  
> ( +-  0.02% )
>  19681  faults
> ( +-  0.02% )
> 
>5.389747878 seconds time elapsed   
>( +-  0.06% )
> 
> With the patch:
> 
>  Performance counter stats for './fault 500' (10 runs):
> 
>  682112862  cpu-cycles
> ( +-  0.03% )
> 130619  dTLB-load-misses  
> ( +-  0.03% )
>  46073  iTLB-load-misses  
> ( +-  0.05% )
>  19681  faults
> ( +-  0.01% )
> 
>5.381342641 seconds time elapsed   
>( +-  0.07% )
> 
> The proper work of the huge stack expansion was tested with the
> following app:
> 
> int main(int argc, char **argv)
> {
>   char buf[1024 * 1025];
> 
>   sprintf(buf, "Hello world !\n");
>   printf(buf);
> 
>   exit(0);
> }
> 
> Signed-off-by: Christophe Leroy 
> ---
>  v8: Back to a single patch as it now makes no sense to split the first part
> in two. The third patch has no
>  dependencies on the ones before, so it will be resent independently.
> As suggested by Nicholas, the
>  patch now does the get_user() stuff inside bad_stack_expansion(), that's
> midway between v5 and v7.
> 
>  v7: Following comment from Nicholas on v6 on possibility of the page getting 
> removed from the pagetables
>  between the fault and the read, I have reworked the patch in order to do 
> the get_user() in
>  __do_page_fault() directly in order to reduce complexity compared to 
> version v5
> 
>  v6: Rebased on latest powerpc/merge branch ; Using __get_user_inatomic() 
> instead of get_user() in order
>  to move it inside the semaphored area. That removes all the complexity 
> of the patch.
> 
>  v5: Reworked to fit after Benh do_fault improvement and rebased on top of 
> powerpc/merge (65152902e43fef)
> 
>  v4: Rebased on top of powerpc/next (f718d426d7e42e) and doing access_ok() 
> verification before __get_user_xxx()
> 
>  v3: Do a first try with pagefault disabled before releasing the semaphore
> 
>  v2: Changes 'if (cond1) if (cond2)' by 'if (cond1 && cond2)'
> 
>  arch/powerpc/mm/fault.c | 63 
> +++--
>  1 file changed, 45 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 0c99f9b45e8f..7f9363879f4a 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -66,15 +66,11 @@ static inline bool notify_page_fault(struct pt_regs *regs)
>  }
>  
>  /*
> - * Check whether the instruction at regs->nip is a store using
> + * Check whether the instruction inst is a store using
>   * an update addressing form which will update r1.
>   */
> -static bool store_updates_sp(struct pt_regs *regs)
> +static bool store_updates_sp(unsigned int inst)
>  {
> - unsigned int inst;
> -
> - if (get_user(inst, (unsigned int __user *)regs->nip))
> - return false;
>   /* check for 1 in the rA field */
>   if (((inst >> 16) & 0x1f) != 1)
>   return false;
> @@ -233,9 +229,10 @@ static bool bad_kernel_fault(bool is_exec, unsigned long 
> error_code,
>   return is_exec || (address >= TASK_SIZE);
>  }
>  
> -static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
> - struct vm_area_struct *vma,
> - bool store_update_sp)
> +/* Return value is true if bad (sem. released), false if good, -1 for retry 
> */
> +static int bad_stack_expansion(struct pt_regs *regs, unsigned long address,
> + struct vm_area_struct *vma, unsigned int flags,
> + bool is_retry)
>  {
>   /*
>* N.B. The POWER/Open ABI allows programs to access up to
> @@ -247,10 +244,15 @@ static bool bad_stack_exp

Re: [PATCH 1/2] rtc: st-lpc: fix possible race condition

2018-05-23 Thread Patrice CHOTARD
Hi Alexandre

On 05/20/2018 02:33 PM, Alexandre Belloni wrote:
> The IRQ is requested before the struct rtc is allocated and registered, but
> this struct is used in the IRQ handler. This may lead to a NULL pointer
> dereference.
> 
> Switch to devm_rtc_allocate_device/rtc_register_device to allocate the rtc
> before requesting the IRQ.
> 
> Signed-off-by: Alexandre Belloni 
> ---
>   drivers/rtc/rtc-st-lpc.c | 24 +---
>   1 file changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-st-lpc.c b/drivers/rtc/rtc-st-lpc.c
> index d5222667f892..2f1ef2c28740 100644
> --- a/drivers/rtc/rtc-st-lpc.c
> +++ b/drivers/rtc/rtc-st-lpc.c
> @@ -212,6 +212,10 @@ static int st_rtc_probe(struct platform_device *pdev)
>   if (!rtc)
>   return -ENOMEM;
>   
> + rtc->rtc_dev = devm_rtc_allocate_device(&pdev->dev);
> + if (IS_ERR(rtc->rtc_dev))
> + return PTR_ERR(rtc->rtc_dev);
> +
>   spin_lock_init(&rtc->lock);
>   
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> @@ -253,26 +257,17 @@ static int st_rtc_probe(struct platform_device *pdev)
>   
>   platform_set_drvdata(pdev, rtc);
>   
> - rtc->rtc_dev = rtc_device_register("st-lpc-rtc", &pdev->dev,
> -&st_rtc_ops, THIS_MODULE);
> - if (IS_ERR(rtc->rtc_dev)) {
> + rtc->rtc_dev->ops = &st_rtc_ops;
> +
> + ret = rtc_register_device(rtc->rtc_dev);
> + if (ret) {
>   clk_disable_unprepare(rtc->clk);
> - return PTR_ERR(rtc->rtc_dev);
> + return ret;
>   }
>   
>   return 0;
>   }
>   
> -static int st_rtc_remove(struct platform_device *pdev)
> -{
> - struct st_rtc *rtc = platform_get_drvdata(pdev);
> -
> - if (likely(rtc->rtc_dev))
> - rtc_device_unregister(rtc->rtc_dev);
> -
> - return 0;
> -}
> -
>   #ifdef CONFIG_PM_SLEEP
>   static int st_rtc_suspend(struct device *dev)
>   {
> @@ -325,7 +320,6 @@ static struct platform_driver st_rtc_platform_driver = {
>   .of_match_table = st_rtc_match,
>   },
>   .probe = st_rtc_probe,
> - .remove = st_rtc_remove,
>   };
>   
>   module_platform_driver(st_rtc_platform_driver);
> 

Acked-by: Patrice Chotard 

Thanks

Patrice

Re: [PATCH] irqchip: gpcv2: remove unnecessary functions

2018-05-23 Thread Marc Zyngier
On Wed, 23 May 2018 07:23:00 +0100,
Anson Huang wrote:
> 
> GPC is in the always-on domain and never loses its
> content during suspend/resume, so there is no need to
> save/restore it during suspend/resume.
> 
> Signed-off-by: Anson Huang 
> ---
>  drivers/irqchip/irq-imx-gpcv2.c | 41 
> -
>  1 file changed, 41 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
> index 4760307..e6025d9 100644
> --- a/drivers/irqchip/irq-imx-gpcv2.c
> +++ b/drivers/irqchip/irq-imx-gpcv2.c
> @@ -28,46 +28,6 @@ struct gpcv2_irqchip_data {
>  
>  static struct gpcv2_irqchip_data *imx_gpcv2_instance;
>  
> -static int gpcv2_wakeup_source_save(void)
> -{
> - struct gpcv2_irqchip_data *cd;
> - void __iomem *reg;
> - int i;
> -
> - cd = imx_gpcv2_instance;
> - if (!cd)
> - return 0;
> -
> - for (i = 0; i < IMR_NUM; i++) {
> - reg = cd->gpc_base + cd->cpu2wakeup + i * 4;
> - cd->saved_irq_mask[i] = readl_relaxed(reg);
> - writel_relaxed(cd->wakeup_sources[i], reg);
> - }

If you're removing that code, what's the purpose of keeping
saved_irq_mask?

Also, who is now programming the wake-up source? For good or bad
reasons, this driver uses the save/restore hooks to program the
wake-up state. Removing this code seems to simply kill the feature.

What am I missing?

Thanks,

M.

-- 
Jazz is not dead, it just smell funny.


Re: [PATCH rdma-next 2/5] RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust

2018-05-23 Thread Leon Romanovsky
On Wed, May 23, 2018 at 03:12:45PM +0800, Wei Hu (Xavier) wrote:
>
>
> On 2018/5/23 15:00, Leon Romanovsky wrote:
> > On Wed, May 23, 2018 at 02:49:35PM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/23 14:05, Leon Romanovsky wrote:
> >>> On Thu, May 17, 2018 at 04:02:50PM +0800, Wei Hu (Xavier) wrote:
>  This patch modified uar allocation algorithm in hns_roce_uar_alloc
>  function to avoid bitmap exhaust.
> 
>  Signed-off-by: Wei Hu (Xavier) 
>  ---
>   drivers/infiniband/hw/hns/hns_roce_device.h |  1 +
>   drivers/infiniband/hw/hns/hns_roce_pd.c | 10 ++
>   2 files changed, 7 insertions(+), 4 deletions(-)
> 
>  diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
>  b/drivers/infiniband/hw/hns/hns_roce_device.h
>  index 53c2f1b..412297d4 100644
>  --- a/drivers/infiniband/hw/hns/hns_roce_device.h
>  +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>  @@ -214,6 +214,7 @@ enum {
>   struct hns_roce_uar {
>   u64 pfn;
>   unsigned long   index;
>  +unsigned long   logic_idx;
>   };
> 
>   struct hns_roce_ucontext {
>  diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c 
>  b/drivers/infiniband/hw/hns/hns_roce_pd.c
>  index 4b41e04..b9f2c87 100644
>  --- a/drivers/infiniband/hw/hns/hns_roce_pd.c
>  +++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
>  @@ -107,13 +107,15 @@ int hns_roce_uar_alloc(struct hns_roce_dev 
>  *hr_dev, struct hns_roce_uar *uar)
>   int ret = 0;
> 
>   /* Using bitmap to manager UAR index */
>  -ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, 
>  &uar->index);
>  +ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, 
>  &uar->logic_idx);
>   if (ret == -1)
>   return -ENOMEM;
> 
>  -if (uar->index > 0)
>  -uar->index = (uar->index - 1) %
>  +if (uar->logic_idx > 0 && hr_dev->caps.phy_num_uars > 1)
>  +uar->index = (uar->logic_idx - 1) %
>    (hr_dev->caps.phy_num_uars - 1) + 1;
>  +else
>  +uar->index = 0;
> 
> >>> Sorry, but maybe I didn't understand this change fully, but logic_idx is
> >>> not initialized at all, and one of two things happens (you need to check
> >>> your uar allocation): either logic_idx is always zero -> index will be
> >>> zero too, or logic_idx is a random variable -> index will be random too.
> >>>
> >>> What did you want to do?
> >>>
> >> Hi, Leon
> >>
> >> The prototype of hns_roce_bitmap_alloc as belows:
> >> int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap,
> >> unsigned long *obj);
> >> In this statement,  we evaluate uar->logic_idx.
> >> ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap,
> >> &uar->logic_idx);
> >>
> >> In hip06,  hr_dev->caps.phy_num_uars equals 8,
> >> if (uar->logic_idx > 0)
> >>  uar->index = (uar->logic_idx - 1) %
> >> (hr_dev->caps.phy_num_uars - 1) + 1;
> >>else
> >>  uar->index = 0;
> >> In hip08,  hr_dev->caps.phy_num_uars equals 1,  so uar->index = 0;
> >>
> >>Regards
> > Where did you change/set logic_idx?
> In hns_roce_uar_alloc,
> ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->logic_idx);
> In hns_roce_uar_free,
> hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->logic_idx,
> BITMAP_NO_RR);
>

I see it, thanks
Reviewed-by: Leon Romanovsky 




Re: [PATCH v2] rtc: st-lpc: add range

2018-05-23 Thread Patrice CHOTARD
Hi Alexandre

On 05/21/2018 10:49 PM, Alexandre Belloni wrote:
> The RTC has a 64 bit counter.
> 
> Signed-off-by: Alexandre Belloni 
> ---
>   drivers/rtc/rtc-st-lpc.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/rtc/rtc-st-lpc.c b/drivers/rtc/rtc-st-lpc.c
> index 2f1ef2c28740..bee75ca7ff79 100644
> --- a/drivers/rtc/rtc-st-lpc.c
> +++ b/drivers/rtc/rtc-st-lpc.c
> @@ -258,6 +258,8 @@ static int st_rtc_probe(struct platform_device *pdev)
>   platform_set_drvdata(pdev, rtc);
>   
>   rtc->rtc_dev->ops = &st_rtc_ops;
> + rtc->rtc_dev->range_max = U64_MAX;
> + do_div(rtc->rtc_dev->range_max, rtc->clkrate);
>   
>   ret = rtc_register_device(rtc->rtc_dev);
>   if (ret) {
> 

Acked-by: Patrice Chotard 

Patrice

Re: [PATCH v6 2/6] dt-bindings: Add the rzn1-clocks.h file

2018-05-23 Thread Geert Uytterhoeven
Hi Michel,

On Wed, May 23, 2018 at 8:44 AM, M P  wrote:
> On Tue, 22 May 2018 at 19:44, Geert Uytterhoeven 
> wrote:
>> On Tue, May 22, 2018 at 12:01 PM, Michel Pollet
>>  wrote:
>> > This adds the constants necessary to use the renesas,rzn1-clocks driver.
>> >
>> > Signed-off-by: Michel Pollet 

>> > --- /dev/null
>> > +++ b/include/dt-bindings/clock/rzn1-clocks.h
>
>> Given this is part of the DT ABI, and there exist multiple different RZ/N1
>> SoCs (and there are probably planned more), I wouldn't call this header
>> file "rzn1-clocks.h", but e.g. "r9a06g032-clocks.h".
>
> Actually, no, there already are two r9a06g03X devices that will work
> perfectly fine with this driver. We had that discussion before, and you
> insist on me removing mentions of rzn1 everywhere; however, this
> applies to *two* devices already, and I'm supposed to upstream support for
> them. I can't rename it r9a06g032 because that is *inexact*; that's why it's

My worry is not that there are two r9a06g03X devices that will work fine
with this driver, but that there will be other "rzn1" devices that will not
work with these bindings (the header file is part of the bindings).
Besides RZ/N1D and RZ/N1S (which apparently differ in packaging only?
Oh no, RZ/N1D (the larger package) has fewer QSPI channels than RZ/N1S
(the smaller package)), there's also (at least) RZ/N1L.

> called rzn1. So unless you let me call it r9a06g0xx-clocks.h (which I know
> you won't as per multiple previous discussions) this can't be called
> r9a06g032 because it won't be fit for my purpose when I try to bring back
> the RZ/N1S into the picture.

You can add r9a06g033-clocks.h when adding support for RZ/N1S.

> There are minor difference to clocking,

Aha?

> I don't know if Renesas plans to release any more rzn1's in this series,
> but my little finger tells me this isn't the case. But regardless of what

We thought the same thing when the first RZ member (RZ/A1H) showed up.
Little did we know it was not going to be just the first SoC of a new RZ
family, but the first SoC of the first subfamily (RZ/A) of the RZ family...
And the various subfamilies bear not much similarity.

> we plan, Marketing will screw it up.

Correct. And to mitigate that, we have no other choice than to use the real
part numbers to differentiate. Once bitten, twice shy.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH][V2] net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message

2018-05-23 Thread Tariq Toukan



On 22/05/2018 6:42 PM, Colin King wrote:

From: Colin Ian King 

Trivial fix to a spelling mistake in the mlx4_dbg debug message, and also
change the phrasing of the message so that it is more readable.

Signed-off-by: Colin Ian King 

---
V2: rephrase message, as helpfully suggested by Tariq Toukan
---
  drivers/net/ethernet/mellanox/mlx4/intf.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c 
b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 2edcce98ab2d..65482f004e50 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -172,7 +172,7 @@ int mlx4_do_bond(struct mlx4_dev *dev, bool enable)
list_add_tail(&dev_ctx->list, &priv->ctx_list);
spin_unlock_irqrestore(&priv->ctx_lock, flags);
  
-		mlx4_dbg(dev, "Inrerface for protocol %d restarted with when bonded mode is %s\n",
+		mlx4_dbg(dev, "Interface for protocol %d restarted with bonded mode %s\n",
 			 dev_ctx->intf->protocol, enable ?
 			 "enabled" : "disabled");
 	}



Thanks Colin.

Reviewed-by: Tariq Toukan 



lening

2018-05-23 Thread Funding Trusts Finance


Good day,
  
We are Funding Trusts Finance, providing loans by mail advertisement. We 
offer various kinds of loans or project loans (short- and long-term loans, 
personal loans, business loans, etc.) at an interest rate of 3%. We provide 
loans to people in need, regardless of their location, gender, marital 
status, education or job, but they must have a legal means of repayment. Our 
loans range between 5,000.00 and 20,000,000.00 US Dollars, Euros or Pounds, 
with a maximum duration of 15 years. If you are interested in more 
information, we have investors interested in financing large-volume 
projects. The procedures are as follows: -

1-The client must send a short summary of the project. This must include the 
total amount required for the project, the estimated return on investment, 
and the loan repayment period, which may not exceed 20 years

Contact us at: i...@fundingtrustsfinance.com

INFORMATION REQUIRED

Your names:
Address: ...
Telephone: ...
Amount required: ...
Duration: ...
Occupation: ...
Monthly income level: ..
Gender: ..
Date of birth: ...
State: ...
Country: .
Purpose: .

"Millions of people in the world have a credit problem of some kind. You are 
not the only one. We have a whole range of loan options that can help. 
Discover your options now!"

Kind regards,
Ronny Hens,
E-mail: i...@fundingtrustsfinance.com
WEBSITE: www.fundingtrustfinance.com


Re: [PATCH v8] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()

2018-05-23 Thread Christophe LEROY



On 23/05/2018 at 09:17, Nicholas Piggin wrote:

On Wed, 23 May 2018 09:01:19 +0200 (CEST)
Christophe Leroy  wrote:


Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
userspace instruction miss") has shown that limiting the read of
faulting instruction to likely cases improves performance.

This patch goes further into this direction by limiting the read
of the faulting instruction to the only cases where it is likely
needed.

On an MPC885, with the same benchmark app as in the commit referred
above, we see a reduction of about 3900 dTLB misses (approx 3%):

Before the patch:
  Performance counter stats for './fault 500' (10 runs):

  683033312  cpu-cycles 
   ( +-  0.03% )
 134538  dTLB-load-misses   
   ( +-  0.03% )
  46099  iTLB-load-misses   
   ( +-  0.02% )
  19681  faults 
   ( +-  0.02% )

5.389747878 seconds time elapsed
  ( +-  0.06% )

With the patch:

  Performance counter stats for './fault 500' (10 runs):

  682112862  cpu-cycles 
   ( +-  0.03% )
 130619  dTLB-load-misses   
   ( +-  0.03% )
  46073  iTLB-load-misses   
   ( +-  0.05% )
  19681  faults 
   ( +-  0.01% )

5.381342641 seconds time elapsed
  ( +-  0.07% )

The proper work of the huge stack expansion was tested with the
following app:

int main(int argc, char **argv)
{
char buf[1024 * 1025];

sprintf(buf, "Hello world !\n");
printf(buf);

exit(0);
}

Signed-off-by: Christophe Leroy 
---
  v8: Back to a single patch as it now makes no sense to split the first part
in two. The third patch has no
  dependencies on the ones before, so it will be resent independently. As
suggested by Nicholas, the
  patch now does the get_user() stuff inside bad_stack_expansion(), that's
midway between v5 and v7.

  v7: Following comment from Nicholas on v6 on possibility of the page getting 
removed from the pagetables
  between the fault and the read, I have reworked the patch in order to do 
the get_user() in
  __do_page_fault() directly in order to reduce complexity compared to 
version v5

  v6: Rebased on latest powerpc/merge branch ; Using __get_user_inatomic() 
instead of get_user() in order
  to move it inside the semaphored area. That removes all the complexity of 
the patch.

  v5: Reworked to fit after Benh do_fault improvement and rebased on top of 
powerpc/merge (65152902e43fef)

  v4: Rebased on top of powerpc/next (f718d426d7e42e) and doing access_ok() 
verification before __get_user_xxx()

  v3: Do a first try with pagefault disabled before releasing the semaphore

  v2: Changes 'if (cond1) if (cond2)' by 'if (cond1 && cond2)'

  arch/powerpc/mm/fault.c | 63 +++--
  1 file changed, 45 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 0c99f9b45e8f..7f9363879f4a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -66,15 +66,11 @@ static inline bool notify_page_fault(struct pt_regs *regs)
  }
  
  /*

- * Check whether the instruction at regs->nip is a store using
+ * Check whether the instruction inst is a store using
   * an update addressing form which will update r1.
   */
-static bool store_updates_sp(struct pt_regs *regs)
+static bool store_updates_sp(unsigned int inst)
  {
-   unsigned int inst;
-
-   if (get_user(inst, (unsigned int __user *)regs->nip))
-   return false;
/* check for 1 in the rA field */
if (((inst >> 16) & 0x1f) != 1)
return false;
@@ -233,9 +229,10 @@ static bool bad_kernel_fault(bool is_exec, unsigned long 
error_code,
return is_exec || (address >= TASK_SIZE);
  }
  
-static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,

-   struct vm_area_struct *vma,
-   bool store_update_sp)
+/* Return value is true if bad (sem. released), false if good, -1 for retry */
+static int bad_stack_expansion(struct pt_regs *regs, unsigned long address,
+   struct vm_area_struct *vma, unsigned int flags,
+   bool is_retry)
  {
/*
 * N.B. The POWER/Open ABI allows programs to access up to
@@ -247,10 +244,15 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
unsigned long address,
 * expand to 1MB without further checks.
 */
if (address +



Re: [PATCH net-next v2] net: sched: don't disable bh when accessing action idr

2018-05-23 Thread Jiri Pirko
Mon, May 21, 2018 at 10:03:04PM CEST, vla...@mellanox.com wrote:
>Initial net_device implementation used ingress_lock spinlock to synchronize
>ingress path of device. This lock was used in both process and bh context.
>In some code paths action map lock was obtained while holding ingress_lock.
>Commit e1e992e52faa ("[NET_SCHED] protect action config/dump from irqs")
>modified actions to always disable bh, while using action map lock, in
>order to prevent deadlock on ingress_lock in softirq. This lock was removed
>from net_device, so disabling bh, while accessing action map, is no longer
>necessary.
>
>Replace all action idr spinlock usage with regular calls that do not
>disable bh.
>
>Signed-off-by: Vlad Buslov 

Please add my tag to v3, with the description changes requested by Cong.
Acked-by: Jiri Pirko 

Thanks!


RE: [PATCH] irqchip: gpcv2: remove unnecessary functions

2018-05-23 Thread Anson Huang
Hi, Marc

Anson Huang
Best Regards!


> -Original Message-
> From: Marc Zyngier [mailto:marc.zyng...@arm.com]
> Sent: Wednesday, May 23, 2018 3:23 PM
> To: Anson Huang 
> Cc: t...@linutronix.de; ja...@lakedaemon.net; dl-linux-imx
> ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] irqchip: gpcv2: remove unnecessary functions
> 
> On Wed, 23 May 2018 07:23:00 +0100,
> Anson Huang wrote:
> >
> > GPC is in the always-on domain and never loses its content during
> > suspend/resume, so there is no need to save/restore it during
> > suspend/resume.
> >
> > Signed-off-by: Anson Huang 
> > ---
> >  drivers/irqchip/irq-imx-gpcv2.c | 41
> > -
> >  1 file changed, 41 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-imx-gpcv2.c
> > b/drivers/irqchip/irq-imx-gpcv2.c index 4760307..e6025d9 100644
> > --- a/drivers/irqchip/irq-imx-gpcv2.c
> > +++ b/drivers/irqchip/irq-imx-gpcv2.c
> > @@ -28,46 +28,6 @@ struct gpcv2_irqchip_data {
> >
> >  static struct gpcv2_irqchip_data *imx_gpcv2_instance;
> >
> > -static int gpcv2_wakeup_source_save(void) -{
> > -   struct gpcv2_irqchip_data *cd;
> > -   void __iomem *reg;
> > -   int i;
> > -
> > -   cd = imx_gpcv2_instance;
> > -   if (!cd)
> > -   return 0;
> > -
> > -   for (i = 0; i < IMR_NUM; i++) {
> > -   reg = cd->gpc_base + cd->cpu2wakeup + i * 4;
> > -   cd->saved_irq_mask[i] = readl_relaxed(reg);
> > -   writel_relaxed(cd->wakeup_sources[i], reg);
> > -   }
> 
> If you're removing that code, what's the purpose of keeping saved_irq_mask?
> 
> Also, who is now programming the wake-up source? For good or bad reasons,
> this driver uses the save/restore hooks to program the wake-up state.
> Removing this code seems to simply kill the feature.
> 
> What am I missing?
> 
> Thanks,

I made a mistake here and forgot to program the wakeup source into the GPC IMR
register in the imx_gpcv2_irq_set_wake function.
And I think we can remove saved_irq_mask as well; will do it in V2.

The wake-up source is programmed by the module driver calling "enable_irq_wake"
when the device's wakeup capability is enabled, and I missed programming it into
the GPC IMR register. Will fix it later. Thanks.

Anson.

> 
>   M.
> 
> --
> Jazz is not dead, it just smell funny.


Re: [PATCH v2 02/16] arm64: dts: marvell: fix CP110 ICU node size

2018-05-23 Thread Gregory CLEMENT
Hi Miquel,
 
 On Tue, May 22 2018, Miquel Raynal  wrote:

> ICU size in CP110 is not 0x10 but at least 0x440 bytes long (from the
> specification).
>
> Fixes: 6ef84a827c37 ("arm64: dts: marvell: enable GICP and ICU on Armada 
> 7K/8K")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Miquel Raynal 
> Reviewed-by: Thomas Petazzoni 

Applied on mvebu/fixes

Thanks,

Gregory

> ---
>  arch/arm64/boot/dts/marvell/armada-cp110.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/marvell/armada-cp110.dtsi 
> b/arch/arm64/boot/dts/marvell/armada-cp110.dtsi
> index 48cad7919efa..9fa41c54f69c 100644
> --- a/arch/arm64/boot/dts/marvell/armada-cp110.dtsi
> +++ b/arch/arm64/boot/dts/marvell/armada-cp110.dtsi
> @@ -146,7 +146,7 @@
>  
>   CP110_LABEL(icu): interrupt-controller@1e {
>   compatible = "marvell,cp110-icu";
> - reg = <0x1e 0x10>;
> + reg = <0x1e 0x440>;
>   #interrupt-cells = <3>;
>   interrupt-controller;
>   msi-parent = <&gicp>;
> -- 
> 2.14.1
>

-- 
Gregory Clement, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


Re: [RFC PATCH 5/5] remoteproc: qcom: Introduce Hexagon V5 based WCSS driver

2018-05-23 Thread Vinod
On 22-05-18, 23:58, Bjorn Andersson wrote:
> On Tue 22 May 23:05 PDT 2018, Vinod wrote:
> 
> > On 22-05-18, 22:20, Bjorn Andersson wrote:
> > 
> > > +static int q6v5_wcss_reset(struct q6v5_wcss *wcss)
> > > +{
> > > + int ret;
> > > + u32 val;
> > > + int i;
> > > +
> > > + /* Assert resets, stop core */
> > > + val = readl(wcss->reg_base + QDSP6SS_RESET_REG);
> > > + val |= Q6SS_CORE_ARES | Q6SS_BUS_ARES_ENABLE | Q6SS_STOP_CORE;
> > > + writel(val, wcss->reg_base + QDSP6SS_RESET_REG);
> > > +
> > > + /* BHS require xo cbcr to be enabled */
> > > + val = readl(wcss->reg_base + QDSP6SS_XO_CBCR);
> > > + val |= 0x1;
> > > + writel(val, wcss->reg_base + QDSP6SS_XO_CBCR);
> > 
> > As commented on previous patch, it would help IMO to add a modify() wrapper
> > here which would perform read, modify and write.
> > 
> 
> Iirc the code ended up like this because a lot of these operations ended
> up being line wrapped and harder to read using some modify(reg, mask,
> val) helper. That said, the function isn't very pretty in it's current
> state either...

Agreed :) and I thought a modify() helper would make it better

> One of the parts of the RFC is that this sequence is a verbatim copy
> from the qcom_q6v5_pil.c driver for 8996, so if we find this duplication
> suitable I would prefer that we keep them the same.
> 
> 
> The alternative to duplicating this function is as Sricharan proposed to
> have the qcom_q6v5_pil.c be both a driver for both the single-stage
> remoteproc and the two-stage (load boot loader, then modem firmware).
> 
> > Looking at the patch, few other comments would be applicable too, so would 
> > be
> > great if you/Sricharan can update this
> > 
> 
> I agree, the primary purpose of this patch was rather to get feedback on
> the structure of the drivers, I do expect this to take another round
> through the editor to get some polishing touches. Sorry if this wasn't
> clear from the description.

Since Sricharan replied to the comments, I thought they would be fixed. Yeah,
this is fine for an RFC.

-- 
~Vinod


[PATCH] userfaultfd: prevent non-cooperative events vs mcopy_atomic races

2018-05-23 Thread Mike Rapoport
If a process monitored with userfaultfd changes its memory mappings or
forks() at the same time as the uffd monitor fills the process memory with
UFFDIO_COPY, the actual creation of page table entries and copying of the
data in mcopy_atomic may happen either before or after the memory mapping
modifications, and there is no way for the uffd monitor to maintain a
consistent view of the process memory layout.

For instance, let's consider fork() running in parallel with
userfaultfd_copy():

process  |  uffd monitor
-+--
fork()   | userfaultfd_copy()
...  | ...
dup_mmap()   | down_read(mmap_sem)
down_write(mmap_sem) | /* create PTEs, copy data */
dup_uffd()   | up_read(mmap_sem)
copy_page_range()|
up_write(mmap_sem)   |
dup_uffd_complete()  |
/* notify monitor */ |

If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will be
present by the time copy_page_range() is called and they will appear in the
child's memory mappings. However, if the fork() is the first to take the
mmap_sem, the new pages won't be mapped in the child's address space.

Since the userfaultfd monitor has no way to determine what the order was, let's
disallow userfaultfd_copy in parallel with the non-cooperative events. In
that case we return -EAGAIN and the uffd monitor can understand that
userfaultfd_copy() clashed with a non-cooperative event and take an
appropriate action.

Signed-off-by: Mike Rapoport 
Cc: Andrea Arcangeli 
Cc: Mike Kravetz 
Cc: Pavel Emelyanov 
Cc: Andrei Vagin 
---
 fs/userfaultfd.c  | 22 --
 include/linux/userfaultfd_k.h |  6 --
 mm/userfaultfd.c  | 22 +-
 3 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index cec550c8468f..123bf7d516fc 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -62,6 +62,8 @@ struct userfaultfd_ctx {
enum userfaultfd_state state;
/* released */
bool released;
+   /* memory mappings are changing because of non-cooperative event */
+   bool mmap_changing;
/* mm with one ore more vmas attached to this userfaultfd_ctx */
struct mm_struct *mm;
 };
@@ -641,6 +643,7 @@ static void userfaultfd_event_wait_completion(struct 
userfaultfd_ctx *ctx,
 * already released.
 */
 out:
+   WRITE_ONCE(ctx->mmap_changing, false);
userfaultfd_ctx_put(ctx);
 }
 
@@ -686,10 +689,12 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct 
list_head *fcs)
ctx->state = UFFD_STATE_RUNNING;
ctx->features = octx->features;
ctx->released = false;
+   ctx->mmap_changing = false;
ctx->mm = vma->vm_mm;
mmgrab(ctx->mm);
 
userfaultfd_ctx_get(octx);
+   WRITE_ONCE(octx->mmap_changing, true);
fctx->orig = octx;
fctx->new = ctx;
list_add_tail(&fctx->list, fcs);
@@ -732,6 +737,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
if (ctx && (ctx->features & UFFD_FEATURE_EVENT_REMAP)) {
vm_ctx->ctx = ctx;
userfaultfd_ctx_get(ctx);
+   WRITE_ONCE(ctx->mmap_changing, true);
}
 }
 
@@ -772,6 +778,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
return true;
 
userfaultfd_ctx_get(ctx);
+   WRITE_ONCE(ctx->mmap_changing, true);
up_read(&mm->mmap_sem);
 
msg_init(&ewq.msg);
@@ -815,6 +822,7 @@ int userfaultfd_unmap_prep(struct vm_area_struct *vma,
return -ENOMEM;
 
userfaultfd_ctx_get(ctx);
+   WRITE_ONCE(ctx->mmap_changing, true);
unmap_ctx->ctx = ctx;
unmap_ctx->start = start;
unmap_ctx->end = end;
@@ -1653,6 +1661,10 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 
user_uffdio_copy = (struct uffdio_copy __user *) arg;
 
+   ret = -EAGAIN;
+   if (READ_ONCE(ctx->mmap_changing))
+   goto out;
+
ret = -EFAULT;
if (copy_from_user(&uffdio_copy, user_uffdio_copy,
   /* don't copy "copy" last field */
@@ -1674,7 +1686,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
goto out;
if (mmget_not_zero(ctx->mm)) {
ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
-  uffdio_copy.len);
+  uffdio_copy.len, &ctx->mmap_changing);
mmput(ctx->mm);
} else {
return -ESRCH;
@@ -1705,6 +1717,10 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx 
*ctx,
 
user_uffdi

[PATCH v7 1/5] drm/rockchip: add transfer function for cdn-dp

2018-05-23 Thread Lin Huang
From: Chris Zhong 

We may support link training outside the firmware, so we need to support
dpcd read/write to get messages from, or apply settings to, the
display.

Signed-off-by: Chris Zhong 
Signed-off-by: Lin Huang 
Reviewed-by: Sean Paul 
Reviewed-by: Enric Balletbo 
---
Changes in v2:
- update patch following Enric's suggestion
Changes in v3:
- None
Changes in v4:
- None
Changes in v5:
- None
Changes in v6:
- None
Changes in v7:
- None

 drivers/gpu/drm/rockchip/cdn-dp-core.c | 55 +++
 drivers/gpu/drm/rockchip/cdn-dp-core.h |  1 +
 drivers/gpu/drm/rockchip/cdn-dp-reg.c  | 69 ++
 drivers/gpu/drm/rockchip/cdn-dp-reg.h  | 14 ++-
 4 files changed, 122 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c 
b/drivers/gpu/drm/rockchip/cdn-dp-core.c
index c6fbdcd..cce64c1 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c
@@ -176,8 +176,8 @@ static int cdn_dp_get_sink_count(struct cdn_dp_device *dp, 
u8 *sink_count)
u8 value;
 
*sink_count = 0;
-   ret = cdn_dp_dpcd_read(dp, DP_SINK_COUNT, &value, 1);
-   if (ret)
+   ret = drm_dp_dpcd_read(&dp->aux, DP_SINK_COUNT, &value, 1);
+   if (ret < 0)
return ret;
 
*sink_count = DP_GET_SINK_COUNT(value);
@@ -374,9 +374,9 @@ static int cdn_dp_get_sink_capability(struct cdn_dp_device 
*dp)
if (!cdn_dp_check_sink_connection(dp))
return -ENODEV;
 
-   ret = cdn_dp_dpcd_read(dp, DP_DPCD_REV, dp->dpcd,
-  DP_RECEIVER_CAP_SIZE);
-   if (ret) {
+   ret = drm_dp_dpcd_read(&dp->aux, DP_DPCD_REV, dp->dpcd,
+  sizeof(dp->dpcd));
+   if (ret < 0) {
DRM_DEV_ERROR(dp->dev, "Failed to get caps %d\n", ret);
return ret;
}
@@ -582,8 +582,8 @@ static bool cdn_dp_check_link_status(struct cdn_dp_device 
*dp)
if (!port || !dp->link.rate || !dp->link.num_lanes)
return false;
 
-   if (cdn_dp_dpcd_read(dp, DP_LANE0_1_STATUS, link_status,
-DP_LINK_STATUS_SIZE)) {
+   if (drm_dp_dpcd_read_link_status(&dp->aux, link_status) !=
+   DP_LINK_STATUS_SIZE) {
DRM_ERROR("Failed to get link status\n");
return false;
}
@@ -1012,6 +1012,40 @@ static int cdn_dp_pd_event(struct notifier_block *nb,
return NOTIFY_DONE;
 }
 
+static ssize_t cdn_dp_aux_transfer(struct drm_dp_aux *aux,
+  struct drm_dp_aux_msg *msg)
+{
+   struct cdn_dp_device *dp = container_of(aux, struct cdn_dp_device, aux);
+   int ret;
+   u8 status;
+
+   switch (msg->request & ~DP_AUX_I2C_MOT) {
+   case DP_AUX_NATIVE_WRITE:
+   case DP_AUX_I2C_WRITE:
+   case DP_AUX_I2C_WRITE_STATUS_UPDATE:
+   ret = cdn_dp_dpcd_write(dp, msg->address, msg->buffer,
+   msg->size);
+   break;
+   case DP_AUX_NATIVE_READ:
+   case DP_AUX_I2C_READ:
+   ret = cdn_dp_dpcd_read(dp, msg->address, msg->buffer,
+  msg->size);
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   status = cdn_dp_get_aux_status(dp);
+   if (status == AUX_STATUS_ACK)
+   msg->reply = DP_AUX_NATIVE_REPLY_ACK;
+   else if (status == AUX_STATUS_NACK)
+   msg->reply = DP_AUX_NATIVE_REPLY_NACK;
+   else if (status == AUX_STATUS_DEFER)
+   msg->reply = DP_AUX_NATIVE_REPLY_DEFER;
+
+   return ret;
+}
+
 static int cdn_dp_bind(struct device *dev, struct device *master, void *data)
 {
struct cdn_dp_device *dp = dev_get_drvdata(dev);
@@ -1030,6 +1064,13 @@ static int cdn_dp_bind(struct device *dev, struct device 
*master, void *data)
dp->active = false;
dp->active_port = -1;
dp->fw_loaded = false;
+   dp->aux.name = "DP-AUX";
+   dp->aux.transfer = cdn_dp_aux_transfer;
+   dp->aux.dev = dev;
+
+   ret = drm_dp_aux_register(&dp->aux);
+   if (ret)
+   return ret;
 
INIT_WORK(&dp->event_work, cdn_dp_pd_event_work);
 
diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.h 
b/drivers/gpu/drm/rockchip/cdn-dp-core.h
index f57e296..46159b2 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.h
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.h
@@ -78,6 +78,7 @@ struct cdn_dp_device {
struct platform_device *audio_pdev;
struct work_struct event_work;
struct edid *edid;
+   struct drm_dp_aux aux;
 
struct mutex lock;
bool connected;
diff --git a/drivers/gpu/drm/rockchip/cdn-dp-reg.c 
b/drivers/gpu/drm/rockchip/cdn-dp-reg.c
index eb3042c..979355d 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-reg.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-reg.c
@@ -221,7 +221,12 @@ static int cdn_dp_reg_write_bit(struct cdn_dp_device *dp, 
u16 

[PATCH v7 4/5] phy: rockchip-typec: support variable phy config value

2018-05-23 Thread Lin Huang
The phy config values used to be fixed in the DP firmware, but some boards
need to change these values to do training and get a better eye diagram
result. So support that in the phy driver.

Signed-off-by: Chris Zhong 
Signed-off-by: Lin Huang 
---
Changes in v2:
- update patch following Enric's suggestion
Changes in v3:
- delete need_software_training variable
- add default phy config values; if the dts does not define phy config
values, use these values
Changes in v4:
- rename variable config to tcphy_default_config
Changes in v5:
- None
Changes in v6:
- split the header file to new patch
Changes in v7:
- add default case when check link rate
- move struct rockchip_typec_phy new element to this patch

 drivers/phy/rockchip/phy-rockchip-typec.c | 263 --
 include/soc/rockchip/rockchip_phy_typec.h |   8 +
 2 files changed, 218 insertions(+), 53 deletions(-)

diff --git a/drivers/phy/rockchip/phy-rockchip-typec.c 
b/drivers/phy/rockchip/phy-rockchip-typec.c
index 795055f..69af90e 100644
--- a/drivers/phy/rockchip/phy-rockchip-typec.c
+++ b/drivers/phy/rockchip/phy-rockchip-typec.c
@@ -324,21 +324,29 @@
  * clock 0: PLL 0 div 1
  * clock 1: PLL 1 div 2
  */
-#define CLK_PLL_CONFIG 0X30
+#define CLK_PLL1_DIV1  0x20
+#define CLK_PLL1_DIV2  0x30
 #define CLK_PLL_MASK   0x33
 
 #define CMN_READY  BIT(0)
 
+#define DP_PLL_CLOCK_ENABLE_ACKBIT(3)
 #define DP_PLL_CLOCK_ENABLEBIT(2)
+#define DP_PLL_ENABLE_ACK  BIT(1)
 #define DP_PLL_ENABLE  BIT(0)
 #define DP_PLL_DATA_RATE_RBR   ((2 << 12) | (4 << 8))
 #define DP_PLL_DATA_RATE_HBR   ((2 << 12) | (4 << 8))
 #define DP_PLL_DATA_RATE_HBR2  ((1 << 12) | (2 << 8))
+#define DP_PLL_DATA_RATE_MASK  0xff00
 
-#define DP_MODE_A0 BIT(4)
-#define DP_MODE_A2 BIT(6)
-#define DP_MODE_ENTER_A0   0xc101
-#define DP_MODE_ENTER_A2   0xc104
+#define DP_MODE_MASK   0xf
+#define DP_MODE_ENTER_A0   BIT(0)
+#define DP_MODE_ENTER_A2   BIT(2)
+#define DP_MODE_ENTER_A3   BIT(3)
+#define DP_MODE_A0_ACK BIT(4)
+#define DP_MODE_A2_ACK BIT(6)
+#define DP_MODE_A3_ACK BIT(7)
+#define DP_LINK_RESET_DEASSERTED   BIT(8)
 
 #define PHY_MODE_SET_TIMEOUT   10
 
@@ -350,6 +358,8 @@
 #define MODE_DFP_USB   BIT(1)
 #define MODE_DFP_DPBIT(2)
 
+#define DP_DEFAULT_RATE162000
+
 struct phy_reg {
u16 value;
u32 addr;
@@ -372,15 +382,15 @@ struct phy_reg usb3_pll_cfg[] = {
{ 0x8,  CMN_DIAG_PLL0_LF_PROG },
 };
 
-struct phy_reg dp_pll_cfg[] = {
+struct phy_reg dp_pll_rbr_cfg[] = {
{ 0xf0, CMN_PLL1_VCOCAL_INIT },
{ 0x18, CMN_PLL1_VCOCAL_ITER },
{ 0x30b9,   CMN_PLL1_VCOCAL_START },
-   { 0x21c,CMN_PLL1_INTDIV },
+   { 0x87, CMN_PLL1_INTDIV },
{ 0,CMN_PLL1_FRACDIV },
-   { 0x5,  CMN_PLL1_HIGH_THR },
-   { 0x35, CMN_PLL1_SS_CTRL1 },
-   { 0x7f1e,   CMN_PLL1_SS_CTRL2 },
+   { 0x22, CMN_PLL1_HIGH_THR },
+   { 0x8000,   CMN_PLL1_SS_CTRL1 },
+   { 0,CMN_PLL1_SS_CTRL2 },
{ 0x20, CMN_PLL1_DSM_DIAG },
{ 0,CMN_PLLSM1_USER_DEF_CTRL },
{ 0,CMN_DIAG_PLL1_OVRD },
@@ -391,9 +401,52 @@ struct phy_reg dp_pll_cfg[] = {
{ 0x8,  CMN_DIAG_PLL1_LF_PROG },
{ 0x100,CMN_DIAG_PLL1_PTATIS_TUNE1 },
{ 0x7,  CMN_DIAG_PLL1_PTATIS_TUNE2 },
-   { 0x4,  CMN_DIAG_PLL1_INCLK_CTRL },
+   { 0x1,  CMN_DIAG_PLL1_INCLK_CTRL },
 };
 
+struct phy_reg dp_pll_hbr_cfg[] = {
+   { 0xf0, CMN_PLL1_VCOCAL_INIT },
+   { 0x18, CMN_PLL1_VCOCAL_ITER },
+   { 0x30b4,   CMN_PLL1_VCOCAL_START },
+   { 0xe1, CMN_PLL1_INTDIV },
+   { 0,CMN_PLL1_FRACDIV },
+   { 0x5,  CMN_PLL1_HIGH_THR },
+   { 0x8000,   CMN_PLL1_SS_CTRL1 },
+   { 0,CMN_PLL1_SS_CTRL2 },
+   { 0x20, CMN_PLL1_DSM_DIAG },
+   { 0x1000,   CMN_PLLSM1_USER_DEF_CTRL },
+   { 0,CMN_DIAG_PLL1_OVRD },
+   { 0,CMN_DIAG_PLL1_FBH_OVRD },
+   { 0,CMN_DIAG_PLL1_FBL_OVRD },
+   { 0x7,  CMN_DIAG_PLL1_V2I_TUNE },
+   { 0x45, CMN_DIAG_PLL1_CP_TUNE },
+   { 0x8,  CMN_DIAG_PLL1_LF_PROG },
+   { 0x1,  CMN_DIAG_PLL1_PTATIS_TUNE1 },
+   { 0x1,  CMN_DIAG_PLL1_PTATIS_TUNE2 },
+   { 0x1,  CMN_DIAG_PLL1_INCLK_CTRL },
+};
+
+struct phy_reg dp_pll_hbr2_cfg[] = {
+   { 0xf0, CMN_PLL1_VCOCAL_INIT },
+   { 0x18, CMN_PLL1_VCOCAL_ITER },
+   { 0x30b4,   CMN_PLL1_VCOCAL_STAR

[PATCH v7 3/5] soc: rockchip: split rockchip_typec_phy struct to separate header

2018-05-23 Thread Lin Huang
We may use the rockchip_typec_phy struct in other drivers, so split
it out into a separate header.

Signed-off-by: Lin Huang 
---
Changes in v2:
- None
Changes in v3:
- None
Changes in v4:
- None
Changes in v5:
- None
Changes in v6:
- new patch here
Changes in v7:
- move new element to next patch

 drivers/phy/rockchip/phy-rockchip-typec.c | 47 +-
 include/soc/rockchip/rockchip_phy_typec.h | 55 +++
 2 files changed, 56 insertions(+), 46 deletions(-)
 create mode 100644 include/soc/rockchip/rockchip_phy_typec.h

diff --git a/drivers/phy/rockchip/phy-rockchip-typec.c 
b/drivers/phy/rockchip/phy-rockchip-typec.c
index 76a4b58..795055f 100644
--- a/drivers/phy/rockchip/phy-rockchip-typec.c
+++ b/drivers/phy/rockchip/phy-rockchip-typec.c
@@ -63,6 +63,7 @@
 
 #include 
 #include 
+#include 
 
 #define CMN_SSM_BANDGAP(0x21 << 2)
 #define CMN_SSM_BIAS   (0x22 << 2)
@@ -349,52 +350,6 @@
 #define MODE_DFP_USB   BIT(1)
 #define MODE_DFP_DPBIT(2)
 
-struct usb3phy_reg {
-   u32 offset;
-   u32 enable_bit;
-   u32 write_enable;
-};
-
-/**
- * struct rockchip_usb3phy_port_cfg: usb3-phy port configuration.
- * @reg: the base address for usb3-phy config.
- * @typec_conn_dir: the register of type-c connector direction.
- * @usb3tousb2_en: the register of type-c force usb2 to usb2 enable.
- * @external_psm: the register of type-c phy external psm clock.
- * @pipe_status: the register of type-c phy pipe status.
- * @usb3_host_disable: the register of type-c usb3 host disable.
- * @usb3_host_port: the register of type-c usb3 host port.
- * @uphy_dp_sel: the register of type-c phy DP select control.
- */
-struct rockchip_usb3phy_port_cfg {
-   unsigned int reg;
-   struct usb3phy_reg typec_conn_dir;
-   struct usb3phy_reg usb3tousb2_en;
-   struct usb3phy_reg external_psm;
-   struct usb3phy_reg pipe_status;
-   struct usb3phy_reg usb3_host_disable;
-   struct usb3phy_reg usb3_host_port;
-   struct usb3phy_reg uphy_dp_sel;
-};
-
-struct rockchip_typec_phy {
-   struct device *dev;
-   void __iomem *base;
-   struct extcon_dev *extcon;
-   struct regmap *grf_regs;
-   struct clk *clk_core;
-   struct clk *clk_ref;
-   struct reset_control *uphy_rst;
-   struct reset_control *pipe_rst;
-   struct reset_control *tcphy_rst;
-   const struct rockchip_usb3phy_port_cfg *port_cfgs;
-   /* mutex to protect access to individual PHYs */
-   struct mutex lock;
-
-   bool flip;
-   u8 mode;
-};
-
 struct phy_reg {
u16 value;
u32 addr;
diff --git a/include/soc/rockchip/rockchip_phy_typec.h 
b/include/soc/rockchip/rockchip_phy_typec.h
new file mode 100644
index 000..4afe039
--- /dev/null
+++ b/include/soc/rockchip/rockchip_phy_typec.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author: Lin Huang 
+ */
+
+#ifndef __SOC_ROCKCHIP_PHY_TYPEC_H
+#define __SOC_ROCKCHIP_PHY_TYPEC_H
+
+struct usb3phy_reg {
+   u32 offset;
+   u32 enable_bit;
+   u32 write_enable;
+};
+
+/**
+ * struct rockchip_usb3phy_port_cfg: usb3-phy port configuration.
+ * @reg: the base address for usb3-phy config.
+ * @typec_conn_dir: the register of type-c connector direction.
+ * @usb3tousb2_en: the register of type-c force usb2 to usb2 enable.
+ * @external_psm: the register of type-c phy external psm clock.
+ * @pipe_status: the register of type-c phy pipe status.
+ * @usb3_host_disable: the register of type-c usb3 host disable.
+ * @usb3_host_port: the register of type-c usb3 host port.
+ * @uphy_dp_sel: the register of type-c phy DP select control.
+ */
+struct rockchip_usb3phy_port_cfg {
+   unsigned int reg;
+   struct usb3phy_reg typec_conn_dir;
+   struct usb3phy_reg usb3tousb2_en;
+   struct usb3phy_reg external_psm;
+   struct usb3phy_reg pipe_status;
+   struct usb3phy_reg usb3_host_disable;
+   struct usb3phy_reg usb3_host_port;
+   struct usb3phy_reg uphy_dp_sel;
+};
+
+struct rockchip_typec_phy {
+   struct device *dev;
+   void __iomem *base;
+   struct extcon_dev *extcon;
+   struct regmap *grf_regs;
+   struct clk *clk_core;
+   struct clk *clk_ref;
+   struct reset_control *uphy_rst;
+   struct reset_control *pipe_rst;
+   struct reset_control *tcphy_rst;
+   const struct rockchip_usb3phy_port_cfg *port_cfgs;
+   /* mutex to protect access to individual PHYs */
+   struct mutex lock;
+   bool flip;
+   u8 mode;
+};
+
+#endif
-- 
2.7.4



[PATCH v7 5/5] drm/rockchip: support dp training outside dp firmware

2018-05-23 Thread Lin Huang
The DP firmware uses fixed phy config values to do training, but some
boards need to adjust these values to fit their unique hardware
design. So get the phy config values from the dts and use software link
training instead of relying on the firmware, keeping firmware training
as a fallback if software training fails.

Signed-off-by: Chris Zhong 
Signed-off-by: Lin Huang 
Reviewed-by: Sean Paul 
---
Changes in v2:
- update patch following Enric suggest
Changes in v3:
- use variable fw_training instead sw_training_success
- base on DP SPCE, if training fail use lower link rate to retry training
Changes in v4:
- improve cdn_dp_get_lower_link_rate() and cdn_dp_software_train_link() follow 
Sean suggest
Changes in v5:
- fix some whitespcae issue
Changes in v6:
- None
Changes in v7:
- None

 drivers/gpu/drm/rockchip/Makefile   |   3 +-
 drivers/gpu/drm/rockchip/cdn-dp-core.c  |  24 +-
 drivers/gpu/drm/rockchip/cdn-dp-core.h  |   2 +
 drivers/gpu/drm/rockchip/cdn-dp-link-training.c | 420 
 drivers/gpu/drm/rockchip/cdn-dp-reg.c   |  31 +-
 drivers/gpu/drm/rockchip/cdn-dp-reg.h   |  38 ++-
 6 files changed, 505 insertions(+), 13 deletions(-)
 create mode 100644 drivers/gpu/drm/rockchip/cdn-dp-link-training.c

diff --git a/drivers/gpu/drm/rockchip/Makefile 
b/drivers/gpu/drm/rockchip/Makefile
index a314e21..b932f62 100644
--- a/drivers/gpu/drm/rockchip/Makefile
+++ b/drivers/gpu/drm/rockchip/Makefile
@@ -9,7 +9,8 @@ rockchipdrm-y := rockchip_drm_drv.o rockchip_drm_fb.o \
 rockchipdrm-$(CONFIG_DRM_FBDEV_EMULATION) += rockchip_drm_fbdev.o
 
 rockchipdrm-$(CONFIG_ROCKCHIP_ANALOGIX_DP) += analogix_dp-rockchip.o
-rockchipdrm-$(CONFIG_ROCKCHIP_CDN_DP) += cdn-dp-core.o cdn-dp-reg.o
+rockchipdrm-$(CONFIG_ROCKCHIP_CDN_DP) += cdn-dp-core.o cdn-dp-reg.o \
+   cdn-dp-link-training.o
 rockchipdrm-$(CONFIG_ROCKCHIP_DW_HDMI) += dw_hdmi-rockchip.o
 rockchipdrm-$(CONFIG_ROCKCHIP_DW_MIPI_DSI) += dw-mipi-dsi.o
 rockchipdrm-$(CONFIG_ROCKCHIP_INNO_HDMI) += inno_hdmi.o
diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c 
b/drivers/gpu/drm/rockchip/cdn-dp-core.c
index cce64c1..783d57a 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c
@@ -629,11 +629,13 @@ static void cdn_dp_encoder_enable(struct drm_encoder 
*encoder)
goto out;
}
}
-
-   ret = cdn_dp_set_video_status(dp, CONTROL_VIDEO_IDLE);
-   if (ret) {
-   DRM_DEV_ERROR(dp->dev, "Failed to idle video %d\n", ret);
-   goto out;
+   if (dp->use_fw_training) {
+   ret = cdn_dp_set_video_status(dp, CONTROL_VIDEO_IDLE);
+   if (ret) {
+   DRM_DEV_ERROR(dp->dev,
+ "Failed to idle video %d\n", ret);
+   goto out;
+   }
}
 
ret = cdn_dp_config_video(dp);
@@ -642,11 +644,15 @@ static void cdn_dp_encoder_enable(struct drm_encoder 
*encoder)
goto out;
}
 
-   ret = cdn_dp_set_video_status(dp, CONTROL_VIDEO_VALID);
-   if (ret) {
-   DRM_DEV_ERROR(dp->dev, "Failed to valid video %d\n", ret);
-   goto out;
+   if (dp->use_fw_training) {
+   ret = cdn_dp_set_video_status(dp, CONTROL_VIDEO_VALID);
+   if (ret) {
+   DRM_DEV_ERROR(dp->dev,
+   "Failed to valid video %d\n", ret);
+   goto out;
+   }
}
+
 out:
mutex_unlock(&dp->lock);
 }
diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.h 
b/drivers/gpu/drm/rockchip/cdn-dp-core.h
index 46159b2..77a9793 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.h
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.h
@@ -84,6 +84,7 @@ struct cdn_dp_device {
bool connected;
bool active;
bool suspended;
+   bool use_fw_training;
 
const struct firmware *fw;  /* cdn dp firmware */
unsigned int fw_version;/* cdn fw version */
@@ -106,6 +107,7 @@ struct cdn_dp_device {
u8 ports;
u8 lanes;
int active_port;
+   u8 train_set[4];
 
u8 dpcd[DP_RECEIVER_CAP_SIZE];
bool sink_has_audio;
diff --git a/drivers/gpu/drm/rockchip/cdn-dp-link-training.c 
b/drivers/gpu/drm/rockchip/cdn-dp-link-training.c
new file mode 100644
index 000..73c3290
--- /dev/null
+++ b/drivers/gpu/drm/rockchip/cdn-dp-link-training.c
@@ -0,0 +1,420 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
+ * Author: Chris Zhong 
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "cdn-dp-core.h"
+#include "cdn-dp-reg.h"
+
+static void cdn_dp_set_signal_levels(struct cdn_dp_device *dp)
+{
+   struct cdn_dp_port *port = dp->port[dp->active_port];
+   struct rockchip_typec_phy *tcphy = phy_get_drvdata(port->phy);
+
+

[PATCH v7 2/5] Documentation: dt-bindings: phy: add phy_config for Rockchip USB Type-C PHY

2018-05-23 Thread Lin Huang
If we want to do link training outside the DP firmware, we need the phy
voltage swing and pre-emphasis values.

Signed-off-by: Lin Huang 
Reviewed-by: Rob Herring 
---
Changes in v2:
- None 
Changes in v3:
- modify property description and add this property to Example
Changes in v4:
- None
Changes in v5:
- None
Changes in v6:
- change rockchip,phy_config to rockchip,phy-config and describe it in detail.
Changes in v7:
- None

 .../devicetree/bindings/phy/phy-rockchip-typec.txt | 36 +-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt 
b/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt
index 960da7f..40d5e7a 100644
--- a/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt
+++ b/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt
@@ -17,7 +17,11 @@ Required properties:
 
 Optional properties:
  - extcon : extcon specifier for the Power Delivery
-
+ - rockchip,phy-config : A list of voltage swing(mV) and pre-emphasis
+   (dB) pairs. They are 3 blocks of 4 entries and
+   correspond to s0p0 ~ s0p3, s1p0 ~ s1p3,
+   s2p0 ~ s2p3, s3p0 ~ s3p3 swing and pre-emphasis
+   values.
 Required nodes : a sub-node is required for each port the phy provides.
 The sub-node name is used to identify dp or usb3 port,
 and shall be the following entries:
@@ -50,6 +54,21 @@ Example:
 <&cru SRST_P_UPHY0_TCPHY>;
reset-names = "uphy", "uphy-pipe", "uphy-tcphy";
 
+   rockchip,phy-config = <0x2a 0x00>,
+   <0x1f 0x15>,
+   <0x14 0x22>,
+   <0x02 0x2b>,
+
+   <0x21 0x00>,
+   <0x12 0x15>,
+   <0x02 0x22>,
+   <0 0>,
+
+   <0x15 0x00>,
+   <0x00 0x15>,
+   <0 0>,
+   <0 0>;
+
tcphy0_dp: dp-port {
#phy-cells = <0>;
};
@@ -74,6 +93,21 @@ Example:
 <&cru SRST_P_UPHY1_TCPHY>;
reset-names = "uphy", "uphy-pipe", "uphy-tcphy";
 
+   rockchip,phy-config = <0x2a 0x00>,
+   <0x1f 0x15>,
+   <0x14 0x22>,
+   <0x02 0x2b>,
+
+   <0x21 0x00>,
+   <0x12 0x15>,
+   <0x02 0x22>,
+   <0 0>,
+
+   <0x15 0x00>,
+   <0x00 0x15>,
+   <0 0>,
+   <0 0>;
+
tcphy1_dp: dp-port {
#phy-cells = <0>;
};
-- 
2.7.4



Re: INFO: task hung in xlog_grant_head_check

2018-05-23 Thread Darrick J. Wong
On Tue, May 22, 2018 at 03:52:08PM -0700, Eric Biggers wrote:
> On Wed, May 23, 2018 at 08:26:20AM +1000, Dave Chinner wrote:
> > On Tue, May 22, 2018 at 08:31:08AM -0400, Brian Foster wrote:
> > > On Mon, May 21, 2018 at 10:55:02AM -0700, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzbot found the following crash on:
> > > > 
> > > > HEAD commit:203ec2fed17a Merge tag 'armsoc-fixes' of 
> > > > git://git.kernel...
> > > > git tree:   upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11c1ad7780
> > > > kernel config:  
> > > > https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
> > > > dashboard link: 
> > > > https://syzkaller.appspot.com/bug?extid=568245b88fbaedcb1959
> > > > compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> > > > syzkaller 
> > > > repro:https://syzkaller.appspot.com/x/repro.syz?x=122c742780
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1038705780
> > > > 
> > > > IMPORTANT: if you fix the bug, please add the following tag to the 
> > > > commit:
> > > > Reported-by: syzbot+568245b88fbaedcb1...@syzkaller.appspotmail.com
> > > > 
> > > > (ptrval): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 
> > > > (ptrval): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 
> > > > XFS (loop0): metadata I/O error in "xfs_trans_read_buf_map" at daddr 
> > > > 0x2 len
> > > > 1 error 117
> > > > XFS (loop0): xfs_imap_lookup: xfs_ialloc_read_agi() returned error -117,
> > > > agno 0
> > > > XFS (loop0): failed to read root inode
> > > 
> > > FWIW, the initial console output is actually:
> > > 
> > > [  448.028253] XFS (loop0): Mounting V4 Filesystem
> > > [  448.033540] XFS (loop0): Log size 9371840 blocks too large, maximum 
> > > size is 1048576 blocks
> > > [  448.042287] XFS (loop0): Log size out of supported range.
> > > [  448.047841] XFS (loop0): Continuing onwards, but if log hangs are 
> > > experienced then please report this message in the bug report.
> > > [  448.060712] XFS (loop0): totally zeroed log
> > > 
> > > ... which warns about an oversized log and resulting log hangs. Not
> > > having dug into the details of why this occurs so quickly in this mount
> > > failure path,
> > 
> > I suspect that it is a head and/or log tail pointer overflow, so when it
> > tries to do the first trans reserve of the mount - to write the
> > unmount record - it says "no log space available, please wait".
> > 
> > > it does look like we'd never have got past this point on a
> > > v5 fs (i.e., the above warning would become an error and we'd not enter
> > > the xfs_log_mount_cancel() path).
> > 
> > And this comes back to my repeated comments about fuzzers needing
> > to fuzz properly made V5 filesystems as we catch and error out on
> > things like this. Fuzzing random collections of v4 filesystem
> > fragments will continue to trip over problems we've avoided with v5
> > filesystems, and this is further evidence to point to that.
> >
> > 
> > I'd suggest that at this point, syzbot XFS reports should be
> > redirected to /dev/null. It's not worth our time to triage
> > unreviewed bot generated bug reports until the syzbot developers
> > start listening and acting on what we have been telling them
> > about fuzzing filesystems and reproducing bugs that are meaningful
> > and useful to us.
> 
> The whole point of fuzzing is to provide improper inputs.  A kernel
> bug is a kernel bug, even if it's in deprecated/unmaintained code, or
> involves userspace doing something unexpected.  If you have known
> buggy code in XFS that you refuse to fix,

Ok, that's it.

I disagree with Google's syzbot strategy, and I dissent most vehemently!

The whole point of constructing free software in public is that we
people communally build things that anyone can use for any purpose and
that anyone can modify.  That privilege comes with a societal
expectation that the people using this commons will contribute to the
upkeep of that commons or it rots.  For end users that means helping us
to find the gaps, but for software developers at large multinational
companies that means (to a first approximation) pitching in to write the
code, write the documentation, and to fix the problems.

Yes, there are many places where fs metadata validation is insufficient
to avoid misbehavior.  Google's strategy of dumping vulnerability
disclosures on public mailing lists every week, demanding that other
people regularly reallocate their time to fix these problems, and not
helping to fix anything breaks our free software societal norms.  Again,
the whole point of free software is to share the responsibility, share
the work, and share the gains.  That is how collaboration works.

Help us to improve the software so that we all will be better off.

Figure out how to strengthen the validation, figure out how to balance
the risk of exposure against the risk of nonfunctionality, and figure
out how to discuss w

[PATCH] livepatch: Remove not longer valid limitations from the documentation

2018-05-23 Thread Petr Mladek
Semantic changes are possible since the commit d83a7cb375eec21f04
("livepatch: change to a per-task consistency model").

Also data structures can be patched since the commit 439e7271dc2b63de37
("livepatch: introduce shadow variable API").

It is high time we removed these limitations from the documentation.

Signed-off-by: Petr Mladek 
---
I have found this when working on v12 of the atomic replace. It looks
like a no-brainer and does not conflict with the patchset, so ... ;-)

 Documentation/livepatch/livepatch.txt | 24 
 1 file changed, 24 deletions(-)

diff --git a/Documentation/livepatch/livepatch.txt 
b/Documentation/livepatch/livepatch.txt
index 1ae2de758c08..2d7ed09dbd59 100644
--- a/Documentation/livepatch/livepatch.txt
+++ b/Documentation/livepatch/livepatch.txt
@@ -429,30 +429,6 @@ See Documentation/ABI/testing/sysfs-kernel-livepatch for 
more details.
 
 The current Livepatch implementation has several limitations:
 
-
-  + The patch must not change the semantic of the patched functions.
-
-The current implementation guarantees only that either the old
-or the new function is called. The functions are patched one
-by one. It means that the patch must _not_ change the semantic
-of the function.
-
-
-  + Data structures can not be patched.
-
-There is no support to version data structures or anyhow migrate
-one structure into another. Also the simple consistency model does
-not allow to switch more functions atomically.
-
-Once there is more complex consistency mode, it will be possible to
-use some workarounds. For example, it will be possible to use a hole
-for a new member because the data structure is aligned. Or it will
-be possible to use an existing member for something else.
-
-There are no plans to add more generic support for modified structures
-at the moment.
-
-
   + Only functions that can be traced could be patched.
 
 Livepatch is based on the dynamic ftrace. In particular, functions
-- 
2.13.6



[PATCH v4 3/3] powerpc/lib: optimise PPC32 memcmp

2018-05-23 Thread Christophe Leroy

At the time being, memcmp() compares two chunks of memory
byte by byte.

This patch optimises the comparison by comparing word by word.

A small benchmark performed on an 8xx comparing two chunks
of 512 bytes performed 10 times gives:

Before : 5852274 TB ticks
After:   1488638 TB ticks

This is almost 4 times faster.

Signed-off-by: Christophe Leroy 
---
 Not resending the entire serie

 v4: Dropped the special handling for when length is 0. Handling it 
through the small length path.


 arch/powerpc/lib/string_32.S | 48 +++-
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 40a576d56ac7..542e6cecbcaf 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -16,17 +16,45 @@
.text

 _GLOBAL(memcmp)
-   cmpwi   cr0, r5, 0
-   beq-2f
-   mtctr   r5
-   addir6,r3,-1
-   addir4,r4,-1
-1: lbzur3,1(r6)
-   lbzur0,1(r4)
-   subf.   r3,r0,r3
-   bdnzt   2,1b
+   srawi.  r7, r5, 2   /* Divide len by 4 */
+   mr  r6, r3
+   beq-3f
+   mtctr   r7
+   li  r7, 0
+1:
+#ifdef __LITTLE_ENDIAN__
+   lwbrx   r3, r6, r7
+   lwbrx   r0, r4, r7
+#else
+   lwzxr3, r6, r7
+   lwzxr0, r4, r7
+#endif
+   addir7, r7, 4
+   cmplw   cr0, r3, r0
+   bdnzt   eq, 1b
+   bne 5f
+3: andi.   r3, r5, 3
+   beqlr
+   cmplwi  cr1, r3, 2
+   blt-cr1, 4f
+#ifdef __LITTLE_ENDIAN__
+   lhbrx   r3, r6, r7
+   lhbrx   r0, r4, r7
+#else
+   lhzxr3, r6, r7
+   lhzxr0, r4, r7
+#endif
+   addir7, r7, 2
+   subf.   r3, r0, r3
+   beqlr   cr1
+   bnelr
+4: lbzxr3, r6, r7
+   lbzxr0, r4, r7
+   subf.   r3, r0, r3
blr
-2: li  r3,0
+5: li  r3, 1
+   bgtlr
+   li  r3, -1
blr
 EXPORT_SYMBOL(memcmp)

--
2.13.3



Re: [PATCH 19/33] thermal: db8500: use match_string() helper

2018-05-23 Thread Yisheng Xie
Hi Andy,

On 2018/5/22 6:00, Andy Shevchenko wrote:
> On Mon, May 21, 2018 at 2:57 PM, Yisheng Xie  wrote:
match_string() returns the index of an array for a matching string,
which can be used instead of an open coded variant.
> 
>> +   i = match_string((const char **)trip_point->cdev_name,
> 
> Casting looks ugly. You need to constify the variable itself.
When I tried to constify cdev_name like:
+++ b/include/linux/platform_data/db8500_thermal.h
@@ -27,7 +27,7 @@
 struct db8500_trip_point {
unsigned long temp;
enum thermal_trip_type type;
-   char cdev_name[COOLING_DEV_MAX][THERMAL_NAME_LENGTH];
+   char const cdev_name[COOLING_DEV_MAX][THERMAL_NAME_LENGTH]; // const char cdev_name[COOLING_DEV_MAX][THERMAL_NAME_LENGTH] will also be the same
 };

The compiler will also warning:
drivers/thermal/db8500_thermal.c: In function ‘db8500_thermal_match_cdev’:
drivers/thermal/db8500_thermal.c:53:2: warning: passing argument 1 of 
‘match_string’ from incompatible pointer type [enabled by default]
  i = match_string(trip_point->cdev_name, COOLING_DEV_MAX, cdev->type);
  ^
In file included from include/linux/bitmap.h:9:0,
 from include/linux/cpumask.h:12,
 from include/linux/rcupdate.h:44,
 from include/linux/radix-tree.h:28,
 from include/linux/idr.h:15,
 from include/linux/kernfs.h:14,
 from include/linux/sysfs.h:16,
 from kernel/include/linux/kobject.h:20,
 from kernel/include/linux/of.h:17,
 from include/linux/cpu_cooling.h:27,
 from drivers/thermal/db8500_thermal.c:20:
include/linux/string.h:184:5: note: expected ‘const char * const*’ but argument 
is of type ‘const char (*)[20]’

Any idea?

Thanks
Yisheng
> 
>> +COOLING_DEV_MAX, cdev->type);
>>
>> -   return -ENODEV;
>> +   return (i < 0) ? -ENODEV : 0;
> 
> I would rather go with
> 
> if (ret < 0)
>  return -ENODEV;
> 
> return 0;
> 
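For what it's worth, the compiler note above pinpoints the mismatch: match_string() wants an array of string pointers (const char * const *), while a 2D char array like cdev_name decays to a pointer to fixed-size rows (const char (*)[20]), and no amount of const-qualification converts one into the other. A minimal sketch of the two shapes (match_string_ptrs and match_string_2d are hypothetical helpers modeled on the kernel's match_string(), not its actual API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define THERMAL_NAME_LENGTH 20

/* Simplified model of the kernel's match_string(): it takes an array
 * of pointers to strings. */
static int match_string_ptrs(const char * const *array, size_t n,
			     const char *string)
{
	for (size_t i = 0; i < n; i++)
		if (array[i] && strcmp(array[i], string) == 0)
			return (int)i;
	return -1;	/* the kernel returns -EINVAL */
}

/* A 2D char array has element type char[20], not char *, so it cannot
 * be passed to the function above; it needs its own loop. */
static int match_string_2d(const char (*array)[THERMAL_NAME_LENGTH],
			   size_t n, const char *string)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(array[i], string) == 0)
			return (int)i;
	return -1;
}
```

So the choice is between changing cdev_name to an array of pointers or keeping the open-coded loop over the 2D array.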



Re: [PATCH] netfilter: uapi: includes linux/types.h

2018-05-23 Thread Pablo Neira Ayuso
On Wed, May 23, 2018 at 03:03:26PM +0800, YueHaibing wrote:
> gcc-7.3.0 reports the following warning:
> ./usr/include/linux/netfilter/nf_osf.h:27: found __[us]{8,16,32,64} type 
> without #include 
> 
> Include linux/types.h to fix it.

Thanks.

There's already a fix for this in the nf-next queue.

commit 01cd267bff52619a53fa05c930ea5ed53493d21a
Author: Florian Westphal 
Date:   Tue May 8 10:05:38 2018 +0200

netfilter: fix fallout from xt/nf osf separation


Re: [PATCH 04/10] vfio: ccw: replace IO_REQ event with SSCH_REQ event

2018-05-23 Thread Pierre Morel

On 22/05/2018 17:41, Cornelia Huck wrote:

On Fri, 4 May 2018 13:02:36 +0200
Pierre Morel  wrote:


On 04/05/2018 03:19, Dong Jia Shi wrote:

* Pierre Morel  [2018-05-03 16:26:29 +0200]:
  

On 02/05/2018 09:46, Dong Jia Shi wrote:

* Cornelia Huck  [2018-04-30 17:33:05 +0200]:
  

On Thu, 26 Apr 2018 15:48:06 +0800
Dong Jia Shi  wrote:
  

* Dong Jia Shi  [2018-04-26 15:30:54 +0800]:

[...]
  

@@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private *private,
if (private->io_trigger)
eventfd_signal(private->io_trigger, 1);

-   return private->state;
+   return VFIO_CCW_STATE_IDLE;

This is not right. For example, if we are in STANDBY state (subch driver
is probed, but mdev device is not created), we can not jump to IDLE
state.

I see my problem, for STANDBY state, we should introduce another event
callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
fsm_irq() which tries to signal userspace with interrupt notification
when mdev is not created yet... So we'd need a separated fix for this
issue too.

But how do we even get into that situation when we don't have an mdev
yet?
  

We can't... So let's assign fsm_nop() as the interrupt callback for
STANDBY state?
  

:) Isn't it exactly what my patch series handle?

As far as I see, that's not true. ;)

After this series applied,
vfio_ccw_jumptable[VFIO_CCW_STATE_STANDBY][VFIO_CCW_EVENT_INTERRUPT] is
still fsm_irq().
  


What I mean is, this code tries to handle design problems
without changing too much of the original code at first.

The problem here is not that the fsm_irq function is called on interrupt;
if we have an interrupt, it must be signaled to userland.
The problem is that this state is entered at the wrong moment.

STANDBY should be entered during mdev_open, when we realize the QEMU
device, and not during probe, where we should stay in NOT_OPER until we
get the QEMU device.

The probe() and mdev_open() function should be modified, not the state
table.

So, the takeaway is that we should handle starting via the init
callbacks and not via the state machine?


Hum, sorry, I think that my previous answer was not completely right
and did not really answer Dong Jia's comment. Yes, fsm_irq was not
in its place. Thinking again about the comments from both of you,
I think that we can suppress the INIT event.

I would like to rebase the patch to include the comments you both did.


--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



Re: [PATCH 3/5] watchdog: sp805: set WDOG_HW_RUNNING when appropriate

2018-05-23 Thread Scott Branden



On 18-05-22 04:24 PM, Ray Jui wrote:

Hi Guenter,

On 5/22/2018 1:54 PM, Guenter Roeck wrote:

On Tue, May 22, 2018 at 11:47:18AM -0700, Ray Jui wrote:

If the watchdog hardware is already enabled during the boot process,
when the Linux watchdog driver loads, it should reset the watchdog and
tell the watchdog framework. As a result, pings can be generated from
the watchdog framework until the userspace watchdog daemon takes over
control.

Signed-off-by: Ray Jui 
Reviewed-by: Vladimir Olovyannikov 
Reviewed-by: Scott Branden 
---
  drivers/watchdog/sp805_wdt.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/drivers/watchdog/sp805_wdt.c 
b/drivers/watchdog/sp805_wdt.c

index 1484609..408ffbe 100644
--- a/drivers/watchdog/sp805_wdt.c
+++ b/drivers/watchdog/sp805_wdt.c
@@ -42,6 +42,7 @@
  /* control register masks */
  #define    INT_ENABLE    (1 << 0)
  #define    RESET_ENABLE    (1 << 1)
+    #define    ENABLE_MASK    (INT_ENABLE | RESET_ENABLE)
  #define WDTINTCLR    0x00C
  #define WDTRIS    0x010
  #define WDTMIS    0x014
@@ -74,6 +75,18 @@ module_param(nowayout, bool, 0);
  MODULE_PARM_DESC(nowayout,
  "Set to 1 to keep watchdog running after device release");
  +/* returns true if wdt is running; otherwise returns false */
+static bool wdt_is_running(struct watchdog_device *wdd)
+{
+    struct sp805_wdt *wdt = watchdog_get_drvdata(wdd);
+
+    if ((readl_relaxed(wdt->base + WDTCONTROL) & ENABLE_MASK) ==
+    ENABLE_MASK)
+    return true;
+    else
+    return false;


return !!(readl_relaxed(wdt->base + WDTCONTROL) & ENABLE_MASK));



Note ENABLE_MASK contains two bits (INT_ENABLE and RESET_ENABLE); 
therefore, a simple !!(expression) would not work? That is, the masked 
result needs to be compared against the mask again to ensure both bits 
are set, right?
Ray - your original code looks correct to me.  It is easier to read and less 
error-prone than the attempted translation to a single statement.


Thanks,

Ray
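Ray's point is easy to demonstrate: with a multi-bit mask, !!(x & MASK) answers "is any bit set?", whereas the driver needs "are all bits set?". A small sketch (helper names invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INT_ENABLE	(1 << 0)
#define RESET_ENABLE	(1 << 1)
#define ENABLE_MASK	(INT_ENABLE | RESET_ENABLE)

/* What !!(reg & ENABLE_MASK) computes: "is any enable bit set?" */
static bool any_enabled(uint32_t reg)
{
	return !!(reg & ENABLE_MASK);
}

/* What wdt_is_running() actually needs: "are ALL enable bits set?" */
static bool all_enabled(uint32_t reg)
{
	return (reg & ENABLE_MASK) == ENABLE_MASK;
}
```

The two disagree exactly when only one of the two enable bits is set, which is the case the original comparison against the full mask handles correctly.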




Re: [PATCH V2 3/3] ARM: dts: imx7: correct enet ipg clock

2018-05-23 Thread Stefan Agner
On 18.05.2018 03:01, Anson Huang wrote:
> The ENET "ipg" clock should be IMX7D_ENETx_IPG_ROOT_CLK
> rather than IMX7D_ENET_AXI_ROOT_CLK, which is for the ENET bus
> clock.
> 
> Based on Andy Duan's patch from the NXP kernel tree.
> 
> Signed-off-by: Anson Huang 

Reviewed-by: Stefan Agner 

--
Stefan

> ---
>  arch/arm/boot/dts/imx7d.dtsi | 2 +-
>  arch/arm/boot/dts/imx7s.dtsi | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
> index 200714e..d74dd7f 100644
> --- a/arch/arm/boot/dts/imx7d.dtsi
> +++ b/arch/arm/boot/dts/imx7d.dtsi
> @@ -120,7 +120,7 @@
>   ,
>   ,
>   ;
> - clocks = <&clks IMX7D_ENET_AXI_ROOT_CLK>,
> + clocks = <&clks IMX7D_ENET2_IPG_ROOT_CLK>,
>   <&clks IMX7D_ENET_AXI_ROOT_CLK>,
>   <&clks IMX7D_ENET2_TIME_ROOT_CLK>,
>   <&clks IMX7D_PLL_ENET_MAIN_125M_CLK>,
> diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
> index 4d42335..b90769d 100644
> --- a/arch/arm/boot/dts/imx7s.dtsi
> +++ b/arch/arm/boot/dts/imx7s.dtsi
> @@ -1091,7 +1091,7 @@
>   ,
>   ,
>   ;
> - clocks = <&clks IMX7D_ENET_AXI_ROOT_CLK>,
> + clocks = <&clks IMX7D_ENET1_IPG_ROOT_CLK>,
>   <&clks IMX7D_ENET_AXI_ROOT_CLK>,
>   <&clks IMX7D_ENET1_TIME_ROOT_CLK>,
>   <&clks IMX7D_PLL_ENET_MAIN_125M_CLK>,


Re: [PATCH V3] powercap/drivers/idle_injection: Add an idle injection framework

2018-05-23 Thread Daniel Lezcano
On 23/05/2018 07:41, Viresh Kumar wrote:
> On 22-05-18, 15:42, Daniel Lezcano wrote:
>> On 21/05/2018 12:32, Viresh Kumar wrote:
>>> On 18-05-18, 16:50, Daniel Lezcano wrote:
 Initially, the cpu_cooling device for ARM was changed by adding a new
 policy inserting idle cycles. The intel_powerclamp driver does a
 similar action.

 Instead of implementing idle injections privately in the cpu_cooling
 device, move the idle injection code in a dedicated framework and give
 the opportunity to other frameworks to make use of it.
>>>
>>> I thought you agreed to move above in the comments section ?
>>
>> This is what I did. I just kept the relevant log here.
> 
> The fact that you are stating that you tried to update the cooling
> device earlier looked like a bit of version history to me, not what
> this patch is doing.
> 
> But its okay if you really want that to be preserved in git history :)
> 
 +static void idle_injection_fn(unsigned int cpu)
 +{
 +  struct idle_injection_device *ii_dev;
 +  struct idle_injection_thread *iit;
 +  int run_duration_ms, idle_duration_ms;
 +
 +  ii_dev = per_cpu(idle_injection_device, cpu);
 +
 +  iit = per_cpu_ptr(&idle_injection_thread, cpu);
 +
 +  /*
 +   * Boolean used by the smpboot mainloop and used as a flip-flop
 +   * in this function
 +   */
 +  iit->should_run = 0;
 +
 +  atomic_inc(&ii_dev->count);
 +
 +  idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);
 +
 +  play_idle(idle_duration_ms);
 +
 +  /*
 +   * The last CPU waking up is in charge of setting the timer. If
 +   * the CPU is hotplugged, the timer will move to another CPU
 +   * (which may not belong to the same cluster) but that is not a
 +   * problem as the timer will be set again by another CPU
 +   * belonging to the cluster. This mechanism is self adaptive.
 +   */
 +  if (!atomic_dec_and_test(&ii_dev->count))
 +  return;
 +
 +  run_duration_ms = atomic_read(&ii_dev->run_duration_ms);
>>>
>>> This reads as if it is okay to have run_duration_ms set as 0, so we
>>> run idle loop only once. Which is fine, but why do you mandate this to
>>> be non-zero in idle_injection_start() ?
>>
>> It does not make sense to run this function with a run duration set to
>> zero because we will immediately go to idle again after exiting idle. So
>> the action is exiting. In this context we can't accept to start
>> injecting idle cycles.
> 
> Right and that's why I said "Which is fine" in my comment above. My
> question was more on why we error out in idle_injection_start() if
> run_duration_ms is 0.
> 
> Just for my understanding, is it a valid usecase where we want to run
> the idle loop only once ? i.e. set idle_duration_ms to a non-zero
> value but run_duration_ms to 0 ? In that case we shouldn't check for
> zero run_duration_ms in idle_injection_start().

Yes, that could be a valid use case if we want to synchronously inject
idle cycles without a period.

IOW, call play_idle() on a set of cpus at the same time, with the caller
of start being the one in control of the period.

If you want this usecase, we need to implement more things:
 - single user of the framework: as soon as we register, no-one else can
use the idle injection
 - blocking stop, we wait for all the kthreads to join a barrier before
returning to the caller
 - blocking start, we wait for all the kthreads to end injecting the
idle cycle
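As a sketch of the "last CPU out re-arms the timer" pattern in the quoted idle_injection_fn() above (a user-space stand-in using C11 atomics; the kernel uses atomic_inc()/atomic_dec_and_test(), and hrtimer_start() instead of a flag):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ii_dev {
	atomic_int count;
	int timer_armed;	/* stand-in for the hrtimer being set */
};

/* Every CPU bumps the counter before playing idle. */
static void ii_cpu_enter(struct ii_dev *dev)
{
	atomic_fetch_add(&dev->count, 1);
}

/* Returns true for exactly one caller: the last CPU to finish, which
 * is therefore in charge of re-arming the period timer. */
static bool ii_cpu_exit(struct ii_dev *dev)
{
	/* fetch_sub returns the previous value: 1 means we were last */
	if (atomic_fetch_sub(&dev->count, 1) == 1) {
		dev->timer_armed = 1;	/* hrtimer_start() stand-in */
		return true;
	}
	return false;
}
```

This is also where the question above bites: a caller wanting a single synchronous injection needs a way to keep that last CPU from re-arming the timer.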

 +  if (!run_duration_ms)
 +  return;
 +
 +  hrtimer_start(&ii_dev->timer, ms_to_ktime(run_duration_ms),
 +HRTIMER_MODE_REL_PINNED);
 +}
 +
 +/**
 + * idle_injection_set_duration - idle and run duration helper
 + * @run_duration_ms: an unsigned int giving the running time in 
 milliseconds
 + * @idle_duration_ms: an unsigned int giving the idle time in milliseconds
 + */
 +void idle_injection_set_duration(struct idle_injection_device *ii_dev,
 +   unsigned int run_duration_ms,
 +   unsigned int idle_duration_ms)
 +{
 +  atomic_set(&ii_dev->run_duration_ms, run_duration_ms);
 +  atomic_set(&ii_dev->idle_duration_ms, idle_duration_ms);
>>>
>>> You check for valid values of these in idle_injection_start() but not
>>> here, why ?
>>
>> Checking against zero values in the start function is a way to make
>> sure we are not starting the idle injection with uninitialized values,
>> and setting the duration to zero is a way to stop the idle injection.
> 
> Why do we need two ways of stopping the idle injection thread ? Why
> isn't just calling idle_injection_stop() the right thing to do in that
> case ?

How do we prevent the last kthread in idle_injection_fn from setting the
timer?

 +}
 +
 +/**
 + * idle_injection_get_duration - idle and run duration helper
 + * @run_duration_ms: a poi

Re: [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing

2018-05-23 Thread Greg KH
On Tue, May 22, 2018 at 02:02:37PM -0700, Reinette Chatre wrote:
> Hi Greg,
> 
> Thank you very much for taking a look.
> 
> On 5/22/2018 12:43 PM, Greg KH wrote:
> > On Tue, May 22, 2018 at 04:29:22AM -0700, Reinette Chatre wrote:
> >> @@ -149,6 +151,9 @@ struct pseudo_lock_region {
> >>unsigned intline_size;
> >>unsigned intsize;
> >>void*kmem;
> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +  struct dentry   *debugfs_dir;
> >> +#endif
> > 
> > Who cares, just always have this here, it's not going to save you
> > anything to #ifdef the .c code everywhere just for this one pointer.
> 
> ok
> 
> > 
> >> @@ -174,6 +180,9 @@ static void pseudo_lock_region_clear(struct 
> >> pseudo_lock_region *plr)
> >>plr->d->plr = NULL;
> >>plr->d = NULL;
> >>plr->cbm = 0;
> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +  plr->debugfs_dir = NULL;
> >> +#endif
> > 
> > See?  Ick.
> > 
> >> +  ret = strtobool(buf, &bv);
> >> +  if (ret == 0 && bv) {
> >> +  ret = debugfs_file_get(file->f_path.dentry);
> >> +  if (unlikely(ret))
> >> +  return ret;
> > 
> > Only ever use unlikely/likely if you can measure the performance
> > difference.  Hint, you can't do that here, it's not needed at all.
> 
> Here my intention was to follow current best practices: in the
> kernel source I am working with, eight of the ten usages of
> debugfs_file_get() are followed by an unlikely(). My assumption was thus
> that this is a best practice. Thanks for catching this - I'll change it.

Really?  That's some horrible examples, any pointers to them?  I think I
need to do a massive sweep of the kernel tree and fix up all of this
crud so that people don't keep cut/paste the same bad code everywhere.

> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +  plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
> >> +debugfs_resctrl);
> >> +  if (IS_ERR(plr->debugfs_dir)) {
> >> +  ret = PTR_ERR(plr->debugfs_dir);
> >> +  plr->debugfs_dir = NULL;
> >> +  goto out_region;
> > 
> > Ick no, you never need to care about the return value of a debugfs call.
> > You code should never do something different if a debugfs call succeeds
> > or fails.  And you are checking it wrong, even if you did want to do
> > this :)
> 
> Ah - I see I need to be using IS_ERR_OR_NULL() instead of IS_ERR()? If
> this is the case then please note that there seems to be quite a few
> debugfs_create_dir() calls within the kernel that have the same issue.

Again, they are all wrong :)

Just ignore the return value, unless it is a directory, and then just
save it like you are here.  Don't check the value, you can always pass
it into a future debugfs call with no problems.
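The calling pattern Greg describes can be sketched as follows, with _stub stand-ins for the real debugfs API (these are not kernel functions; the point is only the shape: store the directory dentry, never branch on it, and let debugfs failures stay invisible to the caller):

```c
#include <assert.h>
#include <stddef.h>

/* In the kernel, struct dentry is opaque to debugfs callers. */
struct dentry;

/* Hypothetical stubs standing in for debugfs_create_dir() and
 * debugfs_create_file(). */
static struct dentry *debugfs_create_dir_stub(const char *name,
					      struct dentry *parent)
{
	(void)name; (void)parent;
	return NULL;	/* even a "failed" return is fine to reuse */
}

static void debugfs_create_file_stub(const char *name,
				     struct dentry *parent)
{
	(void)name; (void)parent;
}

static int pseudo_lock_debugfs_setup(struct dentry **dir)
{
	/* Save the directory dentry, but no IS_ERR() check: debugfs
	 * accepts error/NULL parents in subsequent calls, and a
	 * debugfs failure must never fail the caller. */
	*dir = debugfs_create_dir_stub("pseudo_lock", NULL);
	debugfs_create_file_stub("pseudo_lock_measure", *dir);
	return 0;
}
```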

> >> +  }
> >> +
> >> +  entry = debugfs_create_file("pseudo_lock_measure", 0200,
> >> +  plr->debugfs_dir, rdtgrp,
> >> +  &pseudo_measure_fops);
> >> +  if (IS_ERR(entry)) {
> >> +  ret = PTR_ERR(entry);
> >> +  goto out_debugfs;
> >> +  }
> > 
> > Again, you don't care, don't do this.
> > 
> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +  debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
> >> +#endif
> > 
> > Don't put ifdefs in .c files, it's not the Linux way at all.  You can
> > make this a lot simpler/easier to maintain over time if you do not.
> 
> My mistake - I assumed this would be ok based on my interpretation of
> how CONFIG_GENERIC_IRQ_DEBUGFS is used.
> 
> I could rework the debugfs code to be contained in a new debugfs
> specific .c file that is only compiled if the configuration is set. The
> ifdefs will then be restricted to a .h file that contains the
> declarations of these debugfs functions with empty variants when the
> user did not select the debugfs config option.
> 
> Would that be acceptable to you?

Yes, that is the correct way to do this.

But why would someone _not_ want this option?  Why not always just
include the functionality, that way you don't have to ask someone to
rebuild a kernel if you need that debug information.  And distros will
always enable the option anyway, so it's not like you are keeping things
"smaller", if you disable debugfs, all of that code should just compile
away to almost nothing anyway.

thanks,

greg k-h


Re: [PATCH v8] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()

2018-05-23 Thread Nicholas Piggin
On Wed, 23 May 2018 09:31:33 +0200
Christophe LEROY  wrote:

> Le 23/05/2018 à 09:17, Nicholas Piggin a écrit :
> > On Wed, 23 May 2018 09:01:19 +0200 (CEST)
> > Christophe Leroy  wrote:
> >   

> >> @@ -264,8 +266,30 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
> >> unsigned long address,
> >> * between the last mapped region and the stack will
> >> * expand the stack rather than segfaulting.
> >> */
> >> -  if (address + 2048 < uregs->gpr[1] && !store_update_sp)
> >> -  return true;
> >> +  if (address + 2048 >= uregs->gpr[1])
> >> +  return false;
> >> +  if (is_retry)
> >> +  return false;
> >> +
> >> +  if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) &&
> >> +  access_ok(VERIFY_READ, nip, sizeof(inst))) {
> >> +  int res;
> >> +
> >> +  pagefault_disable();
> >> +  res = __get_user_inatomic(inst, nip);
> >> +  pagefault_enable();
> >> +  if (res) {
> >> +  up_read(&mm->mmap_sem);
> >> +  res = __get_user(inst, nip);
> >> +  if (!res && store_updates_sp(inst))
> >> +  return -1;
> >> +  return true;
> >> +  }
> >> +  if (store_updates_sp(inst))
> >> +  return false;
> >> +  }
> >> +  up_read(&mm->mmap_sem);  
> > 
> > Starting to look pretty good... I think probably I prefer the mmap_sem
> > drop going into the caller so we don't don't drop in the child function.  
> 
> Yes I can do that. I though it was ok as the drop is already done in 
> children functions like bad_area(), bad_access(), ...

That's true, all exit functions though. I think it may end up being a
bit nicer with the up_read in the caller, but see what you think.

> > I thought the retry logic was a little bit complex too, what do you
> > think of using fault_in_pages_readable and just doing a full retry to
> > avoid some of this complexity?  
> 
> Yes, let's try that way, although fault_in_pages_readable() is nothing
> more than a get_user().
> Should we take any precaution to avoid retrying forever or is it just 
> not worth it ?

generic_perform_write(), the core of the data copying for the write(2)
syscall, does this retry, so I think it's okay... Although I think I
wrote that, so maybe that's a circular justification.

I think if we end up thrashing on this type of loop for a long time,
the system will already be basically dead.
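The retry shape under discussion - try a non-faulting read under mmap_sem, and on failure drop the lock, fault the page in with a sleeping read, then retry the whole fault - can be sketched with stand-ins (nothing here is kernel API; page_resident models whether the atomic read would fault):

```c
#include <assert.h>
#include <stdbool.h>

struct fault_ctx {
	bool lock_held;		/* models mmap_sem */
	bool page_resident;	/* would the atomic read succeed? */
	int retries;
};

/* Stand-in for pagefault_disable() + __get_user_inatomic(): cannot
 * sleep, fails if the page is not resident. */
static bool read_insn_atomic(struct fault_ctx *c)
{
	return c->page_resident;
}

/* Stand-in for __get_user() / fault_in_pages_readable(): may sleep,
 * faults the page in. */
static void read_insn_blocking(struct fault_ctx *c)
{
	c->page_resident = true;
}

static int handle_fault(struct fault_ctx *c)
{
retry:
	c->lock_held = true;		/* down_read(&mm->mmap_sem) */
	if (!read_insn_atomic(c)) {
		c->lock_held = false;	/* up_read() before sleeping */
		read_insn_blocking(c);
		c->retries++;
		goto retry;		/* take the fault path again */
	}
	c->lock_held = false;
	return 0;
}
```

As noted above, the loop terminates in practice because the blocking read makes the page resident; livelock here would mean the system is already thrashing badly.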


> >>/* The stack is being expanded, check if it's valid */
> >> -  if (unlikely(bad_stack_expansion(regs, address, vma, store_update_sp)))
> >> -  return bad_area(regs, address);
> >> +  is_bad = bad_stack_expansion(regs, address, vma, flags, is_retry);
> >> +  if (unlikely(is_bad == -1)) {
> >> +  is_retry = true;
> >> +  goto retry;
> >> +  }
> >> +  if (unlikely(is_bad))
> >> +  return bad_area_nosemaphore(regs, address);  
> > 
> > Suggest making the return so that you can do a single unlikely test for
> > the retry or bad case, and then distinguish the retry in there. Code
> > generation should be better.  
> 
> Ok. I'll try and come with v9 during this morning.

Thanks,
Nick


Re: [PATCH 2/4] clocksource: timer-imx-gpt: Switch to SPDX identifier

2018-05-23 Thread Daniel Lezcano
On 23/05/2018 01:05, Fabio Estevam wrote:
> From: Fabio Estevam 
> 
> Adopt the SPDX license identifier headers to ease license compliance
> management.
> 
> Signed-off-by: Fabio Estevam 
> ---
>  drivers/clocksource/timer-imx-gpt.c | 26 ++
>  1 file changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/clocksource/timer-imx-gpt.c 
> b/drivers/clocksource/timer-imx-gpt.c
> index b63b834..165fbbb 100644
> --- a/drivers/clocksource/timer-imx-gpt.c
> +++ b/drivers/clocksource/timer-imx-gpt.c
> @@ -1,23 +1,9 @@
> -/*
> - *  Copyright (C) 2000-2001 Deep Blue Solutions
> - *  Copyright (C) 2002 Shane Nay (sh...@minirl.com)
> - *  Copyright (C) 2006-2007 Pavel Pisa (pp...@pikron.com)
> - *  Copyright (C) 2008 Juergen Beisert (ker...@pengutronix.de)
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License
> - * as published by the Free Software Foundation; either version 2
> - * of the License, or (at your option) any later version.
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
> - * MA 02110-1301, USA.
> - */
> +// SPDX-License-Identifier: GPL-2.0+
> +//
> +//  Copyright (C) 2000-2001 Deep Blue Solutions
> +//  Copyright (C) 2002 Shane Nay (sh...@minirl.com)
> +//  Copyright (C) 2006-2007 Pavel Pisa (pp...@pikron.com)
> +//  Copyright (C) 2008 Juergen Beisert (ker...@pengutronix.de)

Hi Philippe,

I went through the code and didn't find any information about the format
of the lines following the SPDX, it seems it is relatively free.

Can you confirm the above changes are ok ?

Thanks

  -- Daniel


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH 4/5] acpi/processor: Fix the return value of acpi_processor_ids_walk()

2018-05-23 Thread Rafael J. Wysocki
On Wed, May 23, 2018 at 3:34 AM, Dou Liyang  wrote:
> At 05/22/2018 09:47 AM, Dou Liyang wrote:
>>
>>
>>
>> At 05/19/2018 11:06 PM, Thomas Gleixner wrote:
>>>
>>> On Tue, 20 Mar 2018, Dou Liyang wrote:
>>>
 The ACPI driver should make sure all the processor IDs in its ACPI
 namespace are unique for CPU hotplug. The driver performs a
 depth-first walk of the namespace tree and calls
 acpi_processor_ids_walk().

 But acpi_processor_ids_walk() will return true once one processor is
 checked, which causes the walk to stop after the first processor.

 Replace the value with AE_OK, which is the standard acpi_status value.

 Fixes: 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for
 processor enumeration")

 Signed-off-by: Dou Liyang 
 ---
   drivers/acpi/acpi_processor.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/drivers/acpi/acpi_processor.c
 b/drivers/acpi/acpi_processor.c
 index 449d86d39965..db5bdb59639c 100644
 --- a/drivers/acpi/acpi_processor.c
 +++ b/drivers/acpi/acpi_processor.c
 @@ -663,11 +663,11 @@ static acpi_status __init (acpi_handle handle,
   }
   processor_validated_ids_update(uid);
 -return true;
 +return AE_OK;
   err:
   acpi_handle_info(handle, "Invalid processor object\n");
 -return false;
 +return AE_OK;
>>>
>>>
>>> I'm not sure whether this is the right return value here. Rafael?
>>>
>
> +Cc Rafael's common used email address.
>
> I am sorry, I created the cc list using ./script/get_maintainers.pl ...
> and didn't check it.

No worries, I saw your messages, but thanks!


Re: [PATCH v2 1/5] gpio: syscon: allow fetching syscon from parent node

2018-05-23 Thread Linus Walleij
On Fri, May 18, 2018 at 5:52 AM,   wrote:

> From: Heiko Stuebner 
>
> Syscon nodes can be a simple-mfd and the syscon-users then be declared
> as children of this node. That way the parent-child structure can be
> better represented for devices that are fully embedded in the syscon.
>
> Therefore allow getting the syscon from the parent if neither
> a special compatible nor a gpio,syscon-dev property is defined.
>
> Signed-off-by: Heiko Stuebner 
> Signed-off-by: Levin Du 
> ---
>
> Changes in v2: None
> Changes in v1:
> - New: allow fetching syscon from parent node in gpio-syscon driver

Regardless of what happens with the rest of the patches this
looks sane and generally useful, so patch applied!

Yours,
Linus Walleij


Re: [PATCH] mm: save two stranding bit in gfp_mask

2018-05-23 Thread kbuild test robot
Hi Shakeel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]
[also build test WARNING on v4.17-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Shakeel-Butt/mm-save-two-stranding-bit-in-gfp_mask/20180518-202316
base:   git://git.cmpxchg.org/linux-mmotm.git master


vim +/jl +2585 fs/reiserfs/journal.c

^1da177e Linus Torvalds 2005-04-16  2573  
^1da177e Linus Torvalds 2005-04-16  2574  static struct reiserfs_journal_list 
*alloc_journal_list(struct super_block *s)
^1da177e Linus Torvalds 2005-04-16  2575  {
^1da177e Linus Torvalds 2005-04-16  2576struct reiserfs_journal_list 
*jl;
8c777cc4 Pekka Enberg   2006-02-01  2577jl = kzalloc(sizeof(struct 
reiserfs_journal_list),
8c777cc4 Pekka Enberg   2006-02-01  2578 GFP_NOFS | 
__GFP_NOFAIL);
^1da177e Linus Torvalds 2005-04-16  2579INIT_LIST_HEAD(&jl->j_list);
^1da177e Linus Torvalds 2005-04-16  2580
INIT_LIST_HEAD(&jl->j_working_list);
^1da177e Linus Torvalds 2005-04-16  2581
INIT_LIST_HEAD(&jl->j_tail_bh_list);
^1da177e Linus Torvalds 2005-04-16  2582INIT_LIST_HEAD(&jl->j_bh_list);
90415dea Jeff Mahoney   2008-07-25  2583mutex_init(&jl->j_commit_mutex);
^1da177e Linus Torvalds 2005-04-16  2584SB_JOURNAL(s)->j_num_lists++;
^1da177e Linus Torvalds 2005-04-16 @2585get_journal_list(jl);
^1da177e Linus Torvalds 2005-04-16  2586return jl;
^1da177e Linus Torvalds 2005-04-16  2587  }
^1da177e Linus Torvalds 2005-04-16  2588  

:: The code at line 2585 was first introduced by commit
:: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:: TO: Linus Torvalds 
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH v3 5/6] spi: at91-usart: add driver for at91-usart as spi

2018-05-23 Thread Radu Pirea



On 05/17/2018 08:04 AM, Mark Brown wrote:

On Fri, May 11, 2018 at 01:38:21PM +0300, Radu Pirea wrote:


+config SPI_AT91_USART
+tristate "Atmel USART Controller as SPI"
+   depends on HAS_DMA
+   depends on (ARCH_AT91 || COMPILE_TEST)
+select MFD_AT91_USART
+   help
+ This selects a driver for the AT91 USART Controller as SPI Master,
+ present on AT91 and SAMA5 SoC series.
+


This looks like there's some tab/space mixing going on here.


+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for AT91 USART Controllers as SPI
+ *
+ * Copyright (C) 2018 Microchip Technology Inc.


Make the entire block a C++ comment so it looks more intentional rather
than mixing C and C++.


Hi Mark,

I know it's ugly, but the SPDX license identifier must be in a separate 
comment block.





+static inline void at91_usart_spi_tx(struct at91_usart_spi *aus)
+{
+   unsigned int len = aus->current_transfer->len;
+   unsigned int remaining = aus->current_tx_remaining_bytes;
+   const u8  *tx_buf = aus->current_transfer->tx_buf;
+
+   if (tx_buf && remaining) {
+   if (at91_usart_spi_tx_ready(aus))
+   spi_writel(aus, THR, tx_buf[len - remaining]);
+   aus->current_tx_remaining_bytes--;


Missing braces here - we only write to the FIFO if there's space but we
unconditionally decrement the counter.



Thanks. I will fix it.
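For illustration, the fix Mark asks for amounts to moving the decrement inside the fifo-ready branch. A user-space sketch with invented names (not the driver's structures; fifo_ready stands in for at91_usart_spi_tx_ready(), last_written for the THR register write):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct xfer {
	const uint8_t *tx_buf;
	unsigned int len;
	unsigned int remaining;
	uint8_t last_written;
	bool fifo_ready;
};

static void spi_tx_step(struct xfer *x)
{
	if (x->tx_buf && x->remaining) {
		if (x->fifo_ready) {
			x->last_written = x->tx_buf[x->len - x->remaining];
			x->remaining--;	/* decrement only when a byte
					 * was actually written */
		}
	} else if (x->fifo_ready) {
		x->last_written = 0xff;	/* dummy byte */
	}
}
```

Without the inner braces, the counter would drop even when the FIFO had no space, silently skipping bytes of the transfer.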


+   } else {
+   if (at91_usart_spi_tx_ready(aus))
+   spi_writel(aus, THR, US_DUMMY_TX);
+   }
+}


This looks like you're open coding SPI_CONTROLLER_MUST_TX


+   int len = aus->current_transfer->len;
+   int remaining = aus->current_rx_remaining_bytes;
+   u8  *rx_buf = aus->current_transfer->rx_buf;
+
+   if (aus->current_rx_remaining_bytes) {
+   rx_buf[len - remaining] = spi_readb(aus, RHR);
+   aus->current_rx_remaining_bytes--;
+   } else {
+   spi_readb(aus, RHR);
+   }


Similarly for _MUST_RX.


+   controller->flags = SPI_MASTER_MUST_RX | SPI_MASTER_MUST_TX;


You're actually setting both flags...  this means that the handling for
cases with missing TX or RX buffers can't happen.


Sorry. My mistake. I will remove unnecessary code.


Re: [PATCH 04/10] vfio: ccw: replace IO_REQ event with SSCH_REQ event

2018-05-23 Thread Cornelia Huck
On Wed, 23 May 2018 09:50:00 +0200
Pierre Morel  wrote:

> On 22/05/2018 17:41, Cornelia Huck wrote:
> > On Fri, 4 May 2018 13:02:36 +0200
> > Pierre Morel  wrote:
> >  
> >> On 04/05/2018 03:19, Dong Jia Shi wrote:  
> >>> * Pierre Morel  [2018-05-03 16:26:29 +0200]:
> >>> 
>  On 02/05/2018 09:46, Dong Jia Shi wrote:  
> > * Cornelia Huck  [2018-04-30 17:33:05 +0200]:
> > 
> >> On Thu, 26 Apr 2018 15:48:06 +0800
> >> Dong Jia Shi  wrote:
> >> 
> >>> * Dong Jia Shi  [2018-04-26 15:30:54 
> >>> +0800]:
> >>>
> >>> [...]
> >>> 
> > @@ -179,7 +160,7 @@ static int fsm_irq(struct vfio_ccw_private 
> > *private,
> > if (private->io_trigger)
> > eventfd_signal(private->io_trigger, 1);
> >
> > -   return private->state;
> > +   return VFIO_CCW_STATE_IDLE;  
>  This is not right. For example, if we are in STANDBY state (subch 
>  driver
>  is probed, but mdev device is not created), we can not jump to IDLE
>  state.  
> >>> I see my problem, for STANDBY state, we should introduce another event
> >>> callback for VFIO_CCW_EVENT_INTERRUPT. It doesn't make sense to call
> >>> fsm_irq() which tries to signal userspace with interrupt notification
> >>> when mdev is not created yet... So we'd need a separated fix for this
> >>> issue too.  
> >> But how do we even get into that situation when we don't have an mdev
> >> yet?
> >> 
> > We can't... So let's assign fsm_nop() as the interrupt callback for
> > STANDBY state?
> > 
>  :) Isn't it exactly what my patch series handle?  
> >>> As far as I see, that's not true. ;)
> >>>
> >>> After this series applied,
> >>> vfio_ccw_jumptable[VFIO_CCW_STATE_STANDBY][VFIO_CCW_EVENT_INTERRUPT] is
> >>> still fsm_irq().
> >>> 
> >>
> >> What I mean is, this code tries to handle design problems
> >> without changing too much of the original code at first.
> >>
> >> The problem here is not that the fsm_irq function is called on interrupt,
> >> if we have an interrupt it must be signaled to user land.
> >> The problem is that this state is entered at the wrong moment.
> >>
> >> STANDBY should be entered, during the mdev_open when we realize the QEMU
> >> device,
> >> and not during the probe, in which we should stay in NOT_OPER until we
> >> get the QEMU device.
> >>
> >> The probe() and mdev_open() function should be modified, not the state
> >> table.  
> > So, the takeaway is that we should handle starting via the init
> > callbacks and not via the state machine?
> >  
> hum, sorry, I think that my previous answer was not completely right,
> and did not really answer to Dong Jia comment, yes fsm_irq was not
> at its place, thinking again about the comments of both of you
> I think that we can suppress the INIT event.
> 
> I would like to rebase the patch to include the comments you both did.
> 
> 

Yes, a respin is probably best before we get confused even more :)


Re: [PATCH] PCI / PM: Do not clear state_saved for devices that remain suspended

2018-05-23 Thread Rafael J. Wysocki
On Wed, May 23, 2018 at 12:01 AM, Bjorn Helgaas  wrote:
> On Fri, May 18, 2018 at 10:17:42AM +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki 
>>
>> The state_saved flag should not be cleared in pci_pm_suspend() if the
>> given device is going to remain suspended, or the device's config
>> space will not be restored properly during the subsequent resume.
>>
>> Namely, if the device is going to stay in suspend, both the late
>> and noirq callbacks return early for it, so if its state_saved flag
>> is cleared in pci_pm_suspend(), it will remain unset throughout the
>> remaining part of suspend and resume and pci_restore_state() called
>> for the device going forward will return without doing anything.
>>
>> For this reason, change pci_pm_suspend() to only clear state_saved
>> if the given device is not going to remain suspended.  [This is
>> analogous to what commit ae860a19f37c (PCI / PM: Do not clear
>> state_saved in pci_pm_freeze() when smart suspend is set) did for
>> hibernation.]
>>
>> Fixes: c4b65157aeef (PCI / PM: Take SMART_SUSPEND driver flag into account)
>> Signed-off-by: Rafael J. Wysocki 
>
> Acked-by: Bjorn Helgaas 
>
> I assume you'll take this one, too.

Yes, I will, thank you!


Re: [PATCH v1] MIPS: PCI: Use dev_printk() when possible

2018-05-23 Thread James Hogan
On Tue, May 22, 2018 at 08:11:42AM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> Use the pci_info() and pci_err() wrappers for dev_printk() when possible.
> 
> Signed-off-by: Bjorn Helgaas 
> ---
>  arch/mips/pci/pci-legacy.c |7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c
> index 0c65c38e05d6..73643e80f02d 100644
> --- a/arch/mips/pci/pci-legacy.c
> +++ b/arch/mips/pci/pci-legacy.c
> @@ -263,9 +263,7 @@ static int pcibios_enable_resources(struct pci_dev *dev, int mask)
>   (!(r->flags & IORESOURCE_ROM_ENABLE)))
>   continue;
>   if (!r->start && r->end) {
> - printk(KERN_ERR "PCI: Device %s not available "
> -"because of resource collisions\n",
> -pci_name(dev));
> + pci_err(dev, "can't enable device: resource collisions\n");

The pedantic side of me wants to point out that you could wrap that line
after the comma to keep it within 80 columns.

Either way though:
Acked-by: James Hogan 

Cheers
James




Re: [PATCH v2] schedutil: Allow cpufreq requests to be made even when kthread kicked

2018-05-23 Thread Rafael J. Wysocki
On Wed, May 23, 2018 at 12:09 AM, Joel Fernandes  wrote:
> On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
>> Okay, me and Rafael were discussing this patch, locking and races around 
>> this.
>>
>> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>> > diff --git a/kernel/sched/cpufreq_schedutil.c 
>> > b/kernel/sched/cpufreq_schedutil.c
>> > index e13df951aca7..5c482ec38610 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy 
>> > *sg_policy, u64 time)
>> > !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> > return false;
>> >
>> > -   if (sg_policy->work_in_progress)
>> > -   return false;
>> > -
>> > if (unlikely(sg_policy->need_freq_update)) {
>> > sg_policy->need_freq_update = false;
>> > /*
>> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy 
>> > *sg_policy, u64 time,
>> >
>> > policy->cur = next_freq;
>> > trace_cpu_frequency(next_freq, smp_processor_id());
>> > -   } else {
>> > +   } else if (!sg_policy->work_in_progress) {
>> > sg_policy->work_in_progress = true;
>> > irq_work_queue(&sg_policy->irq_work);
>> > }
>> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct 
>> > update_util_data *hook, u64 time,
>> >
>> > ignore_dl_rate_limit(sg_cpu, sg_policy);
>> >
>> > +   /*
>> > +* For slow-switch systems, single policy requests can't run at the
>> > +* moment if update is in progress, unless we acquire update_lock.
>> > +*/
>> > +   if (sg_policy->work_in_progress)
>> > +   return;
>> > +
>> > if (!sugov_should_update_freq(sg_policy, time))
>> > return;
>> >
>> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, 
>> > u64 time, unsigned int flags)
>> >  static void sugov_work(struct kthread_work *work)
>> >  {
>> > struct sugov_policy *sg_policy = container_of(work, struct 
>> > sugov_policy, work);
>> > +   unsigned int freq;
>> > +   unsigned long flags;
>> > +
>> > +   /*
>> > +* Hold sg_policy->update_lock shortly to handle the case where:
>> > +* incase sg_policy->next_freq is read here, and then updated by
>> > +* sugov_update_shared just before work_in_progress is set to false
>> > +* here, we may miss queueing the new update.
>> > +*
>> > +* Note: If a work was queued after the update_lock is released,
>> > +* sugov_work will just be called again by kthread_work code; and the
>> > +* request will be proceed before the sugov thread sleeps.
>> > +*/
>> > +   raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
>> > +   freq = sg_policy->next_freq;
>> > +   sg_policy->work_in_progress = false;
>> > +   raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>> >
>> > mutex_lock(&sg_policy->work_lock);
>> > -   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
>> > -   CPUFREQ_RELATION_L);
>> > +   __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
>> > mutex_unlock(&sg_policy->work_lock);
>> > -
>> > -   sg_policy->work_in_progress = false;
>> >  }
>>
>> And I do see a race here for single policy systems doing slow switching.
>>
>> Kthread                             Sched update
>>
>> sugov_work()                        sugov_update_single()
>>
>> lock();
>> // The CPU is free to rearrange below
>> // two in any order, so it may clear
>> // the flag first and then read next
>> // freq. Lets assume it does.
>> work_in_progress = false
>>
>>                                     if (work_in_progress)
>>                                             return;
>>
>>                                     sg_policy->next_freq = 0;
>> freq = sg_policy->next_freq;
>>                                     sg_policy->next_freq = real-next-freq;
>> unlock();
>>
>
> I agree with the race you describe for single policy slow-switch. Good find :)
>
> The mainline sugov_work could also do such reordering in sugov_work, I think. 
> Even
> with the mutex_unlock in mainline's sugov_work, that work_in_progress write 
> could
> be reordered by the CPU to happen before the read of next_freq. AIUI,
> mutex_unlock is expected to be only a release-barrier.
>
> Although to be safe, I could just put an smp_mb() there. I believe with that,
> no locking would be needed for such case.

Yes, but leaving the work_in_progress check in sugov_update_single()
means that the original problem is still there in the one-CPU policy
case.  Namely, utilization updates coming in between setting
work_in_progress in sugov_update_commit() and clearing it in
sugov_work() will be discarded in the one-CPU policy case.

Re: [PATCH v3] arm64: allwinner: a64: Add Amarula A64-Relic initial support

2018-05-23 Thread Maxime Ripard
On Wed, May 23, 2018 at 11:44:56AM +0530, Jagan Teki wrote:
> On Tue, May 22, 2018 at 8:00 PM, Maxime Ripard
>  wrote:
> > On Tue, May 22, 2018 at 06:52:28PM +0530, Jagan Teki wrote:
> >> Amarula A64-Relic is Allwinner A64 based IoT device, which support
> >> - Allwinner A64 Cortex-A53
> >> - Mali-400MP2 GPU
> >> - AXP803 PMIC
> >> - 1GB DDR3 RAM
> >> - 8GB eMMC
> >> - AP6330 Wifi/BLE
> >> - MIPI-DSI
> >> - CSI: OV5640 sensor
> >> - USB OTG
> >
> > You claim that this is doing OTG...
> >
> > [..]
> >
> >> +&usb_otg {
> >> + dr_mode = "peripheral";
> >> + status = "okay";
> >> +};
> >
> > ... and yet you're setting it as peripheral...
> 
Though it claims OTG, the board doesn't have any USB ports to operate (not
even Mini-AB); the only way to use the board is as a peripheral, to transfer
images from the host.

I'm not sure what you mean here. If there's no USB connector, why do
you even enable it?

maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com




Re: [PATCH] mm: save two stranding bit in gfp_mask

2018-05-23 Thread Michal Hocko
On Wed 23-05-18 16:08:28, kbuild test robot wrote:
> Hi Shakeel,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on mmotm/master]
> [also build test WARNING on v4.17-rc6]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Shakeel-Butt/mm-save-two-stranding-bit-in-gfp_mask/20180518-202316
> base:   git://git.cmpxchg.org/linux-mmotm.git master
> 

What is the warning? Btw. this smells like a failure in the script of
some sort. The patch you are referring to doesn't really change any code
except using different values for gfp constants, which shouldn't make
any difference to any code.

> vim +/jl +2585 fs/reiserfs/journal.c
> 
> ^1da177e Linus Torvalds 2005-04-16  2573  
> ^1da177e Linus Torvalds 2005-04-16  2574  static struct reiserfs_journal_list 
> *alloc_journal_list(struct super_block *s)
> ^1da177e Linus Torvalds 2005-04-16  2575  {
> ^1da177e Linus Torvalds 2005-04-16  2576  struct reiserfs_journal_list 
> *jl;
> 8c777cc4 Pekka Enberg   2006-02-01  2577  jl = kzalloc(sizeof(struct 
> reiserfs_journal_list),
> 8c777cc4 Pekka Enberg   2006-02-01  2578   GFP_NOFS | 
> __GFP_NOFAIL);
> ^1da177e Linus Torvalds 2005-04-16  2579  INIT_LIST_HEAD(&jl->j_list);
> ^1da177e Linus Torvalds 2005-04-16  2580  
> INIT_LIST_HEAD(&jl->j_working_list);
> ^1da177e Linus Torvalds 2005-04-16  2581  
> INIT_LIST_HEAD(&jl->j_tail_bh_list);
> ^1da177e Linus Torvalds 2005-04-16  2582  INIT_LIST_HEAD(&jl->j_bh_list);
> 90415dea Jeff Mahoney   2008-07-25  2583  mutex_init(&jl->j_commit_mutex);
> ^1da177e Linus Torvalds 2005-04-16  2584  SB_JOURNAL(s)->j_num_lists++;
> ^1da177e Linus Torvalds 2005-04-16 @2585  get_journal_list(jl);
> ^1da177e Linus Torvalds 2005-04-16  2586  return jl;
> ^1da177e Linus Torvalds 2005-04-16  2587  }
> ^1da177e Linus Torvalds 2005-04-16  2588  
> 
> :: The code at line 2585 was first introduced by commit
> :: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2
> 
> :: TO: Linus Torvalds 
> :: CC: Linus Torvalds 
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v6 2/6] dt-bindings: Add the rzn1-clocks.h file

2018-05-23 Thread M P
Morning Geert,

On Wed, 23 May 2018 at 08:26, Geert Uytterhoeven 
wrote:

> Hi Michel,

> On Wed, May 23, 2018 at 8:44 AM, M P  wrote:
> > On Tue, 22 May 2018 at 19:44, Geert Uytterhoeven 
> > wrote:
> >> On Tue, May 22, 2018 at 12:01 PM, Michel Pollet
> >>  wrote:
> >> > This adds the constants necessary to use the renesas,rzn1-clocks driver.
> >> >
> >> > Signed-off-by: Michel Pollet 

> >> > --- /dev/null
> >> > +++ b/include/dt-bindings/clock/rzn1-clocks.h
> >
> >> Given this is part of the DT ABI, and there exist multiple different RZ/N1
> >> SoCs (and there are probably planned more), I wouldn't call this header
> >> file "rzn1-clocks.h", but e.g. "r9a06g032-clocks.h".
> >
> > Actually, no, there already are two r906g03X devices that will work
> > perfectly fine with this driver. We had that discussion before, and you
> > insist on me removing mentions of the rzn1 everywhere; however, this
> > applies to *two* devices already, and I'm supposed to upstream support for
> > them. I can't rename it r9a06g032 because it is *inexact*; that's why it's

> My worry is not that there are two r906g03X devices that will work fine
> with this driver, but that there will be other "rzn1" devices that will not
> work with these bindings (the header file is part of the bindings).
> Besides, RZ/N1D and RZ/N1S (Which apparently differ in packaging only?
> Oh no, RZ/N1D (the larger package) has less QSPI channels than RZ/N1S
> (the smaller package)), there's also (at least) RZ/N1L.

> > called rzn1. So unless you let me call it r9a06g0xx-clocks.h (which I know
> > you won't as per multiple previous discussions) this can't be called
> > r9a06g032 because it won't be fit for my purpose when I try to bring back
> > the RZ/N1S into the picture.

> You can add r9a06g033-clocks.h when adding support for RZ/N1S.

So it is now acceptable to duplicate a huge amount of code and constants
when in fact the differences are so minor that a minimal amount of code
would take care of them? That just flies straight against my
30+ years of programming -- We're going to have twice the *identical* code,
twice the header, and completely incompatible device tree files -- I mean,
*right now* our rzn1.dtsi works *as is* on the 1D and 1S, we've got ONE
file to maintain, and you can switch your CPU board from 1D to 1S and your
'board file' can stay the same.

Wasn't it the idea of that stuff in the first place? Isn't it in the
customer/engineer interest to be able to cross grade from one
manufacturer's device *in the same series* to another without having to
duplicate his whole board file?

> > There are minor difference to clocking,

> Aha?

Sure, 1S doesn't have DDR, 1D doesn't have the second QSPI. That's about
it (I lie, there's a few other bits I'm sure). It's not like it won't even
*work* or anything, the registers are there, the bit positions are there,
all is the same, I'm *sure* that's what the compatible="" thing was
supposed to be used for, isn't it? Heck, I'm pretty sure there's a register
in sysctrl, that tells me that anyway, so I wouldn't even have to have a
special compatible= -- I didn't do it since the driver is already so big.


> > I don't know if Renesas plans to release any more rzn1's in this series,
> > but my little finger tells me this isn't the case. But regardless of what

> We thought the same thing when the first RZ member (RZ/A1H) showed up.
> Did we know this was not going to be the first SoC of a new RZ family, but
> the first SoC of the first subfamily (RZ/A) of the RZ family... And the
> various subfamilies bear not much similarity.

> > we plan, Marketing will screw it up.

> Correct. And to mitigate that, we have no other choice than to use the real
> part numbers to differentiate. Once bitten, twice shy.

It's not mitigation from where I stand -- it's a gigantic kludge; to handle
one exception, you throw away the baby with the bathwater. From where I
sit, it's like having to use a different screwdriver for the screws on the left
of a panel vs the right of the panel.

Sorry to come out as pretty miffed -- I've just spent weeks polishing up a
driver to make it more or less similar to what they were 10 years ago (whoo
look a platform file with a big table in it!), after throwing away all the
work I had done to make it all device-tree based and make the code as
agnostic as we could -- and now it turns out we need to make it even worse
by throwing away the fact it actually *does* work on two SoCs -- and that
just because... because what, again?

What about *making up names* -- The 'family names' can/will change -- the
part numbers are *too limited in scope* -- why not just make up names? does
it matter as long as it's close to reality and it's documented? I dunno,
"rzn1_18" or "rzn1_mk1" and so we have a way out when they release a new
one next year? It seems to be working fine for cars "I got a 2018's
 "...

Cheers,
Michel



> Gr{oetje,eeting}s,

>  Geert

> --
> Geert Uytterhoeven --

Re: [PATCH 04/10] vfio: ccw: replace IO_REQ event with SSCH_REQ event

2018-05-23 Thread Pierre Morel

On 22/05/2018 17:38, Cornelia Huck wrote:

[still backlog processing...]

On Thu, 3 May 2018 14:06:51 +0200
Pierre Morel  wrote:


On 30/04/2018 17:30, Cornelia Huck wrote:

On Wed, 25 Apr 2018 15:52:19 +0200
Pierre Morel  wrote:
  

On 25/04/2018 10:41, Cornelia Huck wrote:

On Thu, 19 Apr 2018 16:48:07 +0200
Pierre Morel  wrote:

diff --git a/drivers/s390/cio/vfio_ccw_private.h 
b/drivers/s390/cio/vfio_ccw_private.h
index 3284e64..93aab87 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -76,7 +76,7 @@ enum vfio_ccw_state {
 */
enum vfio_ccw_event {
VFIO_CCW_EVENT_NOT_OPER,
-   VFIO_CCW_EVENT_IO_REQ,
+   VFIO_CCW_EVENT_SSCH_REQ,
VFIO_CCW_EVENT_INTERRUPT,
VFIO_CCW_EVENT_SCH_EVENT,
/* last element! */

I don't think we should separate the ssch handling. The major
difference to halt/clear is that it needs channel program translation.
Everything else (issuing the instruction and processing the interrupt)
are basically the same. If we just throw everything at the hardware
and let the host's channel subsystem figure it out, we already should
be fine with regard to most of the races.

We must test at one moment or another the kind of request we do;
cancel, halt and clear only need the subchannel id in register 1 and, as
you said, are much more direct to implement.

If we do not separate them here, we need a switch in the "do_io_request"
function.
Is it what you mean?

Yes. Most of the handling should be the same for any function.

I really don't know, the 4 functions are quite different.

- SSCH uses an ORB, and has a quite long kernel execution time for VFIO
- there is a race between SSCH and the others instructions
- XSCH makes subchannel no longer start pending, also reset the busy
indications
- CSCH cancels both SSCH and HSCH instruction, and perform path management
- HSCH has different busy (entry) conditions

Roughly speaking, we have two categories: An asynchronous function is
performed (SSCH, HSCH, CSCH) or not (XSCH). So I would split out XSCH
in any case.

SSCH, HSCH, CSCH all perform path management. I see them as kind of
escalating (i.e. CSCH 'beats' HSCH which 'beats' SSCH). I think they
are all similar enough, though, as we can call through to the real
hardware and have it sorted out there.

Looking through the channel I/O instructions:
- RSCH should be handled with SSCH (as a special case).
- MSCH should also be handled in the long run, STSCH as well.
- SCHM is interesting, as it's not per-subchannel. We have some basic
   handling of the instruction in QEMU, but it only emulates some ssch
   counters and completely lacks support for the other fields.
- IIRC, there's also a CHSC command dealing with channel monitoring. We
   currently fence off any CHSC that is not needed for Linux to run, but
   there are some that might be useful for the guest (path handling
   etc.) Hard to come to a conclusion here without access to the
   documentation.
- I don't think we need to care about TSCH (other than keeping the
   schib up to date, which we also need to do for STSCH).
- Likewise, TPI should be handled via emulation.

Coming back to the original issue, I think we can easily handle SSCH
(and RSCH), HSCH and CSCH together (with the actual hardware doing the
heavy lifting anyway). For other instructions, we need separate
states/processing.



OK, I'll make the next version with this in mind.

Thanks

Pierre


--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



linux-kernel@vger.kernel.org

2018-05-23 Thread Peter Zijlstra
On Tue, May 22, 2018 at 02:31:42PM -0700, Linus Torvalds wrote:
> On Tue, May 22, 2018 at 2:17 PM Peter Zijlstra  wrote:
> 
> > qrwlock is a fair lock and should not exhibit writer starvation.
> 
> We actually have a special rule to make it *not* be fair, in that
> interrupts are allowed to take the read lock if there are readers - even if
> there are waiting writers.

Urgh, right.. would be interesting to know how much of that is happening
in that workload. I assumed the readers were mostly due to the syscalls
the reporter talked about, and those should not trigger that case.

> > You basically want to spin-wait with interrupts enabled, right?
> 
> That was the intent of my (untested) pseudo-code. It should work fine. Note
> that I used write_trylock() only, so there is no queueing (which also
> implies no fairness).
> 
> I'm not saying it's a _good_ idea.  I'm saying it might work if all you
> worry about is the irq-disabled part.

Right, if you make it unfair and utterly prone to starvation then yes,
you can make it 'work'.


Re: [PATCH v5 1/3] ARM: dts: tegra: Remove skeleton.dtsi and fix DTC warnings for /memory

2018-05-23 Thread Stefan Agner
On 23.05.2018 09:05, Krzysztof Kozlowski wrote:
> On Thu, May 17, 2018 at 1:39 PM, Stefan Agner  wrote:
>> On 17.05.2018 09:45, Krzysztof Kozlowski wrote:
>>> Remove the usage of skeleton.dtsi and add necessary properties to /memory
>>> node to fix the DTC warnings:
>>>
>>> arch/arm/boot/dts/tegra20-harmony.dtb: Warning (unit_address_vs_reg):
>>> /memory: node has a reg or ranges property, but no unit name
>>>
>>> The DTB after the change is the same as before except adding
>>> unit-address to /memory node.
>>>
>>> Signed-off-by: Krzysztof Kozlowski 
>>>
>>> ---
>>>
>>> Changes since v4:
>>> 1. None
>>> ---
>>>  arch/arm/boot/dts/tegra114-dalmore.dts  | 3 ++-
>>>  arch/arm/boot/dts/tegra114-roth.dts | 3 ++-
>>>  arch/arm/boot/dts/tegra114-tn7.dts  | 3 ++-
>>>  arch/arm/boot/dts/tegra114.dtsi | 4 ++--
>>>  arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi | 3 ++-
>>>  arch/arm/boot/dts/tegra124-apalis.dtsi  | 3 ++-
>>>  arch/arm/boot/dts/tegra124-jetson-tk1.dts   | 3 ++-
>>>  arch/arm/boot/dts/tegra124-nyan.dtsi| 3 ++-
>>>  arch/arm/boot/dts/tegra124-venice2.dts  | 3 ++-
>>>  arch/arm/boot/dts/tegra124.dtsi | 2 --
>>>  arch/arm/boot/dts/tegra20-colibri-512.dtsi  | 3 ++-
>>>  arch/arm/boot/dts/tegra20-harmony.dts   | 3 ++-
>>>  arch/arm/boot/dts/tegra20-paz00.dts | 3 ++-
>>>  arch/arm/boot/dts/tegra20-seaboard.dts  | 3 ++-
>>>  arch/arm/boot/dts/tegra20-tamonten.dtsi | 3 ++-
>>>  arch/arm/boot/dts/tegra20-trimslice.dts | 3 ++-
>>>  arch/arm/boot/dts/tegra20-ventana.dts   | 3 ++-
>>>  arch/arm/boot/dts/tegra20.dtsi  | 7 +--
>>>  arch/arm/boot/dts/tegra30-apalis.dtsi   | 5 +
>>>  arch/arm/boot/dts/tegra30-beaver.dts| 3 ++-
>>>  arch/arm/boot/dts/tegra30-cardhu.dtsi   | 3 ++-
>>>  arch/arm/boot/dts/tegra30-colibri.dtsi  | 3 ++-
>>>  arch/arm/boot/dts/tegra30.dtsi  | 7 +--
>>>  23 files changed, 53 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/arch/arm/boot/dts/tegra114-dalmore.dts
>>> b/arch/arm/boot/dts/tegra114-dalmore.dts
>>> index eafff16765b4..5cdcedfc19cb 100644
>>> --- a/arch/arm/boot/dts/tegra114-dalmore.dts
>>> +++ b/arch/arm/boot/dts/tegra114-dalmore.dts
>>> @@ -23,7 +23,8 @@
>>>   stdout-path = "serial0:115200n8";
>>>   };
>>>
>>> - memory {
>>> + memory@8000 {
>>> + device_type = "memory";
>>>   reg = <0x8000 0x4000>;
>>>   };
>>>
>>> diff --git a/arch/arm/boot/dts/tegra114-roth.dts
>>> b/arch/arm/boot/dts/tegra114-roth.dts
>>> index 7ed7370ee67a..b4f329a07c60 100644
>>> --- a/arch/arm/boot/dts/tegra114-roth.dts
>>> +++ b/arch/arm/boot/dts/tegra114-roth.dts
>>> @@ -28,7 +28,8 @@
>>>   };
>>>   };
>>>
>>> - memory {
>>> + memory@8000 {
>>> + device_type = "memory";
>>>   /* memory >= 0x7960 is reserved for firmware usage */
>>>   reg = <0x8000 0x7960>;
>>>   };
>>> diff --git a/arch/arm/boot/dts/tegra114-tn7.dts
>>> b/arch/arm/boot/dts/tegra114-tn7.dts
>>> index 7fc4a8b31e45..12092d344ce8 100644
>>> --- a/arch/arm/boot/dts/tegra114-tn7.dts
>>> +++ b/arch/arm/boot/dts/tegra114-tn7.dts
>>> @@ -28,7 +28,8 @@
>>>   };
>>>   };
>>>
>>> - memory {
>>> + memory@8000 {
>>> + device_type = "memory";
>>>   /* memory >= 0x37e0 is reserved for firmware usage */
>>>   reg = <0x8000 0x37e0>;
>>>   };
>>> diff --git a/arch/arm/boot/dts/tegra114.dtsi 
>>> b/arch/arm/boot/dts/tegra114.dtsi
>>> index 0e4a13295d8a..b917784d3f97 100644
>>> --- a/arch/arm/boot/dts/tegra114.dtsi
>>> +++ b/arch/arm/boot/dts/tegra114.dtsi
>>> @@ -5,11 +5,11 @@
>>>  #include 
>>>  #include 
>>>
>>> -#include "skeleton.dtsi"
>>> -
>>>  / {
>>>   compatible = "nvidia,tegra114";
>>>   interrupt-parent = <&lic>;
>>> + #address-cells = <1>;
>>> + #size-cells = <1>;
>>>
>>>   host1x@5000 {
>>>   compatible = "nvidia,tegra114-host1x", "simple-bus";
>>> diff --git a/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>>> b/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>>> index bb67edb016c5..80b52c612891 100644
>>> --- a/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>>> +++ b/arch/arm/boot/dts/tegra124-apalis-v1.2.dtsi
>>> @@ -15,7 +15,8 @@
>>>   compatible = "toradex,apalis-tk1-v1.2", "toradex,apalis-tk1",
>>>"nvidia,tegra124";
>>>
>>> - memory {
>>> + memory@0 {
>>> + device_type = "memory";
>>>   reg = <0x0 0x8000 0x0 0x8000>;
>>>   };
>>>
>>> diff --git a/arch/arm/boot/dts/tegra124-apalis.dtsi
>>> b/arch/arm/boot/dts/tegra124-apalis.dtsi
>>> index 65a2161b9b8e..3ca7601cafe9 100644
>>> --- a/arch/arm/boot/dts/tegra124-apalis.dtsi
>>> +++ b/arch/arm/boot/dts/tegra124-apalis.dtsi
>>> @@ -50,7 +50,8 @@
>>>   model = "Toradex Apalis TK1";
>>>   compatible = "toradex,apalis-tk1", 

Re: [PATCH 2/5] PCI/AER: Add sysfs stats for AER capable devices

2018-05-23 Thread Greg Kroah-Hartman
On Tue, May 22, 2018 at 03:28:02PM -0700, Rajat Jain wrote:
> Add the following AER sysfs stats to represent the counters for each
> kind of error as seen by the device:
> 
> dev_total_cor_errs
> dev_total_fatal_errs
> dev_total_nonfatal_errs

You need Documentation/ABI/ updates for new sysfs files please.

thanks,

greg k-h


Re: [PATCH 5/5] Documentation/PCI: Add details of PCI AER statistics

2018-05-23 Thread Greg Kroah-Hartman
On Tue, May 22, 2018 at 03:28:05PM -0700, Rajat Jain wrote:
> Add the PCI AER statistics details to
> Documentation/PCI/pcieaer-howto.txt
> 
> Signed-off-by: Rajat Jain 
> ---
>  Documentation/PCI/pcieaer-howto.txt | 35 +
>  1 file changed, 35 insertions(+)
> 
> diff --git a/Documentation/PCI/pcieaer-howto.txt 
> b/Documentation/PCI/pcieaer-howto.txt
> index acd06bb8..86ee9f9ff5e1 100644
> --- a/Documentation/PCI/pcieaer-howto.txt
> +++ b/Documentation/PCI/pcieaer-howto.txt
> @@ -73,6 +73,41 @@ In the example, 'Requester ID' means the ID of the device 
> who sends
>  the error message to root port. Pls. refer to pci express specs for
>  other fields.
>  
> +2.4 AER statistics
> +
> +When AER messages are captured, the statistics are exposed via the following
> +sysfs attributes under the "aer_stats" folder for the device:
> +
> +2.4.1 Device sysfs Attributes
> +
> +These attributes show up under all the devices that are AER capable. These
> +indicate the errors "as seen by the device". Note that this may mean that if
> +an end point is causing problems, the AER counters may increment at its link
> +partner (e.g. root port) because the errors will be "seen" by the link 
> partner
> +and not the the problematic end point itself (which may report all counters
> +as 0 as it never saw any problems).
> +
> + * dev_total_cor_errs: number of correctable errors seen by the device.
> + * dev_total_fatal_errs: number of fatal uncorrectable errors seen by the 
> device.
> + * dev_total_nonfatal_errs: number of nonfatal uncorr errors seen by the 
> device.
> + * dev_breakdown_correctable: Provides a breakdown of different type of
> +  correctable errors seen.
> + * dev_breakdown_uncorrectable: Provides a breakdown of different type of
> +  uncorrectable errors seen.
> +
> +2.4.2 Rootport sysfs Attributes
> +
> +These attributes showup under only the rootports that are AER capable. These
> +indicate the number of error messages as "reported to" the rootport. Please 
> note
> +that the rootports also transmit (internally) the ERR_* messages for errors 
> seen
> +by the internal rootport PCI device, so these counters includes them and are
> +thus cumulative of all the error messages on the PCI hierarchy originating
> +at that root port.
> +
> + * rootport_total_cor_errs: number of ERR_COR messages reported to rootport.
> + * rootport_total_fatal_errs: number of ERR_FATAL messages reported to 
> rootport.
> + * rootport_total_nonfatal_errs: number of ERR_NONFATAL messages reporeted to
> + rootport.

These all belong in Documentation/ABI/ please.

thanks,

greg k-h


Re: [PATCH RFC] schedutil: Address the r/w ordering race in kthread

2018-05-23 Thread Rafael J. Wysocki
On Wed, May 23, 2018 at 1:50 AM, Joel Fernandes (Google)
 wrote:
> Currently there is a race in schedutil code for slow-switch single-CPU
> systems. Fix it by enforcing ordering the write to work_in_progress to
> happen before the read of next_freq.
>
> Kthread   Sched update
>
> sugov_work()  sugov_update_single()
>
>   lock();
>   // The CPU is free to rearrange below
>   // two in any order, so it may clear
>   // the flag first and then read next
>   // freq. Lets assume it does.
>   work_in_progress = false
>
>if (work_in_progress)
>  return;
>
>sg_policy->next_freq = 0;
>   freq = sg_policy->next_freq;
>sg_policy->next_freq = 
> real-freq;
>   unlock();
>
> Reported-by: Viresh Kumar 
> CC: Rafael J. Wysocki 
> CC: Peter Zijlstra 
> CC: Ingo Molnar 
> CC: Patrick Bellasi 
> CC: Juri Lelli 
> Cc: Luca Abeni 
> CC: Todd Kjos 
> CC: clau...@evidence.eu.com
> CC: kernel-t...@android.com
> CC: linux...@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) 
> ---
> I split this into separate patch, because this race can also happen in
> mainline.
>
>  kernel/sched/cpufreq_schedutil.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index 5c482ec38610..ce7749da7a44 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -401,6 +401,13 @@ static void sugov_work(struct kthread_work *work)
>  */
> raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> freq = sg_policy->next_freq;
> +
> +   /*
> +* sugov_update_single can access work_in_progress without 
> update_lock,
> +* make sure next_freq is read before work_in_progress is set.
> +*/
> +   smp_mb();
> +

This requires a corresponding barrier somewhere else.

> sg_policy->work_in_progress = false;
> raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> --

Also, as I said I actually would prefer to use the spinlock in the
one-CPU case when the kthread is used.

I'll have a patch for that shortly.


Re: [alsa-devel] [RFC/RFT PATCH] ASoC: topology: Improve backwards compatibility with v4 topology files

2018-05-23 Thread Mark Brown
On Tue, May 22, 2018 at 02:59:35PM -0500, Pierre-Louis Bossart wrote:

> I am also not convinced by the notion that maintaining topology files is
> only a userspace/distro issue. This would mean some distros will have access
> to the required topology files, possibly enabling DSP processing
> capabilities, but other will not and will not be able to enable even basic
> playback/capture. Just like we have a basic firmware with limited
> functionality in /lib/firmware/intel, it would make sense to require a basic
> .conf file in alsa-lib for every upstream machine driver - along possibly
> with a basic UCM file so that audio works no matter what distro people use.

The point here is that people should be able to update their kernel
without updating their userspace so things have to work with whatever
they have right now - anything that relies on shipping new firmware or
configuration files to userspace is a problem.




Re: [PATCH 2/5] PCI/AER: Add sysfs stats for AER capable devices

2018-05-23 Thread Greg Kroah-Hartman
On Tue, May 22, 2018 at 03:28:02PM -0700, Rajat Jain wrote:
> +#define aer_stats_aggregate_attr(field)\
> + static ssize_t \
> + field##_show(struct device *dev, struct device_attribute *attr,\
> +  char *buf)\
> +{ \
> + struct pci_dev *pdev = to_pci_dev(dev);\
> + return sprintf(buf, "0x%llx\n", pdev->aer_stats->field);   \
> +} \

Use tabs at the end please, otherwise your trailing \ look horrid.

> +static DEVICE_ATTR_RO(field)
> +
> +aer_stats_aggregate_attr(dev_total_cor_errs);
> +aer_stats_aggregate_attr(dev_total_fatal_errs);
> +aer_stats_aggregate_attr(dev_total_nonfatal_errs);
> +
> +static struct attribute *aer_stats_attrs[] __ro_after_init = {
> + &dev_attr_dev_total_cor_errs.attr,
> + &dev_attr_dev_total_fatal_errs.attr,
> + &dev_attr_dev_total_nonfatal_errs.attr,
> + NULL
> +};
> +
> +static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
> +struct attribute *a, int n)
> +{
> + struct device *dev = kobj_to_dev(kobj);
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pdev->aer_stats)
> + return 0;
> +
> + return a->mode;
> +}
> +
> +const struct attribute_group aer_stats_attr_group = {
> + .name  = "aer_stats",
> + .attrs  = aer_stats_attrs,
> + .is_visible = aer_stats_attrs_are_visible,
> +};
> +
> +void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
> +{
> + int status, i, max = -1;
> + u64 *counter = NULL;
> + struct aer_stats *aer_stats = pdev->aer_stats;
> +
> + if (unlikely(!aer_stats))
> + return;

Can you measure the speed difference with and without that unlikely()
macro?  If not, please don't use it.  Hint, the cpu and compiler are
almost always better at this than we are...

thanks,

greg k-h


Re: [PATCH 3/5] PCI/AER: Add sysfs attributes to provide breakdown of AERs

2018-05-23 Thread Greg Kroah-Hartman
On Tue, May 22, 2018 at 03:28:03PM -0700, Rajat Jain wrote:
> Add sysfs attributes to provide a breakdown of the AERs seen
> into different types of correctable or uncorrectable errors:
> 
> dev_breakdown_correctable
> dev_breakdown_uncorrectable
> 
> Signed-off-by: Rajat Jain 
> ---
>  drivers/pci/pcie/aer/aerdrv.h  |  6 ++
>  drivers/pci/pcie/aer/aerdrv_errprint.c |  6 --
>  drivers/pci/pcie/aer/aerdrv_stats.c| 25 +
>  3 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index b5d5ad6f2c03..048fbd7c9633 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -89,6 +89,12 @@ int pci_aer_stats_init(struct pci_dev *pdev);
>  void pci_aer_stats_exit(struct pci_dev *pdev);
>  void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info);
>  
> +extern const char
> +*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
> +
> +extern const char
> +*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
> +
>  #ifdef CONFIG_ACPI_APEI
>  int pcie_aer_get_firmware_first(struct pci_dev *pci_dev);
>  #else
> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 5e8b98deda08..5585f309f1a8 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -68,7 +68,8 @@ static const char *aer_error_layer[] = {
>   "Transaction Layer"
>  };
>  
> -static const char *aer_correctable_error_string[] = {
> +const char
> +*aer_correctable_error_string[AER_MAX_TYPEOF_CORRECTABLE_ERRS] = {
>   "Receiver Error",   /* Bit Position 0   */
>   NULL,
>   NULL,
> @@ -87,7 +88,8 @@ static const char *aer_correctable_error_string[] = {
>   "Header Log Overflow",  /* Bit Position 15  */
>  };
>  
> -static const char *aer_uncorrectable_error_string[] = {
> +const char
> +*aer_uncorrectable_error_string[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS] = {
>   "Undefined",/* Bit Position 0   */
>   NULL,
>   NULL,
> diff --git a/drivers/pci/pcie/aer/aerdrv_stats.c b/drivers/pci/pcie/aer/aerdrv_stats.c
> index 87b7119d0a86..5f0a6e144f56 100644
> --- a/drivers/pci/pcie/aer/aerdrv_stats.c
> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
> @@ -61,10 +61,35 @@ aer_stats_aggregate_attr(dev_total_cor_errs);
>  aer_stats_aggregate_attr(dev_total_fatal_errs);
>  aer_stats_aggregate_attr(dev_total_nonfatal_errs);
>  
> +#define aer_stats_breakdown_attr(field, stats_array, strings_array)  \
> + static ssize_t \
> + field##_show(struct device *dev, struct device_attribute *attr,\
> +  char *buf)\
> +{  \
> + unsigned int i;\
> + char *str = buf;   \
> + struct pci_dev *pdev = to_pci_dev(dev);\
> + u64 *stats = pdev->aer_stats->stats_array; \
> + for (i = 0; i < ARRAY_SIZE(strings_array); i++) {  \
> + if (strings_array[i])  \
> + str += sprintf(str, "%s = 0x%llx\n",   \
> +strings_array[i], stats[i]);\
> + }  \
> + return str-buf;\
> +}  \

Again with the tabs instead of spaces please.

thanks,

greg k-h


Re: [PATCH 6/6] scsi: Check sense buffer size at build time

2018-05-23 Thread Sergei Shtylyov

Hello!

On 5/22/2018 9:15 PM, Kees Cook wrote:


To avoid introducing problems like those fixed in commit f7068114d45e
("sr: pass down correctly sized SCSI sense buffer"), this creates a macro
wrapper for scsi_execute() that verifies the size of the sense buffer
similar to what was done for command string sizes in commit 3756f6401c30
("exec: avoid gcc-8 warning for get_task_comm").

Another solution could be to add another argument to scsi_execute(),
but this function already takes a lot of arguments and Jens was not fond
of that approach. As there was only a pair of dynamically allocated sense
buffers, this also moves those 96 bytes onto the stack to avoid triggering
the sizeof() check.

Signed-off-by: Kees Cook 
---
  drivers/scsi/scsi_lib.c|  6 +++---
  include/scsi/scsi_device.h | 12 +++-
  2 files changed, 14 insertions(+), 4 deletions(-)


[...]

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7ae177c8e399..1bb87b6c0ad2 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -426,11 +426,21 @@ extern const char *scsi_device_state_name(enum scsi_device_state);
  extern int scsi_is_sdev_device(const struct device *);
  extern int scsi_is_target_device(const struct device *);
  extern void scsi_sanitize_inquiry_string(unsigned char *s, int len);
-extern int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
+extern int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
int data_direction, void *buffer, unsigned bufflen,
unsigned char *sense, struct scsi_sense_hdr *sshdr,
int timeout, int retries, u64 flags,
req_flags_t rq_flags, int *resid);
+/* Make sure any sense buffer is the correct size. */
+#define scsi_execute(sdev, cmd, data_direction, buffer, bufflen, sense,   \
+sshdr, timeout, retries, flags, rq_flags, resid)   \
+({ \
+   BUILD_BUG_ON((sense) != NULL && \
+sizeof(sense) != SCSI_SENSE_BUFFERSIZE);   \


   This would only check the size of the 'sense' pointer, no?


+   __scsi_execute(sdev, cmd, data_direction, buffer, bufflen,  \
+  sense, sshdr, timeout, retries, flags, rq_flags, \
+  resid);  \
+})
  static inline int scsi_execute_req(struct scsi_device *sdev,
const unsigned char *cmd, int data_direction, void *buffer,
unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout,


MBR, Sergei


[PATCH -mm -V3 00/21] mm, THP, swap: Swapout/swapin THP in one piece

2018-05-23 Thread Huang, Ying
From: Huang Ying 

Hi, Andrew, could you help me to check whether the overall design is
reasonable?

Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the
swap part of the patchset?  Especially [02/21], [03/21], [04/21],
[05/21], [06/21], [07/21], [08/21], [09/21], [10/21], [11/21],
[12/21], [20/21].

Hi, Andrea and Kirill, could you help me to review the THP part of the
patchset?  Especially [01/21], [07/21], [09/21], [11/21], [13/21],
[15/21], [16/21], [17/21], [18/21], [19/21], [20/21], [21/21].

Hi, Johannes and Michal, could you help me to review the cgroup part
of the patchset?  Especially [14/21].

And for all, any comment is welcome!

This patchset is based on the 2018-05-18 head of mmotm/master.

This is the final step of the THP (Transparent Huge Page) swap
optimization.  After the first and second steps, the splitting of the
huge page is delayed from almost the first step of swapout to after
swapout has finished.  In this step, we avoid splitting the THP for
swapout and swap the THP out/in in one piece.

We tested the patchset with the vm-scalability benchmark swap-w-seq
test case, with 16 processes.  The test case forks 16 processes.  Each
process allocates a large anonymous memory range, and writes it from
begin to end for 8 rounds.  The first round will swap out, while the
remaining rounds will swap in and swap out.  The test is done on a Xeon
E5 v3 system; the swap device used is a RAM-simulated PMEM (persistent
memory) device.  The test result is as follows,

               base                    optimized
  ---------------------------  ---------------------------
       %stddev                   %change         %stddev
   1417897 ±  2%                 +992.8%   15494673        vm-scalability.throughput
   1020489 ±  4%                +1091.2%   12156349        vmstat.swap.si
   1255093 ±  3%                 +940.3%   13056114        vmstat.swap.so
   1259769 ±  7%                +1818.3%   24166779        meminfo.AnonHugePages
  28021761                       -10.7%   25018848 ±  2%   meminfo.AnonPages
  64080064 ±  4%                 -95.6%    2787565 ± 33%   interrupts.CAL:Function_call_interrupts
     13.91 ±  5%                  -13.8       0.10 ± 27%   perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath

Here, the benchmark score (bytes written per second) improved by
992.8%.  The swapout/swapin throughput improved by 1008% (from about
2.17GB/s to 24.04GB/s).  The performance difference is huge.  In the
base kernel, for the first round of writing, the THP is swapped out
and split, so in the remaining rounds there is only normal page swapin
and swapout.  In the optimized kernel, the THP is kept after the first
swapout, so THP swapin and swapout are used in the remaining rounds.
This shows the key benefit of swapping a THP out/in in one piece: the
THP will be kept instead of being split.  The meminfo information
verified this: in the base kernel only 4.5% of anonymous pages are THP
during the test, while in the optimized kernel it is 96.6%.  The TLB
flushing IPIs (represented as interrupts.CAL:Function_call_interrupts)
were reduced by 95.6%, while cycles for spinlocks were reduced from
13.9% to 0.1%.  These are performance benefits of THP swapout/swapin
too.

Below is the description for all steps of THP swap optimization.

Recently, the performance of storage devices has improved so fast that
we cannot saturate the disk bandwidth with a single logical CPU when
doing page swapping, even on a high-end server machine, because the
performance of the storage device has improved faster than that of a
single logical CPU.  It seems that this trend will not change in the
near future.  On the other hand, THP is becoming more and more popular
because of increased memory sizes.  So it becomes necessary to
optimize THP swap performance.

The advantages to swapout/swapin a THP in one piece include:

- Batch various swap operations for the THP.  Many operations need to
  be done once per THP instead of once per normal page, for example,
  allocating/freeing the swap space, writing/reading the swap space,
  flushing the TLB, page faults, etc.  This will improve the
  performance of THP swap greatly.

- The THP swap space read/write will be large sequential IO (2M on
  x86_64).  This is particularly helpful for swapin, which is usually
  4k random IO.  This will improve the performance of THP swap too.

- It will help memory fragmentation, especially when the THP is
  heavily used by applications.  The THP-order pages will be freed up
  after THP swapout.

- It will improve THP utilization on systems with swap turned on,
  because the speed at which khugepaged collapses normal pages into a
  THP is quite slow.  After the THP is split during swapout, it will
  take quite a long time for the normal pages to collapse back into a
  THP after being swapped in.  High THP utilization also helps the
  efficiency of page-based memory management.

There are some concerns regarding THP swapin, mainly because possible
enlarged read/write IO size (for swapout/swapin) may put more overhead
on th

[PATCH -mm -V3 04/21] mm, THP, swap: Support PMD swap mapping in swapcache_free_cluster()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

Previously, during swapout, all PMD page mappings would be split and
replaced with PTE swap mappings.  And when clearing the SWAP_HAS_CACHE
flag for the huge swap cluster in swapcache_free_cluster(), the huge
swap cluster would be split.  Now, during swapout, the PMD page
mapping is changed to a PMD swap mapping.  So when clearing the
SWAP_HAS_CACHE flag, the huge swap cluster will only be split if the
PMD swap mapping count is 0.  Otherwise, we will keep it as a huge
swap cluster, so that we can swap in a THP as a whole later.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/swapfile.c | 41 ++---
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 075048032383..8dbc0f9b2f90 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -514,6 +514,18 @@ static void dec_cluster_info_page(struct swap_info_struct *p,
free_cluster(p, idx);
 }
 
+#ifdef CONFIG_THP_SWAP
+static inline int cluster_swapcount(struct swap_cluster_info *ci)
+{
+   if (!ci || !cluster_is_huge(ci))
+   return 0;
+
+   return cluster_count(ci) - SWAPFILE_CLUSTER;
+}
+#else
+#define cluster_swapcount(ci)  0
+#endif
+
 /*
  * It's possible scan_swap_map() uses a free cluster in the middle of free
  * cluster list. Avoiding such abuse to avoid list corruption.
@@ -905,6 +917,7 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
struct swap_cluster_info *ci;
 
ci = lock_cluster(si, offset);
+   memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
cluster_set_count_flag(ci, 0, 0);
free_cluster(si, idx);
unlock_cluster(ci);
@@ -1288,24 +1301,30 @@ static void swapcache_free_cluster(swp_entry_t entry)
 
ci = lock_cluster(si, offset);
VM_BUG_ON(!cluster_is_huge(ci));
+   VM_BUG_ON(!is_cluster_offset(offset));
+   VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
map = si->swap_map + offset;
-   for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-   val = map[i];
-   VM_BUG_ON(!(val & SWAP_HAS_CACHE));
-   if (val == SWAP_HAS_CACHE)
-   free_entries++;
+   if (!cluster_swapcount(ci)) {
+   for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+   val = map[i];
+   VM_BUG_ON(!(val & SWAP_HAS_CACHE));
+   if (val == SWAP_HAS_CACHE)
+   free_entries++;
+   }
+   if (free_entries != SWAPFILE_CLUSTER)
+   cluster_clear_huge(ci);
}
if (!free_entries) {
-   for (i = 0; i < SWAPFILE_CLUSTER; i++)
-   map[i] &= ~SWAP_HAS_CACHE;
+   for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+   val = map[i];
+   VM_BUG_ON(!(val & SWAP_HAS_CACHE) ||
+ val == SWAP_HAS_CACHE);
+   map[i] = val & ~SWAP_HAS_CACHE;
+   }
}
-   cluster_clear_huge(ci);
unlock_cluster(ci);
if (free_entries == SWAPFILE_CLUSTER) {
spin_lock(&si->lock);
-   ci = lock_cluster(si, offset);
-   memset(map, 0, SWAPFILE_CLUSTER);
-   unlock_cluster(ci);
mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER);
swap_free_cluster(si, idx);
spin_unlock(&si->lock);
-- 
2.16.1



[PATCH -mm -V3 07/21] mm, THP, swap: Support PMD swap mapping in split_swap_cluster()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

When splitting a THP in swap cache, or when failing to allocate a THP
while swapping in a huge swap cluster, the huge swap cluster will be
split.  In addition to clearing the huge flag of the swap cluster, the
PMD swap mapping count recorded in cluster_count() will be set to 0.
But we will not touch the PMD swap mappings themselves, because it is
sometimes hard to find them all.  When the PMD swap mappings are
operated on later, it will be found that the huge swap cluster has
been split, and the PMD swap mappings will be split at that time.

Unless splitting a THP in swap cache (specified via the "force"
parameter), split_swap_cluster() will return -EEXIST if there is a
SWAP_HAS_CACHE flag in swap_map[offset], because this indicates there
is a THP corresponding to this huge swap cluster, and it isn't
desirable to split the THP.

When splitting a THP in swap cache, the call to split_swap_cluster()
is moved to before unlocking the sub-pages, so that all sub-pages are
kept locked from the time the THP is split until the huge swap cluster
is split.  This makes the code much easier to reason about.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/linux/swap.h |  4 ++--
 mm/huge_memory.c | 18 --
 mm/swapfile.c| 45 ++---
 3 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index bb9de2cb952a..878f132dabc0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -617,10 +617,10 @@ static inline swp_entry_t get_swap_page(struct page *page)
 #endif /* CONFIG_SWAP */
 
 #ifdef CONFIG_THP_SWAP
-extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster(swp_entry_t entry, bool force);
 extern int split_swap_cluster_map(swp_entry_t entry);
 #else
-static inline int split_swap_cluster(swp_entry_t entry)
+static inline int split_swap_cluster(swp_entry_t entry, bool force)
 {
return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 84d5d8ff869e..e363e13f6751 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2502,6 +2502,17 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
unfreeze_page(head);
 
+   /*
+* Split swap cluster before unlocking sub-pages.  So all
+* sub-pages will be kept locked from THP has been split to
+* swap cluster is split.
+*/
+   if (PageSwapCache(head)) {
+   swp_entry_t entry = { .val = page_private(head) };
+
+   split_swap_cluster(entry, true);
+   }
+
for (i = 0; i < HPAGE_PMD_NR; i++) {
struct page *subpage = head + i;
if (subpage == page)
@@ -2728,12 +2739,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
__dec_node_page_state(page, NR_SHMEM_THPS);
spin_unlock(&pgdata->split_queue_lock);
__split_huge_page(page, list, flags);
-   if (PageSwapCache(head)) {
-   swp_entry_t entry = { .val = page_private(head) };
-
-   ret = split_swap_cluster(entry);
-   } else
-   ret = 0;
+   ret = 0;
} else {
if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
pr_alert("total_mapcount: %u, page_count(): %u\n",
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 05f53c4c0cfe..1e723d3a9a6f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1414,21 +1414,6 @@ static void swapcache_free_cluster(swp_entry_t entry)
}
}
 }
-
-int split_swap_cluster(swp_entry_t entry)
-{
-   struct swap_info_struct *si;
-   struct swap_cluster_info *ci;
-   unsigned long offset = swp_offset(entry);
-
-   si = _swap_info_get(entry);
-   if (!si)
-   return -EBUSY;
-   ci = lock_cluster(si, offset);
-   cluster_clear_huge(ci);
-   unlock_cluster(ci);
-   return 0;
-}
 #else
 static inline void swapcache_free_cluster(swp_entry_t entry)
 {
@@ -4072,6 +4057,36 @@ int split_swap_cluster_map(swp_entry_t entry)
unlock_cluster(ci);
return 0;
 }
+
+int split_swap_cluster(swp_entry_t entry, bool force)
+{
+   struct swap_info_struct *si;
+   struct swap_cluster_info *ci;
+   unsigned long offset = swp_offset(entry);
+   int ret = 0;
+
+   si = get_swap_device(entry);
+   if (!si)
+   return -EINVAL;
+   ci = lock_cluster(si, offset);
+   /* The swap cluster has been split by someone else */
+   if (!cluster_is_huge(ci))
+   goto out;
+   VM_BUG_ON(!is_cluster_offset(offset));
+   VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+   /* If not forced, don't split swap 

[PATCH -mm -V3 16/21] mm, THP, swap: Free PMD swap mapping when zap_huge_pmd()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

For a PMD swap mapping, zap_huge_pmd() will clear the PMD and call
free_swap_and_cache() to decrease the swap reference count and maybe
free or split the huge swap cluster and the THP in swap cache.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/huge_memory.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 01fdd59fe6d4..e057b966ea68 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2007,7 +2007,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
spin_unlock(ptl);
if (is_huge_zero_pmd(orig_pmd))
tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
-   } else if (is_huge_zero_pmd(orig_pmd)) {
+   } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) {
zap_deposited_table(tlb->mm, pmd);
spin_unlock(ptl);
tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
@@ -2020,17 +2020,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
page_remove_rmap(page, true);
VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
VM_BUG_ON_PAGE(!PageHead(page), page);
-   } else if (thp_migration_supported()) {
-   swp_entry_t entry;
-
-   VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
-   entry = pmd_to_swp_entry(orig_pmd);
-   page = pfn_to_page(swp_offset(entry));
+   } else {
+   swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
+
+   if (thp_migration_supported() &&
+   is_migration_entry(entry))
+   page = pfn_to_page(swp_offset(entry));
+   else if (thp_swap_supported() &&
+!non_swap_entry(entry))
+   free_swap_and_cache(entry, true);
+   else {
+   WARN_ONCE(1,
+"Non present huge pmd without pmd migration or swap enabled!");
+   goto unlock;
+   }
flush_needed = 0;
-   } else
-   WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+   }
 
-   if (PageAnon(page)) {
+   if (!page) {
+   zap_deposited_table(tlb->mm, pmd);
+   add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+   } else if (PageAnon(page)) {
zap_deposited_table(tlb->mm, pmd);
add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
} else {
@@ -2038,7 +2048,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
zap_deposited_table(tlb->mm, pmd);
add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
}
-
+unlock:
spin_unlock(ptl);
if (flush_needed)
tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE);
-- 
2.16.1



Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices

2018-05-23 Thread Greg Kroah-Hartman
On Tue, May 22, 2018 at 03:28:01PM -0700, Rajat Jain wrote:
> --- /dev/null
> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *  Copyright (C) 2018 Google Inc, All Rights Reserved.
> + *  Rajat Jain (raja...@google.com)

Google has the copyright, not you, right?  You might want to make that a
bit more explicit by putting a blank line somewhere here...

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.

If you have a SPDX line, you do not need this paragraph.  Please drop it
so we don't have to delete it later on.

thanks,

greg k-h


[PATCH -mm -V3 15/21] mm, THP, swap: Support to copy PMD swap mapping when fork()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

During fork, the page table needs to be copied from parent to child.
A PMD swap mapping needs to be copied too, and the swap reference
count needs to be increased.

When the huge swap cluster has been split already, we need to split
the PMD swap mapping and fall back to PTE copying.

When swap count continuation fails to allocate a page with
GFP_ATOMIC, we need to unlock the spinlock and try again with
GFP_KERNEL.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/huge_memory.c | 72 
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c4eb7737b313..01fdd59fe6d4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -941,6 +941,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
if (unlikely(!pgtable))
goto out;
 
+retry:
dst_ptl = pmd_lock(dst_mm, dst_pmd);
src_ptl = pmd_lockptr(src_mm, src_pmd);
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -948,26 +949,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
ret = -EAGAIN;
pmd = *src_pmd;
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
if (unlikely(is_swap_pmd(pmd))) {
swp_entry_t entry = pmd_to_swp_entry(pmd);
 
-   VM_BUG_ON(!is_pmd_migration_entry(pmd));
-   if (is_write_migration_entry(entry)) {
-   make_migration_entry_read(&entry);
-   pmd = swp_entry_to_pmd(entry);
-   if (pmd_swp_soft_dirty(*src_pmd))
-   pmd = pmd_swp_mksoft_dirty(pmd);
-   set_pmd_at(src_mm, addr, src_pmd, pmd);
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+   if (is_migration_entry(entry)) {
+   if (is_write_migration_entry(entry)) {
+   make_migration_entry_read(&entry);
+   pmd = swp_entry_to_pmd(entry);
+   if (pmd_swp_soft_dirty(*src_pmd))
+   pmd = pmd_swp_mksoft_dirty(pmd);
+   set_pmd_at(src_mm, addr, src_pmd, pmd);
+   }
+   add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+   mm_inc_nr_ptes(dst_mm);
+   pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+   set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+   ret = 0;
+   goto out_unlock;
}
-   add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
-   mm_inc_nr_ptes(dst_mm);
-   pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-   set_pmd_at(dst_mm, addr, dst_pmd, pmd);
-   ret = 0;
-   goto out_unlock;
-   }
 #endif
+   if (thp_swap_supported() && !non_swap_entry(entry)) {
+   ret = swap_duplicate(&entry, true);
+   if (!ret) {
+   add_mm_counter(dst_mm, MM_SWAPENTS,
+  HPAGE_PMD_NR);
+   mm_inc_nr_ptes(dst_mm);
+   pgtable_trans_huge_deposit(dst_mm, dst_pmd,
+  pgtable);
+   set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+   /* make sure dst_mm is on swapoff's mmlist. */
+   if (unlikely(list_empty(&dst_mm->mmlist))) {
+   spin_lock(&mmlist_lock);
+   if (list_empty(&dst_mm->mmlist))
+   list_add(&dst_mm->mmlist,
+&src_mm->mmlist);
+   spin_unlock(&mmlist_lock);
+   }
+   } else if (ret == -ENOTDIR) {
+   /*
+* The swap cluster has been split, split the
+* pmd map now
+*/
+   __split_huge_swap_pmd(vma, addr, src_pmd);
+   pte_free(dst_mm, pgtable);
+   } else if (ret == -ENOMEM) {
+   spin_unlock(src_ptl);
+   spin_unlock(dst_ptl);
+   ret = add_swap_count_continuation(entry,
+ GFP_KERNEL);
+   if (ret < 0) {
+   

[PATCH -mm -V3 10/21] mm, THP, swap: Support to count THP swapin and its fallback

2018-05-23 Thread Huang, Ying
From: Huang Ying 

Two new /proc/vmstat fields are added, "thp_swpin" and
"thp_swpin_fallback", to count swapping in a THP from the swap device
as a whole and the fallback to normal page swapin, respectively.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 Documentation/vm/transhuge.rst | 10 +-
 include/linux/vm_event_item.h  |  2 ++
 mm/huge_memory.c   |  4 +++-
 mm/page_io.c   | 15 ---
 mm/vmstat.c|  2 ++
 5 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/Documentation/vm/transhuge.rst b/Documentation/vm/transhuge.rst
index 2c6867fca6ff..a87b1d880cd4 100644
--- a/Documentation/vm/transhuge.rst
+++ b/Documentation/vm/transhuge.rst
@@ -360,10 +360,18 @@ thp_swpout
piece without splitting.
 
 thp_swpout_fallback
-   is incremented if a huge page has to be split before swapout.
+   is incremented if a huge page is split before swapout.
Usually because failed to allocate some continuous swap space
for the huge page.
 
+thp_swpin
+   is incremented every time a huge page is swapin in one piece
+   without splitting.
+
+thp_swpin_fallback
+   is incremented if a huge page is split during swapin.  Usually
+   because failed to allocate a huge page.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 5c7f010676a7..7b438548a78e 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -88,6 +88,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
THP_ZERO_PAGE_ALLOC_FAILED,
THP_SWPOUT,
THP_SWPOUT_FALLBACK,
+   THP_SWPIN,
+   THP_SWPIN_FALLBACK,
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
BALLOON_INFLATE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8303fa021c42..c2437914c632 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1664,8 +1664,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
/* swapoff occurs under us */
} else if (ret == -EINVAL)
ret = 0;
-   else
+   else {
+   count_vm_event(THP_SWPIN_FALLBACK);
goto fallback;
+   }
}
delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
goto out;
diff --git a/mm/page_io.c b/mm/page_io.c
index b41cf9644585..96277058681e 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -347,6 +347,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
return ret;
 }
 
+static inline void count_swpin_vm_event(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   if (unlikely(PageTransHuge(page)))
+   count_vm_event(THP_SWPIN);
+#endif
+   count_vm_events(PSWPIN, hpage_nr_pages(page));
+}
+
 int swap_readpage(struct page *page, bool synchronous)
 {
struct bio *bio;
@@ -370,7 +379,7 @@ int swap_readpage(struct page *page, bool synchronous)
 
ret = mapping->a_ops->readpage(swap_file, page);
if (!ret)
-   count_vm_event(PSWPIN);
+   count_swpin_vm_event(page);
return ret;
}
 
@@ -381,7 +390,7 @@ int swap_readpage(struct page *page, bool synchronous)
unlock_page(page);
}
 
-   count_vm_event(PSWPIN);
+   count_swpin_vm_event(page);
return 0;
}
 
@@ -400,7 +409,7 @@ int swap_readpage(struct page *page, bool synchronous)
get_task_struct(current);
bio->bi_private = current;
bio_set_op_attrs(bio, REQ_OP_READ, 0);
-   count_vm_event(PSWPIN);
+   count_swpin_vm_event(page);
bio_get(bio);
qc = submit_bio(bio);
while (synchronous) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 75eda9c2b260..259c7bddbb6e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1263,6 +1263,8 @@ const char * const vmstat_text[] = {
"thp_zero_page_alloc_failed",
"thp_swpout",
"thp_swpout_fallback",
+   "thp_swpin",
+   "thp_swpin_fallback",
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
"balloon_inflate",
-- 
2.16.1



[PATCH -mm -V3 13/21] mm, THP, swap: Support PMD swap mapping in madvise_free()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

When madvise_free() finds a PMD swap mapping and only part of the huge
swap cluster is operated on, the PMD swap mapping will be split and
processing falls back to PTE swap mappings.  Otherwise, if the whole
huge swap cluster is operated on, free_swap_and_cache() will be called
to decrease the PMD swap mapping count and probably free the swap
space and the THP in swap cache too.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/huge_memory.c | 50 +++---
 mm/madvise.c |  2 +-
 2 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 668d77cec14d..a8af2ddc578a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1842,6 +1842,15 @@ static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
 }
 #endif
 
+static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+{
+   pgtable_t pgtable;
+
+   pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+   pte_free(mm, pgtable);
+   mm_dec_nr_ptes(mm);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -1862,15 +1871,35 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
goto out_unlocked;
 
orig_pmd = *pmd;
-   if (is_huge_zero_pmd(orig_pmd))
-   goto out;
-
if (unlikely(!pmd_present(orig_pmd))) {
-   VM_BUG_ON(thp_migration_supported() &&
- !is_pmd_migration_entry(orig_pmd));
-   goto out;
+   swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
+
+   if (is_migration_entry(entry)) {
+   VM_BUG_ON(!thp_migration_supported());
+   goto out;
+   } else if (thp_swap_supported() && !non_swap_entry(entry)) {
+   /* If part of THP is discarded */
+   if (next - addr != HPAGE_PMD_SIZE) {
+   unsigned long haddr = addr & HPAGE_PMD_MASK;
+
+   __split_huge_swap_pmd(vma, haddr, pmd);
+   goto out;
+   }
+   free_swap_and_cache(entry, true);
+   pmd_clear(pmd);
+   zap_deposited_table(mm, pmd);
+   if (current->mm == mm)
+   sync_mm_rss(mm);
+   add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+   ret = true;
+   goto out;
+   } else
+   VM_BUG_ON(1);
}
 
+   if (is_huge_zero_pmd(orig_pmd))
+   goto out;
+
page = pmd_page(orig_pmd);
/*
 * If other processes are mapping this page, we couldn't discard
@@ -1916,15 +1945,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
return ret;
 }
 
-static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
-{
-   pgtable_t pgtable;
-
-   pgtable = pgtable_trans_huge_withdraw(mm, pmd);
-   pte_free(mm, pgtable);
-   mm_dec_nr_ptes(mm);
-}
-
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 pmd_t *pmd, unsigned long addr)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index d18c626b..e03e85a20fb4 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long next;
 
next = pmd_addr_end(addr, end);
-   if (pmd_trans_huge(*pmd))
+   if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd))
if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
goto next;
 
-- 
2.16.1



[PATCH -mm -V3 20/21] mm, THP, swap: create PMD swap mapping when unmap the THP

2018-05-23 Thread Huang, Ying
From: Huang Ying 

This is the final step of the THP swapin support.  When reclaiming an
anonymous THP, after allocating the huge swap cluster and adding the
THP into the swap cache, the PMD page mapping will be changed to a
mapping to the swap space.  Previously, the PMD page mapping was split
before being changed.  In this patch, the unmap code is enhanced not
to split the PMD mapping, but to create a PMD swap mapping to replace
it instead.  So later, when the SWAP_HAS_CACHE flag is cleared in the
last step of swapout, the huge swap cluster will be kept instead of
being split, and when swapping in, the huge swap cluster will be read
as a whole into a THP.  That is, the THP will not be split during
swapout/swapin.  This eliminates the overhead of splitting/collapsing
and reduces the page fault count, etc.  More importantly, the
utilization of THP is improved greatly, that is, many more THPs will
be kept when swapping is used, so that we can take full advantage of
THP, including its high performance for swapout/swapin.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/linux/huge_mm.h | 11 +++
 mm/huge_memory.c| 30 ++
 mm/rmap.c   | 43 +--
 mm/vmscan.c |  6 +-
 4 files changed, 83 insertions(+), 7 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 5001c28b3d18..d03fcddcc42d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -404,6 +404,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+struct page_vma_mapped_walk;
+
 #ifdef CONFIG_THP_SWAP
 extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
  unsigned long haddr,
@@ -411,6 +413,8 @@ extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
 extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
   unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+   struct page *page, unsigned long address, pmd_t pmdval);
 
 static inline bool transparent_hugepage_swapin_enabled(
struct vm_area_struct *vma)
@@ -452,6 +456,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
return 0;
 }
 
+static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+ struct page *page, unsigned long address,
+ pmd_t pmdval)
+{
+   return false;
+}
+
 static inline bool transparent_hugepage_swapin_enabled(
struct vm_area_struct *vma)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e80d03c2412a..88984e95b9b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1876,6 +1876,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
count_vm_event(THP_SWPIN_FALLBACK);
goto fallback;
 }
+
+bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page,
+   unsigned long address, pmd_t pmdval)
+{
+   struct vm_area_struct *vma = pvmw->vma;
+   struct mm_struct *mm = vma->vm_mm;
+   pmd_t swp_pmd;
+   swp_entry_t entry = { .val = page_private(page) };
+
+   if (swap_duplicate(&entry, true) < 0) {
+   set_pmd_at(mm, address, pvmw->pmd, pmdval);
+   return false;
+   }
+   if (list_empty(&mm->mmlist)) {
+   spin_lock(&mmlist_lock);
+   if (list_empty(&mm->mmlist))
+   list_add(&mm->mmlist, &init_mm.mmlist);
+   spin_unlock(&mmlist_lock);
+   }
+   add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+   add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR);
+   swp_pmd = swp_entry_to_pmd(entry);
+   if (pmd_soft_dirty(pmdval))
+   swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
+   set_pmd_at(mm, address, pvmw->pmd, swp_pmd);
+
+   page_remove_rmap(page, true);
+   put_page(page);
+   return true;
+}
 #endif
 
 static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
diff --git a/mm/rmap.c b/mm/rmap.c
index 5f45d6325c40..4861b1a86e2a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1402,12 +1402,51 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
continue;
}
 
+   address = pvmw.address;
+
+#ifdef CONFIG_THP_SWAP
+   /* PMD-mapped THP swap entry */
+   if (thp_swap_supported() && !pvmw.pte && PageAnon(page)) {
+   pmd_t pmdval;
+
+   VM_BUG_

[PATCH -mm -V3 18/21] mm, THP, swap: Support PMD swap mapping in mincore()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

During mincore(), for a PMD swap mapping, the swap cache will be
looked up.  If the resulting page isn't a compound page, the PMD swap
mapping will be split, falling back to PTE swap mapping processing.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/mincore.c | 37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index a66f2052c7b1..897dd2c187e8 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
  * and is up to date; i.e. that no page-in operation would be required
  * at this time if an application were to map and access this page.
  */
-static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
+static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff,
+ bool *compound)
 {
unsigned char present = 0;
struct page *page;
@@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 #endif
if (page) {
present = PageUptodate(page);
+   if (compound)
+   *compound = PageCompound(page);
put_page(page);
}
 
@@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 
pgoff = linear_page_index(vma, addr);
for (i = 0; i < nr; i++, pgoff++)
-   vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+   vec[i] = mincore_page(vma->vm_file->f_mapping,
+ pgoff, NULL);
} else {
for (i = 0; i < nr; i++)
vec[i] = 0;
@@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte_t *ptep;
unsigned char *vec = walk->private;
int nr = (end - addr) >> PAGE_SHIFT;
+   swp_entry_t entry;
 
ptl = pmd_trans_huge_lock(pmd, vma);
if (ptl) {
-   memset(vec, 1, nr);
+   unsigned char val = 1;
+   bool compound;
+
+   if (thp_swap_supported() && is_swap_pmd(*pmd)) {
+   entry = pmd_to_swp_entry(*pmd);
+   if (!non_swap_entry(entry)) {
+   val = mincore_page(swap_address_space(entry),
+  swp_offset(entry),
+  &compound);
+   /*
+* The huge swap cluster has been
+* split under us
+*/
+   if (!compound) {
+   __split_huge_swap_pmd(vma, addr, pmd);
+   spin_unlock(ptl);
+   goto fallback;
+   }
+   }
+   }
+   memset(vec, val, nr);
spin_unlock(ptl);
goto out;
}
 
+fallback:
if (pmd_trans_unstable(pmd)) {
__mincore_unmapped_range(addr, end, vma, vec);
goto out;
@@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
else if (pte_present(pte))
*vec = 1;
else { /* pte is a swap entry */
-   swp_entry_t entry = pte_to_swp_entry(pte);
-
+   entry = pte_to_swp_entry(pte);
if (non_swap_entry(entry)) {
/*
 * migration or hwpoison entries are always
@@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
} else {
 #ifdef CONFIG_SWAP
*vec = mincore_page(swap_address_space(entry),
-   swp_offset(entry));
+   swp_offset(entry), NULL);
 #else
WARN_ON(1);
*vec = 1;
-- 
2.16.1



[PATCH -mm -V3 19/21] mm, THP, swap: Support PMD swap mapping in common path

2018-05-23 Thread Huang, Ying
From: Huang Ying 

The original code handled only PMD migration entries; it is revised
here to support PMD swap mappings as well.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 fs/proc/task_mmu.c |  8 
 mm/gup.c   | 34 ++
 mm/huge_memory.c   |  6 +++---
 mm/mempolicy.c |  2 +-
 4 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 597969db9e90..ddb6a2832f86 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -978,7 +978,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
pmd = pmd_clear_soft_dirty(pmd);
 
set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
-   } else if (is_migration_entry(pmd_to_swp_entry(pmd))) {
+   } else if (is_swap_pmd(pmd)) {
pmd = pmd_swp_clear_soft_dirty(pmd);
set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
}
@@ -1309,7 +1309,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
frame = pmd_pfn(pmd) +
((addr & ~PMD_MASK) >> PAGE_SHIFT);
}
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
else if (is_swap_pmd(pmd)) {
swp_entry_t entry = pmd_to_swp_entry(pmd);
unsigned long offset;
@@ -1323,8 +1323,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
flags |= PM_SWAP;
if (pmd_swp_soft_dirty(pmd))
flags |= PM_SOFT_DIRTY;
-   VM_BUG_ON(!is_pmd_migration_entry(pmd));
-   page = migration_entry_to_page(entry);
+   if (is_pmd_migration_entry(pmd))
+   page = migration_entry_to_page(entry);
}
 #endif
 
diff --git a/mm/gup.c b/mm/gup.c
index b70d7ba7cc13..84ba4ad8120d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -216,6 +216,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
spinlock_t *ptl;
struct page *page;
struct mm_struct *mm = vma->vm_mm;
+   swp_entry_t entry;
 
pmd = pmd_offset(pudp, address);
/*
@@ -243,18 +244,21 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
if (!pmd_present(pmdval)) {
if (likely(!(flags & FOLL_MIGRATION)))
return no_page_table(vma, flags);
-   VM_BUG_ON(thp_migration_supported() &&
- !is_pmd_migration_entry(pmdval));
-   if (is_pmd_migration_entry(pmdval))
+   entry = pmd_to_swp_entry(pmdval);
+   if (thp_migration_supported() && is_migration_entry(entry)) {
pmd_migration_entry_wait(mm, pmd);
-   pmdval = READ_ONCE(*pmd);
-   /*
-* MADV_DONTNEED may convert the pmd to null because
-* mmap_sem is held in read mode
-*/
-   if (pmd_none(pmdval))
+   pmdval = READ_ONCE(*pmd);
+   /*
+* MADV_DONTNEED may convert the pmd to null because
+* mmap_sem is held in read mode
+*/
+   if (pmd_none(pmdval))
+   return no_page_table(vma, flags);
+   goto retry;
+   }
+   if (thp_swap_supported() && !non_swap_entry(entry))
return no_page_table(vma, flags);
-   goto retry;
+   VM_BUG_ON(1);
}
if (pmd_devmap(pmdval)) {
ptl = pmd_lock(mm, pmd);
@@ -276,11 +280,17 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
return no_page_table(vma, flags);
}
if (unlikely(!pmd_present(*pmd))) {
+   entry = pmd_to_swp_entry(*pmd);
spin_unlock(ptl);
if (likely(!(flags & FOLL_MIGRATION)))
return no_page_table(vma, flags);
-   pmd_migration_entry_wait(mm, pmd);
-   goto retry_locked;
+   if (thp_migration_supported() && is_migration_entry(entry)) {
+   pmd_migration_entry_wait(mm, pmd);
+   goto retry_locked;
+   }
+   if (thp_swap_supported() && !non_swap_entry(entry))
+   return no_page_table(vma, flags);
+   VM_BUG_ON(1);
}
if (unlikely(!pmd_trans_huge(*pmd))) {
spin_unlock(ptl);
diff --git a/mm/huge_me

[PATCH -mm -V3 21/21] mm, THP: Avoid to split THP when reclaim MADV_FREE THP

2018-05-23 Thread Huang, Ying
From: Huang Ying 

Previously, to reclaim a MADV_FREE THP, the THP was split first, then
each sub-page was reclaimed.  This wastes cycles splitting the THP and
unmapping and freeing each sub-page, and splits the THP even if it has
been written since MADV_FREE.  We had to do this because MADV_FREE THP
reclaim shares the same try_to_unmap() call with swap, while swap
needed to split the PMD page mapping at that time.  Now that swap can
process PMD mappings, it is easy to avoid splitting the THP when a
MADV_FREE THP is reclaimed.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/huge_memory.c | 41 -
 mm/vmscan.c  |  3 ++-
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88984e95b9b2..2d68a8f65531 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1671,6 +1671,15 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
return 0;
 }
 
+static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+{
+   pgtable_t pgtable;
+
+   pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+   pte_free(mm, pgtable);
+   mm_dec_nr_ptes(mm);
+}
+
 #ifdef CONFIG_THP_SWAP
 void __split_huge_swap_pmd(struct vm_area_struct *vma,
   unsigned long haddr,
@@ -1885,6 +1894,28 @@ bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page,
pmd_t swp_pmd;
swp_entry_t entry = { .val = page_private(page) };
 
+   if (unlikely(PageSwapBacked(page) != PageSwapCache(page))) {
+   WARN_ON_ONCE(1);
+   return false;
+   }
+
+   /* MADV_FREE page check */
+   if (!PageSwapBacked(page)) {
+   if (!PageDirty(page)) {
+   zap_deposited_table(mm, pvmw->pmd);
+   add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+   goto out_remove_rmap;
+   }
+
+   /*
+* If the page was redirtied, it cannot be
+* discarded. Remap the page to page table.
+*/
+   set_pmd_at(mm, address, pvmw->pmd, pmdval);
+   SetPageSwapBacked(page);
+   return false;
+   }
+
if (swap_duplicate(&entry, true) < 0) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return false;
@@ -1902,21 +1933,13 @@ bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page,
swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
set_pmd_at(mm, address, pvmw->pmd, swp_pmd);
 
+out_remove_rmap:
page_remove_rmap(page, true);
put_page(page);
return true;
 }
 #endif
 
-static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
-{
-   pgtable_t pgtable;
-
-   pgtable = pgtable_trans_huge_withdraw(mm, pmd);
-   pte_free(mm, pgtable);
-   mm_dec_nr_ptes(mm);
-}
-
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9f46047d4dee..1b89552523f6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1137,7 +1137,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
/* Adding to swap updated mapping */
mapping = page_mapping(page);
}
-   } else if (unlikely(PageTransHuge(page))) {
+   } else if (unlikely(PageTransHuge(page)) &&
+  (!thp_swap_supported() || !PageAnon(page))) {
/* Split file THP */
if (split_huge_page_to_list(page, page_list))
goto keep_locked;
-- 
2.16.1



[PATCH -mm -V3 17/21] mm, THP, swap: Support PMD swap mapping for MADV_WILLNEED

2018-05-23 Thread Huang, Ying
From: Huang Ying 

During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled
for the VMA, the whole swap cluster will be swapped in.  Otherwise,
the huge swap cluster and the PMD swap mapping will be split, falling
back to PTE swap mapping.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/madvise.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index e03e85a20fb4..44a0a62f4848 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
pte_t *orig_pte;
struct vm_area_struct *vma = walk->private;
unsigned long index;
+   swp_entry_t entry;
+   struct page *page;
+   pmd_t pmdval;
+
+   pmdval = *pmd;
+   if (thp_swap_supported() && is_swap_pmd(pmdval) &&
+   !is_pmd_migration_entry(pmdval)) {
+   entry = pmd_to_swp_entry(pmdval);
+   if (!transparent_hugepage_swapin_enabled(vma)) {
+   if (!split_swap_cluster(entry, false))
+   split_huge_swap_pmd(vma, pmd, start, pmdval);
+   } else {
+   page = read_swap_cache_async(entry,
+GFP_HIGHUSER_MOVABLE,
+vma, start, false);
+   /* The swap cluster has been split under us */
+   if (page) {
+   if (!PageTransHuge(page))
+   split_huge_swap_pmd(vma, pmd, start,
+   pmdval);
+   put_page(page);
+   }
+   }
+   }
 
if (pmd_none_or_trans_huge_or_clear_bad(pmd))
return 0;
 
for (index = start; index != end; index += PAGE_SIZE) {
pte_t pte;
-   swp_entry_t entry;
-   struct page *page;
spinlock_t *ptl;
 
orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
-- 
2.16.1



Re: [PATCH v3 1/2] regulator: dt-bindings: add QCOM RPMh regulator bindings

2018-05-23 Thread Mark Brown
On Tue, May 22, 2018 at 05:08:45PM -0700, Doug Anderson wrote:

> So one client's vote for a voltage continues to be in effect even if
> that client votes to have the regulator disabled?  That seems
> fundamentally broken in RPMh.  I guess my take would be to work around

It's arguable either way - you could say that the client gets to specify
a safe range at all times or you could say that the machine constraints
should cover all cases where the hardware is idling.  Of course RPMh
is missing anything like the machine constraints (as we can see from all
the fixing up of undesirable hard coding we have to do) so it's kind of
pushed towards the first case.

> >> A) Turn off VMMC and VQMMC
> >> B) Program VMMC and VQMMC to defaults
> >> C) Turn on VMMC and VQMMC

> >> ...right now we bootup and pretend to Linux that VMMC and VQMMC start
> >> off, so step A) will be no-op.  Sigh.

> > Step A) would not work because the regulator's use_count would be 0 and
> > regulator_disable() can only be called successfully if use_count > 0.  The
> > call would have no impact and it would return an error.

> Are you sure regulator_force_disable() won't do the trick on most
> boards (which will report the regulator being enabled at bootup)?  I
> haven't tried it, but it seems like it might.

It does mean that things will go wrong if the regulator is shared.


signature.asc
Description: PGP signature


Re: [PATCH v6] gpio: dwapb: Add support for 1 interrupt per port A GPIO

2018-05-23 Thread Linus Walleij
On Fri, May 11, 2018 at 10:31 AM, Phil Edworthy
 wrote:

> The DesignWare GPIO IP can be configured for either 1 interrupt or 1
> per GPIO in port A, but the driver currently only supports 1 interrupt.
> See the DesignWare DW_apb_gpio Databook description of the
> 'GPIO_INTR_IO' parameter.
>
> This change allows the driver to work with up to 32 interrupts, it will
> get as many interrupts as specified in the DT 'interrupts' property.
> It doesn't do anything clever with the different interrupts, it just calls
> the same handler used for single interrupt hardware.
>
> Signed-off-by: Phil Edworthy 
> Reviewed-by: Rob Herring 
> Acked-by: Lee Jones 
> ---
> One point to mention is that I have made it possible for users to have
> unconnected interrupts by specifying holes in the list of interrupts. This is
> done by supporting the interrupts-extended DT prop.
> However, I have no use for this and had to hack some test case for this.
> Perhaps the driver should support 1 interrupt or all GPIOs as interrupts?
>
> v6:
>  - Treat DT and ACPI the same as much as possible. Note that we can't use
>platform_get_irq() to get the DT interrupts as they are in the port
>sub-node and hence do not have an associated platform device.

I already applied this patch in some version, can you check what is
in my devel branch and send incremental patches on top if
something needs changing?
https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git/commit/?h=devel&id=e6ca26abd37606ba4864f20c85d3fe4a2173b93f

Sorry for not knowing by heart what was applied or when, it's
just too much for me sometimes.

Yours,
Linus Walleij


[PATCH -mm -V3 14/21] mm, cgroup, THP, swap: Support to move swap account for PMD swap mapping

2018-05-23 Thread Huang, Ying
From: Huang Ying 

Previously the huge swap cluster was split after the THP was swapped
out.  Now, to support swapping in the THP as a whole, the huge swap
cluster will not be split after the THP is reclaimed.  So in memcg, we
need to move the swap account for PMD swap mappings in the process's
page table.

When the page table is scanned during moving the memcg charge, PMD
swap mappings will be identified.  mem_cgroup_move_swap_account() and
its callees are revised to move the account for the whole huge swap
cluster.  If the swap cluster mapped by the PMD has been split, the
PMD swap mapping will be split and processing falls back to PTEs,
because the swap slots of the swap cluster may already have been
swapped in or moved to another cgroup.

There is no way to prevent a huge swap cluster from being split except
when it has the SWAP_HAS_CACHE flag set.  So it is possible for the
huge swap cluster to be split, and the charge for the swap slots
inside it to be changed, after we check the PMD swap mapping and the
huge swap cluster but before we commit the charge move.  But the race
window is so small that we will just ignore the race.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/linux/huge_mm.h |   9 +++
 include/linux/swap.h|   6 ++
 include/linux/swap_cgroup.h |   3 +-
 mm/huge_memory.c|  12 +---
 mm/memcontrol.c | 138 +++-
 mm/swap_cgroup.c|  45 ---
 mm/swapfile.c   |  12 
 7 files changed, 180 insertions(+), 45 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4e299e327720..5001c28b3d18 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -405,6 +405,9 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_THP_SWAP
+extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
+ unsigned long haddr,
+ pmd_t *pmd);
 extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
   unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
@@ -432,6 +435,12 @@ static inline bool transparent_hugepage_swapin_enabled(
return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
+unsigned long haddr,
+pmd_t *pmd)
+{
+}
+
 static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
  unsigned long address, pmd_t orig_pmd)
 {
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5832a750baed..88677acdcff6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -628,6 +628,7 @@ static inline swp_entry_t get_swap_page(struct page *page)
 #ifdef CONFIG_THP_SWAP
 extern int split_swap_cluster(swp_entry_t entry, bool force);
 extern int split_swap_cluster_map(swp_entry_t entry);
+extern bool in_huge_swap_cluster(swp_entry_t entry);
 #else
 static inline int split_swap_cluster(swp_entry_t entry, bool force)
 {
@@ -638,6 +639,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry)
 {
return 0;
 }
+
+static inline bool in_huge_swap_cluster(swp_entry_t entry)
+{
+   return false;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h
index a12dd1c3966c..c40fb52b0563 100644
--- a/include/linux/swap_cgroup.h
+++ b/include/linux/swap_cgroup.h
@@ -7,7 +7,8 @@
 #ifdef CONFIG_MEMCG_SWAP
 
 extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
-   unsigned short old, unsigned short new);
+   unsigned short old, unsigned short new,
+   unsigned int nr_ents);
 extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
 unsigned int nr_ents);
 extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a8af2ddc578a..c4eb7737b313 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1630,9 +1630,9 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 }
 
 #ifdef CONFIG_THP_SWAP
-static void __split_huge_swap_pmd(struct vm_area_struct *vma,
- unsigned long haddr,
- pmd_t *pmd)
+void __split_huge_swap_pmd(struct vm_area_struct *vma,
+  unsigned long haddr,
+  pmd_t *pmd)
 {
struct mm_struct *mm = vma->vm_mm;
pgtable

[PATCH -mm -V3 12/21] mm, THP, swap: Support PMD swap mapping in swapoff

2018-05-23 Thread Huang, Ying
From: Huang Ying 

During swapoff, for a huge swap cluster, we need to allocate a THP,
read the cluster's contents into the THP, and unuse the PMD and PTE
swap mappings to it.  If allocating a THP fails, the huge swap cluster
will be split.

During unuse, if it is found that the swap cluster mapped by a PMD
swap mapping has already been split, we will split the PMD swap
mapping and unuse the PTEs.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/asm-generic/pgtable.h | 15 ++--
 include/linux/huge_mm.h   |  8 
 mm/huge_memory.c  |  4 +-
 mm/swapfile.c | 86 ++-
 4 files changed, 98 insertions(+), 15 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index bb8354981a36..caa381962cd2 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -931,22 +931,13 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
barrier();
 #endif
/*
-* !pmd_present() checks for pmd migration entries
-*
-* The complete check uses is_pmd_migration_entry() in linux/swapops.h
-* But using that requires moving current function and pmd_trans_unstable()
-* to linux/swapops.h to resovle dependency, which is too much code move.
-*
-* !pmd_present() is equivalent to is_pmd_migration_entry() currently,
-* because !pmd_present() pages can only be under migration not swapped
-* out.
-*
-* pmd_none() is preseved for future condition checks on pmd migration
+* pmd_none() is preseved for future condition checks on pmd swap
 * entries and not confusing with this function name, although it is
 * redundant with !pmd_present().
 */
if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
-   (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+   ((IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) ||
+ IS_ENABLED(CONFIG_THP_SWAP)) && !pmd_present(pmdval)))
return 1;
if (unlikely(pmd_bad(pmdval))) {
pmd_clear_bad(pmd);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1cfd43047f0d..4e299e327720 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -405,6 +405,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+  unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
 static inline bool transparent_hugepage_swapin_enabled(
@@ -430,6 +432,12 @@ static inline bool transparent_hugepage_swapin_enabled(
return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long address, pmd_t orig_pmd)
+{
+   return 0;
+}
+
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3fd129a21f2e..668d77cec14d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1663,8 +1663,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
pmd_populate(mm, pmd, pgtable);
 }
 
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-  unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+   unsigned long address, pmd_t orig_pmd)
 {
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1a62fbc13381..77b2ddd37d9b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1937,6 +1937,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
 }
 
+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+   return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -1998,6 +2003,57 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
return ret;
 }
 
+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+unsigned long addr, swp_entry_t entry, struct page *page)
+{
+   struct mem_cgroup *memcg;
+   struct swap_info_struct *si;
+   spinlock_t *ptl;
+   int ret = 1;
+
+   if (mem_cgroup_try_charge(page, 

Re: [PATCH v3 5/6] spi: at91-usart: add driver for at91-usart as spi

2018-05-23 Thread Mark Brown
On Wed, May 23, 2018 at 11:10:28AM +0300, Radu Pirea wrote:
> On 05/17/2018 08:04 AM, Mark Brown wrote:

> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Driver for AT91 USART Controllers as SPI
> > > + *
> > > + * Copyright (C) 2018 Microchip Technology Inc.

> > Make the entire block a C++ comment so it looks more intentional rather
> > than mixing C and C++.

> I know it's ugly, but SPDX license identifier must be in a separate comment
> block.

No, it doesn't - it just needs to be the first line of the file.


signature.asc
Description: PGP signature


[PATCH -mm -V3 11/21] mm, THP, swap: Add sysfs interface to configure THP swapin

2018-05-23 Thread Huang, Ying
From: Huang Ying 

Swapping in a THP as a whole isn't desirable in some situations.  For
example, for a random access pattern, swapping in a THP as a whole
will greatly inflate the amount of data read.  So a sysfs interface,
/sys/kernel/mm/transparent_hugepage/swapin_enabled, is added to
configure it.  The following three options are provided:

- always: THP swapin will be enabled always

- madvise: THP swapin will be enabled only for VMA with VM_HUGEPAGE
  flag set.

- never: THP swapin will be disabled always

The default configuration is: madvise.

During a page fault, if a PMD swap mapping is found and THP swapin is
disabled, the huge swap cluster and the PMD swap mapping will be
split, falling back to normal page swapin.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 Documentation/vm/transhuge.rst | 21 ++
 include/linux/huge_mm.h| 31 +++
 mm/huge_memory.c   | 89 +-
 3 files changed, 123 insertions(+), 18 deletions(-)

diff --git a/Documentation/vm/transhuge.rst b/Documentation/vm/transhuge.rst
index a87b1d880cd4..d727706cffc3 100644
--- a/Documentation/vm/transhuge.rst
+++ b/Documentation/vm/transhuge.rst
@@ -163,6 +163,27 @@ Some userspace (such as a test program, or an optimized 
memory allocation
 
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+Transparent hugepages may be swapped out and swapped in as one piece,
+without splitting.  This improves the utility of transparent hugepages
+but inflates the reads/writes too.  So whether to swap in a transparent
+hugepage as one piece can be configured as follows.
+
+   echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+   echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+   echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+
+always
+   Attempt to allocate a transparent huge page and read it from
+   swap space in one piece every time.
+
+never
+   Always split the swap space and PMD swap mapping and swapin
+   the fault normal page during swapin.
+
+madvise
+   Only swapin the transparent huge page in one piece for
+   MADV_HUGEPAGE madvise regions.
+
 khugepaged will be automatically started when
transparent_hugepage/enabled is set to "always" or "madvise", and it'll
 be automatically shutdown if it's set to "never".
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f5348d072351..1cfd43047f0d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -62,6 +62,8 @@ enum transparent_hugepage_flag {
 #ifdef CONFIG_DEBUG_VM
TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
+   TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+   TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
 };
 
 struct kobject;
@@ -404,11 +406,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct 
vm_area_struct *vma)
 
 #ifdef CONFIG_THP_SWAP
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+
+static inline bool transparent_hugepage_swapin_enabled(
+   struct vm_area_struct *vma)
+{
+   if (vma->vm_flags & VM_NOHUGEPAGE)
+   return false;
+
+   if (is_vma_temporary_stack(vma))
+   return false;
+
+   if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+   return false;
+
+   if (transparent_hugepage_flags &
+   (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG))
+   return true;
+
+   if (transparent_hugepage_flags &
+   (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG))
+   return !!(vma->vm_flags & VM_HUGEPAGE);
+
+   return false;
+}
 #else /* CONFIG_THP_SWAP */
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
return 0;
 }
+
+static inline bool transparent_hugepage_swapin_enabled(
+   struct vm_area_struct *vma)
+{
+   return false;
+}
 #endif /* CONFIG_THP_SWAP */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c2437914c632..3fd129a21f2e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags __read_mostly =
 #endif
(1pmd, orig_pmd))) {
-   

[PATCH -mm -V3 06/21] mm, THP, swap: Support PMD swap mapping when splitting huge PMD

2018-05-23 Thread Huang, Ying
From: Huang Ying 

A huge PMD needs to be split when zapping a part of the PMD mapping,
etc.  If the PMD mapping is a swap mapping, we need to split it too.
This patch implements the support for this.  This is similar to
splitting the PMD page mapping, except we also need to decrease the PMD
swap mapping count for the huge swap cluster.  If the PMD swap mapping
count becomes 0, the huge swap cluster will be split.

Note: is_huge_zero_pmd() and pmd_page() don't work well with a swap
PMD, so a pmd_present() check is done before calling them.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/linux/swap.h |  6 ++
 mm/huge_memory.c | 58 +++-
 mm/swapfile.c| 28 +
 3 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7ed2c727c9b6..bb9de2cb952a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -618,11 +618,17 @@ static inline swp_entry_t get_swap_page(struct page *page)
 
 #ifdef CONFIG_THP_SWAP
 extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster_map(swp_entry_t entry);
 #else
 static inline int split_swap_cluster(swp_entry_t entry)
 {
return 0;
 }
+
+static inline int split_swap_cluster_map(swp_entry_t entry)
+{
+   return 0;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9177363fe2e..84d5d8ff869e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1602,6 +1602,47 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t 
pmd)
return 0;
 }
 
+#ifdef CONFIG_THP_SWAP
+static void __split_huge_swap_pmd(struct vm_area_struct *vma,
+ unsigned long haddr,
+ pmd_t *pmd)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   pgtable_t pgtable;
+   pmd_t _pmd;
+   swp_entry_t entry;
+   int i, soft_dirty;
+
+   entry = pmd_to_swp_entry(*pmd);
+   soft_dirty = pmd_soft_dirty(*pmd);
+
+   split_swap_cluster_map(entry);
+
+   pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+   pmd_populate(mm, &_pmd, pgtable);
+
+   for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) {
+   pte_t *pte, ptent;
+
+   pte = pte_offset_map(&_pmd, haddr);
+   VM_BUG_ON(!pte_none(*pte));
+   ptent = swp_entry_to_pte(entry);
+   if (soft_dirty)
+   ptent = pte_swp_mksoft_dirty(ptent);
+   set_pte_at(mm, haddr, pte, ptent);
+   pte_unmap(pte);
+   }
+   smp_wmb(); /* make pte visible before pmd */
+   pmd_populate(mm, pmd, pgtable);
+}
+#else
+static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
+unsigned long haddr,
+pmd_t *pmd)
+{
+}
+#endif
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -2068,7 +2109,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct 
*vma, pmd_t *pmd,
VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
-   VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
+   VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd)
&& !pmd_devmap(*pmd));
 
count_vm_event(THP_SPLIT_PMD);
@@ -2090,8 +2131,11 @@ static void __split_huge_pmd_locked(struct 
vm_area_struct *vma, pmd_t *pmd,
put_page(page);
add_mm_counter(mm, MM_FILEPAGES, -HPAGE_PMD_NR);
return;
-   } else if (is_huge_zero_pmd(*pmd)) {
+   } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) {
/*
+* is_huge_zero_pmd() may return true for PMD swap
+* entry, so checking pmd_present() firstly.
+*
 * FIXME: Do we want to invalidate secondary mmu by calling
 * mmu_notifier_invalidate_range() see comments below inside
 * __split_huge_pmd() ?
@@ -2134,6 +2178,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct 
*vma, pmd_t *pmd,
page = pfn_to_page(swp_offset(entry));
} else
 #endif
+   if (thp_swap_supported() && is_swap_pmd(old_pmd))
+   return __split_huge_swap_pmd(vma, haddr, pmd);
+   else
page = pmd_page(old_pmd);
VM_BUG_ON_PAGE(!page_count(page), page);
page_ref_add(page, HPAGE_PMD_NR - 1);
@@ -2225,14 +2272,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t 
*pmd,
 * pmd against. Otherwise we can end up replacing wrong

[PATCH -mm -V3 09/21] mm, THP, swap: Swapin a THP as a whole

2018-05-23 Thread Huang, Ying
From: Huang Ying 

With this patch, when the page fault handler finds a PMD swap mapping,
it will swap in a THP as a whole.  This avoids the overhead of
splitting/collapsing before/after the THP swapping, and greatly improves
swap performance through the reduced page fault count, etc.

do_huge_pmd_swap_page() is added in the patch to implement this.  It
is similar to do_swap_page() for normal page swapin.

If failing to allocate a THP, the huge swap cluster and the PMD swap
mapping will be split to fallback to normal page swapin.

If the huge swap cluster has been split already, the PMD swap mapping
will be split to fallback to normal page swapin.
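
The two fallback cases above can be sketched as a toy state machine (illustrative only; the function and return strings below are invented, not the kernel API):

```python
def swapin_pmd_fault(alloc_thp, cluster_is_huge):
    """Toy model of the fallback flow: try a whole-THP swapin; if the
    huge cluster is already split, or a THP cannot be allocated, split
    (cluster and/or PMD mapping) and fall back to normal page swapin."""
    if not cluster_is_huge:
        # Huge swap cluster already split: split the PMD swap mapping.
        return "split PMD mapping, normal swapin"
    page = alloc_thp()
    if page is None:
        # THP allocation failed: split cluster and PMD swap mapping.
        return "split cluster and PMD mapping, normal swapin"
    return "THP swapin as a whole"

assert swapin_pmd_fault(lambda: "thp", True) == "THP swapin as a whole"
assert swapin_pmd_fault(lambda: None, True) == \
    "split cluster and PMD mapping, normal swapin"
```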

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/linux/huge_mm.h |   9 +++
 include/linux/swap.h|   9 +++
 mm/huge_memory.c| 170 
 mm/memory.c |  16 +++--
 4 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0dbfbe34b01a..f5348d072351 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -402,4 +402,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct 
vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef CONFIG_THP_SWAP
+extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+#else /* CONFIG_THP_SWAP */
+static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+   return 0;
+}
+#endif /* CONFIG_THP_SWAP */
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index d2e017dd7bbd..5832a750baed 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -560,6 +560,15 @@ static inline struct page *lookup_swap_cache(swp_entry_t 
swp,
return NULL;
 }
 
+static inline struct page *read_swap_cache_async(swp_entry_t swp,
+gfp_t gft_mask,
+struct vm_area_struct *vma,
+unsigned long addr,
+bool do_poll)
+{
+   return NULL;
+}
+
 static inline int add_to_swap(struct page *page)
 {
return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3975d824b4ed..8303fa021c42 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1609,6 +1611,174 @@ static void __split_huge_swap_pmd(struct vm_area_struct 
*vma,
smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
 }
+
+static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+  unsigned long address, pmd_t orig_pmd)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   spinlock_t *ptl;
+   int ret = 0;
+
+   ptl = pmd_lock(mm, pmd);
+   if (pmd_same(*pmd, orig_pmd))
+   __split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
+   else
+   ret = -ENOENT;
+   spin_unlock(ptl);
+
+   return ret;
+}
+
+int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+   struct page *page;
+   struct mem_cgroup *memcg;
+   struct vm_area_struct *vma = vmf->vma;
+   unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+   swp_entry_t entry;
+   pmd_t pmd;
+   int i, locked, exclusive = 0, ret = 0;
+
+   entry = pmd_to_swp_entry(orig_pmd);
+   VM_BUG_ON(non_swap_entry(entry));
+   delayacct_set_flag(DELAYACCT_PF_SWAPIN);
+retry:
+   page = lookup_swap_cache(entry, NULL, vmf->address);
+   if (!page) {
+   page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
+haddr, false);
+   if (!page) {
+   /*
+* Back out if somebody else faulted in this pmd
+* while we released the pmd lock.
+*/
+   if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+   ret = split_swap_cluster(entry, false);
+   /*
+* Retry if somebody else swap in the swap
+* entry
+*/
+   if (ret == -EEXIST) {
+   ret = 0;
+   goto retry;
+   /* swapoff occurs under us */
+   } else if (ret == -EINVAL)
+   ret = 0;
+   else
+   goto fallback;
+ 

[PATCH -mm -V3 08/21] mm, THP, swap: Support to read a huge swap cluster for swapin a THP

2018-05-23 Thread Huang, Ying
From: Huang Ying 

To swap in a THP as a whole, we need to read a huge swap cluster from
the swap device.  This patch revises __read_swap_cache_async() and
its callers and callees to support this.  If __read_swap_cache_async()
finds that the swap cluster of the specified swap entry is huge, it will
try to allocate a THP and add it into the swap cache.  So later the
contents of the huge swap cluster can be read into the THP.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/linux/huge_mm.h | 38 
 include/linux/swap.h|  4 +--
 mm/huge_memory.c| 26 -
 mm/swap_state.c | 77 ++---
 mm/swapfile.c   | 11 ---
 5 files changed, 100 insertions(+), 56 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0d0cfddbf4b7..0dbfbe34b01a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,39 @@ static inline bool thp_migration_supported(void)
return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
 }
 
+/*
+ * always: directly stall for all thp allocations
+ * defer: wake kswapd and fail if not immediately available
+ * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise
+ *   fail if not immediately available
+ * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately
+ * available
+ * never: never stall for any thp allocation
+ */
+static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+{
+   bool vma_madvised;
+
+   if (!vma)
+   return GFP_TRANSHUGE_LIGHT;
+   vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
+&transparent_hugepage_flags))
+   return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
+   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
+&transparent_hugepage_flags))
+   return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
+   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
+&transparent_hugepage_flags))
+   return GFP_TRANSHUGE_LIGHT |
+   (vma_madvised ? __GFP_DIRECT_RECLAIM :
+   __GFP_KSWAPD_RECLAIM);
+   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
+&transparent_hugepage_flags))
+   return GFP_TRANSHUGE_LIGHT |
+   (vma_madvised ? __GFP_DIRECT_RECLAIM : 0);
+   return GFP_TRANSHUGE_LIGHT;
+}
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -362,6 +395,11 @@ static inline bool thp_migration_supported(void)
 {
return false;
 }
+
+static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+{
+   return 0;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 878f132dabc0..d2e017dd7bbd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -462,7 +462,7 @@ extern sector_t map_swap_page(struct page *, struct 
block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
 extern int __swap_count(swp_entry_t entry);
-extern int __swp_swapcount(swp_entry_t entry);
+extern int __swp_swapcount(swp_entry_t entry, bool *huge_cluster);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
@@ -589,7 +589,7 @@ static inline int __swap_count(swp_entry_t entry)
return 0;
 }
 
-static inline int __swp_swapcount(swp_entry_t entry)
+static inline int __swp_swapcount(swp_entry_t entry, bool *huge_cluster)
 {
return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e363e13f6751..3975d824b4ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -620,32 +620,6 @@ static int __do_huge_pmd_anonymous_page(struct vm_fault 
*vmf, struct page *page,
 
 }
 
-/*
- * always: directly stall for all thp allocations
- * defer: wake kswapd and fail if not immediately available
- * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise
- *   fail if not immediately available
- * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately
- * available
- * never: never stall for any thp allocation
- */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
-{
-   const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
-
-   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_

[PATCH -mm -V3 05/21] mm, THP, swap: Support PMD swap mapping in free_swap_and_cache()/swap_free()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

When a PMD swap mapping is removed from a huge swap cluster, for
example, when unmapping a memory range mapped with a PMD swap mapping,
free_swap_and_cache() will be called to decrease the reference count
of the huge swap cluster.  free_swap_and_cache() may also free or
split the huge swap cluster, and free the corresponding THP in the swap
cache if necessary.  swap_free() is similar, and shares most of its
implementation with free_swap_and_cache().  This patch revises
free_swap_and_cache() and swap_free() to implement this.

If the swap cluster has been split already, for example, because of
failing to allocate a THP during swapin, we just decrease the reference
count of all swap slots by one.

Otherwise, we decrease the reference count of all swap slots and the
PMD swap mapping count in cluster_count() by one.  When the
corresponding THP isn't in the swap cache, the huge swap cluster will be
split if the PMD swap mapping count becomes 0, and freed if all swap
counts become 0.  When the corresponding THP is in the swap cache, if
every swap_map[offset] == SWAP_HAS_CACHE, we will try to delete the THP
from the swap cache, which will cause the THP and the huge swap cluster
to be freed.
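
The free-side bookkeeping described above can be sketched as a toy model (illustrative only; the class and method names are invented, and the real kernel tracks this in swap_map[] and cluster_count()):

```python
# A huge swap cluster keeps a per-slot count plus a PMD swap mapping
# count; dropping a PMD mapping decrements every slot, splits the
# cluster when the PMD mapping count reaches 0, and signals that the
# cluster can be freed when every slot count reaches 0.
HPAGE_PMD_NR = 512

class HugeSwapCluster:
    def __init__(self, pmd_maps=1):
        self.slot_count = [pmd_maps] * HPAGE_PMD_NR
        self.pmd_maps = pmd_maps
        self.huge = True

    def free_pmd_mapping(self):
        """Return True when the whole cluster can be freed."""
        self.slot_count = [c - 1 for c in self.slot_count]
        self.pmd_maps -= 1
        if self.pmd_maps == 0:
            self.huge = False            # split the huge swap cluster
        return all(c == 0 for c in self.slot_count)

c = HugeSwapCluster(pmd_maps=2)
assert c.free_pmd_mapping() is False and c.huge      # one PMD mapping left
assert c.free_pmd_mapping() is True and not c.huge   # split, then freeable
```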

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 arch/s390/mm/pgtable.c |   2 +-
 include/linux/swap.h   |   9 ++--
 kernel/power/swap.c|   4 +-
 mm/madvise.c   |   2 +-
 mm/memory.c|   4 +-
 mm/shmem.c |   6 +--
 mm/swapfile.c  | 114 +++--
 7 files changed, 116 insertions(+), 25 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 301e466e4263..3079a23eef75 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -646,7 +646,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, 
swp_entry_t entry)
 
dec_mm_counter(mm, mm_counter(page));
}
-   free_swap_and_cache(entry);
+   free_swap_and_cache(entry, false);
 }
 
 void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 57aa655ab27d..7ed2c727c9b6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -453,9 +453,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t *entry, bool cluster);
 extern int swapcache_prepare(swp_entry_t entry, bool cluster);
-extern void swap_free(swp_entry_t);
+extern void swap_free(swp_entry_t entry, bool cluster);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
-extern int free_swap_and_cache(swp_entry_t);
+extern int free_swap_and_cache(swp_entry_t entry, bool cluster);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
@@ -509,7 +509,8 @@ static inline void show_swap_cache_info(void)
 {
 }
 
-#define free_swap_and_cache(e) ({(is_migration_entry(e) || 
is_device_private_entry(e));})
+#define free_swap_and_cache(e, c)  \
+   ({(is_migration_entry(e) || is_device_private_entry(e)); })
 #define swapcache_prepare(e, c)
\
({(is_migration_entry(e) || is_device_private_entry(e)); })
 
@@ -527,7 +528,7 @@ static inline int swap_duplicate(swp_entry_t *swp, bool 
cluster)
return 0;
 }
 
-static inline void swap_free(swp_entry_t swp)
+static inline void swap_free(swp_entry_t swp, bool cluster)
 {
 }
 
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 1efcb5b0c3ed..f8b4d6df73fd 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap)
offset = swp_offset(get_swap_page_of_type(swap));
if (offset) {
if (swsusp_extents_insert(offset))
-   swap_free(swp_entry(swap, offset));
+   swap_free(swp_entry(swap, offset), false);
else
return swapdev_block(swap, offset);
}
@@ -206,7 +206,7 @@ void free_all_swap_pages(int swap)
ext = rb_entry(node, struct swsusp_extent, node);
rb_erase(node, &swsusp_extents);
for (offset = ext->start; offset <= ext->end; offset++)
-   swap_free(swp_entry(swap, offset));
+   swap_free(swp_entry(swap, offset), false);
 
kfree(ext);
}
diff --git a/mm/madvise.c b/mm/madvise.c
index 4d3c922ea1a1..d18c626b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long 
addr,

[PATCH -mm -V3 02/21] mm, THP, swap: Make CONFIG_THP_SWAP depends on CONFIG_SWAP

2018-05-23 Thread Huang, Ying
From: Huang Ying 

It's unreasonable to optimize swapping for THP without basic swapping
support.  And this will cause build errors when THP_SWAP functions are
defined in swapfile.c and called elsewhere.

The comments are fixed too to reflect the latest progress.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 mm/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ce95491abd6a..cee958bb6002 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -420,10 +420,9 @@ config ARCH_WANTS_THP_SWAP
 
 config THP_SWAP
def_bool y
-   depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP
+   depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP
help
  Swap transparent huge pages in one piece, without splitting.
- XXX: For now this only does clustered swap space allocation.
 
  For selection by architectures with reasonable THP sizes.
 
-- 
2.16.1



Re: [PATCH v2 1/5] dt-bindings: pinctrl: Add gpio bindings for Actions S900 SoC

2018-05-23 Thread Linus Walleij
On Sun, May 20, 2018 at 7:17 AM, Manivannan Sadhasivam
 wrote:

> Add gpio bindings for Actions Semi S900 SoC.
>
> Signed-off-by: Manivannan Sadhasivam 

Patch applied with Rob's review tag.

Yours,
Linus Walleij


[PATCH -mm -V3 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-05-23 Thread Huang, Ying
From: Huang Ying 

To support swapping in the THP as a whole, we need to create a PMD swap
mapping during swapout, and maintain the PMD swap mapping count.  This
patch implements the support to increase the PMD swap mapping
count (for swapout, fork, etc.) and set the SWAP_HAS_CACHE flag (for
swapin, etc.) for a huge swap cluster in the swap_duplicate() function
family.  Although it only implements a part of the design of the swap
reference count with PMD swap mapping, the whole design is described
as follows to make it easy to understand the patch and the whole
picture.

A huge swap cluster is used to hold the contents of a swapped-out THP.
After swapout, a PMD page mapping to the THP will become a PMD
swap mapping to the huge swap cluster via a swap entry in PMD.  While
a PTE page mapping to a subpage of the THP will become the PTE swap
mapping to a swap slot in the huge swap cluster via a swap entry in
PTE.

If there is no PMD swap mapping and the corresponding THP is removed
from the page cache (reclaimed), the huge swap cluster will be split
and become a normal swap cluster.

The count (cluster_count()) of the huge swap cluster is
SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count.  Because
all swap slots in the huge swap cluster are mapped by PTE or PMD, or
has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
HPAGE_PMD_NR.  And the PMD swap mapping count is recorded too to make
it easy to determine whether there are remaining PMD swap mappings.

The count in swap_map[offset] is the sum of PTE and PMD swap mapping
count.  This means when we increase the PMD swap mapping count, we
need to increase swap_map[offset] for all swap slots inside the swap
cluster.  An alternative choice is to make swap_map[offset] record the
PTE swap map count only, given that we have recorded the PMD swap
mapping count in the count of the huge swap cluster.  But that would
require increasing swap_map[offset] when splitting the PMD swap mapping,
which may fail because of memory allocation for swap count continuation.
That is hard to deal with.  So we chose the current solution.

The PMD swap mapping to a huge swap cluster may be split when unmapping
a part of the PMD mapping, etc.  That is easy because only the count of
the huge swap cluster needs to be changed.  When the last PMD swap
mapping is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
cluster (clear the huge flag).  This makes it easy to reason about the
cluster state.

A huge swap cluster will be split when splitting the THP in the swap
cache, failing to allocate a THP during swapin, etc.  But when splitting
the huge swap cluster, we will not try to split all PMD swap mappings,
because sometimes we don't have enough information available for that.
Later, when the PMD swap mapping is duplicated or swapped in, etc., the
PMD swap mapping will be split, falling back to the PTE
operation.

When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
set in the swap_map[offset] of all swap slots inside the huge swap
cluster backing the THP.  This huge swap cluster will not be split
unless the THP is split even if its PMD swap mapping count dropped to
0.  Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
flag will be cleared in the swap_map[offset] of all swap slots inside
the huge swap cluster.  And this huge swap cluster will be split if
its PMD swap mapping count is 0.
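
The counting scheme described above can be sketched as a toy model (illustrative only; names are invented): cluster_count() of a huge cluster is SWAPFILE_CLUSTER plus the PMD swap mapping count, and swap_map[offset] holds the sum of PTE and PMD mapping counts, so duplicating a PMD mapping bumps every slot.

```python
# Toy model of the huge-swap-cluster reference counts, per the design
# described in this commit message (not the kernel data structures).
SWAPFILE_CLUSTER = 512

class HugeCluster:
    def __init__(self):
        self.swap_map = [1] * SWAPFILE_CLUSTER  # one PMD mapping after swapout
        self.pmd_maps = 1

    def cluster_count(self):
        # Usage count of all slots plus the PMD swap mapping count.
        return SWAPFILE_CLUSTER + self.pmd_maps

    def pmd_swap_duplicate(self):               # e.g. on fork
        self.swap_map = [n + 1 for n in self.swap_map]
        self.pmd_maps += 1

c = HugeCluster()
c.pmd_swap_duplicate()
assert c.cluster_count() == SWAPFILE_CLUSTER + 2
assert all(n == 2 for n in c.swap_map)
```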

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 include/linux/huge_mm.h |   5 +
 include/linux/swap.h|   9 +-
 mm/memory.c |   2 +-
 mm/rmap.c   |   2 +-
 mm/swapfile.c   | 287 +---
 5 files changed, 213 insertions(+), 92 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a8a126259bc4..0d0cfddbf4b7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -79,6 +79,11 @@ extern struct kobj_attribute shmem_enabled_attr;
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
#define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
@@ -1166,16 +1174,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
entry)
return NULL;
 }
 
-static unsigned char __swap_entry_free(struct swap_info_struct *p,
-  swp_entry_t entry, unsigned char usage)
+static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
+ struct swap_cluster_info *ci,
+ unsigned long offset,
+ unsigned char usage)
 {
-   struct swap_cluste

[PATCH -mm -V3 01/21] mm, THP, swap: Enable PMD swap operations for CONFIG_THP_SWAP

2018-05-23 Thread Huang, Ying
From: Huang Ying 

Previously, the PMD swap operations were only enabled for
CONFIG_ARCH_ENABLE_THP_MIGRATION, because they were only used by the
THP migration support.  We will support PMD swap mappings to the huge
swap cluster and swapping in the THP as a whole.  That will be enabled
via CONFIG_THP_SWAP and needs these PMD swap operations.  So enable the
PMD swap operations for CONFIG_THP_SWAP too.

Signed-off-by: "Huang, Ying" 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Naoya Horiguchi 
Cc: Zi Yan 
---
 arch/x86/include/asm/pgtable.h |  2 +-
 include/asm-generic/pgtable.h  |  2 +-
 include/linux/swapops.h| 44 ++
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f1633de5a675..0dff03085197 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1224,7 +1224,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..bb8354981a36 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct 
*mm,
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if !defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !defined(CONFIG_THP_SWAP)
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
return pmd;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 1d3877c39a00..f1be5a52f5c8 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -258,17 +258,7 @@ static inline int is_write_migration_entry(swp_entry_t 
entry)
 
 #endif
 
-struct page_vma_mapped_walk;
-
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
-   struct page *page);
-
-extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
-   struct page *new);
-
-extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
-
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
swp_entry_t arch_entry;
@@ -286,6 +276,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+   return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+   return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+   struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+   struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -306,16 +318,6 @@ static inline void remove_migration_pmd(struct 
page_vma_mapped_walk *pvmw,
 
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { }
 
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-   return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-   return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
return 0;
-- 
2.16.1



Re: [PATCH v5 1/3] ARM: dts: tegra: Remove skeleton.dtsi and fix DTC warnings for /memory

2018-05-23 Thread Krzysztof Kozlowski
On Wed, May 23, 2018 at 10:22 AM, Stefan Agner  wrote:
> On 23.05.2018 09:05, Krzysztof Kozlowski wrote:
>> On Thu, May 17, 2018 at 1:39 PM, Stefan Agner  wrote:
>>> On 17.05.2018 09:45, Krzysztof Kozlowski wrote:
>>> Could we not add
>>>
>>> memory { device_type = "memory"; };
>>>
>>> in the SoC level device trees?
>>>
>>> This would save device_type in all other instances.
>>>
>>> That is also how it is done in other places, e.g.
>>> arch/arm/boot/dts/imx6qdl.dtsi
>>
>> Not really because the unit address will not match between different
>> boards. The imx6qdl, as I see, has the same issue:
>>  - imx6qdl.dtsi defines "memory" node
>>  - imx6dl-apf6dev.dts includes the previous and defines "memory@1000"
>>
>> This is wrong - two memory nodes.
>>
>
> Hm I see. We could add
>
> memory@0 { device_type = "memory"; };
>
> Since the reg property is specified in the board level device tree it
> would be still fine?
>
> Or probably better to provide a complete spec with length zero:
>
> memory@0 {
> device_type = "memory";
> reg = <0x0 0x0>;
> };
>
> Even some boards do that and assume that boot loader will fill it
> correctly, so that should be fine.

That could be the solution, although tegra30-apalis.dtsi is a problem
here. For Tegra 114, 124 and 20 it would work fine - all boards of a
given SoC have the same memory address (0x0 or 0x8000). However, for
Tegra30 the Apalis did not have any memory reg before, so I am not
sure what should be used. I added 0x0. The other Tegra30 boards have
memory@8000.
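A minimal sketch of the split being discussed (file names and the 512 MiB
size are illustrative, not taken from the actual Tegra trees), with the
SoC dtsi providing the empty placeholder node and the board dts filling
in reg:

```dts
/* soc.dtsi -- SoC-level placeholder with a zero-length reg,
 * on the assumption that the board dts or boot loader fills it in */
/ {
	memory@0 {
		device_type = "memory";
		reg = <0x0 0x0>;
	};
};

/* board.dts -- board level only overrides reg, so there is a
 * single memory node and a single unit address */
/ {
	memory@0 {
		reg = <0x0 0x20000000>;	/* e.g. 512 MiB at 0x0 */
	};
};
```

This keeps one memory node per board (avoiding the imx6qdl-style duplicate
nodes mentioned above) while letting the SoC dtsi carry device_type.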

Best regards,
Krzysztof


Re: [PATCH v2 2/5] arm64: dts: actions: Add gpio properties to pinctrl node for S900

2018-05-23 Thread Linus Walleij
On Sun, May 20, 2018 at 7:17 AM, Manivannan Sadhasivam
 wrote:

> Add gpio properties to pinctrl node for Actions Semi S900 SoC.
>
> Signed-off-by: Manivannan Sadhasivam 

Reviewed-by: Linus Walleij 

Yours,
Linus Walleij


Re: [PATCH v2 3/5] arm64: dts: actions: Add gpio line names to Bubblegum-96 board

2018-05-23 Thread Linus Walleij
On Sun, May 20, 2018 at 7:17 AM, Manivannan Sadhasivam
 wrote:

> Add gpio line names to Actions Semi S900 based Bubblegum-96 board.
>
> Signed-off-by: Manivannan Sadhasivam 

Reviewed-by: Linus Walleij 

Yours,
Linus Walleij


Re: [PATCH v2 4/5] pinctrl: actions: Add gpio support for Actions S900 SoC

2018-05-23 Thread Linus Walleij
On Sun, May 20, 2018 at 7:17 AM, Manivannan Sadhasivam
 wrote:

> Add gpio support to pinctrl driver for Actions Semi S900 SoC.
>
> Signed-off-by: Manivannan Sadhasivam 
> Reviewed-by: Andy Shevchenko 

Patch applied for v4.18 so we get some rotation in linux-next!

Yours,
Linus Walleij


Re: [PATCH/RFC] ARM: dts: r8a7791: Move enable-method to CPU nodes

2018-05-23 Thread Simon Horman
On Tue, May 22, 2018 at 03:29:25PM +0200, Geert Uytterhoeven wrote:
> According to Documentation/devicetree/bindings/arm/cpus.txt, the
> "enable-method" property should be a property of the individual CPU
> nodes, not of the parent "cpus" node.  However, on R-Car M2-W (and on
> several other arm32 SoCs), the property is tied to the "cpus" node
> instead.
> 
> Secondary CPU bringup and CPU hot (un)plug work regardless, as
> arm_dt_init_cpu_maps() falls back to looking in the "cpus" node.
> 
> The cpuidle code does not have such a fallback, so it does not detect
> the enable-method.  Note that cpuidle does not support the
> "renesas,apmu" enable-method yet, so for now this does not make any
> difference.

Is the implication that if we keep the current binding for cpu nodes
then at some point we will need to update the cpuidle binding?

> 
> Signed-off-by: Geert Uytterhoeven 
> ---
> Arm64 and powerpc do not have such a fallback, but SH has, like arm32.
> 
> This is marked RFC, as the alternative is to update the DT bindings to
> keep the status quo.
> ---
>  arch/arm/boot/dts/r8a7791.dtsi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/dts/r8a7791.dtsi b/arch/arm/boot/dts/r8a7791.dtsi
> index d568bd22d6cbd855..b214cb8f52e47109 100644
> --- a/arch/arm/boot/dts/r8a7791.dtsi
> +++ b/arch/arm/boot/dts/r8a7791.dtsi
> @@ -71,7 +71,6 @@
>   cpus {
>   #address-cells = <1>;
>   #size-cells = <0>;
> - enable-method = "renesas,apmu";
>  
>   cpu0: cpu@0 {
>   device_type = "cpu";
> @@ -83,6 +82,7 @@
>   clock-latency = <30>; /* 300 us */
>   power-domains = <&sysc R8A7791_PD_CA15_CPU0>;
>   next-level-cache = <&L2_CA15>;
> + enable-method = "renesas,apmu";
>  
>   /* kHz - uV - OPPs unknown yet */
>   operating-points = <150 100>,
> @@ -101,6 +101,7 @@
>   clocks = <&cpg CPG_CORE R8A7791_CLK_Z>;
>   power-domains = <&sysc R8A7791_PD_CA15_CPU1>;
>   next-level-cache = <&L2_CA15>;
> + enable-method = "renesas,apmu";
>   };
>  
>   L2_CA15: cache-controller-0 {
> -- 
> 2.7.4
> 


Re: [PATCH v2 5/5] MAINTAINERS: Add Actions Semi S900 pinctrl entries

2018-05-23 Thread Linus Walleij
On Sun, May 20, 2018 at 7:17 AM, Manivannan Sadhasivam
 wrote:

> Add S900 pinctrl entries under ARCH_ACTIONS
>
> Signed-off-by: Manivannan Sadhasivam 

Patch applied tentatively so we have some maintenance entry for this.

Andreas expressed concerns about the driver earlier, so he might want it
split from the platform parts and have a separate entry for the pinctrl+GPIO
so Manivannan can maintain that part, also it makes sense to list
Manivannan as comaintainer of ARCH_ACTIONS with this in.

Andreas: how would you like to proceed?

I understand that I was a bit pushy or even rude in my last message
about the maintenance of this platform and the code structure of
the pin control driver. I am sorry if it caused any bad feelings on your
side :( social conflicts give me the creeps, I just try my best. Maybe
my best isn't always what it should be.

Yours,
Linus Walleij


Re: [PATCHv4 06/10] arm64: add basic pointer authentication support

2018-05-23 Thread Suzuki K Poulose

Hi Mark,


On 03/05/18 14:20, Mark Rutland wrote:

This patch adds basic support for pointer authentication, allowing
userspace to make use of APIAKey. The kernel maintains an APIAKey value
for each process (shared by all threads within), which is initialised to
a random value at exec() time.

To describe that address authentication instructions are available, the
ID_AA64ISAR0.{APA,API} fields are exposed to userspace. A new hwcap,
APIA, is added to describe that the kernel manages APIAKey.

Instructions using other keys (APIBKey, APDAKey, APDBKey) are disabled,
and will behave as NOPs. These may be made use of in future patches.

No support is added for the generic key (APGAKey), though this cannot be
trapped or made to behave as a NOP. Its presence is not advertised with
a hwcap.

Signed-off-by: Mark Rutland 
Cc: Catalin Marinas 
Cc: Ramana Radhakrishnan 
Cc: Suzuki K Poulose 
Cc: Will Deacon 




diff --git a/arch/arm64/include/asm/pointer_auth.h b/arch/arm64/include/asm/pointer_auth.h
new file mode 100644
index ..034877ee28bc
--- /dev/null
+++ b/arch/arm64/include/asm/pointer_auth.h


...


+
+#define __ptrauth_key_install(k, v)\
+do {   \
+   write_sysreg_s(v.lo, SYS_ ## k ## KEYLO_EL1);   \
+   write_sysreg_s(v.hi, SYS_ ## k ## KEYHI_EL1);   \
+} while (0)


I think it might be safer to have parentheses around v, so that
something like __ptrauth_key_install(APIA, *key_val) works fine.


diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 17c65c8f33cb..01f02ac500ae 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -48,5 +48,6 @@
  #define HWCAP_USCAT   (1 << 25)
  #define HWCAP_ILRCPC  (1 << 26)
  #define HWCAP_FLAGM   (1 << 27)
+#define HWCAP_APIA (1 << 28)
  
  #endif /* _UAPI__ASM_HWCAP_H */

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 01b1a7e7d70f..f418d4cb6691 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1030,6 +1030,11 @@ static void cpu_copy_el2regs(const struct arm64_cpu_capabilities *__unused)
  #endif

...


+#ifdef CONFIG_ARM64_PNTR_AUTH
+   HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_APA_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_APIA),
+#endif


Did you mean CONFIG_ARM64_PTR_AUTH here ?

Cheers

Suzuki


[PATCH] pinctrl: armada-37xx: Fix spurious irq management

2018-05-23 Thread Gregory CLEMENT
From: Terry Zhou 

Until now, if a spurious irq was found in the irq handler, we only
updated the status in the register but not the status in the code. Due
to this the system would get stuck in an infinite loop.

[gregory.clem...@bootlin.com: update comment and add fix and stable tags]
Fixes: 30ac0d3b0702 ("pinctrl: armada-37xx: Add edge both type gpio irq support")
Cc: 
Signed-off-by: Terry Zhou 
Reviewed-by: Gregory CLEMENT 
Signed-off-by: Gregory CLEMENT 
---
 drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c
index 5b63248c8209..7bef929bd7fe 100644
--- a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c
+++ b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c
@@ -679,12 +679,13 @@ static void armada_37xx_irq_handler(struct irq_desc *desc)
writel(1 << hwirq,
   info->base +
   IRQ_STATUS + 4 * i);
-   continue;
+   goto update_status;
}
}
 
generic_handle_irq(virq);
 
+update_status:
/* Update status in case a new IRQ appears */
spin_lock_irqsave(&info->irq_lock, flags);
status = readl_relaxed(info->base +
-- 
2.17.0


