Re: [PATCH] quota: clear padding in v2r1_mem2diskdqb()
On Thu 24-09-20 11:36:19, Eric Dumazet wrote:
> Freshly allocated memory contains garbage, better make sure
> to init all struct v2r1_disk_dqblk fields to avoid KMSAN report:
>
> BUG: KMSAN: uninit-value in qtree_entry_unused+0x137/0x1b0 fs/quota/quota_tree.c:218
> CPU: 0 PID: 23373 Comm: syz-executor.1 Not tainted 5.9.0-rc4-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x21c/0x280 lib/dump_stack.c:118
>  kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
>  __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
>  qtree_entry_unused+0x137/0x1b0 fs/quota/quota_tree.c:218
>  v2r1_mem2diskdqb+0x43d/0x710 fs/quota/quota_v2.c:285
>  qtree_write_dquot+0x226/0x870 fs/quota/quota_tree.c:394
>  v2_write_dquot+0x1ad/0x280 fs/quota/quota_v2.c:333
>  dquot_commit+0x4af/0x600 fs/quota/dquot.c:482
>  ext4_write_dquot fs/ext4/super.c:5934 [inline]
>  ext4_mark_dquot_dirty+0x4d8/0x6a0 fs/ext4/super.c:5985
>  mark_dquot_dirty fs/quota/dquot.c:347 [inline]
>  mark_all_dquot_dirty fs/quota/dquot.c:385 [inline]
>  dquot_alloc_inode+0xc05/0x12b0 fs/quota/dquot.c:1755
>  __ext4_new_inode+0x8204/0x9d70 fs/ext4/ialloc.c:1155
>  ext4_tmpfile+0x41a/0x850 fs/ext4/namei.c:2686
>  vfs_tmpfile+0x2a2/0x570 fs/namei.c:3283
>  do_tmpfile fs/namei.c:3316 [inline]
>  path_openat+0x4035/0x6a90 fs/namei.c:3359
>  do_filp_open+0x2b8/0x710 fs/namei.c:3395
>  do_sys_openat2+0xa88/0x1140 fs/open.c:1168
>  do_sys_open fs/open.c:1184 [inline]
>  __do_compat_sys_openat fs/open.c:1242 [inline]
>  __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240
>  __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240
>  do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
>  __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
>  do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
>  do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205
>  entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
> RIP: 0023:0xf7ff4549
> Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
> RSP: 002b:f55cd0cc EFLAGS: 0296 ORIG_RAX: 0127
> RAX: ffda RBX: ff9c RCX: 2000
> RDX: 00410481 RSI: RDI:
> RBP: R08: R09:
> R10: R11: R12:
> R13: R14: R15:
>
> Uninit was created at:
>  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline]
>  kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126
>  kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80
>  slab_alloc_node mm/slub.c:2907 [inline]
>  slab_alloc mm/slub.c:2916 [inline]
>  __kmalloc+0x2bb/0x4b0 mm/slub.c:3982
>  kmalloc include/linux/slab.h:559 [inline]
>  getdqbuf+0x56/0x150 fs/quota/quota_tree.c:52
>  qtree_write_dquot+0xf2/0x870 fs/quota/quota_tree.c:378
>  v2_write_dquot+0x1ad/0x280 fs/quota/quota_v2.c:333
>  dquot_commit+0x4af/0x600 fs/quota/dquot.c:482
>  ext4_write_dquot fs/ext4/super.c:5934 [inline]
>  ext4_mark_dquot_dirty+0x4d8/0x6a0 fs/ext4/super.c:5985
>  mark_dquot_dirty fs/quota/dquot.c:347 [inline]
>  mark_all_dquot_dirty fs/quota/dquot.c:385 [inline]
>  dquot_alloc_inode+0xc05/0x12b0 fs/quota/dquot.c:1755
>  __ext4_new_inode+0x8204/0x9d70 fs/ext4/ialloc.c:1155
>  ext4_tmpfile+0x41a/0x850 fs/ext4/namei.c:2686
>  vfs_tmpfile+0x2a2/0x570 fs/namei.c:3283
>  do_tmpfile fs/namei.c:3316 [inline]
>  path_openat+0x4035/0x6a90 fs/namei.c:3359
>  do_filp_open+0x2b8/0x710 fs/namei.c:3395
>  do_sys_openat2+0xa88/0x1140 fs/open.c:1168
>  do_sys_open fs/open.c:1184 [inline]
>  __do_compat_sys_openat fs/open.c:1242 [inline]
>  __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240
>  __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240
>  do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
>  __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
>  do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
>  do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205
>  entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
>
> Fixes: 498c60153ebb ("quota: Implement quota format with 64-bit space and inode limits")
> Signed-off-by: Eric Dumazet
> Cc: Jan Kara

Thanks for the patch! I've added it to my tree.

								Honza

> ---
>  fs/quota/quota_v2.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c
> index 58fc2a7c7fd19f0be265e8189e476782571cbcfe..e69a2bfdd81c050c1ff2653528f803bd402fb399 100644
> --- a/fs/quota/quota_v2.c
> +++ b/fs/quota/quota_v2.c
> @@ -282,6 +282,7 @@ static void v2r1_mem2diskdqb(void *dp, struct dquot *dquot)
>  	d->dqb_curspace = cpu_to_le64(m->dqb_cursp
Re: [PATCH] drm/vc4: Deleted the drm_device declaration
Hi:

I always used scripts/get_maintainer.pl to get the recipient list. I don't know why it missed the maintainers for a given piece of code.

tiantao@ubuntu:~$ git send-email -to e...@anholt.net -to airl...@linux.ie -to dan...@ffwll.ch -to sumit.sem...@linaro.org -to christian.koe...@amd.com -to dri-de...@lists.freedesktop.org -to linux-kernel@vger.kernel.org -to linux-me...@vger.kernel.org -to linaro-mm-...@lists.linaro.org mailline/drm/drm/0001-drm-vc4-Deleted-the-drm_device-declaration.patch --suppress-cc=all mailline/drm/drm/0001-drm-vc4-Deleted-the-drm_device-declaration.patch
tiantao@ubuntu:~/mailline/drm/drm$ vim drivers/gpu/drm/vc4/vc4_drv.h^C
tiantao@ubuntu:~/mailline/drm/drm$ ./scripts/get_maintainer.pl drivers/gpu/drm/vc4/vc4_drv.h
Eric Anholt (supporter:DRM DRIVERS FOR VC4)
David Airlie (maintainer:DRM DRIVERS)
Daniel Vetter (maintainer:DRM DRIVERS)
Sumit Semwal (maintainer:DMA BUFFER SHARING FRAMEWORK)
"Christian König" (maintainer:DMA BUFFER SHARING FRAMEWORK)
dri-de...@lists.freedesktop.org (open list:DRM DRIVERS)
linux-kernel@vger.kernel.org (open list)
linux-me...@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK)
linaro-mm-...@lists.linaro.org (moderated list:DMA BUFFER SHARING FRAMEWORK)

On 2020/9/25 17:10, Daniel Vetter wrote:
> On Fri, Sep 25, 2020 at 04:51:38PM +0800, Tian Tao wrote:
> > drm_modeset_lock.h already declares struct drm_device, so there's no
> > need to declare it in vc4_drv.h
> >
> > Signed-off-by: Tian Tao
>
> Just an aside, when submitting patches please use scripts/get_maintainer.pl
> to generate the recipient list. Looking through the past few patches from
> you it seems fairly arbitrary and often misses the actual maintainers for a
> given piece of code, which increases the odds the patch will get lost a lot.
>
> E.g. for this one I'm only like the 5th or so fallback person, and the
> main maintainer isn't on the recipient list.
Cheers, Daniel

---
 drivers/gpu/drm/vc4/vc4_drv.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index 8c8d96b..8717a1c 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -19,7 +19,6 @@
 #include "uapi/drm/vc4_drm.h"

-struct drm_device;
 struct drm_gem_object;

 /* Don't forget to update vc4_bo.c: bo_type_names[] when adding to
--
2.7.4
[PATCH] libfs: fix error cast of negative value in simple_attr_write()
The attr->set() callback receives a value of type u64, but we use
simple_strtoll() to do the conversion, which leads to an erroneous cast
if the user inputs a negative value. Use kstrtoull() instead to resolve
this issue; -EINVAL will be returned if a negative value is input.

Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang
---
 fs/libfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index e0d42e9..803c439 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -975,7 +975,9 @@ ssize_t simple_attr_write(struct file *file, const char __user *buf,
 		goto out;

 	attr->set_buf[size] = '\0';
-	val = simple_strtoll(attr->set_buf, NULL, 0);
+	ret = kstrtoull(attr->set_buf, 0, &val);
+	if (ret)
+		goto out;
 	ret = attr->set(attr->data, val);
 	if (ret == 0)
 		ret = len; /* on success, claim we got the whole input */
--
2.8.1
Re: [PATCH v2 0/5] PCI: dwc: improve msi handling
Hi Jon,

On Fri, 25 Sep 2020 09:53:45 +0100 Jon Hunter wrote:
>
> On 24/09/2020 12:05, Jisheng Zhang wrote:
> > Improve the msi code:
> > 1. Add proper error handling.
> > 2. Move dw_pcie_msi_init() from each users to designware host to solve
> >    msi page leakage in resume path.
>
> Apologies if this is slightly off topic, but I have been meaning to ask
> about MSIs and PCI. On Tegra194 which uses the DWC PCI driver, whenever we
> hotplug CPUs we see the following warnings ...
>
> [   79.068351] WARNING KERN IRQ70: set affinity failed(-22).
> [   79.068362] WARNING KERN IRQ71: set affinity failed(-22).
>
> These interrupts are the MSIs ...
>
>  70: 0 0 0 0 0 0 0 0  PCI-MSI 134217728 Edge  PCIe PME, aerdrv
>  71: 0 0 0 0 0 0 0 0  PCI-MSI 134742016 Edge  ahci[0001:01:00.0]
>
> This is caused because ...
>
> static int dw_pci_msi_set_affinity(struct irq_data *d,
>                                    const struct cpumask *mask, bool force)
> {
>         return -EINVAL;
> }
>
> Now the above is not unique to the DWC PCI host driver, it appears that
> most PCIe drivers also do the same. However, I am curious if there is
> any way to avoid the above warnings given that setting the affinity does
> not appear to be supported in any way AFAICT.

Could you please try the patch below?

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index bf25d783b5c5..7e5dc54d060e 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -197,7 +197,6 @@ static struct irq_chip dw_pci_msi_bottom_irq_chip = {
 	.name = "DWPCI-MSI",
 	.irq_ack = dw_pci_bottom_ack,
 	.irq_compose_msi_msg = dw_pci_setup_msi_msg,
-	.irq_set_affinity = dw_pci_msi_set_affinity,
 	.irq_mask = dw_pci_bottom_mask,
 	.irq_unmask = dw_pci_bottom_unmask,
 };
Re: [PATCH] x86/irq: Use printk_deferred() on raw_spin_lock() protected sections
On Thu, Sep 24, 2020 at 09:08:57AM -0700, Joe Perches wrote:
> On Thu, 2020-09-24 at 12:28 +0200, Peter Zijlstra wrote:
> > On Mon, Sep 21, 2020 at 06:22:12PM +0200, Daniel Bristot de Oliveira wrote:
> > > While testing hotplug I got this BUG:
> > > It was caused by a printk() inside a code section protected by a
> > > raw_spin_lock() that ended up calling a serial console that
> > > uses a regular spin_lock().
> > >
> > > Use printk_deferred() to avoid calling the serial console
> > > in a raw_spin_lock() protected section.
> >
> > I consider printk_deferred() to be a bug, can't we just wait for the new
> > printk implementation to land so we don't need all this nonsense?
>
> It will be good to do a sed fixup for all these
> printk_deferred uses soon, but in the meantime
> isn't it useful to avoid BUGs?

The kernel isn't PROVE_RAW_LOCK_NESTING clean anyway. Printk is one of
the biggest offenders there, iirc.
Re: [PATCH 2/3] PCI: dwc: Add common iATU register support
Hi Rob,

On 2020/09/24 0:57, Rob Herring wrote:
> On Fri, Sep 11, 2020 at 05:50:02PM +0900, Kunihiko Hayashi wrote:
> > This gets the iATU register area from the reg property that has
> > reg-name "atu". In Synopsys DWC version 4.80 or later, since the iATU
> > register area is separated from the core register area, this area
> > needs to be obtained from DT independently.
> >
> > Cc: Murali Karicheri
> > Cc: Jingoo Han
> > Cc: Gustavo Pimentel
> > Suggested-by: Rob Herring
> > Signed-off-by: Kunihiko Hayashi
> > ---
> >  drivers/pci/controller/dwc/pcie-designware.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > index 4d105ef..4a360bc 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > @@ -10,6 +10,7 @@
> >  #include
> >  #include
> > +#include
> >  #include
> >
> >  #include "../../pci.h"
> > @@ -526,11 +527,16 @@ void dw_pcie_setup(struct dw_pcie *pci)
> >  	u32 val;
> >  	struct device *dev = pci->dev;
> >  	struct device_node *np = dev->of_node;
> > +	struct platform_device *pdev;
> >
> >  	if (pci->version >= 0x480A ||
> >  	    (!pci->version && dw_pcie_iatu_unroll_enabled(pci))) {
> >  		pci->iatu_unroll_enabled = true;
> > -		if (!pci->atu_base)
> > +		pdev = of_find_device_by_node(np);
>
> Use to_platform_device(dev) instead. Put that at the beginning as I'm
> going to move 'dbi' in here too.

Okay, I'll rewrite it with to_platform_device(dev).
Is there somewhere I should refer to in order to rebase onto your change?

Thank you,

---
Best Regards
Kunihiko Hayashi
Re: [RFC PATCH v2] sched/fair: select idle cpu from idle cpumask in sched domain
Hi Vincent,

On 2020/9/24 21:09, Vincent Guittot wrote:
> > > > > Would you mind sharing the uperf (netperf load) result on your side?
> > > > > That's the workload where I have seen the most benefit from this
> > > > > patch under heavy load.
> > >
> > > with uperf, i've got the same kind of result as sched pipe
> > > tip/sched/core: Throughput 24.83Mb/s (+/- 0.09%)
> > > with this patch: Throughput 19.02Mb/s (+/- 0.71%) which is a 23%
> > > regression as for sched pipe
> > >
> > In case this is caused by the logic error in this patch (sorry again),
> > did you see any improvement with patch v2? Though it does not help for
> > the nohz=off case, I just want to know if it helps or does not help at
> > all on the ARM platform.
>
> With the v2 which rate limits the update of the cpumask (but doesn't
> support sched_idle tasks), I don't see any performance impact:

I agree we should go with the rate-limited cpumask update. And I think no
performance impact for sched-pipe is expected, as this workload has only
2 threads and the platform has 8 cores, so mostly the previous cpu is
returned, and even if select_idle_sibling() is called, select_idle_core()
is hit and select_idle_cpu() is rarely called.

But I'm more curious why there is a 23% performance penalty. So for this
patch, if you revert this change but keep the cpumask updated, is the 23%
penalty still there?

-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+	cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);

I just wonder if it's caused by the atomic ops, as you have two cache
domains with sd_llc(?). Do you have an x86 machine to make a comparison?
It's hard for me to find an ARM machine but I'll try.

Also, for the uperf (task thread num = cpu num) workload, how is it with
patch v2? No performance impact at all?

Thanks,
-Aubrey
[PATCH] MAINTAINERS: Add me as a reviewer for PCI aardvark
Signed-off-by: Pali Rohár
---
I have provided more fixes to this driver, I have the functional
specification for this PCI controller and also hardware for testing and
development (Espressobin V5 and Turris MOX B and G modules).

Thomas already wrote [1] that he is less involved with this driver, so I
can help with reviewing/maintaining it.

[1] - https://lore.kernel.org/linux-pci/20200513135643.478ff...@windsurf.home/
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d746519253c3..e245dbe280ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13185,6 +13185,7 @@ F:	drivers/firmware/pcdp.*

 PCI DRIVER FOR AARDVARK (Marvell Armada 3700)
 M:	Thomas Petazzoni
+R:	Pali Rohár
 L:	linux-...@vger.kernel.org
 L:	linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 S:	Maintained
--
2.20.1
Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
On Fri, 21 Aug 2020 15:56:04 -0400 Tony Krowiak wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The intent of
> this callback is to provide a driver with the means to prevent a root user
> from inadvertently taking a queue away from a matrix mdev and giving it to
> the host while it is assigned to the matrix mdev. The callback will
> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
>
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters, domains and
> control domains assigned to the matrix mdev). This will enforce the proper
> procedure for removing AP resources intended for guest usage, which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak
> Reported-by: kernel test robot
> ---
>  drivers/s390/crypto/ap_bus.c | 148 ---
>  drivers/s390/crypto/ap_bus.h |   4 +
>  2 files changed, 142 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 24a1940b829e..db27bd931308 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -35,6 +35,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "ap_bus.h"
>  #include "ap_debug.h"
> @@ -889,6 +890,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
>  	return 0;
>  }
>
> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
> +			       unsigned long *newmap)
> +{
> +	unsigned long size;
> +	int rc;
> +
> +	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
> +	if (*str == '+' || *str == '-') {
> +		memcpy(newmap, bitmap, size);
> +		rc = modify_bitmap(str, newmap, bits);
> +	} else {
> +		memset(newmap, 0, size);
> +		rc = hex2bitmap(str, newmap, bits);
> +	}
> +	return rc;
> +}
> +
>  int ap_parse_mask_str(const char *str,
>  		      unsigned long *bitmap, int bits,
>  		      struct mutex *lock)
> @@ -908,14 +926,7 @@ int ap_parse_mask_str(const char *str,
>  		kfree(newmap);
>  		return -ERESTARTSYS;
>  	}
> -
> -	if (*str == '+' || *str == '-') {
> -		memcpy(newmap, bitmap, size);
> -		rc = modify_bitmap(str, newmap, bits);
> -	} else {
> -		memset(newmap, 0, size);
> -		rc = hex2bitmap(str, newmap, bits);
> -	}
> +	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
>  	if (rc == 0)
>  		memcpy(bitmap, newmap, size);
>  	mutex_unlock(lock);
> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>  	return rc;
>  }
>
> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> +	int rc = 0;
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +	unsigned long *newapm = (unsigned long *)data;
> +
> +	/*
> +	 * No need to verify whether the driver is using the queues if it is the
> +	 * default driver.
> +	 */
> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> +		return 0;
> +
> +	/* The non-default driver's module must be loaded */
> +	if (!try_module_get(drv->owner))
> +		return 0;
> +
> +	if (ap_drv->in_use)
> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> +			rc = -EADDRINUSE;
> +
> +	module_put(drv->owner);
> +
> +	return rc;
> +}
> +
> +static int apmask_commit(unsigned long *newapm)
> +{
> +	int rc;
> +	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
> +
> +	/*
> +	 * Check if any bits in the apmask have been set which will
> +	 * result in queues being removed from non-default drivers
> +	 */
> +	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> +				      __verify_card_reservations);
> +		if (rc)
> +			return rc;
> +	}

I understand the above asks all the non-default drivers whether some of
the queues are 'used'. But AFAIU this check only reflects the truth at a
given moment: ap_drv->in_use() is only telling us something about that
instant...
Re: [PATCH v3 3/6] spi: spi-mtk-nor: support 7 bytes transfer of generic spi
On Fri, Sep 25, 2020 at 3:47 PM Chuanhong Guo wrote:
>
> Hi!
>
> On Fri, Sep 25, 2020 at 2:55 PM Ikjoon Jang wrote:
> >
> > When mtk-nor falls back to generic spi transfers, it can actually
> > transfer up to 7 bytes.
>
> generic transfer_one_message should support full-duplex transfers,
> not transfers with special format requirements. (e.g. here the last
> byte is rx only.) These transfers with format requirements should
> be implemented with the spi-mem interface instead.

yep, that's correct.

> >
> > This patch fixes adjust_op_size() and supports_op() to explicitly
> > check the 7-byte range and also fixes possible under/overflow
> > conditions in register offset calculation.
> >
> > Signed-off-by: Ikjoon Jang
>
> I was notified by Bayi about your discussion and sent some
> patches yesterday for the same purpose.

Whoops...

> As transfer_one_message isn't the proper place to implement
> this, maybe we could work on my version instead?

I didn't notice that before. Sure, please go ahead, I'll follow up with
your patch in v4.

> > ---
> >
> > (no changes since v1)
>
> This should be "new patch" not "no changes" :P

oops, it seems my script did something wrong.
> >
> >  drivers/spi/spi-mtk-nor.c | 102 --
> >  1 file changed, 76 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/spi/spi-mtk-nor.c b/drivers/spi/spi-mtk-nor.c
> > index 0f7d4ec68730..e7719d249095 100644
> > --- a/drivers/spi/spi-mtk-nor.c
> > +++ b/drivers/spi/spi-mtk-nor.c
> > @@ -79,7 +79,11 @@
> >  #define MTK_NOR_REG_DMA_DADR		0x720
> >  #define MTK_NOR_REG_DMA_END_DADR	0x724
> >
> > +/* maximum bytes of TX in PRG mode */
> >  #define MTK_NOR_PRG_MAX_SIZE		6
> > +/* maximum bytes of TX + RX is 7, last 1 byte is always being sent as zero */
> > +#define MTK_NOR_PRG_MAX_CYCLES	7
> > +
> >  // Reading DMA src/dst addresses have to be 16-byte aligned
> >  #define MTK_NOR_DMA_ALIGN		16
> >  #define MTK_NOR_DMA_ALIGN_MASK	(MTK_NOR_DMA_ALIGN - 1)
> > @@ -167,6 +171,24 @@ static bool mtk_nor_match_read(const struct spi_mem_op *op)
> >  	return false;
> >  }
> >
> > +static bool mtk_nor_check_prg(const struct spi_mem_op *op)
> > +{
> > +	size_t len = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes;
> > +
> > +	if (len > MTK_NOR_PRG_MAX_SIZE)
> > +		return false;
> > +
> > +	if (!op->data.nbytes)
> > +		return true;
> > +
> > +	if (op->data.dir == SPI_MEM_DATA_OUT)
> > +		return ((len + op->data.nbytes) <= MTK_NOR_PRG_MAX_SIZE);
> > +	else if (op->data.dir == SPI_MEM_DATA_IN)
> > +		return ((len + op->data.nbytes) <= MTK_NOR_PRG_MAX_CYCLES);
> > +	else
> > +		return true;
> > +}
> > +
> >  static int mtk_nor_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op)
> >  {
> >  	size_t len;
> > @@ -195,10 +217,22 @@ static int mtk_nor_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op)
> >  		}
> >  	}
> >
> > -	len = MTK_NOR_PRG_MAX_SIZE - op->cmd.nbytes - op->addr.nbytes -
> > -	      op->dummy.nbytes;
> > -	if (op->data.nbytes > len)
> > -		op->data.nbytes = len;
> > +	if (mtk_nor_check_prg(op))
> > +		return 0;
> > +
> > +	len = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes;
> > +
> > +	if (op->data.dir == SPI_MEM_DATA_OUT) {
> > +		if (len == MTK_NOR_PRG_MAX_SIZE)
> > +			return -EINVAL;
> > +		op->data.nbytes = min_t(unsigned int, op->data.nbytes,
> > +					MTK_NOR_PRG_MAX_SIZE - len);
> > +	} else {
> > +		if (len == MTK_NOR_PRG_MAX_CYCLES)
> > +			return -EINVAL;
> > +		op->data.nbytes = min_t(unsigned int, op->data.nbytes,
> > +					MTK_NOR_PRG_MAX_CYCLES - len);
> > +	}
> >
> >  	return 0;
> >  }
> > @@ -206,8 +240,6 @@ static int mtk_nor_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op)
> >  static bool mtk_nor_supports_op(struct spi_mem *mem,
> >  				const struct spi_mem_op *op)
> >  {
> > -	size_t len;
> > -
> >  	if (op->cmd.buswidth != 1)
> >  		return false;
> >
> > @@ -223,12 +255,11 @@ static bool mtk_nor_supports_op(struct spi_mem *mem,
> >  		       (op->data.buswidth == 1);
> >  	}
> >
> > -	len = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes;
> > -	if ((len > MTK_NOR_PRG_MAX_SIZE) ||
> > -	    ((op->data.nbytes) && (len == MTK_NOR_PRG_MAX_SIZE)))
> > +	/* fallback to generic spi xfer */
> > +	if (op->cmd.buswidth > 1 || op->addr.buswidth > 1 || op->data.buswidth > 1)
> >  		return false;
>
> Rejecting an op in supports_op doesn't tell it to fall back to generic
> spi transfer.
Re: [PATCH v2 0/5] PCI: dwc: improve msi handling
On Fri, 25 Sep 2020 17:17:12 +0800 Jisheng Zhang wrote:

> CAUTION: Email originated externally, do not click links or open attachments
> unless you recognize the sender and know the content is safe.
>
> Hi Jon,
>
> On Fri, 25 Sep 2020 09:53:45 +0100 Jon Hunter wrote:
> >
> > On 24/09/2020 12:05, Jisheng Zhang wrote:
> > > Improve the msi code:
> > > 1. Add proper error handling.
> > > 2. Move dw_pcie_msi_init() from each users to designware host to solve
> > >    msi page leakage in resume path.
> >
> > Apologies if this is slightly off topic, but I have been meaning to ask
> > about MSIs and PCI. On Tegra194 which uses the DWC PCI driver, whenever we
> > hotplug CPUs we see the following warnings ...
> >
> > [   79.068351] WARNING KERN IRQ70: set affinity failed(-22).
> > [   79.068362] WARNING KERN IRQ71: set affinity failed(-22).
> >
> > These interrupts are the MSIs ...
> >
> >  70: 0 0 0 0 0 0 0 0  PCI-MSI 134217728 Edge  PCIe PME, aerdrv
> >  71: 0 0 0 0 0 0 0 0  PCI-MSI 134742016 Edge  ahci[0001:01:00.0]
> >
> > This is caused because ...
> >
> > static int dw_pci_msi_set_affinity(struct irq_data *d,
> >                                    const struct cpumask *mask, bool force)
> > {
> >         return -EINVAL;
> > }
> >
> > Now the above is not unique to the DWC PCI host driver, it appears that
> > most PCIe drivers also do the same. However, I am curious if there is
> > any way to avoid the above warnings given that setting the affinity does
> > not appear to be supported in any way AFAICT.
>
> Could you please try the patch below?
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index bf25d783b5c5..7e5dc54d060e 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -197,7 +197,6 @@ static struct irq_chip dw_pci_msi_bottom_irq_chip = {
>  	.name = "DWPCI-MSI",
>  	.irq_ack = dw_pci_bottom_ack,
>  	.irq_compose_msi_msg = dw_pci_setup_msi_msg,
> -	.irq_set_affinity = dw_pci_msi_set_affinity,
>  	.irq_mask = dw_pci_bottom_mask,
>  	.irq_unmask = dw_pci_bottom_unmask,
> };

A complete patch w/o compiler warning:

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index bf25d783b5c5..18f719cfed0b 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -137,12 +137,6 @@ static void dw_pci_setup_msi_msg(struct irq_data *d, struct msi_msg *msg)
 		(int)d->hwirq, msg->address_hi, msg->address_lo);
 }

-static int dw_pci_msi_set_affinity(struct irq_data *d,
-				   const struct cpumask *mask, bool force)
-{
-	return -EINVAL;
-}
-
 static void dw_pci_bottom_mask(struct irq_data *d)
 {
 	struct pcie_port *pp = irq_data_get_irq_chip_data(d);
@@ -197,7 +191,6 @@ static struct irq_chip dw_pci_msi_bottom_irq_chip = {
 	.name = "DWPCI-MSI",
 	.irq_ack = dw_pci_bottom_ack,
 	.irq_compose_msi_msg = dw_pci_setup_msi_msg,
-	.irq_set_affinity = dw_pci_msi_set_affinity,
 	.irq_mask = dw_pci_bottom_mask,
 	.irq_unmask = dw_pci_bottom_unmask,
 };
Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path
On Thu, Sep 24, 2020 at 08:21:07PM +0200, Thomas Gleixner wrote:
> On Thu, Sep 24 2020 at 08:33, Greg KH wrote:
> > On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
> > > +config CONSOLE_FLUSH_ON_HOTPLUG
> > > +	bool "Enable console flush configurable in hot plug code path"
> > > +	depends on HOTPLUG_CPU
> > > +	def_bool n
> >
> > n is the default, no need to list it.
> >
> > > +	help
> > > +	  In the cpu hot plug path, console lock acquire and release causes
> > > +	  the console to flush. If the console lock is not free, hot plug
> > > +	  latency increases. So make the console flush configurable in the
> > > +	  hot plug path, default disabled, to help with cpu hot plug latencies.
> >
> > Why would you not want this option?
> >
> > Why isn't this just a bugfix?
>
> Because it's the normal behaviour of console lock and there are a
> gazillion other ways to delay stuff in the hotplug path.
>
> CPU hotplug is not meant to be a high speed operation and if people
> think they need it to be fast then it's pretty much guaranteed that they
> want it for the completely wrong reasons.

Odds are, it's the big/little systems that are trying to use cpu hotplug
for this type of thing :(

> This #ifdef tinkering is just disgusting, especially as it just tackles
> an obvious way to delay timer migration, but does not address the
> underlying root cause.

Agreed, thanks for putting it better than I did.

greg k-h
[PATCH 0/2] regmap: add support to regmap_field_bulk_alloc/free
Usage of regmap_field_alloc incurs significant overhead when the number
of fields exceeds 3. Most such drivers end up totally covered with these
allocs/frees, making the code very hard to read! One such driver, the
QCOM LPASS driver, has been extensively converted to use regmap_fields.

This patchset adds the new api and a user of it. Using the new bulk api
to allocate fields makes the code much cleaner to read!

Srinivas Kandagatla (2):
  regmap: add support to regmap_field_bulk_alloc/free apis
  ASoC: lpass-platform: use devm_regmap_field_bulk_alloc

 drivers/base/regmap/regmap.c    | 100
 include/linux/regmap.h          |  11
 sound/soc/qcom/lpass-platform.c |  31 +++---
 3 files changed, 118 insertions(+), 24 deletions(-)

--
2.21.0
[PATCH 1/2] regmap: add support to regmap_field_bulk_alloc/free apis
Usage of regmap_field_alloc incurs significant overhead when the number
of fields exceeds 3. The QCOM LPASS driver has been extensively converted
to use regmap_fields. Using the new bulk api to allocate fields makes the
code much cleaner to read!

Signed-off-by: Srinivas Kandagatla
Tested-by: Srinivasa Rao Mandadapu
---
 drivers/base/regmap/regmap.c | 100 +++
 include/linux/regmap.h       |  11
 2 files changed, 111 insertions(+)

diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index aec3f26bf711..271740a163ad 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -1270,6 +1270,106 @@ struct regmap_field *devm_regmap_field_alloc(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(devm_regmap_field_alloc);

+
+/**
+ * regmap_field_bulk_alloc() - Allocate and initialise a bulk register field.
+ *
+ * @regmap: regmap bank in which this register field is located.
+ * @rm_field: regmap register fields within the bank.
+ * @reg_field: Register fields within the bank.
+ * @num_fields: Number of register fields.
+ *
+ * The return value will be an -ENOMEM on error or zero for success.
+ * Newly allocated regmap_fields should be freed by calling
+ * regmap_field_bulk_free()
+ */
+int regmap_field_bulk_alloc(struct regmap *regmap,
+			    struct regmap_field **rm_field,
+			    struct reg_field *reg_field,
+			    int num_fields)
+{
+	struct regmap_field *rf;
+	int i;
+
+	rf = kcalloc(num_fields, sizeof(*rf), GFP_KERNEL);
+	if (!rf)
+		return -ENOMEM;
+
+	for (i = 0; i < num_fields; i++) {
+		regmap_field_init(&rf[i], regmap, reg_field[i]);
+		rm_field[i] = &rf[i];
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(regmap_field_bulk_alloc);
+
+/**
+ * devm_regmap_field_bulk_alloc() - Allocate and initialise bulk register
+ * fields.
+ *
+ * @dev: Device that will be interacted with
+ * @regmap: regmap bank in which this register field is located.
+ * @rm_field: regmap register fields within the bank.
+ * @reg_field: Register fields within the bank.
+ * @num_fields: Number of register fields.
+ *
+ * The return value will be an -ENOMEM on error or zero for success.
+ * Newly allocated regmap_fields will be automatically freed by the
+ * device management code.
+ */
+int devm_regmap_field_bulk_alloc(struct device *dev,
+				 struct regmap *regmap,
+				 struct regmap_field **rm_field,
+				 struct reg_field *reg_field,
+				 int num_fields)
+{
+	struct regmap_field *rf;
+	int i;
+
+	rf = devm_kcalloc(dev, num_fields, sizeof(*rf), GFP_KERNEL);
+	if (!rf)
+		return -ENOMEM;
+
+	for (i = 0; i < num_fields; i++) {
+		regmap_field_init(&rf[i], regmap, reg_field[i]);
+		rm_field[i] = &rf[i];
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devm_regmap_field_bulk_alloc);
+
+/**
+ * regmap_field_bulk_free() - Free register field allocated using
+ * regmap_field_bulk_alloc.
+ *
+ * @field: regmap fields which should be freed.
+ */
+void regmap_field_bulk_free(struct regmap_field *field)
+{
+	kfree(field);
+}
+EXPORT_SYMBOL_GPL(regmap_field_bulk_free);
+
+/**
+ * devm_regmap_field_bulk_free() - Free a bulk register field allocated using
+ *                                 devm_regmap_field_bulk_alloc.
+ *
+ * @dev: Device that will be interacted with
+ * @field: regmap field which should be freed.
+ *
+ * Free register field allocated using devm_regmap_field_bulk_alloc(). Usually
+ * drivers need not call this function, as the memory allocated via devm
+ * will be freed as per device-driver life-cycle.
+ */
+void devm_regmap_field_bulk_free(struct device *dev,
+				 struct regmap_field *field)
+{
+	devm_kfree(dev, field);
+}
+EXPORT_SYMBOL_GPL(devm_regmap_field_bulk_free);
+
 /**
  * devm_regmap_field_free() - Free a register field allocated using
  *                            devm_regmap_field_alloc.

diff --git a/include/linux/regmap.h b/include/linux/regmap.h
index 0c49d59168b5..a35ec0a0d6e0 100644
--- a/include/linux/regmap.h
+++ b/include/linux/regmap.h
@@ -1189,6 +1189,17 @@ struct regmap_field *devm_regmap_field_alloc(struct device *dev,
 		struct regmap *regmap, struct reg_field reg_field);
 void devm_regmap_field_free(struct device *dev, struct regmap_field *field);
+int regmap_field_bulk_alloc(struct regmap *regmap,
+			    struct regmap_field **rm_field,
+			    struct reg_field *reg_field,
+			    int num_fields);
+void regmap_field_bulk_free(struct regmap_field *field);
+int devm_regmap_field_bulk_alloc(struct device *dev, struct regmap *regmap,
+				 struct regmap_fie
[PATCH 2/2] ASoC: lpass-platform: use devm_regmap_field_bulk_alloc
Use the new devm_regmap_field_bulk_alloc() to allocate fields, as it makes the code more readable. Signed-off-by: Srinivas Kandagatla Tested-by: Srinivasa Rao Mandadapu --- sound/soc/qcom/lpass-platform.c | 31 +++ 1 file changed, 7 insertions(+), 24 deletions(-) diff --git a/sound/soc/qcom/lpass-platform.c b/sound/soc/qcom/lpass-platform.c index df692ed95503..7ac26290082f 100644 --- a/sound/soc/qcom/lpass-platform.c +++ b/sound/soc/qcom/lpass-platform.c @@ -56,6 +56,7 @@ static int lpass_platform_alloc_dmactl_fields(struct device *dev, struct lpass_data *drvdata = dev_get_drvdata(dev); struct lpass_variant *v = drvdata->variant; struct lpaif_dmactl *rd_dmactl, *wr_dmactl; + int rval; drvdata->rd_dmactl = devm_kzalloc(dev, sizeof(struct lpaif_dmactl), GFP_KERNEL); @@ -70,31 +71,13 @@ static int lpass_platform_alloc_dmactl_fields(struct device *dev, rd_dmactl = drvdata->rd_dmactl; wr_dmactl = drvdata->wr_dmactl; - rd_dmactl->bursten = devm_regmap_field_alloc(dev, map, v->rdma_bursten); - rd_dmactl->wpscnt = devm_regmap_field_alloc(dev, map, v->rdma_wpscnt); - rd_dmactl->fifowm = devm_regmap_field_alloc(dev, map, v->rdma_fifowm); - rd_dmactl->intf = devm_regmap_field_alloc(dev, map, v->rdma_intf); - rd_dmactl->enable = devm_regmap_field_alloc(dev, map, v->rdma_enable); - rd_dmactl->dyncclk = devm_regmap_field_alloc(dev, map, v->rdma_dyncclk); + rval = devm_regmap_field_bulk_alloc(dev, map, &rd_dmactl->bursten, + &v->rdma_bursten, 6); + if (rval) + return rval; - if (IS_ERR(rd_dmactl->bursten) || IS_ERR(rd_dmactl->wpscnt) || - IS_ERR(rd_dmactl->fifowm) || IS_ERR(rd_dmactl->intf) || - IS_ERR(rd_dmactl->enable) || IS_ERR(rd_dmactl->dyncclk)) - return -EINVAL; - - wr_dmactl->bursten = devm_regmap_field_alloc(dev, map, v->wrdma_bursten); - wr_dmactl->wpscnt = devm_regmap_field_alloc(dev, map, v->wrdma_wpscnt); - wr_dmactl->fifowm = devm_regmap_field_alloc(dev, map, v->wrdma_fifowm); - wr_dmactl->intf = devm_regmap_field_alloc(dev, map, v->wrdma_intf); - wr_dmactl->enable = 
devm_regmap_field_alloc(dev, map, v->wrdma_enable); - wr_dmactl->dyncclk = devm_regmap_field_alloc(dev, map, v->wrdma_dyncclk); - - if (IS_ERR(wr_dmactl->bursten) || IS_ERR(wr_dmactl->wpscnt) || - IS_ERR(wr_dmactl->fifowm) || IS_ERR(wr_dmactl->intf) || - IS_ERR(wr_dmactl->enable) || IS_ERR(wr_dmactl->dyncclk)) - return -EINVAL; - - return 0; + return devm_regmap_field_bulk_alloc(dev, map, &wr_dmactl->bursten, + &v->wrdma_bursten, 6); } static int lpass_platform_pcmops_open(struct snd_soc_component *component, -- 2.21.0
Re: [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver
On Fri, 21 Aug 2020 15:56:05 -0400 Tony Krowiak wrote: > + > +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm) > +{ > + bool in_use; > + > + mutex_lock(&matrix_dev->lock); > + in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm); > + mutex_unlock(&matrix_dev->lock); See also my comment for patch 4. AFAIU, as soon as you release the lock, in_use may become outdated at any moment. > + > + return in_use; > +}
Re: [PATCH v2] kernel/kthread.c: kthread_worker: add work status check in timer_fn
On Fri 2020-09-25 13:07:59, qiang.zh...@windriver.com wrote:
> From: Zqiang
>
> When queue delayed work to worker, at some point after that the timer_fn
> will be call, add work to worker's work_list, at this time, the work may
> be cancel, so add "work->canceling" check current work status.

Great catch! I was able to understand the problem from the description. Though I would still try to improve it a bit. I suggest:

Subject: kthread_worker: Prevent queuing delayed work from timer_fn when it is being canceled

There is a small race window when a delayed work is being canceled and the work still might be queued from the timer_fn:

	CPU0					CPU1
	kthread_cancel_delayed_work_sync()
	   __kthread_cancel_work_sync()
	      __kthread_cancel_work()
	         work->canceling++;
						kthread_delayed_work_timer_fn()
						   kthread_insert_work();

BUG: kthread_insert_work() should not get called when work->canceling is set.

> Signed-off-by: Zqiang

With the above subject and commit message:

Reviewed-by: Petr Mladek

Best Regards, Petr
Re: [PATCH v3 3/7] mfd: Add base driver for Netronix embedded controller
On Thu, Sep 24, 2020 at 10:26 PM Jonathan Neuschäfer wrote: > > The Netronix embedded controller is a microcontroller found in some > e-book readers designed by the ODM Netronix, Inc. It contains RTC, > battery monitoring, system power management, and PWM functionality. > > This driver implements register access and version detection. > > Third-party hardware documentation is available at: > > > https://github.com/neuschaefer/linux/wiki/Netronix-MSP430-embedded-controller > > The EC supports interrupts, but the driver doesn't make use of them so > far. ... > +#include This usually goes after linux/*.h (and actually not visible how it's being used, but see below first) > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include ... > +static void ntxec_poweroff(void) > +{ > + int res; > + u8 buf[] = { > + NTXEC_REG_POWEROFF, > + (NTXEC_POWEROFF_VALUE >> 8) & 0xff, > + NTXEC_POWEROFF_VALUE & 0xff, '& 0xff' parts are redundant. *u8 implies that. Fix in all cases. Also I would rather see something like buf[0] = _POWEROFF; put_unaligned_be16(_VALUE, &buf[1]); to explicitly show the endianess of the register values. > + }; > + struct i2c_msg msgs[] = { > + { > + .addr = poweroff_restart_client->addr, > + .flags = 0, > + .len = sizeof(buf), > + .buf = buf It's slightly better to keep trailing commas in cases like this. > + } > + }; > + > + res = i2c_transfer(poweroff_restart_client->adapter, msgs, > ARRAY_SIZE(msgs)); > + if (res < 0) > + dev_alert(&poweroff_restart_client->dev, > + "Failed to power off (err = %d)\n", res); alert? This needs to be explained. > + /* > +* The time from the register write until the host CPU is powered off > +* has been observed to be about 2.5 to 3 seconds. Sleep long enough > to > +* safely avoid returning from the poweroff handler. 
> +*/ > + msleep(5000); > +} > + > +static int ntxec_restart(struct notifier_block *nb, > +unsigned long action, void *data) > +{ > + int res; > + /* > +* NOTE: The lower half of the reset value is not sent, because > sending > +* it causes an error Why? Any root cause? Perhaps you need to send 0x ? > +*/ > + u8 buf[] = { > + NTXEC_REG_RESET, > + (NTXEC_RESET_VALUE >> 8) & 0xff, Here you may still use put_unaligned_be16() but move the comment to be before len hardcoded to sizeof(buf) - 1. > + }; > + struct i2c_msg msgs[] = { > + { > + .addr = poweroff_restart_client->addr, > + .flags = 0, > + .len = sizeof(buf), > + .buf = buf > + } > + }; > + > + res = i2c_transfer(poweroff_restart_client->adapter, msgs, > ARRAY_SIZE(msgs)); > + if (res < 0) > + dev_alert(&poweroff_restart_client->dev, > + "Failed to restart (err = %d)\n", res); > + > + return NOTIFY_DONE; > +} ... > +static int ntxec_probe(struct i2c_client *client) > +{ > + struct ntxec *ec; > + unsigned int version; > + int res; > + > + ec = devm_kmalloc(&client->dev, sizeof(*ec), GFP_KERNEL); > + if (!ec) > + return -ENOMEM; > + > + ec->dev = &client->dev; > + > + ec->regmap = devm_regmap_init_i2c(client, ®map_config); > + if (IS_ERR(ec->regmap)) { > + dev_err(ec->dev, "Failed to set up regmap for device\n"); > + return res; > + } > + > + /* Determine the firmware version */ > + res = regmap_read(ec->regmap, NTXEC_REG_VERSION, &version); > + if (res < 0) { > + dev_err(ec->dev, "Failed to read firmware version number\n"); > + return res; > + } > + dev_info(ec->dev, > +"Netronix embedded controller version %04x detected.\n", > +version); This info level may confuse users if followed by an error path. > + /* Bail out if we encounter an unknown firmware version */ > + switch (version) { > + case 0xd726: /* found in Kobo Aura */ > + break; > + default: > + return -ENODEV; > + } > + > + if (of_device_is_system_power_controller(ec->dev->of_node)) { > + /* > +* Set the 'powerkeep' bit. 
This is necessary on some boards > +* in order to keep the system running. > +*/ > + res = regmap_write(ec->regmap, NTXEC_REG_POWERKEEP, > + NTXEC_POWERKEEP_VALUE); > + if (res < 0) > +
Re: [PATCH 1/2] arm64: dts: ls1028a: add missing CAN nodes
Am 2020-09-24 17:53, schrieb Leo Li: -Original Message- From: Michael Walle Sent: Thursday, September 24, 2020 6:31 AM To: Leo Li Cc: linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org; linux- ker...@vger.kernel.org; linux-...@vger.kernel.org; Shawn Guo ; Rob Herring ; Marc Kleine- Budde ; Joakim Zhang Subject: Re: [PATCH 1/2] arm64: dts: ls1028a: add missing CAN nodes Am 2020-09-24 02:35, schrieb Leo Li: >> -Original Message- >> From: Michael Walle >> Sent: Wednesday, September 23, 2020 4:57 AM >> To: linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org; >> linux- >> ker...@vger.kernel.org; linux-...@vger.kernel.org >> Cc: Shawn Guo ; Leo Li ; Rob >> Herring ; Marc Kleine-Budde ; >> Joakim Zhang ; Michael Walle >> >> Subject: [PATCH 1/2] arm64: dts: ls1028a: add missing CAN nodes >> >> The LS1028A has two FlexCAN controller. These are compatible with the >> ones >> from the LX2160A. Add the nodes. >> >> The first controller was tested on the Kontron sl28 board. >> >> Signed-off-by: Michael Walle >> --- >> arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 18 >> ++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi >> b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi >> index 0efeb8fa773e..807ee921ec12 100644 >> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi >> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi >> @@ -386,6 +386,24 @@ >>status = "disabled"; >>}; >> >> + can0: can@218 { >> + compatible = "fsl,ls1028ar1-flexcan", "fsl,lx2160ar1- >> flexcan"; > > The explicit compatible strings cannot be found in the binding, but > matched by the "fsl,-flexcan" pattern in the binding. Is > this considered to be acceptable now? What is the consequence if it is not acceptable? replacing the pattern with individual compatible strings? 
There is a recommendation in the kernel documentation quoted below: 7) The wildcard "<chip>" may be used in compatible strings, as in the following example: - compatible: Must contain '"nvidia,<chip>-pcie", "nvidia,tegra20-pcie"' where <chip> is tegra30, tegra132, ... As in the above example, the known values of "<chip>" should be documented if it is used. But I am not sure if this is still a hard requirement. If so, we should list the processors in the binding. Marc, I'd convert this to yaml format, may I put your name as the maintainer in the binding? -michael
Re: [PATCH v2] mmc: sdhci: Don't enable presets while tuning
On 18/09/20 8:57 pm, Raul Rangel wrote: > On Tue, Sep 1, 2020 at 4:54 AM Adrian Hunter wrote: >> >> On 24/08/20 9:21 pm, Raul E Rangel wrote: >>> SDHCI presets are not currently used for eMMC HS/HS200/HS400, but are >>> used for DDR52. The HS400 retuning sequence is: >>> >>> HS400->DDR52->HS->HS200->Perform Tuning->HS->HS400 >>> >>> This means that when HS400 tuning happens, we transition through DDR52 >>> for a very brief period. This causes presets to be enabled >>> unintentionally and stay enabled when transitioning back to HS200 or >>> HS400. >>> >>> This patch prevents enabling presets while tuning is in progress. >> >> Preset value should not generally have to depend on tuning, so this >> seems less than ideal. Also I am not sure you can say some controllers >> are not accidentally benefiting from the current situation. >> >> What about just letting drivers choose the timing modes that support >> preset values? e.g. using the change below, a driver could alter >> host->preset_value_support as needed > > Sorry for the late reply, I'm just getting back to this. I like the > patch. I have a few other patches I'm > going to push up soon. Do you want me to include this in the chain, or > do you want to push it up? I'm snowed. You will have to do it I am afraid.
Re: [PATCH v3] nvmem: core: fix possibly memleak when use nvmem_cell_info_to_nvmem_cell()
On 23/09/2020 21:44, Vadym Kochan wrote: Fix missing 'kfree_const(cell->name)' when calling nvmem_cell_info_to_nvmem_cell() in several places: * after nvmem_cell_info_to_nvmem_cell() failed during nvmem_add_cells() * during nvmem_device_cell_{read,write} when cell->name is kstrdup'ed() without calling kfree_const() at the end, but really there is no reason to do that 'dup, because the cell instance is allocated on the stack for some short period to be read/write without exposing it to the caller. So the new nvmem_cell_info_to_nvmem_cell_nodup() helper is introduced which is used to convert cell_info -> cell without name duplication as a lightweight version of nvmem_cell_info_to_nvmem_cell(). Fixes: e2a5402ec7c6 ("nvmem: Add nvmem_device based consumer apis.") Signed-off-by: Vadym Kochan Looks good to me! Thanks for the patch. Reviewed-by: Srinivas Kandagatla Acked-by: Srinivas Kandagatla Greg, Can you please pick this one? As I don't have any pending nvmem patches to send it together with. thanks, srini --- v3: * rename __nvmem_cell_info_to_nvmem_cell() -> nvmem_cell_info_to_nvmem_cell_nodup() * rephrase commit description a bit * get rid of wrong func line wrapping v2: * remove not needed 'kfree_const(cell->name)' after nvmem_cell_info_to_nvmem_cell() failed. 
drivers/nvmem/core.c | 33 - 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c index 6cd3edb2eaf6..10cd935cf30e 100644 --- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -361,16 +361,14 @@ static void nvmem_cell_add(struct nvmem_cell *cell) blocking_notifier_call_chain(&nvmem_notifier, NVMEM_CELL_ADD, cell); } -static int nvmem_cell_info_to_nvmem_cell(struct nvmem_device *nvmem, - const struct nvmem_cell_info *info, - struct nvmem_cell *cell) +static int nvmem_cell_info_to_nvmem_cell_nodup(struct nvmem_device *nvmem, + const struct nvmem_cell_info *info, + struct nvmem_cell *cell) { cell->nvmem = nvmem; cell->offset = info->offset; cell->bytes = info->bytes; - cell->name = kstrdup_const(info->name, GFP_KERNEL); - if (!cell->name) - return -ENOMEM; + cell->name = info->name; cell->bit_offset = info->bit_offset; cell->nbits = info->nbits; @@ -382,13 +380,30 @@ static int nvmem_cell_info_to_nvmem_cell(struct nvmem_device *nvmem, if (!IS_ALIGNED(cell->offset, nvmem->stride)) { dev_err(&nvmem->dev, "cell %s unaligned to nvmem stride %d\n", - cell->name, nvmem->stride); + cell->name ?: "<unknown>", nvmem->stride); return -EINVAL; } return 0; } +static int nvmem_cell_info_to_nvmem_cell(struct nvmem_device *nvmem, + const struct nvmem_cell_info *info, + struct nvmem_cell *cell) +{ + int err; + + err = nvmem_cell_info_to_nvmem_cell_nodup(nvmem, info, cell); + if (err) + return err; + + cell->name = kstrdup_const(info->name, GFP_KERNEL); + if (!cell->name) + return -ENOMEM; + + return 0; +} + /** * nvmem_add_cells() - Add cell information to an nvmem device * @@ -1460,7 +1475,7 @@ ssize_t nvmem_device_cell_read(struct nvmem_device *nvmem, if (!nvmem) return -EINVAL; - rc = nvmem_cell_info_to_nvmem_cell(nvmem, info, &cell); + rc = nvmem_cell_info_to_nvmem_cell_nodup(nvmem, info, &cell); if (rc) return rc; @@ -1490,7 +1505,7 @@ int nvmem_device_cell_write(struct nvmem_device *nvmem, if (!nvmem) return -EINVAL; - rc = 
nvmem_cell_info_to_nvmem_cell(nvmem, info, &cell); + rc = nvmem_cell_info_to_nvmem_cell_nodup(nvmem, info, &cell); if (rc) return rc;
[PATCH 1/2] ASoC: cs47l15: Fix EPOUT->HPOUT1 Mono Mux routing
EPOUT is always mono so should have a permanent routing through the HPOUT1 Mono Mux. Signed-off-by: Richard Fitzgerald --- sound/soc/codecs/cs47l15.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/codecs/cs47l15.c b/sound/soc/codecs/cs47l15.c index a591e7457d11..254f9d96e766 100644 --- a/sound/soc/codecs/cs47l15.c +++ b/sound/soc/codecs/cs47l15.c @@ -1089,6 +1089,7 @@ static const struct snd_soc_dapm_route cs47l15_dapm_routes[] = { { "HPOUT1 Demux", NULL, "OUT1R" }, { "OUT1R", NULL, "HPOUT1 Mono Mux" }, + { "HPOUT1 Mono Mux", "EPOUT", "OUT1L" }, { "HPOUTL", "HPOUT", "HPOUT1 Demux" }, { "HPOUTR", "HPOUT", "HPOUT1 Demux" }, @@ -1268,7 +1269,6 @@ static irqreturn_t cs47l15_adsp2_irq(int irq, void *data) static const struct snd_soc_dapm_route cs47l15_mono_routes[] = { { "HPOUT1 Mono Mux", "HPOUT", "OUT1L" }, - { "HPOUT1 Mono Mux", "EPOUT", "OUT1L" }, }; static int cs47l15_component_probe(struct snd_soc_component *component) -- 2.20.1
Re: [PATCH 1/2] arm64: dts: ls1028a: add missing CAN nodes
On 9/25/20 11:31 AM, Michael Walle wrote: > Marc, I'd convert this to yaml format, Oleksij (CC'ed) is working already on this. > may I put your name as the maintainer in the binding? Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- |
Re: [PATCH v3 2/6] spi: spi-mtk-nor: fix mishandled logics in checking SPI memory operation
On Fri, Sep 25, 2020 at 2:54 PM Ikjoon Jang wrote: > > Fix a bug which limits its protocol availability in supports_op(). > > Fixes: a59b2c7c56bf ("spi: spi-mtk-nor: support standard spi properties") > Signed-off-by: Ikjoon Jang > --- This is also duplicated work of https://patchwork.kernel.org/patch/11797723/, I'm going to drop this patch in v4. > > (no changes since v1) > > drivers/spi/spi-mtk-nor.c | 26 +++--- > 1 file changed, 11 insertions(+), 15 deletions(-) > > diff --git a/drivers/spi/spi-mtk-nor.c b/drivers/spi/spi-mtk-nor.c > index 6e6ca2b8e6c8..0f7d4ec68730 100644 > --- a/drivers/spi/spi-mtk-nor.c > +++ b/drivers/spi/spi-mtk-nor.c > @@ -211,28 +211,24 @@ static bool mtk_nor_supports_op(struct spi_mem *mem, > if (op->cmd.buswidth != 1) > return false; > > + if (!spi_mem_default_supports_op(mem, op)) > + return false; > + > if ((op->addr.nbytes == 3) || (op->addr.nbytes == 4)) { > - switch(op->data.dir) { > - case SPI_MEM_DATA_IN: > - if (!mtk_nor_match_read(op)) > - return false; > - break; > - case SPI_MEM_DATA_OUT: > - if ((op->addr.buswidth != 1) || > - (op->dummy.nbytes != 0) || > - (op->data.buswidth != 1)) > - return false; > - break; > - default: > - break; > - } > + if ((op->data.dir == SPI_MEM_DATA_IN) && > mtk_nor_match_read(op)) > + return true; > + else if (op->data.dir == SPI_MEM_DATA_OUT) > + return (op->addr.buswidth == 1) && > + (op->dummy.nbytes == 0) && > + (op->data.buswidth == 1); > } > + > len = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes; > if ((len > MTK_NOR_PRG_MAX_SIZE) || > ((op->data.nbytes) && (len == MTK_NOR_PRG_MAX_SIZE))) > return false; > > - return spi_mem_default_supports_op(mem, op); > + return true; > } > > static void mtk_nor_setup_bus(struct mtk_nor *sp, const struct spi_mem_op > *op) > -- > 2.28.0.681.g6f77f65b4e-goog >
[PATCH v3 2/2] arm64: kvm: Introduce MTE VCPU feature
Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging for a VM. This exposes the feature to the guest and automatically tags memory pages touched by the VM as PG_mte_tagged (and clears the tags storage) to ensure that the guest cannot see stale tags, and so that the tags are correctly saved/restored across swap. Signed-off-by: Steven Price --- arch/arm64/include/asm/kvm_emulate.h | 3 +++ arch/arm64/include/asm/kvm_host.h| 3 +++ arch/arm64/kvm/arm.c | 9 + arch/arm64/kvm/mmu.c | 15 +++ arch/arm64/kvm/sys_regs.c| 6 +- include/uapi/linux/kvm.h | 1 + 6 files changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index 49a55be2b9a2..4923a566ae6e 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu) if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) || vcpu_el1_is_32bit(vcpu)) vcpu->arch.hcr_el2 |= HCR_TID2; + + if (vcpu->kvm->arch.mte_enabled) + vcpu->arch.hcr_el2 |= HCR_ATA; } static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 4f4360dd149e..1379300c1487 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -110,6 +110,9 @@ struct kvm_arch { * supported. 
*/ bool return_nisv_io_abort_to_user; + + /* Memory Tagging Extension enabled for the guest */ + bool mte_enabled; }; struct kvm_vcpu_fault_info { diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 46dc3d75cf13..624edca0a1fa 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -87,6 +87,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, r = 0; kvm->arch.return_nisv_io_abort_to_user = true; break; + case KVM_CAP_ARM_MTE: + if (!system_supports_mte() || kvm->created_vcpus) + return -EINVAL; + r = 0; + kvm->arch.mte_enabled = true; + break; default: r = -EINVAL; break; @@ -206,6 +212,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) */ r = 1; break; + case KVM_CAP_ARM_MTE: + r = system_supports_mte(); + break; default: r = kvm_arch_vm_ioctl_check_extension(kvm, ext); break; diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index ba00bcc0c884..befb9e1f0aa6 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1949,6 +1949,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (vma_pagesize == PAGE_SIZE && !force_pte) vma_pagesize = transparent_hugepage_adjust(memslot, hva, &pfn, &fault_ipa); + if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) { + /* +* VM will be able to see the page's tags, so we must ensure +* they have been initialised. 
+*/ + struct page *page = pfn_to_page(pfn); + long i, nr_pages = compound_nr(page); + + /* if PG_mte_tagged is set, tags have already been initialised */ + for (i = 0; i < nr_pages; i++, page++) { + if (!test_and_set_bit(PG_mte_tagged, &page->flags)) + mte_clear_page_tags(page_address(page)); + } + } + if (writable) kvm_set_pfn_dirty(pfn); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index a655f172b5ad..5010a47152b4 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT); val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT); } else if (id == SYS_ID_AA64PFR1_EL1) { - val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT); + if (!vcpu->kvm->arch.mte_enabled) + val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT); } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) { val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) | (0xfUL << ID_AA64ISAR1_API_SHIFT) | @@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, static unsigned int mte_visibility(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) { + if (vcpu->kvm->arch.mte_enabled) + return 0;
Re: [PATCH v9 09/20] gpiolib: cdev: support edge detection for uAPI v2
On Thu, Sep 24, 2020 at 6:07 AM Kent Gibson wrote: > On Wed, Sep 23, 2020 at 06:47:28PM +0300, Andy Shevchenko wrote: > > On Tue, Sep 22, 2020 at 5:35 AM Kent Gibson wrote: ... > Also, this code is drawn from lineevent_irq_thread(), which is ordered > this way. Negative conditionals are slightly harder to read. ... > > > + if (!line->timestamp_ns) { > > > + le.timestamp_ns = ktime_get_ns(); > > > + if (lr->num_lines != 1) > > > + line->req_seqno = atomic_inc_return(&lr->seqno); > > > + } else { > > > + le.timestamp_ns = line->timestamp_ns; > > > > + } > > > > Ditto. > > Firstly, drawn from lineevent_irq_thread() which is structured this way. > > In this case the comment relates to the condition being true, so > re-ordering the if/else would be confusing - unless the comment were > moved into the corresponding body?? Yes. ... > > > + irq = gpiod_to_irq(line->desc); > > > + if (irq <= 0) > > > + return -ENODEV; > > > > So, you mean this is part of ABI. Can we return more appropriate code, > > because getting no IRQ doesn't mean we don't have a device. > > Also does 0 case have the same meaning? > > Firstly, this code is drawn from lineevent_create(), so any changes > here should be considered for there as well - though this may > constitute an ABI change?? For v1 probably, for v2 we are free to fix this. > I agree ENODEV doesn't seem right here. Are you ok with ENXIO? Yes. > From gpiod_to_irq(): > > /* Zero means NO_IRQ */ > if (!retirq) > return -ENXIO; > > so it can't even return a 0 :-| - we're just being cautious. I would drop = part then. -- With Best Regards, Andy Shevchenko
Re: [PATCH v3 5/7] rtc: New driver for RTC in Netronix embedded controller
Hi, On 24/09/2020 21:24:53+0200, Jonathan Neuschäfer wrote: > With this driver, mainline Linux can keep its time and date in sync with > the vendor kernel. > > Advanced functionality like alarm and automatic power-on is not yet > supported. > > Signed-off-by: Jonathan Neuschäfer > --- > > v3: > - Add email address to copyright line > - Remove OF compatible string and don't include linux/of_device.h > - Don't use a comma after sentinels > - Avoid ret |= ... pattern > - Move 8-bit register conversion to ntxec.h > - Relicense as GPLv2 or later I don't think you had to relicense. The kernel is GPL 2 only, you are free to license your code under GPL 2 only if that is what you desire. > > v2: > - https://lore.kernel.org/lkml/20200905133230.1014581-7-j.neuschae...@gmx.net/ > - Rework top-of-file comment [Lee Jones] > - Sort the #include lines [Alexandre Belloni] > - don't align = signs in struct initializers [Uwe Kleine-König] > - Switch to regmap > - Fix register number used to read minutes and seconds > - Prefix registers with NTXEC_REG_ > - Add help text to the Kconfig option > - Use devm_rtc_allocate_device and rtc_register_device, set ->range_min and > ->range_max > --- > drivers/rtc/Kconfig | 8 +++ > drivers/rtc/Makefile| 1 + > drivers/rtc/rtc-ntxec.c | 132 > 3 files changed, 141 insertions(+) > create mode 100644 drivers/rtc/rtc-ntxec.c > > diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig > index 48c536acd777f..ae8f3dc36c9a3 100644 > --- a/drivers/rtc/Kconfig > +++ b/drivers/rtc/Kconfig > @@ -1301,6 +1301,14 @@ config RTC_DRV_CROS_EC > This driver can also be built as a module. If so, the module > will be called rtc-cros-ec. > > +config RTC_DRV_NTXEC > + tristate "Netronix embedded controller RTC driver" > + depends on MFD_NTXEC > + help > + Say yes here if you want to support the RTC functionality of the > + embedded controller found in certain e-book readers designed by the > + ODM Netronix. 
> + > comment "on-CPU RTC drivers" > > config RTC_DRV_ASM9260 > diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile > index 880e08a409c3d..733479db18896 100644 > --- a/drivers/rtc/Makefile > +++ b/drivers/rtc/Makefile > @@ -111,6 +111,7 @@ obj-$(CONFIG_RTC_DRV_MT7622) += rtc-mt7622.o > obj-$(CONFIG_RTC_DRV_MV) += rtc-mv.o > obj-$(CONFIG_RTC_DRV_MXC)+= rtc-mxc.o > obj-$(CONFIG_RTC_DRV_MXC_V2) += rtc-mxc_v2.o > +obj-$(CONFIG_RTC_DRV_NTXEC) += rtc-ntxec.o > obj-$(CONFIG_RTC_DRV_OMAP) += rtc-omap.o > obj-$(CONFIG_RTC_DRV_OPAL) += rtc-opal.o > obj-$(CONFIG_RTC_DRV_PALMAS) += rtc-palmas.o > diff --git a/drivers/rtc/rtc-ntxec.c b/drivers/rtc/rtc-ntxec.c > new file mode 100644 > index 0..af23c7cc76544 > --- /dev/null > +++ b/drivers/rtc/rtc-ntxec.c > @@ -0,0 +1,132 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/* > + * The Netronix embedded controller is a microcontroller found in some > + * e-book readers designed by the ODM Netronix, Inc. It contains RTC, > + * battery monitoring, system power management, and PWM functionality. > + * > + * This driver implements access to the RTC time and date. 
> + * > + * Copyright 2020 Jonathan Neuschäfer > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct ntxec_rtc { > + struct device *dev; > + struct ntxec *ec; > +}; > + > +#define NTXEC_REG_WRITE_YEAR 0x10 > +#define NTXEC_REG_WRITE_MONTH0x11 > +#define NTXEC_REG_WRITE_DAY 0x12 > +#define NTXEC_REG_WRITE_HOUR 0x13 > +#define NTXEC_REG_WRITE_MINUTE 0x14 > +#define NTXEC_REG_WRITE_SECOND 0x15 > + > +#define NTXEC_REG_READ_YM0x20 > +#define NTXEC_REG_READ_DH0x21 > +#define NTXEC_REG_READ_MS0x23 > + > +static int ntxec_read_time(struct device *dev, struct rtc_time *tm) > +{ > + struct ntxec_rtc *rtc = dev_get_drvdata(dev); > + unsigned int value; > + int res; > + > + res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_YM, &value); > + if (res < 0) > + return res; > + > + tm->tm_year = (value >> 8) + 100; > + tm->tm_mon = (value & 0xff) - 1; > + > + res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_DH, &value); > + if (res < 0) > + return res; > + > + tm->tm_mday = value >> 8; > + tm->tm_hour = value & 0xff; > + > + res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_MS, &value); > + if (res < 0) > + return res; > + > + tm->tm_min = value >> 8; > + tm->tm_sec = value & 0xff; > + > + return 0; > +} > + > +static int ntxec_set_time(struct device *dev, struct rtc_time *tm) > +{ > + struct ntxec_rtc *rtc = dev_get_drvdata(dev); > + int res = 0; > + > + res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_YEAR, > ntxec_reg8(tm->tm_year - 100)); > + if (res) > + return res; > + > + res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_MONTH, > ntxec_reg8(tm->t
[PATCH v3 0/2] MTE support for KVM guest
Version 3 of adding MTE support for KVM guests. See the previous (v2) posting for background: https://lore.kernel.org/r/20200904160018.29481-1-steven.price%40arm.com These patches add support to KVM to enable MTE within a guest. They are based on Catalin's v9 MTE user-space support series[1] (currently in next). Changes since v2: * MTE is no longer a VCPU feature, instead it is a VM cap. * Being a VM cap means easier probing (check for KVM_CAP_ARM_MTE). * The cap must be set before any VCPUs are created, preventing any shenanigans where MTE is enabled for the guest after memory accesses have been performed. [1] https://lore.kernel.org/r/20200904103029.32083-1-catalin.mari...@arm.com Steven Price (2): arm64: kvm: Save/restore MTE registers arm64: kvm: Introduce MTE VCPU feature arch/arm64/include/asm/kvm_emulate.h | 3 +++ arch/arm64/include/asm/kvm_host.h | 7 +++ arch/arm64/include/asm/sysreg.h| 3 ++- arch/arm64/kvm/arm.c | 9 + arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++ arch/arm64/kvm/mmu.c | 15 +++ arch/arm64/kvm/sys_regs.c | 20 +++- include/uapi/linux/kvm.h | 1 + 8 files changed, 66 insertions(+), 6 deletions(-) -- 2.20.1
linux-next: build warning after merge of the tip tree
Hi all, After merging the tip tree, today's linux-next build (powerpc and powerpc64 allnoconfig) produced this warning: WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS Depends on [n]: PCI [=n] Selected by [y]: - PPC [=y] Introduced by commit 077ee78e3928 ("PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable") -- Cheers, Stephen Rothwell
[PATCH v3 1/2] arm64: kvm: Save/restore MTE registers
Define the new system registers that MTE introduces and context switch them. The MTE feature is still hidden from the ID register as it isn't supported in a VM yet. Signed-off-by: Steven Price --- arch/arm64/include/asm/kvm_host.h | 4 arch/arm64/include/asm/sysreg.h| 3 ++- arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++ arch/arm64/kvm/sys_regs.c | 14 ++ 4 files changed, 30 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index e52c927aade5..4f4360dd149e 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -126,6 +126,8 @@ enum vcpu_sysreg { SCTLR_EL1, /* System Control Register */ ACTLR_EL1, /* Auxiliary Control Register */ CPACR_EL1, /* Coprocessor Access Control */ + RGSR_EL1, /* Random Allocation Tag Seed Register */ + GCR_EL1,/* Tag Control Register */ ZCR_EL1,/* SVE Control */ TTBR0_EL1, /* Translation Table Base Register 0 */ TTBR1_EL1, /* Translation Table Base Register 1 */ @@ -142,6 +144,8 @@ enum vcpu_sysreg { TPIDR_EL1, /* Thread ID, Privileged */ AMAIR_EL1, /* Aux Memory Attribute Indirection Register */ CNTKCTL_EL1,/* Timer Control Register (EL1) */ + TFSRE0_EL1, /* Tag Fault Status Register (EL0) */ + TFSR_EL1, /* Tag Fault Status Register (EL1) */ PAR_EL1,/* Physical Address Register */ MDSCR_EL1, /* Monitor Debug System Control Register */ MDCCINT_EL1,/* Monitor Debug Comms Channel Interrupt Enable Reg */ diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 52eefe2f7d95..cd60677551b7 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -563,7 +563,8 @@ #define SCTLR_ELx_M(BIT(0)) #define SCTLR_ELx_FLAGS(SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \ -SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB) +SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \ +SCTLR_ELx_ITFSB) /* SCTLR_EL2 specific flags. 
*/ #define SCTLR_EL2_RES1 ((BIT(4)) | (BIT(5)) | (BIT(11)) | (BIT(16)) | \ diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h index 7a986030145f..a124ffa49ba3 100644 --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h @@ -18,6 +18,11 @@ static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt) { ctxt_sys_reg(ctxt, MDSCR_EL1) = read_sysreg(mdscr_el1); + if (system_supports_mte()) { + ctxt_sys_reg(ctxt, RGSR_EL1)= read_sysreg_s(SYS_RGSR_EL1); + ctxt_sys_reg(ctxt, GCR_EL1) = read_sysreg_s(SYS_GCR_EL1); + ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1); + } } static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt) @@ -45,6 +50,8 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt) ctxt_sys_reg(ctxt, CNTKCTL_EL1) = read_sysreg_el1(SYS_CNTKCTL); ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1); ctxt_sys_reg(ctxt, TPIDR_EL1) = read_sysreg(tpidr_el1); + if (system_supports_mte()) + ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR); ctxt_sys_reg(ctxt, SP_EL1) = read_sysreg(sp_el1); ctxt_sys_reg(ctxt, ELR_EL1) = read_sysreg_el1(SYS_ELR); @@ -63,6 +70,11 @@ static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt) static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt) { write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1), mdscr_el1); + if (system_supports_mte()) { + write_sysreg_s(ctxt_sys_reg(ctxt, RGSR_EL1), SYS_RGSR_EL1); + write_sysreg_s(ctxt_sys_reg(ctxt, GCR_EL1), SYS_GCR_EL1); + write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1); + } } static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt) @@ -106,6 +118,8 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt) write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL); write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1), par_el1); 
write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1), tpidr_el1); + if (system_supports_mte()) + write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR); if (!has_vhe() && cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) && diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 379f4969d0bd..a655f172b5ad 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1391,6 +1391,12 @@ static bool access_mte_regs
[PATCH net-next 0/1] net: stmmac: Enable VLAN filter fail queue for Intel platform data
This is a follow-up on an earlier patch submission at: https://patchwork.ozlabs.org/patch/1275604/ Changes since the previous patch submission: - Enable VLAN fail queue for Intel platform data (dwmac-intel). - Steer the VLAN failed packet to the last Rx queue. Chuah, Kim Tatt (1): net: stmmac: Add option for VLAN filter fail queue enable drivers/net/ethernet/stmicro/stmmac/common.h | 2 ++ drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 5 + drivers/net/ethernet/stmicro/stmmac/dwmac4.h | 1 + drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 15 +-- drivers/net/ethernet/stmicro/stmmac/dwmac5.h | 6 ++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++ include/linux/stmmac.h| 2 ++ 7 files changed, 32 insertions(+), 2 deletions(-) -- 2.17.0
Re: ledtrig-cpu: Limit to 4 CPUs
Hi! > >>So.. no, it is not causing kernel crashes or something. But it is > >>an example of a bad interface, and _that_ is causing problems. (And yes, if > >>I realized it is simply possible to limit it, maybe the BIN_ATTR > >>conversion would not be necessary...) > > > >The limitation you proposed breaks the trigger on many platforms. > > Actually it precludes its use. > > I still see the patch in your linux-next, so I reserve myself the > right to comment on your pull request. You are free to comment on anything. I believe the probability that someone uses this with more than 4 CPUs is < 5%, and the probability that someone uses it with more than 100 CPUs is << 1% I'd say. Systems just don't have that many LEDs. I'll take the risk. If I broke someone's real, existing setup, I'll raise the limit. (With the exception of uled setups. In such cases, I'll just laugh.) If you know or can find out someone using it with more than 4 CPUs, I can obviously raise the limit before the merge. Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH v9 12/20] gpiolib: cdev: support setting debounce
On Thu, Sep 24, 2020 at 10:49 AM Kent Gibson wrote: > On Wed, Sep 23, 2020 at 07:27:37PM +0300, Andy Shevchenko wrote: > > On Tue, Sep 22, 2020 at 5:36 AM Kent Gibson wrote: ... > > Shouldn't we rather return 0/1 guaranteed? > > > > Perhaps > > > > if (active_low) > > return !value; > > > > return !!value; > > > > ? > > > > Or just make the return value a bool? If it's good enough. ... > > > + /* switch from physical level to logical - if they differ */ > > > + if (test_bit(FLAG_ACTIVE_LOW, &line->desc->flags)) > > > + level = !level; > > > > Seems to me a good candidate to have > > > > static inline bool convert_with_active_low_respected(desc, value) > > { > > if (active_low) > >return !value; > > return !!value; > > } > Not sure it is worth the effort - it would only be used twice - here > and in debounced_value() - which is only a couple of lines itself. I'm thinking about possible candidates to use this... gpiod_direction_output() gpiod_get_array_value_complex() gpiod_get_value() gpiod_set_value_nocheck() gpiod_get_value_cansleep() ...I stopped here... I agree that not all of them are good to be converted (b/c few flags being tested and it will bring inconsistency), but many deserve the perhaps separate patch to introduce the above mentioned helper. -- With Best Regards, Andy Shevchenko
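The helper being debated above can be sketched in a few lines of plain C. This is a minimal userspace model, not gpiolib code: the function name, the flag bit, and the direct flag test (instead of `test_bit()`) are all invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper from the review discussion: normalize a raw
 * (physical) line value to a logical 0/1, honouring active-low.
 * The flag bit position is illustrative, not gpiolib's layout. */
#define FLAG_ACTIVE_LOW (1UL << 0)

bool gpio_logical_value(unsigned long flags, int raw)
{
	if (flags & FLAG_ACTIVE_LOW)
		return !raw;	/* invert: physical low means logical 1 */
	return !!raw;		/* collapse any non-zero value to 1 */
}
```

Returning `bool` makes the 0/1 guarantee explicit, which is exactly the point raised about the `!value` / `!!value` return above.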
Re: [PATCH v9 00/20] gpio: cdev: add uAPI v2
On Thu, Sep 24, 2020 at 11:00 AM Kent Gibson wrote: > On Wed, Sep 23, 2020 at 07:35:30PM +0300, Andy Shevchenko wrote: > > On Tue, Sep 22, 2020 at 5:34 AM Kent Gibson wrote: ... > > For the rest I gave some comments but most of them are simply to > > address. The uAPI definition I agree with after Arnd's comment. I > > don't see big impediments to having this for v5.10. > > > > Thanks! > > > > Thanks for your review - I nearly always learn something new from them, > and you can be picky pain in the arse at times - which is a good thing > for a reviewer. Apart from the pain in the arse ;-). Thanks for being patient! As a reviewer I'm also in constant learning. -- With Best Regards, Andy Shevchenko
Re: [PATCH] Revert "net: ethernet: ixgbe: check the return value of ixgbe_mii_bus_init()"
On Fri, Sep 25, 2020 at 10:51 AM Liu, Yongxin wrote: > [snip] > > > true); > > > > > > - err = ixgbe_mii_bus_init(hw); > > > - if (err) > > > - goto err_netdev; > > > + ixgbe_mii_bus_init(hw); > > > > > > return 0; > > > > > > -err_netdev: > > > - unregister_netdev(netdev); > > > err_register: > > > ixgbe_release_hw_control(adapter); > > > ixgbe_clear_interrupt_scheme(adapter); > > > -- > > > 2.14.4 > > > > > > > Then we should check if err == -ENODEV, not outright ignore all potential > > errors, right? > > > > Hm, it is weird to take -ENODEV as a no error. > How about just return 0 instead of -ENODEV in the following function? > No, it's perfectly fine. -ENODEV means there's no device and this can happen. The caller can then act accordingly - for example: ignore that fact. We do it in several places[1]. Bartosz [snip] [1] https://elixir.bootlin.com/linux/latest/source/drivers/misc/eeprom/at24.c#L714
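The pattern Bartosz describes — treating -ENODEV as an expected "device simply isn't there" condition rather than a probe failure — can be sketched like this. It is a toy userspace model with invented names, not the ixgbe code.

```c
#include <errno.h>

/* Toy model of an optional sub-device: its absence is reported as
 * -ENODEV, which the probe path deliberately treats as "nothing to do". */
int mii_bus_init(int bus_present)
{
	if (!bus_present)
		return -ENODEV;	/* no MDIO bus on this platform: expected */
	return 0;		/* bus found and registered */
}

int probe(int bus_present)
{
	int err = mii_bus_init(bus_present);

	/* Only a real failure should unwind the probe;
	 * a missing optional device is silently ignored. */
	if (err && err != -ENODEV)
		return err;
	return 0;
}
```

This keeps the error contract intact for callers that do care, instead of flattening -ENODEV into 0 inside the init function itself.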
[PATCH net-next 1/1] net: stmmac: Add option for VLAN filter fail queue enable
From: "Chuah, Kim Tatt" Add option in plat_stmmacenet_data struct to enable VLAN Filter Fail Queuing. This option allows packets that fail VLAN filter to be routed to a specific Rx queue when Receive All is also set. When this option is enabled: - Enable VFFQ only when entering promiscuous mode, because Receive All will pass up all rx packets that failed address filtering (similar to promiscuous mode). - VLAN-promiscuous mode is never entered to allow rx packet to fail VLAN filters and get routed to selected VFFQ Rx queue. Reviewed-by: Voon Weifeng Reviewed-by: Ong Boon Leong Signed-off-by: Chuah, Kim Tatt Signed-off-by: Ong Boon Leong --- drivers/net/ethernet/stmicro/stmmac/common.h | 2 ++ drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 5 + drivers/net/ethernet/stmicro/stmmac/dwmac4.h | 1 + drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 15 +-- drivers/net/ethernet/stmicro/stmmac/dwmac5.h | 6 ++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++ include/linux/stmmac.h| 2 ++ 7 files changed, 32 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index cafa8e3c3573..df7de50497a0 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -481,6 +481,8 @@ struct mac_device_info { unsigned int num_vlan; u32 vlan_filter[32]; unsigned int promisc; + bool vlan_fail_q_en; + u8 vlan_fail_q; }; struct stmmac_rx_routing { diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c index 2ac9dfb3462c..ab0a81e0fea1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c @@ -321,6 +321,11 @@ static int intel_mgbe_common_data(struct pci_dev *pdev, /* Set the maxmtu to a default of JUMBO_LEN */ plat->maxmtu = JUMBO_LEN; + plat->vlan_fail_q_en = true; + + /* Use the last Rx queue */ + plat->vlan_fail_q = 
plat->rx_queues_to_use - 1; + return 0; } diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h index 61f3249bd724..592b043f9676 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h @@ -76,6 +76,7 @@ #define GMAC_PACKET_FILTER_HPF BIT(10) #define GMAC_PACKET_FILTER_VTFEBIT(16) #define GMAC_PACKET_FILTER_IPFEBIT(20) +#define GMAC_PACKET_FILTER_RA BIT(31) #define GMAC_MAX_PERFECT_ADDRESSES 128 diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c index ecd834e0e121..002791b77356 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c @@ -618,7 +618,18 @@ static void dwmac4_set_filter(struct mac_device_info *hw, value &= ~GMAC_PACKET_FILTER_PM; value &= ~GMAC_PACKET_FILTER_PR; if (dev->flags & IFF_PROMISC) { - value = GMAC_PACKET_FILTER_PR | GMAC_PACKET_FILTER_PCF; + /* VLAN Tag Filter Fail Packets Queuing */ + if (hw->vlan_fail_q_en) { + value = readl(ioaddr + GMAC_RXQ_CTRL4); + value &= ~GMAC_RXQCTRL_VFFQ_MASK; + value |= GMAC_RXQCTRL_VFFQE | +(hw->vlan_fail_q << GMAC_RXQCTRL_VFFQ_SHIFT); + writel(value, ioaddr + GMAC_RXQ_CTRL4); + value = GMAC_PACKET_FILTER_PR | GMAC_PACKET_FILTER_RA; + } else { + value = GMAC_PACKET_FILTER_PR | GMAC_PACKET_FILTER_PCF; + } + } else if ((dev->flags & IFF_ALLMULTI) || (netdev_mc_count(dev) > hw->multicast_filter_bins)) { /* Pass all multi */ @@ -680,7 +691,7 @@ static void dwmac4_set_filter(struct mac_device_info *hw, writel(value, ioaddr + GMAC_PACKET_FILTER); - if (dev->flags & IFF_PROMISC) { + if (dev->flags & IFF_PROMISC && !hw->vlan_fail_q_en) { if (!hw->promisc) { hw->promisc = 1; dwmac4_vlan_promisc_enable(dev, hw); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h index 3e8faa96b4d4..56b0762c1276 100644 --- 
a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h @@ -92,6 +92,12 @@ #define TCEIE BIT(0) #define DMA_ECC_INT_STATUS 0x1088 +/* EQoS version 5.xx VLAN Tag Filter Fail Packets Queuing */ +#define GMAC_RXQ_CTRL4 0x0094 +#define GMAC_RXQCTRL_VFFQ_MASK GENMASK(19, 17) +#define GMAC_RXQCTRL_VFFQ_SHIFT
Re: [PATCH v11 5/6] iommu/uapi: Handle data and argsz filled by users
On Thu, Sep 24, 2020 at 12:24:19PM -0700, Jacob Pan wrote: > IOMMU user APIs are responsible for processing user data. This patch > changes the interface such that user pointers can be passed into IOMMU > code directly. Separate kernel APIs without user pointers are introduced > for in-kernel users of the UAPI functionality. > > IOMMU UAPI data has a user filled argsz field which indicates the data > length of the structure. User data is not trusted, argsz must be > validated based on the current kernel data size, mandatory data size, > and feature flags. > > User data may also be extended, resulting in possible argsz increase. > Backward compatibility is ensured based on size and flags (or > the functional equivalent fields) checking. > > This patch adds sanity checks in the IOMMU layer. In addition to argsz, > reserved/unused fields in padding, flags, and version are also checked. > Details are documented in Documentation/userspace-api/iommu.rst > > Signed-off-by: Liu Yi L > Signed-off-by: Jacob Pan Reviewed-by: Jean-Philippe Brucker Some comments below in case you're resending, but nothing important. > --- > drivers/iommu/iommu.c | 199 > +++-- > include/linux/iommu.h | 28 +-- > include/uapi/linux/iommu.h | 1 + > 3 files changed, 212 insertions(+), 16 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 4ae02291ccc2..5c1b7ae48aae 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1961,34 +1961,219 @@ int iommu_attach_device(struct iommu_domain *domain, > struct device *dev) > } > EXPORT_SYMBOL_GPL(iommu_attach_device); > > +/* > + * Check flags and other user provided data for valid combinations. We also > + * make sure no reserved fields or unused flags are set. This is to ensure > + * not breaking userspace in the future when these fields or flags are used. 
> + */ > +static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info > *info) > +{ > + u32 mask; > + int i; > + > + if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1) > + return -EINVAL; > + > + mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1; > + if (info->cache & ~mask) > + return -EINVAL; > + > + if (info->granularity >= IOMMU_INV_GRANU_NR) > + return -EINVAL; > + > + switch (info->granularity) { > + case IOMMU_INV_GRANU_ADDR: > + if (info->cache & IOMMU_CACHE_INV_TYPE_PASID) > + return -EINVAL; > + > + mask = IOMMU_INV_ADDR_FLAGS_PASID | > + IOMMU_INV_ADDR_FLAGS_ARCHID | > + IOMMU_INV_ADDR_FLAGS_LEAF; > + > + if (info->granu.addr_info.flags & ~mask) > + return -EINVAL; > + break; > + case IOMMU_INV_GRANU_PASID: > + mask = IOMMU_INV_PASID_FLAGS_PASID | > + IOMMU_INV_PASID_FLAGS_ARCHID; > + if (info->granu.pasid_info.flags & ~mask) > + return -EINVAL; > + > + break; > + case IOMMU_INV_GRANU_DOMAIN: > + if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) > + return -EINVAL; > + break; > + default: > + return -EINVAL; > + } > + > + /* Check reserved padding fields */ > + for (i = 0; i < sizeof(info->padding); i++) { > + if (info->padding[i]) > + return -EINVAL; > + } > + > + return 0; > +} > + > int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device > *dev, > - struct iommu_cache_invalidate_info *inv_info) > + void __user *uinfo) > { > + struct iommu_cache_invalidate_info inv_info = { 0 }; > + u32 minsz; > + int ret = 0; nit: no need to initialize it > + > if (unlikely(!domain->ops->cache_invalidate)) > return -ENODEV; > > - return domain->ops->cache_invalidate(domain, dev, inv_info); > + /* > + * No new spaces can be added before the variable sized union, the > + * minimum size is the offset to the union. > + */ > + minsz = offsetof(struct iommu_cache_invalidate_info, granu); Why not use offsetofend() to avoid naming the unions? 
> + > + /* Copy minsz from user to get flags and argsz */ > + if (copy_from_user(&inv_info, uinfo, minsz)) > + return -EFAULT; > + > + /* Fields before variable size union is mandatory */ > + if (inv_info.argsz < minsz) > + return -EINVAL; > + > + /* PASID and address granu require additional info beyond minsz */ > + if (inv_info.argsz == minsz && > + ((inv_info.granularity == IOMMU_INV_GRANU_PASID) || > + (inv_info.granularity == IOMMU_INV_GRANU_ADDR))) > + return -EINVAL; Made redundant by the two checks below > + > + if (inv_info.granularity == IOMMU_INV_GRANU_P
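The argsz handling in the patch boils down to a small invariant that can be modeled in userspace. The `demo_inv_info` struct below is invented purely to show the minsz check; only the argsz logic mirrors the kernel pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented stand-in for a versioned, user-extensible UAPI struct whose
 * tail is a variable-sized union (here just a u64 placeholder). */
struct demo_inv_info {
	uint32_t argsz;		/* filled by userspace: the sizeof() it built with */
	uint32_t flags;
	uint64_t granu;		/* stand-in for the variable-sized union */
};

int demo_check_argsz(const struct demo_inv_info *info)
{
	/* Everything before the union is mandatory, so the minimum
	 * acceptable argsz is the offset of the union. */
	size_t minsz = offsetof(struct demo_inv_info, granu);

	if (info->argsz < minsz)
		return -1;	/* truncated userspace struct: reject */

	/* A larger argsz is fine: newer userspace appended fields the
	 * kernel doesn't know about, and they are simply not copied. */
	return 0;
}
```

This is how forward compatibility works in the patch: the kernel only ever copies the bytes it understands, and argsz tells it how much the user actually provided.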
general protection fault in gfs2_withdraw
Hello, syzbot found the following issue on: HEAD commit:ba4f184e Linux 5.9-rc6 git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=13a0ccad90 kernel config: https://syzkaller.appspot.com/x/.config?x=6f192552d75898a1 dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com gfs2: fsid=syz:syz.0: fatal: invalid metadata block bh = 2072 (magic number) function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 417 gfs2: fsid=syz:syz.0: about to withdraw this file system general protection fault, probably for non-canonical address 0xdc0e: [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0070-0x0077] CPU: 0 PID: 27118 Comm: syz-executor.0 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline] RIP: 0010:gfs2_withdraw.cold+0xff/0xc0e fs/gfs2/util.c:294 Code: 00 48 c1 e0 2a 80 3c 02 00 0f 85 19 02 00 00 4c 8b bb a0 08 00 00 b8 ff ff 37 00 48 c1 e0 2a 49 8d 7f 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 74 05 e8 67 6d 68 fe 4d 8b 7f 70 b8 ff ff 37 00 48 c1 RSP: 0018:c900018b73b8 EFLAGS: 00010202 RAX: dc00 RBX: 888059d7 RCX: c90002639000 RDX: 000e RSI: 834e9fdf RDI: 0070 RBP: 888059d7026d R08: 0038 R09: 88802ce318e7 R10: R11: R12: 888059d70050 R13: 888059d702f0 R14: 88cc1320 R15: FS: 7f348fd73700() GS:88802ce0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 00b60004 CR3: 4a089000 CR4: 00350ef0 DR0: 2000 DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0600 Call Trace: gfs2_meta_check_ii+0x68/0xa0 fs/gfs2/util.c:450 gfs2_metatype_check_i fs/gfs2/util.h:126 [inline] gfs2_meta_indirect_buffer+0x3a3/0x3f0 fs/gfs2/meta_io.c:417 gfs2_meta_inode_buffer 
fs/gfs2/meta_io.h:70 [inline] gfs2_inode_refresh+0x95/0xdf0 fs/gfs2/glops.c:438 inode_go_lock+0x309/0x49f fs/gfs2/glops.c:468 do_promote+0x4a0/0xc10 fs/gfs2/glock.c:390 finish_xmote+0x4ed/0xf40 fs/gfs2/glock.c:560 do_xmote+0x812/0xba0 fs/gfs2/glock.c:686 run_queue+0x323/0x680 fs/gfs2/glock.c:751 gfs2_glock_nq+0x716/0x11b0 fs/gfs2/glock.c:1410 gfs2_glock_nq_init fs/gfs2/glock.h:238 [inline] gfs2_lookupi+0x314/0x630 fs/gfs2/inode.c:317 gfs2_lookup_simple+0x99/0xe0 fs/gfs2/inode.c:268 init_journal fs/gfs2/ops_fstype.c:620 [inline] init_inodes+0x367/0x1f40 fs/gfs2/ops_fstype.c:756 gfs2_fill_super+0x195e/0x254a fs/gfs2/ops_fstype.c:1125 get_tree_bdev+0x421/0x740 fs/super.c:1342 gfs2_get_tree+0x4a/0x270 fs/gfs2/ops_fstype.c:1201 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x20a0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45e5ea Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 6d 9e fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 4a 9e fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:7f348fd72aa8 EFLAGS: 0202 ORIG_RAX: 00a5 RAX: ffda RBX: 7f348fd72b40 RCX: 0045e5ea RDX: 2000 RSI: 2100 RDI: 7f348fd72b00 RBP: 7f348fd72b00 R08: 7f348fd72b40 R09: 2000 R10: R11: 0202 R12: 2000 R13: 2100 R14: 2200 R15: 20047a20 Modules linked in: ---[ end trace a1967e7d2c26629b ]--- RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline] RIP: 0010:gfs2_withdraw.cold+0xff/0xc0e fs/gfs2/util.c:294 Code: 00 48 c1 e0 2a 80 3c 02 00 0f 85 19 02 00 00 4c 8b bb a0 08 00 00 b8 ff ff 37 00 48 c1 e0 2a 49 8d 7f 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 74 05 e8 67 6d 68 fe 4d 8b 7f 70 b8 ff ff 37 00 48 c1 RSP: 0018:c900018b73b8 EFLAGS: 00010202 RAX: 
dc00 RBX: 888059d7 RCX: c90002639000 RDX: 000e RSI: 834e9fdf RDI: 0070 RBP: 888059d7026d R08: 0038 R09: 88802ce318e7 R10: R11: R12: 888059d70050 R13: 888059d702f0 R14: 88cc1320 R15: FS: 7f348fd73700() GS:88802cf0() knlGS: CS: 0010 DS:
RE: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
Hi Dominique, > -Original Message- > From: Dominique Martinet > Sent: Thursday, September 24, 2020 5:52 PM > To: Jianyong Wu > Cc: eri...@gmail.com; lu...@ionkov.net; qemu_...@crudebyte.com; > gr...@kaod.org; v9fs-develo...@lists.sourceforge.net; linux- > ker...@vger.kernel.org; Justin He > Subject: Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention. > > Jianyong Wu wrote on Thu, Sep 24, 2020: > > > Jianyong Wu wrote on Wed, Sep 23, 2020: > > > > open-unlink-f*syscall test: > > > > I have tested for f*syscall include: ftruncate fstat fchown fchmod > faccessat. > > > > > > Given the other thread, what did you test this with? > > > > Er, I just use Greg's qemu of https://github.com/gkurz/qemu.git: > > 9p-attr-fixes. I should have referenced it in commit message. > > Ok. That branch is fairly old (based on pre-2.7.0 code), so will need some > work as well. > It doesn't look like anyone has time to refresh the patches from what I just > have read so it might fall on you as well... > > (I see Greg just made the same point, took a bit too long to write this mail > ;)) > > > > Since qemu doesn't work apparently do you have a in-house server at > > > arm I could test? > > > (I'll try with ganesha otherwise, it keeps files open so it should > > > work I think...) > > > > Yeah, I test this on my arm server. But as these codes are arch-free, we can > test it in whatever a machine. > > (also the server in arm can't be accessed by outer space, so sorry) > > But I think that this test are far from serious and complete. > > Yes I meant arm-specific code, not infrastructure. This is fine I'll do more > tests > here. > Yeah. 
> > > > +atomic_set(&fid->count, 1); > > > > > > I kind of like the refcount API becauese it has some extra overflow > > > checks; but it requires a bit more work around clunk (instead of > > > bailing out early if counter hits 0, you need to have it call a > > > separate function in case it does) > > > > > > That's mostly esthetics though I'm not going to fuss over that. > > > > Sorry, I'm not sure what's the context of this line, does this line > > lie in "__add_fid”. I'm not sure about why it do harm to clunk? > > It's not about clunk, it's about atomic_set/inc/dec vs. refcount_set/inc/dec > -- > because refcount has overflow checks. > > I've just noticed refcount_dec_and_test that could be used as a drop-in > replacement, so it would be good ot update. > OK. > > > > > > @@ -74,6 +77,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct > > > > inode *inode, kuid_t uid) void v9fs_open_fid_add(struct inode > > > > *inode, struct p9_fid *fid) { spin_lock(&inode->i_lock); > > > > +atomic_set(&fid->count, 1); > > > > > > Hm, that should be done at fid creation time in net/9p/client.c > p9_fid_create ; > > > no ? > > > (you do it there already, I don't see what reseting count here brings > except > > > confusion) > > > > I put this counter set op before the fids are added to hlist. So I can > > make sure the counter value is 1 before fids are used. It's redundant > > code. I can remove it in both "__add_fid" and "v9fs_open_fid_add", but > > I'm not sure what you're trying to do. 
> There are two ways to handle inserting to a list as far as refcounting > goes: > - consider you add a new reference that will be removed when the fid is > removed from the list ; in that case you should increment the counter > and clunk the fid as usual whne you're done using it in whatever calls > __add_fid and v9fs_open_fid_add > - or consider it an ownership transfer; then don't increment refcount, > but you must also never use the fid ever again after in the same thread > (because another thread could free the list and clunk) ; so if you still > need to use the fid after adding to the list the first option is better. > > I'm not sure a fid could be added to both list in some usage patterns > but if that is the case reseting the count to 1 is most definitely an > error, as both would want to be able to decrement once. > > > we must take care of it that no clunk is called between fids are > > created and added to hlist. Both are good for me. > > note for this point that if the fid is not in any list it cannot be > accessed by another thread and thus cannot be raced on clunk. > > Fine, I will remove the "set" op. > > > > > diff --git a/fs/9p/fid.h b/fs/9p/fid.h index > > > > dfa11df02818..1fed96546728 100644 > > > > --- a/fs/9p/fid.h > > > > +++ b/fs/9p/fid.h > > > > @@ -22,6 +22,14 @@ static inline struct p9_fid *clone_fid(struct > > > > p9_fid *fid) } static inline struct p9_fid *v9fs_fid_clone(struct > > > > dentry *dentry) { > > > > -return clone_fid(v9fs_fid_lookup(dentry)); > > > > +struct p9_fid *fid, *nfid; > > > > + > > > > +fid = v9fs_fid_lookup(dentry); > > > > +if (!fid || IS_ERR(fid)) > > > > +return fid; > > > > + > > > > +nfid = p9_client_walk(fid, 0, NULL, 1); > > > > > > I think you clone_fid() here is slightly easier to understand; everyone > doesn't > > > know that a walk with
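The atomic-vs-refcount discussion above is about the classic dec-and-test pattern. A plain C11 sketch of it follows — the kernel would use `refcount_t` and `refcount_dec_and_test()`, and the struct and names here are invented stand-ins, not the 9p code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Invented miniature of a refcounted fid: the last put "clunks" it. */
struct fid {
	atomic_int count;
	bool clunked;		/* stands in for the real p9_client_clunk() */
};

void fid_get(struct fid *f)
{
	atomic_fetch_add(&f->count, 1);
}

/* Drop one reference; only the holder of the final reference clunks,
 * so a concurrent user holding its own reference can never race
 * against the fid being freed under it. */
void fid_put(struct fid *f)
{
	if (atomic_fetch_sub(&f->count, 1) == 1)	/* previous value was 1 */
		f->clunked = true;
}
```

`refcount_t` adds saturation and overflow checks on top of this scheme, which is why it was suggested over raw atomics.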
Re: [RFC] mm/vmstat: Add events for HugeTLB migration
On Fri, Sep 25, 2020 at 02:42:29PM +0530, Anshuman Khandual wrote: > Add following new vmstat events which will track HugeTLB page migration. > > 1. HUGETLB_MIGRATION_SUCCESS > 2. HUGETLB_MIGRATION_FAILURE > > It follows the existing semantics to accommodate HugeTLB subpages in total > page migration statistics. While here, this updates current trace event > "mm_migrate_pages" to accommodate now available HugeTLB based statistics. > > Cc: Daniel Jordan > Cc: Zi Yan > Cc: John Hubbard > Cc: Mike Kravetz > Cc: Andrew Morton > Cc: linux...@kvack.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Anshuman Khandual Was this inspired by some usecase/debugging or just to follow THP's example? > int retry = 1; > int thp_retry = 1; > + int hugetlb_retry = 1; > int nr_failed = 0; > int nr_succeeded = 0; > int nr_thp_succeeded = 0; > int nr_thp_failed = 0; > int nr_thp_split = 0; > + int nr_hugetlb_succeeded = 0; > + int nr_hugetlb_failed = 0; > int pass = 0; > bool is_thp = false; > + bool is_hugetlb = false; > struct page *page; > struct page *page2; > int swapwrite = current->flags & PF_SWAPWRITE; > @@ -1433,6 +1437,7 @@ int migrate_pages(struct list_head *from, new_page_t > get_new_page, > for (pass = 0; pass < 10 && (retry || thp_retry); pass++) { Should you not have put hugetlb_retry within the loop condition as well? Otherwise we might not retry for hugetlb pages now? -- Oscar Salvador SUSE L3
Re: [RFC PATCH 3/3] KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the VM
Sean Christopherson writes: > On Thu, Sep 24, 2020 at 02:34:14PM +0200, Vitaly Kuznetsov wrote: >> Sean Christopherson writes: >> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> > index 6f9a0c6d5dc5..810d46ab0a47 100644 >> > --- a/arch/x86/kvm/vmx/vmx.c >> > +++ b/arch/x86/kvm/vmx/vmx.c >> > @@ -2250,7 +2250,7 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, >> > enum kvm_reg reg) >> >vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; >> >break; >> >default: >> > - WARN_ON_ONCE(1); >> > + KVM_BUG_ON(1, vcpu->kvm); >> >break; >> >} >> > } >> > @@ -4960,6 +4960,7 @@ static int handle_cr(struct kvm_vcpu *vcpu) >> >return kvm_complete_insn_gp(vcpu, err); >> >case 3: >> >WARN_ON_ONCE(enable_unrestricted_guest); >> > + >> >err = kvm_set_cr3(vcpu, val); >> >return kvm_complete_insn_gp(vcpu, err); >> >case 4: >> > @@ -4985,14 +4986,13 @@ static int handle_cr(struct kvm_vcpu *vcpu) >> >} >> >break; >> >case 2: /* clts */ >> > - WARN_ONCE(1, "Guest should always own CR0.TS"); >> > - vmx_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS)); >> > - trace_kvm_cr_write(0, kvm_read_cr0(vcpu)); >> > - return kvm_skip_emulated_instruction(vcpu); >> > + KVM_BUG(1, vcpu->kvm, "Guest always owns CR0.TS"); >> > + return -EIO; >> >case 1: /*mov from cr*/ >> >switch (cr) { >> >case 3: >> >WARN_ON_ONCE(enable_unrestricted_guest); >> > + >> >> Here, were you intended to replace WARN_ON_ONCE() with KVM_BUG_ON() or >> this is just a stray newline added? > > I think it's just a stray newline. At one point I had converted this to a > KVM_BUG_ON(), but then reversed direction because it's not fatal to the guest, > i.e. KVM should continue to function even though it's spuriously intercepting > CR3 loads. > > Which, rereading this patch, completely contradicts the KVM_BUG() for CLTS. 
> > That's probably something we should sort out in this RFC: is KVM_BUG() only > to be used if the bug is fatal/dangerous, or should it be used any time the > error is definitely a KVM (or hardware) bug. Personally, I'm feeling adventurous so my vote goes to the latter :-) Whenever a KVM bug is discovered by a VM, it's much safer to stop executing it, as who knows what the implications might be? In this particular case I can think of a nested scenario where L1 didn't ask for CR3 intercept but L0 is still injecting it. It is not fatal by itself but probably there is a bug in calculating intercepts in L0, so if we're getting something extra maybe we're also missing some? And this doesn't sound good at all. > >> >val = kvm_read_cr3(vcpu); >> >kvm_register_write(vcpu, reg, val); >> >trace_kvm_cr_read(cr, val); >> > @@ -5330,7 +5330,9 @@ static int handle_ept_misconfig(struct kvm_vcpu >> > *vcpu) >> > >> > static int handle_nmi_window(struct kvm_vcpu *vcpu) >> > { >> > - WARN_ON_ONCE(!enable_vnmi); >> > + if (KVM_BUG_ON(!enable_vnmi, vcpu->kvm)) >> > + return -EIO; >> > + >> >exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING); >> >++vcpu->stat.nmi_window_exits; >> >kvm_make_request(KVM_REQ_EVENT, vcpu); >> > @@ -5908,7 +5910,8 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, >> > fastpath_t exit_fastpath) >> > * below) should never happen as that means we incorrectly allowed a >> > * nested VM-Enter with an invalid vmcs12. 
>> > */ >> > - WARN_ON_ONCE(vmx->nested.nested_run_pending); >> > + if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm)) >> > + return -EIO; >> > >> >/* If guest state is invalid, start emulating */ >> >if (vmx->emulation_required) >> > @@ -6258,7 +6261,9 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) >> >int max_irr; >> >bool max_irr_updated; >> > >> > - WARN_ON(!vcpu->arch.apicv_active); >> > + if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm)) >> > + return -EIO; >> > + >> >if (pi_test_on(&vmx->pi_desc)) { >> >pi_clear_on(&vmx->pi_desc); >> >/* >> > @@ -6345,7 +6350,7 @@ static void handle_external_interrupt_irqoff(struct >> > kvm_vcpu *vcpu) >> >gate_desc *desc; >> >u32 intr_info = vmx_get_intr_info(vcpu); >> > >> > - if (WARN_ONCE(!is_external_intr(intr_info), >> > + if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, >> >"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info)) >> >return; >> > >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> > index 17f4995e80a7..672eb5142b34 100644 >> > --- a/arch/x86/kvm/x86.c >> > +++ b/arch/x86/kvm/x86.c >> > @@ -8363,6 +8363,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) >> >bool req_immediate_exit = false;
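The KVM_BUG()/KVM_BUG_ON() semantics under discussion — warn, mark the whole VM as bugged, and have the caller bail out with -EIO — can be modeled in a few lines of userspace C. Everything below (the struct, the function names, using a function instead of the real macro) is an invented stand-in, not the actual KVM code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct vm {
	bool bugged;	/* once set, every subsequent ioctl would fail fast */
};

/* Returns the condition, so callers can write:
 * if (vm_bug_on(cond, vm)) return -EIO; */
bool vm_bug_on(bool cond, struct vm *vm)
{
	if (cond && !vm->bugged) {
		fprintf(stderr, "VM bug detected, marking VM as bugged\n");
		vm->bugged = true;
	}
	return cond;
}

int handle_nmi_window(struct vm *vm, bool vnmi_enabled)
{
	if (vm_bug_on(!vnmi_enabled, vm))
		return -EIO;	/* impossible state: stop running this guest */
	/* ... normal NMI-window handling would go here ... */
	return 0;
}
```

Compared with a bare WARN_ON_ONCE(), this turns a silent inconsistency into a hard stop for that VM only, without bringing down the host.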
Re: [PATCH v9 10/20] gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL
On Thu, Sep 24, 2020 at 12:26 PM Kent Gibson wrote: > On Thu, Sep 24, 2020 at 11:26:49AM +0300, Andy Shevchenko wrote: > > On Thu, Sep 24, 2020 at 6:24 AM Kent Gibson wrote: > > > On Wed, Sep 23, 2020 at 07:15:46PM +0300, Andy Shevchenko wrote: > > > > On Wed, Sep 23, 2020 at 7:14 PM Andy Shevchenko > > > > wrote: > > > > > On Tue, Sep 22, 2020 at 5:35 AM Kent Gibson > > > > > wrote: ... > > > > > > + polarity_change = > > > > > > + (test_bit(FLAG_ACTIVE_LOW, &desc->flags) != > > > > > > +((flags & GPIO_V2_LINE_FLAG_ACTIVE_LOW) != > > > > > > 0)); > > > > > > > > > > Comparison > > > > > > > > Comparison between int / long (not all archs are agreed on this) and > > > > boolean is not the best we can do. > > > > > > > > > > There is no bool to int comparision here. > > > > test_bit() returns int or long depending on arch... Then you compare > > it to bool (which is a product of != 0). > Really - I thought it returned bool. > It is a test - why would it return int or long? I assume due to arch relation. Some archs may convert test_bit() to a single assembly instruction that returns a register which definitely fits long or int depending on case. > Surely it is guaranteed to return 0 or 1? Not sure about this, it's all in arch/* and needs to be investigated. Would be nice to have it cleaned up if there is any inconsistency (and document if not yet). But It's out of scope of this series I believe. > > > There are two comparisons - the inner int vs int => bool and the > > > outer bool vs bool. The "!= 0" is effectively an implicit cast to > > > bool, as is your new_polarity initialisation below. > > > > > > > What about > > > > > > > > bool old_polarity = test_bit(FLAG_ACTIVE_LOW, &desc->flags); > > > > bool new_polarity = flags & GPIO_V2_LINE_FLAG_ACTIVE_LOW; > > > > > > > > old_polarity ^ new_polarity > > > > > > So using bitwise operators on bools is ok?? > > > > XOR is special. There were never bitwise/boolean XORs. 
> > > > We must live in different universes, cos there has been a bitwise XOR in > mine since K&R. The logical XOR is '!='. Oops, you are right; there was never a separate boolean XOR because it's the same as a simple comparison. ... > > > > and move this under INPUT conditional? > > > It has to be before the gpio_v2_line_config_flags_to_desc_flags() call, > > > as that modifies the desc flags, including the new polarity, so > > > polarity_change would then always be false :-). > > I really don't see in the code how the polarity_change value is used in > > the FLAG_OUTPUT branch below. > > It isn't. But desc->flags is modified before both - and so the > polarity_change initialization has to go before both SINCE IT TESTS > THE FLAGS. I see now. Sorry for being too blind. -- With Best Regards, Andy Shevchenko
Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation
On Fri, Sep 25, 2020 at 11:00:30AM +0200, David Hildenbrand wrote: > On 25.09.20 09:41, Peter Zijlstra wrote: > > On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote: > >> From: Mike Rapoport > >> > >> Removing a PAGE_SIZE page from the direct map every time such page is > >> allocated for a secret memory mapping will cause severe fragmentation of > >> the direct map. This fragmentation can be reduced by using PMD-size pages > >> as a pool for small pages for secret memory mappings. > >> > >> Add a gen_pool per secretmem inode and lazily populate this pool with > >> PMD-size pages. > > > > What's the actual efficacy of this? Since the pmd is per inode, all I > > need is a lot of inodes and we're in business to destroy the directmap, > > no? > > > > Afaict there's no privs needed to use this, all a process needs is to > > stay below the mlock limit, so a 'fork-bomb' that maps a single secret > > page will utterly destroy the direct map. > > > > I really don't like this, at all. > > As I expressed earlier, I would prefer allowing allocation of secretmem > only from a previously defined CMA area. This would physically locally > limit the pain. Given that this thing doesn't have a migrate hook, that seems like an eminently reasonable constraint. Because not only will it mess up the directmap, it will also destroy the ability of the page-allocator / compaction to re-form high order blocks by sprinkling holes throughout. Also, this is all very close to XPFO, yet I don't see that mentioned anywhere. Further still, it has this HAVE_SECRETMEM_UNCACHED nonsense which is completely unused. I'm not at all sure exposing UNCACHED to random userspace is a sane idea.
Re: [PATCH 1/2] printk: Store all three timestamps
On Thu 2020-09-24 10:07:57, Mark Salyzyn wrote: > I would hope for monotonic_raw, boottime and realtime as being the most > useful for most situations. > > [TL;DR] > > Currently kernel logs actually use monotonic_raw (no timing clock > correction), not monotonic (timing correction). > > Whereas boot (timing clock adjusted, still monotonic) and realtime (timing > clock _and_ time adjusted, non monotonic), when we try to correlate to user > space is workable, but we will have troubles correlating monotonic (w.r.t. > monotonic_raw) clocks if used in user space. > > In Android we have the option of monotonic and realtime only right now for > the user space logger (which integrates logd, klogd and auditd, the latter > two come from the kernel). I have bugs open to consider boottime, but it is > blocked on boottime availability from kernel space logging (this change). I > have another bug to consider switching the logger to monotonic_raw instead > of monotonic, to make it correlate better with existing kernel logs. But > alas there is a lot of resistance for phones switching to monotonic(_raw); the only > devices that chose monotonic(_raw) are everything else (google glass, > watches, ...). As such, phones, and the associated developers, will > continue to want realtime correlated in the kernel logs (yes, this change > too). > > realtime sucks for the first 10 seconds on Android, since phones generally > do not get their time correction until then from network resources, and > many of their rtc clocks are not adjustable; they store a correction factor > that does not get picked up from user space until userdata is mounted > (about 20 seconds in). But only kernel developers care about this first > part of boot; everything after that (and associated correlated kernel > interactions) is for user space folks. Thanks a lot for this detailed feedback. Just to be sure that I understand it correctly: you suggest storing three timestamps: local_clock(), boot and real clock. 
It makes sense to me. I just wonder if there might be any use case when the adjusted mono clock is needed or preferred over local_clock(). Best Regards, Petr
[PATCH net 1/1] net: stmmac: Fix clock handling on remove path
While unloading the dwmac-intel driver, clk_disable_unprepare() is called twice, in stmmac_dvr_remove() and intel_eth_pci_remove(). This causes a kernel panic on the second call. Remove the second call of clk_disable_unprepare() in intel_eth_pci_remove(). Fixes: 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove paths") Cc: Andy Shevchenko Reviewed-by: Voon Weifeng Signed-off-by: Wong Vee Khee --- drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c index 2ac9dfb3462c..9e6d60e75f85 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c @@ -653,7 +653,6 @@ static void intel_eth_pci_remove(struct pci_dev *pdev) pci_free_irq_vectors(pdev); - clk_disable_unprepare(priv->plat->stmmac_clk); clk_unregister_fixed_rate(priv->plat->stmmac_clk); pcim_iounmap_regions(pdev, BIT(0)); -- 2.17.0
[PATCH 1/2] iommu/iova: Flush CPU rcache for when a depot fills
Leizhen reported some time ago that IOVA performance may degrade over time [0], but unfortunately his solution to fix this problem was not given attention. To summarize, the issue is that as time goes by, the CPU rcache and depot rcache continue to grow. As such, IOVA RB tree access time also continues to grow. At a certain point, a depot may become full, and some CPU rcaches may also be full when we try to insert another IOVA. For this scenario, currently we free the "loaded" CPU rcache and create a new one. This freeing means that we need to free many IOVAs in the RB tree, which makes IO throughput performance fall off a cliff in our storage scenario: Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops] And it continues in this fashion, without recovering. Note that in this example we had to wait 16 hours for this to occur. 
Also note that IO throughput gradually becomes more unstable leading up to this point. As a solution to this issue, we judge that the IOVA rcaches have grown too big, and just flush all the CPU rcaches instead. The depot rcaches, however, are not flushed, as they can be used to immediately replenish active CPUs. In future, some IOVA rcache compaction could be implemented to solve the instability issue, which I figure could be quite complex to implement. [0] https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/ Reported-by: Xiang Chen Tested-by: Xiang Chen Signed-off-by: John Garry --- drivers/iommu/iova.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 45a251da5453..05e0b462e0d9 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -892,9 +892,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad, struct iova_rcache *rcache, unsigned long iova_pfn) { - struct iova_magazine *mag_to_free = NULL; struct iova_cpu_rcache *cpu_rcache; - bool can_insert = false; + bool can_insert = false, flush = false; unsigned long flags; cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches); @@ -913,13 +912,19 @@ static bool __iova_rcache_insert(struct iova_domain *iovad, if (rcache->depot_size < MAX_GLOBAL_MAGS) { rcache->depot[rcache->depot_size++] = cpu_rcache->loaded; + can_insert = true; + cpu_rcache->loaded = new_mag; } else { - mag_to_free = cpu_rcache->loaded; + /* +* The depot is full, meaning that a very large +* cache of IOVAs has built up, which slows +* down RB tree accesses significantly +* -> let's flush at this point. 
+*/ + flush = true; + iova_magazine_free(new_mag); } spin_unlock(&rcache->lock); - - cpu_rcache->loaded = new_mag; - can_insert = true; } } @@ -928,9 +933,11 @@ static bool __iova_rcache_insert(struct iova_domain *iovad, spin_unlock_irqrestore(&cpu_rcache->lock, flags); - if (mag_to_free) { - iova_magazine_free_pfns(mag_to_free, iovad); - iova_magazine_free(mag_to_free); + if (flush)
[PATCH 2/2] iommu: avoid taking iova_rbtree_lock twice
From: Cong Wang Both find_iova() and __free_iova() take iova_rbtree_lock; there is no reason to take and release it twice inside free_iova(). Fold them into one critical section by calling the unlocked versions instead. Signed-off-by: Cong Wang Reviewed-by: Robin Murphy Tested-by: Xiang Chen Signed-off-by: John Garry --- drivers/iommu/iova.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 05e0b462e0d9..921e80f64ae5 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -390,10 +390,14 @@ EXPORT_SYMBOL_GPL(__free_iova); void free_iova(struct iova_domain *iovad, unsigned long pfn) { - struct iova *iova = find_iova(iovad, pfn); + unsigned long flags; + struct iova *iova; + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); + iova = private_find_iova(iovad, pfn); if (iova) - __free_iova(iovad, iova); + private_free_iova(iovad, iova); + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); } EXPORT_SYMBOL_GPL(free_iova); -- 2.26.2
[PATCH 0/3] slimbus: fixes for 5.10
Hi Greg, Here are a few fixes found recently while testing the Qualcomm SSR (SubSystem Restart) feature on the SDM845 SoC. Mostly the fixes are around when device absence is reported. If it's not too late, can you take them for 5.10? Many thanks, Srini Srinivas Kandagatla (3): slimbus: core: check get_addr before removing laddr ida slimbus: core: do not enter to clock pause mode in core slimbus: qcom-ngd-ctrl: disable ngd in qmi server down callback drivers/slimbus/core.c | 6 ++ drivers/slimbus/qcom-ngd-ctrl.c | 4 2 files changed, 6 insertions(+), 4 deletions(-) -- 2.21.0
Re: [PATCH 8/9] sched: Fix migrate_disable() vs set_cpus_allowed_ptr()
On Fri, Sep 25, 2020 at 11:05:28AM +0200, Peter Zijlstra wrote: > On Thu, Sep 24, 2020 at 08:59:33PM +0100, Valentin Schneider wrote: > > > @@ -2025,19 +2138,8 @@ static int __set_cpus_allowed_ptr(struct > > > if (cpumask_test_cpu(task_cpu(p), new_mask)) > > > goto out; > > > > I think this needs a cancellation of any potential pending migration > > requests. Consider a task P0 running on CPU0: > > > >P0 P1 P2 > > > >migrate_disable(); > > > > set_cpus_allowed_ptr(P0, CPU1); > > // waits for completion > > > > set_cpus_allowed_ptr(P0, CPU0); > >// Already good, > > no waiting for completion > > > >migrate_enable(); > >// task_cpu(p) allowed, no move_task() > > > > AIUI in this scenario P1 would stay forever waiting. > > The other approach is trying to handle that last condition in > move_task(), but I'm quite sure that's going to be aweful too :/ Something like so perhaps? --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2039,6 +2039,10 @@ static int move_task(struct rq *rq, stru if (WARN_ON_ONCE(!pending)) return -EINVAL; + /* Can the task run on the task's current CPU? If so, we're done */ + if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) + goto easy; + arg.done = &pending->done; if (flags & SCA_MIGRATE_ENABLE) { @@ -2063,6 +2067,7 @@ static int move_task(struct rq *rq, stru if (task_on_rq_queued(p)) rq = move_queued_task(rq, rf, p, dest_cpu); +easy: p->migration_pending = NULL; complete = true; } @@ -2151,10 +2156,6 @@ static int __set_cpus_allowed_ptr(struct p->nr_cpus_allowed != 1); } - /* Can the task run on the task's current CPU? If so, we're done */ - if (cpumask_test_cpu(task_cpu(p), new_mask)) - goto out; - return move_task(rq, &rf, p, dest_cpu, flags); out:
[PATCH 3/3] slimbus: qcom-ngd-ctrl: disable ngd in qmi server down callback
In the QMI new server notification we enable the NGD; however, during the delete server notification we do not disable it. This can lead to multiple instances of the NGD being enabled, so make sure that we disable the NGD in the delete server callback to fix this issue. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Signed-off-by: Srinivas Kandagatla --- drivers/slimbus/qcom-ngd-ctrl.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/slimbus/qcom-ngd-ctrl.c b/drivers/slimbus/qcom-ngd-ctrl.c index 743ee7b4e63f..218aefc3531c 100644 --- a/drivers/slimbus/qcom-ngd-ctrl.c +++ b/drivers/slimbus/qcom-ngd-ctrl.c @@ -1277,9 +1277,13 @@ static void qcom_slim_ngd_qmi_del_server(struct qmi_handle *hdl, { struct qcom_slim_ngd_qmi *qmi = container_of(hdl, struct qcom_slim_ngd_qmi, svc_event_hdl); + struct qcom_slim_ngd_ctrl *ctrl = + container_of(qmi, struct qcom_slim_ngd_ctrl, qmi); qmi->svc_info.sq_node = 0; qmi->svc_info.sq_port = 0; + + qcom_slim_ngd_enable(ctrl, false); } static struct qmi_ops qcom_slim_ngd_qmi_svc_event_ops = { -- 2.21.0
[PATCH bpf-next 2/4] selftests: bpf: Add helper to compare socket cookies
We compare socket cookies to ensure that insertion into a sockmap worked. Pull this out into a helper function for use in other tests. Signed-off-by: Lorenz Bauer --- .../selftests/bpf/prog_tests/sockmap_basic.c | 50 +-- 1 file changed, 36 insertions(+), 14 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 4b7a527e7e82..67d3301bdf81 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -50,6 +50,37 @@ static int connected_socket_v4(void) return -1; } +static void compare_cookies(struct bpf_map *src, struct bpf_map *dst) +{ + __u32 i, max_entries = bpf_map__max_entries(src); + int err, duration, src_fd, dst_fd; + + src_fd = bpf_map__fd(src); + dst_fd = bpf_map__fd(dst); + + for (i = 0; i < max_entries; i++) { + __u64 src_cookie, dst_cookie; + + err = bpf_map_lookup_elem(src_fd, &i, &src_cookie); + if (err && errno == ENOENT) { + err = bpf_map_lookup_elem(dst_fd, &i, &dst_cookie); + CHECK(!err, "map_lookup_elem(dst)", "element %u not deleted\n", i); + CHECK(err && errno != ENOENT, "map_lookup_elem(dst)", "%s\n", + strerror(errno)); + continue; + } + if (CHECK(err, "lookup_elem(src)", "%s\n", strerror(errno))) + continue; + + err = bpf_map_lookup_elem(dst_fd, &i, &dst_cookie); + if (CHECK(err, "lookup_elem(dst)", "%s\n", strerror(errno))) + continue; + + CHECK(dst_cookie != src_cookie, "cookie mismatch", + "%llu != %llu (pos %u)\n", dst_cookie, src_cookie, i); + } +} + /* Create a map, populate it with one socket, and free the map. 
*/ static void test_sockmap_create_update_free(enum bpf_map_type map_type) { @@ -109,9 +140,9 @@ static void test_skmsg_helpers(enum bpf_map_type map_type) static void test_sockmap_update(enum bpf_map_type map_type) { struct bpf_prog_test_run_attr tattr; - int err, prog, src, dst, duration = 0; + int err, prog, src, duration = 0; struct test_sockmap_update *skel; - __u64 src_cookie, dst_cookie; + struct bpf_map *dst_map; const __u32 zero = 0; char dummy[14] = {0}; __s64 sk; @@ -127,18 +158,14 @@ static void test_sockmap_update(enum bpf_map_type map_type) prog = bpf_program__fd(skel->progs.copy_sock_map); src = bpf_map__fd(skel->maps.src); if (map_type == BPF_MAP_TYPE_SOCKMAP) - dst = bpf_map__fd(skel->maps.dst_sock_map); + dst_map = skel->maps.dst_sock_map; else - dst = bpf_map__fd(skel->maps.dst_sock_hash); + dst_map = skel->maps.dst_sock_hash; err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST); if (CHECK(err, "update_elem(src)", "errno=%u\n", errno)) goto out; - err = bpf_map_lookup_elem(src, &zero, &src_cookie); - if (CHECK(err, "lookup_elem(src, cookie)", "errno=%u\n", errno)) - goto out; - tattr = (struct bpf_prog_test_run_attr){ .prog_fd = prog, .repeat = 1, @@ -151,12 +178,7 @@ static void test_sockmap_update(enum bpf_map_type map_type) "errno=%u retval=%u\n", errno, tattr.retval)) goto out; - err = bpf_map_lookup_elem(dst, &zero, &dst_cookie); - if (CHECK(err, "lookup_elem(dst, cookie)", "errno=%u\n", errno)) - goto out; - - CHECK(dst_cookie != src_cookie, "cookie mismatch", "%llu != %llu\n", - dst_cookie, src_cookie); + compare_cookies(skel->maps.src, dst_map); out: test_sockmap_update__destroy(skel); -- 2.25.1
[PATCH 1/3] slimbus: core: check get_addr before removing laddr ida
A logical address can be assigned either by the SLIMBus controller or by the core. The core uses an IDA in cases where the get_addr callback is not provided by the controller. The core already has this check while allocating from the IDA; however, during absence reporting this is not checked. This patch fixes that. Fixes: 46a2bb5a7f7ea ("slimbus: core: Add slim controllers support") Signed-off-by: Srinivas Kandagatla --- drivers/slimbus/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/slimbus/core.c b/drivers/slimbus/core.c index ae1e248a8fb8..58b63ae0e75a 100644 --- a/drivers/slimbus/core.c +++ b/drivers/slimbus/core.c @@ -326,8 +326,8 @@ void slim_report_absent(struct slim_device *sbdev) mutex_lock(&ctrl->lock); sbdev->is_laddr_valid = false; mutex_unlock(&ctrl->lock); - - ida_simple_remove(&ctrl->laddr_ida, sbdev->laddr); + if (!ctrl->get_laddr) + ida_simple_remove(&ctrl->laddr_ida, sbdev->laddr); slim_device_update_status(sbdev, SLIM_DEVICE_STATUS_DOWN); } EXPORT_SYMBOL_GPL(slim_report_absent); -- 2.21.0
[PATCH 0/2] iommu/iova: Solve longterm IOVA issue
This series contains a patch to solve the longterm IOVA issue which leizhen originally tried to address at [0]. I also included the small optimisation from Cong Wang, which never seems to have been accepted [1]. There was some debate about the other patches in that series, but this one is quite straightforward. @Cong Wang, please resend your series if you would prefer I didn't upstream your patch. [0] https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/ [1] https://lore.kernel.org/linux-iommu/4b74d40a-22d1-af53-fcb6-5d7018370...@huawei.com/ Cong Wang (1): iommu: avoid taking iova_rbtree_lock twice John Garry (1): iommu/iova: Flush CPU rcache for when a depot fills drivers/iommu/iova.c | 33 ++--- 1 file changed, 22 insertions(+), 11 deletions(-) -- 2.26.2
[PATCH bpf-next 3/4] bpf: selftests: remove shared header from sockmap iter test
The shared header to define SOCKMAP_MAX_ENTRIES is a bit overkill. Dynamically allocate the sock_fd array based on bpf_map__max_entries instead. Suggested-by: Yonghong Song Signed-off-by: Lorenz Bauer Acked-by: Yonghong Song --- .../selftests/bpf/prog_tests/sockmap_basic.c | 36 +-- .../selftests/bpf/progs/bpf_iter_sockmap.c| 5 ++- .../selftests/bpf/progs/bpf_iter_sockmap.h| 3 -- 3 files changed, 20 insertions(+), 24 deletions(-) delete mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 67d3301bdf81..e8a4bfb4d9f4 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -8,8 +8,6 @@ #include "test_sockmap_invalid_update.skel.h" #include "bpf_iter_sockmap.skel.h" -#include "progs/bpf_iter_sockmap.h" - #define TCP_REPAIR 19 /* TCP sock is under repair right now */ #define TCP_REPAIR_ON 1 @@ -201,9 +199,9 @@ static void test_sockmap_iter(enum bpf_map_type map_type) DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); int err, len, src_fd, iter_fd, duration = 0; union bpf_iter_link_info linfo = {0}; - __s64 sock_fd[SOCKMAP_MAX_ENTRIES]; - __u32 i, num_sockets, max_elems; + __u32 i, num_sockets, num_elems; struct bpf_iter_sockmap *skel; + __s64 *sock_fd = NULL; struct bpf_link *link; struct bpf_map *src; char buf[64]; @@ -212,22 +210,23 @@ static void test_sockmap_iter(enum bpf_map_type map_type) if (CHECK(!skel, "bpf_iter_sockmap__open_and_load", "skeleton open_and_load failed\n")) return; - for (i = 0; i < ARRAY_SIZE(sock_fd); i++) - sock_fd[i] = -1; - - /* Make sure we have at least one "empty" entry to test iteration of -* an empty slot. 
-*/ - num_sockets = ARRAY_SIZE(sock_fd) - 1; - if (map_type == BPF_MAP_TYPE_SOCKMAP) { src = skel->maps.sockmap; - max_elems = bpf_map__max_entries(src); + num_elems = bpf_map__max_entries(src); + num_sockets = num_elems - 1; } else { src = skel->maps.sockhash; - max_elems = num_sockets; + num_elems = bpf_map__max_entries(src) - 1; + num_sockets = num_elems; } + sock_fd = calloc(num_sockets, sizeof(*sock_fd)); + if (CHECK(!sock_fd, "calloc(sock_fd)", "failed to allocate\n")) + goto out; + + for (i = 0; i < num_sockets; i++) + sock_fd[i] = -1; + src_fd = bpf_map__fd(src); for (i = 0; i < num_sockets; i++) { @@ -258,8 +257,8 @@ static void test_sockmap_iter(enum bpf_map_type map_type) goto close_iter; /* test results */ - if (CHECK(skel->bss->elems != max_elems, "elems", "got %u expected %u\n", - skel->bss->elems, max_elems)) + if (CHECK(skel->bss->elems != num_elems, "elems", "got %u expected %u\n", + skel->bss->elems, num_elems)) goto close_iter; if (CHECK(skel->bss->socks != num_sockets, "socks", "got %u expected %u\n", @@ -271,10 +270,11 @@ static void test_sockmap_iter(enum bpf_map_type map_type) free_link: bpf_link__destroy(link); out: - for (i = 0; i < num_sockets; i++) { + for (i = 0; sock_fd && i < num_sockets; i++) if (sock_fd[i] >= 0) close(sock_fd[i]); - } + if (sock_fd) + free(sock_fd); bpf_iter_sockmap__destroy(skel); } diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c index 0e27f73dd803..1af7555f6057 100644 --- a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c +++ b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c @@ -2,7 +2,6 @@ /* Copyright (c) 2020 Cloudflare */ #include "bpf_iter.h" #include "bpf_tracing_net.h" -#include "bpf_iter_sockmap.h" #include #include #include @@ -11,14 +10,14 @@ char _license[] SEC("license") = "GPL"; struct { __uint(type, BPF_MAP_TYPE_SOCKMAP); - __uint(max_entries, SOCKMAP_MAX_ENTRIES); + __uint(max_entries, 64); __type(key, __u32); 
__type(value, __u64); } sockmap SEC(".maps"); struct { __uint(type, BPF_MAP_TYPE_SOCKHASH); - __uint(max_entries, SOCKMAP_MAX_ENTRIES); + __uint(max_entries, 64); __type(key, __u32); __type(value, __u64); } sockhash SEC(".maps"); diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h deleted file mode 100644 index 35a675d13c0f.. --- a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h +++ /dev/null @@ -1,3 +0,0 @@ -/* SPDX-Lic
[PATCH bpf-next 1/4] bpf: sockmap: enable map_update_elem from bpf_iter
Allow passing a pointer to a BTF struct sock_common* when updating a sockmap or sockhash. Since BTF pointers can fault and therefore be NULL at runtime we need to add an additional !sk check to sock_map_update_elem. Doing this allows calling map_update_elem on sockmap from bpf_iter context, which uses BTF pointers. Signed-off-by: Lorenz Bauer --- kernel/bpf/verifier.c | 2 +- net/core/sock_map.c | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d4ba29fb17a6..5bd0239da8b6 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3943,7 +3943,7 @@ static int resolve_map_arg_type(struct bpf_verifier_env *env, case BPF_MAP_TYPE_SOCKMAP: case BPF_MAP_TYPE_SOCKHASH: if (*arg_type == ARG_PTR_TO_MAP_VALUE) { - *arg_type = ARG_PTR_TO_SOCKET; + *arg_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON; } else { verbose(env, "invalid arg_type for sockmap/sockhash\n"); return -EINVAL; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index e1f05e3fa1d0..497e7df466d4 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -610,6 +610,9 @@ static int sock_map_update_elem(struct bpf_map *map, void *key, struct sock *sk = (struct sock *)value; int ret; + if (unlikely(!sk)) + return -EINVAL; + if (!sock_map_sk_is_suitable(sk)) return -EOPNOTSUPP; -- 2.25.1
[PATCH 2/3] slimbus: core: do not enter to clock pause mode in core
Let the controller logic decide when to enter into clock pause mode! Entering into pause mode during unregistration does not really make sense, as the controller is totally going down at that point in time. Fixes: 4b14e62ad3c9e ("slimbus: Add support for 'clock-pause' feature") Signed-off-by: Srinivas Kandagatla --- drivers/slimbus/core.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/slimbus/core.c b/drivers/slimbus/core.c index 58b63ae0e75a..1d2bc181da05 100644 --- a/drivers/slimbus/core.c +++ b/drivers/slimbus/core.c @@ -301,8 +301,6 @@ int slim_unregister_controller(struct slim_controller *ctrl) { /* Remove all clients */ device_for_each_child(ctrl->dev, NULL, slim_ctrl_remove_device); - /* Enter Clock Pause */ - slim_ctrl_clk_pause(ctrl, false, 0); ida_simple_remove(&ctrl_ida, ctrl->id); return 0; -- 2.21.0
[PATCH bpf-next 4/4] selftest: bpf: Test copying a sockmap and sockhash
Since we can now call map_update_elem(sockmap) from bpf_iter context it's possible to copy a sockmap or sockhash in the kernel. Add a selftest which exercises this. Signed-off-by: Lorenz Bauer --- .../selftests/bpf/prog_tests/sockmap_basic.c | 14 +- .../selftests/bpf/progs/bpf_iter_sockmap.c| 27 +++ 2 files changed, 30 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index e8a4bfb4d9f4..854a508e81ce 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -194,7 +194,7 @@ static void test_sockmap_invalid_update(void) test_sockmap_invalid_update__destroy(skel); } -static void test_sockmap_iter(enum bpf_map_type map_type) +static void test_sockmap_copy(enum bpf_map_type map_type) { DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); int err, len, src_fd, iter_fd, duration = 0; @@ -242,7 +242,7 @@ static void test_sockmap_iter(enum bpf_map_type map_type) linfo.map.map_fd = src_fd; opts.link_info = &linfo; opts.link_info_len = sizeof(linfo); - link = bpf_program__attach_iter(skel->progs.count_elems, &opts); + link = bpf_program__attach_iter(skel->progs.copy, &opts); if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n")) goto out; @@ -265,6 +265,8 @@ static void test_sockmap_iter(enum bpf_map_type map_type) skel->bss->socks, num_sockets)) goto close_iter; + compare_cookies(src, skel->maps.dst); + close_iter: close(iter_fd); free_link: @@ -294,8 +296,8 @@ void test_sockmap_basic(void) test_sockmap_update(BPF_MAP_TYPE_SOCKHASH); if (test__start_subtest("sockmap update in unsafe context")) test_sockmap_invalid_update(); - if (test__start_subtest("sockmap iter")) - test_sockmap_iter(BPF_MAP_TYPE_SOCKMAP); - if (test__start_subtest("sockhash iter")) - test_sockmap_iter(BPF_MAP_TYPE_SOCKHASH); + if (test__start_subtest("sockmap copy")) + test_sockmap_copy(BPF_MAP_TYPE_SOCKMAP); + if 
(test__start_subtest("sockhash copy")) + test_sockmap_copy(BPF_MAP_TYPE_SOCKHASH); } diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c index 1af7555f6057..f3af0e30cead 100644 --- a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c +++ b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c @@ -22,21 +22,38 @@ struct { __type(value, __u64); } sockhash SEC(".maps"); +struct { + __uint(type, BPF_MAP_TYPE_SOCKHASH); + __uint(max_entries, 64); + __type(key, __u32); + __type(value, __u64); +} dst SEC(".maps"); + __u32 elems = 0; __u32 socks = 0; SEC("iter/sockmap") -int count_elems(struct bpf_iter__sockmap *ctx) +int copy(struct bpf_iter__sockmap *ctx) { struct sock *sk = ctx->sk; __u32 tmp, *key = ctx->key; int ret; - if (key) - elems++; + if (!key) + return 0; - if (sk) + elems++; + + /* We need a temporary buffer on the stack, since the verifier doesn't +* let us use the pointer from the context as an argument to the helper. +*/ + tmp = *key; + + if (sk) { socks++; + return bpf_map_update_elem(&dst, &tmp, sk, 0) != 0; + } - return 0; + ret = bpf_map_delete_elem(&dst, &tmp); + return ret && ret != -ENOENT; } -- 2.25.1
linux-next: manual merge of the akpm-current tree with the block tree
Hi all, Today's linux-next merge of the akpm-current tree got a conflict in: mm/page-writeback.c between commit: 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag") from the block tree and commit: 7a3714df632a ("mm/page-writeback: support tail pages in wait_for_stable_page") from the akpm-current tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc mm/page-writeback.c index 358d6f28c627,dac075e451d3.. --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@@ -2849,7 -2849,8 +2849,8 @@@ EXPORT_SYMBOL_GPL(wait_on_page_writebac */ void wait_for_stable_page(struct page *page) { + page = thp_head(page); - if (bdi_cap_stable_pages_required(inode_to_bdi(page->mapping->host))) + if (page->mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES) wait_on_page_writeback(page); } EXPORT_SYMBOL_GPL(wait_for_stable_page);
Re: [PATCH v9 11/20] gpiolib: cdev: support GPIO_V2_LINE_SET_VALUES_IOCTL
On Thu, Sep 24, 2020 at 3:46 PM Kent Gibson wrote: > On Thu, Sep 24, 2020 at 03:32:48PM +0800, Kent Gibson wrote: > > On Wed, Sep 23, 2020 at 07:18:08PM +0300, Andy Shevchenko wrote: > > > On Tue, Sep 22, 2020 at 5:36 AM Kent Gibson wrote: > > > > > > > > Add support for the GPIO_V2_LINE_SET_VALUES_IOCTL. > > > > > > > +static long linereq_set_values_unlocked(struct linereq *lr, > > > > + struct gpio_v2_line_values *lv) > > > > +{ > > > > + DECLARE_BITMAP(vals, GPIO_V2_LINES_MAX); > > > > + struct gpio_desc **descs; > > > > + unsigned int i, didx, num_set; > > > > + int ret; > > > > + > > > > + bitmap_zero(vals, GPIO_V2_LINES_MAX); > > > > + for (num_set = 0, i = 0; i < lr->num_lines; i++) { > > > > + if (lv->mask & BIT_ULL(i)) { > > > > > > Similar idea > > > > > > DECLARE_BITMAP(mask, 64) = BITMAP_FROM_U64(lv->mask); > > > > > > num_set = bitmap_weight(); > > > > > > > I had played with this option, but bitmap_weight() counts all > > the bits set in the mask - which considers bits >= lr->num_lines. > > So you would need to mask lv->mask before converting it to a bitmap. > > (I'm ok with ignoring those bits in case userspace wants to be lazy and > > use an all 1s mask.) > > > > But since we're looping over the bitmap anyway we may as well just > > count as we go. > > > > > for_each_set_bit(i, mask, lr->num_lines) > > > > > > > Yeah, that should work. I vaguely recall trying this and finding it > > generated larger object code, but I'll give it another try and if it > > works out then include it in v10. > > > > Tried it again and, while it works, it does increase the size of > gpiolib-cdev.o as follows: > > u64 -> bitmap > x86_64 28360 28616 > i386 22056 22100 > aarch64 37392 37600 > mips32 28008 28016 Yes, that's a pity... See below. > So for 64-bit platforms changing to bitmap generates larger code, > probably as we are forcing them to use 32-bit array semantics where > before they could use the native u64. 
For 32-bit there is a much > smaller difference as they were already using 32-bit array semantics > to realise the u64. > > Those are for some of my test builds, so obviously YMMV. > > It is also only for changing linereq_get_values(), which has three > instances of the loop. linereq_set_values_unlocked() has another two, > so you could expect another increase of ~2/3 of that seen here if we > change that as well. > > The sizeable increase in x86_64 was what made me revert this last time, > and I'm still satisfied with that choice. Are you still eager to switch > to for_each_set_bit()? I already asked once about a shortcut for for_each_set_bit() in the case of a constant nbits parameter when it's <= BITS_PER_LONG, but here it seems we have a variable number of lines; dunno if the compiler can prove that it's smaller than a long. In any case my point is that code readability takes preference over memory footprint (except in hot paths), and if we are going to fix this it should be done in general. That said, if maintainers are okay with that I would prefer the bitmap API over open-coded pieces. Also note that it will be easier to extend in the future if needed (if we want to have more than BITS_PER_LONG [64] lines to handle). -- With Best Regards, Andy Shevchenko
Re: [PATCH 02/13] iommu: amd: Prepare for generic IO page table framework
Robin, On 9/24/20 7:25 PM, Robin Murphy wrote: +struct io_pgtable_ops *amd_iommu_setup_io_pgtable_ops(struct iommu_dev_data *dev_data, + struct protection_domain *domain) +{ + domain->iop.pgtbl_cfg = (struct io_pgtable_cfg) { + .pgsize_bitmap = AMD_IOMMU_PGSIZES, + .ias = IOMMU_IN_ADDR_BIT_SIZE, + .oas = IOMMU_OUT_ADDR_BIT_SIZE, + .coherent_walk = false, Is that right? Given that you seem to use regular kernel addresses for pagetable pages and don't have any obvious cache maintenance around PTE manipulation, I suspect not ;) > It's fair enough if your implementation doesn't use this and simply assumes coherency, but in that case it would be less confusing to have the driver set it to true for the sake of honesty, or just leave it out entirely - explicitly setting false gives the illusion of being meaningful. The AMD IOMMU can be configured to disable snoop for the page table walk of a particular device (DTE[SD]=1). However, the current Linux driver does not set this bit, so it should assume coherency. We can just leave this out for now. I can remove this when I send out V2 along w/ other changes. Otherwise, the io-pgtable parts all look OK to me - it's nice to finally fulfil the original intent of not being an Arm-specific thing :D Robin. Thanks, Suravee
Re: [PATCH] KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd()
Sean Christopherson writes: > Explicitly check that kvm_x86_ops.tlb_remote_flush() points at Hyper-V's > implementation for PV flushing instead of assuming that a non-NULL > implementation means running on Hyper-V. Wrap the related logic in > ifdeffery as hv_remote_flush_tlb() is defined iff CONFIG_HYPERV!=n. > > Short term, the explicit check makes it more obvious why a non-NULL > tlb_remote_flush() triggers EPTP shenanigans. Long term, this will > allow TDX to define its own implementation of tlb_remote_flush() without > running afoul of Hyper-V. > > Cc: Vitaly Kuznetsov > Signed-off-by: Sean Christopherson > --- > arch/x86/kvm/vmx/vmx.c | 7 +-- > arch/x86/kvm/vmx/vmx.h | 2 ++ > 2 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 6f9a0c6d5dc5..a56fa9451b84 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -3073,14 +3073,15 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, > unsigned long pgd, > eptp = construct_eptp(vcpu, pgd, pgd_level); > vmcs_write64(EPT_POINTER, eptp); > > - if (kvm_x86_ops.tlb_remote_flush) { > +#if IS_ENABLED(CONFIG_HYPERV) > + if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) { > spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock); > to_vmx(vcpu)->ept_pointer = eptp; > to_kvm_vmx(kvm)->ept_pointers_match > = EPT_POINTERS_CHECK; > spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock); > } > - > +#endif > if (!enable_unrestricted_guest && !is_paging(vcpu)) > guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr; > else if (test_bit(VCPU_EXREG_CR3, (ulong > *)&vcpu->arch.regs_avail)) > @@ -6956,7 +6957,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) > > static int vmx_vm_init(struct kvm *kvm) > { > +#if IS_ENABLED(CONFIG_HYPERV) > spin_lock_init(&to_kvm_vmx(kvm)->ept_pointer_lock); > +#endif > > if (!ple_gap) > kvm->arch.pause_in_guest = true; > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > index d7ec66db5eb8..51107b7309bc 100644 
> --- a/arch/x86/kvm/vmx/vmx.h > +++ b/arch/x86/kvm/vmx/vmx.h > @@ -316,8 +316,10 @@ struct kvm_vmx { > bool ept_identity_pagetable_done; > gpa_t ept_identity_map_addr; > > +#if IS_ENABLED(CONFIG_HYPERV) > enum ept_pointers_status ept_pointers_match; > spinlock_t ept_pointer_lock; > +#endif Reviewed-by: Vitaly Kuznetsov In case ept_pointers_match/ept_pointer_lock are useless for TDX we may want to find better names for them to make it clear this is a Hyper-V thingy (e.g. something like hv_tlb_ept_match/hv_tlb_ept_lock). > }; > > bool nested_vmx_allowed(struct kvm_vcpu *vcpu); -- Vitaly
Re: [PATCH] lib/mpi: Fix unused variable warnings
Hi Herbert, Thanks for your patch, I will do a test later. By the way, did you add special compilation parameters? I compile normally without warnings in gcc 6.5 and 9.3. Best regards, Tianjia On 9/25/20 4:19 PM, Herbert Xu wrote: On Mon, Sep 21, 2020 at 12:20:55AM +0800, Tianjia Zhang wrote: Expand the mpi library based on libgcrypt, and the ECC algorithm of mpi based on libgcrypt requires these functions. Some other algorithms will be developed based on mpi ecc, such as SM2. Signed-off-by: Tianjia Zhang Tested-by: Xufeng Zhang This creates some compiler warnings. ---8<--- This patch removes a number of unused variables and marks others as unused in order to silence compiler warnings about them. Fixes: a8ea8bdd9df9 ("lib/mpi: Extend the MPI library") Signed-off-by: Herbert Xu diff --git a/lib/mpi/mpi-div.c b/lib/mpi/mpi-div.c index 21332dab97d4..45beab8b9e9e 100644 --- a/lib/mpi/mpi-div.c +++ b/lib/mpi/mpi-div.c @@ -92,7 +92,6 @@ void mpi_tdiv_qr(MPI quot, MPI rem, MPI num, MPI den) unsigned int normalization_steps; mpi_limb_t q_limb; mpi_ptr_t marker[5]; - unsigned int marker_nlimbs[5]; int markidx = 0; /* Ensure space is enough for quotient and remainder. @@ -152,7 +151,6 @@ void mpi_tdiv_qr(MPI quot, MPI rem, MPI num, MPI den) * numerator would be gradually overwritten by the quotient limbs. */ if (qp == np) { /* Copy NP object to temporary space. */ - marker_nlimbs[markidx] = nsize; np = marker[markidx++] = mpi_alloc_limb_space(nsize); MPN_COPY(np, qp, nsize); } @@ -173,7 +171,6 @@ void mpi_tdiv_qr(MPI quot, MPI rem, MPI num, MPI den) * the most significant word. Use temporary storage not to clobber * the original contents of the denominator. 
*/ - marker_nlimbs[markidx] = dsize; tp = marker[markidx++] = mpi_alloc_limb_space(dsize); mpihelp_lshift(tp, dp, dsize, normalization_steps); dp = tp; @@ -195,7 +192,6 @@ void mpi_tdiv_qr(MPI quot, MPI rem, MPI num, MPI den) if (dp == rp || (quot && (dp == qp))) { mpi_ptr_t tp; - marker_nlimbs[markidx] = dsize; tp = marker[markidx++] = mpi_alloc_limb_space(dsize); MPN_COPY(tp, dp, dsize); dp = tp; diff --git a/lib/mpi/mpi-internal.h b/lib/mpi/mpi-internal.h index d29c4537c3a3..554002182db1 100644 --- a/lib/mpi/mpi-internal.h +++ b/lib/mpi/mpi-internal.h @@ -114,7 +114,7 @@ typedef int mpi_size_t; /* (must be a signed type) */ */ #define UDIV_QRNND_PREINV(q, r, nh, nl, d, di) \ do {\ - mpi_limb_t _ql; \ + mpi_limb_t _ql __maybe_unused; \ mpi_limb_t _q, _r; \ mpi_limb_t _xh, _xl;\ umul_ppmm(_q, _ql, (nh), (di)); \ diff --git a/lib/mpi/mpi-mul.c b/lib/mpi/mpi-mul.c index 587e6335cc12..8f5fa200f297 100644 --- a/lib/mpi/mpi-mul.c +++ b/lib/mpi/mpi-mul.c @@ -21,7 +21,6 @@ void mpi_mul(MPI w, MPI u, MPI v) int usign, vsign, sign_product; int assign_wp = 0; mpi_ptr_t tmp_limb = NULL; - unsigned int tmp_limb_nlimbs = 0; if (u->nlimbs < v->nlimbs) { /* Swap U and V. */ @@ -55,7 +54,6 @@ void mpi_mul(MPI w, MPI u, MPI v) } else { /* Make U and V not overlap with W.*/ if (wp == up) { /* W and U are identical. Allocate temporary space for U. */ - tmp_limb_nlimbs = usize; up = tmp_limb = mpi_alloc_limb_space(usize); /* Is V identical too? Keep it identical with U. */ if (wp == vp) @@ -64,7 +62,6 @@ void mpi_mul(MPI w, MPI u, MPI v) MPN_COPY(up, wp, usize); } else if (wp == vp) { /* W and V are identical. Allocate temporary space for V. */ - tmp_limb_nlimbs = vsize; vp = tmp_limb = mpi_alloc_limb_space(vsize); /* Copy to the temporary space. 
*/ MPN_COPY(vp, wp, vsize); diff --git a/lib/mpi/mpih-div.c b/lib/mpi/mpih-div.c index 182a656a1ba0..be70ee2e42d3 100644 --- a/lib/mpi/mpih-div.c +++ b/lib/mpi/mpih-div.c @@ -31,7 +31,7 @@ mpihelp_mod_1(mpi_ptr_t dividend_ptr, mpi_size_t dividend_size, { mpi_size_t i; mpi_limb_t n1, n0, r; - mpi_limb_t dummy; + mpi_limb_t dummy __maybe_unused; /* Botch: Should this be han
Re: [PATCH] MAINTAINERS: Add me as a reviewer for PCI aardvark
Hello Pali, On Fri, 25 Sep 2020 11:21:15 +0200 Pali Rohár wrote: > Signed-off-by: Pali Rohár > --- > I have provided more fixes to this driver, I have needed functional > specification for this PCI controller and also hardware for testing > and developing (Espressobin V5 and Turris MOX B and G modules). > > Thomas already wrote [1] that he is less involved in this driver, so I > can help with reviewing/maintaining it. > > [1] - https://lore.kernel.org/linux-pci/20200513135643.478ff...@windsurf.home/ Acked-by: Thomas Petazzoni I would certainly be fine if you were OK to switch from a R: to a M: line. Best regards, Thomas -- Thomas Petazzoni, CTO, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
[PATCH 2/2] ASoC: cs47l35: Fix EPOUT->HPOUT1 Mono Mux routing
EPOUT is always mono so should have a permanent routing through the HPOUT1 Mono Mux. Signed-off-by: Richard Fitzgerald --- sound/soc/codecs/cs47l35.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/codecs/cs47l35.c b/sound/soc/codecs/cs47l35.c index 7f5dd01f40c9..e967609da8a3 100644 --- a/sound/soc/codecs/cs47l35.c +++ b/sound/soc/codecs/cs47l35.c @@ -1305,6 +1305,7 @@ static const struct snd_soc_dapm_route cs47l35_dapm_routes[] = { { "SPKOUTP", NULL, "OUT4L" }, { "OUT1R", NULL, "HPOUT1 Mono Mux" }, + { "HPOUT1 Mono Mux", "EPOUT", "OUT1L" }, { "HPOUTL", "HPOUT", "HPOUT1 Demux" }, { "HPOUTR", "HPOUT", "HPOUT1 Demux" }, @@ -1550,7 +1551,6 @@ static irqreturn_t cs47l35_adsp2_irq(int irq, void *data) static const struct snd_soc_dapm_route cs47l35_mono_routes[] = { { "HPOUT1 Mono Mux", "HPOUT", "OUT1L" }, - { "HPOUT1 Mono Mux", "EPOUT", "OUT1L" }, }; static int cs47l35_component_probe(struct snd_soc_component *component) -- 2.20.1
Re: [PATCH V2 1/4] cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()
On Thu, Sep 24, 2020 at 3:15 PM Viresh Kumar wrote: > > On 23-09-20, 15:48, Rafael J. Wysocki wrote: > > On Wed, Sep 16, 2020 at 8:45 AM Viresh Kumar > > wrote: > > > > > > In order to prepare for lock-less stats update, add support to defer any > > > updates to it until cpufreq_stats_record_transition() is called. > > > > This is a bit devoid of details. > > > > I guess you mean reset in particular, but that's not clear from the above. > > We used to update the stats from two places earlier, reset and > show_time_in_state, the later got removed with this patch. > > > Also, it would be useful to describe the design somewhat. > > Okay. I will work on it. > > > > Signed-off-by: Viresh Kumar > > > --- > > > drivers/cpufreq/cpufreq_stats.c | 75 - > > > 1 file changed, 56 insertions(+), 19 deletions(-) > > > > > > diff --git a/drivers/cpufreq/cpufreq_stats.c > > > b/drivers/cpufreq/cpufreq_stats.c > > > index 94d959a8e954..3e7eee29ee86 100644 > > > --- a/drivers/cpufreq/cpufreq_stats.c > > > +++ b/drivers/cpufreq/cpufreq_stats.c > > > @@ -22,17 +22,22 @@ struct cpufreq_stats { > > > spinlock_t lock; > > > unsigned int *freq_table; > > > unsigned int *trans_table; > > > + > > > + /* Deferred reset */ > > > + unsigned int reset_pending; > > > + unsigned long long reset_time; > > > }; > > > > > > -static void cpufreq_stats_update(struct cpufreq_stats *stats) > > > +static void cpufreq_stats_update(struct cpufreq_stats *stats, > > > +unsigned long long time) > > > { > > > unsigned long long cur_time = get_jiffies_64(); > > > > > > - stats->time_in_state[stats->last_index] += cur_time - > > > stats->last_time; > > > + stats->time_in_state[stats->last_index] += cur_time - time; > > > stats->last_time = cur_time; > > > } > > > > > > -static void cpufreq_stats_clear_table(struct cpufreq_stats *stats) > > > +static void cpufreq_stats_reset_table(struct cpufreq_stats *stats) > > > { > > > unsigned int count = stats->max_state; > > > > > > @@ -41,42 +46,67 @@ static void 
cpufreq_stats_clear_table(struct > > > cpufreq_stats *stats) > > > memset(stats->trans_table, 0, count * count * sizeof(int)); > > > stats->last_time = get_jiffies_64(); > > > stats->total_trans = 0; > > > + > > > + /* Adjust for the time elapsed since reset was requested */ > > > + WRITE_ONCE(stats->reset_pending, 0); > > > > What if this runs in parallel with store_reset()? > > > > The latter may update reset_pending to 1 before the below runs. > > Conversely, this may clear reset_pending right after store_reset() has > > set it to 1, but before it manages to set reset_time. Is that not a > > problem? > > What I tried to do here with reset_time is to capture the delta time-in-state > that has elapsed since the last time reset was called and reset was actually > performed, so we keep showing it in stats. > > In the first race you mentioned, whatever update we make here won't matter > much > as reset-pending will be 1 and so below stats will not be used anywhere. > > In the second race, we may get the delta time-in-state wrong, since this is > just > stats it shouldn't matter much. But still I think we need to fix the way we do > reset as you pointed out. More later. 
> > > > + cpufreq_stats_update(stats, stats->reset_time); > > > > > > spin_unlock(&stats->lock); > > > } > > > > > > static ssize_t show_total_trans(struct cpufreq_policy *policy, char *buf) > > > { > > > - return sprintf(buf, "%d\n", policy->stats->total_trans); > > > + struct cpufreq_stats *stats = policy->stats; > > > + > > > + if (READ_ONCE(stats->reset_pending)) > > > + return sprintf(buf, "%d\n", 0); > > > + else > > > + return sprintf(buf, "%d\n", stats->total_trans); > > > } > > > cpufreq_freq_attr_ro(total_trans); > > > > > > static ssize_t show_time_in_state(struct cpufreq_policy *policy, char > > > *buf) > > > { > > > struct cpufreq_stats *stats = policy->stats; > > > + bool pending = READ_ONCE(stats->reset_pending); > > > + unsigned long long time; > > > ssize_t len = 0; > > > int i; > > > > > > if (policy->fast_switch_enabled) > > > return 0; > > > > > > - spin_lock(&stats->lock); > > > - cpufreq_stats_update(stats); > > > - spin_unlock(&stats->lock); > > > - > > > for (i = 0; i < stats->state_num; i++) { > > > + if (pending) { > > > + if (i == stats->last_index) > > > + time = get_jiffies_64() - > > > stats->reset_time; > > > > What if this runs in parallel with store_reset() and reads reset_time > > before the latter manages to update it? > > We should update reset-time first I think. More later. > > > > + else > > > +
Re: [PATCH v9 07/20] gpiolib: cdev: support GPIO_V2_GET_LINE_IOCTL and GPIO_V2_LINE_GET_VALUES_IOCTL
On Thu, Sep 24, 2020 at 11:09 AM Kent Gibson wrote: > On Wed, Sep 23, 2020 at 02:11:54PM +0300, Andy Shevchenko wrote: > > On Tue, Sep 22, 2020 at 5:35 AM Kent Gibson wrote: ... > > > + assign_bit(FLAG_ACTIVE_LOW, flagsp, > > > + flags & GPIO_V2_LINE_FLAG_ACTIVE_LOW); > > > > What I meant is to attach also this to the other assign_bit():s below. > > And just in case a question: why not __assign_bit()? Do we really need > > atomicity? > > > > These are initialized as per their order in the flags so it is easier to > tell if any are missing. > > The atomicity is not required here, but it is elsewhere so you are > obliged to use it for all accesses, no? I'm not sure. I think if you are using non-atomic in one place, it means that all automatically drop the atomicity guarantee. So, it's all or none for atomicity, for non-atomicity it's rather none or at least one. That said, code should be carefully checked before doing such. > > > + if (flags & GPIO_V2_LINE_FLAG_OUTPUT) > > > + set_bit(FLAG_IS_OUT, flagsp); > > > + else if (flags & GPIO_V2_LINE_FLAG_INPUT) > > > + clear_bit(FLAG_IS_OUT, flagsp); > > > + > > > + assign_bit(FLAG_OPEN_DRAIN, flagsp, > > > + flags & GPIO_V2_LINE_FLAG_OPEN_DRAIN); > > > + assign_bit(FLAG_OPEN_SOURCE, flagsp, > > > + flags & GPIO_V2_LINE_FLAG_OPEN_SOURCE); > > > + assign_bit(FLAG_PULL_UP, flagsp, > > > + flags & GPIO_V2_LINE_FLAG_BIAS_PULL_UP); > > > + assign_bit(FLAG_PULL_DOWN, flagsp, > > > + flags & GPIO_V2_LINE_FLAG_BIAS_PULL_DOWN); > > > + assign_bit(FLAG_BIAS_DISABLE, flagsp, > > > + flags & GPIO_V2_LINE_FLAG_BIAS_DISABLED); ... > > > + /* Make sure this is terminated */ > > > + ulr.consumer[sizeof(ulr.consumer)-1] = '\0'; > > > + if (strlen(ulr.consumer)) { > > > + lr->label = kstrdup(ulr.consumer, GFP_KERNEL); > > > + if (!lr->label) { > > > + ret = -ENOMEM; > > > + goto out_free_linereq; > > > + } > > > + } > > > > Still don't get why we can't use kstrndup() here... > > > > I know ;-). 
> > Another one directly from v1, and the behaviour there is to leave > lr->label nulled if consumer is empty. > It just avoids a pointless malloc for the null terminator. Again, similar as for bitmap API usage, if it makes code cleaner and increases readability, I will go for it. Also don't forget the army of janitors that won't understand the case and simply convert everything that can be converted. -- With Best Regards, Andy Shevchenko
Re: [PATCH 00/12] soc: mediatek: pm-domains: Add new driver for SCPSYS power domains controller
On Thu, 2020-09-10 at 19:28 +0200, Enric Balletbo i Serra wrote: > Dear all, > > This is a new driver with the aim to deprecate the mtk-scpsys driver. > The problem with that driver is that, in order to support more Mediatek > SoCs you need to add some logic to handle properly the power-up > sequence of newer Mediatek SoCs, doesn't handle parent-child power > domains and need to hardcode all the clocks in the driver itself. The > result is that the driver is getting bigger and bigger every time a > new SoC needs to be supported. > Hi Enric and Matthias, First of all, thank you for the patch. But I'm worried the problem you mentioned won't be solved even if we work on this new driver in the future. My work on the MT8183 scpsys (now v17) is to implement the new hardware logic. Here, I also see related patches, which means that these new logics are necessary. Why can't we work on the original driver? Meanwhile, I thought maybe we should separate the driver into general control and platform data for each SoC, otherwise it'll keep getting bigger and bigger if it needs to support new SoCs. And considering DVFSRC (dynamic voltage and frequency scaling resource collector), should we keep the original driver name "scpsys" instead of "pm-domains", because it may provide more functions than power domains? > All this information can be obtained from a properly defined binding, so > the driver can be cleaner and smaller, hence, we implemented a new driver. For > now, only MT8173 and MT8183 are supported but it should be fairly easy to > add support for new SoCs. 
> > Best regards, > Enric > > Enric Balletbo i Serra (4): > dt-bindings: power: Add bindings for the Mediatek SCPSYS power domains > controller > soc: mediatek: Add MediaTek SCPSYS power domains > arm64: dts: mediatek: Add mt8173 power domain controller > dt-bindings: power: Add MT8183 power domains > > Matthias Brugger (8): > soc: mediatek: pm-domains: Add bus protection protocol > soc: mediatek: pm_domains: Make bus protection generic > soc: mediatek: pm-domains: Add SMI block as bus protection block > soc: mediatek: pm-domains: Add extra sram control > soc: mediatek: pm-domains: Add subsystem clocks > soc: mediatek: pm-domains: Allow bus protection to ignore clear ack > soc: mediatek: pm-domains: Add support for mt8183 > arm64: dts: mediatek: Add mt8183 power domains controller > > .../power/mediatek,power-controller.yaml | 173 > arch/arm64/boot/dts/mediatek/mt8173.dtsi | 78 +- > arch/arm64/boot/dts/mediatek/mt8183.dtsi | 160 +++ > drivers/soc/mediatek/Kconfig | 13 + > drivers/soc/mediatek/Makefile | 1 + > drivers/soc/mediatek/mtk-infracfg.c | 5 - > drivers/soc/mediatek/mtk-pm-domains.c | 952 ++ > include/dt-bindings/power/mt8183-power.h | 26 + > include/linux/soc/mediatek/infracfg.h | 39 + > 9 files changed, 1433 insertions(+), 14 deletions(-) > create mode 100644 > Documentation/devicetree/bindings/power/mediatek,power-controller.yaml > create mode 100644 drivers/soc/mediatek/mtk-pm-domains.c > create mode 100644 include/dt-bindings/power/mt8183-power.h >
Re: [PATCH 8/9] sched: Fix migrate_disable() vs set_cpus_allowed_ptr()
On 25/09/20 09:43, Peter Zijlstra wrote: > On Thu, Sep 24, 2020 at 08:59:33PM +0100, Valentin Schneider wrote: > >> > + if (task_running(rq, p) || p->state == TASK_WAKING) { >> > + >> > + task_rq_unlock(rq, p, rf); >> > + stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg); >> > + >> >> Shouldn't we check for is_migrate_disabled(p) before doing any of that? >> migration_cpu_stop() does check for it, is there something that prevents us >> from acting on it earlier than that? > > Since migrate_disable() / ->migration_disabled is only touched from the > current task, you can only reliably read it from the same CPU. > > Hence I only look at it when the stop task has pinned the task, because > at that point I know it's stable. > > Doing it earlier gives races, races give me head-aches. This is a slow > path, I don't care about performance. Makes sense, thanks.
KASAN: slab-out-of-bounds Read in f2fs_build_segment_manager
Hello, syzbot found the following issue on: HEAD commit:171d4ff7 Merge tag 'mmc-v5.9-rc4-2' of git://git.kernel.or.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=105e69ab90 kernel config: https://syzkaller.appspot.com/x/.config?x=5f4c828c9e3cef97 dashboard link: https://syzkaller.appspot.com/bug?extid=481a3ffab50fed41dcc0 compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=134bf08d90 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=102fbac590 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+481a3ffab50fed41d...@syzkaller.appspotmail.com F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): Fix alignment : done, start(4096) end(147456) block(12288) F2FS-fs (loop0): invalid crc_offset: 0 == BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792 Read of size 8 at addr 8880a1b934a8 by task syz-executor682/6878 CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 init_min_max_mtime fs/f2fs/segment.c:4710 [inline] f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792 f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x20a0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 
[inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x446ffa Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:7ffcfc6dc038 EFLAGS: 0297 ORIG_RAX: 00a5 RAX: ffda RBX: 7ffcfc6dc090 RCX: 00446ffa RDX: 2000 RSI: 2100 RDI: 7ffcfc6dc050 RBP: 7ffcfc6dc050 R08: 7ffcfc6dc090 R09: 7ffc0015 R10: R11: 0297 R12: 0008 R13: 0004 R14: 0003 R15: 0003 Allocated by task 6878: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmalloc_node include/linux/slab.h:577 [inline] kvmalloc_node+0x61/0xf0 mm/util.c:574 kvmalloc include/linux/mm.h:757 [inline] f2fs_kvmalloc fs/f2fs/f2fs.h:3030 [inline] f2fs_kvzalloc fs/f2fs/f2fs.h:3036 [inline] build_sit_info fs/f2fs/segment.c:4029 [inline] f2fs_build_segment_manager+0xb5b/0xa6d0 fs/f2fs/segment.c:4768 f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x20a0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at 8880a1b93000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1192 bytes inside of 2048-byte region [8880a1b93000, 8880a1b93800) The buggy address belongs to the page: page:8c03fe3c refcount:1 mapcount:0 mapping: index:0x0 pfn:0xa1b93 flags: 
0xfffe000200(slab) raw: 00fffe000200 ea000275f408 ea00025a7548 8880aa040800 raw: 8880a1b93000 00010001 page dumped because: kasan: bad access detected Memory state around the buggy address: 8880a1b93380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8880a1b93400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >8880a1b93480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ 8880a1b93500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 8880a1b93580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc =
Re: [PATCH 3/4] drm/etnaviv: add total hi bandwidth perfcounter
On Fr, 2020-08-14 at 11:05 +0200, Christian Gmeiner wrote: > These two perf counters represent the total read and write > GPU bandwidth in terms of 64bits. > > The used sequence was taken from Vivante kernel driver. > > Signed-off-by: Christian Gmeiner > --- > drivers/gpu/drm/etnaviv/etnaviv_perfmon.c | 35 ++- > 1 file changed, 34 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_perfmon.c > b/drivers/gpu/drm/etnaviv/etnaviv_perfmon.c > index 782732e6ce72..b37459f022d7 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_perfmon.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_perfmon.c > @@ -69,6 +69,29 @@ static u32 pipe_perf_reg_read(struct etnaviv_gpu *gpu, > return value; > } > > +static u32 pipe_reg_read(struct etnaviv_gpu *gpu, > + const struct etnaviv_pm_domain *domain, > + const struct etnaviv_pm_signal *signal) > +{ > + u32 clock = gpu_read(gpu, VIVS_HI_CLOCK_CONTROL); > + u32 value = 0; > + unsigned i; > + > + for (i = 0; i < gpu->identity.pixel_pipes; i++) { > + clock &= ~(VIVS_HI_CLOCK_CONTROL_DEBUG_PIXEL_PIPE__MASK); > + clock |= VIVS_HI_CLOCK_CONTROL_DEBUG_PIXEL_PIPE(i); > + gpu_write(gpu, VIVS_HI_CLOCK_CONTROL, clock); > + value += gpu_read(gpu, signal->data); > + } > + > + /* switch back to pixel pipe 0 to prevent GPU hang */ > + clock &= ~(VIVS_HI_CLOCK_CONTROL_DEBUG_PIXEL_PIPE__MASK); > + clock |= VIVS_HI_CLOCK_CONTROL_DEBUG_PIXEL_PIPE(0); > + gpu_write(gpu, VIVS_HI_CLOCK_CONTROL, clock); > + > + return value; > +} > + > static u32 hi_total_cycle_read(struct etnaviv_gpu *gpu, > const struct etnaviv_pm_domain *domain, > const struct etnaviv_pm_signal *signal) > @@ -102,8 +125,18 @@ static const struct etnaviv_pm_domain doms_3d[] = { > .name = "HI", > .profile_read = VIVS_MC_PROFILE_HI_READ, > .profile_config = VIVS_MC_PROFILE_CONFIG2, > - .nr_signals = 5, > + .nr_signals = 7, I've tripped across this part. It's something I don't particularly like, as this value has a risk of getting inconsistent with the actual array. 
Maybe we could split out signal array from this initialization, so we could then use the ARRAY_SIZE macro to initialize this value? But that's not really related to this patch and can be done in a follow-up cleanup. Regards, Lucas > .signal = (const struct etnaviv_pm_signal[]) { > + { > + "TOTAL_READ_BYTES8", > + VIVS_HI_PROFILE_READ_BYTES8, > + &pipe_reg_read, > + }, > + { > + "TOTAL_WRITE_BYTES8", > + VIVS_HI_PROFILE_WRITE_BYTES8, > + &pipe_reg_read, > + }, > { > "TOTAL_CYCLES", > 0,
Re: [PATCH 0/4] drm/etnaviv: add total hi bandwidth perf counters
On Fr, 2020-08-14 at 11:05 +0200, Christian Gmeiner wrote: > This little patch set adds support for the total bandwidth used by HI. The > basic hi bandwidth read-out is quite simple but I needed to add some little > clean-ups to make it nice looking. > > Christian Gmeiner (4): > drm/etnaviv: rename pipe_reg_read(..) > drm/etnaviv: call perf_reg_read(..) > drm/etnaviv: add total hi bandwidth perfcounter > drm/etnaviv: add pipe_select(..) helper > > drivers/gpu/drm/etnaviv/etnaviv_perfmon.c | 78 --- > 1 file changed, 55 insertions(+), 23 deletions(-) Thanks, I've applied the whole series to my etnaviv/next branch. regards, Lucas
Re: [PATCH 8/9] sched: Fix migrate_disable() vs set_cpus_allowed_ptr()
On 25/09/20 10:56, Peter Zijlstra wrote: > On Fri, Sep 25, 2020 at 11:05:28AM +0200, Peter Zijlstra wrote: >> On Thu, Sep 24, 2020 at 08:59:33PM +0100, Valentin Schneider wrote: >> > > @@ -2025,19 +2138,8 @@ static int __set_cpus_allowed_ptr(struct >> > > if (cpumask_test_cpu(task_cpu(p), new_mask)) >> > > goto out; >> > >> > I think this needs a cancellation of any potential pending migration >> > requests. Consider a task P0 running on CPU0: >> > >> >P0 P1 P2 >> > >> >migrate_disable(); >> > >> > set_cpus_allowed_ptr(P0, CPU1); >> > // waits for completion >> > >> > set_cpus_allowed_ptr(P0, CPU0); >> >// Already >> > good, no waiting for completion >> > >> >migrate_enable(); >> >// task_cpu(p) allowed, no move_task() >> > >> > AIUI in this scenario P1 would stay forever waiting. >> > >> The other approach is trying to handle that last condition in >> move_task(), but I'm quite sure that's going to be aweful too :/ > > Something like so perhaps? > That looks somewhat sane (not pretty, but we're past that :-)). I was trying something similar in __set_cpus_allowed_ptr() itself, but this condenses the completion / refcount logic in one place, so fewer headaches. > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2039,6 +2039,10 @@ static int move_task(struct rq *rq, stru > if (WARN_ON_ONCE(!pending)) > return -EINVAL; > > + /* Can the task run on the task's current CPU? If so, we're done */ > + if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) > + goto easy; > + > arg.done = &pending->done; > > if (flags & SCA_MIGRATE_ENABLE) { > @@ -2063,6 +2067,7 @@ static int move_task(struct rq *rq, stru > if (task_on_rq_queued(p)) > rq = move_queued_task(rq, rf, p, dest_cpu); > > +easy: > p->migration_pending = NULL; > complete = true; > } > @@ -2151,10 +2156,6 @@ static int __set_cpus_allowed_ptr(struct > p->nr_cpus_allowed != 1); > } > > - /* Can the task run on the task's current CPU? 
If so, we're done */ > - if (cpumask_test_cpu(task_cpu(p), new_mask)) > - goto out; > - > return move_task(rq, &rf, p, dest_cpu, flags); > > out:
Re: [PATCH 0/9] sched: Migrate disable support
On Fri, Sep 25, 2020 at 11:12:09AM +0200, Dietmar Eggemann wrote: > I get this when running 6 (periodic) RT50 tasks with CPU hp stress on my > 6 CPU JUNO board (!CONFIG_PREEMPT_RT). > > [ 55.490263] [ cut here ] > [ 55.505261] Modules linked in: > [ 55.508322] CPU: 3 PID: 24 Comm: migration/3 Not tainted > 5.9.0-rc1-00132-gc096e6406c50-dirty #90 > [ 55.517119] Hardware name: ARM Juno development board (r0) (DT) > [ 55.523058] Stopper: multi_cpu_stop+0x0/0x170 <- 0x0 > [ 55.528029] pstate: 2085 (nzCv daIf -PAN -UAO BTYPE=--) > [ 55.533612] pc : sched_cpu_dying+0x124/0x130 > [ 55.537887] lr : sched_cpu_dying+0xd8/0x130 > [ 55.542071] sp : 800011f0bca0 > [ 55.545385] x29: 800011f0bca0 x28: 0002 > [ 55.550703] x27: x26: 0060 > [ 55.556022] x25: x24: 0001 > [ 55.561340] x23: x22: 0003 > [ 55.566659] x21: 0080 x20: 0003 > [ 55.571977] x19: 00097ef9e1c0 x18: 0010 > [ 55.577295] x17: x16: > [ 55.582613] x15: x14: 015c > [ 55.587932] x13: x12: 06f1 > [ 55.593250] x11: 0080 x10: > [ 55.598567] x9 : 0003 x8 : 0009743f5900 > [ 55.603886] x7 : 0003 x6 : > [ 55.609204] x5 : 0001 x4 : 0002 > [ 55.614521] x3 : x2 : 0013 > [ 55.619839] x1 : 0008 x0 : 0003 > [ 55.625158] Call trace: > [ 55.627607] sched_cpu_dying+0x124/0x130 > [ 55.631535] cpuhp_invoke_callback+0x88/0x210 > [ 55.635897] take_cpu_down+0x7c/0xd8 > [ 55.639475] multi_cpu_stop+0xac/0x170 > [ 55.643227] cpu_stopper_thread+0x98/0x130 > [ 55.647327] smpboot_thread_fn+0x1c4/0x280 > [ 55.651427] kthread+0x140/0x160 > [ 55.654658] ret_from_fork+0x10/0x34 > [ 55.658239] Code: f000e1c1 913fc021 1400034a 17de (d421) > [ 55.664342] ---[ end trace c5b8988b7b701e56 ]--- > [ 55.668963] note: migration/3[24] exited with preempt_count 3 > > 7309 int sched_cpu_dying(unsigned int cpu) > ... > BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq)); > ... > > rq->nr_running is always 2 here in this cases. > > balance_hotplug_wait and sched_cpu_wait_empty run in cpuhp/X (CFS) > whereas sched_cpu_dying in migration/X ? 
takedown_cpu() has:

	kthread_park(per_cpu_ptr(&cpuhp_state, cpu)->thread);

before calling:

	err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));

So when we get to sched_cpu_dying(), the only task that _should_ still be
there is migration/X.

Do you have any idea what thread, other than migration/X, is still active
on that CPU? Per sched_cpu_wait_empty() we should've pushed out all
userspace tasks, and the cpu hotplug machinery should've put all the
per-cpu kthreads to sleep at this point.
[PATCH] x86/xen: disable Firmware First mode for correctable memory errors
When running as Xen dom0 the kernel isn't responsible for selecting the
error handling mode, this should be handled by the hypervisor. So disable
setting FF mode when running as Xen pv guest. Not doing so might result in
boot splats like:

[    7.509696] HEST: Enabling Firmware First mode for corrected errors.
[    7.510382] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 2.
[    7.510383] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 3.
[    7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 4.
[    7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 5.
[    7.510385] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 6.
[    7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 7.
[    7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8.

Reason is that the HEST ACPI table contains the real number of MCA banks,
while the hypervisor is emulating only 2 banks for guests.

Cc: sta...@vger.kernel.org
Signed-off-by: Juergen Gross
---
 arch/x86/xen/enlighten_pv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 22e741e0b10c..065c049d0f3c 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1296,6 +1296,14 @@ asmlinkage __visible void __init xen_start_kernel(void)
 
 	xen_smp_init();
 
+#ifdef CONFIG_ACPI
+	/*
+	 * Disable selecting "Firmware First mode" for correctable memory
+	 * errors, as this is the duty of the hypervisor to decide.
+	 */
+	acpi_disable_cmcff = 1;
+#endif
+
 #ifdef CONFIG_ACPI_NUMA
 	/*
 	 * The pages we from Xen are not related to machine pages, so
-- 
2.26.2
RE: [PATCH] [v2] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()
Hi Jens

Could I get your comments for this issue? Whether it is a problem? Maybe my
understanding is superficial; highly appreciated if you can comment a bit.

-----Original Message-----
From: tianxianting (RD)
Sent: Thursday, September 17, 2020 4:13 PM
To: ax...@kernel.dk
Cc: linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org; tianxianting (RD)
Subject: [PATCH] [v2] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()

We found blk_mq_alloc_rq_maps() takes more time in kernel space when testing
nvme hot-plugging. The test and analysis are as below.

Debug code,
1, blk_mq_alloc_rq_maps():
	u64 start, end;
	depth = set->queue_depth;
	start = ktime_get_ns();
	pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
	       current->pid, current->comm, current->nvcsw, current->nivcsw,
	       set->queue_depth, set->nr_hw_queues);
	do {
		err = __blk_mq_alloc_rq_maps(set);
		if (!err)
			break;
		set->queue_depth >>= 1;
		if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
			err = -ENOMEM;
			break;
		}
	} while (set->queue_depth);
	end = ktime_get_ns();
	pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n",
	       current->pid, current->comm, current->nvcsw, current->nivcsw,
	       end - start);

2, __blk_mq_alloc_rq_maps():
	u64 start, end;
	for (i = 0; i < set->nr_hw_queues; i++) {
		start = ktime_get_ns();
		if (!__blk_mq_alloc_rq_map(set, i))
			goto out_unwind;
		end = ktime_get_ns();
		pr_err("hw queue %d init cost time %lld\n", i, end - start);
	}

Test nvme hot-plugging with above debug code, we found it totally cost more
than 3ms in kernel space without being scheduled out when allocating rqs for
all 16 hw queues with depth 1024; each hw queue cost about 140-250us. The
time cost will increase as hw queue number and queue depth increase. And if
__blk_mq_alloc_rq_maps() returns -ENOMEM, it will try "queue_depth >>= 1",
and more time will be consumed.
[ 428.428771] nvme nvme0: pci function 1:01:00.0 [ 428.428798] nvme 1:01:00.0: enabling device ( -> 0002) [ 428.428806] pcieport 1:00:00.0: can't derive routing for PCI INT A [ 428.428809] nvme 1:01:00.0: PCI INT A: no GSI [ 432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, nr_hw_queues 1 [ 432.593404] hw queue 0 init cost time 22883 ns [ 432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init cost time 35960 ns [ 432.595953] nvme nvme0: 16/0/0 default/read/poll queues [ 432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, nr_hw_queues 16 [ 432.596203] hw queue 0 init cost time 242630 ns [ 432.596441] hw queue 1 init cost time 235913 ns [ 432.596659] hw queue 2 init cost time 216461 ns [ 432.596877] hw queue 3 init cost time 215851 ns [ 432.597107] hw queue 4 init cost time 228406 ns [ 432.597336] hw queue 5 init cost time 227298 ns [ 432.597564] hw queue 6 init cost time 224633 ns [ 432.597785] hw queue 7 init cost time 219954 ns [ 432.597937] hw queue 8 init cost time 150930 ns [ 432.598082] hw queue 9 init cost time 143496 ns [ 432.598231] hw queue 10 init cost time 147261 ns [ 432.598397] hw queue 11 init cost time 164522 ns [ 432.598542] hw queue 12 init cost time 143401 ns [ 432.598692] hw queue 13 init cost time 148934 ns [ 432.598841] hw queue 14 init cost time 147194 ns [ 432.598991] hw queue 15 init cost time 148942 ns [ 432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init cost time 3035099 ns [ 432.602611] nvme0n1: p1 So use this patch to trigger schedule between each hw queue init, to avoid other threads getting stuck. We call cond_resched() only when "queue depth >= 512". We are not in atomic context when executing __blk_mq_alloc_rq_maps(), so it is safe to call cond_resched(). 
Signed-off-by: Xianting Tian
---
 block/blk-mq.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b3d2785ee..5a71fe53a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3255,11 +3255,16 @@ void blk_mq_exit_queue(struct request_queue *q)
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
+	unsigned int depth = set->queue_depth;
 
-	for (i = 0; i < set->nr_hw_queues; i++)
+	for (i = 0; i < set->nr_hw_queues; i++) {
 		if (!__blk_mq_alloc_map_and_request(set, i))
 			goto out_unwind;
+		if (depth
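The quoted patch is cut off above, but the stated rule ("call cond_resched() only when queue depth >= 512") is enough to model the intended loop in userspace C. A sketch with stand-in names, where cond_resched() is replaced by a counter so the behaviour can be observed:

```c
#include <stddef.h>

/* Userspace model of the proposed change (names are stand-ins for the
 * kernel's). Rule from the patch description: give up the CPU between
 * per-queue allocations once the queue depth reaches 512. */

static int resched_calls;

static void cond_resched_stub(void)
{
	resched_calls++;
}

static int alloc_map_stub(int hw_queue)
{
	(void)hw_queue;
	return 1;			/* pretend allocation succeeded */
}

static int alloc_rq_maps_model(int nr_hw_queues, unsigned int depth)
{
	int i;

	for (i = 0; i < nr_hw_queues; i++) {
		if (!alloc_map_stub(i))
			return -1;	/* -ENOMEM path, unwinding omitted */
		/* Deep queues take 140-250us each per the measurements
		 * above, so offer a reschedule point between them. */
		if (depth >= 512)
			cond_resched_stub();
	}
	return 0;
}
```

With the numbers from the log (16 hw queues at depth 1024), this yields one voluntary reschedule point per queue, breaking up the >3ms stretch without a context switch.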
Re: [PATCH v9 08/20] gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL
On Thu, Sep 24, 2020 at 12:48 PM Kent Gibson wrote:
> On Thu, Sep 24, 2020 at 11:39:03AM +0300, Andy Shevchenko wrote:
> > On Thu, Sep 24, 2020 at 5:39 AM Kent Gibson wrote:
> > > On Wed, Sep 23, 2020 at 06:41:45PM +0300, Andy Shevchenko wrote:
> > > > On Tue, Sep 22, 2020 at 5:35 AM Kent Gibson wrote:

...

> > > > > +static int lineinfo_ensure_abi_version(struct gpio_chardev_data *cdata,
> > > > > +				       unsigned int version)
> > > > > +{
> > > > > +	int abiv = atomic_read(&cdata->watch_abi_version);
> > > > > +
> > > > > +	if (abiv == 0) {
> > > > > +		atomic_cmpxchg(&cdata->watch_abi_version, 0, version);
> > > > > +		abiv = atomic_read(&cdata->watch_abi_version);
> > > >
> > > > atomic_cmpxchg() returns a value.
> > >
> > > Yep, it returns the old value - which we don't care about - see below.
> >
> > Then what's the point to read back?..
> >
> > > > Also there are no barriers here...
> > >
> > > No barriers required - the atomic_cmpxchg() is sufficient.
> >
> > > > > +	}
> > > > > +	if (abiv != version)
> > > > > +		return -EPERM;
> > > >
> > > > I'm not sure I understand why this is atomic.
> > >
> > > The algorithm requires some level of protection and atomic is
> > > sufficient.
> >
> > > > Also this seems to be racy if cdata changed in background.
> > >
> > > Can you provide a case?
> >
> > CPU0:			CPU1:
> > xchg()		...
> > ...			xchg()
> > ...			read() -> OK
> > read() -> NOK
>
> Let's say CPU0 is setting 1 and CPU1 setting 2, and assuming the xchg()
> completes...
> Your case is not possible - CPU1 would see the value 1 set by CPU0 in the
> read() and so NOK. Its xchg() would fail as it compares against 0
> and that also sees the 1 and so fails.
>
> What am I missing?

Barriers? That's what the documentation says about xchg().
https://stackoverflow.com/q/20950603/2511795

> > > The atomic_cmpxchg() ensures cdata->watch_abi_version is only set
> > > once - first in wins. The atomic_read() is so we can check that
> > > the set version matches what the caller wants.
> > > Note that multiple callers may request the same version - and all
> > > should succeed.
> >
> > So, that's basically what you need when using the _old_ value.
> >
> > 0 means you were first, right?
> > Anything else you simply compare and bail out if it's not the same as
> > what has been asked.
>
> Could you provide a complete implementation that behaves as I expect,
> rather than snippets and verbiage?

	if (atomic_cmpxchg(&cdata..., version) == 0)
		return 0; // we were first!

	return -EPERM; // somebody has changed the version before us!

> > > > Shouldn't it rather be
> > > >
> > > > 	if (atomic_cmpxchg() == 0) {
> > > > 		if (atomic_read() != version)
> > > > 			return ...;
> > > > 	}
> > >
> > > My algorithm allows for multiple callers requesting the same version
> > > to all succeed. Yours would fail the first conditional for all but
> > > the first, and you haven't provided an else for that case...
> > >
> > > ... but it would probably look the same so the conditional is pointless,
> > > or it would reject the request - which would be wrong.
> >
> > > > But here is still the question: why do you expect the version to be
> > > > changed in the background? And what about barriers?
> > >
> > > While it is unlikely that userspace will be attempting to use both ABI
> > > versions simultaneously on the same chip request, it is a possibility and
> > > so needs to be protected against. And better to have the watch request
> > > fail than the read fail or worse - return the wrong struct version.
> > >
> > > The atomic_cmpxchg() is sufficient for this algorithm - no barriers
> > > required. It could also be written with a spinlock but I was trying to
> > > avoid locks unless they were absolutely necessary. A spinlock version
> > > may arguably be more readable, but it would certainly be more verbose,
> > > larger and slower.
> > >
> > > I'm happy to add some documentation to the function if that would help.
> >
> > Yes, I guess documentation is what is eagerly needed here.
>
> No problem.

-- 
With Best Regards,
Andy Shevchenko
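The "first request pins the ABI version" scheme being debated can be exercised outside the kernel with C11 atomics. A sketch (using `atomic_compare_exchange_strong`, not the kernel's `atomic_cmpxchg()`; names are illustrative): 0 means no version chosen yet, the first caller installs its version, and later callers succeed only when they ask for the same one — so multiple callers of the same version all succeed, matching Kent's requirement, while using the exchange's result as Andy suggests.

```c
#include <stdatomic.h>

/* Userspace sketch of the version-pinning check (C11 atomics). */
static int ensure_abi_version(atomic_uint *watch_abi_version,
			      unsigned int version)
{
	unsigned int expected = 0;

	/* First caller wins: installs 'version' if still unset. */
	if (atomic_compare_exchange_strong(watch_abi_version, &expected,
					   version))
		return 0;		/* we were first */

	/* 'expected' now holds the version installed by the winner. */
	return expected == version ? 0 : -1;	/* -EPERM in the kernel */
}
```

Note that `atomic_compare_exchange_strong` writes the observed value back into `expected` on failure, which is why no separate read-back is needed.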
Re: [PATCH 0/3] Prevent out-of-bounds access for built-in font data buffers
Hi all! On Fri, Sep 25, 2020 at 08:46:04AM +0200, Jiri Slaby wrote: > > In order to perform a reliable range check, fbcon_get_font() needs to know > > `FONTDATAMAX` for each built-in font under lib/fonts/. Unfortunately, we > > do not keep that information in our font descriptor, > > `struct console_font`: > > > > (include/uapi/linux/kd.h) > > struct console_font { > > unsigned int width, height; /* font size */ > > unsigned int charcount; > > unsigned char *data;/* font data with height fixed to 32 */ > > }; > > > > To make things worse, `struct console_font` is part of the UAPI, so we > > cannot add a new field to keep track of `FONTDATAMAX`. > > Hi, > > but you still can define struct kernel_console_font containing struct > console_font and the 4 more members you need in the kernel. See below. > > > Fortunately, the framebuffer layer itself gives us a hint of how to > > resolve this issue without changing UAPI. When allocating a buffer for a > > user-provided font, fbcon_set_font() reserves four "extra words" at the > > beginning of the buffer: > > > > (drivers/video/fbdev/core/fbcon.c) > > new_data = kmalloc(FONT_EXTRA_WORDS * sizeof(int) + size, GFP_USER); > > I might be missing something (like coffee in the morning), but why don't > you just: > 1) declare struct font_data as > { > unsigned sum, char_count, size, refcnt; > const unsigned char data[]; > } > > Or maybe "struct console_font font" instead of "const unsigned char > data[]", if need be. > > 2) allocate by: > kmalloc(struct_size(struct font_data, data, size)); > > 3) use container_of wherever needed > > That is you name the data on negative indexes using struct as you > already have to define one. > > Then you don't need the ugly macros with negative indexes. And you can > pass this structure down e.g. to fbcon_do_set_font, avoiding potential > mistakes in accessing data[-1] and similar. 
Sorry that I didn't mention it in the cover letter, but yes, I've tried
this - a new `kernel_console_font` would be much cleaner than negative
array indexing. The reason I ended up giving up on it was, frankly
speaking, that these macros are being used in about 30 places, and I am
not familiar enough with the framebuffer and newport_con code, so I wasn't
confident how to clean them up and plug in `kernel_console_font`
properly...

Another reason was that functions like fbcon_get_font() handle both user
fonts and built-in fonts, so I wanted a single solution for both of them.
I think we can't really introduce `kernel_console_font` while keeping
these macros; that would make the error handling logic etc. very messy.

I'm not very sure what to do now. Should I give it another try cleaning up
all the macros?

And thank you for reviewing this!
Peilin Ye
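Jiri's suggestion — bookkeeping in front of the font bytes via a flexible array member, recovered with container_of-style arithmetic — can be sketched in plain userspace C (hypothetical names; the real `struct font_data` would live in the kernel and carry the four extra words):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative model of the proposed wrapper: metadata lives just
 * before the font bytes, so no UAPI change and no negative indexing
 * through bare pointers. */
struct font_data {
	unsigned int refcount;
	unsigned int size;		/* stands in for FONTDATAMAX */
	unsigned char data[];		/* the actual font bitmap */
};

/* container_of in miniature: recover the wrapper from the data ptr. */
#define font_data_of(p) \
	((struct font_data *)((char *)(p) - offsetof(struct font_data, data)))

static struct font_data *font_data_alloc(unsigned int size)
{
	/* Mirrors kmalloc(struct_size(fd, data, size), GFP_...) */
	struct font_data *fd = malloc(sizeof(*fd) + size);

	if (!fd)
		return NULL;
	fd->refcount = 1;
	fd->size = size;
	memset(fd->data, 0, size);
	return fd;
}
```

The point of the layout is that any code holding only `fd->data` (as the UAPI `struct console_font` does) can still reach the metadata without the `data[-1]`-style macros.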
Re: [PATCH v3 24/39] arm64: mte: Add in-kernel MTE helpers
On Fri, Sep 25, 2020 at 12:50:31AM +0200, Andrey Konovalov wrote: > diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h > index 035003acfa87..bc0dc66a6a27 100644 > --- a/arch/arm64/include/asm/esr.h > +++ b/arch/arm64/include/asm/esr.h > @@ -103,6 +103,7 @@ > #define ESR_ELx_FSC (0x3F) > #define ESR_ELx_FSC_TYPE (0x3C) > #define ESR_ELx_FSC_EXTABT (0x10) > +#define ESR_ELx_FSC_MTE (0x11) > #define ESR_ELx_FSC_SERROR (0x11) > #define ESR_ELx_FSC_ACCESS (0x08) > #define ESR_ELx_FSC_FAULT(0x04) > diff --git a/arch/arm64/include/asm/mte-kasan.h > b/arch/arm64/include/asm/mte-kasan.h > new file mode 100644 > index ..b0f27de8de33 > --- /dev/null > +++ b/arch/arm64/include/asm/mte-kasan.h > @@ -0,0 +1,60 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (C) 2020 ARM Ltd. > + */ > +#ifndef __ASM_MTE_ASM_H > +#define __ASM_MTE_ASM_H > + > +#include > + > +#define __MTE_PREAMBLE ARM64_ASM_PREAMBLE ".arch_extension > memtag\n" Can this not live in mte.h? > +#define MTE_GRANULE_SIZE UL(16) > +#define MTE_GRANULE_MASK (~(MTE_GRANULE_SIZE - 1)) > +#define MTE_TAG_SHIFT56 > +#define MTE_TAG_SIZE 4 > +#define MTE_TAG_MASK GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE - 1)), > MTE_TAG_SHIFT) > +#define MTE_TAG_MAX (MTE_TAG_MASK >> MTE_TAG_SHIFT) I'd still like these MTE_* macros in a separate mte-hwdef.h file. The only reason I see they were not in mte.h is because they need to be included in asm/cache.h. They are not KASAN specific. > + > +#ifndef __ASSEMBLY__ > + > +#include > + > +#ifdef CONFIG_ARM64_MTE > + > +static inline u8 mte_get_ptr_tag(void *ptr) > +{ > + u8 tag = (u8)(((u64)(ptr)) >> MTE_TAG_SHIFT); > + > + return tag; > +} So this returns the top 8 bits of the address (i.e. no masking with MTE_TAG_MASK). Fine by me. 
> + > +u8 mte_get_mem_tag(void *addr); > +u8 mte_get_random_tag(void); > +void *mte_set_mem_tag_range(void *addr, size_t size, u8 tag); > + > +#else /* CONFIG_ARM64_MTE */ > + > +static inline u8 mte_get_ptr_tag(void *ptr) > +{ > + return 0xFF; > +} > + > +static inline u8 mte_get_mem_tag(void *addr) > +{ > + return 0xFF; > +} > +static inline u8 mte_get_random_tag(void) > +{ > + return 0xFF; > +} > +static inline void *mte_set_mem_tag_range(void *addr, size_t size, u8 tag) > +{ > + return addr; > +} Maybe these can stay in mte-kasan.h, although they are not a direct interface for KASAN AFAICT (the arch_* equivalent are defined in asm/memory.h. If there's no good reason, we could move them to mte.h. > diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h > index 1c99fcadb58c..3a2bf3ccb26c 100644 > --- a/arch/arm64/include/asm/mte.h > +++ b/arch/arm64/include/asm/mte.h > @@ -5,14 +5,13 @@ > #ifndef __ASM_MTE_H > #define __ASM_MTE_H > > -#define MTE_GRANULE_SIZE UL(16) > -#define MTE_GRANULE_MASK (~(MTE_GRANULE_SIZE - 1)) > -#define MTE_TAG_SHIFT56 > -#define MTE_TAG_SIZE 4 > +#include > > #ifndef __ASSEMBLY__ > > +#include > #include > +#include > > #include > > @@ -45,7 +44,9 @@ long get_mte_ctrl(struct task_struct *task); > int mte_ptrace_copy_tags(struct task_struct *child, long request, >unsigned long addr, unsigned long data); > > -#else > +void mte_assign_mem_tag_range(void *addr, size_t size); So mte_set_mem_tag_range() is KASAN specific but mte_assign_mem_tag_range() is not. Slightly confusing. 
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c > index 52a0638ed967..833b63fdd5e2 100644 > --- a/arch/arm64/kernel/mte.c > +++ b/arch/arm64/kernel/mte.c > @@ -13,8 +13,10 @@ > #include > #include > #include > +#include > #include > > +#include > #include > #include > #include > @@ -72,6 +74,48 @@ int memcmp_pages(struct page *page1, struct page *page2) > return ret; > } > > +u8 mte_get_mem_tag(void *addr) > +{ > + if (!system_supports_mte()) > + return 0xFF; > + > + asm volatile(__MTE_PREAMBLE "ldg %0, [%0]" > + : "+r" (addr)); Nitpick: do we need volatile or plain asm would do? I wonder whether we'd need the "memory" clobber. I don't see how this would fail though, maybe later on with stack tagging if the compiler writes tags behind our back. > + > + return 0xF0 | mte_get_ptr_tag(addr); Since mte_get_ptr_tag() returns the top byte of the address, we don't need the additional 0xF0 or'ing. LDG only sets bits 59:56. > +} > + > +u8 mte_get_random_tag(void) > +{ > + void *addr; > + > + if (!system_supports_mte()) > + return 0xFF; > + > + asm volatile(__MTE_PREAMBLE "irg %0, %0" > + : "+r" (addr)); > + > + return 0xF0 | mte_get_ptr_tag(addr); Same here. > +} > + > +void *mte_set_mem_tag_range(void *addr, size_t size, u8 tag) > +{ > + void *ptr = addr; > + > + if ((!system_supports_mte(
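The tag layout under review — a 4-bit MTE tag in bits 59:56, with `mte_get_ptr_tag()` returning the whole top byte — is easy to check with plain integer arithmetic. A userspace sketch (only the bit manipulation; the real helpers use the LDG/IRG instructions):

```c
#include <stdint.h>

/* Constants from the patch; only the arithmetic is modeled here. */
#define MTE_TAG_SHIFT	56
#define MTE_TAG_SIZE	4
#define MTE_TAG_MASK	(((UINT64_C(1) << MTE_TAG_SIZE) - 1) << MTE_TAG_SHIFT)

static uint8_t tag_of(uint64_t ptr)
{
	/* Top byte of the address, unmasked, as mte_get_ptr_tag() does. */
	return (uint8_t)(ptr >> MTE_TAG_SHIFT);
}

static uint64_t set_tag(uint64_t ptr, uint8_t tag)
{
	/* Replace bits 59:56 only, leaving bits 63:60 untouched,
	 * which is all LDG/IRG modify as well. */
	return (ptr & ~MTE_TAG_MASK) |
	       (((uint64_t)tag << MTE_TAG_SHIFT) & MTE_TAG_MASK);
}
```

This also illustrates Catalin's point: once the top nibble of the address is already 0xF, extracting the top byte after setting a tag yields `0xF0 | tag` by itself, so the explicit or'ing is redundant.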
Re: [PATCH v3 23/39] arm64: Enable armv8.5-a asm-arch option
On Fri, Sep 25, 2020 at 12:50:30AM +0200, Andrey Konovalov wrote: > From: Vincenzo Frascino > > Hardware tag-based KASAN relies on Memory Tagging Extension (MTE) which > is an armv8.5-a architecture extension. > > Enable the correct asm option when the compiler supports it in order to > allow the usage of ALTERNATIVE()s with MTE instructions. > > Signed-off-by: Vincenzo Frascino > Signed-off-by: Andrey Konovalov Reviewed-by: Catalin Marinas
Re: [PATCH] fs/affs: Fix basic permission bits to actually work
Hi Max! On 8/27/20 5:49 PM, Max Staudt wrote: > The basic permission bits (protection bits in AmigaOS) have been broken > in Linux' affs - it would only set bits, but never delete them. > Also, contrary to the documentation, the Archived bit was not handled. > > Let's fix this for good, and set the bits such that Linux and classic > AmigaOS can coexist in the most peaceful manner. > > Also, update the documentation to represent the current state of things. Has there already been any progress on reviewing this? Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaub...@debian.org `. `' Freie Universitaet Berlin - glaub...@physik.fu-berlin.de `-GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: [PATCH 1/9] mm, page_alloc: clean up pageset high and batch update
On 22.09.20 16:37, Vlastimil Babka wrote: > The updates to pcplists' high and batch valued are handled by multiple > functions that make the calculations hard to follow. Consolidate everything > to pageset_set_high_and_batch() and remove pageset_set_batch() and > pageset_set_high() wrappers. > > The only special case using one of the removed wrappers was: > build_all_zonelists_init() > setup_pageset() > pageset_set_batch() > which was hardcoding batch as 0, so we can just open-code a call to > pageset_update() with constant parameters instead. > > No functional change. > > Signed-off-by: Vlastimil Babka > Reviewed-by: Oscar Salvador > --- > mm/page_alloc.c | 49 - > 1 file changed, 20 insertions(+), 29 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 60a0e94645a6..a163c5e561f2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5823,7 +5823,7 @@ static void build_zonelists(pg_data_t *pgdat) > * not check if the processor is online before following the pageset pointer. > * Other parts of the kernel may not check if the zone is available. > */ > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch); > +static void setup_pageset(struct per_cpu_pageset *p); > static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset); > static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); > > @@ -5891,7 +5891,7 @@ build_all_zonelists_init(void) >* (a chicken-egg dilemma). 
>*/ > for_each_possible_cpu(cpu) > - setup_pageset(&per_cpu(boot_pageset, cpu), 0); > + setup_pageset(&per_cpu(boot_pageset, cpu)); > > mminit_verify_zonelist(); > cpuset_init_current_mems_allowed(); > @@ -6200,12 +6200,6 @@ static void pageset_update(struct per_cpu_pages *pcp, > unsigned long high, > pcp->batch = batch; > } > > -/* a companion to pageset_set_high() */ > -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch) > -{ > - pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch)); > -} > - > static void pageset_init(struct per_cpu_pageset *p) > { > struct per_cpu_pages *pcp; > @@ -6218,35 +6212,32 @@ static void pageset_init(struct per_cpu_pageset *p) > INIT_LIST_HEAD(&pcp->lists[migratetype]); > } > > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch) > +static void setup_pageset(struct per_cpu_pageset *p) > { > pageset_init(p); > - pageset_set_batch(p, batch); > + pageset_update(&p->pcp, 0, 1); > } > > /* > - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist > - * to the value high for the pageset p. > + * Calculate and set new high and batch values for given per-cpu pageset of a > + * zone, based on the zone's size and the percpu_pagelist_fraction sysctl. 
> */ > -static void pageset_set_high(struct per_cpu_pageset *p, > - unsigned long high) > -{ > - unsigned long batch = max(1UL, high / 4); > - if ((high / 4) > (PAGE_SHIFT * 8)) > - batch = PAGE_SHIFT * 8; > - > - pageset_update(&p->pcp, high, batch); > -} > - > static void pageset_set_high_and_batch(struct zone *zone, > -struct per_cpu_pageset *pcp) > +struct per_cpu_pageset *p) > { > - if (percpu_pagelist_fraction) > - pageset_set_high(pcp, > - (zone_managed_pages(zone) / > - percpu_pagelist_fraction)); > - else > - pageset_set_batch(pcp, zone_batchsize(zone)); > + unsigned long new_high, new_batch; > + > + if (percpu_pagelist_fraction) { > + new_high = zone_managed_pages(zone) / percpu_pagelist_fraction; > + new_batch = max(1UL, new_high / 4); > + if ((new_high / 4) > (PAGE_SHIFT * 8)) > + new_batch = PAGE_SHIFT * 8; > + } else { > + new_batch = zone_batchsize(zone); > + new_high = 6 * new_batch; > + new_batch = max(1UL, 1 * new_batch); > + } > + pageset_update(&p->pcp, new_high, new_batch); > } > > static void __meminit zone_pageset_init(struct zone *zone, int cpu) > Reviewed-by: David Hildenbrand -- Thanks, David / dhildenb
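The consolidated arithmetic in the patch's `pageset_set_high_and_batch()` can be restated as a pure function for sanity-checking (a userspace model: PAGE_SHIFT is assumed to be 12 here, and the zone inputs are plain parameters rather than a `struct zone`):

```c
/* Userspace restatement of the calculation from the patch above. */
#define MODEL_PAGE_SHIFT	12UL

static void high_and_batch(unsigned long managed_pages,
			   unsigned long fraction, unsigned long zone_batch,
			   unsigned long *high, unsigned long *batch)
{
	if (fraction) {
		/* sysctl path: high from zone size, batch clamped */
		*high = managed_pages / fraction;
		*batch = *high / 4 ? *high / 4 : 1;	/* max(1UL, high / 4) */
		if (*high / 4 > MODEL_PAGE_SHIFT * 8)
			*batch = MODEL_PAGE_SHIFT * 8;
	} else {
		/* default path: batch from zone_batchsize() */
		*batch = zone_batch;
		*high = 6 * *batch;
		if (!*batch)
			*batch = 1;	/* max(1UL, 1 * batch) */
	}
}
```

Term for term this matches the two removed wrappers: the `fraction` branch is the old pageset_set_high(), and the other branch is the old pageset_set_batch() with its `6 * batch` high watermark.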
Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
On 25/09/2020 00:05, Pavel Machek wrote:
> Hi!
>
> I believe you should simply delete confusing "introduction" and
> provide details of super-secure system where your patches would be
> useful, instead.

This RFC talks about converting dynamic code (which cannot be
authenticated) to static code that can be authenticated using signature
verification. That is the scope of this RFC. If I have not been clear
before, by dynamic code, I mean machine code that is dynamic in nature.
Scripts are beyond the scope of this RFC. Also, malware compiled from
sources is not dynamic code. That is orthogonal to this RFC. If such
malware has a valid signature such that the kernel permits its execution,
we have a systemic problem. I am not saying that script authentication or
compiled malware are not problems. I am just saying that this RFC is not
trying to solve all of the security problems. It is trying to define one
way to convert dynamic code to static code to address one class of
problems.

>>>
>>> Well, you don't have to solve all problems at once.
>>>
>>> But solutions have to exist, and AFAIK in this case they don't. You
>>> are armoring doors, but ignoring open windows.
>>
>> FYI, script execution is being addressed (for the kernel part) by this
>> patch series:
>> https://lore.kernel.org/lkml/20200924153228.387737-1-...@digikod.net/
>
> Ok.
>
>>> Or very probably you are thinking about something different than
>>> normal desktop distros (Debian 10). Because on my systems, I have
>>> python, gdb and gcc...
>>
>> It doesn't make sense for a tailored security system to leave all these
>> tools available to an attacker.
>
> And it also does not make sense to use "trampoline file descriptor" on
> generic system... while W^X should make sense there.

Well, as said before, (full/original/system-wide) W^X may require trampfd
(as well as other building-blocks).

I guess most Linux deployments are not on "generic systems" anyway (even
if they may be based on generic distros), and W^X contradicts the fact
that users/attackers can do whatever they want on the system.

>
>>> It would be nice to specify what other pieces need to be present for
>>> this to make sense -- because it makes no sense on Debian 10.
>>
>> Not all kernel features make sense for a generic/undefined usage,
>> especially specific security mechanisms (e.g. SELinux, Smack, Tomoyo,
>> SafeSetID, LoadPin, IMA, IPE, secure/trusted boot, lockdown, etc.), but
>> they can still be definitely useful.
>
> Yep... so... I'd expect something like... "so you have single-purpose
> system

No one talked about a single-purpose system.

> with all script interpreters removed,

Not necessarily with the patch series I pointed out just before.

> IMA hashing all the files
> to make sure they are not modified, and W^X enabled.

System-wide W^X is not only for memory, and as Madhavan said: "this RFC
pertains to converting dynamic [writable] machine code to static
[non-writable] code".

> Attacker can
> still execute code after buffer overflow by trampoline file
> descriptor addresses that"... so that people running generic systems
> can stop reading after first sentence.

Are you proposing to add a
"[feature-not-useful-without-a-proper-system-configuration]" tag in
subjects? :)
Re: [PATCH v3 -next] vdpa: mlx5: change Kconfig depends to fix build errors
On Fri, Sep 25, 2020 at 10:20:05AM +0300, Leon Romanovsky wrote: > On Thu, Sep 24, 2020 at 12:02:43PM -0400, Michael S. Tsirkin wrote: > > On Thu, Sep 24, 2020 at 08:47:05AM -0700, Randy Dunlap wrote: > > > On 9/24/20 3:24 AM, Eli Cohen wrote: > > > > On Thu, Sep 24, 2020 at 05:30:55AM -0400, Michael S. Tsirkin wrote: > > > --- linux-next-20200917.orig/drivers/vdpa/Kconfig > > > +++ linux-next-20200917/drivers/vdpa/Kconfig > > > @@ -31,7 +31,7 @@ config IFCVF > > > > > > config MLX5_VDPA > > > bool "MLX5 VDPA support library for ConnectX devices" > > > -depends on MLX5_CORE > > > +depends on VHOST_IOTLB && MLX5_CORE > > > default n > > > >>> > > > >>> While we are here, can anyone who applies this patch delete the > > > >>> "default n" line? > > > >>> It is by default "n". > > > > > > > > I can do that > > > > > > > >>> > > > >>> Thanks > > > >> > > > >> Hmm other drivers select VHOST_IOTLB, why not do the same? > > > > > > v1 used select, but Saeed requested use of depends instead because > > > select can cause problems. > > > > > > > I can't see another driver doing that. Perhaps I can set dependency on > > > > VHOST which by itself depends on VHOST_IOTLB? > > > >> > > > >> > > > help > > > Support library for Mellanox VDPA drivers. Provides code that > > > is > > > > > > >> > > > > > > > Saeed what kind of problems? It's used with select in other places, > > isn't it? > > IMHO, "depends" is much more explicit than "select". > > Thanks This is not how VHOST_IOTLB has been designed though. If you want to change VHOST_IOTLB to depends I think we should do it consistently all over. config VHOST_IOTLB tristate help Generic IOTLB implementation for vhost and vringh. This option is selected by any driver which needs to support an IOMMU in software. > > > > > -- > > > ~Randy > >
Re: [PATCH 3/9] mm, page_alloc: remove setup_pageset()
On 22.09.20 16:37, Vlastimil Babka wrote: > We initialize boot-time pagesets with setup_pageset(), which sets high and > batch values that effectively disable pcplists. > > We can remove this wrapper if we just set these values for all pagesets in > pageset_init(). Non-boot pagesets then subsequently update them to the proper > values. > > No functional change. > > Signed-off-by: Vlastimil Babka > --- > mm/page_alloc.c | 17 ++--- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 26069c8d1b19..76c2b4578723 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5823,7 +5823,7 @@ static void build_zonelists(pg_data_t *pgdat) > * not check if the processor is online before following the pageset pointer. > * Other parts of the kernel may not check if the zone is available. > */ > -static void setup_pageset(struct per_cpu_pageset *p); > +static void pageset_init(struct per_cpu_pageset *p); > static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset); > static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); > > @@ -5891,7 +5891,7 @@ build_all_zonelists_init(void) >* (a chicken-egg dilemma). >*/ > for_each_possible_cpu(cpu) > - setup_pageset(&per_cpu(boot_pageset, cpu)); > + pageset_init(&per_cpu(boot_pageset, cpu)); > > mminit_verify_zonelist(); > cpuset_init_current_mems_allowed(); > @@ -6210,12 +6210,15 @@ static void pageset_init(struct per_cpu_pageset *p) > pcp = &p->pcp; > for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++) > INIT_LIST_HEAD(&pcp->lists[migratetype]); > -} > > -static void setup_pageset(struct per_cpu_pageset *p) > -{ > - pageset_init(p); > - pageset_update(&p->pcp, 0, 1); > + /* > + * Set batch and high values safe for a boot pageset. A true percpu > + * pageset's initialization will update them subsequently. Here we don't > + * need to be as careful as pageset_update() as nobody can access the > + * pageset yet. 
> + */ > + pcp->high = 0; > + pcp->batch = 1; > } > > /* > Reviewed-by: David Hildenbrand -- Thanks, David / dhildenb
Re: [PATCH v3 -next] vdpa: mlx5: change Kconfig depends to fix build errors
On Thu, Sep 24, 2020 at 01:24:13PM +0300, Eli Cohen wrote: > On Thu, Sep 24, 2020 at 05:30:55AM -0400, Michael S. Tsirkin wrote: > > > > --- linux-next-20200917.orig/drivers/vdpa/Kconfig > > > > +++ linux-next-20200917/drivers/vdpa/Kconfig > > > > @@ -31,7 +31,7 @@ config IFCVF > > > > > > > > config MLX5_VDPA > > > > bool "MLX5 VDPA support library for ConnectX devices" > > > > - depends on MLX5_CORE > > > > + depends on VHOST_IOTLB && MLX5_CORE > > > > default n > > > > > > While we are here, can anyone who apply this patch delete the "default n" > > > line? > > > It is by default "n". > > I can do that > > > > > > > Thanks > > > > Hmm other drivers select VHOST_IOTLB, why not do the same? > > I can't see another driver doing that. Well grep VHOST_IOTLB and you will see some examples. > Perhaps I can set dependency on > VHOST which by itself depends on VHOST_IOTLB? VHOST is processing virtio in the kernel. You don't really need that for mlx, do you? > > > > > > > > help > > > > Support library for Mellanox VDPA drivers. Provides code that > > > > is > > > > > >
Re: [RFC 2/2] printk: Add more information about the printk caller
On Fri 2020-09-25 09:54:00, Sergey Senozhatsky wrote: > On (20/09/24 15:38), Petr Mladek wrote: > [..] > > > > G, I wonder why I thought that in_irq() covered also the situation > > when IRQ was disabled. It was likely my wish because disabled > > interrupts are problem for printk() because the console might > > cause a softlockup. > > preempt_disable() can also trigger softlockup. > > > in_irq() actually behaves like in_serving_softirq(). > > > > I am confused and puzzled now. I wonder what contexts are actually > > interesting for developers. It goes back to the ideas from Sergey > > about preemption disabled, ... > > Are we talking about context tracking for LOG_CONT or context on > the serial console and /dev/kmsg? OK, it is clear that LOG_CONT need to know when a different code is called suddenly. I mean task code vs. an interrupt handler. But it was actually also the original purpose of the caller_id. AFAIK, people wanted to sort related messages when they were mixed with ones from other CPUs. > If the latter, then my 5 cents, is that something like preemptible(), > which checks > > (preempt_count() == 0 && !irqs_disabled()) > > does not look completely unreasonable. > > We had a rather OK context tracking in printk() before, but for a > completely different purpose: > >console_may_schedule = !oops_in_progress && >preemptible() && >!rcu_preempt_depth(); > > We know that printk() can cause RCU stalls [0]. Tracking this part > of the context state is sort of meaningful. > > Let's look at this from this POV - why do we add in_irq()/etc tracking > info? Perhaps because we want to connect the dots between printk() caller > state and watchdog reports. Do we cover all watchdogs? No, I don't think > so. RCU stalls, local_irq_disable(), preempt_disable() are not covered. I agree that it would be handy to see this context as well. It might make it easier when hunting down various lockups and stall. But I have some concerns. 
First, the information is not always reliable (PREEMPT_NONE). I wonder if it might cause more harm than good. People might get confused, or they might want to fix it with some crazy printk code. Second, the information might not be detailed enough. Many lockups depend on the fact that a particular lock is held. Backtraces are likely more important. Or people would need to distinguish many contexts, which would require more complex code. I am not sure that this is worth it. After all, it might be enough to distinguish the 4 basic contexts just to allow sorting mixed messages. Best Regards, Petr
[PATCH] drm/stm: dsi: Use dev_ based logging
Standardize on the dev_ based logging and drop the include of drm_print.h. Remove useless dsi_color_from_mipi function. Signed-off-by: Yannick Fertre --- drivers/gpu/drm/stm/dw_mipi_dsi-stm.c | 87 ++- 1 file changed, 45 insertions(+), 42 deletions(-) diff --git a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c index 164f79ef6269..93fa8bfd3127 100644 --- a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c +++ b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c @@ -76,6 +76,7 @@ enum dsi_color { struct dw_mipi_dsi_stm { void __iomem *base; + struct device *dev; struct clk *pllref_clk; struct dw_mipi_dsi *dsi; u32 hw_version; @@ -110,23 +111,6 @@ static inline void dsi_update_bits(struct dw_mipi_dsi_stm *dsi, u32 reg, dsi_write(dsi, reg, (dsi_read(dsi, reg) & ~mask) | val); } -static enum dsi_color dsi_color_from_mipi(enum mipi_dsi_pixel_format fmt) -{ - switch (fmt) { - case MIPI_DSI_FMT_RGB888: - return DSI_RGB888; - case MIPI_DSI_FMT_RGB666: - return DSI_RGB666_CONF2; - case MIPI_DSI_FMT_RGB666_PACKED: - return DSI_RGB666_CONF1; - case MIPI_DSI_FMT_RGB565: - return DSI_RGB565_CONF1; - default: - DRM_DEBUG_DRIVER("MIPI color invalid, so we use rgb888\n"); - } - return DSI_RGB888; -} - static int dsi_pll_get_clkout_khz(int clkin_khz, int idf, int ndiv, int odf) { int divisor = idf * odf; @@ -205,14 +189,14 @@ static int dw_mipi_dsi_phy_init(void *priv_data) ret = readl_poll_timeout(dsi->base + DSI_WISR, val, val & WISR_RRS, SLEEP_US, TIMEOUT_US); if (ret) - DRM_DEBUG_DRIVER("!TIMEOUT! waiting REGU, let's continue\n"); + dev_dbg(dsi->dev, "!TIMEOUT! waiting REGU, let's continue\n"); /* Enable the DSI PLL & wait for its lock */ dsi_set(dsi, DSI_WRPCR, WRPCR_PLLEN); ret = readl_poll_timeout(dsi->base + DSI_WISR, val, val & WISR_PLLLS, SLEEP_US, TIMEOUT_US); if (ret) - DRM_DEBUG_DRIVER("!TIMEOUT! waiting PLL, let's continue\n"); + dev_dbg(dsi->dev, "!TIMEOUT! 
waiting PLL, let's continue\n"); return 0; } @@ -221,7 +205,7 @@ static void dw_mipi_dsi_phy_power_on(void *priv_data) { struct dw_mipi_dsi_stm *dsi = priv_data; - DRM_DEBUG_DRIVER("\n"); + dev_dbg(dsi->dev, "\n"); /* Enable the DSI wrapper */ dsi_set(dsi, DSI_WCR, WCR_DSIEN); @@ -231,7 +215,7 @@ static void dw_mipi_dsi_phy_power_off(void *priv_data) { struct dw_mipi_dsi_stm *dsi = priv_data; - DRM_DEBUG_DRIVER("\n"); + dev_dbg(dsi->dev, "\n"); /* Disable the DSI wrapper */ dsi_clear(dsi, DSI_WCR, WCR_DSIEN); @@ -244,6 +228,7 @@ dw_mipi_dsi_get_lane_mbps(void *priv_data, const struct drm_display_mode *mode, { struct dw_mipi_dsi_stm *dsi = priv_data; unsigned int idf, ndiv, odf, pll_in_khz, pll_out_khz; + enum mipi_dsi_pixel_format fmt; int ret, bpp; u32 val; @@ -267,11 +252,11 @@ dw_mipi_dsi_get_lane_mbps(void *priv_data, const struct drm_display_mode *mode, if (pll_out_khz > dsi->lane_max_kbps) { pll_out_khz = dsi->lane_max_kbps; - DRM_WARN("Warning max phy mbps is used\n"); + dev_warn(dsi->dev, "Warning max phy mbps is used\n"); } if (pll_out_khz < dsi->lane_min_kbps) { pll_out_khz = dsi->lane_min_kbps; - DRM_WARN("Warning min phy mbps is used\n"); + dev_warn(dsi->dev, "Warning min phy mbps is used\n"); } /* Compute best pll parameters */ @@ -281,7 +266,7 @@ dw_mipi_dsi_get_lane_mbps(void *priv_data, const struct drm_display_mode *mode, ret = dsi_pll_get_params(dsi, pll_in_khz, pll_out_khz, &idf, &ndiv, &odf); if (ret) - DRM_WARN("Warning dsi_pll_get_params(): bad params\n"); + dev_warn(dsi->dev, "Warning dsi_pll_get_params(): bad params\n"); /* Get the adjusted pll out value */ pll_out_khz = dsi_pll_get_clkout_khz(pll_in_khz, idf, ndiv, odf); @@ -297,14 +282,31 @@ dw_mipi_dsi_get_lane_mbps(void *priv_data, const struct drm_display_mode *mode, /* Select video mode by resetting DSIM bit */ dsi_clear(dsi, DSI_WCFGR, WCFGR_DSIM); + switch (format) { + case MIPI_DSI_FMT_RGB888: + fmt = DSI_RGB888; + break; + case MIPI_DSI_FMT_RGB666: + fmt = DSI_RGB666_CONF2; + 
break; + case MIPI_DSI_FMT_RGB666_PACKED: + fmt = DSI_RGB666_CONF1; + break; + case MIPI_DSI_FMT_RGB565: + fmt = DSI_RGB565_CONF1; + break; + default: + fmt = DSI_RGB888; +
Re: [PATCH net 1/1] net: stmmac: Fix clock handling on remove path
On Fri, Sep 25, 2020 at 12:54 PM Wong Vee Khee wrote: > > While unloading the dwmac-intel driver, clk_disable_unprepare() is > being called twice in stmmac_dvr_remove() and > intel_eth_pci_remove(). This causes kernel panic on the second call. > > Removing the second call of clk_disable_unprepare() in > intel_eth_pci_remove(). Thanks! I'm not sure how I missed this... Reviewed-by: Andy Shevchenko > Fixes: 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove > paths") > Cc: Andy Shevchenko > Reviewed-by: Voon Weifeng > Signed-off-by: Wong Vee Khee > --- > drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c > b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c > index 2ac9dfb3462c..9e6d60e75f85 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c > @@ -653,7 +653,6 @@ static void intel_eth_pci_remove(struct pci_dev *pdev) > > pci_free_irq_vectors(pdev); > > - clk_disable_unprepare(priv->plat->stmmac_clk); > clk_unregister_fixed_rate(priv->plat->stmmac_clk); > > pcim_iounmap_regions(pdev, BIT(0)); > -- > 2.17.0 > -- With Best Regards, Andy Shevchenko
Re: [PATCH 4/9] mm, page_alloc: simplify pageset_update()
On 22.09.20 16:37, Vlastimil Babka wrote: > pageset_update() attempts to update pcplist's high and batch values in a way > that readers don't observe batch > high. It uses smp_wmb() to order the > updates > in a way to achieve this. However, without proper pairing read barriers in > readers this guarantee doesn't hold, and there are no such barriers in > e.g. free_unref_page_commit(). > > Commit 88e8ac11d2ea ("mm, page_alloc: fix core hung in free_pcppages_bulk()") > already showed this is problematic, and solved this by ultimately only trusing > pcp->count of the current cpu with interrupts disabled. > > The update dance with unpaired write barriers thus makes no sense. Replace > them with plain WRITE_ONCE to prevent store tearing, and document that the > values can change asynchronously and should not be trusted for correctness. > > All current readers appear to be OK after 88e8ac11d2ea. Convert them to > READ_ONCE to prevent unnecessary read tearing, but mainly to alert anybody > making future changes to the code that special care is needed. > > Signed-off-by: Vlastimil Babka > --- > mm/page_alloc.c | 40 ++-- > 1 file changed, 18 insertions(+), 22 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 76c2b4578723..99b74c1c2b0a 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1297,7 +1297,7 @@ static void free_pcppages_bulk(struct zone *zone, int > count, > { > int migratetype = 0; > int batch_free = 0; > - int prefetch_nr = 0; > + int prefetch_nr = READ_ONCE(pcp->batch); > bool isolated_pageblocks; > struct page *page, *tmp; > LIST_HEAD(head); > @@ -1348,8 +1348,10 @@ static void free_pcppages_bulk(struct zone *zone, int > count, >* avoid excessive prefetching due to large count, only >* prefetch buddy for the first pcp->batch nr of pages. 
>*/ > - if (prefetch_nr++ < pcp->batch) > + if (prefetch_nr) { > prefetch_buddy(page); > + prefetch_nr--; > + } > } while (--count && --batch_free && !list_empty(list)); > } > > @@ -3131,10 +3133,8 @@ static void free_unref_page_commit(struct page *page, > unsigned long pfn) > pcp = &this_cpu_ptr(zone->pageset)->pcp; > list_add(&page->lru, &pcp->lists[migratetype]); > pcp->count++; > - if (pcp->count >= pcp->high) { > - unsigned long batch = READ_ONCE(pcp->batch); > - free_pcppages_bulk(zone, batch, pcp); > - } > + if (pcp->count >= READ_ONCE(pcp->high)) > + free_pcppages_bulk(zone, READ_ONCE(pcp->batch), pcp); > } > > /* > @@ -3318,7 +3318,7 @@ static struct page *__rmqueue_pcplist(struct zone > *zone, int migratetype, > do { > if (list_empty(list)) { > pcp->count += rmqueue_bulk(zone, 0, > - pcp->batch, list, > + READ_ONCE(pcp->batch), list, > migratetype, alloc_flags); > if (unlikely(list_empty(list))) > return NULL; > @@ -6174,13 +6174,16 @@ static int zone_batchsize(struct zone *zone) > } > > /* > - * pcp->high and pcp->batch values are related and dependent on one another: > - * ->batch must never be higher then ->high. > - * The following function updates them in a safe manner without read side > - * locking. > + * pcp->high and pcp->batch values are related and generally batch is lower > + * than high. They are also related to pcp->count such that count is lower > + * than high, and as soon as it reaches high, the pcplist is flushed. > * > - * Any new users of pcp->batch and pcp->high should ensure they can cope with > - * those fields changing asynchronously (acording to the above rule). > + * However, guaranteeing these relations at all times would require e.g. > write > + * barriers here but also careful usage of read barriers at the read side, > and > + * thus be prone to error and bad for performance. Thus the update only > prevents > + * store tearing. 
Any new users of pcp->batch and pcp->high should ensure > they > + * can cope with those fields changing asynchronously, and fully trust only > the > + * pcp->count field on the local CPU with interrupts disabled. > * > * mutex_is_locked(&pcp_batch_high_lock) required when calling this function > * outside of boot time (or some other assurance that no concurrent updaters > @@ -6189,15 +6192,8 @@ static int zone_batchsize(struct zone *zone) > static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, > unsigned long batch) > { > - /* start with a fail safe value for batch */ > - pcp->batch = 1; > - smp_wmb(); > - > - /* Update high, then batch, in
Re: [PATCH 5/9] mm, page_alloc: make per_cpu_pageset accessible only after init
On 22.09.20 16:37, Vlastimil Babka wrote: > setup_zone_pageset() replaces the boot_pageset by allocating and initializing > a > proper percpu one. Currently it assigns zone->pageset with the newly allocated > one before initializing it. That's currently not an issue, because the zone > should not be in any zonelist, thus not visible to allocators at this point. > > Memory ordering between the pcplist contents and its visibility is also not > guaranteed here, but that also shouldn't be an issue because online_pages() > does a spin_unlock(pgdat->node_size_lock) before building the zonelists. > > However it's best that we don't silently rely on operations that can be > changed > in the future. Make sure only properly initialized pcplists are visible, using > smp_store_release(). The read side has a data dependency via the zone->pageset > pointer instead of an explicit read barrier. > > Signed-off-by: Vlastimil Babka > --- > mm/page_alloc.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 99b74c1c2b0a..de3b48bda45c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6246,15 +6246,17 @@ static void zone_set_pageset_high_and_batch(struct > zone *zone) > > void __meminit setup_zone_pageset(struct zone *zone) > { > + struct per_cpu_pageset __percpu * new_pageset; > struct per_cpu_pageset *p; > int cpu; > > - zone->pageset = alloc_percpu(struct per_cpu_pageset); > + new_pageset = alloc_percpu(struct per_cpu_pageset); > for_each_possible_cpu(cpu) { > - p = per_cpu_ptr(zone->pageset, cpu); > + p = per_cpu_ptr(new_pageset, cpu); > pageset_init(p); > } > > + smp_store_release(&zone->pageset, new_pageset); > zone_set_pageset_high_and_batch(zone); > } > > Acked-by: David Hildenbrand -- Thanks, David / dhildenb
Re: [PATCH 02/12] soc: mediatek: Add MediaTek SCPSYS power domains
On Thu, 2020-09-10 at 19:28 +0200, Enric Balletbo i Serra wrote: > The System Control Processor System (SCPSYS) has several power management > related tasks in the system. This driver implements support to handle > the different power domains supported in order to meet high performance > and low power requirements. > > Co-developed-by: Matthias Brugger > Signed-off-by: Matthias Brugger > Signed-off-by: Enric Balletbo i Serra > --- > > drivers/soc/mediatek/Kconfig | 13 + > drivers/soc/mediatek/Makefile | 1 + > drivers/soc/mediatek/mtk-pm-domains.c | 626 ++ > 3 files changed, 640 insertions(+) > create mode 100644 drivers/soc/mediatek/mtk-pm-domains.c > > diff --git a/drivers/soc/mediatek/Kconfig b/drivers/soc/mediatek/Kconfig > index 59a56cd790ec..68d800f9e4a5 100644 > --- a/drivers/soc/mediatek/Kconfig > +++ b/drivers/soc/mediatek/Kconfig > @@ -44,6 +44,19 @@ config MTK_SCPSYS > Say yes here to add support for the MediaTek SCPSYS power domain > driver. > > +config MTK_SCPSYS_PM_DOMAINS > + bool "MediaTek SCPSYS generic power domain" > + default ARCH_MEDIATEK > + depends on PM > + depends on MTK_INFRACFG > + select PM_GENERIC_DOMAINS > + select REGMAP > + help > + Say y here to enable power domain support. > + In order to meet high performance and low power requirements, the > System > + Control Processor System (SCPSYS) has several power management related > + tasks in the system. 
> + > config MTK_MMSYS > bool "MediaTek MMSYS Support" > default ARCH_MEDIATEK > diff --git a/drivers/soc/mediatek/Makefile b/drivers/soc/mediatek/Makefile > index 01f9f873634a..1e60fb4f89d4 100644 > --- a/drivers/soc/mediatek/Makefile > +++ b/drivers/soc/mediatek/Makefile > @@ -3,4 +3,5 @@ obj-$(CONFIG_MTK_CMDQ) += mtk-cmdq-helper.o > obj-$(CONFIG_MTK_INFRACFG) += mtk-infracfg.o > obj-$(CONFIG_MTK_PMIC_WRAP) += mtk-pmic-wrap.o > obj-$(CONFIG_MTK_SCPSYS) += mtk-scpsys.o > +obj-$(CONFIG_MTK_SCPSYS_PM_DOMAINS) += mtk-pm-domains.o > obj-$(CONFIG_MTK_MMSYS) += mtk-mmsys.o > diff --git a/drivers/soc/mediatek/mtk-pm-domains.c > b/drivers/soc/mediatek/mtk-pm-domains.c > new file mode 100644 > index ..db631dbaf2e3 > --- /dev/null > +++ b/drivers/soc/mediatek/mtk-pm-domains.c > @@ -0,0 +1,626 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright (c) 2020 Collabora Ltd. > + */ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#define MTK_POLL_DELAY_US 10 > +#define MTK_POLL_TIMEOUTUSEC_PER_SEC > + > +#define MTK_SCPD_ACTIVE_WAKEUP BIT(0) > +#define MTK_SCPD_FWAIT_SRAM BIT(1) > +#define MTK_SCPD_CAPS(_scpd, _x) ((_scpd)->data->caps & (_x)) > + > +#define SPM_VDE_PWR_CON 0x0210 > +#define SPM_MFG_PWR_CON 0x0214 > +#define SPM_VEN_PWR_CON 0x0230 > +#define SPM_ISP_PWR_CON 0x0238 > +#define SPM_DIS_PWR_CON 0x023c > +#define SPM_VEN2_PWR_CON 0x0298 > +#define SPM_AUDIO_PWR_CON0x029c > +#define SPM_MFG_2D_PWR_CON 0x02c0 > +#define SPM_MFG_ASYNC_PWR_CON0x02c4 > +#define SPM_USB_PWR_CON 0x02cc > + If me, I'd choose to write directly into the data of each SoC now because it's inconsistent on most MediatTek chips. 
> +#define SPM_PWR_STATUS 0x060c > +#define SPM_PWR_STATUS_2ND 0x0610 > + > +#define PWR_RST_B_BITBIT(0) > +#define PWR_ISO_BIT BIT(1) > +#define PWR_ON_BIT BIT(2) > +#define PWR_ON_2ND_BIT BIT(3) > +#define PWR_CLK_DIS_BIT BIT(4) > + > +#define PWR_STATUS_DISP BIT(3) > +#define PWR_STATUS_MFG BIT(4) > +#define PWR_STATUS_ISP BIT(5) > +#define PWR_STATUS_VDEC BIT(7) > +#define PWR_STATUS_VENC_LT BIT(20) > +#define PWR_STATUS_VENC BIT(21) > +#define PWR_STATUS_MFG_2DBIT(22) > +#define PWR_STATUS_MFG_ASYNC BIT(23) > +#define PWR_STATUS_AUDIO BIT(24) > +#define PWR_STATUS_USB BIT(25) > + same here for the status bits. > +struct scpsys_bus_prot_data { > + u32 bus_prot_mask; > + bool bus_prot_reg_update; > +}; > + > +/** > + * struct scpsys_domain_data - scp domain data for power on/off flow > + * @sta_mask: The mask for power on/off status bit. > + * @ctl_offs: The offset for main power control register. > + * @sram_pdn_bits: The mask for sram power control bits. > + * @sram_pdn_ack_bits: The mask for sram power control acked bits. > + * @caps: The flag for active wake-up action. > + * @bp_infracfg: bus protection for infracfg subsystem > + */ > +struct scps
Re: [RFC-PATCH 2/4] mm: Add __rcu_alloc_page_lockless() func.
On Fri, Sep 25, 2020 at 10:15:52AM +0200, Peter Zijlstra wrote: > On Thu, Sep 24, 2020 at 05:21:12PM +0200, Uladzislau Rezki wrote: > > On Thu, Sep 24, 2020 at 01:19:07PM +0200, Peter Zijlstra wrote: > > > On Thu, Sep 24, 2020 at 10:16:14AM +0200, Uladzislau Rezki wrote: > > > > The key point is "enough". We need pages to make a) fast progress b) > > > > support > > > > single argument of kvfree_rcu(one_arg). Not vice versa. That "enough" > > > > depends > > > > on scheduler latency and vague pre-allocated number of pages, it might > > > > be not enough what would require to refill it more and more or we can > > > > overshoot > > > > that would lead to memory overhead. So we have here timing issues and > > > > not accurate model. IMHO. > > > > > > I'm firmly opposed to the single argument kvfree_rcu() idea, that's > > > requiring memory to free memory. > > > > > Hmm.. The problem is there is a demand in it: > > People demand ponies all the time, the usual answer is: No. > I see your view. On the other hand, it is clear there is still demand for it:

void ext4_kvfree_array_rcu(void *to_free)
{
	struct ext4_rcu_ptr *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);

	if (ptr) {
		ptr->ptr = to_free;
		call_rcu(&ptr->rcu, ext4_rcu_ptr_callback);
		return;
	}
	synchronize_rcu();
	kvfree(to_free);
}

-- Vlad Rezki
Re: [PATCH] tpm: of: avoid __va() translation for event log address
On Fri, Sep 25, 2020 at 09:00:56AM +0200, Ard Biesheuvel wrote: > On Fri, 25 Sep 2020 at 07:56, Jarkko Sakkinen > wrote: > > > > On Tue, Sep 22, 2020 at 11:41:28AM +0200, Ard Biesheuvel wrote: > > > The TPM event log is provided to the OS by the firmware, by loading > > > it into an area in memory and passing the physical address via a node > > > in the device tree. > > > > > > Currently, we use __va() to access the memory via the kernel's linear > > > map: however, it is not guaranteed that the linear map covers this > > > particular address, as we may be running under HIGHMEM on a 32-bit > > > architecture, or running firmware that uses a memory type for the > > > event log that is omitted from the linear map (such as EfiReserved). > > > > Makes perfect sense to the level that I wonder if this should have a > > fixes tag and/or needs to be backported to the stable kernels? > > > > AIUI, the code was written specifically for ppc64, which is a > non-highmem, non-EFI architecture. However, when we start reusing this > driver for ARM, this issue could pop up. > > The code itself has been refactored a couple of times, so I think it > will require different versions of the patch for different generations > of stable kernels. > > So perhaps just add Cc: , and wait and see how > far back it applies cleanly? Yeah, I think I'll cc it with some note before the diffstat. I'm thinking to cap it to only 5.x kernels (at least first) unless it is dead easy to backport below that. > > This is a really great catch! > > > > I'm a bit late with my PR because of SGX upstreaming madness > > (sending v39 soon). If you can answer my question above, I can do > > that nitpick change to the patch and get it into my v5.10 PR. > > > > Yes, please. Great, will do, thanks again for fixing this issue! /Jarkko
[PATCH v2 0/4] clk: qcom : add sm8250 LPASS GFM drivers
This patchset adds support for GFM Muxes found in LPASS (Low Power Audio SubSystem) IP in the Audio Clock Controller and Always ON clock controller. Clocks derived from these muxes are consumed by the LPASS Digital Codec. Currently the driver for the Audio and Always ON clock controllers only supports GFM Muxes; however, it should be easy to add more clock support when required.

Changes since v1:
 - removed unnecessary Kconfig dependencies
 - cleaned up header includes
 - moved to using pm_clk
 - moved to right place in Makefile
 - moved to use module_platform_driver instead of builtin_platform_driver
 - add null check for of_device_get_match_data

verified dt_binding_check to pass on linux next https://paste.ubuntu.com/p/6nVzjRwvsW/

Srinivas Kandagatla (4):
  dt-bindings: clock: Add support for LPASS Audio Clock Controller
  dt-bindings: clock: Add support for LPASS Always ON Controller
  clk: qcom: Add support to LPASS AUDIO_CC Glitch Free Mux clocks
  clk: qcom: Add support to LPASS AON_CC Glitch Free Mux clocks

 .../bindings/clock/qcom,aoncc-sm8250.yaml | 58
 .../bindings/clock/qcom,audiocc-sm8250.yaml | 58
 drivers/clk/qcom/Kconfig | 6 +
 drivers/clk/qcom/Makefile | 1 +
 drivers/clk/qcom/lpass-gfm-sm8250.c | 323 ++
 .../clock/qcom,sm8250-lpass-aoncc.h | 11 +
 .../clock/qcom,sm8250-lpass-audiocc.h | 13 +
 7 files changed, 470 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/qcom,aoncc-sm8250.yaml
 create mode 100644 Documentation/devicetree/bindings/clock/qcom,audiocc-sm8250.yaml
 create mode 100644 drivers/clk/qcom/lpass-gfm-sm8250.c
 create mode 100644 include/dt-bindings/clock/qcom,sm8250-lpass-aoncc.h
 create mode 100644 include/dt-bindings/clock/qcom,sm8250-lpass-audiocc.h

-- 2.21.0
[PATCH v2 4/4] clk: qcom: Add support to LPASS AON_CC Glitch Free Mux clocks
LPASS Always ON Clock controller has one GFM mux to control VA and TX clocks to codec macro on LPASS. This patch adds support to this mux. Signed-off-by: Srinivas Kandagatla --- drivers/clk/qcom/lpass-gfm-sm8250.c | 63 + 1 file changed, 63 insertions(+) diff --git a/drivers/clk/qcom/lpass-gfm-sm8250.c b/drivers/clk/qcom/lpass-gfm-sm8250.c index c79854e1494d..61a89347885e 100644 --- a/drivers/clk/qcom/lpass-gfm-sm8250.c +++ b/drivers/clk/qcom/lpass-gfm-sm8250.c @@ -19,6 +19,7 @@ #include #include #include +#include struct lpass_gfm { struct device *dev; @@ -66,6 +67,46 @@ static const struct clk_ops clk_gfm_ops = { .determine_rate = __clk_mux_determine_rate, }; +static struct clk_gfm lpass_gfm_va_mclk = { + .mux_reg = 0x2, + .mux_mask = BIT(0), + .hw.init = &(struct clk_init_data) { + .name = "VA_MCLK", + .ops = &clk_gfm_ops, + .flags = CLK_SET_RATE_PARENT | CLK_OPS_PARENT_ENABLE, + .num_parents = 2, + .parent_data = (const struct clk_parent_data[]){ + { + .index = 0, + .name = "LPASS_CLK_ID_TX_CORE_MCLK", + }, { + .index = 1, + .name = "LPASS_CLK_ID_VA_CORE_MCLK", + }, + }, + }, +}; + +static struct clk_gfm lpass_gfm_tx_npl = { + .mux_reg = 0x2, + .mux_mask = BIT(0), + .hw.init = &(struct clk_init_data) { + .name = "TX_NPL", + .ops = &clk_gfm_ops, + .flags = CLK_SET_RATE_PARENT | CLK_OPS_PARENT_ENABLE, + .parent_data = (const struct clk_parent_data[]){ + { + .index = 0, + .name = "LPASS_CLK_ID_TX_CORE_NPL_MCLK", + }, { + .index = 1, + .name = "LPASS_CLK_ID_VA_CORE_2X_MCLK", + }, + }, + .num_parents = 2, + }, +}; + static struct clk_gfm lpass_gfm_wsa_mclk = { .mux_reg = 0x220d8, .mux_mask = BIT(0), @@ -146,6 +187,19 @@ static struct clk_gfm lpass_gfm_rx_npl = { }, }; +static struct clk_gfm *aoncc_gfm_clks[] = { + [LPASS_CDC_VA_MCLK] = &lpass_gfm_va_mclk, + [LPASS_CDC_TX_NPL] = &lpass_gfm_tx_npl, +}; + +static struct clk_hw_onecell_data aoncc_hw_onecell_data = { + .hws = { + [LPASS_CDC_VA_MCLK] = &lpass_gfm_va_mclk.hw, + [LPASS_CDC_TX_NPL] = &lpass_gfm_tx_npl.hw, + 
}, + .num = ARRAY_SIZE(aoncc_gfm_clks), +}; + static struct clk_gfm *audiocc_gfm_clks[] = { [LPASS_CDC_WSA_NPL] = &lpass_gfm_wsa_npl, [LPASS_CDC_WSA_MCLK]= &lpass_gfm_wsa_mclk, @@ -173,6 +227,11 @@ static struct lpass_gfm_data audiocc_data = { .gfm_clks = audiocc_gfm_clks, }; +static struct lpass_gfm_data aoncc_data = { + .onecell_data = &aoncc_hw_onecell_data, + .gfm_clks = aoncc_gfm_clks, +}; + static int lpass_gfm_clk_driver_probe(struct platform_device *pdev) { const struct lpass_gfm_data *data; @@ -236,6 +295,10 @@ static int lpass_gfm_clk_driver_probe(struct platform_device *pdev) } static const struct of_device_id lpass_gfm_clk_match_table[] = { + { + .compatible = "qcom,sm8250-lpass-aoncc", + .data = &aoncc_data, + }, { .compatible = "qcom,sm8250-lpass-audiocc", .data = &audiocc_data, -- 2.21.0
[PATCH v2 2/4] dt-bindings: clock: Add support for LPASS Always ON Controller
Always ON Clock controller is a block inside LPASS which controls 1 Glitch free muxes to LPASS codec Macros. Signed-off-by: Srinivas Kandagatla --- .../bindings/clock/qcom,aoncc-sm8250.yaml | 58 +++ .../clock/qcom,sm8250-lpass-aoncc.h | 11 2 files changed, 69 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/qcom,aoncc-sm8250.yaml create mode 100644 include/dt-bindings/clock/qcom,sm8250-lpass-aoncc.h diff --git a/Documentation/devicetree/bindings/clock/qcom,aoncc-sm8250.yaml b/Documentation/devicetree/bindings/clock/qcom,aoncc-sm8250.yaml new file mode 100644 index ..c40a74b5d672 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/qcom,aoncc-sm8250.yaml @@ -0,0 +1,58 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/clock/qcom,aoncc-sm8250.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Clock bindings for LPASS Always ON Clock Controller on SM8250 SoCs + +maintainers: + - Srinivas Kandagatla + +description: | + The clock consumer should specify the desired clock by having the clock + ID in its "clocks" phandle cell. + See include/dt-bindings/clock/qcom,sm8250-lpass-aoncc.h for the full list + of Audio Clock controller clock IDs. 
+ +properties: + compatible: +const: qcom,sm8250-lpass-aon + + reg: +maxItems: 1 + + '#clock-cells': +const: 1 + + clocks: +items: + - description: LPASS Core voting clock + - description: Glitch Free Mux register clock + + clock-names: +items: + - const: core + - const: bus + +required: + - compatible + - reg + - '#clock-cells' + - clocks + - clock-names + +additionalProperties: false + +examples: + - | +#include +#include +clock-controller@380 { + #clock-cells = <1>; + compatible = "qcom,sm8250-lpass-aon"; + reg = <0x0338 0x4>; + clocks = <&q6afecc LPASS_HW_MACRO_VOTE LPASS_CLK_ATTRIBUTE_COUPLE_NO>, + <&q6afecc LPASS_CLK_ID_TX_CORE_MCLK LPASS_CLK_ATTRIBUTE_COUPLE_NO>; + clock-names = "core", "bus"; +}; diff --git a/include/dt-bindings/clock/qcom,sm8250-lpass-aoncc.h b/include/dt-bindings/clock/qcom,sm8250-lpass-aoncc.h new file mode 100644 index ..f5a1cfac8612 --- /dev/null +++ b/include/dt-bindings/clock/qcom,sm8250-lpass-aoncc.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _DT_BINDINGS_CLK_LPASS_AONCC_SM8250_H +#define _DT_BINDINGS_CLK_LPASS_AONCC_SM8250_H + +/* from AOCC */ +#define LPASS_CDC_VA_MCLK 0 +#define LPASS_CDC_TX_NPL 1 +#define LPASS_CDC_TX_MCLK 2 + +#endif /* _DT_BINDINGS_CLK_LPASS_AONCC_SM8250_H */ -- 2.21.0
Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation
Hi, Sorry to come to this so late; I've been meaning to provide feedback on this for a while but have been indisposed for a bit due to an injury. On Fri, Sep 25, 2020 at 11:50:29AM +0200, Peter Zijlstra wrote: > On Fri, Sep 25, 2020 at 11:00:30AM +0200, David Hildenbrand wrote: > > On 25.09.20 09:41, Peter Zijlstra wrote: > > > On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote: > > >> From: Mike Rapoport > > >> > > >> Removing a PAGE_SIZE page from the direct map every time such page is > > >> allocated for a secret memory mapping will cause severe fragmentation of > > >> the direct map. This fragmentation can be reduced by using PMD-size pages > > >> as a pool for small pages for secret memory mappings. > > >> > > >> Add a gen_pool per secretmem inode and lazily populate this pool with > > >> PMD-size pages. > > > > > > What's the actual efficacy of this? Since the pmd is per inode, all I > > > need is a lot of inodes and we're in business to destroy the directmap, > > > no? > > > > > > Afaict there's no privs needed to use this, all a process needs is to > > > stay below the mlock limit, so a 'fork-bomb' that maps a single secret > > > page will utterly destroy the direct map. > > > > > > I really don't like this, at all. > > > > As I expressed earlier, I would prefer allowing allocation of secretmem > > only from a previously defined CMA area. This would physically locally > > limit the pain. > > Given that this thing doesn't have a migrate hook, that seems like an > eminently reasonable constraint. Because not only will it mess up the > directmap, it will also destroy the ability of the page-allocator / > compaction to re-form high order blocks by sprinkling holes throughout. > > Also, this is all very close to XPFO, yet I don't see that mentioned > anywhere. Agreed.
I think if we really need something like this, something between XPFO and
DEBUG_PAGEALLOC would be generally better, since:

* Secretmem puts userspace in charge of kernel internals (AFAICT without
  any ulimits?), so that seems like an avenue for malicious or buggy
  userspace to exploit and trigger DoS, etc. The other approaches leave
  the kernel in charge at all times, and it's a system-level choice which
  is easier to reason about and test.

* Secretmem interaction with existing ABIs is unclear. Should uaccess
  primitives work for secretmem? If so, this means that it's not valid to
  transform direct uaccesses in syscalls etc. into accesses via the
  linear/direct map. If not, how do we prevent syscalls? The other
  approaches are clear that this should always work, but the kernel
  should avoid mappings wherever possible.

* The uncached option doesn't work in a number of situations, such as
  systems which are purely cache coherent at all times, or where the
  hypervisor has overridden attributes. The kernel cannot even know
  whether this works as intended. On its own this doesn't solve a
  particular problem, and I think this is a solution looking for a
  problem.

... and fundamentally, this seems like a "more security, please" option
that is going to be abused, since everyone wants security, regardless of
how we say it *should* be used. The few use-cases that may make sense
(e.g. protection of keys and/or crypto secrets) aren't going to be able
to rely on this (since e.g. other uses may deplete memory pools), so this
is going to be best-effort. With all that in mind, I struggle to believe
that this is going to be worth the maintenance cost (e.g. with any issues
arising from uaccess, IO, etc).

Overall, I would prefer to not see this syscall in the kernel.

> Further still, it has this HAVE_SECRETMEM_UNCACHED nonsense which is
> completely unused. I'm not at all sure exposing UNCACHED to random
> userspace is a sane idea.

I agree the uncached stuff should be removed.
It is at best misleading, since the kernel can't guarantee it does what
it says; I think it's liable to lead to issues in future (e.g. since it
can cause memory operations to raise different exceptions relative to
what they can today), and as above it seems like a solution looking for
a problem.

Thanks,
Mark.
[PATCH v2 3/4] clk: qcom: Add support to LPASS AUDIO_CC Glitch Free Mux clocks
GFM Muxes in AUDIO_CC control clocks to LPASS WSA and RX Codec Macros.
This patch adds support for these muxes.

Signed-off-by: Srinivas Kandagatla
---
 drivers/clk/qcom/Kconfig            |   6 +
 drivers/clk/qcom/Makefile           |   1 +
 drivers/clk/qcom/lpass-gfm-sm8250.c | 260
 3 files changed, 267 insertions(+)
 create mode 100644 drivers/clk/qcom/lpass-gfm-sm8250.c

diff --git a/drivers/clk/qcom/Kconfig b/drivers/clk/qcom/Kconfig
index 058327310c25..08078f4b0591 100644
--- a/drivers/clk/qcom/Kconfig
+++ b/drivers/clk/qcom/Kconfig
@@ -475,4 +475,10 @@ config KRAITCC
 	  Support for the Krait CPU clocks on Qualcomm devices.
 	  Say Y if you want to support CPU frequency scaling.
 
+config CLK_GFM_LPASS_SM8250
+	tristate "GFM LPASS Clocks"
+	help
+	  Support for the GFM Glitch Free Mux LPASS clock. Say Y
+	  if you want to support GFM Clocks on LPASS for SM8250 SoC.
+
 endif
diff --git a/drivers/clk/qcom/Makefile b/drivers/clk/qcom/Makefile
index 9677e769e7e9..7465cabc86aa 100644
--- a/drivers/clk/qcom/Makefile
+++ b/drivers/clk/qcom/Makefile
@@ -19,6 +19,7 @@ clk-qcom-$(CONFIG_QCOM_GDSC) += gdsc.o
 # Keep alphabetically sorted by config
 obj-$(CONFIG_APQ_GCC_8084) += gcc-apq8084.o
 obj-$(CONFIG_APQ_MMCC_8084) += mmcc-apq8084.o
+obj-$(CONFIG_CLK_GFM_LPASS_SM8250) += lpass-gfm-sm8250.o
 obj-$(CONFIG_IPQ_APSS_PLL) += apss-ipq-pll.o
 obj-$(CONFIG_IPQ_APSS_6018) += apss-ipq6018.o
 obj-$(CONFIG_IPQ_GCC_4019) += gcc-ipq4019.o
diff --git a/drivers/clk/qcom/lpass-gfm-sm8250.c b/drivers/clk/qcom/lpass-gfm-sm8250.c
new file mode 100644
index ..c79854e1494d
--- /dev/null
+++ b/drivers/clk/qcom/lpass-gfm-sm8250.c
@@ -0,0 +1,260 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * LPASS Audio CC and Always ON CC Glitch Free Mux clock driver
+ *
+ * Copyright (c) 2020 Linaro Ltd.
+ * Author: Srinivas Kandagatla
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct lpass_gfm {
+	struct device *dev;
+	void __iomem *base;
+};
+
+struct clk_gfm {
+	unsigned int mux_reg;
+	unsigned int mux_mask;
+	struct clk_hw hw;
+	struct lpass_gfm *priv;
+	void __iomem *gfm_mux;
+};
+
+#define GFM_MASK	BIT(1)
+#define to_clk_gfm(_hw) container_of(_hw, struct clk_gfm, hw)
+
+static u8 clk_gfm_get_parent(struct clk_hw *hw)
+{
+	struct clk_gfm *clk = to_clk_gfm(hw);
+
+	return readl(clk->gfm_mux) & GFM_MASK;
+}
+
+static int clk_gfm_set_parent(struct clk_hw *hw, u8 index)
+{
+	struct clk_gfm *clk = to_clk_gfm(hw);
+	unsigned int val;
+
+	val = readl(clk->gfm_mux);
+
+	if (index)
+		val |= GFM_MASK;
+	else
+		val &= ~GFM_MASK;
+
+	writel(val, clk->gfm_mux);
+
+	return 0;
+}
+
+static const struct clk_ops clk_gfm_ops = {
+	.get_parent = clk_gfm_get_parent,
+	.set_parent = clk_gfm_set_parent,
+	.determine_rate = __clk_mux_determine_rate,
+};
+
+static struct clk_gfm lpass_gfm_wsa_mclk = {
+	.mux_reg = 0x220d8,
+	.mux_mask = BIT(0),
+	.hw.init = &(struct clk_init_data) {
+		.name = "WSA_MCLK",
+		.ops = &clk_gfm_ops,
+		.flags = CLK_SET_RATE_PARENT | CLK_OPS_PARENT_ENABLE,
+		.parent_data = (const struct clk_parent_data[]){
+			{
+				.index = 0,
+				.name = "LPASS_CLK_ID_TX_CORE_MCLK",
+			}, {
+				.index = 1,
+				.name = "LPASS_CLK_ID_WSA_CORE_MCLK",
+			},
+		},
+		.num_parents = 2,
+	},
+};
+
+static struct clk_gfm lpass_gfm_wsa_npl = {
+	.mux_reg = 0x220d8,
+	.mux_mask = BIT(0),
+	.hw.init = &(struct clk_init_data) {
+		.name = "WSA_NPL",
+		.ops = &clk_gfm_ops,
+		.flags = CLK_SET_RATE_PARENT | CLK_OPS_PARENT_ENABLE,
+		.parent_data = (const struct clk_parent_data[]){
+			{
+				.index = 0,
+				.name = "LPASS_CLK_ID_TX_CORE_NPL_MCLK",
+			}, {
+				.index = 1,
+				.name = "LPASS_CLK_ID_WSA_CORE_NPL_MCLK",
+			},
+		},
+		.num_parents = 2,
+	},
+};
+
+static struct clk_gfm lpass_gfm_rx_mclk_mclk2 = {
+	.mux_reg = 0x240d8,
+	.mux_mask = BIT(0),
+	.hw.init = &(struct clk_init_data) {
+		.name = "RX_MCLK_MCLK2",
+		.ops = &clk_gfm_ops,
+		.flags = CLK_SET_RATE_PARENT | CLK_OPS_PARENT_ENABLE,
+		.parent_data = (const struct clk_parent_data[]){
+			{
+				.index = 0,
+
[PATCH v2 1/4] dt-bindings: clock: Add support for LPASS Audio Clock Controller
Audio Clock controller is a block inside LPASS which controls 2 Glitch
free muxes to LPASS codec Macros.

Signed-off-by: Srinivas Kandagatla
---
 .../bindings/clock/qcom,audiocc-sm8250.yaml | 58 +++
 .../clock/qcom,sm8250-lpass-audiocc.h       | 13 +
 2 files changed, 71 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/qcom,audiocc-sm8250.yaml
 create mode 100644 include/dt-bindings/clock/qcom,sm8250-lpass-audiocc.h

diff --git a/Documentation/devicetree/bindings/clock/qcom,audiocc-sm8250.yaml b/Documentation/devicetree/bindings/clock/qcom,audiocc-sm8250.yaml
new file mode 100644
index ..915d76206ad0
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/qcom,audiocc-sm8250.yaml
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/qcom,audiocc-sm8250.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Clock bindings for LPASS Audio Clock Controller on SM8250 SoCs
+
+maintainers:
+  - Srinivas Kandagatla
+
+description: |
+  The clock consumer should specify the desired clock by having the clock
+  ID in its "clocks" phandle cell.
+  See include/dt-bindings/clock/qcom,sm8250-lpass-audiocc.h for the full list
+  of Audio Clock controller clock IDs.
+
+properties:
+  compatible:
+    const: qcom,sm8250-lpass-audiocc
+
+  reg:
+    maxItems: 1
+
+  '#clock-cells':
+    const: 1
+
+  clocks:
+    items:
+      - description: LPASS Core voting clock
+      - description: Glitch Free Mux register clock
+
+  clock-names:
+    items:
+      - const: core
+      - const: bus
+
+required:
+  - compatible
+  - reg
+  - '#clock-cells'
+  - clocks
+  - clock-names
+
+additionalProperties: false
+
+examples:
+  - |
+    #include
+    #include
+    clock-controller@330 {
+        #clock-cells = <1>;
+        compatible = "qcom,sm8250-lpass-audiocc";
+        reg = <0x0330 0x3>;
+        clocks = <&q6afecc LPASS_HW_MACRO_VOTE LPASS_CLK_ATTRIBUTE_COUPLE_NO>,
+                 <&q6afecc LPASS_CLK_ID_TX_CORE_MCLK LPASS_CLK_ATTRIBUTE_COUPLE_NO>;
+        clock-names = "core", "bus";
+    };
diff --git a/include/dt-bindings/clock/qcom,sm8250-lpass-audiocc.h b/include/dt-bindings/clock/qcom,sm8250-lpass-audiocc.h
new file mode 100644
index ..a1aa6cb5d840
--- /dev/null
+++ b/include/dt-bindings/clock/qcom,sm8250-lpass-audiocc.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _DT_BINDINGS_CLK_LPASS_AUDIOCC_SM8250_H
+#define _DT_BINDINGS_CLK_LPASS_AUDIOCC_SM8250_H
+
+/* From AudioCC */
+#define LPASS_CDC_WSA_NPL	0
+#define LPASS_CDC_WSA_MCLK	1
+#define LPASS_CDC_RX_MCLK	2
+#define LPASS_CDC_RX_NPL	3
+#define LPASS_CDC_RX_MCLK_MCLK2	4
+
+#endif /* _DT_BINDINGS_CLK_LPASS_AUDIOCC_SM8250_H */
-- 
2.21.0
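To illustrate how a consumer would use the IDs from this header, a codec macro node might look like the following. This is a hypothetical fragment, not part of the patch: the `audiocc` label, the node name/address, and the clock-names are made up for illustration, while the ID macros come from the new qcom,sm8250-lpass-audiocc.h.

```
wsa_macro: codec@3240000 {
	/* ... other codec properties ... */
	clocks = <&audiocc LPASS_CDC_WSA_MCLK>,
		 <&audiocc LPASS_CDC_WSA_NPL>;
	clock-names = "mclk", "npl";
};
```

The `#clock-cells = <1>` in the binding is what makes the single ID cell after the `&audiocc` phandle valid here.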
Re: KMSAN: uninit-value in udf_evict_inode
On Thu 24-09-20 22:18:21, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    c5a13b33 kmsan: clang-format core
> git tree:       https://github.com/google/kmsan.git master
> console output: https://syzkaller.appspot.com/x/log.txt?x=142cf80990
> kernel config:  https://syzkaller.appspot.com/x/.config?x=20f149ad694ba4be
> dashboard link: https://syzkaller.appspot.com/bug?extid=91f02b28f9bb5f5f1341
> compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+91f02b28f9bb5f5f1...@syzkaller.appspotmail.com
>
> UDF-fs: warning (device loop3): udf_load_vrs: No anchor found
> UDF-fs: Scanning with blocksize 512 failed
> UDF-fs: INFO Mounting volume 'LinuxUDF', timestamp 2020/09/19 18:44 (1000)
> UDF-fs: error (device loop3): udf_read_inode: (ino 1328) failed !bh
> =====================================================
> BUG: KMSAN: uninit-value in udf_evict_inode+0x382/0x7d0 fs/udf/inode.c:150

Yeah, easy enough. I'll send a fix.

								Honza
-- 
Jan Kara
SUSE Labs, CR