date:20231204

Re: [PATCH v3 2/7] kexec_file: print out debugging message if required

2023-12-04 Thread Baoquan He

On 11/29/23 at 06:52pm, Joe Perches wrote:
> On Thu, 2023-11-30 at 10:39 +0800, Baoquan He wrote:
> > Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
> > loading related codes.
> 
> trivia:
> 
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> []
> > @@ -551,9 +551,12 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, 
> > int need_kernel_map,
> > phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
> > phdr->p_align = 0;
> > ehdr->e_phnum++;
> > -   pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, 
> > paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
> > -   phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
> > -   ehdr->e_phnum, phdr->p_offset);
> > +#ifdef CONFIG_KEXEC_FILE
> > +   kexec_dprintk("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, 
> > paddr=0x%llx, "
> > + "sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
> > + phdr, phdr->p_vaddr, phdr->p_paddr, 
> > phdr->p_filesz,
> > + ehdr->e_phnum, phdr->p_offset);
> > +#endif
> 
> Perhaps run checkpatch and coalesce the format string.

Sorry for being late to reply, this comment was missed.

I broke it into two because it's a too long line and not friendly for
reading. I did notice there's such line breaking code. I can change it
if it's not suggested. Thanks for careful checking.

Re: [PATCH v2] powerpc/book3s/hash: Drop _PAGE_PRIVILEGED from PAGE_NONE

2023-12-04 Thread IBM

Michael Ellerman  writes:

> "Aneesh Kumar K.V"  writes:
>> There used to be a dependency on _PAGE_PRIVILEGED with pte_savedwrite.
>> But that got dropped by
>> commit 6a56ccbcf6c6 ("mm/autonuma: use can_change_(pte|pmd)_writable() to 
>> replace savedwrite")
>>
>> With the change in this patch numa fault pte (pte_protnone()) gets mapped as 
>> regular user pte
>> with RWX cleared (no-access) whereas earlier it used to be mapped 
>> _PAGE_PRIVILEGED.
>>
>> Hash fault handling code did get some WARN_ON added because those
>> functions are not expected to get called with _PAGE_READ cleared.
>> commit 18061c17c8ec ("powerpc/mm: Update PROTFAULT handling in the page 
>> fault path")
>> explains the details.
>  
> You say "did get" which makes me think you're talking about the past.
> But I think you're referring to the WARN_ON you are adding in this patch?

That is correct. Will update this as "Hash fault handing code gets some
WARN_ON added in this patch ..." ?
>

>
>> Also revert commit 1abce0580b89 ("powerpc/64s: Fix __pte_needs_flush() false 
>> positive warning")
>
> That could be done separately as a follow-up couldn't it? Would reduce
> the diff size.
>

Will split that to a separate patch.


>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h  | 9 +++--
>>  arch/powerpc/include/asm/book3s/64/tlbflush.h | 9 ++---
>>  arch/powerpc/mm/book3s64/hash_utils.c | 7 +++
>>  3 files changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index cb77eddca54b..2cc58ac74080 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -17,12 +17,6 @@
>>  #define _PAGE_EXEC  0x1 /* execute permission */
>>  #define _PAGE_WRITE 0x2 /* write access allowed */
>>  #define _PAGE_READ  0x4 /* read access allowed */
>> -#define _PAGE_NA_PAGE_PRIVILEGED
>  
>> -#define _PAGE_NAX   _PAGE_EXEC
>> -#define _PAGE_RO_PAGE_READ
>> -#define _PAGE_ROX   (_PAGE_READ | _PAGE_EXEC)
>> -#define _PAGE_RW(_PAGE_READ | _PAGE_WRITE)
>> -#define _PAGE_RWX   (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>  
> Those are unrelated I think?
>

If we don't require _PAGE_NA we can fallback to generic version.


>>  #define _PAGE_PRIVILEGED0x8 /* kernel access only */
>>  #define _PAGE_SAO   0x00010 /* Strong access order */
>>  #define _PAGE_NON_IDEMPOTENT0x00020 /* non idempotent memory */
>> @@ -529,6 +523,9 @@ static inline bool pte_user(pte_t pte)
>>  }
>>  
>>  #define pte_access_permitted pte_access_permitted
>> +/*
>> + * execute-only mappings return false
>> + */
>
> That would fit better in the existing comment block inside the function
> I think. Normally this location would be a function description comment.
>

Will move.

>>  static inline bool pte_access_permitted(pte_t pte, bool write)
>>  {
>>  /*
>   ie. here
>
> cheers

Thanks
-aneesh

[PATCH v2 1/2] powerpc/book3s/hash: Drop _PAGE_PRIVILEGED from PAGE_NONE

2023-12-04 Thread aneesh . kumar

From: "Aneesh Kumar K.V (IBM)" 

There used to be a dependency on _PAGE_PRIVILEGED with pte_savedwrite.
But that got dropped by
commit 6a56ccbcf6c6 ("mm/autonuma: use can_change_(pte|pmd)_writable() to 
replace savedwrite")

With the change in this patch numa fault pte (pte_protnone()) gets mapped as 
regular user pte
with RWX cleared (no-access) whereas earlier it used to be mapped 
_PAGE_PRIVILEGED.

Hash fault handling code gets some WARN_ON added in this patch because
those functions are not expected to get called with _PAGE_READ cleared.
commit 18061c17c8ec ("powerpc/mm: Update PROTFAULT handling in the page
fault path") explains the details.

Signed-off-by: Aneesh Kumar K.V (IBM) 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 ++
 arch/powerpc/mm/book3s64/hash_utils.c|  7 +++
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cb77eddca54b..927d585652bc 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -17,12 +17,6 @@
 #define _PAGE_EXEC 0x1 /* execute permission */
 #define _PAGE_WRITE0x2 /* write access allowed */
 #define _PAGE_READ 0x4 /* read access allowed */
-#define _PAGE_NA   _PAGE_PRIVILEGED
-#define _PAGE_NAX  _PAGE_EXEC
-#define _PAGE_RO   _PAGE_READ
-#define _PAGE_ROX  (_PAGE_READ | _PAGE_EXEC)
-#define _PAGE_RW   (_PAGE_READ | _PAGE_WRITE)
-#define _PAGE_RWX  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 #define _PAGE_PRIVILEGED   0x8 /* kernel access only */
 #define _PAGE_SAO  0x00010 /* Strong access order */
 #define _PAGE_NON_IDEMPOTENT   0x00020 /* non idempotent memory */
@@ -532,8 +526,8 @@ static inline bool pte_user(pte_t pte)
 static inline bool pte_access_permitted(pte_t pte, bool write)
 {
/*
-* _PAGE_READ is needed for any access and will be
-* cleared for PROT_NONE
+* _PAGE_READ is needed for any access and will be cleared for
+* PROT_NONE. Execute-only mapping via PROT_EXEC also returns false.
 */
if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte))
return false;
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index ad2afa08e62e..0626a25b0d72 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -310,9 +310,16 @@ unsigned long htab_convert_pte_flags(unsigned long 
pteflags, unsigned long flags
else
rflags |= 0x3;
}
+   VM_WARN_ONCE(!(pteflags & _PAGE_RWX), "no-access mapping 
request");
} else {
if (pteflags & _PAGE_RWX)
rflags |= 0x2;
+   /*
+* We should never hit this in normal fault handling because
+* a permission check (check_pte_access()) will bubble this
+* to higher level linux handler even for PAGE_NONE.
+*/
+   VM_WARN_ONCE(!(pteflags & _PAGE_RWX), "no-access mapping 
request");
if (!((pteflags & _PAGE_WRITE) && (pteflags & _PAGE_DIRTY)))
rflags |= 0x1;
}
-- 
2.43.0

[PATCH v2 2/2] powerpc/book3s64: Avoid __pte_protnone() check in __pte_flags_need_flush()

2023-12-04 Thread aneesh . kumar

From: "Aneesh Kumar K.V (IBM)" 

This reverts commit 1abce0580b89 ("powerpc/64s: Fix __pte_needs_flush()
false positive warning")

The previous patch dropped the usage of _PAGE_PRIVILEGED with PAGE_NONE.
Hence this check can be dropped.

Signed-off-by: Aneesh Kumar K.V (IBM) 
---
 arch/powerpc/include/asm/book3s/64/tlbflush.h | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 1950c1b825b4..fd642b729775 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -158,11 +158,6 @@ static inline void flush_tlb_fix_spurious_fault(struct 
vm_area_struct *vma,
 */
 }
 
-static inline bool __pte_protnone(unsigned long pte)
-{
-   return (pte & (pgprot_val(PAGE_NONE) | _PAGE_RWX)) == 
pgprot_val(PAGE_NONE);
-}
-
 static inline bool __pte_flags_need_flush(unsigned long oldval,
  unsigned long newval)
 {
@@ -179,8 +174,8 @@ static inline bool __pte_flags_need_flush(unsigned long 
oldval,
/*
 * We do not expect kernel mappings or non-PTEs or not-present PTEs.
 */
-   VM_WARN_ON_ONCE(!__pte_protnone(oldval) && oldval & _PAGE_PRIVILEGED);
-   VM_WARN_ON_ONCE(!__pte_protnone(newval) && newval & _PAGE_PRIVILEGED);
+   VM_WARN_ON_ONCE(oldval & _PAGE_PRIVILEGED);
+   VM_WARN_ON_ONCE(newval & _PAGE_PRIVILEGED);
VM_WARN_ON_ONCE(!(oldval & _PAGE_PTE));
VM_WARN_ON_ONCE(!(newval & _PAGE_PTE));
VM_WARN_ON_ONCE(!(oldval & _PAGE_PRESENT));
-- 
2.43.0

Re: [PATCH] cxl: Fix null pointer dereference in cxl_get_fd

2023-12-04 Thread Frederic Barrat





On 04/12/2023 03:07, Kunwu Chan wrote:

kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.

Fixes: bdecf76e319a ("cxl: Fix coredump generation when cxl_get_fd() is used")
Signed-off-by: Kunwu Chan 
---
  drivers/misc/cxl/api.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index d85c56530863..bfd7ccd4d7e1 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -419,6 +419,10 @@ struct file *cxl_get_fd(struct cxl_context *ctx, struct 
file_operations *fops,
fops = (struct file_operations *)&afu_fops;
  
  	name = kasprintf(GFP_KERNEL, "cxl:%d", ctx->pe);

+   if (!name) {
+   put_unused_fd(fdtmp);
+   return ERR_PTR(-ENOMEM);
+   }



That works, but you might as well follow the existing error path:

name = kasprintf(GFP_KERNEL, "cxl:%d", ctx->pe);
if (!name)
goto err_fd;

  Fred



file = cxl_getfile(name, fops, ctx, flags);
kfree(name);
if (IS_ERR(file))

Re: [PATCH v5 2/4] PCI: layerscape: Add suspend/resume for ls1021a

2023-12-04 Thread Manivannan Sadhasivam

On Fri, Dec 01, 2023 at 11:17:10AM -0500, Frank Li wrote:
> Add suspend/resume support for Layerscape LS1021a.
> 
> In the suspend path, PME_Turn_Off message is sent to the endpoint to
> transition the link to L2/L3_Ready state. In this SoC, there is no way to
> check if the controller has received the PME_To_Ack from the endpoint or
> not. So to be on the safer side, the driver just waits for
> PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> bit to complete the PME_Turn_Off handshake. Then the link would enter L2/L3
> state depending on the VAUX supply.
> 
> In the resume path, the link is brought back from L2 to L0 by doing a
> software reset.
> 
> Signed-off-by: Frank Li 

One comment below. With that addressed,

Reviewed-by: Manivannan Sadhasivam 

> ---
> 
> Notes:
> Change from v4 to v5
> - update comit message
> - remove a empty line
> - use comments
> /* Reset the PEX wrapper to bring the link out of L2 */
> - pci->pp.ops = pcie->drvdata->ops,
> ls_pcie_host_ops to the "ops" member of layerscape_drvdata.
> - don't set pcie->scfg = NULL at error path
> 
> Change from v3 to v4
> - update commit message.
> - it is reset a glue logic part for PCI controller.
> - use regmap_write_bits() to reduce code change.
> 
> Change from v2 to v3
> - update according to mani's feedback
> change from v1 to v2
> - change subject 'a' to 'A'
> 
>  drivers/pci/controller/dwc/pci-layerscape.c | 81 -
>  1 file changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index aea89926bcc4f..8bdaae9be7d56 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -35,11 +35,19 @@
>  #define PF_MCR_PTOMR BIT(0)
>  #define PF_MCR_EXL2S BIT(1)
>  
> +/* LS1021A PEXn PM Write Control Register */
> +#define SCFG_PEXPMWRCR(idx)  (0x5c + (idx) * 0x64)
> +#define PMXMTTURNOFF BIT(31)
> +#define SCFG_PEXSFTRSTCR 0x190
> +#define PEXSR(idx)   BIT(idx)
> +
>  #define PCIE_IATU_NUM6
>  
>  struct ls_pcie_drvdata {
>   const u32 pf_off;
> + const struct dw_pcie_host_ops *ops;
>   int (*exit_from_l2)(struct dw_pcie_rp *pp);
> + bool scfg_support;
>   bool pm_support;
>  };
>  
> @@ -47,6 +55,8 @@ struct ls_pcie {
>   struct dw_pcie *pci;
>   const struct ls_pcie_drvdata *drvdata;
>   void __iomem *pf_base;
> + struct regmap *scfg;
> + int index;
>   bool big_endian;
>  };
>  
> @@ -171,18 +181,70 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp)
>   return 0;
>  }
>  
> +static void scfg_pcie_send_turnoff_msg(struct regmap *scfg, u32 reg, u32 
> mask)
> +{
> + /* Send PME_Turn_Off message */
> + regmap_write_bits(scfg, reg, mask, mask);
> +
> + /*
> +  * There is no specific register to check for PME_To_Ack from endpoint.
> +  * So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US.
> +  */
> + mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000);
> +
> + /*
> +  * Layerscape hardware reference manual recommends clearing the 
> PMXMTTURNOFF bit
> +  * to complete the PME_Turn_Off handshake.
> +  */
> + regmap_write_bits(scfg, reg, mask, 0);
> +}
> +
> +static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> +
> + scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), 
> PMXMTTURNOFF);
> +}
> +
> +static int scfg_pcie_exit_from_l2(struct regmap *scfg, u32 reg, u32 mask)
> +{
> + /* Reset the PEX wrapper to bring the link out of L2 */
> + regmap_write_bits(scfg, reg, mask, mask);
> + regmap_write_bits(scfg, reg, mask, 0);
> +
> + return 0;
> +}
> +
> +static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> +
> + return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> PEXSR(pcie->index));
> +}
> +
>  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
>   .host_init = ls_pcie_host_init,
>   .pme_turn_off = ls_pcie_send_turnoff_msg,
>  };
>  
> +static const struct dw_pcie_host_ops ls1021a_pcie_host_ops = {
> + .host_init = ls_pcie_host_init,
> + .pme_turn_off = ls1021a_pcie_send_turnoff_msg,
> +};
> +
>  static const struct ls_pcie_drvdata ls1021a_drvdata = {
> - .pm_support = false,
> + .pm_support = true,
> + .scfg_support = true,
> + .ops = &ls1021a_pcie_host_ops,
> + .exit_from_l2 = ls1021a_pcie_exit_from_l2,
>  };
>  
>  static const struct ls_pcie_drvdata layerscape_drvdata = {
>   .pf_off = 0xc,
>   .pm_support = true,
> + .ops = &ls_pcie_host_ops;
>   .exit_from_l2 = ls_pcie_exit_from_l2,
>  };
>

Re: [PATCH v5 3/4] PCI: layerscape(ep): Rename pf_* as pf_lut_*

2023-12-04 Thread Manivannan Sadhasivam

On Fri, Dec 01, 2023 at 11:17:11AM -0500, Frank Li wrote:
> 'pf' and 'lut' is just difference name in difference chips, but basic it is
> a MMIO base address plus an offset.
> 
> Rename it to avoid duplicate pf_* and lut_* in driver.
> 
> Signed-off-by: Frank Li 

One comment below. With that addressed,

Reviewed-by: Manivannan Sadhasivam 

> ---
> 
> Notes:
> pf_lut is better than pf_* or lut* because some chip use 'pf', some chip
> use 'lut'.
> 
> Change from v4 to v5
> - rename layerscape-ep code also
> change from v1 to v4
> - new patch at v3
> 
>  .../pci/controller/dwc/pci-layerscape-ep.c| 16 -
>  drivers/pci/controller/dwc/pci-layerscape.c   | 36 +--
>  2 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index 3d3c50ef4b6ff..2ca339f938a86 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -49,7 +49,7 @@ struct ls_pcie_ep {
>   boolbig_endian;
>  };
>  
> -static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
> +static u32 ls_pcie_pf_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
>  {
>   struct dw_pcie *pci = pcie->pci;
>  
> @@ -59,7 +59,7 @@ static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
>   return ioread32(pci->dbi_base + offset);
>  }
>  
> -static void ls_lut_writel(struct ls_pcie_ep *pcie, u32 offset, u32 value)
> +static void ls_pcie_pf_lut_writel(struct ls_pcie_ep *pcie, u32 offset, u32 
> value)
>  {
>   struct dw_pcie *pci = pcie->pci;
>  
> @@ -76,8 +76,8 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void 
> *dev_id)
>   u32 val, cfg;
>   u8 offset;
>  
> - val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
> - ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
> + val = ls_pcie_pf_lut_readl(pcie, PEX_PF0_PME_MES_DR);
> + ls_pcie_pf_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
>  
>   if (!val)
>   return IRQ_NONE;
> @@ -96,9 +96,9 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void 
> *dev_id)
>   dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, pcie->lnkcap);
>   dw_pcie_dbi_ro_wr_dis(pci);
>  
> - cfg = ls_lut_readl(pcie, PEX_PF0_CONFIG);
> + cfg = ls_pcie_pf_lut_readl(pcie, PEX_PF0_CONFIG);
>   cfg |= PEX_PF0_CFG_READY;
> - ls_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
> + ls_pcie_pf_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
>   dw_pcie_ep_linkup(&pci->ep);
>  
>   dev_dbg(pci->dev, "Link up\n");
> @@ -130,10 +130,10 @@ static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep 
> *pcie,
>   }
>  
>   /* Enable interrupts */
> - val = ls_lut_readl(pcie, PEX_PF0_PME_MES_IER);
> + val = ls_pcie_pf_lut_readl(pcie, PEX_PF0_PME_MES_IER);
>   val |=  PEX_PF0_PME_MES_IER_LDDIE | PEX_PF0_PME_MES_IER_HRDIE |
>   PEX_PF0_PME_MES_IER_LUDIE;
> - ls_lut_writel(pcie, PEX_PF0_PME_MES_IER, val);
> + ls_pcie_pf_lut_writel(pcie, PEX_PF0_PME_MES_IER, val);
>  
>   return 0;
>  }
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index 8bdaae9be7d56..a9151e98fde6f 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -44,7 +44,7 @@
>  #define PCIE_IATU_NUM6
>  
>  struct ls_pcie_drvdata {
> - const u32 pf_off;
> + const u32 pf_lut_off;
>   const struct dw_pcie_host_ops *ops;
>   int (*exit_from_l2)(struct dw_pcie_rp *pp);
>   bool scfg_support;
> @@ -54,13 +54,13 @@ struct ls_pcie_drvdata {
>  struct ls_pcie {
>   struct dw_pcie *pci;
>   const struct ls_pcie_drvdata *drvdata;
> - void __iomem *pf_base;
> + void __iomem *pf_lut_base;
>   struct regmap *scfg;
>   int index;
>   bool big_endian;
>  };
>  
> -#define ls_pcie_pf_readl_addr(addr)  ls_pcie_pf_readl(pcie, addr)
> +#define ls_pcie_pf_lut_readl_addr(addr)  ls_pcie_pf_lut_readl(pcie, addr)
>  #define to_ls_pcie(x)dev_get_drvdata((x)->dev)
>  
>  static bool ls_pcie_is_bridge(struct ls_pcie *pcie)
> @@ -101,20 +101,20 @@ static void ls_pcie_fix_error_response(struct ls_pcie 
> *pcie)
>   iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
>  }
>  
> -static u32 ls_pcie_pf_readl(struct ls_pcie *pcie, u32 off)
> +static u32 ls_pcie_pf_lut_readl(struct ls_pcie *pcie, u32 off)
>  {
>   if (pcie->big_endian)
> - return ioread32be(pcie->pf_base + off);
> + return ioread32be(pcie->pf_lut_base + off);
>  
> - return ioread32(pcie->pf_base + off);
> + return ioread32(pcie->pf_lut_base + off);
>  }
>  
> -static void ls_pcie_pf_writel(struct ls_pcie *pcie, u32 off, u32 val)
> +static void ls_pcie_pf_lut_writel(struct ls_

Re: [PATCH v5 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-12-04 Thread Manivannan Sadhasivam

On Fri, Dec 01, 2023 at 11:17:12AM -0500, Frank Li wrote:
> Add suspend/resume support for Layerscape LS1043a.
> 
> In the suspend path, PME_Turn_Off message is sent to the endpoint to
> transition the link to L2/L3_Ready state. In this SoC, there is no way to
> check if the controller has received the PME_To_Ack from the endpoint or
> not. So to be on the safer side, the driver just waits for
> PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> bit to complete the PME_Turn_Off handshake. Then the link would enter L2/L3
> state depending on the VAUX supply.
> 
> In the resume path, the link is brought back from L2 to L0 by doing a
> software reset.
> 
> Signed-off-by: Frank Li 

Reviewed-by: Manivannan Sadhasivam 

- Mani

> ---
> 
> Notes:
> Change from v4 to v5
> - update commit message
> - use comments
> /* Reset the PEX wrapper to bring the link out of L2 */
> 
> Change from v3 to v4
> - Call scfg_pcie_send_turnoff_msg() shared with ls1021a
> - update commit message
> 
> Change from v2 to v3
> - Remove ls_pcie_lut_readl(writel) function
> 
> Change from v1 to v2
> - Update subject 'a' to 'A'
> 
>  drivers/pci/controller/dwc/pci-layerscape.c | 63 -
>  1 file changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index a9151e98fde6f..715365e91f8ef 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -41,6 +41,15 @@
>  #define SCFG_PEXSFTRSTCR 0x190
>  #define PEXSR(idx)   BIT(idx)
>  
> +/* LS1043A PEX PME control register */
> +#define SCFG_PEXPMECR0x144
> +#define PEXPME(idx)  BIT(31 - (idx) * 4)
> +
> +/* LS1043A PEX LUT debug register */
> +#define LS_PCIE_LDBG 0x7fc
> +#define LDBG_SR  BIT(30)
> +#define LDBG_WE  BIT(31)
> +
>  #define PCIE_IATU_NUM6
>  
>  struct ls_pcie_drvdata {
> @@ -224,6 +233,45 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp 
> *pp)
>   return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> PEXSR(pcie->index));
>  }
>  
> +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> +
> + scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
> PEXPME(pcie->index));
> +}
> +
> +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> + u32 val;
> +
> + /*
> +  * Reset the PEX wrapper to bring the link out of L2.
> +  * LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
> both setting and
> +  *  clearing the soft reset on the PEX module.
> +  * LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
> +  */
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val |= LDBG_WE;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val |= LDBG_SR;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val &= ~LDBG_SR;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val &= ~LDBG_WE;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + return 0;
> +}
> +
>  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
>   .host_init = ls_pcie_host_init,
>   .pme_turn_off = ls_pcie_send_turnoff_msg,
> @@ -241,6 +289,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
>   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
>  };
>  
> +static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
> + .host_init = ls_pcie_host_init,
> + .pme_turn_off = ls1043a_pcie_send_turnoff_msg,
> +};
> +
> +static const struct ls_pcie_drvdata ls1043a_drvdata = {
> + .pf_lut_off = 0x1,
> + .pm_support = true,
> + .scfg_support = true,
> + .ops = &ls1043a_pcie_host_ops,
> + .exit_from_l2 = ls1043a_pcie_exit_from_l2,
> +};
> +
>  static const struct ls_pcie_drvdata layerscape_drvdata = {
>   .pf_lut_off = 0xc,
>   .pm_support = true,
> @@ -252,7 +313,7 @@ static const struct of_device_id ls_pcie_of_match[] = {
>   { .compatible = "fsl,ls1012a-pcie", .data = &layerscape_drvdata },
>   { .compatible = "fsl,ls1021a-pcie", .data = &ls1021a_drvdata },
>   { .compatible = "fsl,ls1028a-pcie", .data = &layerscape_drvdata },
> - { .compatible = "fsl,ls1043a-pcie", .data = &ls1021a_drvdata },
> + { .compatible = "fsl,ls1043a-pcie", .data = &ls1043a_drvdata },
>   { .compatible = "fsl,ls1046a-pcie", .data = &layerscape_drvdata },
>   { .compatible = "fsl,

Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-12-04 Thread Ryan Roberts

On 03/12/2023 13:33, Christophe Leroy wrote:
> 
> 
> Le 30/11/2023 à 22:30, Peter Xu a écrit :
>> On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
>>> On Fri, Nov 24, 2023 at 09:06:01AM +, Ryan Roberts wrote:
 I don't have any micro-benchmarks for GUP though, if that's your question. 
 Is
 there an easy-to-use test I can run to get some numbers? I'd be happy to 
 try it out.
>>>
>>> Thanks Ryan.  Then nothing is needed to be tested if gup is not yet touched
>>> from your side, afaict.  I'll see whether I can provide some rough numbers
>>> instead in the next post (I'll probably only be able to test it in a VM,
>>> though, but hopefully that should still reflect mostly the truth).
>>
>> An update: I finished a round of 64K cont_pte test, in the slow gup micro
>> benchmark I see ~15% perf degrade with this patchset applied on a VM on top
>> of Apple M1.
>>
>> Frankly that's even less than I expected, considering not only how slow gup
>> THP used to be, but also on the fact that that's a tight loop over slow
>> gup, which in normal cases shouldn't happen: "present" ptes normally goes
>> to fast-gup, while !present goes into a fault following it.  I assume
>> that's why nobody cared slow gup for THP before.  I think adding cont_pte
>> support shouldn't be very hard, but that will include making cont_pte idea
>> global just for arm64 and riscv Svnapot.
> 
> Is there any documentation on what cont_pte is ? I have always wondered 
> if it could also fit powerpc 8xx need ?

pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulte the
"contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
(AFAIK). The contiguous bit is a hint to the HW to tell it that a block of PTEs
are mapping a physically contiguous and naturally aligned piece of memory. The
HW can use this to coalesce entries in the TLB. When using 4K base pages, the
contpte size is 64K (16 PTEs). For 16K base pages, its 2M (128 PTEs) and for 64K
base pages, its 2M (32 PTEs).

> 
> On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4 
> PTE are flagged with the SPS bit telling it's a 16k pages, but for TLB 
> misses the HW needs one entrie for each 4k fragment.

>From that description, it sounds like the SPS bit might be similar to arm64
contiguous bit? Although sounds like you are currently using it in a slightly
different way - telling kernel that the base page is 16K but mapping each 16K
page with 4x 4K entries (plus the SPS bit set)?

> 
> There is also a similar approach for 512k pages, we have 128 contiguous 
> identical PTEs for them.
> 
> And whatever PAGE_SIZE is (either 4k or 16k), the HW needs one 'unsigned 
> long' pte for each 4k fragment. So at the time being when we define 
> PAGE_SIZE as 16k, we need a special pte_t which is a table of 4x 
> unsigned long.
> 
> Wondering if the cont_pte concept is similar and whether it could help.

To be honest, while I understand pte_cont() and friends, I don't understand
their relevance (or at least potential future relevance) to GUP?

> 
> Thanks
> Christophe

Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-12-04 Thread Christophe Leroy



Le 04/12/2023 à 12:11, Ryan Roberts a écrit :
> On 03/12/2023 13:33, Christophe Leroy wrote:
>>
>>
>> Le 30/11/2023 à 22:30, Peter Xu a écrit :
>>> On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
 On Fri, Nov 24, 2023 at 09:06:01AM +, Ryan Roberts wrote:
> I don't have any micro-benchmarks for GUP though, if that's your 
> question. Is
> there an easy-to-use test I can run to get some numbers? I'd be happy to 
> try it out.

 Thanks Ryan.  Then nothing is needed to be tested if gup is not yet touched
 from your side, afaict.  I'll see whether I can provide some rough numbers
 instead in the next post (I'll probably only be able to test it in a VM,
 though, but hopefully that should still reflect mostly the truth).
>>>
>>> An update: I finished a round of 64K cont_pte test, in the slow gup micro
>>> benchmark I see ~15% perf degrade with this patchset applied on a VM on top
>>> of Apple M1.
>>>
>>> Frankly that's even less than I expected, considering not only how slow gup
>>> THP used to be, but also on the fact that that's a tight loop over slow
>>> gup, which in normal cases shouldn't happen: "present" ptes normally goes
>>> to fast-gup, while !present goes into a fault following it.  I assume
>>> that's why nobody cared slow gup for THP before.  I think adding cont_pte
>>> support shouldn't be very hard, but that will include making cont_pte idea
>>> global just for arm64 and riscv Svnapot.
>>
>> Is there any documentation on what cont_pte is ? I have always wondered
>> if it could also fit powerpc 8xx need ?
> 
> pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulte the
> "contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
> (AFAIK). The contiguous bit is a hint to the HW to tell it that a block of 
> PTEs
> are mapping a physically contiguous and naturally aligned piece of memory. The
> HW can use this to coalesce entries in the TLB. When using 4K base pages, the
> contpte size is 64K (16 PTEs). For 16K base pages, its 2M (128 PTEs) and for 
> 64K
> base pages, its 2M (32 PTEs).
> 
>>
>> On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4
>> PTE are flagged with the SPS bit telling it's a 16k pages, but for TLB
>> misses the HW needs one entrie for each 4k fragment.
> 
>  From that description, it sounds like the SPS bit might be similar to arm64
> contiguous bit? Although sounds like you are currently using it in a slightly
> different way - telling kernel that the base page is 16K but mapping each 16K
> page with 4x 4K entries (plus the SPS bit set)?

Yes it's both.

When the base page is 16k, there are 4x 4k entries (with SPS bit set) in 
the page table, and pte_t is a table of 4 'unsigned long'

When the base page is 4k, there is a 16k hugepage size, which is the 
same 4x 4k entries with SPS bit set.

So it looks similar to the contiguous bit.


And by extension, the same principle is used for 512k hugepages, the bit 
_PAGE_HUGE is copied by the TLB miss handler into the lower bit of PS, 
PS being as follows:
- 00 Small (4 Kbyte or 16 Kbyte)
- 01 512 Kbyte
- 10 Reserved
- 11 8 Mbyte

So as PMD size is 4M, 512k pages are 128 identical consecutive entries 
in the page table.

I which I could have THP with 16k or 512k pages.

Christophe

Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-12-04 Thread Ryan Roberts

On 04/12/2023 11:25, Christophe Leroy wrote:
> 
> 
> Le 04/12/2023 à 12:11, Ryan Roberts a écrit :
>> On 03/12/2023 13:33, Christophe Leroy wrote:
>>>
>>>
>>> Le 30/11/2023 à 22:30, Peter Xu a écrit :
 On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
> On Fri, Nov 24, 2023 at 09:06:01AM +, Ryan Roberts wrote:
>> I don't have any micro-benchmarks for GUP though, if that's your 
>> question. Is
>> there an easy-to-use test I can run to get some numbers? I'd be happy to 
>> try it out.
>
> Thanks Ryan.  Then nothing is needed to be tested if gup is not yet 
> touched
> from your side, afaict.  I'll see whether I can provide some rough numbers
> instead in the next post (I'll probably only be able to test it in a VM,
> though, but hopefully that should still reflect mostly the truth).

 An update: I finished a round of 64K cont_pte test, in the slow gup micro
 benchmark I see ~15% perf degrade with this patchset applied on a VM on top
 of Apple M1.

 Frankly that's even less than I expected, considering not only how slow gup
 THP used to be, but also on the fact that that's a tight loop over slow
 gup, which in normal cases shouldn't happen: "present" ptes normally goes
 to fast-gup, while !present goes into a fault following it.  I assume
 that's why nobody cared slow gup for THP before.  I think adding cont_pte
 support shouldn't be very hard, but that will include making cont_pte idea
 global just for arm64 and riscv Svnapot.
>>>
>>> Is there any documentation on what cont_pte is ? I have always wondered
>>> if it could also fit powerpc 8xx need ?
>>
>> pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulte the
>> "contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
>> (AFAIK). The contiguous bit is a hint to the HW to tell it that a block of 
>> PTEs
>> are mapping a physically contiguous and naturally aligned piece of memory. 
>> The
>> HW can use this to coalesce entries in the TLB. When using 4K base pages, the
>> contpte size is 64K (16 PTEs). For 16K base pages, its 2M (128 PTEs) and for 
>> 64K
>> base pages, its 2M (32 PTEs).
>>
>>>
>>> On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4
>>> PTE are flagged with the SPS bit telling it's a 16k pages, but for TLB
>>> misses the HW needs one entrie for each 4k fragment.
>>
>>  From that description, it sounds like the SPS bit might be similar to arm64
>> contiguous bit? Although sounds like you are currently using it in a slightly
>> different way - telling kernel that the base page is 16K but mapping each 16K
>> page with 4x 4K entries (plus the SPS bit set)?
> 
> Yes it's both.
> 
> When the base page is 16k, there are 4x 4k entries (with SPS bit set) in 
> the page table, and pte_t is a table of 4 'unsigned long'
> 
> When the base page is 4k, there is a 16k hugepage size, which is the 
> same 4x 4k entries with SPS bit set.
> 
> So it looks similar to the contiguous bit.
> 
> 
> And by extension, the same principle is used for 512k hugepages, the bit 
> _PAGE_HUGE is copied by the TLB miss handler into the lower bit of PS, 
> PS being as follows:
> - 00 Small (4 Kbyte or 16 Kbyte)
> - 01 512 Kbyte
> - 10 Reserved
> - 11 8 Mbyte
> 
> So as PMD size is 4M, 512k pages are 128 identical consecutive entries 
> in the page table.
> 
> I which I could have THP with 16k or 512k pages.

Then you have come to the right place! :)

https://lore.kernel.org/linux-mm/20231204102027.57185-1-ryan.robe...@arm.com/


> 
> Christophe

Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-12-04 Thread Christophe Leroy



Le 04/12/2023 à 12:46, Ryan Roberts a écrit :
> On 04/12/2023 11:25, Christophe Leroy wrote:
>>
>>
>> Le 04/12/2023 à 12:11, Ryan Roberts a écrit :
>>> On 03/12/2023 13:33, Christophe Leroy wrote:


 Le 30/11/2023 à 22:30, Peter Xu a écrit :
> On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
>> On Fri, Nov 24, 2023 at 09:06:01AM +, Ryan Roberts wrote:
>>> I don't have any micro-benchmarks for GUP though, if that's your 
>>> question. Is
>>> there an easy-to-use test I can run to get some numbers? I'd be happy 
>>> to try it out.
>>
>> Thanks Ryan.  Then nothing is needed to be tested if gup is not yet 
>> touched
>> from your side, afaict.  I'll see whether I can provide some rough 
>> numbers
>> instead in the next post (I'll probably only be able to test it in a VM,
>> though, but hopefully that should still reflect mostly the truth).
>
> An update: I finished a round of 64K cont_pte test, in the slow gup micro
> benchmark I see ~15% perf degrade with this patchset applied on a VM on 
> top
> of Apple M1.
>
> Frankly that's even less than I expected, considering not only how slow 
> gup
> THP used to be, but also on the fact that that's a tight loop over slow
> gup, which in normal cases shouldn't happen: "present" ptes normally goes
> to fast-gup, while !present goes into a fault following it.  I assume
> that's why nobody cared slow gup for THP before.  I think adding cont_pte
> support shouldn't be very hard, but that will include making cont_pte idea
> global just for arm64 and riscv Svnapot.

 Is there any documentation on what cont_pte is ? I have always wondered
 if it could also fit powerpc 8xx need ?
>>>
>>> pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulte the
>>> "contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
>>> (AFAIK). The contiguous bit is a hint to the HW to tell it that a block of 
>>> PTEs
>>> are mapping a physically contiguous and naturally aligned piece of memory. 
>>> The
>>> HW can use this to coalesce entries in the TLB. When using 4K base pages, 
>>> the
>>> contpte size is 64K (16 PTEs). For 16K base pages, its 2M (128 PTEs) and 
>>> for 64K
>>> base pages, its 2M (32 PTEs).
>>>

 On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4
 PTE are flagged with the SPS bit telling it's a 16k pages, but for TLB
 misses the HW needs one entrie for each 4k fragment.
>>>
>>>   From that description, it sounds like the SPS bit might be similar to 
>>> arm64
>>> contiguous bit? Although sounds like you are currently using it in a 
>>> slightly
>>> different way - telling kernel that the base page is 16K but mapping each 
>>> 16K
>>> page with 4x 4K entries (plus the SPS bit set)?
>>
>> Yes it's both.
>>
>> When the base page is 16k, there are 4x 4k entries (with SPS bit set) in
>> the page table, and pte_t is a table of 4 'unsigned long'
>>
>> When the base page is 4k, there is a 16k hugepage size, which is the
>> same 4x 4k entries with SPS bit set.
>>
>> So it looks similar to the contiguous bit.
>>
>>
>> And by extension, the same principle is used for 512k hugepages, the bit
>> _PAGE_HUGE is copied by the TLB miss handler into the lower bit of PS,
>> PS being as follows:
>> - 00 Small (4 Kbyte or 16 Kbyte)
>> - 01 512 Kbyte
>> - 10 Reserved
>> - 11 8 Mbyte
>>
>> So as PMD size is 4M, 512k pages are 128 identical consecutive entries
>> in the page table.
>>
>> I which I could have THP with 16k or 512k pages.
> 
> Then you have come to the right place! :)
> 
> https://lore.kernel.org/linux-mm/20231204102027.57185-1-ryan.robe...@arm.com/
> 

That looks great. That series only modifies core mm/ .
No changes needed in arch ? Will it work on powerpc without any 
change/additions to arch code ?

Well, I'll try it soon.

Christophe

Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-12-04 Thread Ryan Roberts

On 04/12/2023 11:57, Christophe Leroy wrote:
> 
> 
> Le 04/12/2023 à 12:46, Ryan Roberts a écrit :
>> On 04/12/2023 11:25, Christophe Leroy wrote:
>>>
>>>
>>> Le 04/12/2023 à 12:11, Ryan Roberts a écrit :
 On 03/12/2023 13:33, Christophe Leroy wrote:
>
>
> Le 30/11/2023 à 22:30, Peter Xu a écrit :
>> On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
>>> On Fri, Nov 24, 2023 at 09:06:01AM +, Ryan Roberts wrote:
 I don't have any micro-benchmarks for GUP though, if that's your 
 question. Is
 there an easy-to-use test I can run to get some numbers? I'd be happy 
 to try it out.
>>>
>>> Thanks Ryan.  Then nothing is needed to be tested if gup is not yet 
>>> touched
>>> from your side, afaict.  I'll see whether I can provide some rough 
>>> numbers
>>> instead in the next post (I'll probably only be able to test it in a VM,
>>> though, but hopefully that should still reflect mostly the truth).
>>
>> An update: I finished a round of 64K cont_pte test, in the slow gup micro
>> benchmark I see ~15% perf degrade with this patchset applied on a VM on 
>> top
>> of Apple M1.
>>
>> Frankly that's even less than I expected, considering not only how slow 
>> gup
>> THP used to be, but also on the fact that that's a tight loop over slow
>> gup, which in normal cases shouldn't happen: "present" ptes normally goes
>> to fast-gup, while !present goes into a fault following it.  I assume
>> that's why nobody cared slow gup for THP before.  I think adding cont_pte
>> support shouldn't be very hard, but that will include making cont_pte 
>> idea
>> global just for arm64 and riscv Svnapot.
>
> Is there any documentation on what cont_pte is ? I have always wondered
> if it could also fit powerpc 8xx need ?

 pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulte the
 "contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
 (AFAIK). The contiguous bit is a hint to the HW to tell it that a block of 
 PTEs
 are mapping a physically contiguous and naturally aligned piece of memory. 
 The
 HW can use this to coalesce entries in the TLB. When using 4K base pages, 
 the
 contpte size is 64K (16 PTEs). For 16K base pages, its 2M (128 PTEs) and 
 for 64K
 base pages, its 2M (32 PTEs).

>
> On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4
> PTE are flagged with the SPS bit telling it's a 16k pages, but for TLB
> misses the HW needs one entrie for each 4k fragment.

   From that description, it sounds like the SPS bit might be similar to 
 arm64
 contiguous bit? Although sounds like you are currently using it in a 
 slightly
 different way - telling kernel that the base page is 16K but mapping each 
 16K
 page with 4x 4K entries (plus the SPS bit set)?
>>>
>>> Yes it's both.
>>>
>>> When the base page is 16k, there are 4x 4k entries (with SPS bit set) in
>>> the page table, and pte_t is a table of 4 'unsigned long'
>>>
>>> When the base page is 4k, there is a 16k hugepage size, which is the
>>> same 4x 4k entries with SPS bit set.
>>>
>>> So it looks similar to the contiguous bit.
>>>
>>>
>>> And by extension, the same principle is used for 512k hugepages, the bit
>>> _PAGE_HUGE is copied by the TLB miss handler into the lower bit of PS,
>>> PS being as follows:
>>> - 00 Small (4 Kbyte or 16 Kbyte)
>>> - 01 512 Kbyte
>>> - 10 Reserved
>>> - 11 8 Mbyte
>>>
>>> So as PMD size is 4M, 512k pages are 128 identical consecutive entries
>>> in the page table.
>>>
>>> I which I could have THP with 16k or 512k pages.
>>
>> Then you have come to the right place! :)
>>
>> https://lore.kernel.org/linux-mm/20231204102027.57185-1-ryan.robe...@arm.com/
>>
> 
> That looks great. That series only modifies core mm/ .
> No changes needed in arch ? Will it work on powerpc without any 
> change/additions to arch code ?

Yes there are also changes needed in arch; I have a separate series for arm64,
which transparently manages the contiguous bit when it sees appropriate PTEs:

https://lore.kernel.org/linux-arm-kernel/20231204105440.61448-1-ryan.robe...@arm.com/

> 
> Well, I'll try it soon.
> 
> Christophe

Re: [PATCH v3] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

2023-12-04 Thread IBM

Haren Myneni  writes:

> VAS allocate, modify and deallocate HCALLs returns
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
> delay and expects OS to reissue HCALL after that delay. But using
> msleep() will often sleep at least 20 msecs even though the
> hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.
> The open and close VAS window functions hold mutex and then issue
> these HCALLs. So these operations can take longer than the
> necessary when multiple threads issue open or close window APIs
> simultaneously.
>
> So instead of msleep(), use usleep_range() to ensure sleep with
> the expected value before issuing HCALL again.
>

Can you summarize if there an user observable impact for the current
code? We have other code paths using msleep(get_longbusy_msec()). Should
we audit those usages?


>
> Signed-off-by: Haren Myneni 
> Suggested-by: Nathan Lynch 
>
> ---
> v1 -> v2:
> - Use usleep_range instead of using RTAS sleep routine as
>   suggested by Nathan
> v2 -> v3:
> - Sleep 10MSecs even for HCALL delay > 10MSecs and the other
>   commit / comemnt changes as suggested by Nathan and Ellerman.
> ---
>  arch/powerpc/platforms/pseries/vas.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/vas.c 
> b/arch/powerpc/platforms/pseries/vas.c
> index 71d52a670d95..5cf81c564d4b 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -38,7 +38,30 @@ static long hcall_return_busy_check(long rc)
>  {
>   /* Check if we are stalled for some time */
>   if (H_IS_LONG_BUSY(rc)) {
> - msleep(get_longbusy_msecs(rc));
> + unsigned int ms;
> + /*
> +  * Allocate, Modify and Deallocate HCALLs returns
> +  * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
> +  * for the long delay. So the sleep time should always
> +  * be either 1 or 10msecs, but in case if the HCALL
> +  * returns the long delay > 10 msecs, clamp the sleep
> +  * time to 10msecs.
> +  */
> + ms = clamp(get_longbusy_msecs(rc), 1, 10);
> +
> + /*
> +  * msleep() will often sleep at least 20 msecs even
> +  * though the hypervisor suggests that the OS reissue
> +  * HCALLs after 1 or 10msecs. Also the delay hint from
> +  * the HCALL is just a suggestion. So OK to pause for
> +  * less time than the hinted delay. Use usleep_range()
> +  * to ensure we don't sleep much longer than actually
> +  * needed.
> +  *
> +  * See Documentation/timers/timers-howto.rst for
> +  * explanation of the range used here.
> +  */
> + usleep_range(ms * 100, ms * 1000);
>   rc = H_BUSY;
>   } else if (rc == H_BUSY) {
>   cond_resched();
> -- 
> 2.26.3

Re: [PATCH v3 5/7] kexec_file, ricv: print out debugging message if required

2023-12-04 Thread Baoquan He

On 12/01/23 at 10:38am, Conor Dooley wrote:
> On Thu, Nov 30, 2023 at 10:39:53AM +0800, Baoquan He wrote:
> 
> $subject has a typo in the arch bit :)

Indeed, will fix if need report. Thanks for careful checking.

> 
> > Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
> > loading related codes.
> 
> Commit messages should be understandable in isolation, but this only
> explains (part of) what is obvious in the diff. Why is this change
> being made?

The purpose has been detailedly described in cover letter and patch 1
log. Andrew has picked these patches into his tree and grabbed the cover
letter log into the relevant commit for people's later checking. All
these seven patches will be present in mainline together. This is common
way when posting patch series? Please let me know if I misunderstand
anything.
> 
> > 
> > And also remove kexec_image_info() because the content has been printed
> > out in generic code.
> > 
> > Signed-off-by: Baoquan He 
> > ---
> >  arch/riscv/kernel/elf_kexec.c | 11 ++-
> >  arch/riscv/kernel/machine_kexec.c | 26 --
> >  2 files changed, 6 insertions(+), 31 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
> > index e60fbd8660c4..5bd1ec3341fe 100644
> > --- a/arch/riscv/kernel/elf_kexec.c
> > +++ b/arch/riscv/kernel/elf_kexec.c
> > @@ -216,7 +216,6 @@ static void *elf_kexec_load(struct kimage *image, char 
> > *kernel_buf,
> > if (ret)
> > goto out;
> > kernel_start = image->start;
> > -   pr_notice("The entry point of kernel at 0x%lx\n", image->start);
> >  
> > /* Add the kernel binary to the image */
> > ret = riscv_kexec_elf_load(image, &ehdr, &elf_info,
> > @@ -252,8 +251,8 @@ static void *elf_kexec_load(struct kimage *image, char 
> > *kernel_buf,
> > image->elf_load_addr = kbuf.mem;
> > image->elf_headers_sz = headers_sz;
> >  
> > -   pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx 
> > memsz=0x%lx\n",
> > -image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
> > +   kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx 
> > memsz=0x%lx\n",
> > + image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
> >  
> > /* Setup cmdline for kdump kernel case */
> > modified_cmdline = setup_kdump_cmdline(image, cmdline,
> > @@ -275,6 +274,8 @@ static void *elf_kexec_load(struct kimage *image, char 
> > *kernel_buf,
> > pr_err("Error loading purgatory ret=%d\n", ret);
> > goto out;
> > }
> > +   kexec_dprintk("Loaded purgatory at 0x%lx\n", kbuf.mem);
> > +
> > ret = kexec_purgatory_get_set_symbol(image, "riscv_kernel_entry",
> >  &kernel_start,
> >  sizeof(kernel_start), 0);
> > @@ -293,7 +294,7 @@ static void *elf_kexec_load(struct kimage *image, char 
> > *kernel_buf,
> > if (ret)
> > goto out;
> > initrd_pbase = kbuf.mem;
> 
> > -   pr_notice("Loaded initrd at 0x%lx\n", initrd_pbase);
> > +   kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_pbase);
> 
> This is not a pr_debug().
> 
> > }
> >  
> > /* Add the DTB to the image */
> > @@ -318,7 +319,7 @@ static void *elf_kexec_load(struct kimage *image, char 
> > *kernel_buf,
> > }
> > /* Cache the fdt buffer address for memory cleanup */
> > image->arch.fdt = fdt;
> 
> > -   pr_notice("Loaded device tree at 0x%lx\n", kbuf.mem);
> > +   kexec_dprintk("Loaded device tree at 0x%lx\n", kbuf.mem);
> 
> Neither is this. Why are they being moved from pr_notice()?

You are right. 

While always printing out the loaded location of purgatory and
device tree doesn't make sense. It will be confusing when users
see these even when they do normal kexec/kdump loading. It should be
changed to pr_debug().

Which way do you suggest?
1) change it back to pr_debug(), fix it in another patch;
2) keep it as is in the patch;

Thanks
Baoquan

[PATCH v6 0/4] PCI: layerscape: Add suspend/resume support for ls1043 and ls1021

2023-12-04 Thread Frank Li

Add suspend/resume support for ls1043 and ls1021.

Change log see each patch

Frank Li (4):
  PCI: layerscape: Add function pointer for exit_from_l2()
  PCI: layerscape: Add suspend/resume for ls1021a
  PCI: layerscape(ep): Rename pf_* as pf_lut_*
  PCI: layerscape: Add suspend/resume for ls1043a

 .../pci/controller/dwc/pci-layerscape-ep.c|  16 +-
 drivers/pci/controller/dwc/pci-layerscape.c   | 191 +++---
 2 files changed, 176 insertions(+), 31 deletions(-)

-- 
2.34.1

[PATCH v6 1/4] PCI: layerscape: Add function pointer for exit_from_l2()

2023-12-04 Thread Frank Li

Since difference SoCs require different sequence for exiting L2, let's add
a separate "exit_from_l2()" callback. This callback can be used to execute
SoC specific sequence.

Change ls_pcie_exit_from_l2() return value from void to int. Return error
if exit_from_l2() failure at exit resume flow.

Acked-by: Roy Zang 
Reviewed-by: Manivannan Sadhasivam 
Signed-off-by: Frank Li 
---

Notes:
Change from v4 to v6
- none
Change from v3 to v4
- update commit message
  Add mani's review by tag
Change from v2 to v3
- fixed according to mani's feedback
  1. update commit message
  2. move dw_pcie_host_ops to next patch
  3. check return value from exit_from_l2()
Change from v1 to v2
- change subject 'a' to 'A'

Change from v1 to v2
- change subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 37956e09c65bd..aea89926bcc4f 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -39,6 +39,7 @@
 
 struct ls_pcie_drvdata {
const u32 pf_off;
+   int (*exit_from_l2)(struct dw_pcie_rp *pp);
bool pm_support;
 };
 
@@ -125,7 +126,7 @@ static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
dev_err(pcie->pci->dev, "PME_Turn_off timeout\n");
 }
 
-static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+static int ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
 {
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
struct ls_pcie *pcie = to_ls_pcie(pci);
@@ -150,6 +151,8 @@ static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
 1);
if (ret)
dev_err(pcie->pci->dev, "L2 exit timeout\n");
+
+   return ret;
 }
 
 static int ls_pcie_host_init(struct dw_pcie_rp *pp)
@@ -180,6 +183,7 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_off = 0xc,
.pm_support = true,
+   .exit_from_l2 = ls_pcie_exit_from_l2,
 };
 
 static const struct of_device_id ls_pcie_of_match[] = {
@@ -247,11 +251,14 @@ static int ls_pcie_suspend_noirq(struct device *dev)
 static int ls_pcie_resume_noirq(struct device *dev)
 {
struct ls_pcie *pcie = dev_get_drvdata(dev);
+   int ret;
 
if (!pcie->drvdata->pm_support)
return 0;
 
-   ls_pcie_exit_from_l2(&pcie->pci->pp);
+   ret = pcie->drvdata->exit_from_l2(&pcie->pci->pp);
+   if (ret)
+   return ret;
 
return dw_pcie_resume_noirq(pcie->pci);
 }
-- 
2.34.1

[PATCH v6 2/4] PCI: layerscape: Add suspend/resume for ls1021a

2023-12-04 Thread Frank Li

Add suspend/resume support for Layerscape LS1021a.

In the suspend path, PME_Turn_Off message is sent to the endpoint to
transition the link to L2/L3_Ready state. In this SoC, there is no way to
check if the controller has received the PME_To_Ack from the endpoint or
not. So to be on the safer side, the driver just waits for
PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
bit to complete the PME_Turn_Off handshake. Then the link would enter L2/L3
state depending on the VAUX supply.

In the resume path, the link is brought back from L2 to L0 by doing a
software reset.

Acked-by: Roy Zang 
Reviewed-by: Manivannan Sadhasivam 
Signed-off-by: Frank Li 
---

Notes:
Change from v5 to v6
- remove reduntant pci->pp.ops = &ls_pcie_host_ops;
Change from v4 to v5
- update comit message
- remove a empty line
- use comments
/* Reset the PEX wrapper to bring the link out of L2 */
- pci->pp.ops = pcie->drvdata->ops,
ls_pcie_host_ops to the "ops" member of layerscape_drvdata.
- don't set pcie->scfg = NULL at error path

Change from v3 to v4
- update commit message.
- it is reset a glue logic part for PCI controller.
- use regmap_write_bits() to reduce code change.

Change from v2 to v3
- update according to mani's feedback
change from v1 to v2
- change subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 83 -
 1 file changed, 80 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index aea89926bcc4f..711563777aeba 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -35,11 +35,19 @@
 #define PF_MCR_PTOMR   BIT(0)
 #define PF_MCR_EXL2S   BIT(1)
 
+/* LS1021A PEXn PM Write Control Register */
+#define SCFG_PEXPMWRCR(idx)(0x5c + (idx) * 0x64)
+#define PMXMTTURNOFF   BIT(31)
+#define SCFG_PEXSFTRSTCR   0x190
+#define PEXSR(idx) BIT(idx)
+
 #define PCIE_IATU_NUM  6
 
 struct ls_pcie_drvdata {
const u32 pf_off;
+   const struct dw_pcie_host_ops *ops;
int (*exit_from_l2)(struct dw_pcie_rp *pp);
+   bool scfg_support;
bool pm_support;
 };
 
@@ -47,6 +55,8 @@ struct ls_pcie {
struct dw_pcie *pci;
const struct ls_pcie_drvdata *drvdata;
void __iomem *pf_base;
+   struct regmap *scfg;
+   int index;
bool big_endian;
 };
 
@@ -171,18 +181,70 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp)
return 0;
 }
 
+static void scfg_pcie_send_turnoff_msg(struct regmap *scfg, u32 reg, u32 mask)
+{
+   /* Send PME_Turn_Off message */
+   regmap_write_bits(scfg, reg, mask, mask);
+
+   /*
+* There is no specific register to check for PME_To_Ack from endpoint.
+* So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US.
+*/
+   mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000);
+
+   /*
+* Layerscape hardware reference manual recommends clearing the 
PMXMTTURNOFF bit
+* to complete the PME_Turn_Off handshake.
+*/
+   regmap_write_bits(scfg, reg, mask, 0);
+}
+
+static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+
+   scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), 
PMXMTTURNOFF);
+}
+
+static int scfg_pcie_exit_from_l2(struct regmap *scfg, u32 reg, u32 mask)
+{
+   /* Reset the PEX wrapper to bring the link out of L2 */
+   regmap_write_bits(scfg, reg, mask, mask);
+   regmap_write_bits(scfg, reg, mask, 0);
+
+   return 0;
+}
+
+static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+
+   return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
PEXSR(pcie->index));
+}
+
 static const struct dw_pcie_host_ops ls_pcie_host_ops = {
.host_init = ls_pcie_host_init,
.pme_turn_off = ls_pcie_send_turnoff_msg,
 };
 
+static const struct dw_pcie_host_ops ls1021a_pcie_host_ops = {
+   .host_init = ls_pcie_host_init,
+   .pme_turn_off = ls1021a_pcie_send_turnoff_msg,
+};
+
 static const struct ls_pcie_drvdata ls1021a_drvdata = {
-   .pm_support = false,
+   .pm_support = true,
+   .scfg_support = true,
+   .ops = &ls1021a_pcie_host_ops,
+   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
 };
 
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_off = 0xc,
.pm_support = true,
+   .ops = &ls_pcie_host_ops,
.exit_from_l2 = ls_pcie_exit_from_l2,
 };
 
@@ -205,6 +267,8 @@ static int ls_pcie_probe(struct platform_device *pdev)
struct dw_pcie *pci;
struct ls_pcie *pcie;
struct resource *dbi_base;
+   u32 index[2];
+

[PATCH v6 3/4] PCI: layerscape(ep): Rename pf_* as pf_lut_*

2023-12-04 Thread Frank Li

'pf' and 'lut' is just difference name in difference chips, but basic it is
a MMIO base address plus an offset.

Rename it to avoid duplicate pf_* and lut_* in driver.

Reviewed-by: Manivannan Sadhasivam 
Acked-by: Roy Zang 
Signed-off-by: Frank Li 
---

Notes:
pf_lut is better than pf_* or lut* because some chip use 'pf', some chip
use 'lut'.

Change from v5 to v6
move to previous patch
-> -.ops = &ls_pcie_host_ops;
> + .ops = &ls_pcie_host_ops,

Change from v4 to v5
- rename layerscape-ep code also
change from v1 to v4
- new patch at v3

 .../pci/controller/dwc/pci-layerscape-ep.c| 16 -
 drivers/pci/controller/dwc/pci-layerscape.c   | 34 +--
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 3d3c50ef4b6ff..2ca339f938a86 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -49,7 +49,7 @@ struct ls_pcie_ep {
boolbig_endian;
 };
 
-static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
+static u32 ls_pcie_pf_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
 {
struct dw_pcie *pci = pcie->pci;
 
@@ -59,7 +59,7 @@ static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
return ioread32(pci->dbi_base + offset);
 }
 
-static void ls_lut_writel(struct ls_pcie_ep *pcie, u32 offset, u32 value)
+static void ls_pcie_pf_lut_writel(struct ls_pcie_ep *pcie, u32 offset, u32 
value)
 {
struct dw_pcie *pci = pcie->pci;
 
@@ -76,8 +76,8 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void 
*dev_id)
u32 val, cfg;
u8 offset;
 
-   val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
-   ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
+   val = ls_pcie_pf_lut_readl(pcie, PEX_PF0_PME_MES_DR);
+   ls_pcie_pf_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
 
if (!val)
return IRQ_NONE;
@@ -96,9 +96,9 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void 
*dev_id)
dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, pcie->lnkcap);
dw_pcie_dbi_ro_wr_dis(pci);
 
-   cfg = ls_lut_readl(pcie, PEX_PF0_CONFIG);
+   cfg = ls_pcie_pf_lut_readl(pcie, PEX_PF0_CONFIG);
cfg |= PEX_PF0_CFG_READY;
-   ls_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
+   ls_pcie_pf_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
dw_pcie_ep_linkup(&pci->ep);
 
dev_dbg(pci->dev, "Link up\n");
@@ -130,10 +130,10 @@ static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep 
*pcie,
}
 
/* Enable interrupts */
-   val = ls_lut_readl(pcie, PEX_PF0_PME_MES_IER);
+   val = ls_pcie_pf_lut_readl(pcie, PEX_PF0_PME_MES_IER);
val |=  PEX_PF0_PME_MES_IER_LDDIE | PEX_PF0_PME_MES_IER_HRDIE |
PEX_PF0_PME_MES_IER_LUDIE;
-   ls_lut_writel(pcie, PEX_PF0_PME_MES_IER, val);
+   ls_pcie_pf_lut_writel(pcie, PEX_PF0_PME_MES_IER, val);
 
return 0;
 }
diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 711563777aeba..f3dfb70066fb7 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -44,7 +44,7 @@
 #define PCIE_IATU_NUM  6
 
 struct ls_pcie_drvdata {
-   const u32 pf_off;
+   const u32 pf_lut_off;
const struct dw_pcie_host_ops *ops;
int (*exit_from_l2)(struct dw_pcie_rp *pp);
bool scfg_support;
@@ -54,13 +54,13 @@ struct ls_pcie_drvdata {
 struct ls_pcie {
struct dw_pcie *pci;
const struct ls_pcie_drvdata *drvdata;
-   void __iomem *pf_base;
+   void __iomem *pf_lut_base;
struct regmap *scfg;
int index;
bool big_endian;
 };
 
-#define ls_pcie_pf_readl_addr(addr)ls_pcie_pf_readl(pcie, addr)
+#define ls_pcie_pf_lut_readl_addr(addr)ls_pcie_pf_lut_readl(pcie, addr)
 #define to_ls_pcie(x)  dev_get_drvdata((x)->dev)
 
 static bool ls_pcie_is_bridge(struct ls_pcie *pcie)
@@ -101,20 +101,20 @@ static void ls_pcie_fix_error_response(struct ls_pcie 
*pcie)
iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
 }
 
-static u32 ls_pcie_pf_readl(struct ls_pcie *pcie, u32 off)
+static u32 ls_pcie_pf_lut_readl(struct ls_pcie *pcie, u32 off)
 {
if (pcie->big_endian)
-   return ioread32be(pcie->pf_base + off);
+   return ioread32be(pcie->pf_lut_base + off);
 
-   return ioread32(pcie->pf_base + off);
+   return ioread32(pcie->pf_lut_base + off);
 }
 
-static void ls_pcie_pf_writel(struct ls_pcie *pcie, u32 off, u32 val)
+static void ls_pcie_pf_lut_writel(struct ls_pcie *pcie, u32 off, u32 val)
 {
if (pcie->big_endian)
-   iowrite32be(val, pcie->pf_base + off);
+

[PATCH v6 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-12-04 Thread Frank Li

Add suspend/resume support for Layerscape LS1043a.

In the suspend path, PME_Turn_Off message is sent to the endpoint to
transition the link to L2/L3_Ready state. In this SoC, there is no way to
check if the controller has received the PME_To_Ack from the endpoint or
not. So to be on the safer side, the driver just waits for
PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
bit to complete the PME_Turn_Off handshake. Then the link would enter L2/L3
state depending on the VAUX supply.

In the resume path, the link is brought back from L2 to L0 by doing a
software reset.

Acked-by: Roy Zang 
Reviewed-by: Manivannan Sadhasivam 
Signed-off-by: Frank Li 
---

Notes:
Chagne from v5 to v6
- none
Change from v4 to v5
- update commit message
- use comments
/* Reset the PEX wrapper to bring the link out of L2 */

Change from v3 to v4
- Call scfg_pcie_send_turnoff_msg() shared with ls1021a
- update commit message

Change from v2 to v3
- Remove ls_pcie_lut_readl(writel) function

Change from v1 to v2
- Update subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 63 -
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index f3dfb70066fb7..7cdada200de7e 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -41,6 +41,15 @@
 #define SCFG_PEXSFTRSTCR   0x190
 #define PEXSR(idx) BIT(idx)
 
+/* LS1043A PEX PME control register */
+#define SCFG_PEXPMECR  0x144
+#define PEXPME(idx)BIT(31 - (idx) * 4)
+
+/* LS1043A PEX LUT debug register */
+#define LS_PCIE_LDBG   0x7fc
+#define LDBG_SRBIT(30)
+#define LDBG_WEBIT(31)
+
 #define PCIE_IATU_NUM  6
 
 struct ls_pcie_drvdata {
@@ -224,6 +233,45 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
PEXSR(pcie->index));
 }
 
+static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+
+   scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
PEXPME(pcie->index));
+}
+
+static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+   u32 val;
+
+   /*
+* Reset the PEX wrapper to bring the link out of L2.
+* LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
both setting and
+*  clearing the soft reset on the PEX module.
+* LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
+*/
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val |= LDBG_WE;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val |= LDBG_SR;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val &= ~LDBG_SR;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val &= ~LDBG_WE;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   return 0;
+}
+
 static const struct dw_pcie_host_ops ls_pcie_host_ops = {
.host_init = ls_pcie_host_init,
.pme_turn_off = ls_pcie_send_turnoff_msg,
@@ -241,6 +289,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
.exit_from_l2 = ls1021a_pcie_exit_from_l2,
 };
 
+static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
+   .host_init = ls_pcie_host_init,
+   .pme_turn_off = ls1043a_pcie_send_turnoff_msg,
+};
+
+static const struct ls_pcie_drvdata ls1043a_drvdata = {
+   .pf_lut_off = 0x1,
+   .pm_support = true,
+   .scfg_support = true,
+   .ops = &ls1043a_pcie_host_ops,
+   .exit_from_l2 = ls1043a_pcie_exit_from_l2,
+};
+
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_lut_off = 0xc,
.pm_support = true,
@@ -252,7 +313,7 @@ static const struct of_device_id ls_pcie_of_match[] = {
{ .compatible = "fsl,ls1012a-pcie", .data = &layerscape_drvdata },
{ .compatible = "fsl,ls1021a-pcie", .data = &ls1021a_drvdata },
{ .compatible = "fsl,ls1028a-pcie", .data = &layerscape_drvdata },
-   { .compatible = "fsl,ls1043a-pcie", .data = &ls1021a_drvdata },
+   { .compatible = "fsl,ls1043a-pcie", .data = &ls1043a_drvdata },
{ .compatible = "fsl,ls1046a-pcie", .data = &layerscape_drvdata },
{ .compatible = "fsl,ls2080a-pcie", .data = &layerscape_drvdata },
{ .compatible = "fsl,ls2085a-pcie", .data = &layerscape_drvdata },
-- 
2.34.1

Re: [PATCH v3 5/7] kexec_file, ricv: print out debugging message if required

2023-12-04 Thread Conor Dooley

On Mon, Dec 04, 2023 at 11:38:05PM +0800, Baoquan He wrote:
> On 12/01/23 at 10:38am, Conor Dooley wrote:
> > On Thu, Nov 30, 2023 at 10:39:53AM +0800, Baoquan He wrote:
> > 
> > $subject has a typo in the arch bit :)
> 
> Indeed, will fix if need report. Thanks for careful checking.
> 
> > 
> > > Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
> > > loading related codes.
> > 
> > Commit messages should be understandable in isolation, but this only
> > explains (part of) what is obvious in the diff. Why is this change
> > being made?
> 
> The purpose has been detailedly described in cover letter and patch 1
> log. Andrew has picked these patches into his tree and grabbed the cover
> letter log into the relevant commit for people's later checking. All
> these seven patches will be present in mainline together. This is common
> way when posting patch series? Please let me know if I misunderstand
> anything.

Each patch having a commit message that explains why a change is being
made is the expectation. It is especially useful to explain the why
here, since it is not just a mechanical conversion of pr_debug()s as the
commit message suggests.

> > 
> > > 
> > > And also remove kexec_image_info() because the content has been printed
> > > out in generic code.
> > > 
> > > Signed-off-by: Baoquan He 
> > > ---
> > >  arch/riscv/kernel/elf_kexec.c | 11 ++-
> > >  arch/riscv/kernel/machine_kexec.c | 26 --
> > >  2 files changed, 6 insertions(+), 31 deletions(-)
> > > 
> > > diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
> > > index e60fbd8660c4..5bd1ec3341fe 100644
> > > --- a/arch/riscv/kernel/elf_kexec.c
> > > +++ b/arch/riscv/kernel/elf_kexec.c
> > > @@ -216,7 +216,6 @@ static void *elf_kexec_load(struct kimage *image, 
> > > char *kernel_buf,
> > >   if (ret)
> > >   goto out;
> > >   kernel_start = image->start;
> > > - pr_notice("The entry point of kernel at 0x%lx\n", image->start);
> > >  
> > >   /* Add the kernel binary to the image */
> > >   ret = riscv_kexec_elf_load(image, &ehdr, &elf_info,
> > > @@ -252,8 +251,8 @@ static void *elf_kexec_load(struct kimage *image, 
> > > char *kernel_buf,
> > >   image->elf_load_addr = kbuf.mem;
> > >   image->elf_headers_sz = headers_sz;
> > >  
> > > - pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx 
> > > memsz=0x%lx\n",
> > > -  image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
> > > + kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx 
> > > memsz=0x%lx\n",
> > > +   image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
> > >  
> > >   /* Setup cmdline for kdump kernel case */
> > >   modified_cmdline = setup_kdump_cmdline(image, cmdline,
> > > @@ -275,6 +274,8 @@ static void *elf_kexec_load(struct kimage *image, 
> > > char *kernel_buf,
> > >   pr_err("Error loading purgatory ret=%d\n", ret);
> > >   goto out;
> > >   }
> > > + kexec_dprintk("Loaded purgatory at 0x%lx\n", kbuf.mem);
> > > +
> > >   ret = kexec_purgatory_get_set_symbol(image, "riscv_kernel_entry",
> > >&kernel_start,
> > >sizeof(kernel_start), 0);
> > > @@ -293,7 +294,7 @@ static void *elf_kexec_load(struct kimage *image, 
> > > char *kernel_buf,
> > >   if (ret)
> > >   goto out;
> > >   initrd_pbase = kbuf.mem;
> > 
> > > - pr_notice("Loaded initrd at 0x%lx\n", initrd_pbase);
> > > + kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_pbase);
> > 
> > This is not a pr_debug().
> > 
> > >   }
> > >  
> > >   /* Add the DTB to the image */
> > > @@ -318,7 +319,7 @@ static void *elf_kexec_load(struct kimage *image, 
> > > char *kernel_buf,
> > >   }
> > >   /* Cache the fdt buffer address for memory cleanup */
> > >   image->arch.fdt = fdt;
> > 
> > > - pr_notice("Loaded device tree at 0x%lx\n", kbuf.mem);
> > > + kexec_dprintk("Loaded device tree at 0x%lx\n", kbuf.mem);
> > 
> > Neither is this. Why are they being moved from pr_notice()?
> 
> You are right. 
> 
> While always printing out the loaded location of purgatory and
> device tree doesn't make sense. It will be confusing when users
> see these even when they do normal kexec/kdump loading. It should be
> changed to pr_debug().
> 
> Which way do you suggest?
> 1) change it back to pr_debug(), fix it in another patch;
> 2) keep it as is in the patch;

Personally I think it is fine to change them all in one patch, but the
rationale for converting pr_notice() to your new debug infrastructure
needs to be in the commit message.

Thanks,
Conor.


signature.asc
Description: PGP signature

Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-12-04 Thread Peter Xu

On Mon, Dec 04, 2023 at 11:11:26AM +, Ryan Roberts wrote:
> To be honest, while I understand pte_cont() and friends, I don't understand
> their relevance (or at least potential future relevance) to GUP?

GUP in general can be smarter to recognize if a pte/pmd is a cont_pte and
fetch the whole pte/pmd range if the caller specified.  Now it loops over
each pte/pmd.

Fast-gup is better as it at least doesn't take pgtable lock, for cont_pte
it looks inside gup_pte_range() which is good enough, but it'll still do
folio checks for each sub-pte, even though the 2nd+ folio checks should be
mostly the same (if to ignore races when the folio changed within the time
of processing the cont_pte chunk).

Slow-gup (as of what this series is about so far) doesn't do that either,
for each cont_pte whole entry it'll loop N times, frequently taking and
releasing the pgtable lock.  A smarter slow-gup can fundamentallly setup
follow_page_context.page_mask if it sees a cont_pte.  There might be a
challenge on whether holding the head page's refcount would stablize the
whole folio, but that may be another question to ask.

I think I also overlooked that PPC_8XX also has cont_pte support, so we
actually have three users indeed, if not counting potential future archs
adding support to also get that same tlb benefit.

Thanks,

-- 
Peter Xu

[PATCH v2 02/10] leds: aw2013: unlock mutex before destroying it

2023-12-04 Thread George Stark

In the probe() callback in case of error mutex is destroyed being locked
which is not allowed so unlock the mute before destroying.

Signed-off-by: George Stark 
---
 drivers/leds/leds-aw2013.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/leds/leds-aw2013.c b/drivers/leds/leds-aw2013.c
index 59765640b70f..c2bc0782c0cd 100644
--- a/drivers/leds/leds-aw2013.c
+++ b/drivers/leds/leds-aw2013.c
@@ -397,6 +397,7 @@ static int aw2013_probe(struct i2c_client *client)
regulator_disable(chip->vcc_regulator);
 
 error:
+   mutex_unlock(&chip->mutex);
mutex_destroy(&chip->mutex);
return ret;
 }
-- 
2.38.4

[PATCH v2 01/10] devm-helpers: introduce devm_mutex_init

2023-12-04 Thread George Stark

Using of devm API leads to certain order of releasing resources.
So all dependent resources which are not devm-wrapped should be deleted
with respect to devm-release order. Mutex is one of such objects that
often is bound to other resources and has no own devm wrapping.
Since mutex_destroy() actually does nothing in non-debug builds
frequently calling mutex_destroy() is just ignored which is safe for now
but wrong formally and can lead to a problem if mutex_destroy() is
extended so introduce devm_mutex_init().

Signed-off-by: George Stark 
---
 include/linux/devm-helpers.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/devm-helpers.h b/include/linux/devm-helpers.h
index 74891802200d..2f56e476776f 100644
--- a/include/linux/devm-helpers.h
+++ b/include/linux/devm-helpers.h
@@ -76,4 +76,22 @@ static inline int devm_work_autocancel(struct device *dev,
return devm_add_action(dev, devm_work_drop, w);
 }
 
+static inline void devm_mutex_release(void *res)
+{
+   mutex_destroy(res);
+}
+
+/**
+ * devm_mutex_init - Resource-managed mutex initialization
+ * @dev:   Device which lifetime work is bound to
+ * @lock:  Pointer to a mutex
+ *
+ * Initialize mutex which is automatically destroyed when driver is detached.
+ */
+static inline int devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   mutex_init(lock);
+   return devm_add_action_or_reset(dev, devm_mutex_release, lock);
+}
+
 #endif
-- 
2.38.4

[PATCH v2 04/10] leds: aw200xx: use devm API to cleanup module's resources

2023-12-04 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().

Signed-off-by: George Stark 
---
 drivers/leds/leds-aw200xx.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/leds/leds-aw200xx.c b/drivers/leds/leds-aw200xx.c
index 1d3943f86f7f..b1a097c7c879 100644
--- a/drivers/leds/leds-aw200xx.c
+++ b/drivers/leds/leds-aw200xx.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -530,6 +531,20 @@ static const struct regmap_config aw200xx_regmap_config = {
.disable_locking = true,
 };
 
+static void aw200xx_chip_reset_action(void *data)
+{
+   const struct aw200xx *chip = (struct aw200xx *)data;
+
+   aw200xx_chip_reset(chip);
+}
+
+static void aw200xx_disable_action(void *data)
+{
+   const struct aw200xx *chip = (struct aw200xx *)data;
+
+   aw200xx_disable(chip);
+}
+
 static int aw200xx_probe(struct i2c_client *client)
 {
const struct aw200xx_chipdef *cdef;
@@ -568,11 +583,16 @@ static int aw200xx_probe(struct i2c_client *client)
 
aw200xx_enable(chip);
 
+   ret = devm_add_action(&client->dev, aw200xx_disable_action, chip);
+   if (ret)
+   return ret;
+
ret = aw200xx_chip_check(chip);
if (ret)
return ret;
 
-   mutex_init(&chip->mutex);
+   if (devm_mutex_init(&client->dev, &chip->mutex))
+   return -ENOMEM;
 
/* Need a lock now since after call aw200xx_probe_fw, sysfs nodes 
created */
mutex_lock(&chip->mutex);
@@ -581,6 +601,10 @@ static int aw200xx_probe(struct i2c_client *client)
if (ret)
goto out_unlock;
 
+   ret = devm_add_action(&client->dev, aw200xx_chip_reset_action, chip);
+   if (ret)
+   goto out_unlock;
+
ret = aw200xx_probe_fw(&client->dev, chip);
if (ret)
goto out_unlock;
@@ -595,15 +619,6 @@ static int aw200xx_probe(struct i2c_client *client)
return ret;
 }
 
-static void aw200xx_remove(struct i2c_client *client)
-{
-   struct aw200xx *chip = i2c_get_clientdata(client);
-
-   aw200xx_chip_reset(chip);
-   aw200xx_disable(chip);
-   mutex_destroy(&chip->mutex);
-}
-
 static const struct aw200xx_chipdef aw20036_cdef = {
.channels = 36,
.display_size_rows_max = 3,
@@ -652,7 +667,6 @@ static struct i2c_driver aw200xx_driver = {
.of_match_table = aw200xx_match_table,
},
.probe_new = aw200xx_probe,
-   .remove = aw200xx_remove,
.id_table = aw200xx_id,
 };
 module_i2c_driver(aw200xx_driver);
-- 
2.38.4

[PATCH v2 03/10] leds: aw2013: use devm API to cleanup module's resources

2023-12-04 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().

Signed-off-by: George Stark 
---
 drivers/leds/leds-aw2013.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/leds/leds-aw2013.c b/drivers/leds/leds-aw2013.c
index c2bc0782c0cd..1a8acf303548 100644
--- a/drivers/leds/leds-aw2013.c
+++ b/drivers/leds/leds-aw2013.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
 // Driver for Awinic AW2013 3-channel LED driver
 
+#include 
 #include 
 #include 
 #include 
@@ -318,6 +319,13 @@ static int aw2013_probe_dt(struct aw2013 *chip)
return 0;
 }
 
+static void aw2013_chip_disable_action(void *data)
+{
+   struct aw2013 *chip = (struct aw2013 *)data;
+
+   aw2013_chip_disable(chip);
+}
+
 static const struct regmap_config aw2013_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
@@ -334,7 +342,9 @@ static int aw2013_probe(struct i2c_client *client)
if (!chip)
return -ENOMEM;
 
-   mutex_init(&chip->mutex);
+   if (devm_mutex_init(&client->dev, &chip->mutex))
+   return -ENOMEM;
+
mutex_lock(&chip->mutex);
 
chip->client = client;
@@ -378,6 +388,10 @@ static int aw2013_probe(struct i2c_client *client)
goto error_reg;
}
 
+   ret = devm_add_action(&client->dev, aw2013_chip_disable_action, chip);
+   if (ret)
+   goto error_reg;
+
ret = aw2013_probe_dt(chip);
if (ret < 0)
goto error_reg;
@@ -398,19 +412,9 @@ static int aw2013_probe(struct i2c_client *client)
 
 error:
mutex_unlock(&chip->mutex);
-   mutex_destroy(&chip->mutex);
return ret;
 }
 
-static void aw2013_remove(struct i2c_client *client)
-{
-   struct aw2013 *chip = i2c_get_clientdata(client);
-
-   aw2013_chip_disable(chip);
-
-   mutex_destroy(&chip->mutex);
-}
-
 static const struct of_device_id aw2013_match_table[] = {
{ .compatible = "awinic,aw2013", },
{ /* sentinel */ },
@@ -424,7 +428,6 @@ static struct i2c_driver aw2013_driver = {
.of_match_table = of_match_ptr(aw2013_match_table),
},
.probe = aw2013_probe,
-   .remove = aw2013_remove,
 };
 
 module_i2c_driver(aw2013_driver);
-- 
2.38.4

[PATCH v2 05/10] leds: lp3952: use devm API to cleanup module's resources

2023-12-04 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().
Also drop explicit turning LEDs off from remove() due to they will be off
anyway by led_classdev_unregister().

Signed-off-by: George Stark 
---
 drivers/leds/leds-lp3952.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/leds/leds-lp3952.c b/drivers/leds/leds-lp3952.c
index 3bd55652a706..fc0e02a9768f 100644
--- a/drivers/leds/leds-lp3952.c
+++ b/drivers/leds/leds-lp3952.c
@@ -207,6 +207,13 @@ static const struct regmap_config lp3952_regmap = {
.cache_type = REGCACHE_RBTREE,
 };
 
+static void gpio_set_low_action(void *data)
+{
+   struct lp3952_led_array *priv = (struct lp3952_led_array *)data;
+
+   gpiod_set_value(priv->enable_gpio, 0);
+}
+
 static int lp3952_probe(struct i2c_client *client)
 {
int status;
@@ -226,6 +233,10 @@ static int lp3952_probe(struct i2c_client *client)
return status;
}
 
+   status = devm_add_action(&client->dev, gpio_set_low_action, priv);
+   if (status)
+   return status;
+
priv->regmap = devm_regmap_init_i2c(client, &lp3952_regmap);
if (IS_ERR(priv->regmap)) {
int err = PTR_ERR(priv->regmap);
@@ -254,15 +265,6 @@ static int lp3952_probe(struct i2c_client *client)
return 0;
 }
 
-static void lp3952_remove(struct i2c_client *client)
-{
-   struct lp3952_led_array *priv;
-
-   priv = i2c_get_clientdata(client);
-   lp3952_on_off(priv, LP3952_LED_ALL, false);
-   gpiod_set_value(priv->enable_gpio, 0);
-}
-
 static const struct i2c_device_id lp3952_id[] = {
{LP3952_NAME, 0},
{}
@@ -274,7 +276,6 @@ static struct i2c_driver lp3952_i2c_driver = {
.name = LP3952_NAME,
},
.probe = lp3952_probe,
-   .remove = lp3952_remove,
.id_table = lp3952_id,
 };
 
-- 
2.38.4

[PATCH v2 00/10] devm_led_classdev_register() usage problem

2023-12-04 Thread George Stark

This patch series fixes the problem of devm_led_classdev_register misusing.

The basic problem is described in [1]. Shortly when devm_led_classdev_register()
is used then led_classdev_unregister() called after driver's remove() callback.
led_classdev_unregister() calls driver's brightness_set callback and that 
callback
may use resources which were destroyed already in driver's remove().

After discussion with maintainers [2] [3] we decided:
1) don't touch led subsytem core code and don't remove led_set_brightness() 
from it
but fix drivers
2) don't use devm_led_classdev_unregister

So the solution is to use devm wrappers for all resources
driver's brightness_set() depends on. And introduce dedicated devm wrapper
for mutex as it's often used resource.

[1] 
https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/
[2] 
https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/#mc132b9b350fa51931b4fcfe14705d9f06e91421f
[3] 
https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/#mdbf572a85c33f869a553caf986b6228bb65c8383

George Stark (10):
  devm-helpers: introduce devm_mutex_init
  leds: aw2013: unlock mutex before destroying it
  leds: aw2013: use devm API to cleanup module's resources
  leds: aw200xx: use devm API to cleanup module's resources
  leds: lp3952: use devm API to cleanup module's resources
  leds: lm3532: use devm API to cleanup module's resources
  leds: nic78bx: use devm API to cleanup module's resources
  leds: mlxreg: use devm_mutex_init for mutex initializtion
  leds: an30259a: use devm_mutext_init for mutext initialization
  leds: powernv: add LED_RETAIN_AT_SHUTDOWN flag for leds

 drivers/leds/leds-an30259a.c | 14 --
 drivers/leds/leds-aw200xx.c  | 36 +---
 drivers/leds/leds-aw2013.c   | 28 
 drivers/leds/leds-lm3532.c   | 29 +
 drivers/leds/leds-lp3952.c   | 21 +++--
 drivers/leds/leds-mlxreg.c   | 15 ---
 drivers/leds/leds-nic78bx.c  | 25 +
 drivers/leds/leds-powernv.c  | 23 ---
 include/linux/devm-helpers.h | 18 ++
 9 files changed, 116 insertions(+), 93 deletions(-)

--
2.38.4

Re: [PATCH v2 01/10] devm-helpers: introduce devm_mutex_init

2023-12-04 Thread Andy Shevchenko

On Mon, Dec 4, 2023 at 8:07 PM George Stark  wrote:
>
> Using of devm API leads to certain order of releasing resources.
> So all dependent resources which are not devm-wrapped should be deleted
> with respect to devm-release order. Mutex is one of such objects that
> often is bound to other resources and has no own devm wrapping.
> Since mutex_destroy() actually does nothing in non-debug builds
> frequently calling mutex_destroy() is just ignored which is safe for now
> but wrong formally and can lead to a problem if mutex_destroy() is
> extended so introduce devm_mutex_init().

...

Do you need to include mutex.h?

...

> +/**
> + * devm_mutex_init - Resource-managed mutex initialization
> + * @dev:   Device which lifetime work is bound to
> + * @lock:  Pointer to a mutex
> + *
> + * Initialize mutex which is automatically destroyed when driver is detached.

the driver

Have you run scripts/kernel-doc -v -Wall -none ... against this file?
I'm pretty sure it will complain.

> + */


-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2 02/10] leds: aw2013: unlock mutex before destroying it

2023-12-04 Thread Andy Shevchenko

On Mon, Dec 4, 2023 at 8:07 PM George Stark  wrote:
>
> In the probe() callback in case of error mutex is destroyed being locked
> which is not allowed so unlock the mute before destroying.

Sounds to me like this is a real fix, hence the Fixes tag and being
first in the series.

You free to add
Reviewed-by: Andy Shevchenko 

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2 03/10] leds: aw2013: use devm API to cleanup module's resources

2023-12-04 Thread Andy Shevchenko

On Mon, Dec 4, 2023 at 8:07 PM George Stark  wrote:
>
> In this driver LEDs are registered using devm_led_classdev_register()
> so they are automatically unregistered after module's remove() is done.
> led_classdev_unregister() calls module's led_set_brightness() to turn off
> the LEDs and that callback uses resources which were destroyed already
> in module's remove() so use devm API instead of remove().

...

> +static void aw2013_chip_disable_action(void *data)
> +{
> +   struct aw2013 *chip = (struct aw2013 *)data;
> +
> +   aw2013_chip_disable(chip);
> +}

As with mutex release, this also can be oneliner

static void aw2013_chip_disable_action(void *chip)
{
   aw2013_chip_disable(chip);
}

...

> +   if (devm_mutex_init(&client->dev, &chip->mutex))
> +   return -ENOMEM;

Shouldn be

   ret = devm_mutex_init(&client->dev, &chip->mutex);
   if (ret)
   return ret;

?

> +   return -ENOMEM;

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2 04/10] leds: aw200xx: use devm API to cleanup module's resources

2023-12-04 Thread Andy Shevchenko

On Mon, Dec 4, 2023 at 8:07 PM George Stark  wrote:
>
> In this driver LEDs are registered using devm_led_classdev_register()
> so they are automatically unregistered after module's remove() is done.
> led_classdev_unregister() calls module's led_set_brightness() to turn off
> the LEDs and that callback uses resources which were destroyed already
> in module's remove() so use devm API instead of remove().

...

> +static void aw200xx_chip_reset_action(void *data)
> +{
> +   const struct aw200xx *chip = (struct aw200xx *)data;
> +
> +   aw200xx_chip_reset(chip);
> +}
> +
> +static void aw200xx_disable_action(void *data)
> +{
> +   const struct aw200xx *chip = (struct aw200xx *)data;
> +
> +   aw200xx_disable(chip);
> +}

They can be made oneliners.

...

> +   if (devm_mutex_init(&client->dev, &chip->mutex))
> +   return -ENOMEM;

Do not shadow the real error code.

-- 
With Best Regards,
Andy Shevchenko

[PATCH net-next v2 0/9] net*: Convert to platform remove callback returning void

2023-12-04 Thread Uwe Kleine-König

Hello,

(implicit) v1 of this series can be found at
https://lore.kernel.org/netdev/20231117095922.876489-1-u.kleine-koe...@pengutronix.de.
Changes since then:

 - Dropped patch #1 as Alex objected. Patch #1 (was #2 before) now
   converts ipa to remove_new() and introduces an error message in the
   error path that failed before.

 - Rebased to today's next

 - Add the tags received in the previous round.

Uwe Kleine-König (9):
  net: ipa: Convert to platform remove callback returning void
  net: fjes: Convert to platform remove callback returning void
  net: pcs: rzn1-miic: Convert to platform remove callback returning
void
  net: sfp: Convert to platform remove callback returning void
  net: wan/fsl_ucc_hdlc: Convert to platform remove callback returning
void
  net: wan/ixp4xx_hss: Convert to platform remove callback returning
void
  net: wwan: qcom_bam_dmux: Convert to platform remove callback
returning void
  ieee802154: fakelb: Convert to platform remove callback returning void
  ieee802154: hwsim: Convert to platform remove callback returning void

 drivers/net/fjes/fjes_main.c |  6 ++---
 drivers/net/ieee802154/fakelb.c  |  5 ++--
 drivers/net/ieee802154/mac802154_hwsim.c |  6 ++---
 drivers/net/ipa/ipa_main.c   | 29 +++-
 drivers/net/pcs/pcs-rzn1-miic.c  |  6 ++---
 drivers/net/phy/sfp.c|  6 ++---
 drivers/net/wan/fsl_ucc_hdlc.c   |  6 ++---
 drivers/net/wan/ixp4xx_hss.c |  5 ++--
 drivers/net/wwan/qcom_bam_dmux.c |  6 ++---
 9 files changed, 29 insertions(+), 46 deletions(-)


base-commit: 629a3b49f3f957e975253c54846090b8d5ed2e9b
-- 
2.42.0

[PATCH net-next v2 5/9] net: wan/fsl_ucc_hdlc: Convert to platform remove callback returning void

2023-12-04 Thread Uwe Kleine-König

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Link: 
https://lore.kernel.org/r/20231117095922.876489-7-u.kleine-koe...@pengutronix.de
Signed-off-by: Uwe Kleine-König 
---
 drivers/net/wan/fsl_ucc_hdlc.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index fd50bb313b92..605e70f7baac 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -1259,7 +1259,7 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
return ret;
 }
 
-static int ucc_hdlc_remove(struct platform_device *pdev)
+static void ucc_hdlc_remove(struct platform_device *pdev)
 {
struct ucc_hdlc_private *priv = dev_get_drvdata(&pdev->dev);
 
@@ -1277,8 +1277,6 @@ static int ucc_hdlc_remove(struct platform_device *pdev)
kfree(priv);
 
dev_info(&pdev->dev, "UCC based hdlc module removed\n");
-
-   return 0;
 }
 
 static const struct of_device_id fsl_ucc_hdlc_of_match[] = {
@@ -1292,7 +1290,7 @@ MODULE_DEVICE_TABLE(of, fsl_ucc_hdlc_of_match);
 
 static struct platform_driver ucc_hdlc_driver = {
.probe  = ucc_hdlc_probe,
-   .remove = ucc_hdlc_remove,
+   .remove_new = ucc_hdlc_remove,
.driver = {
.name   = DRV_NAME,
.pm = HDLC_PM_OPS,
-- 
2.42.0

Re: [PATCH] perf vendor events: Update datasource event name to fix duplicate events

2023-12-04 Thread Arnaldo Carvalho de Melo

Em Sun, Dec 03, 2023 at 09:27:25PM +0530, Athira Rajeev escreveu:
> > On 29-Nov-2023, at 10:51 AM, Athira Rajeev  
> > wrote:
> >> On 27-Nov-2023, at 5:32 PM, Disha Goel  wrote:
> >>> Fixes: fc1435807533 ("perf vendor events power10: Update JSON/events")
> >>> Signed-off-by: Athira Rajeev 

> >> I have tested the patch on Power10 machine. Perf list works correctly 
> >> without any segfault now.

> >> # ./perf list

> >> List of pre-defined events (to be used in -e or -M):

> >>  branch-instructions OR branches[Hardware event]
> >>  branch-misses  [Hardware event]

> >> Tested-by: Disha Goel 

> > Thanks Disha for testing

> Hi Arnaldo,
 
> Can we get this pulled in if the patch looks good ?

Thanks, applied to perf-tools-next.

- Arnaldo

Re: [PATCH] perf vendor events: Update datasource event name to fix duplicate events

2023-12-04 Thread Ian Rogers

On Thu, Nov 23, 2023 at 8:01 AM Athira Rajeev
 wrote:
>
> Running "perf list" on powerpc fails with segfault
> as below:
>
>./perf list
>Segmentation fault (core dumped)
>
> This happens because of duplicate events in the json list.
> The powerpc Json event list contains some event with same
> event name, but different event code. They are:
> - PM_INST_FROM_L3MISS (Present in datasource and frontend)
> - PM_MRK_DATA_FROM_L2MISS (Present in datasource and marked)
> - PM_MRK_INST_FROM_L3MISS (Present in datasource and marked)
> - PM_MRK_DATA_FROM_L3MISS (Present in datasource and marked)
>
> pmu_events_table__num_events uses the value from
> table_pmu->num_entries which includes duplicate events as
> well. This causes issue during "perf list" and results in
> segmentation fault.
>
> Since both event codes are valid, append _DSRC to the Data
> Source events (datasource.json), so that they would have a
> unique name. Also add PM_DATA_FROM_L2MISS_DSRC and
> PM_DATA_FROM_L3MISS_DSRC events. With the fix, perf list
> works as expected.
>
> Fixes: fc1435807533 ("perf vendor events power10: Update JSON/events")
> Signed-off-by: Athira Rajeev 

Given duplicate events creates broken pmu-events.c we should capture
that as an exception in jevents.py. That way a JEVENTS_ARCH=all build
will fail if any vendor/architecture would break in this way. We
should also add JEVENTS_ARCH=all to tools/perf/tests/make. Athira, do
you want to look at doing this?

Thanks,
Ian

> ---
>  .../arch/powerpc/power10/datasource.json   | 18 ++
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json 
> b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> index 6b0356f2d301..0eeaaf1a95b8 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> @@ -99,6 +99,11 @@
>  "EventName": "PM_INST_FROM_L2MISS",
>  "BriefDescription": "The processor's instruction cache was reloaded from 
> a source beyond the local core's L2 due to a demand miss."
>},
> +  {
> +"EventCode": "0x0003C000C040",
> +"EventName": "PM_DATA_FROM_L2MISS_DSRC",
> +"BriefDescription": "The processor's L1 data cache was reloaded from a 
> source beyond the local core's L2 due to a demand miss."
> +  },
>{
>  "EventCode": "0x00038010C040",
>  "EventName": "PM_INST_FROM_L2MISS_ALL",
> @@ -161,9 +166,14 @@
>},
>{
>  "EventCode": "0x00078000C040",
> -"EventName": "PM_INST_FROM_L3MISS",
> +"EventName": "PM_INST_FROM_L3MISS_DSRC",
>  "BriefDescription": "The processor's instruction cache was reloaded from 
> beyond the local core's L3 due to a demand miss."
>},
> +  {
> +"EventCode": "0x0007C000C040",
> +"EventName": "PM_DATA_FROM_L3MISS_DSRC",
> +"BriefDescription": "The processor's L1 data cache was reloaded from 
> beyond the local core's L3 due to a demand miss."
> +  },
>{
>  "EventCode": "0x00078010C040",
>  "EventName": "PM_INST_FROM_L3MISS_ALL",
> @@ -981,7 +991,7 @@
>},
>{
>  "EventCode": "0x0003C000C142",
> -"EventName": "PM_MRK_DATA_FROM_L2MISS",
> +"EventName": "PM_MRK_DATA_FROM_L2MISS_DSRC",
>  "BriefDescription": "The processor's L1 data cache was reloaded from a 
> source beyond the local core's L2 due to a demand miss for a marked 
> instruction."
>},
>{
> @@ -1046,12 +1056,12 @@
>},
>{
>  "EventCode": "0x00078000C142",
> -"EventName": "PM_MRK_INST_FROM_L3MISS",
> +"EventName": "PM_MRK_INST_FROM_L3MISS_DSRC",
>  "BriefDescription": "The processor's instruction cache was reloaded from 
> beyond the local core's L3 due to a demand miss for a marked instruction."
>},
>{
>  "EventCode": "0x0007C000C142",
> -"EventName": "PM_MRK_DATA_FROM_L3MISS",
> +"EventName": "PM_MRK_DATA_FROM_L3MISS_DSRC",
>  "BriefDescription": "The processor's L1 data cache was reloaded from 
> beyond the local core's L3 due to a demand miss for a marked instruction."
>},
>{
> --
> 2.39.3
>

Re: [PATCH] perf vendor events: Update datasource event name to fix duplicate events

2023-12-04 Thread Arnaldo Carvalho de Melo

Em Mon, Dec 04, 2023 at 12:12:54PM -0800, Ian Rogers escreveu:
> On Thu, Nov 23, 2023 at 8:01 AM Athira Rajeev
>  wrote:
> >
> > Running "perf list" on powerpc fails with segfault
> > as below:
> >
> >./perf list
> >Segmentation fault (core dumped)
> >
> > This happens because of duplicate events in the json list.
> > The powerpc Json event list contains some event with same
> > event name, but different event code. They are:
> > - PM_INST_FROM_L3MISS (Present in datasource and frontend)
> > - PM_MRK_DATA_FROM_L2MISS (Present in datasource and marked)
> > - PM_MRK_INST_FROM_L3MISS (Present in datasource and marked)
> > - PM_MRK_DATA_FROM_L3MISS (Present in datasource and marked)
> >
> > pmu_events_table__num_events uses the value from
> > table_pmu->num_entries which includes duplicate events as
> > well. This causes issue during "perf list" and results in
> > segmentation fault.
> >
> > Since both event codes are valid, append _DSRC to the Data
> > Source events (datasource.json), so that they would have a
> > unique name. Also add PM_DATA_FROM_L2MISS_DSRC and
> > PM_DATA_FROM_L3MISS_DSRC events. With the fix, perf list
> > works as expected.
> >
> > Fixes: fc1435807533 ("perf vendor events power10: Update JSON/events")
> > Signed-off-by: Athira Rajeev 
> 
> Given duplicate events creates broken pmu-events.c we should capture
> that as an exception in jevents.py. That way a JEVENTS_ARCH=all build
> will fail if any vendor/architecture would break in this way. We
> should also add JEVENTS_ARCH=all to tools/perf/tests/make. Athira, do
> you want to look at doing this?

Should I go ahead and remove this patch till this is sorted out?

- Arnaldo

Re: [PATCH] perf vendor events: Update datasource event name to fix duplicate events

2023-12-04 Thread Arnaldo Carvalho de Melo

Em Mon, Dec 04, 2023 at 05:20:46PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Dec 04, 2023 at 12:12:54PM -0800, Ian Rogers escreveu:
> > On Thu, Nov 23, 2023 at 8:01 AM Athira Rajeev
> >  wrote:
> > >
> > > Running "perf list" on powerpc fails with segfault
> > > as below:
> > >
> > >./perf list
> > >Segmentation fault (core dumped)
> > >
> > > This happens because of duplicate events in the json list.
> > > The powerpc Json event list contains some event with same
> > > event name, but different event code. They are:
> > > - PM_INST_FROM_L3MISS (Present in datasource and frontend)
> > > - PM_MRK_DATA_FROM_L2MISS (Present in datasource and marked)
> > > - PM_MRK_INST_FROM_L3MISS (Present in datasource and marked)
> > > - PM_MRK_DATA_FROM_L3MISS (Present in datasource and marked)
> > >
> > > pmu_events_table__num_events uses the value from
> > > table_pmu->num_entries which includes duplicate events as
> > > well. This causes issue during "perf list" and results in
> > > segmentation fault.
> > >
> > > Since both event codes are valid, append _DSRC to the Data
> > > Source events (datasource.json), so that they would have a
> > > unique name. Also add PM_DATA_FROM_L2MISS_DSRC and
> > > PM_DATA_FROM_L3MISS_DSRC events. With the fix, perf list
> > > works as expected.
> > >
> > > Fixes: fc1435807533 ("perf vendor events power10: Update JSON/events")
> > > Signed-off-by: Athira Rajeev 
> > 
> > Given duplicate events creates broken pmu-events.c we should capture
> > that as an exception in jevents.py. That way a JEVENTS_ARCH=all build
> > will fail if any vendor/architecture would break in this way. We
> > should also add JEVENTS_ARCH=all to tools/perf/tests/make. Athira, do
> > you want to look at doing this?
> 
> Should I go ahead and remove this patch till this is sorted out?

I'll keep it, its already in tmp.perf-tools-next, we can go from there
and improve this with follow up patches,

- Arnaldo

Re: [PATCH] perf vendor events: Update datasource event name to fix duplicate events

2023-12-04 Thread Ian Rogers

On Mon, Dec 4, 2023 at 12:22 PM Arnaldo Carvalho de Melo
 wrote:
>
> Em Mon, Dec 04, 2023 at 05:20:46PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Dec 04, 2023 at 12:12:54PM -0800, Ian Rogers escreveu:
> > > On Thu, Nov 23, 2023 at 8:01 AM Athira Rajeev
> > >  wrote:
> > > >
> > > > Running "perf list" on powerpc fails with segfault
> > > > as below:
> > > >
> > > >./perf list
> > > >Segmentation fault (core dumped)
> > > >
> > > > This happens because of duplicate events in the json list.
> > > > The powerpc Json event list contains some event with same
> > > > event name, but different event code. They are:
> > > > - PM_INST_FROM_L3MISS (Present in datasource and frontend)
> > > > - PM_MRK_DATA_FROM_L2MISS (Present in datasource and marked)
> > > > - PM_MRK_INST_FROM_L3MISS (Present in datasource and marked)
> > > > - PM_MRK_DATA_FROM_L3MISS (Present in datasource and marked)
> > > >
> > > > pmu_events_table__num_events uses the value from
> > > > table_pmu->num_entries which includes duplicate events as
> > > > well. This causes issue during "perf list" and results in
> > > > segmentation fault.
> > > >
> > > > Since both event codes are valid, append _DSRC to the Data
> > > > Source events (datasource.json), so that they would have a
> > > > unique name. Also add PM_DATA_FROM_L2MISS_DSRC and
> > > > PM_DATA_FROM_L3MISS_DSRC events. With the fix, perf list
> > > > works as expected.
> > > >
> > > > Fixes: fc1435807533 ("perf vendor events power10: Update JSON/events")
> > > > Signed-off-by: Athira Rajeev 
> > >
> > > Given duplicate events creates broken pmu-events.c we should capture
> > > that as an exception in jevents.py. That way a JEVENTS_ARCH=all build
> > > will fail if any vendor/architecture would break in this way. We
> > > should also add JEVENTS_ARCH=all to tools/perf/tests/make. Athira, do
> > > you want to look at doing this?
> >
> > Should I go ahead and remove this patch till this is sorted out?
>
> I'll keep it, its already in tmp.perf-tools-next, we can go from there
> and improve this with follow up patches,

Agreed. I could look to do the follow up but likely won't have a
chance for a while. If others could help out it would be great. I'd
like to have the jevents and json be robust enough that we don't trip
over problems like this and the somewhat similar AmpereOne issue.

Thanks,
Ian

> - Arnaldo

Re: [PATCH 1/8] CMDLINE: add generic builtin command line

2023-12-04 Thread Jaskaran Singh




On 11/10/2023 7:08 AM, Daniel Walker wrote:
> diff --git a/lib/generic_cmdline.S b/lib/generic_cmdline.S
> new file mode 100644
> index ..223763f9eeb6
> --- /dev/null
> +++ b/lib/generic_cmdline.S
> @@ -0,0 +1,53 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include 
> +#include 
> +
> +#include 
> +
> +__INITDATA
> +
> +   .align 8
> +   .global cmdline_prepend
> +cmdline_prepend:
> +   .ifnc CONFIG_CMDLINE_PREPEND,""
> +   .ascii CONFIG_CMDLINE_PREPEND
> +   .string " "
> +   .else
> +   .byte 0x0
> +   .endif
> +#ifdef CONFIG_CMDLINE_EXTRA
> +   .space COMMAND_LINE_SIZE - (.-cmdline_prepend)
> +   .size cmdline_prepend, COMMAND_LINE_SIZE
> +#endif /* CONFIG_CMDLINE_EXTRA */
> +
> +cmdline_prepend_end:
> +   .size cmdline_prepend, (cmdline_prepend_end - cmdline_prepend)
> +
> +   .align 8
> +   .global cmdline_tmp
> +cmdline_tmp:
> +   .ifnc CONFIG_CMDLINE_PREPEND,""
> +   .size cmdline_tmp, COMMAND_LINE_SIZE
> +   .space COMMAND_LINE_SIZE
> +   .else
> +   .byte 0x0
> +   .endif
> +cmdline_tmp_end:
> +   .size cmdline_tmp, (cmdline_tmp_end - cmdline_tmp)
> +
> +   .align 8
> +   .global cmdline_append
> +   .size cmdline_append, COMMAND_LINE_SIZE
> +cmdline_append:
> +   .ifnc CONFIG_CMDLINE_APPEND,""
> +   .ascii " "
> +   .string CONFIG_CMDLINE_APPEND
> +   .else
> +   .byte 0x0
> +   .endif
> +#ifdef CONFIG_CMDLINE_EXTRA
> +   .space COMMAND_LINE_SIZE - (.-cmdline_append)
> +#endif /* CONFIG_CMDLINE_EXTRA */
> +cmdline_append_end:
> +   .size cmdline_append, (cmdline_append_end - cmdline_append)
> +

Hi Daniel,

I picked these patches to test a usecase of ours. generic_cmdline.S does
not escape semicolons in the CMDLINE_APPEND and CMDLINE_PREPEND strings.
Take this config snippet for example:

CONFIG_CMDLINE_APPEND="slub_debug=FZP,zs_handle,zspage;FZPU page_owner=on"
CONFIG_CMDLINE_BOOL=y
# CONFIG_CMDLINE_EXTRA is not set
# CONFIG_CMDLINE_OVERRIDE is not set
# CONFIG_CMDLINE_PREPEND is not set
# CONFIG_TEST_CMDLINE is not set

While compiling, the word FZPU is considered as a mnemonic because of
the semicolon preceding it causing parsing to fail:

kernel/lib/generic_cmdline.S: Assembler messages:
kernel/lib/generic_cmdline.S:42: Warning: missing closing `"'
kernel/lib/generic_cmdline.S:42: Error: unknown mnemonic `fzpu' -- `fzpu
page_owner=on",""'

Maybe Christophe's suggestion of moving this code to a C file and using
inline assembly helps mitigate similar problems?

Thanks,
Jaskaran.

Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations

2023-12-04 Thread Andy Shevchenko

On Sun, Dec 03, 2023 at 11:23:47AM -0800, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.
> 
> The target patterns may look like this:
> 
>   for (idx = 0; idx < nbits; idx++)
>   if (test_and_clear_bit(idx, bitmap))
>   do_something(idx);
> 
> Or like this:
> 
>   do {
>   bit = find_first_bit(bitmap, nbits);
>   if (bit >= nbits)
>   return nbits;
>   } while (!test_and_clear_bit(bit, bitmap));
>   return bit;
> 
> In both cases, the opencoded loop may be converted to a single function
> or iterator call. Correspondingly:
> 
>   for_each_test_and_clear_bit(idx, bitmap, nbits)
>   do_something(idx);
> 
> Or:
>   return find_and_clear_bit(bitmap, nbits);
> 
> Obviously, the less routine code people have to write themself, the
> less probability to make a mistake.
> 
> Those are not only handy helpers but also resolve a non-trivial
> issue of using non-atomic find_bit() together with atomic
> test_and_{set,clear)_bit().
> 
> The trick is that find_bit() implies that the bitmap is a regular
> non-volatile piece of memory, and compiler is allowed to use such
> optimization techniques like re-fetching memory instead of caching it.
> 
> For example, find_first_bit() is implemented like this:
> 
>   for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
>   val = addr[idx];
>   if (val) {
>   sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
>   break;
>   }
>   }
> 
> On register-memory architectures, like x86, compiler may decide to
> access memory twice - first time to compare against 0, and second time
> to fetch its value to pass it to __ffs().
> 
> When running find_first_bit() on volatile memory, the memory may get
> changed in-between, and for instance, it may lead to passing 0 to
> __ffs(), which is undefined. This is a potentially dangerous call.
> 
> find_and_clear_bit() as a wrapper around test_and_clear_bit()
> naturally treats underlying bitmap as a volatile memory and prevents
> compiler from such optimizations.
> 
> Now that KCSAN is catching exactly this type of situations and warns on
> undercover memory modifications. We can use it to reveal improper usage
> of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.
> 
> The 1st patch of the series adds the following atomic primitives:
> 
>   find_and_set_bit(addr, nbits);
>   find_and_set_next_bit(addr, nbits, start);
>   ...
> 
> Here find_and_{set,clear} part refers to the corresponding
> test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
> derive their semantics from corresponding find() or test() functions.
> 
> For brevity, the naming omits the fact that we search for zero bit in
> find_and_set, and correspondingly search for set bit in find_and_clear
> functions.
> 
> The patch also adds iterators with atomic semantics, like
> for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
> corresponding atomic operation with 'for_each'.
> 
> This series is a result of discussion [1]. All find_bit() functions imply
> exclusive access to the bitmaps. However, KCSAN reports quite a number
> of warnings related to find_bit() API. Some of them are not pointing
> to real bugs because in many situations people intentionally allow
> concurrent bitmap operations.
> 
> If so, find_bit() can be annotated such that KCSAN will ignore it:
> 
> bit = data_race(find_first_bit(bitmap, nbits));
> 
> This series addresses the other important case where people really need
> atomic find ops. As the following patches show, the resulting code
> looks safer and more verbose comparing to opencoded loops followed by
> atomic bit flips.
> 
> In [1] Mirsad reported 2% slowdown in a single-thread search test when
> switching find_bit() function to treat bitmaps as volatile arrays. On
> the other hand, kernel robot in the same thread reported +3.7% to the
> performance of will-it-scale.per_thread_ops test.
> 
> Assuming that our compilers are sane and generate better code against
> properly annotated data, the above discrepancy doesn't look weird. When
> running on non-volatile bitmaps, plain find_bit() outperforms atomic
> find_and_bit(), and vice-versa.

...

In some cases the better improvements can be achieved by switching
the (very) old code to utilise IDA framework.

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2 00/35] bitops: add atomic find_bit() operations

2023-12-04 Thread Jan Kara

Hello Yury!

On Sun 03-12-23 11:23:47, Yury Norov wrote:
> Add helpers around test_and_{set,clear}_bit() that allow to search for
> clear or set bits and flip them atomically.
> 
> The target patterns may look like this:
> 
>   for (idx = 0; idx < nbits; idx++)
>   if (test_and_clear_bit(idx, bitmap))
>   do_something(idx);
> 
> Or like this:
> 
>   do {
>   bit = find_first_bit(bitmap, nbits);
>   if (bit >= nbits)
>   return nbits;
>   } while (!test_and_clear_bit(bit, bitmap));
>   return bit;
> 
> In both cases, the opencoded loop may be converted to a single function
> or iterator call. Correspondingly:
> 
>   for_each_test_and_clear_bit(idx, bitmap, nbits)
>   do_something(idx);
> 
> Or:
>   return find_and_clear_bit(bitmap, nbits);

These are fine cleanups but they actually don't address the case that has
triggered all these changes - namely the xarray use of find_next_bit() in
xas_find_chunk().

...
> This series is a result of discussion [1]. All find_bit() functions imply
> exclusive access to the bitmaps. However, KCSAN reports quite a number
> of warnings related to find_bit() API. Some of them are not pointing
> to real bugs because in many situations people intentionally allow
> concurrent bitmap operations.
> 
> If so, find_bit() can be annotated such that KCSAN will ignore it:
> 
> bit = data_race(find_first_bit(bitmap, nbits));

No, this is not a correct thing to do. If concurrent bitmap changes can
happen, find_first_bit() as it is currently implemented isn't ever a safe
choice because it can call __ffs(0) which is dangerous as you properly note
above. I proposed adding READ_ONCE() into find_first_bit() / find_next_bit()
implementation to fix this issue but you disliked that. So other option we
have is adding find_first_bit() and find_next_bit() variants that take
volatile 'addr' and we have to use these in code like xas_find_chunk()
which cannot be converted to your new helpers.

> This series addresses the other important case where people really need
> atomic find ops. As the following patches show, the resulting code
> looks safer and more verbose comparing to opencoded loops followed by
> atomic bit flips.
> 
> In [1] Mirsad reported 2% slowdown in a single-thread search test when
> switching find_bit() function to treat bitmaps as volatile arrays. On
> the other hand, kernel robot in the same thread reported +3.7% to the
> performance of will-it-scale.per_thread_ops test.

It was actually me who reported the regression here [2] but whatever :)

[2] https://lore.kernel.org/all/20231011150252.32737-1-j...@suse.cz

> Assuming that our compilers are sane and generate better code against
> properly annotated data, the above discrepancy doesn't look weird. When
> running on non-volatile bitmaps, plain find_bit() outperforms atomic
> find_and_bit(), and vice-versa.
> 
> So, all users of find_bit() API, where heavy concurrency is expected,
> are encouraged to switch to atomic find_and_bit() as appropriate.

Well, all users where any concurrency can happen should switch. Otherwise
they are prone to the (admittedly mostly theoretical) data race issue.

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [PATCH v2 01/10] devm-helpers: introduce devm_mutex_init

2023-12-04 Thread Christophe Leroy



Le 04/12/2023 à 19:05, George Stark a écrit :
> Using of devm API leads to certain order of releasing resources.
> So all dependent resources which are not devm-wrapped should be deleted
> with respect to devm-release order. Mutex is one of such objects that
> often is bound to other resources and has no own devm wrapping.
> Since mutex_destroy() actually does nothing in non-debug builds
> frequently calling mutex_destroy() is just ignored which is safe for now
> but wrong formally and can lead to a problem if mutex_destroy() is
> extended so introduce devm_mutex_init().

This is not needed for patch 2. Should patch 2 go first ?

> 
> Signed-off-by: George Stark 
> ---
>   include/linux/devm-helpers.h | 18 ++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/include/linux/devm-helpers.h b/include/linux/devm-helpers.h
> index 74891802200d..2f56e476776f 100644
> --- a/include/linux/devm-helpers.h
> +++ b/include/linux/devm-helpers.h
> @@ -76,4 +76,22 @@ static inline int devm_work_autocancel(struct device *dev,
>   return devm_add_action(dev, devm_work_drop, w);
>   }
>   
> +static inline void devm_mutex_release(void *res)
> +{
> + mutex_destroy(res);
> +}
> +
> +/**
> + * devm_mutex_init - Resource-managed mutex initialization
> + * @dev: Device which lifetime work is bound to
> + * @lock:Pointer to a mutex
> + *
> + * Initialize mutex which is automatically destroyed when driver is detached.
> + */
> +static inline int devm_mutex_init(struct device *dev, struct mutex *lock)
> +{
> + mutex_init(lock);
> + return devm_add_action_or_reset(dev, devm_mutex_release, lock);
> +}
> +
>   #endif

Re: [PATCH v2 02/10] leds: aw2013: unlock mutex before destroying it

2023-12-04 Thread Christophe Leroy



Le 04/12/2023 à 19:05, George Stark a écrit :
> In the probe() callback in case of error mutex is destroyed being locked
> which is not allowed so unlock the mute before destroying.

Should there be a fixes: tag ? For instance on 59ea3c9faf32 ("leds: add 
aw2013 driver") ?

> 
> Signed-off-by: George Stark 
> ---
>   drivers/leds/leds-aw2013.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/leds/leds-aw2013.c b/drivers/leds/leds-aw2013.c
> index 59765640b70f..c2bc0782c0cd 100644
> --- a/drivers/leds/leds-aw2013.c
> +++ b/drivers/leds/leds-aw2013.c
> @@ -397,6 +397,7 @@ static int aw2013_probe(struct i2c_client *client)
>   regulator_disable(chip->vcc_regulator);
>   
>   error:
> + mutex_unlock(&chip->mutex);
>   mutex_destroy(&chip->mutex);
>   return ret;
>   }

Re: [PATCH v2 03/10] leds: aw2013: use devm API to cleanup module's resources

2023-12-04 Thread Christophe Leroy



Le 04/12/2023 à 19:05, George Stark a écrit :
> In this driver LEDs are registered using devm_led_classdev_register()
> so they are automatically unregistered after module's remove() is done.
> led_classdev_unregister() calls module's led_set_brightness() to turn off
> the LEDs and that callback uses resources which were destroyed already
> in module's remove() so use devm API instead of remove().
> 
> Signed-off-by: George Stark 
> ---
>   drivers/leds/leds-aw2013.c | 27 +++
>   1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/leds/leds-aw2013.c b/drivers/leds/leds-aw2013.c
> index c2bc0782c0cd..1a8acf303548 100644
> --- a/drivers/leds/leds-aw2013.c
> +++ b/drivers/leds/leds-aw2013.c
> @@ -1,6 +1,7 @@
>   // SPDX-License-Identifier: GPL-2.0+
>   // Driver for Awinic AW2013 3-channel LED driver
>   
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -318,6 +319,13 @@ static int aw2013_probe_dt(struct aw2013 *chip)
>   return 0;
>   }
>   
> +static void aw2013_chip_disable_action(void *data)
> +{
> + struct aw2013 *chip = (struct aw2013 *)data;

The cast is not needed because data is void*

And chip is not needed at all, you can pass data to 
aw2013_chip_disable() directly, without a cast either.

> +
> + aw2013_chip_disable(chip);
> +}
> +
>   static const struct regmap_config aw2013_regmap_config = {
>   .reg_bits = 8,
>   .val_bits = 8,
> @@ -334,7 +342,9 @@ static int aw2013_probe(struct i2c_client *client)
>   if (!chip)
>   return -ENOMEM;
>   
> - mutex_init(&chip->mutex);
> + if (devm_mutex_init(&client->dev, &chip->mutex))
> + return -ENOMEM;

Why not return the value returned by devm_mutex_init() ?

> +
>   mutex_lock(&chip->mutex);
>   
>   chip->client = client;
> @@ -378,6 +388,10 @@ static int aw2013_probe(struct i2c_client *client)
>   goto error_reg;
>   }
>   
> + ret = devm_add_action(&client->dev, aw2013_chip_disable_action, chip);
> + if (ret)
> + goto error_reg;
> +
>   ret = aw2013_probe_dt(chip);
>   if (ret < 0)
>   goto error_reg;
> @@ -398,19 +412,9 @@ static int aw2013_probe(struct i2c_client *client)
>   
>   error:
>   mutex_unlock(&chip->mutex);
> - mutex_destroy(&chip->mutex);
>   return ret;
>   }
>   
> -static void aw2013_remove(struct i2c_client *client)
> -{
> - struct aw2013 *chip = i2c_get_clientdata(client);
> -
> - aw2013_chip_disable(chip);
> -
> - mutex_destroy(&chip->mutex);
> -}
> -
>   static const struct of_device_id aw2013_match_table[] = {
>   { .compatible = "awinic,aw2013", },
>   { /* sentinel */ },
> @@ -424,7 +428,6 @@ static struct i2c_driver aw2013_driver = {
>   .of_match_table = of_match_ptr(aw2013_match_table),
>   },
>   .probe = aw2013_probe,
> - .remove = aw2013_remove,
>   };
>   
>   module_i2c_driver(aw2013_driver);

Re: [PATCH v2 04/10] leds: aw200xx: use devm API to cleanup module's resources

2023-12-04 Thread Christophe Leroy



Le 04/12/2023 à 19:05, George Stark a écrit :
> In this driver LEDs are registered using devm_led_classdev_register()
> so they are automatically unregistered after module's remove() is done.
> led_classdev_unregister() calls module's led_set_brightness() to turn off
> the LEDs and that callback uses resources which were destroyed already
> in module's remove() so use devm API instead of remove().
> 
> Signed-off-by: George Stark 
> ---
>   drivers/leds/leds-aw200xx.c | 36 +---
>   1 file changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/leds/leds-aw200xx.c b/drivers/leds/leds-aw200xx.c
> index 1d3943f86f7f..b1a097c7c879 100644
> --- a/drivers/leds/leds-aw200xx.c
> +++ b/drivers/leds/leds-aw200xx.c
> @@ -10,6 +10,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -530,6 +531,20 @@ static const struct regmap_config aw200xx_regmap_config 
> = {
>   .disable_locking = true,
>   };
>   
> +static void aw200xx_chip_reset_action(void *data)
> +{
> + const struct aw200xx *chip = (struct aw200xx *)data;
> +
> + aw200xx_chip_reset(chip);

Same as previous patch, no need to cast data and no need for chip at 
all, directly do aw200xx_chip_reset(data)

> +}
> +
> +static void aw200xx_disable_action(void *data)
> +{
> + const struct aw200xx *chip = (struct aw200xx *)data;
> +
> + aw200xx_disable(chip);

Same

> +}
> +
>   static int aw200xx_probe(struct i2c_client *client)
>   {
>   const struct aw200xx_chipdef *cdef;
> @@ -568,11 +583,16 @@ static int aw200xx_probe(struct i2c_client *client)
>   
>   aw200xx_enable(chip);
>   
> + ret = devm_add_action(&client->dev, aw200xx_disable_action, chip);
> + if (ret)
> + return ret;
> +
>   ret = aw200xx_chip_check(chip);
>   if (ret)
>   return ret;
>   
> - mutex_init(&chip->mutex);
> + if (devm_mutex_init(&client->dev, &chip->mutex))
> + return -ENOMEM;

Why not return value returned by dev_mutex_init() directly ?

>   
>   /* Need a lock now since after call aw200xx_probe_fw, sysfs nodes 
> created */
>   mutex_lock(&chip->mutex);
> @@ -581,6 +601,10 @@ static int aw200xx_probe(struct i2c_client *client)
>   if (ret)
>   goto out_unlock;
>   
> + ret = devm_add_action(&client->dev, aw200xx_chip_reset_action, chip);
> + if (ret)
> + goto out_unlock;
> +
>   ret = aw200xx_probe_fw(&client->dev, chip);
>   if (ret)
>   goto out_unlock;
> @@ -595,15 +619,6 @@ static int aw200xx_probe(struct i2c_client *client)
>   return ret;
>   }
>   
> -static void aw200xx_remove(struct i2c_client *client)
> -{
> - struct aw200xx *chip = i2c_get_clientdata(client);
> -
> - aw200xx_chip_reset(chip);
> - aw200xx_disable(chip);
> - mutex_destroy(&chip->mutex);
> -}
> -
>   static const struct aw200xx_chipdef aw20036_cdef = {
>   .channels = 36,
>   .display_size_rows_max = 3,
> @@ -652,7 +667,6 @@ static struct i2c_driver aw200xx_driver = {
>   .of_match_table = aw200xx_match_table,
>   },
>   .probe_new = aw200xx_probe,
> - .remove = aw200xx_remove,
>   .id_table = aw200xx_id,
>   };
>   module_i2c_driver(aw200xx_driver);

Re: [PATCH 1/6] x86: Use PCI_HEADER_TYPE_* instead of literals

2023-12-04 Thread Michael Ellerman

Bjorn Helgaas  writes:
> [+cc scsi, powerpc folks]
>
> On Fri, Dec 01, 2023 at 02:44:47PM -0600, Bjorn Helgaas wrote:
>> On Fri, Nov 24, 2023 at 11:09:13AM +0200, Ilpo Järvinen wrote:
>> > Replace 0x7f and 0x80 literals with PCI_HEADER_TYPE_* defines.
>> > 
>> > Signed-off-by: Ilpo Järvinen 
>> 
>> Applied entire series on the PCI "enumeration" branch for v6.8,
>> thanks!
>> 
>> If anybody wants to take pieces separately, let me know and I'll drop
>> from PCI.
>
> OK, b4 picked up the entire series but I was only cc'd on this first
> patch, so I missed the responses about EDAC, xtensa, bcma already
> being applied elsewhere.
>
> So I kept these in the PCI tree:
>
>   420ac76610d7 ("scsi: lpfc: Use PCI_HEADER_TYPE_MFD instead of literal")
>   3773343dd890 ("powerpc/fsl-pci: Use PCI_HEADER_TYPE_MASK instead of 
> literal")
>   197e0da1f1a3 ("x86/pci: Use PCI_HEADER_TYPE_* instead of literals")
>
> and dropped the others.
>
> x86, SCSI, powerpc folks, if you want to take these instead, let me
> know and I'll drop them.

Nah that's fine, you keep them.

cheers

[PATCH] MAINTAINERS: powerpc: Add Aneesh & Naveen

2023-12-04 Thread Michael Ellerman

Aneesh and Naveen are helping out with some aspects of upstream
maintenance, add them as reviewers.

Signed-off-by: Michael Ellerman 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ea790149af79..562d048863ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12240,6 +12240,8 @@ LINUX FOR POWERPC (32-BIT AND 64-BIT)
 M: Michael Ellerman 
 R: Nicholas Piggin 
 R: Christophe Leroy 
+R: Aneesh Kumar K.V 
+R: Naveen N. Rao 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
 W: https://github.com/linuxppc/wiki/wiki
-- 
2.43.0

[PATCH] MAINTAINERS: powerpc: Transfer PPC83XX to Christophe

2023-12-04 Thread Michael Ellerman

Christophe volunteered[1] to maintain PPC83XX.

1: https://lore.kernel.org/all/7b1bf4dc-d09d-35b8-f4df-16bf00429...@csgroup.eu/

Signed-off-by: Michael Ellerman 
---
 MAINTAINERS | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 562d048863ee..d4efe48cc36a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12287,21 +12287,21 @@ S:Orphan
 F: arch/powerpc/platforms/40x/
 F: arch/powerpc/platforms/44x/
 
-LINUX FOR POWERPC EMBEDDED PPC83XX AND PPC85XX
+LINUX FOR POWERPC EMBEDDED PPC85XX
 M: Scott Wood 
 L: linuxppc-dev@lists.ozlabs.org
 S: Odd fixes
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
 F: Documentation/devicetree/bindings/cache/freescale-l2cache.txt
 F: Documentation/devicetree/bindings/powerpc/fsl/
-F: arch/powerpc/platforms/83xx/
 F: arch/powerpc/platforms/85xx/
 
-LINUX FOR POWERPC EMBEDDED PPC8XX
+LINUX FOR POWERPC EMBEDDED PPC8XX AND PPC83XX
 M: Christophe Leroy 
 L: linuxppc-dev@lists.ozlabs.org
 S: Maintained
 F: arch/powerpc/platforms/8xx/
+F: arch/powerpc/platforms/83xx/
 
 LINUX KERNEL DUMP TEST MODULE (LKDTM)
 M: Kees Cook 
-- 
2.43.0

Re: [PATCH net-next v2 0/9] net*: Convert to platform remove callback returning void

2023-12-04 Thread Miquel Raynal

Hi Uwe,

u.kleine-koe...@pengutronix.de wrote on Mon,  4 Dec 2023 19:30:40 +0100:

> Hello,
> 
> (implicit) v1 of this series can be found at
> https://lore.kernel.org/netdev/20231117095922.876489-1-u.kleine-koe...@pengutronix.de.
> Changes since then:
> 
>  - Dropped patch #1 as Alex objected. Patch #1 (was #2 before) now
>converts ipa to remove_new() and introduces an error message in the
>error path that failed before.
> 
>  - Rebased to today's next
> 
>  - Add the tags received in the previous round.
> 
> Uwe Kleine-König (9):
>   net: ipa: Convert to platform remove callback returning void
>   net: fjes: Convert to platform remove callback returning void
>   net: pcs: rzn1-miic: Convert to platform remove callback returning
> void
>   net: sfp: Convert to platform remove callback returning void
>   net: wan/fsl_ucc_hdlc: Convert to platform remove callback returning
> void
>   net: wan/ixp4xx_hss: Convert to platform remove callback returning
> void
>   net: wwan: qcom_bam_dmux: Convert to platform remove callback
> returning void
>   ieee802154: fakelb: Convert to platform remove callback returning void
>   ieee802154: hwsim: Convert to platform remove callback returning void

FYI, I plan on taking patches 8 and 9 through wpan-next.

Thanks,
Miquèl

Re: [PATCH] perf vendor events: Update datasource event name to fix duplicate events

2023-12-04 Thread Athira Rajeev




> On 05-Dec-2023, at 1:42 AM, Ian Rogers  wrote:
> 
> On Thu, Nov 23, 2023 at 8:01 AM Athira Rajeev
>  wrote:
>> 
>> Running "perf list" on powerpc fails with segfault
>> as below:
>> 
>>   ./perf list
>>   Segmentation fault (core dumped)
>> 
>> This happens because of duplicate events in the json list.
>> The powerpc Json event list contains some event with same
>> event name, but different event code. They are:
>> - PM_INST_FROM_L3MISS (Present in datasource and frontend)
>> - PM_MRK_DATA_FROM_L2MISS (Present in datasource and marked)
>> - PM_MRK_INST_FROM_L3MISS (Present in datasource and marked)
>> - PM_MRK_DATA_FROM_L3MISS (Present in datasource and marked)
>> 
>> pmu_events_table__num_events uses the value from
>> table_pmu->num_entries which includes duplicate events as
>> well. This causes issue during "perf list" and results in
>> segmentation fault.
>> 
>> Since both event codes are valid, append _DSRC to the Data
>> Source events (datasource.json), so that they would have a
>> unique name. Also add PM_DATA_FROM_L2MISS_DSRC and
>> PM_DATA_FROM_L3MISS_DSRC events. With the fix, perf list
>> works as expected.
>> 
>> Fixes: fc1435807533 ("perf vendor events power10: Update JSON/events")
>> Signed-off-by: Athira Rajeev 
> 
> Given duplicate events creates broken pmu-events.c we should capture
> that as an exception in jevents.py. That way a JEVENTS_ARCH=all build
> will fail if any vendor/architecture would break in this way. We
> should also add JEVENTS_ARCH=all to tools/perf/tests/make. Athira, do
> you want to look at doing this?
> 
> Thanks,
> Ian

Hi Ian,

That’s a great suggestion. This will definitely help to capture the issues 
ahead.
I am interested and will work on adding this as part of tools/perf/tests/make

Thanks
Athira
> 
>> ---
>> .../arch/powerpc/power10/datasource.json   | 18 ++
>> 1 file changed, 14 insertions(+), 4 deletions(-)
>> 
>> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json 
>> b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
>> index 6b0356f2d301..0eeaaf1a95b8 100644
>> --- a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
>> +++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
>> @@ -99,6 +99,11 @@
>> "EventName": "PM_INST_FROM_L2MISS",
>> "BriefDescription": "The processor's instruction cache was reloaded from 
>> a source beyond the local core's L2 due to a demand miss."
>>   },
>> +  {
>> +"EventCode": "0x0003C000C040",
>> +"EventName": "PM_DATA_FROM_L2MISS_DSRC",
>> +"BriefDescription": "The processor's L1 data cache was reloaded from a 
>> source beyond the local core's L2 due to a demand miss."
>> +  },
>>   {
>> "EventCode": "0x00038010C040",
>> "EventName": "PM_INST_FROM_L2MISS_ALL",
>> @@ -161,9 +166,14 @@
>>   },
>>   {
>> "EventCode": "0x00078000C040",
>> -"EventName": "PM_INST_FROM_L3MISS",
>> +"EventName": "PM_INST_FROM_L3MISS_DSRC",
>> "BriefDescription": "The processor's instruction cache was reloaded from 
>> beyond the local core's L3 due to a demand miss."
>>   },
>> +  {
>> +"EventCode": "0x0007C000C040",
>> +"EventName": "PM_DATA_FROM_L3MISS_DSRC",
>> +"BriefDescription": "The processor's L1 data cache was reloaded from 
>> beyond the local core's L3 due to a demand miss."
>> +  },
>>   {
>> "EventCode": "0x00078010C040",
>> "EventName": "PM_INST_FROM_L3MISS_ALL",
>> @@ -981,7 +991,7 @@
>>   },
>>   {
>> "EventCode": "0x0003C000C142",
>> -"EventName": "PM_MRK_DATA_FROM_L2MISS",
>> +"EventName": "PM_MRK_DATA_FROM_L2MISS_DSRC",
>> "BriefDescription": "The processor's L1 data cache was reloaded from a 
>> source beyond the local core's L2 due to a demand miss for a marked 
>> instruction."
>>   },
>>   {
>> @@ -1046,12 +1056,12 @@
>>   },
>>   {
>> "EventCode": "0x00078000C142",
>> -"EventName": "PM_MRK_INST_FROM_L3MISS",
>> +"EventName": "PM_MRK_INST_FROM_L3MISS_DSRC",
>> "BriefDescription": "The processor's instruction cache was reloaded from 
>> beyond the local core's L3 due to a demand miss for a marked instruction."
>>   },
>>   {
>> "EventCode": "0x0007C000C142",
>> -"EventName": "PM_MRK_DATA_FROM_L3MISS",
>> +"EventName": "PM_MRK_DATA_FROM_L3MISS_DSRC",
>> "BriefDescription": "The processor's L1 data cache was reloaded from 
>> beyond the local core's L3 due to a demand miss for a marked instruction."
>>   },
>>   {
>> --
>> 2.39.3

Re: [PATCH] perf vendor events: Update datasource event name to fix duplicate events

2023-12-04 Thread Athira Rajeev




> On 05-Dec-2023, at 2:43 AM, Ian Rogers  wrote:
> 
> On Mon, Dec 4, 2023 at 12:22 PM Arnaldo Carvalho de Melo
>  wrote:
>> 
>> Em Mon, Dec 04, 2023 at 05:20:46PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Mon, Dec 04, 2023 at 12:12:54PM -0800, Ian Rogers escreveu:
 On Thu, Nov 23, 2023 at 8:01 AM Athira Rajeev
  wrote:
> 
> Running "perf list" on powerpc fails with segfault
> as below:
> 
>   ./perf list
>   Segmentation fault (core dumped)
> 
> This happens because of duplicate events in the json list.
> The powerpc Json event list contains some event with same
> event name, but different event code. They are:
> - PM_INST_FROM_L3MISS (Present in datasource and frontend)
> - PM_MRK_DATA_FROM_L2MISS (Present in datasource and marked)
> - PM_MRK_INST_FROM_L3MISS (Present in datasource and marked)
> - PM_MRK_DATA_FROM_L3MISS (Present in datasource and marked)
> 
> pmu_events_table__num_events uses the value from
> table_pmu->num_entries which includes duplicate events as
> well. This causes issue during "perf list" and results in
> segmentation fault.
> 
> Since both event codes are valid, append _DSRC to the Data
> Source events (datasource.json), so that they would have a
> unique name. Also add PM_DATA_FROM_L2MISS_DSRC and
> PM_DATA_FROM_L3MISS_DSRC events. With the fix, perf list
> works as expected.
> 
> Fixes: fc1435807533 ("perf vendor events power10: Update JSON/events")
> Signed-off-by: Athira Rajeev 
 
 Given duplicate events creates broken pmu-events.c we should capture
 that as an exception in jevents.py. That way a JEVENTS_ARCH=all build
 will fail if any vendor/architecture would break in this way. We
 should also add JEVENTS_ARCH=all to tools/perf/tests/make. Athira, do
 you want to look at doing this?
>>> 
>>> Should I go ahead and remove this patch till this is sorted out?
>> 
>> I'll keep it, its already in tmp.perf-tools-next, we can go from there
>> and improve this with follow up patches,

Thanks Arnaldo for pulling the fix patch.

> 
> Agreed. I could look to do the follow up but likely won't have a
> chance for a while. If others could help out it would be great. I'd
> like to have the jevents and json be robust enough that we don't trip
> over problems like this and the somewhat similar AmpereOne issue.

Yes Ian.

I will look at adding this with follow up patches and including this as part of 
tools/perf/tests/make

Thanks
Athira
> 
> Thanks,
> Ian
> 
>> - Arnaldo

Re: [PATCH net-next v2 0/9] net*: Convert to platform remove callback returning void

2023-12-04 Thread Uwe Kleine-König

Hello Miquel,

On Tue, Dec 05, 2023 at 07:51:10AM +0100, Miquel Raynal wrote:
> u.kleine-koe...@pengutronix.de wrote on Mon,  4 Dec 2023 19:30:40 +0100:
> > (implicit) v1 of this series can be found at
> > https://lore.kernel.org/netdev/20231117095922.876489-1-u.kleine-koe...@pengutronix.de.
> > Changes since then:
> > 
> >  - Dropped patch #1 as Alex objected. Patch #1 (was #2 before) now
> >converts ipa to remove_new() and introduces an error message in the
> >error path that failed before.
> > 
> >  - Rebased to today's next
> > 
> >  - Add the tags received in the previous round.
> > 
> > Uwe Kleine-König (9):
> >   net: ipa: Convert to platform remove callback returning void
> >   net: fjes: Convert to platform remove callback returning void
> >   net: pcs: rzn1-miic: Convert to platform remove callback returning
> > void
> >   net: sfp: Convert to platform remove callback returning void
> >   net: wan/fsl_ucc_hdlc: Convert to platform remove callback returning
> > void
> >   net: wan/ixp4xx_hss: Convert to platform remove callback returning
> > void
> >   net: wwan: qcom_bam_dmux: Convert to platform remove callback
> > returning void
> >   ieee802154: fakelb: Convert to platform remove callback returning void
> >   ieee802154: hwsim: Convert to platform remove callback returning void
> 
> FYI, I plan on taking patches 8 and 9 through wpan-next.

I forgot to mention explicitly that there are no interdependencies in
this series. So each maintainer picking up up their patches is fine.

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |


signature.asc
Description: PGP signature

Re: [PATCH net-next v2 0/9] net*: Convert to platform remove callback returning void

2023-12-04 Thread Miquel Raynal

Hello Uwe,

u.kleine-koe...@pengutronix.de wrote on Tue, 5 Dec 2023 08:39:11 +0100:

> Hello Miquel,
> 
> On Tue, Dec 05, 2023 at 07:51:10AM +0100, Miquel Raynal wrote:
> > u.kleine-koe...@pengutronix.de wrote on Mon,  4 Dec 2023 19:30:40 +0100:  
> > > (implicit) v1 of this series can be found at
> > > https://lore.kernel.org/netdev/20231117095922.876489-1-u.kleine-koe...@pengutronix.de.
> > > Changes since then:
> > > 
> > >  - Dropped patch #1 as Alex objected. Patch #1 (was #2 before) now
> > >converts ipa to remove_new() and introduces an error message in the
> > >error path that failed before.
> > > 
> > >  - Rebased to today's next
> > > 
> > >  - Add the tags received in the previous round.
> > > 
> > > Uwe Kleine-König (9):
> > >   net: ipa: Convert to platform remove callback returning void
> > >   net: fjes: Convert to platform remove callback returning void
> > >   net: pcs: rzn1-miic: Convert to platform remove callback returning
> > > void
> > >   net: sfp: Convert to platform remove callback returning void
> > >   net: wan/fsl_ucc_hdlc: Convert to platform remove callback returning
> > > void
> > >   net: wan/ixp4xx_hss: Convert to platform remove callback returning
> > > void
> > >   net: wwan: qcom_bam_dmux: Convert to platform remove callback
> > > returning void
> > >   ieee802154: fakelb: Convert to platform remove callback returning void
> > >   ieee802154: hwsim: Convert to platform remove callback returning void  
> > 
> > FYI, I plan on taking patches 8 and 9 through wpan-next.  
> 
> I forgot to mention explicitly that there are no interdependencies in
> this series. So each maintainer picking up up their patches is fine.

Yes, no problem, it was quick to figure out.

Thanks,
Miquèl

54 matches

Mail list logo