Re: [PATCH v11 0/4] Support SMT control on arm64

2025-03-03 Thread Yicong Yang
On 2025/2/28 19:12, Dietmar Eggemann wrote:
> On 18/02/2025 15:10, Yicong Yang wrote:
>> From: Yicong Yang 
>>
>> The core CPU control framework supports runtime SMT control which
>> is not yet supported on arm64. Besides the general vulnerability
>> concerns, we want this runtime control on our arm64 server for:
>>
>> - better single CPU performance in some cases
>> - saving overall power consumption
>>
>> This patchset implements it in the following aspects:
>>
>> - provide a default topology_is_primary_thread()
>> - support retrieving the SMT thread number on OF based systems
>> - support retrieving the SMT thread number on ACPI based systems
>> - select HOTPLUG_SMT for arm64
>>
>> Tests have been done on our ACPI based arm64 server and on ACPI/OF
>> based QEMU VMs.
> 
> [...]
> 
>> Yicong Yang (4):
>>   cpu/SMT: Provide a default topology_is_primary_thread()
>>   arch_topology: Support SMT control for OF based system
>>   arm64: topology: Support SMT control on ACPI based system
>>   arm64: Kconfig: Enable HOTPLUG_SMT
>>
>>  arch/arm64/Kconfig  |  1 +
>>  arch/arm64/kernel/topology.c| 66 +
>>  arch/powerpc/include/asm/topology.h |  1 +
>>  arch/x86/include/asm/topology.h |  2 +-
>>  drivers/base/arch_topology.c| 27 
>>  include/linux/topology.h| 22 ++
>>  6 files changed, 118 insertions(+), 1 deletion(-)
> 
> With the review comments on the individual patches [0-3]/4:

will fix.

> 
> Reviewed-by: Dietmar Eggemann 
> 

Thanks.
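
For readers skimming the archive, the idea behind a default topology_is_primary_thread() can be modelled in userspace C. This is only a sketch, not the code from the series: it assumes the primary thread of a core is the lowest-numbered CPU in that core's sibling mask, and sibling_mask, first_cpu(), is_primary_thread() and NR_CPUS_DEMO are hypothetical names made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_DEMO 8

/* Hypothetical topology: 4 cores with 2 SMT threads each; bit n = CPU n. */
static const uint32_t sibling_mask[NR_CPUS_DEMO] = {
	0x03, 0x03, 0x0c, 0x0c, 0x30, 0x30, 0xc0, 0xc0,
};

/* Index of the lowest set bit, or -1 for an empty mask. */
static int first_cpu(uint32_t mask)
{
	for (int cpu = 0; cpu < 32; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return -1;
}

/* Assumption: the primary thread is the first CPU in its sibling mask. */
static bool is_primary_thread(unsigned int cpu)
{
	return cpu < NR_CPUS_DEMO && first_cpu(sibling_mask[cpu]) == (int)cpu;
}
```

Under this model, turning SMT off at runtime would be expected to leave online only the CPUs for which the predicate is true (CPUs 0, 2, 4 and 6 in the made-up topology).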




[PATCH 02/19] arm64: Make asm/cache.h compatible with vDSO

2025-03-03 Thread Thomas Weißschuh
asm/cache.h can be used during the vDSO build through vdso/cache.h.
Not all definitions in it are compatible with the vDSO, especially the
compat vDSO.
Hide the more complex definitions from the vDSO build.

Signed-off-by: Thomas Weißschuh 
---
 arch/arm64/include/asm/cache.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index 06a4670bdb0b9b7552d553cee3cc70a6e15b2b93..99cd6546e72e35cfbceec7ce0a0f64498dfadd38 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -35,7 +35,7 @@
 #define ARCH_DMA_MINALIGN  (128)
 #define ARCH_KMALLOC_MINALIGN  (8)
 
-#ifndef __ASSEMBLY__
+#if !defined(__ASSEMBLY__) && !defined(BUILD_VDSO)
 
 #include 
 #include 
@@ -118,6 +118,6 @@ static inline u32 __attribute_const__ read_cpuid_effective_cachetype(void)
return ctr;
 }
 
-#endif /* __ASSEMBLY__ */
+#endif /* !defined(__ASSEMBLY__) && !defined(BUILD_VDSO) */
 
 #endif

-- 
2.48.1
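
The guard pattern used by this patch can be tried outside the kernel. The sketch below is a stand-alone model, assuming BUILD_VDSO is a macro supplied on the compiler command line for vDSO objects; every DEMO_ and demo_ name is hypothetical and not part of asm/cache.h.

```c
/*
 * Simple constants stay visible to every build, including assembly and
 * the vDSO. BUILD_VDSO is assumed to come from the build system, e.g.
 * via -DBUILD_VDSO.
 */
#define DEMO_CACHE_SHIFT 6
#define DEMO_CACHE_BYTES (1 << DEMO_CACHE_SHIFT)

#if !defined(__ASSEMBLY__) && !defined(BUILD_VDSO)
/* Heavier helpers: hidden from assembly and from the vDSO build. */
static inline unsigned int demo_cache_line_size(void)
{
	return DEMO_CACHE_BYTES;
}
#define DEMO_HAS_HELPERS 1
#else
#define DEMO_HAS_HELPERS 0
#endif
```

Compiling the same file with -DBUILD_VDSO leaves only the two constants, which is the effect the patch aims for in the compat vDSO build.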




Re: [PATCH v3] dt-bindings: dma: Convert fsl,elo*-dma to YAML

2025-03-03 Thread Rob Herring
On Wed, Feb 26, 2025 at 11:29:54AM -0600, Rob Herring (Arm) wrote:
> 
> On Wed, 26 Feb 2025 16:57:17 +0100, J. Neuschäfer wrote:
> > The devicetree bindings for Freescale DMA engines have so far existed as
> > a text file. This patch converts them to YAML, and specifies all the
> > compatible strings currently in use in arch/powerpc/boot/dts.
> > 
> > Signed-off-by: J. Neuschäfer 
> > ---
> > I considered referencing dma-controller.yaml, but that requires
> > the #dma-cells property (via dma-common.yaml), and I'm not sure which
> > value it should have, if any. Therefore I did not reference
> > dma-controller.yaml.
> > 
> > V3:
> > - split out as a single patch
> > - restructure "description" definitions to use "items:" as much as possible
> > - remove useless description of interrupts in fsl,elo3-dma
> > - rename DMA controller nodes to dma-controller@...
> > - use IRQ_TYPE_* constants in examples
> > - define unit address format for DMA channel nodes
> > - drop interrupts-parent properties from examples
> > 
> > V2:
> > - part of series [PATCH v2 00/12] YAML conversion of several Freescale/PowerPC DT bindings
> >   Link: https://lore.kernel.org/lkml/20250207-ppcyaml-v2-5-8137b0c42...@posteo.net/
> > - remove unnecessary multiline markers
> > - fix additionalProperties to always be false
> > - add description/maxItems to interrupts
> > - add missing #address-cells/#size-cells properties
> > - convert "Note on DMA channel compatible properties" to YAML by listing
> >   fsl,ssi-dma-channel as a valid compatible value
> > - fix property ordering in examples: compatible and reg come first
> > - add missing newlines in examples
> > - trim subject line (remove "bindings")
> > ---
> >  .../devicetree/bindings/dma/fsl,elo-dma.yaml   | 137 ++
> >  .../devicetree/bindings/dma/fsl,elo3-dma.yaml  | 125 +
> >  .../devicetree/bindings/dma/fsl,eloplus-dma.yaml   | 132 +
> >  .../devicetree/bindings/powerpc/fsl/dma.txt| 204 
> > -
> >  4 files changed, 394 insertions(+), 204 deletions(-)
> > 
> 
> My bot found errors running 'make dt_binding_check' on your patch:
> 
> yamllint warnings/errors:
> 
> dtschema/dtc warnings/errors:
> /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/dma/fsl,elo-dma.example.dtb:
>  dma-controller@82a8: '#dma-cells' is a required property
>   from schema $id: http://devicetree.org/schemas/dma/dma-controller.yaml#
> /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/dma/fsl,eloplus-dma.example.dtb:
>  dma-controller@21300: '#dma-cells' is a required property
>   from schema $id: http://devicetree.org/schemas/dma/dma-controller.yaml#
> /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/dma/fsl,elo3-dma.example.dtb:
>  dma-controller@100300: '#dma-cells' is a required property
>   from schema $id: http://devicetree.org/schemas/dma/dma-controller.yaml#

Just stick with 'dma' for the node name, as that's what the .dts files are
using, and 'dma-controller' is reserved for users of the DMA provider binding.

Rob



Re: [PATCH v9 19/20] fs/dax: Properly refcount fs dax pages

2025-03-03 Thread David Hildenbrand




-static inline unsigned long dax_folio_share_put(struct folio *folio)
+static inline unsigned long dax_folio_put(struct folio *folio)
  {
-   return --folio->page.share;
+   unsigned long ref;
+   int order, i;
+
+   if (!dax_folio_is_shared(folio))
+   ref = 0;
+   else
+   ref = --folio->share;
+


It would still be good to learn how this non-atomic update here is safe 
(@Dan?), but that's independent of this series.


Staring at it, I would have thought we have to use an atomic_t here.

Acked-by: David Hildenbrand 

--
Cheers,

David / dhildenb
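
To make the concern above concrete, here is a userspace sketch contrasting a plain decrement with an atomic one. It uses C11 <stdatomic.h> rather than the kernel's atomic_t, and struct demo_folio and both helpers are hypothetical; it says nothing about which locking the real DAX code relies on.

```c
#include <stdatomic.h>

struct demo_folio {
	unsigned long share;		/* plain counter: callers must serialize */
	atomic_ulong share_atomic;	/* atomic counter: safe without a lock */
};

/* Non-atomic variant: only correct if all callers hold the same lock. */
static unsigned long demo_share_put(struct demo_folio *folio)
{
	return --folio->share;
}

/* Atomic variant: returns the new value after the decrement. */
static unsigned long demo_share_put_atomic(struct demo_folio *folio)
{
	return atomic_fetch_sub(&folio->share_atomic, 1) - 1;
}
```

The plain variant is a read-modify-write sequence, so two unserialized callers can both observe the same old value; that is the case the question to Dan is about.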




Re: [PATCH] powerpc: Don't use %pK through printk

2025-03-03 Thread Thomas Weißschuh
On Fri, Feb 28, 2025 at 08:15:02PM +, Maciej W. Rozycki wrote:
> On Wed, 26 Feb 2025, Thomas Weißschuh wrote:
> 
> > > > By default, when kptr_restrict is set to 0, %pK behaves the same as %p.
> > > > The same happened for a bunch of other architectures and nobody seems
> > > > to have noticed in the past.
> > > > The symbol-relative pointers or pointer formats designed for backtraces,
> > > > as noted by Christophe, seem to be enough.
> > > 
> > >  I do hope so.
> > 
> > As mentioned before, personally I am fine with using %px here.
> 
>  Glad to hear!
> 
> > The values are in the register dumps anyways and security sensitive
> > deployments will panic on WARN(), making the information disclosure useless.
> 
>  And even more so, I wasn't aware of this feature.  But this code doesn't 
> make use of the WARN() facility, it just prints at the heightened KERN_ERR 
> priority.

Indeed, I confused this with some other patches, most of which use WARN().
This makes it a bit murkier.

> > > > But personally I'm also fine with using %px, as my goal is to remove the
> > > > error-prone and confusing %pK.
> > > 
> > >  It's clear that `%pK' was meant to restrict access to /proc files and
> > > the like that may be accessible by unprivileged users:
> > 
> > Then let's stop abusing it. For something that is clear, it is
> > misunderstood very often.
> 
>  Absolutely, I haven't questioned the removal of `%pK', but the switch to 
> `%p' rather than `%px' specifically for this single hunk of your patch.

Sure. It would be great if one of the maintainers could confirm this preference.

> > > "
> > > kptr_restrict
> > > =
> > > 
> > > This toggle indicates whether restrictions are placed on
> > > exposing kernel addresses via ``/proc`` and other interfaces.
> > > "
> > > 
> > > and not the kernel log, the information in which may come from rare
> > > events that are difficult to trigger and hard to recover via other means.
> > > Sigh. Once you've got access to the kernel log, you may as well wipe the
> > > system or do any other harm you might like.
> > 
> > As I understand it, both the security and printk maintainers don't want the
> > kernel log in general to be security sensitive and restricted.
> > My goal here is not to push site-specific policy into the kernel but make
> > life easier for kernel developers by removing the confusing and error-prone
> > %pK altogether.
> 
>  Let me ask a different question then: is your approach to bulk-switch all 
> instances of `%pK' to `%p' as the safe default and let other people figure 
> out afterwards whether a different conversion specifier ought to be used 
> instead on a case-by-case basis and then follow up with another patch, or 
> will you consider these alternatives right away?

I am considering them on a case-by-case basis. But mostly the decision is that
%p is enough, because by default %pK has been the same as %p anyways.
Also the current wave of replacements does not touch valid users of %pK.
They will stay and later be replaced with a new and better API.

> > Security is only one aspect.
> 
>  I think it's important enough though for us to ensure we don't compromise 
> it by chance.

Agreed.
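
As a summary of the behaviour discussed in this thread, the following userspace model shows what a %pK conversion is expected to print for each kptr_restrict setting. It is a sketch based on the documented sysctl semantics, not kernel code: the enum and demo_pK_output() are hypothetical, and the credential check is reduced to a single boolean.

```c
#include <stdbool.h>

enum demo_ptr_output {
	DEMO_HASHED,	/* same output as plain %p */
	DEMO_ZEROED,	/* address replaced by zeros */
	DEMO_RAW,	/* real address, as %px would print */
};

/* Model of %pK; "privileged" stands in for the CAP_SYSLOG credential check. */
static enum demo_ptr_output demo_pK_output(int kptr_restrict, bool privileged)
{
	switch (kptr_restrict) {
	case 0:
		return DEMO_HASHED;
	case 1:
		return privileged ? DEMO_RAW : DEMO_ZEROED;
	default:
		return DEMO_ZEROED;
	}
}
```

With the default setting of 0 the result does not depend on the caller at all, which is why replacing %pK with %p changes nothing on default configurations.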



[PATCH 10/19] vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. At the moment, vdso_clock
is simply a define which maps vdso_clock to vdso_time_data.

Prepare for the rework of these structures by adding a struct vdso_clock
pointer argument to do_coarse_timens(), and replace the struct
vdso_time_data pointer with the new pointer argument whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 lib/vdso/gettimeofday.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 36ef7de097e6137832605928a155a0ff78123fb4..03fa0393645ac0f5ee465ddc19d84b330913da65 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -193,21 +193,25 @@ int do_hres(const struct vdso_time_data *vd, const struct vdso_clock *vc,
 }
 
 #ifdef CONFIG_TIME_NS
-static __always_inline int do_coarse_timens(const struct vdso_time_data *vdns, clockid_t clk,
-   struct __kernel_timespec *ts)
+static __always_inline
+int do_coarse_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns,
+clockid_t clk, struct __kernel_timespec *ts)
 {
const struct vdso_time_data *vd = __arch_get_vdso_u_timens_data(vdns);
-   const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
-   const struct timens_offset *offs = &vdns->offset[clk];
+   const struct timens_offset *offs = &vcns->offset[clk];
+   const struct vdso_timestamp *vdso_ts;
+   const struct vdso_clock *vc = vd;
u64 nsec;
s64 sec;
s32 seq;
 
+   vdso_ts = &vc->basetime[clk];
+
do {
-   seq = vdso_read_begin(vd);
+   seq = vdso_read_begin(vc);
sec = vdso_ts->sec;
nsec = vdso_ts->nsec;
-   } while (unlikely(vdso_read_retry(vd, seq)));
+   } while (unlikely(vdso_read_retry(vc, seq)));
 
/* Add the namespace offset */
sec += offs->sec;
@@ -222,8 +226,9 @@ static __always_inline int do_coarse_timens(const struct vdso_time_data *vdns, c
return 0;
 }
 #else
-static __always_inline int do_coarse_timens(const struct vdso_time_data *vdns, clockid_t clk,
-   struct __kernel_timespec *ts)
+static __always_inline
+int do_coarse_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns,
+clockid_t clk, struct __kernel_timespec *ts)
 {
return -1;
 }
@@ -244,7 +249,7 @@ int do_coarse(const struct vdso_time_data *vd, const struct vdso_clock *vc,
while ((seq = READ_ONCE(vc->seq)) & 1) {
if (IS_ENABLED(CONFIG_TIME_NS) &&
vc->clock_mode == VDSO_CLOCKMODE_TIMENS)
-   return do_coarse_timens(vc, clk, ts);
+   return do_coarse_timens(vd, vc, clk, ts);
cpu_relax();
}
smp_rmb();

-- 
2.48.1
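
The "Add the namespace offset" step in do_coarse_timens() can be illustrated in isolation. This is a simplified userspace sketch: the demo_ types are hypothetical stand-ins for struct timens_offset and struct __kernel_timespec, and a plain loop replaces the kernel's division helper for carrying nanoseconds into seconds.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

struct demo_timespec { int64_t tv_sec; int64_t tv_nsec; };
struct demo_offset { int64_t sec; uint64_t nsec; };

/* Add a per-namespace offset to a timestamp and normalize the result. */
static void demo_apply_offset(int64_t sec, uint64_t nsec,
			      const struct demo_offset *offs,
			      struct demo_timespec *ts)
{
	sec += offs->sec;
	nsec += offs->nsec;

	/* Carry whole seconds out of the nanosecond part. */
	while (nsec >= DEMO_NSEC_PER_SEC) {
		nsec -= DEMO_NSEC_PER_SEC;
		sec++;
	}

	ts->tv_sec = sec;
	ts->tv_nsec = (int64_t)nsec;
}
```

Since both nanosecond inputs are below one second, the carry loop runs at most once in this model.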




[PATCH 05/19] vdso/helpers: Prepare introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

Prepare all functions which need the pointer to the vdso_clock array to
work well after the structures get reworked. Replace struct vdso_time_data
pointer with struct vdso_clock pointer whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 include/vdso/helpers.h | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/vdso/helpers.h b/include/vdso/helpers.h
index 41c3087070c7ab21d7adec04e6cd30c4b32ea221..28f0707a46c62187ad7500543e169f5b99deee70 100644
--- a/include/vdso/helpers.h
+++ b/include/vdso/helpers.h
@@ -7,49 +7,53 @@
 #include 
 #include 
 
-static __always_inline u32 vdso_read_begin(const struct vdso_time_data *vd)
+static __always_inline u32 vdso_read_begin(const struct vdso_clock *vc)
 {
u32 seq;
 
-   while (unlikely((seq = READ_ONCE(vd->seq)) & 1))
+   while (unlikely((seq = READ_ONCE(vc->seq)) & 1))
cpu_relax();
 
smp_rmb();
return seq;
 }
 
-static __always_inline u32 vdso_read_retry(const struct vdso_time_data *vd,
+static __always_inline u32 vdso_read_retry(const struct vdso_clock *vc,
   u32 start)
 {
u32 seq;
 
smp_rmb();
-   seq = READ_ONCE(vd->seq);
+   seq = READ_ONCE(vc->seq);
return seq != start;
 }
 
 static __always_inline void vdso_write_begin(struct vdso_time_data *vd)
 {
+   struct vdso_clock *vc = vd;
+
/*
 * WRITE_ONCE() is required otherwise the compiler can validly tear
 * updates to vd[x].seq and it is possible that the value seen by the
 * reader is inconsistent.
 */
-   WRITE_ONCE(vd[CS_HRES_COARSE].seq, vd[CS_HRES_COARSE].seq + 1);
-   WRITE_ONCE(vd[CS_RAW].seq, vd[CS_RAW].seq + 1);
+   WRITE_ONCE(vc[CS_HRES_COARSE].seq, vc[CS_HRES_COARSE].seq + 1);
+   WRITE_ONCE(vc[CS_RAW].seq, vc[CS_RAW].seq + 1);
smp_wmb();
 }
 
 static __always_inline void vdso_write_end(struct vdso_time_data *vd)
 {
+   struct vdso_clock *vc = vd;
+
smp_wmb();
/*
 * WRITE_ONCE() is required otherwise the compiler can validly tear
 * updates to vd[x].seq and it is possible that the value seen by the
 * reader is inconsistent.
 */
-   WRITE_ONCE(vd[CS_HRES_COARSE].seq, vd[CS_HRES_COARSE].seq + 1);
-   WRITE_ONCE(vd[CS_RAW].seq, vd[CS_RAW].seq + 1);
+   WRITE_ONCE(vc[CS_HRES_COARSE].seq, vc[CS_HRES_COARSE].seq + 1);
+   WRITE_ONCE(vc[CS_RAW].seq, vc[CS_RAW].seq + 1);
 }
 
 #endif /* !__ASSEMBLY__ */

-- 
2.48.1
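
The sequence-counter protocol behind vdso_read_begin(), vdso_read_retry() and the write helpers can be sketched in portable C11. This is an approximation under stated assumptions, not the kernel implementation: C11 atomics and fences stand in for READ_ONCE()/WRITE_ONCE() and smp_rmb()/smp_wmb(), all demo_ names are hypothetical, and the payload is accessed with plain loads and stores as in the kernel (a strictly conforming C11 program would use relaxed atomics for it).

```c
#include <stdatomic.h>
#include <stdint.h>

struct demo_seqclock {
	atomic_uint seq;	/* odd while an update is in progress */
	uint64_t sec;
	uint64_t nsec;
};

/* Wait until no update is in progress and return the sequence number. */
static uint32_t demo_read_begin(struct demo_seqclock *vc)
{
	uint32_t seq;

	while ((seq = atomic_load_explicit(&vc->seq, memory_order_acquire)) & 1)
		;	/* spin, the kernel calls cpu_relax() here */
	return seq;
}

/* Nonzero if a writer ran since demo_read_begin() returned "start". */
static int demo_read_retry(struct demo_seqclock *vc, uint32_t start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&vc->seq, memory_order_relaxed) != start;
}

/* Single writer: make seq odd, update the payload, make seq even again. */
static void demo_write(struct demo_seqclock *vc, uint64_t sec, uint64_t nsec)
{
	unsigned int seq = atomic_load_explicit(&vc->seq, memory_order_relaxed);

	atomic_store_explicit(&vc->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	vc->sec = sec;
	vc->nsec = nsec;
	atomic_store_explicit(&vc->seq, seq + 2, memory_order_release);
}

/* Reader loop: retry until a consistent snapshot has been taken. */
static void demo_read(struct demo_seqclock *vc, uint64_t *sec, uint64_t *nsec)
{
	uint32_t seq;

	do {
		seq = demo_read_begin(vc);
		*sec = vc->sec;
		*nsec = vc->nsec;
	} while (demo_read_retry(vc, seq));
}
```

Readers never block the writer; they only repeat the copy when the counter changed underneath them, which is why the patch can switch the helpers to a per-clock pointer without touching the protocol itself.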




[PATCH 06/19] vdso/gettimeofday: Prepare introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. At the moment, vdso_clock
is simply a define which maps vdso_clock to vdso_time_data.

Prepare all functions which need the pointer to the vdso_clock array to
work well after introducing the new struct. Whenever applicable, struct
vdso_time_data pointer is replaced by struct vdso_clock pointer.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 lib/vdso/gettimeofday.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 299f027116ee0e50a69c5a8a17218004e4af0ea1..59369a4e9f25f937eb8d9aed3201ebd340097a9d 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -257,6 +257,7 @@ static __always_inline int
 __cvdso_clock_gettime_common(const struct vdso_time_data *vd, clockid_t clock,
 struct __kernel_timespec *ts)
 {
+   const struct vdso_clock *vc = vd;
u32 msk;
 
/* Check for negative values or invalid clocks */
@@ -269,15 +270,15 @@ __cvdso_clock_gettime_common(const struct vdso_time_data *vd, clockid_t clock,
 */
msk = 1U << clock;
if (likely(msk & VDSO_HRES))
-   vd = &vd[CS_HRES_COARSE];
+   vc = &vc[CS_HRES_COARSE];
else if (msk & VDSO_COARSE)
-   return do_coarse(&vd[CS_HRES_COARSE], clock, ts);
+   return do_coarse(&vc[CS_HRES_COARSE], clock, ts);
else if (msk & VDSO_RAW)
-   vd = &vd[CS_RAW];
+   vc = &vc[CS_RAW];
else
return -1;
 
-   return do_hres(vd, clock, ts);
+   return do_hres(vc, clock, ts);
 }
 
 static __maybe_unused int
@@ -328,11 +329,12 @@ static __maybe_unused int
 __cvdso_gettimeofday_data(const struct vdso_time_data *vd,
  struct __kernel_old_timeval *tv, struct timezone *tz)
 {
+   const struct vdso_clock *vc = vd;
 
if (likely(tv != NULL)) {
struct __kernel_timespec ts;
 
-   if (do_hres(&vd[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
+   if (do_hres(&vc[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
return gettimeofday_fallback(tv, tz);
 
tv->tv_sec = ts.tv_sec;
@@ -341,7 +343,7 @@ __cvdso_gettimeofday_data(const struct vdso_time_data *vd,
 
if (unlikely(tz != NULL)) {
if (IS_ENABLED(CONFIG_TIME_NS) &&
-   vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
+   vc->clock_mode == VDSO_CLOCKMODE_TIMENS)
vd = __arch_get_vdso_u_timens_data(vd);
 
tz->tz_minuteswest = vd[CS_HRES_COARSE].tz_minuteswest;
@@ -361,13 +363,16 @@ __cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
 static __maybe_unused __kernel_old_time_t
 __cvdso_time_data(const struct vdso_time_data *vd, __kernel_old_time_t *time)
 {
+   const struct vdso_clock *vc = vd;
__kernel_old_time_t t;
 
if (IS_ENABLED(CONFIG_TIME_NS) &&
-   vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
+   vc->clock_mode == VDSO_CLOCKMODE_TIMENS) {
vd = __arch_get_vdso_u_timens_data(vd);
+   vc = vd;
+   }
 
-   t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
+   t = READ_ONCE(vc[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
 
if (time)
*time = t;
@@ -386,6 +391,7 @@ static __maybe_unused
 int __cvdso_clock_getres_common(const struct vdso_time_data *vd, clockid_t clock,
struct __kernel_timespec *res)
 {
+   const struct vdso_clock *vc = vd;
u32 msk;
u64 ns;
 
@@ -394,7 +400,7 @@ int __cvdso_clock_getres_common(const struct vdso_time_data *vd, clockid_t clock
return -1;
 
if (IS_ENABLED(CONFIG_TIME_NS) &&
-   vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
+   vc->clock_mode == VDSO_CLOCKMODE_TIMENS)
vd = __arch_get_vdso_u_timens_data(vd);
 
/*

-- 
2.48.1
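
The commit message says vdso_clock is, for now, "simply a define". The stand-alone sketch below shows why that lets lines such as "const struct vdso_clock *vc = vd;" compile without a cast during the transition. All demo_ names are hypothetical, and the struct is reduced to two fields.

```c
#include <stdint.h>

#define DEMO_CS_HRES_COARSE	0
#define DEMO_CS_RAW		1
#define DEMO_CS_BASES		2

struct demo_time_data {
	uint32_t seq;
	int32_t clock_mode;
};

/* Transitional alias: "struct demo_clock" expands to "struct demo_time_data". */
#define demo_clock demo_time_data

static uint32_t demo_raw_seq(const struct demo_time_data *vd)
{
	/* Both names denote the same type, so no cast is needed. */
	const struct demo_clock *vc = vd;

	return vc[DEMO_CS_RAW].seq;
}
```

Once the alias is replaced by a real, distinct struct, assignments like the one above stop compiling, so every remaining conversion site is flagged by the compiler.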




[PATCH 00/19] vdso: Rework struct vdso_time_data and introduce struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it.

This series is based on and intended to be merged through tip/timers/vdso.

Signed-off-by: Thomas Weißschuh 
---
Anna-Maria Behnsen (15):
  vdso: Make vdso_time_data cacheline aligned
  vdso/datapage: Define for vdso_data to make rework of vdso possible
  vdso/helpers: Prepare introduction of struct vdso_clock
  vdso/gettimeofday: Prepare introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock
  vdso/vsyscall: Prepare introduction of struct vdso_clock
  vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct
  time/namespace: Prepare introduction of struct vdso_clock
  x86/vdso: Prepare introduction of struct vdso_clock
  vdso: Move arch related data before basetime
  vdso: Rework struct vdso_time_data and introduce struct vdso_clock

Nam Cao (2):
  arm64/vdso: Prepare introduction of struct vdso_clock
  powerpc/vdso: Prepare introduction of struct vdso_clock

Thomas Weißschuh (2):
  vdso: Introduce vdso/cache.h
  arm64: Make asm/cache.h compatible with vDSO

 arch/arm64/include/asm/cache.h|   4 +-
 arch/arm64/include/asm/vdso/compat_gettimeofday.h |   6 +-
 arch/arm64/include/asm/vdso/vsyscall.h|   4 +-
 arch/powerpc/include/asm/vdso/gettimeofday.h  |   2 +-
 arch/s390/kernel/time.c   |  11 +-
 arch/x86/include/asm/vdso/gettimeofday.h  |  16 +--
 include/asm-generic/vdso/vsyscall.h   |   2 +-
 include/linux/cache.h |   9 +-
 include/vdso/cache.h  |  15 +++
 include/vdso/datapage.h   |  48 ---
 include/vdso/helpers.h|  20 +--
 kernel/time/namespace.c   |  20 +--
 kernel/time/vsyscall.c|  47 +++
 lib/vdso/datastore.c  |   6 +-
 lib/vdso/gettimeofday.c   | 146 --
 15 files changed, 196 insertions(+), 160 deletions(-)
---
base-commit: ac1a42f4e4e296b5ba5fdb39444f65d6e5196240
change-id: 20250224-vdso-clock-f10f017c4b80

Best regards,
-- 
Thomas Weißschuh 
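
As a rough picture of where the series is heading, the sketch below splits per-clock state from clock-independent state. The field selection is an assumption chosen for illustration and is not taken from the final patch; every demo_ name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CS_BASES	2
#define DEMO_VDSO_BASES	12

struct demo_vdso_timestamp {
	uint64_t sec;
	uint64_t nsec;
};

/* Per-clock data: what a reader needs for one clocksource base. */
struct demo_vdso_clock {
	uint32_t seq;
	int32_t clock_mode;
	uint64_t cycle_last;
	uint32_t mult;
	uint32_t shift;
	struct demo_vdso_timestamp basetime[DEMO_VDSO_BASES];
};

/* Container: clock-independent fields plus an array of per-clock data. */
struct demo_vdso_time_data {
	struct demo_vdso_clock clock_data[DEMO_CS_BASES];
	int32_t tz_minuteswest;
	int32_t tz_dsttime;
};

/* Bounds-checked accessor for one clock base; NULL if out of range. */
static const struct demo_vdso_clock *
demo_get_clock(const struct demo_vdso_time_data *vd, unsigned int base)
{
	return base < DEMO_CS_BASES ? &vd->clock_data[base] : NULL;
}
```

With a layout like this, supporting additional clocks would mean adding further arrays of the per-clock struct rather than duplicating the whole container.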




[PATCH 07/19] vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

Prepare for the rework of these structures by adding a struct vdso_clock
pointer argument to do_hres(), and replace the struct vdso_time_data
pointer with the new pointer argument whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 lib/vdso/gettimeofday.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 59369a4e9f25f937eb8d9aed3201ebd340097a9d..15611ab650232f2e847b7de80c7293c4fb7f84f2 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -139,10 +139,11 @@ static __always_inline int do_hres_timens(const struct vdso_time_data *vdns, clo
 }
 #endif
 
-static __always_inline int do_hres(const struct vdso_time_data *vd, clockid_t clk,
-  struct __kernel_timespec *ts)
+static __always_inline
+int do_hres(const struct vdso_time_data *vd, const struct vdso_clock *vc,
+   clockid_t clk, struct __kernel_timespec *ts)
 {
-   const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
+   const struct vdso_timestamp *vdso_ts = &vc->basetime[clk];
u64 cycles, sec, ns;
u32 seq;
 
@@ -154,31 +155,31 @@ static __always_inline int do_hres(const struct vdso_time_data *vd, clockid_t cl
/*
 * Open coded function vdso_read_begin() to handle
 * VDSO_CLOCKMODE_TIMENS. Time namespace enabled tasks have a
-* special VVAR page installed which has vd->seq set to 1 and
-* vd->clock_mode set to VDSO_CLOCKMODE_TIMENS. For non time
+* special VVAR page installed which has vc->seq set to 1 and
+* vc->clock_mode set to VDSO_CLOCKMODE_TIMENS. For non time
 * namespace affected tasks this does not affect performance
-* because if vd->seq is odd, i.e. a concurrent update is in
-* progress the extra check for vd->clock_mode is just a few
-* extra instructions while spin waiting for vd->seq to become
+* because if vc->seq is odd, i.e. a concurrent update is in
+* progress the extra check for vc->clock_mode is just a few
+* extra instructions while spin waiting for vc->seq to become
 * even again.
 */
-   while (unlikely((seq = READ_ONCE(vd->seq)) & 1)) {
+   while (unlikely((seq = READ_ONCE(vc->seq)) & 1)) {
if (IS_ENABLED(CONFIG_TIME_NS) &&
-   vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
+   vc->clock_mode == VDSO_CLOCKMODE_TIMENS)
return do_hres_timens(vd, clk, ts);
cpu_relax();
}
smp_rmb();
 
-   if (unlikely(!vdso_clocksource_ok(vd)))
+   if (unlikely(!vdso_clocksource_ok(vc)))
return -1;
 
-   cycles = __arch_get_hw_counter(vd->clock_mode, vd);
+   cycles = __arch_get_hw_counter(vc->clock_mode, vd);
if (unlikely(!vdso_cycles_ok(cycles)))
return -1;
-   ns = vdso_calc_ns(vd, cycles, vdso_ts->nsec);
+   ns = vdso_calc_ns(vc, cycles, vdso_ts->nsec);
sec = vdso_ts->sec;
-   } while (unlikely(vdso_read_retry(vd, seq)));
+   } while (unlikely(vdso_read_retry(vc, seq)));
 
/*
 * Do this outside the loop: a race inside the loop could result
@@ -278,7 +279,7 @@ __cvdso_clock_gettime_common(const struct vdso_time_data *vd, clockid_t clock,
else
return -1;
 
-   return do_hres(vc, clock, ts);
+   return do_hres(vd, vc, clock, ts);
 }
 
 static __maybe_unused int
@@ -334,7 +335,7 @@ __cvdso_gettimeofday_data(const struct vdso_time_data *vd,
if (likely(tv != NULL)) {
struct __kernel_timespec ts;
 
-   if (do_hres(&vc[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
+   if (do_hres(vd, &vc[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
return gettimeofday_fallback(tv, tz);
 
tv->tv_sec = ts.tv_sec;

-- 
2.48.1




[PATCH 13/19] vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it.

For time namespaces, vdso_time_data needs to be set up, but only the clock
related part of the vdso_data requires this setup. To reflect the future
struct vdso_clock, rename timens_setup_vdso_data() to
timens_setup_vdso_clock_data().

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 kernel/time/namespace.c | 6 +++---
 lib/vdso/datastore.c| 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index 12f55aa539adbc11cce4055f519dbeca8a73320c..f02430a73be8f081618792c8968bf0c112c54505 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -176,8 +176,8 @@ static struct timens_offset offset_from_ts(struct timespec64 off)
  * Timens page has vdso_time_data->clock_mode set to VDSO_CLOCKMODE_TIMENS which
  * enforces the time namespace handling path.
  */
-static void timens_setup_vdso_data(struct vdso_time_data *vdata,
-  struct time_namespace *ns)
+static void timens_setup_vdso_clock_data(struct vdso_time_data *vdata,
+struct time_namespace *ns)
 {
struct timens_offset *offset = vdata->offset;
struct timens_offset monotonic = offset_from_ts(ns->offsets.monotonic);
@@ -238,7 +238,7 @@ static void timens_set_vvar_page(struct task_struct *task,
vdata = page_address(ns->vvar_page);
 
for (i = 0; i < CS_BASES; i++)
-   timens_setup_vdso_data(&vdata[i], ns);
+   timens_setup_vdso_clock_data(&vdata[i], ns);
 
 out:
mutex_unlock(&offset_lock);
diff --git a/lib/vdso/datastore.c b/lib/vdso/datastore.c
index e227fbbcb79694f9a40606ac864f52cf1fdbfcf4..4e350f56ace335b7ebca8af7663b5731fae27334 100644
--- a/lib/vdso/datastore.c
+++ b/lib/vdso/datastore.c
@@ -109,7 +109,7 @@ struct vm_area_struct *vdso_install_vvar_mapping(struct mm_struct *mm, unsigned
  * non-root time namespace. Whenever a task changes its namespace, the VVAR
  * page tables are cleared and then they will be re-faulted with a
  * corresponding layout.
- * See also the comment near timens_setup_vdso_data() for details.
+ * See also the comment near timens_setup_vdso_clock_data() for details.
  */
 int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 {

-- 
2.48.1




[PATCH 09/19] vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

Prepare for the rework of these structures by adding a struct vdso_clock
pointer argument to do_coarse(), and replace the struct vdso_time_data
pointer with the new pointer argument whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 lib/vdso/gettimeofday.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index e8d4b02bcb616af19f1e794b14fb4419809408da..36ef7de097e6137832605928a155a0ff78123fb4 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -229,10 +229,11 @@ static __always_inline int do_coarse_timens(const struct vdso_time_data *vdns, c
 }
 #endif
 
-static __always_inline int do_coarse(const struct vdso_time_data *vd, clockid_t clk,
-struct __kernel_timespec *ts)
+static __always_inline
+int do_coarse(const struct vdso_time_data *vd, const struct vdso_clock *vc,
+ clockid_t clk, struct __kernel_timespec *ts)
 {
-   const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
+   const struct vdso_timestamp *vdso_ts = &vc->basetime[clk];
u32 seq;
 
do {
@@ -240,17 +241,17 @@ static __always_inline int do_coarse(const struct vdso_time_data *vd, clockid_t
 * Open coded function vdso_read_begin() to handle
 * VDSO_CLOCK_TIMENS. See comment in do_hres().
 */
-   while ((seq = READ_ONCE(vd->seq)) & 1) {
+   while ((seq = READ_ONCE(vc->seq)) & 1) {
if (IS_ENABLED(CONFIG_TIME_NS) &&
-   vd->clock_mode == VDSO_CLOCKMODE_TIMENS)
-   return do_coarse_timens(vd, clk, ts);
+   vc->clock_mode == VDSO_CLOCKMODE_TIMENS)
+   return do_coarse_timens(vc, clk, ts);
cpu_relax();
}
smp_rmb();
 
ts->tv_sec = vdso_ts->sec;
ts->tv_nsec = vdso_ts->nsec;
-   } while (unlikely(vdso_read_retry(vd, seq)));
+   } while (unlikely(vdso_read_retry(vc, seq)));
 
return 0;
 }
@@ -274,7 +275,7 @@ __cvdso_clock_gettime_common(const struct vdso_time_data *vd, clockid_t clock,
if (likely(msk & VDSO_HRES))
vc = &vc[CS_HRES_COARSE];
else if (msk & VDSO_COARSE)
-   return do_coarse(&vc[CS_HRES_COARSE], clock, ts);
+   return do_coarse(vd, &vc[CS_HRES_COARSE], clock, ts);
else if (msk & VDSO_RAW)
vc = &vc[CS_RAW];
else

-- 
2.48.1
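
The dispatch in __cvdso_clock_gettime_common() that this patch touches selects a code path from a one-bit mask per clock ID. The userspace sketch below models that selection under stated assumptions: the DEMO_ constants mirror the Linux clock IDs and the grouping into high-resolution, coarse and raw clocks as I understand them, and demo_select_path() with its enum is hypothetical.

```c
#include <stdint.h>

/* Clock IDs, assumed to match the Linux UAPI values. */
#define DEMO_CLOCK_REALTIME		0
#define DEMO_CLOCK_MONOTONIC		1
#define DEMO_CLOCK_MONOTONIC_RAW	4
#define DEMO_CLOCK_REALTIME_COARSE	5
#define DEMO_CLOCK_MONOTONIC_COARSE	6
#define DEMO_CLOCK_BOOTTIME		7
#define DEMO_CLOCK_TAI			11
#define DEMO_MAX_CLOCKS			16

#define DEMO_BIT(n)	(UINT32_C(1) << (n))
#define DEMO_VDSO_HRES	(DEMO_BIT(DEMO_CLOCK_REALTIME) | \
			 DEMO_BIT(DEMO_CLOCK_MONOTONIC) | \
			 DEMO_BIT(DEMO_CLOCK_BOOTTIME) | \
			 DEMO_BIT(DEMO_CLOCK_TAI))
#define DEMO_VDSO_COARSE (DEMO_BIT(DEMO_CLOCK_REALTIME_COARSE) | \
			 DEMO_BIT(DEMO_CLOCK_MONOTONIC_COARSE))
#define DEMO_VDSO_RAW	DEMO_BIT(DEMO_CLOCK_MONOTONIC_RAW)

enum demo_path {
	DEMO_PATH_HRES,
	DEMO_PATH_COARSE,
	DEMO_PATH_RAW,
	DEMO_PATH_FALLBACK,	/* the real code returns -1 and falls back to a syscall */
};

static enum demo_path demo_select_path(int clock)
{
	uint32_t msk;

	/* The unsigned compare rejects negative and out-of-range IDs. */
	if ((uint32_t)clock >= DEMO_MAX_CLOCKS)
		return DEMO_PATH_FALLBACK;

	msk = DEMO_BIT(clock);
	if (msk & DEMO_VDSO_HRES)
		return DEMO_PATH_HRES;
	if (msk & DEMO_VDSO_COARSE)
		return DEMO_PATH_COARSE;
	if (msk & DEMO_VDSO_RAW)
		return DEMO_PATH_RAW;
	return DEMO_PATH_FALLBACK;
}
```

In the patch, the coarse branch is the one that now passes both the vdso_time_data pointer and the per-clock pointer on to do_coarse().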




[PATCH 08/19] vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

Prepare for the rework of these structures by adding a struct vdso_clock
pointer argument to do_hres_timens(), and replace the struct vdso_time_data
pointer with the new pointer argument whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 lib/vdso/gettimeofday.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 15611ab650232f2e847b7de80c7293c4fb7f84f2..e8d4b02bcb616af19f1e794b14fb4419809408da 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -81,36 +81,36 @@ const struct vdso_time_data *__arch_get_vdso_u_timens_data(const struct vdso_tim
 }
 #endif /* CONFIG_GENERIC_VDSO_DATA_STORE */
 
-static __always_inline int do_hres_timens(const struct vdso_time_data *vdns, 
clockid_t clk,
- struct __kernel_timespec *ts)
+static __always_inline
+int do_hres_timens(const struct vdso_time_data *vdns, const struct vdso_clock 
*vcns,
+  clockid_t clk, struct __kernel_timespec *ts)
 {
-   const struct timens_offset *offs = &vdns->offset[clk];
+   const struct vdso_time_data *vd = __arch_get_vdso_u_timens_data(vdns);
+   const struct timens_offset *offs = &vcns->offset[clk];
const struct vdso_timestamp *vdso_ts;
-   const struct vdso_time_data *vd;
+   const struct vdso_clock *vc = vd;
u64 cycles, ns;
u32 seq;
s64 sec;
 
-   vd = vdns - (clk == CLOCK_MONOTONIC_RAW ? CS_RAW : CS_HRES_COARSE);
-   vd = __arch_get_vdso_u_timens_data(vd);
if (clk != CLOCK_MONOTONIC_RAW)
-   vd = &vd[CS_HRES_COARSE];
+   vc = &vc[CS_HRES_COARSE];
else
-   vd = &vd[CS_RAW];
-   vdso_ts = &vd->basetime[clk];
+   vc = &vc[CS_RAW];
+   vdso_ts = &vc->basetime[clk];
 
do {
-   seq = vdso_read_begin(vd);
+   seq = vdso_read_begin(vc);
 
-   if (unlikely(!vdso_clocksource_ok(vd)))
+   if (unlikely(!vdso_clocksource_ok(vc)))
return -1;
 
-   cycles = __arch_get_hw_counter(vd->clock_mode, vd);
+   cycles = __arch_get_hw_counter(vc->clock_mode, vd);
if (unlikely(!vdso_cycles_ok(cycles)))
return -1;
-   ns = vdso_calc_ns(vd, cycles, vdso_ts->nsec);
+   ns = vdso_calc_ns(vc, cycles, vdso_ts->nsec);
sec = vdso_ts->sec;
-   } while (unlikely(vdso_read_retry(vd, seq)));
+   } while (unlikely(vdso_read_retry(vc, seq)));
 
/* Add the namespace offset */
sec += offs->sec;
@@ -132,8 +132,9 @@ const struct vdso_time_data 
*__arch_get_vdso_u_timens_data(const struct vdso_tim
return NULL;
 }
 
-static __always_inline int do_hres_timens(const struct vdso_time_data *vdns, 
clockid_t clk,
- struct __kernel_timespec *ts)
+static __always_inline
+int do_hres_timens(const struct vdso_time_data *vdns, const struct vdso_clock 
*vcns,
+  clockid_t clk, struct __kernel_timespec *ts)
 {
return -EINVAL;
 }
@@ -166,7 +167,7 @@ int do_hres(const struct vdso_time_data *vd, const struct 
vdso_clock *vc,
while (unlikely((seq = READ_ONCE(vc->seq)) & 1)) {
if (IS_ENABLED(CONFIG_TIME_NS) &&
vc->clock_mode == VDSO_CLOCKMODE_TIMENS)
-   return do_hres_timens(vd, clk, ts);
+   return do_hres_timens(vd, vc, clk, ts);
cpu_relax();
}
smp_rmb();

-- 
2.48.1
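
The transitional define described in the commit message above can be illustrated outside the kernel. The following is a minimal sketch, not the series' code: all demo_* names are hypothetical, and the struct keeps only two fields. It shows why a function can take both the old and the new pointer type during the conversion without any cast, as long as the new name is just a preprocessor alias.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vdso_time_data; only two fields kept. */
struct demo_time_data {
	uint32_t seq;
	int32_t  clock_mode;
};

/* Transitional alias: "struct demo_clock" is literally struct demo_time_data. */
#define demo_clock demo_time_data

/* Helper that has already been converted to the new pointer type. */
static inline int32_t demo_clock_mode(const struct demo_clock *vc)
{
	return vc->clock_mode;
}

/* Caller that still receives the old type and indexes it as an array. */
static inline int32_t demo_lookup(const struct demo_time_data *vd, unsigned int idx)
{
	const struct demo_clock *vc = vd;	/* same type, so no cast is needed */

	return demo_clock_mode(&vc[idx]);
}

static_assert(sizeof(struct demo_clock) == sizeof(struct demo_time_data),
	      "the alias must not change the layout");
```

Once every user goes through the new name, the alias can be replaced by a real struct in a single final step, which is what the last patch of the series is described as doing.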




[PATCH 03/19] vdso: Make vdso_time_data cacheline aligned

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

vdso_time_data is not cacheline aligned at the moment. When instantiating
an array, the start of the second array member is not cache line aligned.
This increases the number of cache lines that need to be
read when handling e.g. CLOCK_MONOTONIC_RAW, because the data spans an
extra cache line if the previous data does not end at a cache line
boundary.

Therefore make struct vdso_time_data cacheline aligned.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 include/vdso/datapage.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 
ed4fb4c06e3ee6423fe68ccb476565213f234863..dfd98f969f151eca3c551c3e90f69af9ee8f22bb
 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -126,7 +127,7 @@ struct vdso_time_data {
u32 __unused;
 
struct arch_vdso_time_data arch_data;
-};
+} cacheline_aligned;
 
 /**
  * struct vdso_rng_data - vdso RNG state information

-- 
2.48.1
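
The effect of the alignment attribute on array elements can be checked with a small standalone sketch. The struct layout and the 64-byte line size below are assumptions chosen for illustration (the kernel uses SMP_CACHE_BYTES and the real struct vdso_time_data); only the attribute itself matches the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; the kernel uses SMP_CACHE_BYTES. */
#define DEMO_CACHE_BYTES 64

/* Hypothetical, shortened layout without any alignment request. */
struct demo_unaligned {
	uint32_t seq;
	uint64_t cycle_last;
	uint64_t basetime[12];
};

/* Same members, but the type is padded and aligned to a cache line. */
struct demo_aligned {
	uint32_t seq;
	uint64_t cycle_last;
	uint64_t basetime[12];
} __attribute__((__aligned__(DEMO_CACHE_BYTES)));

/* Byte offset of element idx from the start of an array of each type. */
static inline size_t demo_unaligned_offset(size_t idx)
{
	return idx * sizeof(struct demo_unaligned);
}

static inline size_t demo_aligned_offset(size_t idx)
{
	return idx * sizeof(struct demo_aligned);
}

/* The aligned type's size is a multiple of the line size by construction. */
static_assert(sizeof(struct demo_aligned) % DEMO_CACHE_BYTES == 0,
	      "every array element starts on a cache line boundary");
```

With the unaligned variant the second array element starts in the middle of a cache line, so reading it touches one more line than strictly necessary.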




[PATCH 16/19] arm64/vdso: Prepare introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Nam Cao 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer with struct vdso_clock pointer whenever applicable.

No functional change.

Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 arch/arm64/include/asm/vdso/compat_gettimeofday.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h 
b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index 
957ee12fcc54bd7f978fbcd8945bce62327b037a..2c6b90d26bc8fd6d4be87bf6a4178472581f56d3
 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -155,9 +155,9 @@ static __always_inline const struct vdso_time_data 
*__arch_get_vdso_u_time_data(
 }
 #define __arch_get_vdso_u_time_data __arch_get_vdso_u_time_data
 
-static inline bool vdso_clocksource_ok(const struct vdso_time_data *vd)
+static inline bool vdso_clocksource_ok(const struct vdso_clock *vc)
 {
-   return vd->clock_mode == VDSO_CLOCKMODE_ARCHTIMER;
+   return vc->clock_mode == VDSO_CLOCKMODE_ARCHTIMER;
 }
 #define vdso_clocksource_ok vdso_clocksource_ok
 

-- 
2.48.1




[PATCH 17/19] powerpc/vdso: Prepare introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Nam Cao 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer with struct vdso_clock pointer whenever applicable.

No functional change.

Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 arch/powerpc/include/asm/vdso/gettimeofday.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
index 
dc955f2e0cc51f44d46f488a292aa0dbee3dc16c..99c9d6f43fde2efaf92d4777d3a5510677da7c92
 100644
--- a/arch/powerpc/include/asm/vdso/gettimeofday.h
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -99,7 +99,7 @@ static __always_inline u64 __arch_get_hw_counter(s32 
clock_mode,
return get_tb();
 }
 
-static inline bool vdso_clocksource_ok(const struct vdso_time_data *vd)
+static inline bool vdso_clocksource_ok(const struct vdso_clock *vc)
 {
return true;
 }

-- 
2.48.1




[PATCH 19/19] vdso: Rework struct vdso_time_data and introduce struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it.

Now all preparation is in place: Split the clock related struct members
into a separate struct vdso_clock. Make sure all users are aware that
vdso_time_data is no longer initialized as an array and vdso_clock is now
the array inside vdso_data. Also remove the define of vdso_clock which made
preparation possible in smaller steps.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 arch/arm64/include/asm/vdso/compat_gettimeofday.h |  2 +-
 arch/arm64/include/asm/vdso/vsyscall.h|  4 +-
 arch/s390/kernel/time.c   | 11 ++
 include/asm-generic/vdso/vsyscall.h   |  2 +-
 include/vdso/datapage.h   | 47 ++-
 include/vdso/helpers.h|  4 +-
 kernel/time/namespace.c   |  2 +-
 kernel/time/vsyscall.c| 11 +++---
 lib/vdso/datastore.c  |  4 +-
 lib/vdso/gettimeofday.c   | 16 
 10 files changed, 53 insertions(+), 50 deletions(-)

diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h 
b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index 
2c6b90d26bc8fd6d4be87bf6a4178472581f56d3..d60ea7a72a9cb3457c412d0ece21ed76ae77782d
 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -149,7 +149,7 @@ static __always_inline const struct vdso_time_data 
*__arch_get_vdso_u_time_data(
 * where __aarch64_get_vdso_u_time_data() is called, and then keep the
 * result in a register.
 */
-   asm volatile("mov %0, %1" : "=r"(ret) : "r"(vdso_u_time_data));
+   asm volatile("mov %0, %1" : "=r"(ret) : "r"(&vdso_u_time_data));
 
return ret;
 }
diff --git a/arch/arm64/include/asm/vdso/vsyscall.h 
b/arch/arm64/include/asm/vdso/vsyscall.h
index 
3f65cbd00635aab50a4e0c6058d38b39fd6d43a9..de58951b8df6a4bb9afd411878793c79c30adbf2
 100644
--- a/arch/arm64/include/asm/vdso/vsyscall.h
+++ b/arch/arm64/include/asm/vdso/vsyscall.h
@@ -15,8 +15,8 @@
 static __always_inline
 void __arm64_update_vsyscall(struct vdso_time_data *vdata)
 {
-   vdata[CS_HRES_COARSE].mask  = VDSO_PRECISION_MASK;
-   vdata[CS_RAW].mask  = VDSO_PRECISION_MASK;
+   vdata->clock_data[CS_HRES_COARSE].mask  = VDSO_PRECISION_MASK;
+   vdata->clock_data[CS_RAW].mask  = VDSO_PRECISION_MASK;
 }
 #define __arch_update_vsyscall __arm64_update_vsyscall
 
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 
41ca3586b19f6cac3753b52f0b99be62a33e1cb1..699a18f1c54eb7ec09f7f1cceecd1118aed37ab2
 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -79,12 +79,10 @@ void __init time_early_init(void)
 {
struct ptff_qto qto;
struct ptff_qui qui;
-   int cs;
 
/* Initialize TOD steering parameters */
tod_steering_end = tod_clock_base.tod;
-   for (cs = 0; cs < CS_BASES; cs++)
-   vdso_k_time_data[cs].arch_data.tod_steering_end = 
tod_steering_end;
+   vdso_k_time_data->arch_data.tod_steering_end = tod_steering_end;
 
if (!test_facility(28))
return;
@@ -373,7 +371,6 @@ static void clock_sync_global(long delta)
 {
unsigned long now, adj;
struct ptff_qto qto;
-   int cs;
 
/* Fixup the monotonic sched clock. */
tod_clock_base.eitod += delta;
@@ -389,10 +386,8 @@ static void clock_sync_global(long delta)
panic("TOD clock sync offset %li is too large to drift\n",
  tod_steering_delta);
tod_steering_end = now + (abs(tod_steering_delta) << 15);
-   for (cs = 0; cs < CS_BASES; cs++) {
-   vdso_k_time_data[cs].arch_data.tod_steering_end = 
tod_steering_end;
-   vdso_k_time_data[cs].arch_data.tod_steering_delta = 
tod_steering_delta;
-   }
+   vdso_k_time_data->arch_data.tod_steering_end = tod_steering_end;
+   vdso_k_time_data->arch_data.tod_steering_delta = tod_steering_delta;
 
/* Update LPAR offset. */
if (ptff_query(PTFF_QTO) && ptff(&qto, sizeof(qto), PTFF_QTO) == 0)
diff --git a/include/asm-generic/vdso/vsyscall.h 
b/include/asm-generic/vdso/vsyscall.h
index 
1fb3000f50364feeaaa9348d438b3ab8091bb265..b550afa15ecd101d821f51ce9105903978dced40
 100644
--- a/include/asm-generic/vdso/vsyscall.h
+++ b/include/asm-generic/vdso/vsyscall.h
@@ -9,7 +9,7 @@
 #ifndef __arch_get_vdso_u_time_data
 static __always_inline const struct vdso_time_data 
*__arch_get_vdso_u_time_data(void)
 {
-   return vdso_u_time_data;
+   return &vdso_u_time_data;
 }
 #endif
 
diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h

[PATCH 15/19] x86/vdso: Prepare introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer with struct vdso_clock pointer whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 arch/x86/include/asm/vdso/gettimeofday.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/vdso/gettimeofday.h 
b/arch/x86/include/asm/vdso/gettimeofday.h
index 
edec796832e08b73d6d58bda6408957048f4e80e..9e52cc46e1da99114312d85b34ae52e539dac9b6
 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -261,7 +261,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
return U64_MAX;
 }
 
-static inline bool arch_vdso_clocksource_ok(const struct vdso_time_data *vd)
+static inline bool arch_vdso_clocksource_ok(const struct vdso_clock *vc)
 {
return true;
 }
@@ -300,34 +300,34 @@ static inline bool arch_vdso_cycles_ok(u64 cycles)
  * declares everything with the MSB/Sign-bit set as invalid. Therefore the
  * effective mask is S64_MAX.
  */
-static __always_inline u64 vdso_calc_ns(const struct vdso_time_data *vd, u64 
cycles, u64 base)
+static __always_inline u64 vdso_calc_ns(const struct vdso_clock *vc, u64 
cycles, u64 base)
 {
-   u64 delta = cycles - vd->cycle_last;
+   u64 delta = cycles - vc->cycle_last;
 
/*
 * Negative motion and deltas which can cause multiplication
 * overflow require special treatment. This check covers both as
-* negative motion is guaranteed to be greater than @vd::max_cycles
+* negative motion is guaranteed to be greater than @vc::max_cycles
 * due to unsigned comparison.
 *
 * Due to the MSB/Sign-bit being used as invalid marker (see
 * arch_vdso_cycles_ok() above), the effective mask is S64_MAX, but that
 * case is also unlikely and will also take the unlikely path here.
 */
-   if (unlikely(delta > vd->max_cycles)) {
+   if (unlikely(delta > vc->max_cycles)) {
/*
 * Due to the above mentioned TSC wobbles, filter out
 * negative motion.  Per the above masking, the effective
 * sign bit is now bit 62.
 */
if (delta & (1ULL << 62))
-   return base >> vd->shift;
+   return base >> vc->shift;
 
/* Handle multiplication overflow gracefully */
-   return mul_u64_u32_add_u64_shr(delta & S64_MAX, vd->mult, base, 
vd->shift);
+   return mul_u64_u32_add_u64_shr(delta & S64_MAX, vc->mult, base, 
vc->shift);
}
 
-   return ((delta * vd->mult) + base) >> vd->shift;
+   return ((delta * vc->mult) + base) >> vc->shift;
 }
 #define vdso_calc_ns vdso_calc_ns
 

-- 
2.48.1
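
The mult/shift conversion and the two special cases handled in vdso_calc_ns() above can be tried in userspace. This is a sketch under assumptions: the demo_* names are hypothetical, and the kernel helper mul_u64_u32_add_u64_shr() is replaced by a plain 128-bit multiplication using unsigned __int128, a GCC/Clang extension available on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-clock fields used by the conversion. */
struct demo_clock {
	uint64_t cycle_last;
	uint64_t max_cycles;
	uint32_t mult;
	uint32_t shift;
};

/* Convert a raw counter value to (shifted) nanoseconds relative to base. */
static inline uint64_t demo_calc_ns(const struct demo_clock *vc,
				    uint64_t cycles, uint64_t base)
{
	uint64_t delta = cycles - vc->cycle_last;

	assert(vc->shift < 64);

	/* One unsigned compare catches both negative motion and large deltas. */
	if (delta > vc->max_cycles) {
		/* Bit 62 set: treat as negative motion and return the base time. */
		if (delta & (1ULL << 62))
			return base >> vc->shift;

		/* Large but valid delta: widen so the multiply cannot overflow. */
		return (uint64_t)((((unsigned __int128)(delta & INT64_MAX)) *
				   vc->mult + base) >> vc->shift);
	}

	return ((delta * vc->mult) + base) >> vc->shift;
}
```

For example, with mult = 1024 and shift = 10 one cycle corresponds to one nanosecond, so a counter that is 500 cycles past cycle_last yields 500 when base is 0, while a counter slightly behind cycle_last yields just the shifted base.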




[PATCH 12/19] vdso/vsyscall: Prepare introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer with struct vdso_clock pointer whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 kernel/time/vsyscall.c | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/kernel/time/vsyscall.c b/kernel/time/vsyscall.c
index 
418192296ef7dd3c1772d50f129e7838883cf00c..dd85b41a70bee7decbd943c35197c091916ee4c7
 100644
--- a/kernel/time/vsyscall.c
+++ b/kernel/time/vsyscall.c
@@ -18,25 +18,26 @@
 static inline void update_vdso_time_data(struct vdso_time_data *vdata, struct 
timekeeper *tk)
 {
struct vdso_timestamp *vdso_ts;
+   struct vdso_clock *vc = vdata;
u64 nsec, sec;
 
-   vdata[CS_HRES_COARSE].cycle_last= tk->tkr_mono.cycle_last;
+   vc[CS_HRES_COARSE].cycle_last   = tk->tkr_mono.cycle_last;
 #ifdef CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT
-   vdata[CS_HRES_COARSE].max_cycles= 
tk->tkr_mono.clock->max_cycles;
+   vc[CS_HRES_COARSE].max_cycles   = tk->tkr_mono.clock->max_cycles;
 #endif
-   vdata[CS_HRES_COARSE].mask  = tk->tkr_mono.mask;
-   vdata[CS_HRES_COARSE].mult  = tk->tkr_mono.mult;
-   vdata[CS_HRES_COARSE].shift = tk->tkr_mono.shift;
-   vdata[CS_RAW].cycle_last= tk->tkr_raw.cycle_last;
+   vc[CS_HRES_COARSE].mask = tk->tkr_mono.mask;
+   vc[CS_HRES_COARSE].mult = tk->tkr_mono.mult;
+   vc[CS_HRES_COARSE].shift= tk->tkr_mono.shift;
+   vc[CS_RAW].cycle_last   = tk->tkr_raw.cycle_last;
 #ifdef CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT
-   vdata[CS_RAW].max_cycles= tk->tkr_raw.clock->max_cycles;
+   vc[CS_RAW].max_cycles   = tk->tkr_raw.clock->max_cycles;
 #endif
-   vdata[CS_RAW].mask  = tk->tkr_raw.mask;
-   vdata[CS_RAW].mult  = tk->tkr_raw.mult;
-   vdata[CS_RAW].shift = tk->tkr_raw.shift;
+   vc[CS_RAW].mask = tk->tkr_raw.mask;
+   vc[CS_RAW].mult = tk->tkr_raw.mult;
+   vc[CS_RAW].shift= tk->tkr_raw.shift;
 
/* CLOCK_MONOTONIC */
-   vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC];
+   vdso_ts = &vc[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC];
vdso_ts->sec= tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
 
nsec = tk->tkr_mono.xtime_nsec;
@@ -54,7 +55,7 @@ static inline void update_vdso_time_data(struct 
vdso_time_data *vdata, struct ti
nsec+= (u64)tk->monotonic_to_boot.tv_nsec << tk->tkr_mono.shift;
 
/* CLOCK_BOOTTIME */
-   vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_BOOTTIME];
+   vdso_ts = &vc[CS_HRES_COARSE].basetime[CLOCK_BOOTTIME];
vdso_ts->sec= sec;
 
while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) {
@@ -64,12 +65,12 @@ static inline void update_vdso_time_data(struct 
vdso_time_data *vdata, struct ti
vdso_ts->nsec   = nsec;
 
/* CLOCK_MONOTONIC_RAW */
-   vdso_ts = &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW];
+   vdso_ts = &vc[CS_RAW].basetime[CLOCK_MONOTONIC_RAW];
vdso_ts->sec= tk->raw_sec;
vdso_ts->nsec   = tk->tkr_raw.xtime_nsec;
 
/* CLOCK_TAI */
-   vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_TAI];
+   vdso_ts = &vc[CS_HRES_COARSE].basetime[CLOCK_TAI];
vdso_ts->sec= tk->xtime_sec + (s64)tk->tai_offset;
vdso_ts->nsec   = tk->tkr_mono.xtime_nsec;
 }
@@ -78,6 +79,7 @@ void update_vsyscall(struct timekeeper *tk)
 {
struct vdso_time_data *vdata = vdso_k_time_data;
struct vdso_timestamp *vdso_ts;
+   struct vdso_clock *vc = vdata;
s32 clock_mode;
u64 nsec;
 
@@ -85,21 +87,21 @@ void update_vsyscall(struct timekeeper *tk)
vdso_write_begin(vdata);
 
clock_mode = tk->tkr_mono.clock->vdso_clock_mode;
-   vdata[CS_HRES_COARSE].clock_mode= clock_mode;
-   vdata[CS_RAW].clock_mode= clock_mode;
+   vc[CS_HRES_COARSE].clock_mode   = clock_mode;
+   vc[CS_RAW].clock_mode   = clock_mode;
 
/* CLOCK_REALTIME also required for time() */
-   vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME];
+   vdso_ts = &vc[CS_HRES_COARSE].basetime[CLOCK_REALTIME];
vdso_ts->sec= tk->xtime_sec;
vdso_ts->nsec   = tk->tkr_mono.xtime_nsec;
 
/* CLOCK_REALTIME_COARSE *

[PATCH 14/19] time/namespace: Prepare introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of it. For now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer with struct vdso_clock pointer whenever applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 kernel/time/namespace.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index 
f02430a73be8f081618792c8968bf0c112c54505..09bc4fb39f24ccdaa1e6e7f7238660a4f2a63b54
 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -165,26 +165,26 @@ static struct timens_offset offset_from_ts(struct 
timespec64 off)
  * HVCLOCK
  * VVAR
  *
- * The check for vdso_time_data->clock_mode is in the unlikely path of
+ * The check for vdso_clock->clock_mode is in the unlikely path of
  * the seq begin magic. So for the non-timens case most of the time
  * 'seq' is even, so the branch is not taken.
  *
  * If 'seq' is odd, i.e. a concurrent update is in progress, the extra check
- * for vdso_time_data->clock_mode is a non-issue. The task is spin waiting for 
the
+ * for vdso_clock->clock_mode is a non-issue. The task is spin waiting for the
  * update to finish and for 'seq' to become even anyway.
  *
- * Timens page has vdso_time_data->clock_mode set to VDSO_CLOCKMODE_TIMENS 
which
+ * Timens page has vdso_clock->clock_mode set to VDSO_CLOCKMODE_TIMENS which
  * enforces the time namespace handling path.
  */
-static void timens_setup_vdso_clock_data(struct vdso_time_data *vdata,
+static void timens_setup_vdso_clock_data(struct vdso_clock *vc,
 struct time_namespace *ns)
 {
-   struct timens_offset *offset = vdata->offset;
+   struct timens_offset *offset = vc->offset;
struct timens_offset monotonic = offset_from_ts(ns->offsets.monotonic);
struct timens_offset boottime = offset_from_ts(ns->offsets.boottime);
 
-   vdata->seq  = 1;
-   vdata->clock_mode   = VDSO_CLOCKMODE_TIMENS;
+   vc->seq = 1;
+   vc->clock_mode  = VDSO_CLOCKMODE_TIMENS;
offset[CLOCK_MONOTONIC] = monotonic;
offset[CLOCK_MONOTONIC_RAW] = monotonic;
offset[CLOCK_MONOTONIC_COARSE]  = monotonic;
@@ -220,6 +220,7 @@ static void timens_set_vvar_page(struct task_struct *task,
struct time_namespace *ns)
 {
struct vdso_time_data *vdata;
+   struct vdso_clock *vc;
unsigned int i;
 
if (ns == &init_time_ns)
@@ -236,9 +237,10 @@ static void timens_set_vvar_page(struct task_struct *task,
 
ns->frozen_offsets = true;
vdata = page_address(ns->vvar_page);
+   vc = vdata;
 
for (i = 0; i < CS_BASES; i++)
-   timens_setup_vdso_clock_data(&vdata[i], ns);
+   timens_setup_vdso_clock_data(&vc[i], ns);
 
 out:
mutex_unlock(&offset_lock);

-- 
2.48.1
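
The comment touched by this patch explains why a time namespace page sets seq to 1 together with VDSO_CLOCKMODE_TIMENS. A simplified, single-threaded sketch of that reader logic follows. The demo_* names are hypothetical, a volatile read stands in for READ_ONCE(), and the memory barriers of the real code are left out, so this only illustrates the control flow.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_CLOCKMODE_NONE = 0, DEMO_CLOCKMODE_TIMENS = 1 };

/* Hypothetical, reduced per-clock data. */
struct demo_clock {
	uint32_t seq;
	int32_t  clock_mode;
	int64_t  sec;
};

/* Rough stand-in for READ_ONCE(vc->seq). */
static inline uint32_t demo_read_seq(const struct demo_clock *vc)
{
	return *(const volatile uint32_t *)&vc->seq;
}

/*
 * Returns 0 after a normal read, or 1 when the page is a time namespace
 * page and the caller has to take the namespace-aware path instead.
 */
static inline int demo_read_sec(const struct demo_clock *vc, int64_t *sec)
{
	uint32_t seq;

	assert(vc && sec);

	do {
		/* Odd seq: either an update is in progress or this is a timens page. */
		while ((seq = demo_read_seq(vc)) & 1) {
			if (vc->clock_mode == DEMO_CLOCKMODE_TIMENS)
				return 1;
			/* otherwise keep spinning until the writer is done */
		}
		*sec = vc->sec;
	} while (seq != demo_read_seq(vc));

	return 0;
}
```

Because the clock mode is only inspected when seq is odd, the common case with an even seq pays nothing for the extra check.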




[PATCH 01/19] vdso: Introduce vdso/cache.h

2025-03-03 Thread Thomas Weißschuh
The vDSO implementation can only include headers from the vdso/
namespace. To enable the usage of cacheline_aligned from
the vDSO, move it and its dependencies into a new header vdso/cache.h.
Keep compatibility by including vdso/cache.h from linux/cache.h.

Signed-off-by: Thomas Weißschuh 
---
 include/linux/cache.h |  9 +
 include/vdso/cache.h  | 15 +++
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/cache.h b/include/linux/cache.h
index 
ca2a05682a54b51af991154a99f57a00c88fc5a8..e69768f50d5327b874ba4bd56609300526511a69
 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -3,16 +3,13 @@
 #define __LINUX_CACHE_H
 
 #include 
+#include 
 #include 
 
 #ifndef L1_CACHE_ALIGN
 #define L1_CACHE_ALIGN(x) __ALIGN_KERNEL(x, L1_CACHE_BYTES)
 #endif
 
-#ifndef SMP_CACHE_BYTES
-#define SMP_CACHE_BYTES L1_CACHE_BYTES
-#endif
-
 /**
  * SMP_CACHE_ALIGN - align a value to the L2 cacheline size
  * @x: value to align
@@ -63,10 +60,6 @@
 #define __ro_after_init __section(".data..ro_after_init")
 #endif
 
-#ifndef cacheline_aligned
-#define cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
-#endif
-
 #ifndef cacheline_aligned_in_smp
 #ifdef CONFIG_SMP
 #define cacheline_aligned_in_smp cacheline_aligned
diff --git a/include/vdso/cache.h b/include/vdso/cache.h
new file mode 100644
index 
..f89d48304bf8f101df581aee0e32a2efa9d2fb2d
--- /dev/null
+++ b/include/vdso/cache.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_CACHE_H
+#define __VDSO_CACHE_H
+
+#include 
+
+#ifndef SMP_CACHE_BYTES
+#define SMP_CACHE_BYTES L1_CACHE_BYTES
+#endif
+
+#ifndef cacheline_aligned
+#define cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
+#endif
+
+#endif /* __VDSO_CACHE_H */

-- 
2.48.1




Re: [PATCH v11 3/4] arm64: topology: Support SMT control on ACPI based system

2025-03-03 Thread Sudeep Holla
On Mon, Mar 03, 2025 at 10:56:12AM +0100, Pierre Gondois wrote:
> On 2/28/25 20:06, Sudeep Holla wrote:
> > > >
> > > > Ditto as previous patch, can get rid if it is default 1.
> > > >
> > >
> > > On non-SMT platforms, not calling cpu_smt_set_num_threads() leaves
> > > cpu_smt_num_threads uninitialized to UINT_MAX:
> > >
> > > smt/active:0
> > > smt/control:-1
> > >
> > > If cpu_smt_set_num_threads() is called:
> > > active:0
> > > control:notsupported
> > >
> > > So it might be slightly better to still initialize max_smt_thread_num.
> > >
> >
> > Sure, what I meant is to have max_smt_thread_num set to 1 by default is
> > that is what needed anyways and the above code does that now.
> >
> > Why not start with initialised to 1 instead ?
> > Of course some current logic needs to change around testing it for zero.
> >
>
> I think there would still be a way to check against the default value.
> If we have:
> unsigned int max_smt_thread_num = 1;
>
> then on a platform with 2 threads, the detection condition would trigger:
> xa_for_each(&hetero_cpu, hetero_id, entry) {
> if (entry->thread_num != max_smt_thread_num && max_smt_thread_num) 
> < (entry->thread_num=2) and (max_smt_thread_num=1)
> pr_warn_once("Heterogeneous SMT topology is partly
>   supported by SMT control\n");
>
> so we would need an additional variable:
> bool is_initialized = false;

Sure, we could do that or skip the check if max_smt_thread_num == 1 ?

I mean
if (entry->thread_num != max_smt_thread_num && max_smt_thread_num != 1)

I assume entry->thread_num must be set to 1 on single threaded cores
Won't that work ? Am I missing something still ?

--
Regards,
Sudeep
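
The condition suggested above can be modelled in a few lines of userspace C. This is only a sketch under assumptions: the xarray walk is replaced by a plain array, the function name is hypothetical, and the running maximum is assumed to be updated with a max() after each check, which is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk per-core thread counts with max_smt_thread_num starting at 1.
 * Returns true if the "heterogeneous SMT topology" warning would fire,
 * and stores the final maximum in *max_out.
 */
static bool demo_scan_threads(const unsigned int *thread_num, size_t n,
			      unsigned int *max_out)
{
	unsigned int max_smt_thread_num = 1;
	bool hetero = false;
	size_t i;

	assert(max_out);

	for (i = 0; i < n; i++) {
		/* Proposed check: skip while the maximum is still the default 1. */
		if (thread_num[i] != max_smt_thread_num && max_smt_thread_num != 1)
			hetero = true;

		/* Assumed update step: keep the largest thread count seen so far. */
		if (thread_num[i] > max_smt_thread_num)
			max_smt_thread_num = thread_num[i];
	}

	*max_out = max_smt_thread_num;
	return hetero;
}
```

In this model a uniform {2, 2, 2} system produces no warning and ends with a maximum of 2, and {2, 1} is flagged. An ordering of {1, 2} is not flagged, because the check is skipped while the running maximum is still 1; whether that matters depends on details of the real loop that this sketch does not capture.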



[PATCH 18/19] vdso: Move arch related data before basetime

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

Architecture related vdso data is required in fastpath when acquiring
CLOCK_MONOTONIC or CLOCK_REALTIME. At the moment, this information is
located at the end of the vdso_time_data structure. The whole structure has
to be loaded into cache to be able to access this information.

To minimize the number of required cachelines, the architecture specific
vdso data struct is moved right before the basetime (basetime information
is required anyway). This change does not have an impact on architectures
with CONFIG_ARCH_HAS_VDSO_DATA=n. All other architectures could spare
reading unnecessary cachelines.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 include/vdso/datapage.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 
1df22e8bb9b31153546b72b1e8b8c8aeaed7d9e3..bcd19c223783be7c22f90120330e70496f1a
 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -70,6 +70,8 @@ struct vdso_timestamp {
 
 /**
  * struct vdso_time_data - vdso datapage representation
+ * @arch_data: architecture specific data (optional, defaults
+ * to an empty struct)
  * @seq:   timebase sequence counter
  * @clock_mode:clock mode
  * @cycle_last:timebase at clocksource init
@@ -83,8 +85,6 @@ struct vdso_timestamp {
  * @tz_dsttime:type of DST correction
  * @hrtimer_res:   hrtimer resolution
  * @__unused:  unused
- * @arch_data: architecture specific data (optional, defaults
- * to an empty struct)
  *
  * vdso_time_data will be accessed by 64 bit and compat code at the same time
  * so we should be careful before modifying this structure.
@@ -105,6 +105,8 @@ struct vdso_timestamp {
  * offset must be zero.
  */
 struct vdso_time_data {
+   struct arch_vdso_time_data arch_data;
+
u32 seq;
 
s32 clock_mode;
@@ -125,8 +127,6 @@ struct vdso_time_data {
s32 tz_dsttime;
u32 hrtimer_res;
u32 __unused;
-
-   struct arch_vdso_time_data arch_data;
 } cacheline_aligned;
 
 #define vdso_clock vdso_time_data

-- 
2.48.1
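
The cache line argument in the commit message above can be made concrete with offsetof(). The layouts below are hypothetical and much shorter than the real struct vdso_time_data, the arch data fields are modelled loosely on the s390 steering values seen elsewhere in this series, and a 64-byte line size is assumed. The "head" variant mirrors the diff, which places arch_data first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_BYTES 64	/* assumed line size for this sketch */

struct demo_timestamp { uint64_t sec, nsec; };
struct demo_arch { uint64_t steering_end; int64_t steering_delta; };

/* Arch data after the basetime array, as before the patch. */
struct demo_tail {
	uint32_t seq;
	int32_t  clock_mode;
	uint64_t cycle_last, mask;
	uint32_t mult, shift;
	struct demo_timestamp basetime[12];
	struct demo_arch arch_data;
};

/* Arch data moved in front of the fast path fields. */
struct demo_head {
	struct demo_arch arch_data;
	uint32_t seq;
	int32_t  clock_mode;
	uint64_t cycle_last, mask;
	uint32_t mult, shift;
	struct demo_timestamp basetime[12];
};

/* Index of the cache line that contains a given byte offset. */
static inline size_t demo_line_of(size_t offset)
{
	return offset / DEMO_CACHE_BYTES;
}

static_assert(offsetof(struct demo_head, arch_data) / DEMO_CACHE_BYTES ==
	      offsetof(struct demo_head, seq) / DEMO_CACHE_BYTES,
	      "arch data shares a line with the sequence counter");
```

In the tail layout the arch data sits several lines away from seq, so a reader that needs both has to pull in an additional line; in the head layout they share one.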




[PATCH v7 0/6] ptrace: introduce PTRACE_SET_SYSCALL_INFO API

2025-03-03 Thread Dmitry V. Levin
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.

This API allows ptracers to obtain and modify system call details in a
straightforward and architecture-agnostic way, providing a consistent way
of manipulating the system call number and arguments across architectures.

As in case of PTRACE_GET_SYSCALL_INFO, PTRACE_SET_SYSCALL_INFO also
does not aim to address numerous architecture-specific system call ABI
peculiarities, like differences in the number of system call arguments
for such system calls as pread64 and preadv.

The current implementation supports changing only those bits of system call
information that are used by strace system call tampering, namely, syscall
number, syscall arguments, and syscall return value.

Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later if
needed, by using struct ptrace_syscall_info.flags to specify the additional
details that should be set.  Currently, "flags" and "reserved" fields of
struct ptrace_syscall_info must be initialized with zeroes; "arch",
"instruction_pointer", and "stack_pointer" fields are currently ignored.

PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.

Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen.  The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure
to provide an API of changing the first system call argument on riscv
architecture [1].

ptrace(2) man page:

long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
   Modify information about the system call that caused the stop.
   The "data" argument is a pointer to struct ptrace_syscall_info
   that specifies the system call information to be set.
   The "addr" argument should be set to sizeof(struct ptrace_syscall_info).

[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055ee...@gmail.com/

Notes:
v7:
* csky: Fix typo in comment
* mips: syscall_set_arguments: Remove mips_syscall_is_indirect check
* mips: syscall_set_nr: Reword comment
* mips: Add Reviewed-by
* v6: https://lore.kernel.org/all/20250217090834.ga18...@strace.io/

v6:
* mips: Submit mips_get_syscall_arg() o32 fix via mips tree
  to get it merged into v6.14-rc3
* Rebase to v6.14-rc3
* v5: https://lore.kernel.org/all/20250210113336.ga...@strace.io/

v5:
* ptrace: Extend the commit message to say that the new API does not aim
  to address numerous architecture-specific syscall ABI peculiarities
* selftests: Add a workaround for s390 16-bit syscall numbers
* parisc: Add Acked-by
* v4: https://lore.kernel.org/all/20250203065849.ga14...@strace.io/

v4:
* Split out syscall_set_return_value() for hexagon into a separate patch
* s390: Change the style of syscall_set_arguments() implementation as
  requested
* ptrace: Add Reviewed-by
* v3: https://lore.kernel.org/all/20250128091445.ga8...@strace.io/

v3:
* powerpc: Submit syscall_set_return_value() fix for "sc" case separately
* mips: Do not introduce erroneous argument truncation on mips n32,
  add a detailed description to the commit message of the
  mips_get_syscall_arg() change
* ptrace: Add explicit padding to the end of struct ptrace_syscall_info,
  simplify obtaining of user ptrace_syscall_info,
  do not introduce PTRACE_SYSCALL_INFO_SIZE_VER0
* ptrace: Change the return type of ptrace_set_syscall_info_* functions
  from "unsigned long" to "int"
* ptrace: Add -ERANGE check to ptrace_set_syscall_info_exit(),
  add comments to -ERANGE checks
* ptrace: Update comments about supported syscall stops
* selftests: Extend set_syscall_info test, fix for mips n32
* riscv: Add Tested-by and Reviewed-by

v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
  on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
  and sparc
* Remove redundant ptrace_syscall_info.op assignments in
  ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
  ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
  in ptrace_set_syscall_info_entry() 
* Split ptrace_syscall_info.reserved 

[PATCH v7 2/6] syscall.h: add syscall_set_arguments()

2025-03-03 Thread Dmitry V. Levin
This function is going to be needed on all HAVE_ARCH_TRACEHOOK
architectures to implement PTRACE_SET_SYSCALL_INFO API.

This partially reverts commit 7962c2eddbfe ("arch: remove unused
function syscall_set_arguments()") by reusing some of old
syscall_set_arguments() implementations.

Signed-off-by: Dmitry V. Levin 
Tested-by: Charlie Jenkins 
Reviewed-by: Charlie Jenkins 
Acked-by: Helge Deller  # parisc
Reviewed-by: Maciej W. Rozycki  # mips
---
 arch/arc/include/asm/syscall.h| 14 +++
 arch/arm/include/asm/syscall.h| 13 ++
 arch/arm64/include/asm/syscall.h  | 13 ++
 arch/csky/include/asm/syscall.h   | 13 ++
 arch/hexagon/include/asm/syscall.h|  7 ++
 arch/loongarch/include/asm/syscall.h  |  8 ++
 arch/mips/include/asm/syscall.h   | 28 +
 arch/nios2/include/asm/syscall.h  | 11 
 arch/openrisc/include/asm/syscall.h   |  7 ++
 arch/parisc/include/asm/syscall.h | 12 +
 arch/powerpc/include/asm/syscall.h| 10 
 arch/riscv/include/asm/syscall.h  |  9 +++
 arch/s390/include/asm/syscall.h   |  9 +++
 arch/sh/include/asm/syscall_32.h  | 12 +
 arch/sparc/include/asm/syscall.h  | 10 
 arch/um/include/asm/syscall-generic.h | 14 +++
 arch/x86/include/asm/syscall.h| 36 +++
 arch/xtensa/include/asm/syscall.h | 11 
 include/asm-generic/syscall.h | 16 
 19 files changed, 253 insertions(+)

diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index 9709256e31c8..89c1e1736356 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
}
 }
 
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ unsigned long *args)
+{
+   unsigned long *inside_ptregs = &regs->r0;
+   unsigned int n = 6;
+   unsigned int i = 0;
+
+   while (n--) {
+   *inside_ptregs = args[i++];
+   inside_ptregs--;
+   }
+}
+
 static inline int
 syscall_get_arch(struct task_struct *task)
 {
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fe4326d938c1..21927fa0ae2b 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, &regs->ARM_r0 + 1, 5 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+struct pt_regs *regs,
+const unsigned long *args)
+{
+   memcpy(&regs->ARM_r0, args, 6 * sizeof(args[0]));
+   /*
+* Also copy the first argument into ARM_ORIG_r0
+* so that syscall_get_arguments() would return it
+* instead of the previous value.
+*/
+   regs->ARM_ORIG_r0 = regs->ARM_r0;
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
/* ARM tasks don't change audit architectures on the fly. */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ab8e14b96f68..76020b66286b 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+struct pt_regs *regs,
+const unsigned long *args)
+{
+   memcpy(&regs->regs[0], args, 6 * sizeof(args[0]));
+   /*
+* Also copy the first argument into orig_x0
+* so that syscall_get_arguments() would return it
+* instead of the previous value.
+*/
+   regs->orig_x0 = regs->regs[0];
+}
+
 /*
  * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
  * AArch64 has the same system calls both on little- and big- endian.
diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
index 0de5734950bf..717f44b4d26f 100644
--- a/arch/csky/include/asm/syscall.h
+++ b/arch/csky/include/asm/syscall.h
@@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
memcpy(args, &regs->a1, 5 * sizeof(args[0]));
 }
 
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ const unsigned long *args)
+{
+   memcpy(&regs->a0, args, 6 * sizeof(regs->a0));
+   /*
+* Also copy the first argument into orig_a0
+* so that syscall_get_arguments() would return it
+* instead of the previous value.
+*/
+   regs->orig_a0 = regs->a0;
+}
+
 static inline int
 syscall_get_arch(struct task_struct *ta
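
The arm, arm64 and csky hunks above all mirror the first argument into an
"orig" slot after the memcpy(). A minimal userspace sketch of why that matters,
assuming (as the comment in the arm64 hunk implies) that the getter reads
argument 0 from orig_x0 rather than from regs[0]. The names model_regs,
model_get_arguments() and model_set_arguments() are hypothetical stand-ins and
not kernel API:

```c
#include <string.h>

/* Hypothetical, trimmed-down stand-in for arm64's struct pt_regs. */
struct model_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* Same shape as the getter: argument 0 comes from orig_x0, the rest from x1..x5. */
static void model_get_arguments(const struct model_regs *r, unsigned long *args)
{
	args[0] = r->orig_x0;
	memcpy(&args[1], &r->regs[1], 5 * sizeof(args[0]));
}

/* Same shape as the new setter in the arm64 hunk. */
static void model_set_arguments(struct model_regs *r, const unsigned long *args)
{
	memcpy(&r->regs[0], args, 6 * sizeof(args[0]));
	/* Without this copy, model_get_arguments() would keep returning the old args[0]. */
	r->orig_x0 = r->regs[0];
}
```

With the mirroring in place, a set followed by a get returns the same six
values that were written.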

[PATCH v7 3/6] syscall.h: introduce syscall_set_nr()

2025-03-03 Thread Dmitry V. Levin
Similar to syscall_set_arguments() that complements
syscall_get_arguments(), introduce syscall_set_nr()
that complements syscall_get_nr().

syscall_set_nr() is going to be needed along with
syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK
architectures to implement PTRACE_SET_SYSCALL_INFO API.

Signed-off-by: Dmitry V. Levin 
Tested-by: Charlie Jenkins 
Reviewed-by: Charlie Jenkins 
Acked-by: Helge Deller  # parisc
Reviewed-by: Maciej W. Rozycki  # mips
---
 arch/arc/include/asm/syscall.h| 11 +++
 arch/arm/include/asm/syscall.h| 24 
 arch/arm64/include/asm/syscall.h  | 16 
 arch/hexagon/include/asm/syscall.h|  7 +++
 arch/loongarch/include/asm/syscall.h  |  7 +++
 arch/m68k/include/asm/syscall.h   |  7 +++
 arch/microblaze/include/asm/syscall.h |  7 +++
 arch/mips/include/asm/syscall.h   | 15 +++
 arch/nios2/include/asm/syscall.h  |  5 +
 arch/openrisc/include/asm/syscall.h   |  6 ++
 arch/parisc/include/asm/syscall.h |  7 +++
 arch/powerpc/include/asm/syscall.h| 10 ++
 arch/riscv/include/asm/syscall.h  |  7 +++
 arch/s390/include/asm/syscall.h   | 12 
 arch/sh/include/asm/syscall_32.h  | 12 
 arch/sparc/include/asm/syscall.h  | 12 
 arch/um/include/asm/syscall-generic.h |  5 +
 arch/x86/include/asm/syscall.h|  7 +++
 arch/xtensa/include/asm/syscall.h |  7 +++
 include/asm-generic/syscall.h | 14 ++
 20 files changed, 198 insertions(+)

diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index 89c1e1736356..728d625a10f1 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -23,6 +23,17 @@ syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return -1;
 }
 
+static inline void
+syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
+{
+   /*
+* Unlike syscall_get_nr(), syscall_set_nr() can be called only when
+* the target task is stopped for tracing on entering syscall, so
+* there is no need to have the same check syscall_get_nr() has.
+*/
+   regs->r8 = nr;
+}
+
 static inline void
 syscall_rollback(struct task_struct *task, struct pt_regs *regs)
 {
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 21927fa0ae2b..18b102a30741 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -68,6 +68,30 @@ static inline void syscall_set_return_value(struct task_struct *task,
regs->ARM_r0 = (long) error ? error : val;
 }
 
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+   if (nr == -1) {
+   task_thread_info(task)->abi_syscall = -1;
+   /*
+* When the syscall number is set to -1, the syscall will be
+* skipped.  In this case the syscall return value has to be
+* set explicitly, otherwise the first syscall argument is
+* returned as the syscall return value.
+*/
+   syscall_set_return_value(task, regs, -ENOSYS, 0);
+   return;
+   }
+   if ((IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))) {
+   task_thread_info(task)->abi_syscall = nr;
+   return;
+   }
+   task_thread_info(task)->abi_syscall =
+   (task_thread_info(task)->abi_syscall & ~__NR_SYSCALL_MASK) |
+   (nr & __NR_SYSCALL_MASK);
+}
+
 #define SYSCALL_MAX_ARGS 7
 
 static inline void syscall_get_arguments(struct task_struct *task,
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 76020b66286b..712daa90e643 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -61,6 +61,22 @@ static inline void syscall_set_return_value(struct task_struct *task,
regs->regs[0] = val;
 }
 
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+   regs->syscallno = nr;
+   if (nr == -1) {
+   /*
+* When the syscall number is set to -1, the syscall will be
+* skipped.  In this case the syscall return value has to be
+* set explicitly, otherwise the first syscall argument is
+* returned as the syscall return value.
+*/
+   syscall_set_return_value(task, regs, -ENOSYS, 0);
+   }
+}
+
 #define SYSCALL_MAX_ARGS 6
 
 static inline void syscall_get_arguments(struct task_struct *task,
diff --git a/arch/hexagon/include/asm/syscall.h b/arch/hexagon/include/asm/syscall.h
index 1024a6548d78..70637261817a 100644
--- a/arch/hexagon/inc
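
The arm hunk of syscall_set_nr() above has three branches: skip the syscall,
store the number directly, or replace only the low bits and keep the ABI bits.
A rough userspace sketch of that branching follows. MODEL_SYSCALL_MASK,
struct model_task and the mixed_abi flag are assumptions made for illustration.
The flag stands in for the IS_ENABLED(CONFIG_AEABI) /
IS_ENABLED(CONFIG_OABI_COMPAT) test, and the mask value is assumed rather than
taken from the patch:

```c
#include <errno.h>

/* Assumed value, playing the role of __NR_SYSCALL_MASK. */
#define MODEL_SYSCALL_MASK 0x0fffffU

struct model_task {
	int abi_syscall;	/* syscall number, possibly with ABI bits above the mask */
	long r0;		/* register that carries the syscall return value */
};

static void model_set_nr(struct model_task *t, int nr, int mixed_abi)
{
	if (nr == -1) {
		/* The syscall will be skipped, so set the return value explicitly. */
		t->abi_syscall = -1;
		t->r0 = -ENOSYS;
		return;
	}
	if (!mixed_abi) {
		/* Only one ABI is possible: store the number as is. */
		t->abi_syscall = nr;
		return;
	}
	/* Keep the ABI bits above the mask, replace only the syscall number. */
	t->abi_syscall = (int)(((unsigned int)t->abi_syscall & ~MODEL_SYSCALL_MASK) |
			       ((unsigned int)nr & MODEL_SYSCALL_MASK));
}
```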

[PATCH 04/19] vdso/datapage: Define for vdso_data to make rework of vdso possible

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

PTP clocks could also be supported by the vdso to use the advantages of
this implementation. Therefore the struct must be reworked. For a
transition to the new structure of the vdso, add a define which maps
vdso_clock to vdso_data. This will be removed when all users are updated
step by step.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 include/vdso/datapage.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index dfd98f969f151eca3c551c3e90f69af9ee8f22bb..1df22e8bb9b31153546b72b1e8b8c8aeaed7d9e3 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -129,6 +129,8 @@ struct vdso_time_data {
struct arch_vdso_time_data arch_data;
 } cacheline_aligned;
 
+#define vdso_clock vdso_time_data
+
 /**
  * struct vdso_rng_data - vdso RNG state information
  * @generation:counter representing the number of RNG reseeds

-- 
2.48.1
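
The one-line define in this patch works because "struct vdso_clock" expands to
"struct vdso_time_data" during preprocessing, so helpers can already be written
against the future type name. A small sketch of the same trick with
hypothetical names (model_time_data, model_clock, model_clock_shift), not the
real vDSO definitions:

```c
#include <stdint.h>

/* Hypothetical, trimmed-down version of the existing structure. */
struct model_time_data {
	uint64_t cycle_last;
	uint32_t mult;
	uint32_t shift;
};

/* Transitional alias, same idea as "#define vdso_clock vdso_time_data". */
#define model_clock model_time_data

/* Helper already written against the new name; it still takes the old struct. */
static uint32_t model_clock_shift(const struct model_clock *vc)
{
	return vc->shift;
}
```

Callers that still hold a struct model_time_data pointer can pass it unchanged,
which is why the patch can claim no functional change.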




Re: [PATCH v11 3/4] arm64: topology: Support SMT control on ACPI based system

2025-03-03 Thread Pierre Gondois




On 2/28/25 20:06, Sudeep Holla wrote:

On Fri, Feb 28, 2025 at 06:51:16PM +0100, Pierre Gondois wrote:



On 2/28/25 14:56, Sudeep Holla wrote:

On Tue, Feb 18, 2025 at 10:10:17PM +0800, Yicong Yang wrote:

From: Yicong Yang 

For ACPI we'll build the topology from PPTT and we cannot directly
get the SMT number of each core. Instead using a temporary xarray
to record the heterogeneous information (from ACPI_PPTT_ACPI_IDENTICAL)
and SMT information of the first core in its heterogeneous CPU cluster
when building the topology. Then we can know the largest SMT number
in the system. If a homogeneous system's using ACPI 6.2 or later,
all the CPUs should be under the root node of PPTT. There'll be
only one entry in the xarray and all the CPUs in the system will
be assumed identical.

The core's SMT control provides two interface to the users [1]:
1) enable/disable SMT by writing on/off
2) enable/disable SMT by writing thread number 1/max_thread_number

If a system have more than one SMT thread number the 2) may
not handle it well, since there're multiple thread numbers in the
system and 2) only accept 1/max_thread_number. So issue a warning
to notify the users if such system detected.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-devices-system-cpu#n542

Reviewed-by: Jonathan Cameron 
Signed-off-by: Yicong Yang 
---
   arch/arm64/kernel/topology.c | 66 
   1 file changed, 66 insertions(+)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..6eba1ac091ee 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -15,8 +15,10 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
+#include 
   #include 
   #include 
@@ -37,17 +39,28 @@ static bool __init acpi_cpu_is_threaded(int cpu)
return !!is_threaded;
   }
+struct cpu_smt_info {
+   unsigned int thread_num;
+   int core_id;
+};
+
   /*
* Propagate the topology information of the processor_topology_node tree to 
the
* cpu_topology array.
*/
   int __init parse_acpi_topology(void)
   {
+   unsigned int max_smt_thread_num = 0;
+   struct cpu_smt_info *entry;
+   struct xarray hetero_cpu;
+   unsigned long hetero_id;
int cpu, topology_id;
if (acpi_disabled)
return 0;
+   xa_init(&hetero_cpu);
+
for_each_possible_cpu(cpu) {
topology_id = find_acpi_cpu_topology(cpu, 0);
if (topology_id < 0)
@@ -57,6 +70,34 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].thread_id = topology_id;
topology_id = find_acpi_cpu_topology(cpu, 1);
cpu_topology[cpu].core_id   = topology_id;
+
+   /*
+* In the PPTT, CPUs below a node with the 'identical
+* implementation' flag have the same number of threads.
+* Count the number of threads for only one CPU (i.e.
+* one core_id) among those with the same hetero_id.
+* See the comment of find_acpi_cpu_topology_hetero_id()
+* for more details.
+*
+* One entry is created for each node having:
+* - the 'identical implementation' flag
+* - its parent not having the flag
+*/
+   hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
+   entry = xa_load(&hetero_cpu, hetero_id);
+   if (!entry) {
+   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+   WARN_ON_ONCE(!entry);
+
+   if (entry) {
+   entry->core_id = topology_id;
+   entry->thread_num = 1;
+   xa_store(&hetero_cpu, hetero_id,
+entry, GFP_KERNEL);
+   }
+   } else if (entry->core_id == topology_id) {
+   entry->thread_num++;
+   }
} else {
cpu_topology[cpu].thread_id  = -1;
cpu_topology[cpu].core_id= topology_id;
@@ -67,6 +108,31 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].package_id = topology_id;
}
+   /*
+* This should be a short loop depending on the number of heterogeneous
+* CPU clusters. Typically on a homogeneous system there's only one
+* entry in the XArray.
+*/
+   xa_for_each(&hetero_cpu, hetero_id, entry) {
+   if (entry->thread_num != max_smt_thread_num && 
max_smt_thread_num)
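
The counting scheme in the hunk above (one entry per hetero_id, threads counted
only for the first core seen under that id) can be sketched in userspace as
follows. A fixed-size array stands in for the xarray, and all names
(model_cpu, model_smt_info, model_max_smt, MODEL_MAX_GROUPS) are hypothetical:

```c
#include <stddef.h>

#define MODEL_MAX_GROUPS 8	/* groups beyond this limit are ignored in this sketch */

struct model_cpu {
	unsigned long hetero_id;
	int core_id;
};

struct model_smt_info {
	unsigned long hetero_id;
	int core_id;
	unsigned int thread_num;
	int used;
};

/* Returns the largest thread count, looking at one core per hetero_id. */
static unsigned int model_max_smt(const struct model_cpu *cpus, size_t n)
{
	struct model_smt_info groups[MODEL_MAX_GROUPS] = { 0 };
	unsigned int max = 0;
	size_t i, g;

	for (i = 0; i < n; i++) {
		for (g = 0; g < MODEL_MAX_GROUPS; g++) {
			if (!groups[g].used) {
				/* First CPU seen for this hetero_id: create the entry. */
				groups[g].used = 1;
				groups[g].hetero_id = cpus[i].hetero_id;
				groups[g].core_id = cpus[i].core_id;
				groups[g].thread_num = 1;
				break;
			}
			if (groups[g].hetero_id == cpus[i].hetero_id) {
				/* Count threads only for the recorded core. */
				if (groups[g].core_id == cpus[i].core_id)
					groups[g].thread_num++;
				break;
			}
		}
	}

	for (g = 0; g < MODEL_MAX_GROUPS && groups[g].used; g++)
		if (groups[g].thread_num > max)
			max = groups[g].thread_num;

	return max;
}
```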

[PATCH 11/19] vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock

2025-03-03 Thread Thomas Weißschuh
From: Anna-Maria Behnsen 

To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be array of it. By now, vdso_clock is
simply a define which maps vdso_clock to vdso_time_data.

To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer argument of the helper functions with struct
vdso_clock pointer if applicable.

No functional change.

Signed-off-by: Anna-Maria Behnsen 
Signed-off-by: Nam Cao 
Signed-off-by: Thomas Weißschuh 
---
 lib/vdso/gettimeofday.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 03fa0393645ac0f5ee465ddc19d84b330913da65..c6ff6934558658f9e280d5b84cfb034f4828893d 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -17,12 +17,12 @@
 #endif
 
 #ifdef CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT
-static __always_inline bool vdso_delta_ok(const struct vdso_time_data *vd, u64 delta)
+static __always_inline bool vdso_delta_ok(const struct vdso_clock *vc, u64 delta)
 {
-   return delta < vd->max_cycles;
+   return delta < vc->max_cycles;
 }
 #else
-static __always_inline bool vdso_delta_ok(const struct vdso_time_data *vd, u64 delta)
+static __always_inline bool vdso_delta_ok(const struct vdso_clock *vc, u64 delta)
 {
return true;
 }
@@ -39,14 +39,14 @@ static __always_inline u64 vdso_shift_ns(u64 ns, u32 shift)
  * Default implementation which works for all sane clocksources. That
  * obviously excludes x86/TSC.
  */
-static __always_inline u64 vdso_calc_ns(const struct vdso_time_data *vd, u64 cycles, u64 base)
+static __always_inline u64 vdso_calc_ns(const struct vdso_clock *vc, u64 cycles, u64 base)
 {
-   u64 delta = (cycles - vd->cycle_last) & VDSO_DELTA_MASK(vd);
+   u64 delta = (cycles - vc->cycle_last) & VDSO_DELTA_MASK(vc);
 
-   if (likely(vdso_delta_ok(vd, delta)))
-   return vdso_shift_ns((delta * vd->mult) + base, vd->shift);
+   if (likely(vdso_delta_ok(vc, delta)))
+   return vdso_shift_ns((delta * vc->mult) + base, vc->shift);
 
-   return mul_u64_u32_add_u64_shr(delta, vd->mult, base, vd->shift);
+   return mul_u64_u32_add_u64_shr(delta, vc->mult, base, vc->shift);
 }
 #endif /* vdso_calc_ns */
 
@@ -58,9 +58,9 @@ static inline bool __arch_vdso_hres_capable(void)
 #endif
 
 #ifndef vdso_clocksource_ok
-static inline bool vdso_clocksource_ok(const struct vdso_time_data *vd)
+static inline bool vdso_clocksource_ok(const struct vdso_clock *vc)
 {
-   return vd->clock_mode != VDSO_CLOCKMODE_NONE;
+   return vc->clock_mode != VDSO_CLOCKMODE_NONE;
 }
 #endif
 

-- 
2.48.1
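
For readers unfamiliar with the helper being converted, the fast path of
vdso_calc_ns() is a masked cycle delta scaled by mult, offset by base and
shifted right. A simplified sketch of that arithmetic, assuming a plain mask
field in place of VDSO_DELTA_MASK() and leaving out the overflow-protected
slow path; struct model_vclock and model_calc_ns() are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical per-clock data; the real fields live in the vDSO data page. */
struct model_vclock {
	uint64_t cycle_last;
	uint64_t mask;
	uint32_t mult;
	uint32_t shift;
};

/* Fast path only: ((cycles - cycle_last) & mask) * mult + base, then >> shift. */
static uint64_t model_calc_ns(const struct model_vclock *vc, uint64_t cycles,
			      uint64_t base)
{
	uint64_t delta = (cycles - vc->cycle_last) & vc->mask;

	return (delta * vc->mult + base) >> vc->shift;
}
```

The mask makes the subtraction safe when a narrow counter wraps: with an 8-bit
mask, cycle_last = 250 and cycles = 4 still give a delta of 10.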




Re: [PATCH 3/4] ASoC: dt-bindings: fsl,audmix: make 'dais' property to be optional

2025-03-03 Thread Rob Herring (Arm)


On Wed, 26 Feb 2025 18:05:07 +0800, Shengjiu Wang wrote:
> Make 'dais' property to be optional. When there is no 'dais' property,
> driver won't register the card, dts should have audio graph card node
> for linking this device.
> 
> Signed-off-by: Shengjiu Wang 
> ---
>  Documentation/devicetree/bindings/sound/fsl,audmix.yaml | 1 -
>  1 file changed, 1 deletion(-)
> 

Reviewed-by: Rob Herring (Arm) 




Re: [PATCH 2/4] ASoC: dt-bindings: fsl,audmix: Document audio graph port

2025-03-03 Thread Rob Herring (Arm)


On Wed, 26 Feb 2025 18:05:06 +0800, Shengjiu Wang wrote:
> This device can be used in conjunction with audio-graph-card to provide
> an endpoint for binding with the other side of the audio link.
> 
> Signed-off-by: Shengjiu Wang 
> ---
>  .../devicetree/bindings/sound/fsl,audmix.yaml | 60 +++
>  1 file changed, 60 insertions(+)
> 

Reviewed-by: Rob Herring (Arm) 




Re: [PATCH v11 1/4] cpu/SMT: Provide a default topology_is_primary_thread()

2025-03-03 Thread Yicong Yang
On 2025/2/28 21:54, Sudeep Holla wrote:
> On Tue, Feb 18, 2025 at 10:10:15PM +0800, Yicong Yang wrote:
>> From: Yicong Yang 
>>
>> Currently if architectures want to support HOTPLUG_SMT they need to
>> provide a topology_is_primary_thread() telling the framework which
>> thread in the SMT cannot offline. However arm64 doesn't have a
>> restriction on which thread in the SMT cannot offline, a simplest
>> choice is that just make 1st thread as the "primary" thread. So
>> just make this as the default implementation in the framework and
>> let architectures like x86 that have special primary thread to
>> override this function (which they've already done).
>>
>> There's no need to provide a stub function if !CONFIG_SMP or
>> !CONFIG_HOTPLUG_SMT. In such case the testing CPU is already
>> the 1st CPU in the SMT so it's always the primary thread.
>>
>> Reviewed-by: Jonathan Cameron 
>> Signed-off-by: Yicong Yang 
>> ---
>> Pre questioned in v9 [1] whether this works on architectures not using
>> CONFIG_GENERIC_ARCH_TOPOLOGY, See [2] for demonstration hacking on LoongArch
>> VM and this also works. Architectures should use this on their own situation.
>> [1] 
>> https://lore.kernel.org/linux-arm-kernel/427bd639-33c3-47e4-9e83-68c428eb1...@arm.com/
>> [2] 
>> https://lore.kernel.org/linux-arm-kernel/a5690fee-3019-f26c-8bad-1d95e388e...@huawei.com/
>>
>>  arch/powerpc/include/asm/topology.h |  1 +
>>  arch/x86/include/asm/topology.h |  2 +-
>>  include/linux/topology.h| 22 ++
>>  3 files changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/topology.h 
>> b/arch/powerpc/include/asm/topology.h
>> index 16bacfe8c7a2..da15b5efe807 100644
>> --- a/arch/powerpc/include/asm/topology.h
>> +++ b/arch/powerpc/include/asm/topology.h
>> @@ -152,6 +152,7 @@ static inline bool topology_is_primary_thread(unsigned 
>> int cpu)
>>  {
>>  return cpu == cpu_first_thread_sibling(cpu);
>>  }
>> +#define topology_is_primary_thread topology_is_primary_thread
>>  
>>  static inline bool topology_smt_thread_allowed(unsigned int cpu)
>>  {
>> diff --git a/arch/x86/include/asm/topology.h 
>> b/arch/x86/include/asm/topology.h
>> index ec134b719144..6c79ee7c0957 100644
>> --- a/arch/x86/include/asm/topology.h
>> +++ b/arch/x86/include/asm/topology.h
>> @@ -229,11 +229,11 @@ static inline bool topology_is_primary_thread(unsigned 
>> int cpu)
>>  {
>>  return cpumask_test_cpu(cpu, cpu_primary_thread_mask);
>>  }
>> +#define topology_is_primary_thread topology_is_primary_thread
>>  
>>  #else /* CONFIG_SMP */
>>  static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 
>> 0; }
>>  static inline int topology_max_smt_threads(void) { return 1; }
>> -static inline bool topology_is_primary_thread(unsigned int cpu) { return 
>> true; }
>>  static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
>>  #endif /* !CONFIG_SMP */
>>  
>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>> index 52f5850730b3..b3aba443c4eb 100644
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -240,6 +240,28 @@ static inline const struct cpumask *cpu_smt_mask(int 
>> cpu)
>>  }
>>  #endif
>>  
>> +#ifndef topology_is_primary_thread
>> +
>> +#define topology_is_primary_thread topology_is_primary_thread
>> +
>> +static inline bool topology_is_primary_thread(unsigned int cpu)
>> +{
>> +/*
>> + * On SMT hotplug the primary thread of the SMT won't be disabled.
> 
> I may be misunderstanding the term "SMT hotplug" above. For me it is
> comparable with logical CPU hotplug, so the above statement may be
> misleading. IIUC, what you mean above is if SMT is disabled, the
> primary thread will always remain enabled/active. Does that make sense
> or am I missing something ?
> 

I just borrowed the term from the Kconfig option HOTPLUG_SMT here, but the
statement only involves the disable part, so maybe it would be more accurate to
use "SMT disable" rather than "SMT hotplug" here?

Thanks.
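
For reference, the override pattern discussed in this thread (an #ifndef guard
plus a self-referencing #define) and the "first thread in the sibling mask is
primary" default can be sketched like this. The sibling-mask helper is a made-up
stand-in that assumes two threads per core numbered consecutively; none of the
model_* names are kernel API:

```c
#include <stdbool.h>

/* Hypothetical sibling mask: bit n set means CPU n is a sibling of 'cpu'. */
static unsigned long model_sibling_mask(unsigned int cpu)
{
	/* Assumes 2 threads per core: {0,1}, {2,3}, ... and cpu well below the bit width. */
	return 3UL << (cpu & ~1U);
}

/* An architecture header could define model_is_primary_thread before this point. */
#ifndef model_is_primary_thread
#define model_is_primary_thread model_is_primary_thread

/* Default: the lowest-numbered CPU in the sibling mask is the primary thread. */
static inline bool model_is_primary_thread(unsigned int cpu)
{
	unsigned long mask = model_sibling_mask(cpu);
	unsigned int first = 0;

	while (!(mask & 1UL)) {
		mask >>= 1;
		first++;
	}
	return cpu == first;
}
#endif
```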




Re: [PATCH v3 2/3] dt-bindings: nand: Add fsl,elbc-fcm-nand

2025-03-03 Thread Rob Herring
On Wed, Feb 26, 2025 at 12:45:17PM -0600, Rob Herring (Arm) wrote:
> 
> On Wed, 26 Feb 2025 18:01:41 +0100, J. Neuschäfer wrote:
> > Formalize the binding already supported by the fsl_elbc_nand.c driver
> > and used in several device trees in arch/powerpc/boot/dts/.
> > 
> > raw-nand-chip.yaml is referenced in order to accommodate situations in
> > which the ECC parameters settings are set in the device tree. One such
> > example is in arch/powerpc/boot/dts/turris1x.dts:
> > 
> > /* MT29F2G08ABAEAWP:E NAND */
> > nand@1,0 {
> > compatible = "fsl,p2020-fcm-nand", "fsl,elbc-fcm-nand";
> > reg = <0x1 0x0 0x0004>;
> > nand-ecc-mode = "soft";
> > nand-ecc-algo = "bch";
> > 
> > partitions { ... };
> > };
> > 
> > Reviewed-by: Frank Li 
> > Signed-off-by: J. Neuschäfer 
> > ---
> > 
> > V3:
> > - remove unnecessary #address/size-cells from nand node in example
> > - add Frank Li's review tag
> > - add missing end of document marker (...)
> > - explain choice to reference raw-nand-chip.yaml
> > 
> > V2:
> > - split out from fsl,elbc binding patch
> > - constrain #address-cells and #size-cells
> > - add a general description
> > - use unevaluatedProperties=false instead of additionalProperties=false
> > - fix property order to comply with dts coding style
> > - include raw-nand-chip.yaml instead of nand-chip.yaml
> > ---
> >  .../devicetree/bindings/mtd/fsl,elbc-fcm-nand.yaml | 68 
> > ++
> >  1 file changed, 68 insertions(+)
> > 
> 
> My bot found errors running 'make dt_binding_check' on your patch:
> 
> yamllint warnings/errors:
> 
> dtschema/dtc warnings/errors:
> /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/mtd/fsl,elbc-fcm-nand.example.dtb:
>  nand@1,0: $nodename:0: 'nand@1,0' does not match '^nand@[a-f0-9]$'
>   from schema $id: 
> http://devicetree.org/schemas/mtd/fsl,elbc-fcm-nand.yaml#

Drop the unit address in raw-nand-chip.yaml. So: 

properties:
  $nodename:
pattern: "^nand@"
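
To illustrate why the checker rejected the example node name, here is a small
sketch using POSIX extended regular expressions. dt-schema uses its own regex
engine, so this is only an approximation that happens to agree for these two
simple patterns; model_name_matches() is a hypothetical helper:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if 'name' matches the POSIX extended regex 'pattern'. */
static bool model_name_matches(const char *pattern, const char *name)
{
	regex_t re;
	bool ok;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	ok = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```

"nand@1,0" fails "^nand@[a-f0-9]$" because that pattern allows exactly one hex
digit after the "@", while the relaxed "^nand@" accepts any unit address.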




Re: [PATCH 1/4] ASoC: dt-bindings: fsl,sai: Document audio graph port

2025-03-03 Thread Rob Herring (Arm)


On Wed, 26 Feb 2025 18:05:05 +0800, Shengjiu Wang wrote:
> This device can be used in conjunction with audio-graph-card to provide
> an endpoint for binding with the other side of the audio link.
> 
> Signed-off-by: Shengjiu Wang 
> ---
>  .../devicetree/bindings/sound/fsl,sai.yaml| 51 +++
>  1 file changed, 51 insertions(+)
> 

Reviewed-by: Rob Herring (Arm) 




Re: [PATCH] book3s64/radix : Align section vmemmap start address to PAGE_SIZE

2025-03-03 Thread Aneesh Kumar K . V
Donet Tom  writes:

> A vmemmap altmap is a device-provided region used to provide
> backing storage for struct pages. For each namespace, the altmap
> should belong to that same namespace. If the namespaces are
> created unaligned, there is a chance that the section vmemmap
> start address could also be unaligned. If the section vmemmap
> start address is unaligned, the altmap page allocated from the
> current namespace might be used by the previous namespace also.
> During the free operation, since the altmap is shared between two
> namespaces, the previous namespace may detect that the page does
> not belong to its altmap and incorrectly assume that the page is a
> normal page. It then attempts to free the normal page, which leads
> to a kernel crash.
>
> In this patch, we are aligning the section vmemmap start address
> to PAGE_SIZE. After alignment, the start address will not be
> part of the current namespace, and a normal page will be allocated
> for the vmemmap mapping of the current section. For the remaining
> sections, altmaps will be allocated. During the free operation,
> the normal page will be correctly freed.
>
> Without this patch
> ==
> NS1 start   NS2 start
>  _
> | NS1   |NS2  |
>  -
> | Altmap| Altmap | .|Altmap| Altmap | ...
> |  NS1  |  NS1   |  | NS2  |  NS2   |
>

^^^ this should be allocated in ram?


>
> In the above scenario, NS1 and NS2 are two namespaces. The vmemmap
> for NS1 comes from Altmap NS1, which belongs to NS1, and the
> vmemmap for NS2 comes from Altmap NS2, which belongs to NS2.
>
> The vmemmap start for NS2 is not aligned, so Altmap NS2 is shared
> by both NS1 and NS2. During the free operation in NS1, Altmap NS2
> is not part of NS1's altmap, causing it to attempt to free an
> invalid page.
>
> With this patch
> ===
> NS1 start   NS2 start
>  _
> | NS1   |NS2  |
>  -
> | Altmap| Altmap | .| Normal | Altmap | Altmap |...
> |  NS1  |  NS1   |  |  Page  |  NS2   |  NS2   |
>
> If the vmemmap start for NS2 is not aligned then we are allocating
> a normal page. NS1 and NS2 vmemmap will be freed correctly.
>
> Fixes: 368a0590d954("powerpc/book3s64/vmemmap: switch radix to use a 
> different vmemmap handling function")
> Co-developed-by: Ritesh Harjani (IBM) 
> Signed-off-by: Ritesh Harjani (IBM) 
> Signed-off-by: Donet Tom 
> ---
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 311e2112d782..b22d5f6147d2 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1120,6 +1120,8 @@ int __meminit radix__vmemmap_populate(unsigned long 
> start, unsigned long end, in
>   pmd_t *pmd;
>   pte_t *pte;
>  
> + start = ALIGN_DOWN(start, PAGE_SIZE);
> +
>   for (addr = start; addr < end; addr = next) {
>   next = pmd_addr_end(addr, end);
>  
> -- 
> 2.43.5
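
The alignment step added by the quoted patch is a power-of-two round-down. A
minimal sketch of that arithmetic, assuming a 64 KiB page size purely for
illustration; MODEL_ALIGN_DOWN, MODEL_PAGE_SIZE and model_vmemmap_start() are
hypothetical names, and the macro only gives the same result as the kernel's
ALIGN_DOWN() for power-of-two alignments:

```c
#include <stdint.h>

/* Round x down to a multiple of a, where a is a power of two. */
#define MODEL_ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

/* Assumed page size for this example only. */
#define MODEL_PAGE_SIZE 0x10000ULL

/* Start address the populate loop would begin from after alignment. */
static uint64_t model_vmemmap_start(uint64_t start)
{
	return MODEL_ALIGN_DOWN(start, MODEL_PAGE_SIZE);
}
```

An already aligned address is returned unchanged, so aligned namespaces are
unaffected by the extra statement.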



Re: [PATCH v11 1/4] cpu/SMT: Provide a default topology_is_primary_thread()

2025-03-03 Thread Yicong Yang
On 2025/2/28 19:10, Dietmar Eggemann wrote:
> On 18/02/2025 15:10, Yicong Yang wrote:
>> From: Yicong Yang 
> 
> [...]
> 
>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>> index 52f5850730b3..b3aba443c4eb 100644
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -240,6 +240,28 @@ static inline const struct cpumask *cpu_smt_mask(int 
>> cpu)
>>  }
>>  #endif
>>  
>> +#ifndef topology_is_primary_thread
>> +
>> +#define topology_is_primary_thread topology_is_primary_thread
>> +
>> +static inline bool topology_is_primary_thread(unsigned int cpu)
>> +{
>> +/*
>> + * On SMT hotplug the primary thread of the SMT won't be disabled.
>> + * Architectures do have a special primary thread (e.g. x86) need
>> + * to override this function. Otherwise just make the first thread
>> + * in the SMT as the primary thread.
>> + *
>> + * The sibling cpumask of an offline CPU contains always the CPU
>> + * itself for architectures using CONFIG_GENERIC_ARCH_TOPOLOGY.
>> + * Other architectures should use this depend on their own
>> + * situation.
> 
> This sentence is hard to get. Do you want to say that other
> architectures (CONFIG_GENERIC_ARCH_TOPOLOGY or
> !CONFIG_GENERIC_ARCH_TOPOLOGY) have to check whether they can use this
> default implementation or have to override it?
> 

yes exactly, will improve the comments. thanks.




Re: [PATCH v11 2/4] arch_topology: Support SMT control for OF based system

2025-03-03 Thread Yicong Yang
On 2025/2/28 21:54, Sudeep Holla wrote:
> On Tue, Feb 18, 2025 at 10:10:16PM +0800, Yicong Yang wrote:
>> From: Yicong Yang 
>>
>> On building the topology from the devicetree, we've already
>> gotten the SMT thread number of each core. Update the largest
>> SMT thread number and enable the SMT control by the end of
>> topology parsing.
>>
>> The core's SMT control provides two interface to the users [1]:
>> 1) enable/disable SMT by writing on/off
>> 2) enable/disable SMT by writing thread number 1/max_thread_number
>>
>> If a system have more than one SMT thread number the 2) may
>> not handle it well, since there're multiple thread numbers in the
>> system and 2) only accept 1/max_thread_number. So issue a warning
>> to notify the users if such system detected.
>>
>> [1] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-devices-system-cpu#n542
>>
>> Signed-off-by: Yicong Yang 
>> ---
>>  drivers/base/arch_topology.c | 27 +++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 3ebe77566788..23f425a9d77a 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -11,6 +11,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -506,6 +507,10 @@ core_initcall(free_raw_capacity);
>>  #endif
>>  
>>  #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV)
>> +
>> +/* Maximum SMT thread number detected used to enable the SMT control */
>> +static unsigned int max_smt_thread_num;
>> +
>>  /*
>>   * This function returns the logic cpu number of the node.
>>   * There are basically three kinds of return values:
>> @@ -565,6 +570,16 @@ static int __init parse_core(struct device_node *core, 
>> int package_id,
>>  i++;
>>  } while (1);
>>  
>> +/*
>> + * If max_smt_thread_num has been initialized and doesn't match
>> + * the thread number of this entry, then the system has
>> + * heterogeneous SMT topology.
>> + */
>> +if (max_smt_thread_num && max_smt_thread_num != i)
>> +pr_warn_once("Heterogeneous SMT topology is partly supported by 
>> SMT control\n");
>> +
> 
> May be we need to make it more conditional as we may have to support
> systems with few cores that are single threaded ? I think Dietmar's
> comment is about that.
> 

I thought of ignoring the cores with a single thread in a previous discussion,
as replied in Dietmar's thread.

>> +max_smt_thread_num = max_t(unsigned int, max_smt_thread_num, i);
>> +
>>  cpu = get_cpu_for_node(core);
>>  if (cpu >= 0) {
>>  if (!leaf) {
>> @@ -677,6 +692,18 @@ static int __init parse_socket(struct device_node 
>> *socket)
>>  if (!has_socket)
>>  ret = parse_cluster(socket, 0, -1, 0);
>>  
>> +/*
>> + * Notify the CPU framework of the SMT support. Initialize the
>> + * max_smt_thread_num to 1 if no SMT support detected or failed
>> + * to parse the topology. A thread number of 1 can be handled by
>> + * the framework so we don't need to check max_smt_thread_num to
>> + * see we support SMT or not.
>> + */
>> +if (!max_smt_thread_num || ret)
>> +max_smt_thread_num = 1;
>> +
> 
> For the failed parsing of topology, reset_cpu_topology() gets called.
> I suggest resetting max_smt_thread_num to 1 belongs there.

This is only used by ARM64 || RISCV, which use arch_topology to parse
the CPU topology, but reset_cpu_topology() is also shared by arm/parisc.
Should we move it there and add an ARM64 || RISCV protection macro?

> 
> And if you start with max_smt_thread_num, we don't need to update it to
> 1 explicitly here. So I would like to get rid of above check completely.
> 
> --
> Regards,
> Sudeep
> 
> .
> 



Re: [PATCH v11 2/4] arch_topology: Support SMT control for OF based system

2025-03-03 Thread Yicong Yang
On 2025/2/28 19:11, Dietmar Eggemann wrote:
> On 18/02/2025 15:10, Yicong Yang wrote:
>> From: Yicong Yang 
>>
>> On building the topology from the devicetree, we've already
>> gotten the SMT thread number of each core. Update the largest
>> SMT thread number and enable the SMT control by the end of
>> topology parsing.
>>
>> The core's SMT control provides two interface to the users [1]:
>> 1) enable/disable SMT by writing on/off
>> 2) enable/disable SMT by writing thread number 1/max_thread_number
> 
> 1/max_thread_number stands for '1 or max_thread_number', right ?
> 
> Aren't the two interfaces:
> 
> (a) /sys/devices/system/cpu/smt/active
> (b) /sys/devices/system/cpu/smt/control
> 
> and you write 1) or 2) (or 'forceoff') into (b)?

Yes, you're correct. "active" is a read-only status file, so it is not part
of this interface. Let me explicitly mention
/sys/devices/system/cpu/smt/control here in the commit message.
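
For reference, the write side of /sys/devices/system/cpu/smt/control as
described in this thread could be modeled roughly as below. This is a
userspace sketch, not kernel code: smt_control_write_ok() and its
max_threads parameter are hypothetical names, and the rule that a numeric
value must be 1 or the maximum thread number is taken from the commit
message quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if @val is a value this thread
 * describes as writable to smt/control on a system whose largest
 * per-core thread count is @max_threads.
 */
bool smt_control_write_ok(const char *val, unsigned int max_threads)
{
	char *end;
	unsigned long n;

	assert(val != NULL);

	/* Keyword forms: "on"/"off" from 1), plus "forceoff". */
	if (!strcmp(val, "on") || !strcmp(val, "off") ||
	    !strcmp(val, "forceoff"))
		return true;

	/* Numeric form from 2): only plain decimal digits are parsed. */
	if (val[0] < '0' || val[0] > '9')
		return false;

	n = strtoul(val, &end, 10);
	if (*end != '\0')
		return false;

	/* Only 1 or the maximum thread number is accepted. */
	return n == 1 || n == max_threads;
}
```

With max_threads set to 4, "on", "1" and "4" pass in this model while "2"
does not, which is the limitation the warning about heterogeneous
topologies is meant to flag.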

> 
>> If a system have more than one SMT thread number the 2) may
> 
> s/have/has
> 
>> not handle it well, since there're multiple thread numbers in the
> 
> multiple thread numbers other than 1, right?

According to the pr_warn_once() we implemented below, it also includes the
case where the system has one type of SMT cores plus non-SMT cores (thread
number 1):
- 1 thread
- X (!= 1) threads

This was discussed in [1], and I thought we had agreed (I hope I understood
correctly) that all the asymmetric cases need a notification. Do you and
Sudeep think we should not warn in such a case?

[1] 
https://lore.kernel.org/linux-arm-kernel/10082e64-b00a-a30b-b9c5-1401a54f6...@huawei.com/

> 
>> system and 2) only accept 1/max_thread_number. So issue a warning
>> to notify the users if such system detected.
> 
> This paragraph seems to be about heterogeneous systems. Maybe mention this?
> 
> Heterogeneous system with SMT only on a subset of cores (like Intel
> Hybrid): This one works (N threads per core with N=1 and N=2) just fine.
> 
> But on Arm64 (default) we would still see:
> 
> [0.075782] Heterogeneous SMT topology is partly supported by SMT control
> 

Much clearer, will add it. Thanks.

>> [1] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-devices-system-cpu#n542
>>
>> Signed-off-by: Yicong Yang 
>> ---
>>  drivers/base/arch_topology.c | 27 +++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 3ebe77566788..23f425a9d77a 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -11,6 +11,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -506,6 +507,10 @@ core_initcall(free_raw_capacity);
>>  #endif
>>  
>>  #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV)
>> +
>> +/* Maximum SMT thread number detected used to enable the SMT control */
> 
> maybe shorter ?
> 
> /* used to enable SMT control */
> 

sure.

>> +static unsigned int max_smt_thread_num;
>> +
>>  /*
>>   * This function returns the logic cpu number of the node.
>>   * There are basically three kinds of return values:
>> @@ -565,6 +570,16 @@ static int __init parse_core(struct device_node *core, 
>> int package_id,
>>  i++;
>>  } while (1);
>>  
>> +/*
>> + * If max_smt_thread_num has been initialized and doesn't match
>> + * the thread number of this entry, then the system has
>> + * heterogeneous SMT topology.
>> + */
>> +if (max_smt_thread_num && max_smt_thread_num != i)
>> +pr_warn_once("Heterogeneous SMT topology is partly supported by 
>> SMT control\n");
>> +
>> +max_smt_thread_num = max_t(unsigned int, max_smt_thread_num, i);
>> +
>>  cpu = get_cpu_for_node(core);
>>  if (cpu >= 0) {
>>  if (!leaf) {
>> @@ -677,6 +692,18 @@ static int __init parse_socket(struct device_node 
>> *socket)
>>  if (!has_socket)
>>  ret = parse_cluster(socket, 0, -1, 0);
>>  
>> +/*
>> + * Notify the CPU framework of the SMT support. Initialize the
>> + * max_smt_thread_num to 1 if no SMT support detected or failed
>> + * to parse the topology. A thread number of 1 can be handled by
>> + * the framework so we don't need to check max_smt_thread_num to
>> + * see we support SMT or not.
> 
> Not sure whether the last sentence is needed here?
> 

We always need to call cpu_smt_set_num_threads() to notify the framework
of the thread number, even if SMT is not supported. In that case the
thread number is 1, and the framework handles this well. I was worried that
readers might be confused by notifying a thread number of 1, so I added this
comment.

Will get rid of it if it is thought redundant.

Thanks.





Re: [PATCH v11 3/4] arm64: topology: Support SMT control on ACPI based system

2025-03-03 Thread Yicong Yang
On 2025/3/3 19:16, Sudeep Holla wrote:
> On Mon, Mar 03, 2025 at 10:56:12AM +0100, Pierre Gondois wrote:
>> On 2/28/25 20:06, Sudeep Holla wrote:
>
> Ditto as previous patch, can get rid if it is default 1.
>

 On non-SMT platforms, not calling cpu_smt_set_num_threads() leaves
 cpu_smt_num_threads uninitialized to UINT_MAX:

 smt/active:0
 smt/control:-1

 If cpu_smt_set_num_threads() is called:
 active:0
 control:notsupported

 So it might be slightly better to still initialize max_smt_thread_num.

>>>
>>> Sure, what I meant is to have max_smt_thread_num set to 1 by default is
>>> that is what needed anyways and the above code does that now.
>>>
>>> Why not start with initialised to 1 instead ?
>>> Of course some current logic needs to change around testing it for zero.
>>>
>>
>> I think there would still be a way to check against the default value.
>> If we have:
>> unsigned int max_smt_thread_num = 1;
>>
>> then on a platform with 2 threads, the detection condition would trigger:
>> xa_for_each(&hetero_cpu, hetero_id, entry) {
>> 	if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
>> 		<-- (entry->thread_num=2) and (max_smt_thread_num=1)
>> 		pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
>>
>> so we would need an additional variable:
>> bool is_initialized = false;
> 
> Sure, we could do that or skip the check if max_smt_thread_num == 1 ?
> 
> I mean
>   if (entry->thread_num != max_smt_thread_num && max_smt_thread_num != 1)
> 

This will work for me. Will launch some tests.

Thanks.
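
To make the two conditions discussed above easier to compare, here is a
small userspace sketch. It is only a model: the function names are
hypothetical, and the input array stands in for the per-entry thread
counts that the real code reads from the xarray (ACPI) or from
parse_core() (OF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* v11 condition: max starts at 0, and 0 means "nothing seen yet". */
bool sketch_warn_v11(const unsigned int *threads, size_t n)
{
	unsigned int max = 0;
	bool warn = false;

	assert(threads != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (max && max != threads[i])
			warn = true;
		if (threads[i] > max)
			max = threads[i];
	}
	return warn;
}

/* Suggested condition: max starts at 1, check skipped while max == 1. */
bool sketch_warn_suggested(const unsigned int *threads, size_t n)
{
	unsigned int max = 1;
	bool warn = false;

	assert(threads != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (threads[i] != max && max != 1)
			warn = true;
		if (threads[i] > max)
			max = threads[i];
	}
	return warn;
}
```

In this model a homogeneous {2, 2} system warns under neither condition.
For {1, 2} only the v11 condition warns, while for {2, 1} both do, so the
suggested condition depends on the order in which the entries are visited.
Whether that matters in practice depends on the iteration order in the
real code, which this sketch does not try to reproduce.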





Re: [PATCH v11 3/4] arm64: topology: Support SMT control on ACPI based system

2025-03-03 Thread Yicong Yang
On 2025/2/25 14:08, Hanjun Guo wrote:
> On 2025/2/18 22:10, Yicong Yang wrote:
>> From: Yicong Yang 
>>
>> For ACPI we'll build the topology from PPTT and we cannot directly
>> get the SMT number of each core. Instead using a temporary xarray
>> to record the heterogeneous information (from ACPI_PPTT_ACPI_IDENTICAL)
>> and SMT information of the first core in its heterogeneous CPU cluster
>> when building the topology. Then we can know the largest SMT number
>> in the system. If a homogeneous system's using ACPI 6.2 or later,
>> all the CPUs should be under the root node of PPTT. There'll be
>> only one entry in the xarray and all the CPUs in the system will
>> be assumed identical.
>>
>> The core's SMT control provides two interface to the users [1]:
>> 1) enable/disable SMT by writing on/off
>> 2) enable/disable SMT by writing thread number 1/max_thread_number
>>
>> If a system have more than one SMT thread number the 2) may
>> not handle it well, since there're multiple thread numbers in the
>> system and 2) only accept 1/max_thread_number. So issue a warning
>> to notify the users if such system detected.
>>
>> [1] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-devices-system-cpu#n542
>>
>> Reviewed-by: Jonathan Cameron 
>> Signed-off-by: Yicong Yang 
>> ---
>>   arch/arm64/kernel/topology.c | 66 
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..6eba1ac091ee 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -15,8 +15,10 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>> +#include 
>>     #include 
>>   #include 
>> @@ -37,17 +39,28 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>   return !!is_threaded;
>>   }
>>   +struct cpu_smt_info {
>> +    unsigned int thread_num;
>> +    int core_id;
>> +};
>> +
>>   /*
>>    * Propagate the topology information of the processor_topology_node tree 
>> to the
>>    * cpu_topology array.
>>    */
>>   int __init parse_acpi_topology(void)
>>   {
>> +    unsigned int max_smt_thread_num = 0;
>> +    struct cpu_smt_info *entry;
>> +    struct xarray hetero_cpu;
>> +    unsigned long hetero_id;
>>   int cpu, topology_id;
>>     if (acpi_disabled)
>>   return 0;
>>   +    xa_init(&hetero_cpu);
>> +
>>   for_each_possible_cpu(cpu) {
>>   topology_id = find_acpi_cpu_topology(cpu, 0);
>>   if (topology_id < 0)
>> @@ -57,6 +70,34 @@ int __init parse_acpi_topology(void)
>>   cpu_topology[cpu].thread_id = topology_id;
>>   topology_id = find_acpi_cpu_topology(cpu, 1);
>>   cpu_topology[cpu].core_id   = topology_id;
>> +
>> +    /*
>> + * In the PPTT, CPUs below a node with the 'identical
>> + * implementation' flag have the same number of threads.
>> + * Count the number of threads for only one CPU (i.e.
>> + * one core_id) among those with the same hetero_id.
>> + * See the comment of find_acpi_cpu_topology_hetero_id()
>> + * for more details.
>> + *
>> + * One entry is created for each node having:
>> + * - the 'identical implementation' flag
>> + * - its parent not having the flag
>> + */
>> +    hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
>> +    entry = xa_load(&hetero_cpu, hetero_id);
>> +    if (!entry) {
>> +    entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>> +    WARN_ON_ONCE(!entry);
>> +
>> +    if (entry) {
>> +    entry->core_id = topology_id;
>> +    entry->thread_num = 1;
>> +    xa_store(&hetero_cpu, hetero_id,
>> + entry, GFP_KERNEL);
>> +    }
>> +    } else if (entry->core_id == topology_id) {
>> +    entry->thread_num++;
>> +    }
>>   } else {
>>   cpu_topology[cpu].thread_id  = -1;
>>   cpu_topology[cpu].core_id    = topology_id;
>> @@ -67,6 +108,31 @@ int __init parse_acpi_topology(void)
>>   cpu_topology[cpu].package_id = topology_id;
>>   }
>>   +    /*
>> + * This should be a short loop depending on the number of heterogeneous
>> + * CPU clusters. Typically on a homogeneous system there's only one
>> + * entry in the XArray.
>> + */
>> +    xa_for_each(&hetero_cpu, hetero_id, entry) {
>> +    if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
>> +    pr_warn_once("Heterogeneous SMT topology is partly supported by 
>> SMT control\n");
>> +
>> +    max_smt_thread_num = max(max_smt_thread_num, entry->thread_num);
>> +    xa_erase(&hetero_cpu, hetero_id);
>> +    kfree(entry);
>> +    }
>> +
>> + 
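
The counting scheme in the quoted hunk (one entry per hetero_id, threads
counted only for the first core_id seen under that hetero_id) can be
sketched in userspace as follows. This is an illustration under stated
assumptions, not the patch itself: a flat array replaces the xarray, all
names are hypothetical, and the input is assumed to contain only the CPUs
that the real code treats as threaded.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_cpu {
	unsigned long hetero_id;
	int core_id;
};

struct sketch_entry {
	unsigned long hetero_id;
	int core_id;		/* first core seen for this hetero_id */
	unsigned int thread_num;
};

/* Fills @entries (capacity @cap) and returns how many were used. */
size_t sketch_count_threads(const struct sketch_cpu *cpus, size_t ncpus,
			    struct sketch_entry *entries, size_t cap)
{
	size_t used = 0;

	assert(cpus != NULL && entries != NULL);

	for (size_t c = 0; c < ncpus; c++) {
		struct sketch_entry *e = NULL;

		/* Linear lookup stands in for xa_load(). */
		for (size_t i = 0; i < used; i++) {
			if (entries[i].hetero_id == cpus[c].hetero_id) {
				e = &entries[i];
				break;
			}
		}

		if (!e) {
			if (used == cap)
				continue;	/* table full, skip */
			entries[used].hetero_id = cpus[c].hetero_id;
			entries[used].core_id = cpus[c].core_id;
			entries[used].thread_num = 1;
			used++;
		} else if (e->core_id == cpus[c].core_id) {
			/* Another thread of the representative core. */
			e->thread_num++;
		}
	}
	return used;
}
```

CPUs that share a hetero_id but belong to a different core than the first
one seen are ignored, on the assumption stated in the quoted comment that
cores below an "identical implementation" node have the same thread count.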

Re: [PATCH v9 00/20] fs/dax: Fix ZONE_DEVICE page reference counts

2025-03-03 Thread Andrew Morton
On Fri, 28 Feb 2025 14:42:40 +1100 Alistair Popple  wrote:

> This is essentially the same as what's currently in mm-unstable aside from
> the two updates listed below. The main thing to note is it incorporates
> Balbir's fixup which is currently in mm-unstable as c98612955016
> ("mm-allow-compound-zone-device-pages-fix-fix")
> 

Thanks, I've updated mm.git to this v9 series.



[PATCH v3 net-next 07/13] net: enetc: check if the RSS hfunc is toeplitz

2025-03-03 Thread Wei Fang
Both ENETC v1 and ENETC v4 only support the toeplitz algorithm for RSS,
so add a check for RSS hfunc.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc_ethtool.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c 
b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
index bc65135925b8..6a47e2bd1d4f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
@@ -753,6 +753,13 @@ static int enetc_set_rxfh(struct net_device *ndev,
struct enetc_si *si = priv->si;
int err = 0;
 
+   if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
+   rxfh->hfunc != ETH_RSS_HASH_TOP) {
+   netdev_err(ndev, "Only toeplitz hash function is supported\n");
+
+   return -EOPNOTSUPP;
+   }
+
/* set hash key, if PF */
if (rxfh->key && enetc_si_is_pf(si))
enetc_set_rss_key(si, rxfh->key);
-- 
2.34.1
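
The check added by this patch can be exercised outside the kernel with a
sketch like the one below. The SKETCH_* constants are local stand-ins; the
values are an assumption that mirrors the ethtool UAPI as I understand it
(no change is 0, Toeplitz is bit 0, XOR is bit 1), and the function name
is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins for ETH_RSS_HASH_*; values are assumed, see above. */
#define SKETCH_RSS_HASH_NO_CHANGE	0u
#define SKETCH_RSS_HASH_TOP		(1u << 0)
#define SKETCH_RSS_HASH_XOR		(1u << 1)

static_assert(SKETCH_RSS_HASH_TOP != SKETCH_RSS_HASH_NO_CHANGE,
	      "Toeplitz must be distinguishable from 'no change'");

/* Returns 0 if @hfunc is acceptable, -EOPNOTSUPP otherwise. */
int sketch_check_rss_hfunc(unsigned int hfunc)
{
	if (hfunc != SKETCH_RSS_HASH_NO_CHANGE &&
	    hfunc != SKETCH_RSS_HASH_TOP)
		return -EOPNOTSUPP;

	return 0;
}
```

A request that leaves the hash function unchanged or asks for Toeplitz
passes; anything else, such as XOR, is rejected before the key or the
indirection table is touched.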




[PATCH v3 net-next 09/13] net: enetc: move generic VLAN filter interfaces to enetc-core

2025-03-03 Thread Wei Fang
For ENETC, each SI has a corresponding VLAN hash table. That is to say,
both PF and VFs can support VLAN filter. However, currently only ENETC v1
PF driver supports VLAN filter. In order to make i.MX95 ENETC (v4) PF and
VF drivers also support VLAN filter, some related macros are moved from
enetc_pf.h to enetc.h, and the related structure variables are moved from
enetc_pf to enetc_si.

Besides, enetc_vid_hash_idx() as a generic function is moved to enetc.c.
Extract enetc_refresh_vlan_ht_filter() from enetc_sync_vlan_ht_filter()
so that it can be shared by PF and VF drivers. This will make it easier
to add VLAN filter support for i.MX95 ENETC later.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 25 ++
 drivers/net/ethernet/freescale/enetc/enetc.h  |  6 +++
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 46 +--
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  4 --
 4 files changed, 42 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 8583ac9f7b9e..248dbc874eec 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -72,6 +72,31 @@ void enetc_reset_mac_addr_filter(struct enetc_mac_filter 
*filter)
 }
 EXPORT_SYMBOL_GPL(enetc_reset_mac_addr_filter);
 
+int enetc_vid_hash_idx(unsigned int vid)
+{
+   int res = 0;
+   int i;
+
+   for (i = 0; i < 6; i++)
+   res |= (hweight8(vid & (BIT(i) | BIT(i + 6))) & 0x1) << i;
+
+   return res;
+}
+EXPORT_SYMBOL_GPL(enetc_vid_hash_idx);
+
+void enetc_refresh_vlan_ht_filter(struct enetc_si *si)
+{
+   int i;
+
+   bitmap_zero(si->vlan_ht_filter, ENETC_VLAN_HT_SIZE);
+   for_each_set_bit(i, si->active_vlans, VLAN_N_VID) {
+   int hidx = enetc_vid_hash_idx(i);
+
+   __set_bit(hidx, si->vlan_ht_filter);
+   }
+}
+EXPORT_SYMBOL_GPL(enetc_refresh_vlan_ht_filter);
+
 static int enetc_num_stack_tx_queues(struct enetc_ndev_priv *priv)
 {
int num_tx_rings = priv->num_tx_rings;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index ecf79338cd79..c60741dfe358 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -24,6 +24,7 @@
 #define ENETC_CBD_DATA_MEM_ALIGN 64
 
 #define ENETC_MADDR_HASH_TBL_SZ64
+#define ENETC_VLAN_HT_SIZE 64
 
 enum enetc_mac_addr_type {UC, MC, MADDR_TYPE};
 
@@ -321,6 +322,9 @@ struct enetc_si {
struct workqueue_struct *workqueue;
struct work_struct rx_mode_task;
struct dentry *debugfs_root;
+
+   DECLARE_BITMAP(vlan_ht_filter, ENETC_VLAN_HT_SIZE);
+   DECLARE_BITMAP(active_vlans, VLAN_N_VID);
 };
 
 #define ENETC_SI_ALIGN 32
@@ -506,6 +510,8 @@ int enetc_get_driver_data(struct enetc_si *si);
 void enetc_add_mac_addr_ht_filter(struct enetc_mac_filter *filter,
  const unsigned char *addr);
 void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter);
+int enetc_vid_hash_idx(unsigned int vid);
+void enetc_refresh_vlan_ht_filter(struct enetc_si *si);
 
 int enetc_open(struct net_device *ndev);
 int enetc_close(struct net_device *ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 38ec7657b9aa..f9b179ed6d8b 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -222,45 +222,18 @@ static void enetc_set_vlan_ht_filter(struct enetc_hw *hw, 
int si_idx,
enetc_port_wr(hw, ENETC_PSIVHFR1(si_idx), upper_32_bits(hash));
 }
 
-static int enetc_vid_hash_idx(unsigned int vid)
-{
-   int res = 0;
-   int i;
-
-   for (i = 0; i < 6; i++)
-   res |= (hweight8(vid & (BIT(i) | BIT(i + 6))) & 0x1) << i;
-
-   return res;
-}
-
-static void enetc_sync_vlan_ht_filter(struct enetc_pf *pf, bool rehash)
-{
-   int i;
-
-   if (rehash) {
-   bitmap_zero(pf->vlan_ht_filter, ENETC_VLAN_HT_SIZE);
-
-   for_each_set_bit(i, pf->active_vlans, VLAN_N_VID) {
-   int hidx = enetc_vid_hash_idx(i);
-
-   __set_bit(hidx, pf->vlan_ht_filter);
-   }
-   }
-
-   enetc_set_vlan_ht_filter(&pf->si->hw, 0, *pf->vlan_ht_filter);
-}
-
 static int enetc_vlan_rx_add_vid(struct net_device *ndev, __be16 prot, u16 vid)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_pf *pf = enetc_si_priv(priv->si);
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
int idx;
 
-   __set_bit(vid, pf->active_vlans);
+   __set_bit(vid, si->active_vlans);
 
idx = enetc_vid_hash_idx(vid);
-   if (!__test_and_set_bit(idx, pf->vlan_ht_filter))
-   enetc_sync_vlan_ht_filter(pf, false);
+   if (!__test_and

[PATCH v3 net-next 10/13] net: enetc: move generic VLAN hash filter functions to enetc_pf_common.c

2025-03-03 Thread Wei Fang
The VLAN hash filter of ENETC v1 and v4 is basically the same; the
only difference is the offset of the VLAN hash filter registers. So the
.set_si_vlan_hash_filter() hook is added to struct enetc_pf_ops to set
the registers of the corresponding platform. In addition, the common VLAN
hash filter functions enetc_vlan_rx_add_vid() and enetc_vlan_rx_del_vid()
are moved to enetc_pf_common.c.

Signed-off-by: Wei Fang 
---
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 34 ++-
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  1 +
 .../freescale/enetc/enetc_pf_common.c | 34 +++
 .../freescale/enetc/enetc_pf_common.h |  2 ++
 4 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index f9b179ed6d8b..d3ca9e33893f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -215,43 +215,12 @@ static void enetc_pf_set_rx_mode(struct net_device *ndev)
enetc_port_wr(hw, ENETC_PSIPMR, psipmr);
 }
 
-static void enetc_set_vlan_ht_filter(struct enetc_hw *hw, int si_idx,
-unsigned long hash)
+static void enetc_set_vlan_ht_filter(struct enetc_hw *hw, int si_idx, u64 hash)
 {
enetc_port_wr(hw, ENETC_PSIVHFR0(si_idx), lower_32_bits(hash));
enetc_port_wr(hw, ENETC_PSIVHFR1(si_idx), upper_32_bits(hash));
 }
 
-static int enetc_vlan_rx_add_vid(struct net_device *ndev, __be16 prot, u16 vid)
-{
-   struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_si *si = priv->si;
-   struct enetc_hw *hw = &si->hw;
-   int idx;
-
-   __set_bit(vid, si->active_vlans);
-
-   idx = enetc_vid_hash_idx(vid);
-   if (!__test_and_set_bit(idx, si->vlan_ht_filter))
-   enetc_set_vlan_ht_filter(hw, 0, *si->vlan_ht_filter);
-
-   return 0;
-}
-
-static int enetc_vlan_rx_del_vid(struct net_device *ndev, __be16 prot, u16 vid)
-{
-   struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_si *si = priv->si;
-   struct enetc_hw *hw = &si->hw;
-
-   if (__test_and_clear_bit(vid, si->active_vlans)) {
-   enetc_refresh_vlan_ht_filter(si);
-   enetc_set_vlan_ht_filter(hw, 0, *si->vlan_ht_filter);
-   }
-
-   return 0;
-}
-
 static void enetc_set_loopback(struct net_device *ndev, bool en)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
@@ -953,6 +922,7 @@ static const struct enetc_pf_ops enetc_pf_ops = {
.create_pcs = enetc_pf_create_pcs,
.destroy_pcs = enetc_pf_destroy_pcs,
.enable_psfp = enetc_psfp_enable,
+   .set_si_vlan_hash_filter = enetc_set_vlan_ht_filter,
 };
 
 static int enetc_pf_probe(struct pci_dev *pdev,
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.h 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.h
index 90137fbb8f48..704c4ee42f61 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.h
@@ -37,6 +37,7 @@ struct enetc_pf_ops {
struct phylink_pcs *(*create_pcs)(struct enetc_pf *pf, struct mii_bus 
*bus);
void (*destroy_pcs)(struct phylink_pcs *pcs);
int (*enable_psfp)(struct enetc_ndev_priv *priv);
+   void (*set_si_vlan_hash_filter)(struct enetc_hw *hw, int si, u64 hash);
 };
 
 struct enetc_pf {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index a737a7f8c79e..9f812c1af7a3 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -343,5 +343,39 @@ void enetc_phylink_destroy(struct enetc_ndev_priv *priv)
 }
 EXPORT_SYMBOL_GPL(enetc_phylink_destroy);
 
+int enetc_vlan_rx_add_vid(struct net_device *ndev, __be16 prot, u16 vid)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_pf *pf = enetc_si_priv(priv->si);
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
+   int idx;
+
+   __set_bit(vid, si->active_vlans);
+
+   idx = enetc_vid_hash_idx(vid);
+   if (!__test_and_set_bit(idx, si->vlan_ht_filter))
+   pf->ops->set_si_vlan_hash_filter(hw, 0, *si->vlan_ht_filter);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(enetc_vlan_rx_add_vid);
+
+int enetc_vlan_rx_del_vid(struct net_device *ndev, __be16 prot, u16 vid)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_pf *pf = enetc_si_priv(priv->si);
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
+
+   if (__test_and_clear_bit(vid, si->active_vlans)) {
+   enetc_refresh_vlan_ht_filter(si);
+   pf->ops->set_si_vlan_hash_filter(hw, 0, *si->vlan_ht_filter);
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(enetc_vlan_rx_del_vid);
+
 MODULE_DESCRIPTION("NXP EN
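
The hook introduced here can be illustrated with a userspace sketch that
records register writes instead of touching hardware. All sketch_* names
are hypothetical. The offsets are those of ENETC4_PSIVHFR0/1 as they
appear in patch 11 of this series; the v1 offsets are not shown in this
thread, so only a v4-style implementation is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Records up to two register writes in place of real MMIO. */
struct sketch_hw {
	uint32_t off[2];
	uint32_t val[2];
	unsigned int n;
};

static void sketch_port_wr(struct sketch_hw *hw, uint32_t off, uint32_t val)
{
	assert(hw != NULL && hw->n < 2);
	hw->off[hw->n] = off;
	hw->val[hw->n] = val;
	hw->n++;
}

/* Per-platform hook, shaped like the one added to struct enetc_pf_ops. */
struct sketch_pf_ops {
	void (*set_si_vlan_hash_filter)(struct sketch_hw *hw, int si,
					uint64_t hash);
};

/* v4-style implementation: low word first, then high word. */
static void sketch_v4_set_vlan_hash(struct sketch_hw *hw, int si,
				    uint64_t hash)
{
	sketch_port_wr(hw, (uint32_t)si * 0x80 + 0x2060, (uint32_t)hash);
	sketch_port_wr(hw, (uint32_t)si * 0x80 + 0x2064,
		       (uint32_t)(hash >> 32));
}

const struct sketch_pf_ops sketch_v4_ops = {
	.set_si_vlan_hash_filter = sketch_v4_set_vlan_hash,
};
```

The common add/del functions only need to call the hook, so they never see
the register layout. The patch also widens the hash parameter from
unsigned long to u64; a plausible reason is that unsigned long is 32 bits
on 32-bit builds, although the commit message does not say so.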

[PATCH v3 net-next 11/13] net: enetc: add VLAN filtering support for i.MX95 ENETC PF

2025-03-03 Thread Wei Fang
Add VLAN hash filter support for i.MX95 ENETC PF. If VLAN filtering is
disabled, then VLAN promiscuous mode will be enabled, which means that
PF qualifies for reception of all VLAN tags.

Signed-off-by: Wei Fang 
---
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |  4 
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 20 +++
 .../freescale/enetc/enetc_pf_common.c |  2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h 
b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
index 826359004850..aa25b445d301 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
@@ -107,6 +107,10 @@
 #define ENETC4_PSIMMHFR0(a)((a) * 0x80 + 0x2058)
 #define ENETC4_PSIMMHFR1(a)((a) * 0x80 + 0x205c)
 
+/* Port station interface a VLAN hash filter register 0/1 */
+#define ENETC4_PSIVHFR0(a) ((a) * 0x80 + 0x2060)
+#define ENETC4_PSIVHFR1(a) ((a) * 0x80 + 0x2064)
+
 #define ENETC4_PMCAPR  0x4004
 #define  PMCAPR_HD BIT(8)
 #define  PMCAPR_FP GENMASK(10, 9)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index adaf28fdf0aa..e08d06e22898 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -101,6 +101,13 @@ static void enetc4_pf_set_si_mac_hash_filter(struct 
enetc_hw *hw, int si,
}
 }
 
+static void enetc4_pf_set_si_vlan_hash_filter(struct enetc_hw *hw,
+ int si, u64 hash)
+{
+   enetc_port_wr(hw, ENETC4_PSIVHFR0(si), lower_32_bits(hash));
+   enetc_port_wr(hw, ENETC4_PSIVHFR1(si), upper_32_bits(hash));
+}
+
 static void enetc4_pf_destroy_mac_list(struct enetc_pf *pf)
 {
struct enetc_mac_list_entry *entry;
@@ -403,6 +410,7 @@ static void enetc4_pf_set_mac_filter(struct enetc_pf *pf, 
int type)
 static const struct enetc_pf_ops enetc4_pf_ops = {
.set_si_primary_mac = enetc4_pf_set_si_primary_mac,
.get_si_primary_mac = enetc4_pf_get_si_primary_mac,
+   .set_si_vlan_hash_filter = enetc4_pf_set_si_vlan_hash_filter,
 };
 
 static int enetc4_pf_struct_init(struct enetc_si *si)
@@ -692,6 +700,16 @@ static void enetc4_pf_set_rx_mode(struct net_device *ndev)
 static int enetc4_pf_set_features(struct net_device *ndev,
  netdev_features_t features)
 {
+   netdev_features_t changed = ndev->features ^ features;
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_hw *hw = &priv->si->hw;
+
+   if (changed & NETIF_F_HW_VLAN_CTAG_FILTER) {
+   bool promisc_en = !(features & NETIF_F_HW_VLAN_CTAG_FILTER);
+
+   enetc4_pf_set_si_vlan_promisc(hw, 0, promisc_en);
+   }
+
enetc_set_features(ndev, features);
 
return 0;
@@ -705,6 +723,8 @@ static const struct net_device_ops enetc4_ndev_ops = {
.ndo_set_mac_address= enetc_pf_set_mac_addr,
.ndo_set_rx_mode= enetc4_pf_set_rx_mode,
.ndo_set_features   = enetc4_pf_set_features,
+   .ndo_vlan_rx_add_vid= enetc_vlan_rx_add_vid,
+   .ndo_vlan_rx_kill_vid   = enetc_vlan_rx_del_vid,
 };
 
 static struct phylink_pcs *
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 9f812c1af7a3..3f7ccc482301 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -135,7 +135,7 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct 
net_device *ndev,
 
/* TODO: currently, i.MX95 ENETC driver does not support advanced 
features */
if (!is_enetc_rev1(si)) {
-   ndev->hw_features &= ~(NETIF_F_HW_VLAN_CTAG_FILTER | 
NETIF_F_LOOPBACK);
+   ndev->hw_features &= ~NETIF_F_LOOPBACK;
goto end;
}
 
-- 
2.34.1
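
The feature handling in enetc4_pf_set_features() follows a common pattern:
XOR the current and requested feature masks to find what changed, then
derive the hardware setting from the requested mask. A minimal sketch,
assuming a hypothetical bit position for the VLAN filter flag and
hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for NETIF_F_HW_VLAN_CTAG_FILTER; the bit position is made up. */
#define SKETCH_F_VLAN_CTAG_FILTER	(1ULL << 0)

/*
 * Returns true if the VLAN filter feature is being toggled. In that case
 * *promisc_en is set to the VLAN promiscuous state the hardware should
 * get: enabled when filtering is being turned off, disabled otherwise.
 * If nothing changed, *promisc_en is left untouched.
 */
bool sketch_vlan_filter_toggled(uint64_t cur, uint64_t req, bool *promisc_en)
{
	uint64_t changed = cur ^ req;

	assert(promisc_en != NULL);

	if (!(changed & SKETCH_F_VLAN_CTAG_FILTER))
		return false;

	*promisc_en = !(req & SKETCH_F_VLAN_CTAG_FILTER);
	return true;
}
```

Turning the filter on yields promiscuous mode off and vice versa, which
matches the commit message: with VLAN filtering disabled, the PF accepts
all VLAN tags.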




[PATCH v3 net-next 08/13] net: enetc: enable RSS feature by default

2025-03-03 Thread Wei Fang
Receive side scaling (RSS) is a network driver technology that enables
the efficient distribution of network receive processing across multiple
CPUs in multiprocessor systems. Therefore, it is better to enable RSS by
default so that the CPU load can be balanced and network performance can
be improved when the network is enabled.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 35 ++-
 .../freescale/enetc/enetc_pf_common.c |  4 ++-
 .../net/ethernet/freescale/enetc/enetc_vf.c   |  4 ++-
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 5b5e65ac8fab..8583ac9f7b9e 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2420,6 +2420,22 @@ static void enetc_set_lso_flags_mask(struct enetc_hw *hw)
enetc_wr(hw, ENETC4_SILSOSFMR1, 0);
 }
 
+static int enetc_set_rss(struct net_device *ndev, int en)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_hw *hw = &priv->si->hw;
+   u32 reg;
+
+   enetc_wr(hw, ENETC_SIRBGCR, priv->num_rx_rings);
+
+   reg = enetc_rd(hw, ENETC_SIMR);
+   reg &= ~ENETC_SIMR_RSSE;
+   reg |= (en) ? ENETC_SIMR_RSSE : 0;
+   enetc_wr(hw, ENETC_SIMR, reg);
+
+   return 0;
+}
+
 int enetc_configure_si(struct enetc_ndev_priv *priv)
 {
struct enetc_si *si = priv->si;
@@ -2440,6 +2456,9 @@ int enetc_configure_si(struct enetc_ndev_priv *priv)
err = enetc_setup_default_rss_table(si, priv->num_rx_rings);
if (err)
return err;
+
+   if (priv->ndev->features & NETIF_F_RXHASH)
+   enetc_set_rss(priv->ndev, true);
}
 
return 0;
@@ -3232,22 +3251,6 @@ struct net_device_stats *enetc_get_stats(struct 
net_device *ndev)
 }
 EXPORT_SYMBOL_GPL(enetc_get_stats);
 
-static int enetc_set_rss(struct net_device *ndev, int en)
-{
-   struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_hw *hw = &priv->si->hw;
-   u32 reg;
-
-   enetc_wr(hw, ENETC_SIRBGCR, priv->num_rx_rings);
-
-   reg = enetc_rd(hw, ENETC_SIMR);
-   reg &= ~ENETC_SIMR_RSSE;
-   reg |= (en) ? ENETC_SIMR_RSSE : 0;
-   enetc_wr(hw, ENETC_SIMR, reg);
-
-   return 0;
-}
-
 static void enetc_enable_rxvlan(struct net_device *ndev, bool en)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index c346e0e3ad37..a737a7f8c79e 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -128,8 +128,10 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct 
net_device *ndev,
if (si->hw_features & ENETC_SI_F_LSO)
priv->active_offloads |= ENETC_F_LSO;
 
-   if (si->num_rss)
+   if (si->num_rss) {
ndev->hw_features |= NETIF_F_RXHASH;
+   ndev->features |= NETIF_F_RXHASH;
+   }
 
/* TODO: currently, i.MX95 ENETC driver does not support advanced 
features */
if (!is_enetc_rev1(si)) {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_vf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
index 072e5b40a199..3372a9a779a6 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_vf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
@@ -155,8 +155,10 @@ static void enetc_vf_netdev_setup(struct enetc_si *si, 
struct net_device *ndev,
ndev->vlan_features = NETIF_F_SG | NETIF_F_HW_CSUM |
  NETIF_F_TSO | NETIF_F_TSO6;
 
-   if (si->num_rss)
+   if (si->num_rss) {
ndev->hw_features |= NETIF_F_RXHASH;
+   ndev->features |= NETIF_F_RXHASH;
+   }
 
/* pick up primary MAC address from SI */
enetc_load_primary_mac_addr(&si->hw, ndev);
-- 
2.34.1




[PATCH v3 net-next 12/13] net: enetc: add loopback support for i.MX95 ENETC PF

2025-03-03 Thread Wei Fang
Add internal loopback support for i.MX95 ENETC PF. The default loopback
mode is MAC-level loopback, where the MAC Tx data is looped back onto the Rx.
The MAC interface runs at a fixed 1:8 ratio of NETC clock in MAC-level
loopback mode, with no dependency on Tx clock.

Signed-off-by: Wei Fang 
---
 .../net/ethernet/freescale/enetc/enetc4_pf.c   | 18 ++
 .../ethernet/freescale/enetc/enetc_pf_common.c |  4 +---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index e08d06e22898..ea859792ccfa 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -108,6 +108,21 @@ static void enetc4_pf_set_si_vlan_hash_filter(struct 
enetc_hw *hw,
enetc_port_wr(hw, ENETC4_PSIVHFR1(si), upper_32_bits(hash));
 }
 
+static void enetc4_pf_set_loopback(struct net_device *ndev, bool en)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_si *si = priv->si;
+   u32 val;
+
+   val = enetc_port_mac_rd(si, ENETC4_PM_CMD_CFG(0));
+   val = u32_replace_bits(val, en ? 1 : 0, PM_CMD_CFG_LOOP_EN);
+   /* Default to select MAC level loopback mode if loopback is enabled. */
+   val = u32_replace_bits(val, en ? LPBCK_MODE_MAC_LEVEL : 0,
+  PM_CMD_CFG_LPBK_MODE);
+
+   enetc_port_mac_wr(si, ENETC4_PM_CMD_CFG(0), val);
+}
+
 static void enetc4_pf_destroy_mac_list(struct enetc_pf *pf)
 {
struct enetc_mac_list_entry *entry;
@@ -710,6 +725,9 @@ static int enetc4_pf_set_features(struct net_device *ndev,
enetc4_pf_set_si_vlan_promisc(hw, 0, promisc_en);
}
 
+   if (changed & NETIF_F_LOOPBACK)
+   enetc4_pf_set_loopback(ndev, !!(features & NETIF_F_LOOPBACK));
+
enetc_set_features(ndev, features);
 
return 0;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 3f7ccc482301..0a2b8769a175 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -134,10 +134,8 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct 
net_device *ndev,
}
 
/* TODO: currently, i.MX95 ENETC driver does not support advanced 
features */
-   if (!is_enetc_rev1(si)) {
-   ndev->hw_features &= ~NETIF_F_LOOPBACK;
+   if (!is_enetc_rev1(si))
goto end;
-   }
 
ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
 NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG |
-- 
2.34.1




[PATCH v3 net-next 13/13] MAINTAINERS: add new file ntmp.h to ENETC driver

2025-03-03 Thread Wei Fang
Add the new file ntmp.h to the ENETC driver entry.

Signed-off-by: Wei Fang 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7078199fcebf..e259b659eadb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9174,6 +9174,7 @@ F:
Documentation/devicetree/bindings/net/nxp,netc-blk-ctrl.yaml
 F: drivers/net/ethernet/freescale/enetc/
 F: include/linux/fsl/enetc_mdio.h
 F: include/linux/fsl/netc_global.h
+F: include/linux/fsl/ntmp.h
 
 FREESCALE eTSEC ETHERNET DRIVER (GIANFAR)
 M: Claudiu Manoil 
-- 
2.34.1




[PATCH v3 net-next 06/13] net: enetc: add RSS support for i.MX95 ENETC PF

2025-03-03 Thread Wei Fang
Add receive side scaling (RSS) support for i.MX95 ENETC PF to improve
network performance and balance the CPU load. The main changes are as
follows.

1. i.MX95 ENETC (v4) uses NTMP 2.0 to manage the RSS table, which is
different from LS1028A ENETC (v1). To reuse the functions related to the
RSS table, add .get_rss_table() and .set_rss_table() hooks to
enetc_si_ops.

2. The offset of the RSS key registers of i.MX95 ENETC also differs from
that of LS1028A, so add enetc_get_rss_key_base() to get the base offset
for each chip. This allows enetc_set_rss_key() and enetc_get_rss_key()
to be reused on both.
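
As a rough user-space sketch of the first point: the default RSS table is
filled round-robin over the Rx groups and then written through a
per-revision hook. All demo_* names are hypothetical; the two back ends here
only copy the table into an array and record which one ran, whereas the real
hooks talk to the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RSS_MAX 64

struct demo_si;

/* Per-revision operations, mirroring the idea of enetc_si_ops. */
struct demo_si_ops {
	int (*set_rss_table)(struct demo_si *si, const uint32_t *table,
			     int count);
};

struct demo_si {
	const struct demo_si_ops *ops;
	int num_rss;
	uint32_t hw_table[DEMO_RSS_MAX]; /* stands in for the hardware table */
	int backend;                     /* 1 = v1 path ran, 4 = v4 path ran */
};

static int demo_store(struct demo_si *si, const uint32_t *table, int count,
		      int backend)
{
	int i;

	if (count < 0 || count > DEMO_RSS_MAX)
		return -1;

	for (i = 0; i < count; i++)
		si->hw_table[i] = table[i];
	si->backend = backend;

	return 0;
}

static int demo_v1_set_rss_table(struct demo_si *si, const uint32_t *table,
				 int count)
{
	return demo_store(si, table, count, 1);
}

static int demo_v4_set_rss_table(struct demo_si *si, const uint32_t *table,
				 int count)
{
	return demo_store(si, table, count, 4);
}

static const struct demo_si_ops demo_v1_ops = {
	.set_rss_table = demo_v1_set_rss_table,
};

static const struct demo_si_ops demo_v4_ops = {
	.set_rss_table = demo_v4_set_rss_table,
};

/* Spread the table entries evenly over num_groups Rx groups. */
static int demo_setup_default_rss_table(struct demo_si *si, int num_groups)
{
	uint32_t table[DEMO_RSS_MAX];
	int i;

	assert(si && si->ops && si->ops->set_rss_table);
	if (num_groups <= 0 || si->num_rss < 0 || si->num_rss > DEMO_RSS_MAX)
		return -1;

	for (i = 0; i < si->num_rss; i++)
		table[i] = (uint32_t)(i % num_groups);

	/* The caller does not need to know which revision it is running on. */
	return si->ops->set_rss_table(si, table, si->num_rss);
}
```

With the hook in place, the common setup code no longer needs a revision
check before programming the table.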

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  |  7 +-
 drivers/net/ethernet/freescale/enetc/enetc.h  |  6 +-
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 22 +++---
 .../net/ethernet/freescale/enetc/enetc_cbdr.c | 14 
 .../ethernet/freescale/enetc/enetc_ethtool.c  | 69 +++
 .../net/ethernet/freescale/enetc/enetc_pf.c   |  4 +-
 .../freescale/enetc/enetc_pf_common.c |  6 +-
 .../net/ethernet/freescale/enetc/enetc_vf.c   |  2 +
 8 files changed, 95 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 3832d2cd91ba..5b5e65ac8fab 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2405,7 +2405,7 @@ static int enetc_setup_default_rss_table(struct enetc_si 
*si, int num_groups)
for (i = 0; i < si->num_rss; i++)
rss_table[i] = i % num_groups;
 
-   enetc_set_rss_table(si, rss_table, si->num_rss);
+   si->ops->set_rss_table(si, rss_table, si->num_rss);
 
kfree(rss_table);
 
@@ -2436,10 +2436,7 @@ int enetc_configure_si(struct enetc_ndev_priv *priv)
if (si->hw_features & ENETC_SI_F_LSO)
enetc_set_lso_flags_mask(hw);
 
-   /* TODO: RSS support for i.MX95 will be supported later, and the
-* is_enetc_rev1() condition will be removed
-*/
-   if (si->num_rss && is_enetc_rev1(si)) {
+   if (si->num_rss) {
err = enetc_setup_default_rss_table(si, priv->num_rx_rings);
if (err)
return err;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index ca1bc85c0ac9..ecf79338cd79 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -290,6 +290,8 @@ struct enetc_si;
 struct enetc_si_ops {
int (*setup_cbdr)(struct enetc_si *si);
void (*teardown_cbdr)(struct enetc_si *si);
+   int (*get_rss_table)(struct enetc_si *si, u32 *table, int count);
+   int (*set_rss_table)(struct enetc_si *si, const u32 *table, int count);
 };
 
 /* PCI IEP device data */
@@ -537,10 +539,12 @@ int enetc_set_mac_flt_entry(struct enetc_si *si, int 
index,
 int enetc_clear_mac_flt_entry(struct enetc_si *si, int index);
 int enetc_set_fs_entry(struct enetc_si *si, struct enetc_cmd_rfse *rfse,
   int index);
-void enetc_set_rss_key(struct enetc_hw *hw, const u8 *bytes);
+void enetc_set_rss_key(struct enetc_si *si, const u8 *bytes);
 int enetc_get_rss_table(struct enetc_si *si, u32 *table, int count);
 int enetc_set_rss_table(struct enetc_si *si, const u32 *table, int count);
 int enetc_send_cmd(struct enetc_si *si, struct enetc_cbd *cbd);
+int enetc4_get_rss_table(struct enetc_si *si, u32 *table, int count);
+int enetc4_set_rss_table(struct enetc_si *si, const u32 *table, int count);
 
 static inline void *enetc_cbd_alloc_data_mem(struct enetc_si *si,
 struct enetc_cbd *cbd,
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index c696eb4f0488..adaf28fdf0aa 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -579,22 +579,13 @@ static void enetc4_set_trx_frame_size(struct enetc_pf *pf)
enetc4_pf_reset_tc_msdu(&si->hw);
 }
 
-static void enetc4_set_rss_key(struct enetc_hw *hw, const u8 *bytes)
-{
-   int i;
-
-   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
-   enetc_port_wr(hw, ENETC4_PRSSKR(i), ((u32 *)bytes)[i]);
-}
-
 static void enetc4_set_default_rss_key(struct enetc_pf *pf)
 {
u8 hash_key[ENETC_RSSHASH_KEY_SIZE] = {0};
-   struct enetc_hw *hw = &pf->si->hw;
 
/* set up hash key */
get_random_bytes(hash_key, ENETC_RSSHASH_KEY_SIZE);
-   enetc4_set_rss_key(hw, hash_key);
+   enetc_set_rss_key(pf->si, hash_key);
 }
 
 static void enetc4_enable_trx(struct enetc_pf *pf)
@@ -698,6 +689,14 @@ static void enetc4_pf_set_rx_mode(struct net_device *ndev)
queue_work(si->workqueue, &si->rx_mode_task);
 }
 
+static int enetc4_pf_set_features(struct net_device *ndev,
+

[PATCH v3 net-next 01/13] net: enetc: add initial netc-lib driver to support NTMP

2025-03-03 Thread Wei Fang
Some NETC functionality is controlled using control messages sent to the
hardware using BD ring interface with 32B descriptor similar to transmit
BD ring used on ENETC. This BD ring interface is referred to as command
BD ring. It is used to configure functionality where the underlying
resources may be shared between different entities or being too large to
configure using direct registers. Therefore, a messaging protocol called
NETC Table Management Protocol (NTMP) is provided for exchanging
configuration and management information between the software and the
hardware using the command BD ring interface.

For i.MX95, NTMP has been upgraded to version 2.0, which is incompatible
with LS1028A, because the message formats have been changed. Therefore,
add the netc-lib driver to support NTMP 2.0 to operate various tables.
Note that, only MAC address filter table and RSS table are supported at
the moment. More tables will be supported in subsequent patches.

It is worth mentioning that the purpose of the netc-lib driver is to
provide some NTMP-based generic interfaces for ENETC and NETC Switch
drivers. Currently, it only supports the configurations of some tables.
Interfaces such as tc flower and debugfs will be added in the future.
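
The visible part of netc_setup_cbdr() below sizes the ring as
cbd_num descriptors plus NTMP_BASE_ADDR_ALIGN extra bytes. The rest of the
function is cut off in this archive, so the following is only a guess at the
usual reason for such over-allocation: leaving room to round the base
address up to the required alignment. It is a user-space sketch with calloc()
standing in for the DMA allocator, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BASE_ALIGN 128u /* same value as NTMP_BASE_ADDR_ALIGN */

struct demo_ring {
	void *raw;     /* pointer returned by the allocator, kept for free() */
	void *aligned; /* first suitably aligned address inside the buffer */
	size_t size;   /* total allocated size, including the slack */
};

/* Round addr up to the next multiple of align (align is a power of two). */
static uintptr_t demo_align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	return (addr + align - 1) & ~(align - 1);
}

static int demo_ring_alloc(struct demo_ring *ring, size_t num_desc,
			   size_t desc_size)
{
	/* Over-allocate so an aligned ring always fits in the buffer. */
	ring->size = num_desc * desc_size + DEMO_BASE_ALIGN;
	ring->raw = calloc(1, ring->size);
	if (!ring->raw)
		return -1;

	ring->aligned = (void *)demo_align_up((uintptr_t)ring->raw,
					      DEMO_BASE_ALIGN);

	return 0;
}

static void demo_ring_free(struct demo_ring *ring)
{
	free(ring->raw);
	ring->raw = NULL;
	ring->aligned = NULL;
	ring->size = 0;
}
```

The unaligned pointer has to be kept alongside the aligned one, because only
the original pointer may be handed back to the allocator.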

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/Kconfig  |  11 +
 drivers/net/ethernet/freescale/enetc/Makefile |   3 +
 drivers/net/ethernet/freescale/enetc/ntmp.c   | 458 ++
 .../ethernet/freescale/enetc/ntmp_private.h   |  67 +++
 include/linux/fsl/ntmp.h  | 178 +++
 5 files changed, 717 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp_private.h
 create mode 100644 include/linux/fsl/ntmp.h

diff --git a/drivers/net/ethernet/freescale/enetc/Kconfig 
b/drivers/net/ethernet/freescale/enetc/Kconfig
index 6c2779047dcd..94db8e8d0eb3 100644
--- a/drivers/net/ethernet/freescale/enetc/Kconfig
+++ b/drivers/net/ethernet/freescale/enetc/Kconfig
@@ -15,6 +15,16 @@ config NXP_ENETC_PF_COMMON
 
  If compiled as module (M), the module name is nxp-enetc-pf-common.
 
+config NXP_NETC_LIB
+   tristate "NETC Library"
+   help
+ This module provides common functionalities for both ENETC and NETC
+ Switch, such as NETC Table Management Protocol (NTMP) 2.0, common tc
+ flower and debugfs interfaces and so on.
+
+ If compiled as module (M), the module name is nxp-netc-lib.
+
+
 config FSL_ENETC
tristate "ENETC PF driver"
depends on PCI_MSI
@@ -40,6 +50,7 @@ config NXP_ENETC4
select FSL_ENETC_CORE
select FSL_ENETC_MDIO
select NXP_ENETC_PF_COMMON
+   select NXP_NETC_LIB
select PHYLINK
select DIMLIB
help
diff --git a/drivers/net/ethernet/freescale/enetc/Makefile 
b/drivers/net/ethernet/freescale/enetc/Makefile
index 6fd27ee4fcd1..707a68e26971 100644
--- a/drivers/net/ethernet/freescale/enetc/Makefile
+++ b/drivers/net/ethernet/freescale/enetc/Makefile
@@ -6,6 +6,9 @@ fsl-enetc-core-y := enetc.o enetc_cbdr.o enetc_ethtool.o
 obj-$(CONFIG_NXP_ENETC_PF_COMMON) += nxp-enetc-pf-common.o
 nxp-enetc-pf-common-y := enetc_pf_common.o
 
+obj-$(CONFIG_NXP_NETC_LIB) += nxp-netc-lib.o
+nxp-netc-lib-y := ntmp.o
+
 obj-$(CONFIG_FSL_ENETC) += fsl-enetc.o
 fsl-enetc-y := enetc_pf.o
 fsl-enetc-$(CONFIG_PCI_IOV) += enetc_msg.o
diff --git a/drivers/net/ethernet/freescale/enetc/ntmp.c 
b/drivers/net/ethernet/freescale/enetc/ntmp.c
new file mode 100644
index ..df10f2f310c1
--- /dev/null
+++ b/drivers/net/ethernet/freescale/enetc/ntmp.c
@@ -0,0 +1,458 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
+/*
+ * NETC NTMP (NETC Table Management Protocol) 2.0 Library
+ * Copyright 2025 NXP
+ */
+
+#include 
+#include 
+#include 
+
+#include "ntmp_private.h"
+
+#define NETC_CBDR_TIMEOUT  1000 /* us */
+#define NETC_CBDR_MR_EN    BIT(31)
+
+#define NTMP_BASE_ADDR_ALIGN   128
+#define NTMP_DATA_ADDR_ALIGN   32
+
+/* Define NTMP Table ID */
+#define NTMP_MAFT_ID   1
+#define NTMP_RSST_ID   3
+
+/* Generic Update Actions for most tables */
+#define NTMP_GEN_UA_CFGEU  BIT(0)
+#define NTMP_GEN_UA_STSEU  BIT(1)
+
+#define NTMP_ENTRY_ID_SIZE 4
+#define RSST_ENTRY_NUM 64
+#define RSST_STSE_DATA_SIZE(n) ((n) * 8)
+#define RSST_CFGE_DATA_SIZE(n) (n)
+
+int netc_setup_cbdr(struct device *dev, int cbd_num,
+   struct netc_cbdr_regs *regs,
+   struct netc_cbdr *cbdr)
+{
+   size_t size;
+
+   size = cbd_num * sizeof(union netc_cbd) + NTMP_BASE_ADDR_ALIGN;
+   cbdr->addr_base = dma_alloc_coherent(dev, size, &cbdr->dma_base,
+GFP_KERNEL);
+   if (!cbdr->addr_base)
+   return -ENOMEM;
+
+   cbdr->dma_size = size;
+  

Re: [PATCH v3 1/1] cxl: Remove driver

2025-03-03 Thread Martin K. Petersen


Hi Madhavan!

> This patch has a dependency on the first patch
>
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2025-February/280990.html
>
> Which is already part of your staging tree. Can you please
> take this patch along with the previous patch. 

If I merge the main cxl patch we'll have another conflict due to the
docs patch below:

>> [0] 
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20250219064807.175107-1-...@linux.ibm.com/

I don't mind taking both patches but it seems more appropriate for a
major feature removal like this to go through the relevant architecture
tree.

Maybe the path of least resistance is for you to put the cxl removal in
a separate branch and defer sending the pull request until after Linus
has merged the initial SCSI bits for 6.15?

-- 
Martin K. Petersen  Oracle Linux Engineering



Build Warnings at arch/powerpc/

2025-03-03 Thread Venkat Rao Bagalkote

Greetings!!


Observing build warnings with the linux-next and powerpc repos. The issue is
not seen on mainline yet.


PPC Repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge 
branch


PPC Kernel Version: 6.14.0-rc4-g1304f486dbf1
next Repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
master branch


next Kernel Version: 6.14.0-rc5-next-20250303


On the linux-next kernel, the issue was introduced between next-20250227 and
next-20250303.



Build Warnings:

arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe84: 
intra_function_call not a direct call
arch/powerpc/crypto/ghashp8-ppc.o: warning: objtool: .text+0x22c: 
unannotated intra-function call
arch/powerpc/kernel/switch.o: warning: objtool: .text+0x4: 
intra_function_call not a direct call



If you fix this issue, please add the tag below.


Reported-By: Venkat Rao Bagalkote 


Regards,

Venkat.




Re: Build Warnings at arch/powerpc/

2025-03-03 Thread Madhavan Srinivasan



On 3/4/25 10:42 AM, Venkat Rao Bagalkote wrote:
> Greetings!!
> 
> 
> Observing build warnings with the linux-next and powerpc repos. The issue is
> not seen on mainline yet.
> 
> PPC Repo: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> merge branch
> 
> PPC Kernel Version: 6.14.0-rc4-g1304f486dbf1
> next Repo: 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master 
> branch
> 
> next Kernel Version: 6.14.0-rc5-next-20250303
> 
> 
> On the linux-next kernel, the issue was introduced between next-20250227 and
> next-20250303.
> 
> 
> Build Warnings:
> 
> arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe84: 
> intra_function_call not a direct call
> arch/powerpc/crypto/ghashp8-ppc.o: warning: objtool: .text+0x22c: unannotated 
> intra-function call
> arch/powerpc/kernel/switch.o: warning: objtool: .text+0x4: 
> intra_function_call not a direct call
> 
> 

Can you please specify the compiler and compiler version you found this issue
with?

maddy

> If you fix this issue, please add below tag.
> 
> 
> Reported-By: Venkat Rao Bagalkote 
> 
> 
> Regards,
> 
> Venkat.
> 




Re: [PATCH v3 1/1] cxl: Remove driver

2025-03-03 Thread Madhavan Srinivasan



On 3/4/25 8:31 AM, Martin K. Petersen wrote:
> 
> Hi Madhavan!
> 
>> This patch has a dependency on the first patch
>>
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2025-February/280990.html
>>
>> Which is already part of your staging tree. Can you please
>> take this patch along with the previous patch. 
> 
> If I merge the main cxl patch we'll have another conflict due to the
> docs patch below:
> 
>>> [0] 
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20250219064807.175107-1-...@linux.ibm.com/
> 
> I don't mind taking both patches but it seems more appropriate for a
> major feature removal like this to go through the relevant architecture
> tree.
> 
> Maybe the path of least resistance is for you to put the cxl removal in
> a separate branch and defer sending the pull request until after Linus
> has merged the initial SCSI bits for 6.15?

Yes, I agree. I was thinking of doing that, but wanted to check first.
I will send a separate PR after the SCSI PR is merged.

Thanks for the response.

Maddy   

> 




Re: [PATCH v4] powerpc/hugetlb: Disable gigantic hugepages if fadump is active

2025-03-03 Thread IBM
Sourabh Jain  writes:

> Hello Ritesh,
>
> Thanks for the review.
>
> On 02/03/25 12:05, Ritesh Harjani (IBM) wrote:
>> Sourabh Jain  writes:
>>
>>> The fadump kernel boots with limited memory solely to collect the kernel
>>> core dump. Having gigantic hugepages in the fadump kernel is of no use.
>> Sure got it.
>>
>>> Many times, the fadump kernel encounters OOM (Out of Memory) issues if
>>> gigantic hugepages are allocated.
>>>
>>> To address this, disable gigantic hugepages if fadump is active by
>>> returning early from arch_hugetlb_valid_size() using
>>> hugepages_supported(). When fadump is active, the global variable
>>> hugetlb_disabled is set to true, which is later used by the
>>> PowerPC-specific hugepages_supported() function to determine hugepage
>>> support.
>>>
>>> Returning early from arch_hugetlb_valid_size() not only disables
>>> gigantic hugepages but also avoids unnecessary hstate initialization for
>>> every hugepage size supported by the platform.
>>>
>>> kernel logs related to hugepages with this patch included:
>>> kernel argument passed: hugepagesz=1G hugepages=1
>>>
>>> First kernel: gigantic hugepage got allocated
>>> ==
>>>
>>> dmesg | grep -i "hugetlb"
>>> -
>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>> HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
>>> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
>>>
>>> $ cat /proc/meminfo | grep -i "hugetlb"
>>> -
>>> Hugetlb: 1048576 kB
>> Was this tested with patch [1] in your local tree?
>>
>> [1]: 
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d629d7a8efc33
>>
>> IIUC, this patch [1] disables the boot time allocation of hugepages.
>> Isn't it also disabling the boot time allocation for gigantic huge pages
>> passed by the cmdline params like hugepagesz=1G and hugepages=2 ?
>
> Yes, I had the patch [1] in my tree.
>
> My understanding is that gigantic pages are allocated before normal huge 
> pages.
>
> In hugepages_setup() in hugetlb.c, we have:
>
>      if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
>      hugetlb_hstate_alloc_pages(parsed_hstate);
>
> I believe the above code allocates memory for gigantic pages, and 
> hugetlb_init() is
> called later because it is a subsys_initcall.
>
> So, by the time the kernel reaches hugetlb_init(), the gigantic pages 
> are already
> allocated. Isn't that right?
>
> Please let me know your opinion.

Yes, you are right. We are allocating hugepages from memblock, however
this isn't getting advertised anywhere, i.e. there is no way one can
know from any user interface whether hugepages were allocated or not.
For the fadump kernel, when the hugepagesz= and hugepages= params are
passed, it will allocate gigantic pages but won't advertise this
in meminfo or anywhere else. This added to the confusion when I tested
this (it wasn't clear from the commit msg either).

And I guess this is happening during fadump kernel because of our patch
[1], which added a check to see whether hugetlb_disabled is true in
hugepages_supported(). Due to this hugetlb_init() is now not doing the
rest of the initialization for those gigantic pages which were allocated
due to cmdline options from hugepages_setup().

[1]: 
https://lore.kernel.org/linuxppc-dev/20241202054310.928610-1-sourabhj...@linux.ibm.com/

Now, as shown below, fadump can set hugetlb_disabled in early_setup(),
i.e. fadump can mark hugetlb_disabled as true in
early_setup() -> early_init_devtree() -> fadump_reserve_mem()

And hugepages_setup() and hugepagesz_setup() get called later in
start_kernel() -> parse_args()


And we already check for hugepages_supported() in all necessary calls in
mm/hugetlb.c. So IMO, this check should go in mm/hugetlb.c in
hugepagesz_setup() and hugepages_setup(). Because otherwise every arch
implementation will end up duplicating this by adding
hugepages_supported() check in their arch implementation of
arch_hugetlb_valid_size().

e.g. references of hugepages_supported() checks in mm/hugetlb.c

mm/hugetlb.c hugetlb_show_meminfo_node 4959 if (!hugepages_supported())
mm/hugetlb.c hugetlb_report_node_meminfo 4943 if (!hugepages_supported())  
mm/hugetlb.c hugetlb_report_meminfo 4914 if (!hugepages_supported())   
mm/hugetlb.c hugetlb_overcommit_handler 4848 if (!hugepages_supported())   
mm/hugetlb.c hugetlb_sysctl_handler_common 4809 if (!hugepages_supported())
mm/hugetlb.c hugetlb_init 4461 if (!hugepages_supported()) {   
mm/hugetlb.c dissolve_free_hugetlb_folios 2211 if (!hugepages_supported()) 
fs/hugetlbfs/inode.c init_hugetlbfs_fs 1604 if (!hugepages_supported()) {  
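
To make the ordering argument concrete, here is a toy user-space model of
it. It is not kernel code: the demo_* names are hypothetical stand-ins for
hugetlb_disabled, hugepages_supported(), early_setup() and
hugepages_setup(), and "allocation" is just a counter. It only shows that a
generic check in the command-line handler is enough once the flag is set
earlier in boot.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_hugetlb_disabled;        /* models hugetlb_disabled */
static unsigned long demo_gigantic_pages; /* pages "allocated" so far */

static bool demo_hugepages_supported(void)
{
	return !demo_hugetlb_disabled;
}

/* Models early boot: fadump decides before the command line is parsed. */
static void demo_early_setup(bool fadump_active)
{
	demo_hugetlb_disabled = fadump_active;
}

/* Models the "hugepages=N" handler with the generic check placed up front. */
static bool demo_hugepages_setup(unsigned long count)
{
	if (!demo_hugepages_supported())
		return false; /* parameter ignored, nothing allocated */

	demo_gigantic_pages += count;

	return true;
}

/* Run the two boot stages in order and report what got allocated. */
static unsigned long demo_boot(bool fadump_active, unsigned long requested)
{
	demo_gigantic_pages = 0;
	demo_early_setup(fadump_active);
	demo_hugepages_setup(requested);

	/* A fadump kernel must never end up holding gigantic pages. */
	assert(!fadump_active || demo_gigantic_pages == 0);

	return demo_gigantic_pages;
}
```

Because the check lives in the generic handler, no architecture hook has to
repeat it.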


Let me also see the history on why this wasn't done earlier though... 

... Oh actually there is more history to this. See [2]. We already had

[PATCH v3 net-next 04/13] net: enetc: add MAC filter for i.MX95 ENETC PF

2025-03-03 Thread Wei Fang
The i.MX95 ENETC supports both MAC hash filter and MAC exact filter. The MAC
hash filter is implemented through a 64-bit hash table to match against
the hashed addresses. PF and VFs each have two MAC hash tables, one for
unicast and the other for multicast. The MAC exact filter, however, is
shared between SIs (PF and VFs). Each table entry contains a MAC address
that may be unicast or multicast, and an SI bitmap field that indicates
for which SIs the entry is valid.
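
For readers who want to experiment with the hash, the following is a
user-space port of enetc_mac_addr_hash_idx() as it appears in patch 03/13 of
this series. The demo_* helpers are stand-ins written for this sketch; they
are assumed to behave like the kernel's ether_addr_to_u64(), __swab64() and
hweight64(). The result is a 6-bit index, i.e. one bit position in the
64-bit hash table.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6

/* Pack the six address bytes into 48 bits, first byte most significant. */
static uint64_t demo_ether_addr_to_u64(const uint8_t *addr)
{
	uint64_t u = 0;
	int i;

	for (i = 0; i < DEMO_ETH_ALEN; i++)
		u = (u << 8) | addr[i];

	return u;
}

/* Reverse the byte order of a 64-bit value. */
static uint64_t demo_swab64(uint64_t x)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}

	return r;
}

/* Count the set bits in a 64-bit value. */
static unsigned int demo_hweight64(uint64_t x)
{
	unsigned int n = 0;

	for (; x; x >>= 1)
		n += (unsigned int)(x & 1);

	return n;
}

/* Each output bit i is the parity of address bits i, i + 6, ..., i + 42. */
static int demo_mac_addr_hash_idx(const uint8_t *addr)
{
	uint64_t fold = demo_swab64(demo_ether_addr_to_u64(addr)) >> 16;
	uint64_t mask = 0;
	int res = 0;
	int i;

	for (i = 0; i < 8; i++)
		mask |= 1ULL << (i * 6);

	for (i = 0; i < 6; i++)
		res |= (int)(demo_hweight64(fold & (mask << i)) & 0x1) << i;

	assert(res >= 0 && res < 64);

	return res;
}
```

Several addresses can map to the same index, so a hash filter hit only means
"possibly wanted"; the networking stack still has to drop false positives.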

For i.MX95 ENETC, MAC exact filter only has 4 entries. According to the
observation of the system default network configuration, the MAC filter
will be configured with multiple multicast addresses, so MAC exact filter
does not have enough entries to implement multicast filtering. Therefore,
the current MAC exact filter is only used for unicast filtering. If the
number of unicast addresses exceeds 4, then MAC hash filter is used.

Note that both the MAC hash filter and the MAC exact filter can only be
accessed by the PF. VFs can ask the PF to set their corresponding MAC filter
through the mailbox mechanism of ENETC. Currently, however, the MAC filter is
only added for the i.MX95 ENETC PF. MAC filter support for ENETC VFs will be
added in subsequent patches.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.h  |   2 +
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |   8 +
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 418 +-
 .../net/ethernet/freescale/enetc/enetc_hw.h   |   6 +
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  11 +
 5 files changed, 444 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index 9380d3e8ca01..4dba91408e3d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -316,6 +316,8 @@ struct enetc_si {
const struct enetc_si_ops *ops;
 
struct enetc_mac_filter mac_filter[MADDR_TYPE];
+   struct workqueue_struct *workqueue;
+   struct work_struct rx_mode_task;
 };
 
 #define ENETC_SI_ALIGN 32
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h 
b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
index 695cb07c74bc..826359004850 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
@@ -99,6 +99,14 @@
 #define ENETC4_PSICFGR2(a) ((a) * 0x80 + 0x2018)
 #define  PSICFGR2_NUM_MSIX GENMASK(5, 0)
 
+/* Port station interface a unicast MAC hash filter register 0/1 */
+#define ENETC4_PSIUMHFR0(a)    ((a) * 0x80 + 0x2050)
+#define ENETC4_PSIUMHFR1(a)    ((a) * 0x80 + 0x2054)
+
+/* Port station interface a multicast MAC hash filter register 0/1 */
+#define ENETC4_PSIMMHFR0(a)    ((a) * 0x80 + 0x2058)
+#define ENETC4_PSIMMHFR1(a)    ((a) * 0x80 + 0x205c)
+
 #define ENETC4_PMCAPR  0x4004
 #define  PMCAPR_HD BIT(8)
 #define  PMCAPR_FP GENMASK(10, 9)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index 63001379f0a0..305781ccefd0 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -11,6 +11,15 @@
 
 #define ENETC_SI_MAX_RING_NUM  8
 
+#define ENETC_MAC_FILTER_TYPE_UC   BIT(0)
+#define ENETC_MAC_FILTER_TYPE_MC   BIT(1)
+#define ENETC_MAC_FILTER_TYPE_ALL  (ENETC_MAC_FILTER_TYPE_UC | \
+ENETC_MAC_FILTER_TYPE_MC)
+
+struct enetc_mac_addr {
+   u8 addr[ETH_ALEN];
+};
+
 static void enetc4_get_port_caps(struct enetc_pf *pf)
 {
struct enetc_hw *hw = &pf->si->hw;
@@ -26,6 +35,9 @@ static void enetc4_get_port_caps(struct enetc_pf *pf)
 
val = enetc_port_rd(hw, ENETC4_PMCAPR);
pf->caps.half_duplex = (val & PMCAPR_HD) ? 1 : 0;
+
+   val = enetc_port_rd(hw, ENETC4_PSIMAFCAPR);
+   pf->caps.mac_filter_num = val & PSIMAFCAPR_NUM_MAC_AFTE;
 }
 
 static void enetc4_pf_set_si_primary_mac(struct enetc_hw *hw, int si,
@@ -56,6 +68,337 @@ static void enetc4_pf_get_si_primary_mac(struct enetc_hw 
*hw, int si,
put_unaligned_le16(lower, addr + 4);
 }
 
+static void enetc4_pf_set_si_mac_promisc(struct enetc_hw *hw, int si,
+int type, bool en)
+{
+   u32 val = enetc_port_rd(hw, ENETC4_PSIPMMR);
+
+   if (type == UC) {
+   if (en)
+   val |= PSIPMMR_SI_MAC_UP(si);
+   else
+   val &= ~PSIPMMR_SI_MAC_UP(si);
+   } else { /* Multicast promiscuous mode. */
+   if (en)
+   val |= PSIPMMR_SI_MAC_MP(si);
+   else
+   val &= ~PSIPMMR_SI_MAC_MP(si);
+   }
+
+   enetc_port_wr(hw, ENETC4_PSIPMMR, val);
+}
+
+static void enetc4_pf_set_si_mac_hash_filter(struct enetc_hw *hw, int si,
+

[PATCH v3 net-next 03/13] net: enetc: move generic MAC filtering interfaces to enetc-core

2025-03-03 Thread Wei Fang
Although only ENETC PF can access the MAC address filter table, the table
entries can specify MAC address filtering for one or more SIs based on
SI_BITMAP, which means that the table also supports MAC address filtering
for VFs.

Currently, only the ENETC v1 PF driver supports MAC address filtering. In
order to add the MAC address filtering support for the ENETC v4 PF driver
and VF driver in the future, the relevant generic interfaces are moved to
the enetc-core driver. At the same time, the struct enetc_mac_filter is
moved from enetc_pf to enetc_si, because enetc_si is a structure shared by
PF and VFs. This lays the basis for i.MX95 ENETC PF and VFs to support
MAC address filtering.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 36 ++
 drivers/net/ethernet/freescale/enetc/enetc.h  | 17 +++
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 49 +++
 .../net/ethernet/freescale/enetc/enetc_pf.h   | 14 --
 4 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 2106861463e4..3832d2cd91ba 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -36,6 +36,42 @@ static void enetc_change_preemptible_tcs(struct 
enetc_ndev_priv *priv,
enetc_mm_commit_preemptible_tcs(priv);
 }
 
+static int enetc_mac_addr_hash_idx(const u8 *addr)
+{
+   u64 fold = __swab64(ether_addr_to_u64(addr)) >> 16;
+   u64 mask = 0;
+   int res = 0;
+   int i;
+
+   for (i = 0; i < 8; i++)
+   mask |= BIT_ULL(i * 6);
+
+   for (i = 0; i < 6; i++)
+   res |= (hweight64(fold & (mask << i)) & 0x1) << i;
+
+   return res;
+}
+
+void enetc_add_mac_addr_ht_filter(struct enetc_mac_filter *filter,
+ const unsigned char *addr)
+{
+   int idx = enetc_mac_addr_hash_idx(addr);
+
+   /* add hash table entry */
+   __set_bit(idx, filter->mac_hash_table);
+   filter->mac_addr_cnt++;
+}
+EXPORT_SYMBOL_GPL(enetc_add_mac_addr_ht_filter);
+
+void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter)
+{
+   filter->mac_addr_cnt = 0;
+
+   bitmap_zero(filter->mac_hash_table,
+   ENETC_MADDR_HASH_TBL_SZ);
+}
+EXPORT_SYMBOL_GPL(enetc_reset_mac_addr_filter);
+
 static int enetc_num_stack_tx_queues(struct enetc_ndev_priv *priv)
 {
int num_tx_rings = priv->num_tx_rings;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index 4ff0957e69be..9380d3e8ca01 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -23,6 +23,18 @@
 
 #define ENETC_CBD_DATA_MEM_ALIGN 64
 
+#define ENETC_MADDR_HASH_TBL_SZ    64
+
+enum enetc_mac_addr_type {UC, MC, MADDR_TYPE};
+
+struct enetc_mac_filter {
+   union {
+   char mac_addr[ETH_ALEN];
+   DECLARE_BITMAP(mac_hash_table, ENETC_MADDR_HASH_TBL_SZ);
+   };
+   int mac_addr_cnt;
+};
+
 struct enetc_tx_swbd {
union {
struct sk_buff *skb;
@@ -302,6 +314,8 @@ struct enetc_si {
int hw_features;
const struct enetc_drvdata *drvdata;
const struct enetc_si_ops *ops;
+
+   struct enetc_mac_filter mac_filter[MADDR_TYPE];
 };
 
 #define ENETC_SI_ALIGN 32
@@ -484,6 +498,9 @@ int enetc_alloc_si_resources(struct enetc_ndev_priv *priv);
 void enetc_free_si_resources(struct enetc_ndev_priv *priv);
 int enetc_configure_si(struct enetc_ndev_priv *priv);
 int enetc_get_driver_data(struct enetc_si *si);
+void enetc_add_mac_addr_ht_filter(struct enetc_mac_filter *filter,
+ const unsigned char *addr);
+void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter);
 
 int enetc_open(struct net_device *ndev);
 int enetc_close(struct net_device *ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index a214749a4af6..cc3e52bd3096 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -72,30 +72,6 @@ static void enetc_set_isol_vlan(struct enetc_hw *hw, int si, 
u16 vlan, u8 qos)
enetc_port_wr(hw, ENETC_PSIVLANR(si), val);
 }
 
-static int enetc_mac_addr_hash_idx(const u8 *addr)
-{
-   u64 fold = __swab64(ether_addr_to_u64(addr)) >> 16;
-   u64 mask = 0;
-   int res = 0;
-   int i;
-
-   for (i = 0; i < 8; i++)
-   mask |= BIT_ULL(i * 6);
-
-   for (i = 0; i < 6; i++)
-   res |= (hweight64(fold & (mask << i)) & 0x1) << i;
-
-   return res;
-}
-
-static void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter)
-{
-   filter->mac_addr_cnt = 0;
-
-   bitmap_zero(filter->mac_hash_table,
-   ENETC_MADDR_HASH_TBL_SZ);
-}
-
 static void enetc_add_mac_addr

[PATCH v3 net-next 00/13] Add more features for ENETC v4 - round 2

2025-03-03 Thread Wei Fang
This patch set adds the following features.
1. Compared with ENETC v1, the formats of tables and command BD of ENETC
v4 have changed significantly, and the two are not compatible. Therefore,
in order to support the NETC Table Management Protocol (NTMP) v2.0, we
introduced the netc-lib driver and added support for MAC address filter
table and RSS table.
2. Add MAC filter and VLAN filter support for i.MX95 ENETC PF.
3. Add RSS support for i.MX95 ENETC PF.
4. Add loopback support for i.MX95 ENETC PF.

---
v1 Link: https://lore.kernel.org/imx/20250103060610.2233908-1-wei.f...@nxp.com/
v2 changes
1. Change NTMP_FILL_CRD() and NTMP_FILL_CRD_EID to functions
2. Fix the compile warning in enetc4_pf.c
v2 Link: https://lore.kernel.org/imx/20250113082245.2332775-1-wei.f...@nxp.com/
v3 changes
1. Rename ntmp_formats.h to ntmp_private.h, because in addition to
   defining some table formats, some macros and function declarations
   will be added to this header file in the future
2. Add struct ntmp_dma_buf, so refactor ntmp_alloc_data_mem() and
   ntmp_free_data_mem() accordingly
3. Add the setting for cache attributes of command BD Ring in
   enetc4_setup_cbdr()
4. Remove __free() and scoped_guard() from patch "net: enetc: add MAC
   filter for i.MX95 ENETC PF", as these cleanup APIs are discouraged
   within networking drivers.
5. Remove patch "net: enetc: make enetc_set_rxfh() and enetc_get_rxfh()
   reusable" in v2, and add enetc_set_rss_key() and enetc_get_rss_key()
   instead of adding .set_rss_key() and .get_rss_key() to enetc_pf_ops
6. Separate patch "net: enetc: check if the RSS hfunc is toeplitz" from
   patch "net: enetc: add RSS support for i.MX95 ENETC PF"
---

Wei Fang (13):
  net: enetc: add initial netc-lib driver to support NTMP
  net: enetc: add command BD ring support for i.MX95 ENETC
  net: enetc: move generic MAC filtering interfaces to enetc-core
  net: enetc: add MAC filter for i.MX95 ENETC PF
  net: enetc: add debugfs interface to dump MAC filter
  net: enetc: add RSS support for i.MX95 ENETC PF
  net: enetc: check if the RSS hfunc is toeplitz
  net: enetc: enable RSS feature by default
  net: enetc: move generic VLAN filter interfaces to enetc-core
  net: enetc: move generic VLAN hash filter functions to
enetc_pf_common.c
  net: enetc: add VLAN filtering support for i.MX95 ENETC PF
  net: enetc: add loopback support for i.MX95 ENETC PF
  MAINTAINERS: add new file ntmp.h to ENETC driver

 MAINTAINERS   |   1 +
 drivers/net/ethernet/freescale/enetc/Kconfig  |  11 +
 drivers/net/ethernet/freescale/enetc/Makefile |   4 +
 drivers/net/ethernet/freescale/enetc/enetc.c  | 103 +++-
 drivers/net/ethernet/freescale/enetc/enetc.h  |  59 +-
 .../ethernet/freescale/enetc/enetc4_debugfs.c |  93 +++
 .../ethernet/freescale/enetc/enetc4_debugfs.h |  20 +
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |  12 +
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 529 +-
 .../net/ethernet/freescale/enetc/enetc_cbdr.c |  69 ++-
 .../ethernet/freescale/enetc/enetc_ethtool.c  |  76 ++-
 .../net/ethernet/freescale/enetc/enetc_hw.h   |   6 +
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 124 +---
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  30 +-
 .../freescale/enetc/enetc_pf_common.c |  46 +-
 .../freescale/enetc/enetc_pf_common.h |   2 +
 .../net/ethernet/freescale/enetc/enetc_vf.c   |  19 +-
 drivers/net/ethernet/freescale/enetc/ntmp.c   | 458 +++
 .../ethernet/freescale/enetc/ntmp_private.h   |  67 +++
 include/linux/fsl/ntmp.h  | 178 ++
 20 files changed, 1716 insertions(+), 191 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.h
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp_private.h
 create mode 100644 include/linux/fsl/ntmp.h

-- 
2.34.1




[PATCH v3 net-next 02/13] net: enetc: add command BD ring support for i.MX95 ENETC

2025-03-03 Thread Wei Fang
The command BD ring is used to configure functionality where the
underlying resources may be shared between different entities or are
too large to configure using direct registers (such as lookup tables).

Because the command BD and table formats of i.MX95 and LS1028A are very
different, the software processing logic is also different. In order to
ensure driver compatibility, struct enetc_si_ops is introduced. This
structure defines some hooks shared by VSI and PSI. Different hardware
drivers will register different hooks. For example, setup_cbdr() is used
to initialize the command BD ring, and teardown_cbdr() is used to free
the command BD ring.
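
As a rough illustration of the hook mechanism described above, here is a
minimal user-space sketch. It is not the driver code: struct si, struct
si_ops, the v1_/v4_ helpers and the cbdr_kind field are hypothetical
stand-ins for struct enetc_si, struct enetc_si_ops and the per-hardware
ring setup. Only the shape of the dispatch is meant to match.

```c
#include <assert.h>
#include <stddef.h>

struct si;	/* hypothetical stand-in for struct enetc_si */

/* Hypothetical stand-in for struct enetc_si_ops: one hook per operation. */
struct si_ops {
	int (*setup_cbdr)(struct si *si);
	void (*teardown_cbdr)(struct si *si);
};

struct si {
	int cbdr_kind;			/* 0 = no ring, 1 = v1 ring, 4 = v4 ring */
	const struct si_ops *ops;	/* chosen once, at probe time */
};

static int v1_setup_cbdr(struct si *si)
{
	si->cbdr_kind = 1;
	return 0;
}

static int v4_setup_cbdr(struct si *si)
{
	si->cbdr_kind = 4;
	return 0;
}

/* Both variants tear down the same way in this toy model. */
static void common_teardown_cbdr(struct si *si)
{
	si->cbdr_kind = 0;
}

static const struct si_ops v1_ops = {
	.setup_cbdr = v1_setup_cbdr,
	.teardown_cbdr = common_teardown_cbdr,
};

static const struct si_ops v4_ops = {
	.setup_cbdr = v4_setup_cbdr,
	.teardown_cbdr = common_teardown_cbdr,
};

/* Shared code calls through the hooks and never names a hardware variant. */
static int si_init(struct si *si)
{
	assert(si->ops != NULL);
	return si->ops->setup_cbdr(si);
}

static void si_free(struct si *si)
{
	si->ops->teardown_cbdr(si);
}
```

The point of the indirection is that common paths such as si_init() stay
identical for both hardware generations; only the ops table assigned at
probe time differs.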

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.h  | 27 +++--
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 47 +++-
 .../net/ethernet/freescale/enetc/enetc_cbdr.c | 55 +--
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 13 +++--
 .../net/ethernet/freescale/enetc/enetc_vf.c   | 13 +++--
 5 files changed, 136 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 4ad4eb5c5a74..4ff0957e69be 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -266,6 +267,19 @@ struct enetc_platform_info {
const struct enetc_drvdata *data;
 };
 
+struct enetc_si;
+
+/*
+ * This structure defines some common hooks for ENETC PSI and VSI.
+ * In addition, since VSI only uses struct enetc_si as its private
+ * driver data, this structure also defines some hooks specifically
+ * for VSI. For VSI-specific hooks, the format is "vf_*()".
+ */
+struct enetc_si_ops {
+   int (*setup_cbdr)(struct enetc_si *si);
+   void (*teardown_cbdr)(struct enetc_si *si);
+};
+
 /* PCI IEP device data */
 struct enetc_si {
struct pci_dev *pdev;
@@ -274,7 +288,10 @@ struct enetc_si {
 
struct net_device *ndev; /* back ref. */
 
-   struct enetc_cbdr cbd_ring;
+   union {
+   struct enetc_cbdr cbd_ring; /* Only ENETC 1.0 */
+   struct ntmp_priv ntmp; /* ENETC 4.1 and later */
+   };
 
int num_rx_rings; /* how many rings are available in the SI */
int num_tx_rings;
@@ -284,6 +301,7 @@ struct enetc_si {
u16 revision;
int hw_features;
const struct enetc_drvdata *drvdata;
+   const struct enetc_si_ops *ops;
 };
 
 #define ENETC_SI_ALIGN 32
@@ -490,9 +508,10 @@ void enetc_mm_link_state_update(struct enetc_ndev_priv *priv, bool link);
 void enetc_mm_commit_preemptible_tcs(struct enetc_ndev_priv *priv);
 
 /* control buffer descriptor ring (CBDR) */
-int enetc_setup_cbdr(struct device *dev, struct enetc_hw *hw, int bd_count,
-struct enetc_cbdr *cbdr);
-void enetc_teardown_cbdr(struct enetc_cbdr *cbdr);
+int enetc_setup_cbdr(struct enetc_si *si);
+void enetc_teardown_cbdr(struct enetc_si *si);
+int enetc4_setup_cbdr(struct enetc_si *si);
+void enetc4_teardown_cbdr(struct enetc_si *si);
 int enetc_set_mac_flt_entry(struct enetc_si *si, int index,
char *mac_addr, int si_map);
 int enetc_clear_mac_flt_entry(struct enetc_si *si, int index);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index 73ac8c6afb3a..63001379f0a0 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -260,6 +260,23 @@ static void enetc4_configure_port(struct enetc_pf *pf)
enetc4_enable_trx(pf);
 }
 
+static int enetc4_init_ntmp_priv(struct enetc_si *si)
+{
+   struct ntmp_priv *ntmp = &si->ntmp;
+
+   ntmp->dev_type = NETC_DEV_ENETC;
+
+   /* For ENETC 4.1, all table versions are 0 */
+   memset(&ntmp->cbdrs.tbl, 0, sizeof(ntmp->cbdrs.tbl));
+
+   return si->ops->setup_cbdr(si);
+}
+
+static void enetc4_free_ntmp_priv(struct enetc_si *si)
+{
+   si->ops->teardown_cbdr(si);
+}
+
 static int enetc4_pf_init(struct enetc_pf *pf)
 {
struct device *dev = &pf->si->pdev->dev;
@@ -272,11 +289,22 @@ static int enetc4_pf_init(struct enetc_pf *pf)
return err;
}
 
+   err = enetc4_init_ntmp_priv(pf->si);
+   if (err) {
+   dev_err(dev, "Failed to init CBDR\n");
+   return err;
+   }
+
enetc4_configure_port(pf);
 
return 0;
 }
 
+static void enetc4_pf_free(struct enetc_pf *pf)
+{
+   enetc4_free_ntmp_priv(pf->si);
+}
+
 static const struct net_device_ops enetc4_ndev_ops = {
.ndo_open   = enetc_open,
.ndo_stop   = enetc_close,
@@ -688,6 +716,11 @@ static void enetc4_pf_netdev_destroy(struct enetc_si *si)
free_netdev(ndev);
 }
 
+static const struct enetc_si_ops enetc4_psi_ops = {
+   .setup_cbdr =

[PATCH v3 net-next 05/13] net: enetc: add debugfs interface to dump MAC filter

2025-03-03 Thread Wei Fang
ENETC's MAC filter consists of a hash MAC filter and an exact MAC filter.
The hash MAC filter is a 64-entry hash table consisting of two 32-bit
registers. The exact MAC filter is implemented by configuring the MAC
address filter table through the command BD ring. The table is stored in
ENETC's internal memory and needs to be read through the command BD ring.
To facilitate debugging, add a debugfs interface that dumps the relevant
MAC filter information.
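
To make the "64-entry hash table consisting of two 32-bit registers" layout
concrete, here is a small user-space sketch. It only models the bitmap and
its split into a low and a high word (the same order the dump prints them,
high word first). How the hardware derives the bucket index from a MAC
address is not modelled, and struct hash_filter and its helpers are
hypothetical names, not driver APIs.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-entry hash filter is a 64-bit bitmap: one bit per bucket. */
struct hash_filter {
	uint64_t bitmap;
};

/*
 * Mark bucket idx (0..63) as accepted. Deriving idx from a MAC address
 * is hardware-specific and deliberately left out of this sketch.
 */
static void hash_filter_set(struct hash_filter *f, unsigned int idx)
{
	assert(idx < 64);
	f->bitmap |= UINT64_C(1) << idx;
}

/* Return 1 if bucket idx is set, 0 otherwise. */
static int hash_filter_test(const struct hash_filter *f, unsigned int idx)
{
	assert(idx < 64);
	return (int)((f->bitmap >> idx) & 1);
}

/* Value for the first 32-bit register: buckets 0..31. */
static uint32_t hash_filter_lo(const struct hash_filter *f)
{
	return (uint32_t)f->bitmap;
}

/* Value for the second 32-bit register: buckets 32..63. */
static uint32_t hash_filter_hi(const struct hash_filter *f)
{
	return (uint32_t)(f->bitmap >> 32);
}
```

Because several addresses can land in the same bucket, a hash filter may
accept frames that were never explicitly added, which is why an exact-match
table exists alongside it.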

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/Makefile |  1 +
 drivers/net/ethernet/freescale/enetc/enetc.h  |  1 +
 .../ethernet/freescale/enetc/enetc4_debugfs.c | 93 +++
 .../ethernet/freescale/enetc/enetc4_debugfs.h | 20 
 .../net/ethernet/freescale/enetc/enetc4_pf.c  |  4 +
 5 files changed, 119 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.h

diff --git a/drivers/net/ethernet/freescale/enetc/Makefile b/drivers/net/ethernet/freescale/enetc/Makefile
index 707a68e26971..f1c5ad45fd76 100644
--- a/drivers/net/ethernet/freescale/enetc/Makefile
+++ b/drivers/net/ethernet/freescale/enetc/Makefile
@@ -16,6 +16,7 @@ fsl-enetc-$(CONFIG_FSL_ENETC_QOS) += enetc_qos.o
 
 obj-$(CONFIG_NXP_ENETC4) += nxp-enetc4.o
 nxp-enetc4-y := enetc4_pf.o
+nxp-enetc4-$(CONFIG_DEBUG_FS) += enetc4_debugfs.o
 
 obj-$(CONFIG_FSL_ENETC_VF) += fsl-enetc-vf.o
 fsl-enetc-vf-y := enetc_vf.o
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 4dba91408e3d..ca1bc85c0ac9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -318,6 +318,7 @@ struct enetc_si {
struct enetc_mac_filter mac_filter[MADDR_TYPE];
struct workqueue_struct *workqueue;
struct work_struct rx_mode_task;
+   struct dentry *debugfs_root;
 };
 
 #define ENETC_SI_ALIGN 32
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c b/drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
new file mode 100644
index ..3a660c80344a
--- /dev/null
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright 2025 NXP */
+
+#include 
+#include 
+#include 
+
+#include "enetc_pf.h"
+#include "enetc4_debugfs.h"
+
+#define is_en(x)   ((x) ? "Enabled" : "Disabled")
+
+static void enetc_show_si_mac_hash_filter(struct seq_file *s, int i)
+{
+   struct enetc_si *si = s->private;
+   struct enetc_hw *hw = &si->hw;
+   u32 hash_h, hash_l;
+
+   hash_l = enetc_port_rd(hw, ENETC4_PSIUMHFR0(i));
+   hash_h = enetc_port_rd(hw, ENETC4_PSIUMHFR1(i));
+   seq_printf(s, "SI %d unicast MAC hash filter: 0x%08x%08x\n",
+  i, hash_h, hash_l);
+
+   hash_l = enetc_port_rd(hw, ENETC4_PSIMMHFR0(i));
+   hash_h = enetc_port_rd(hw, ENETC4_PSIMMHFR1(i));
+   seq_printf(s, "SI %d multicast MAC hash filter: 0x%08x%08x\n",
+  i, hash_h, hash_l);
+}
+
+static int enetc_mac_filter_show(struct seq_file *s, void *data)
+{
+   struct maft_entry_data maft_data;
+   struct enetc_si *si = s->private;
+   struct enetc_hw *hw = &si->hw;
+   struct maft_keye_data *keye;
+   struct enetc_pf *pf;
+   int i, err, num_si;
+   u32 val;
+
+   pf = enetc_si_priv(si);
+   num_si = pf->caps.num_vsi + 1;
+
+   val = enetc_port_rd(hw, ENETC4_PSIPMMR);
+   for (i = 0; i < num_si; i++) {
+   seq_printf(s, "SI %d Unicast Promiscuous mode: %s\n",
+  i, is_en(PSIPMMR_SI_MAC_UP(i) & val));
+   seq_printf(s, "SI %d Multicast Promiscuous mode: %s\n",
+  i, is_en(PSIPMMR_SI_MAC_MP(i) & val));
+   }
+
+   /* MAC hash filter table */
+   for (i = 0; i < num_si; i++)
+   enetc_show_si_mac_hash_filter(s, i);
+
+   if (!pf->num_mfe)
+   return 0;
+
+   /* MAC address filter table */
+   seq_puts(s, "Show MAC address filter table\n");
+   for (i = 0; i < pf->num_mfe; i++) {
+   memset(&maft_data, 0, sizeof(maft_data));
+   err = ntmp_maft_query_entry(&si->ntmp.cbdrs, i, &maft_data);
+   if (err)
+   return err;
+
+   keye = &maft_data.keye;
+   seq_printf(s, "Entry %d, MAC: %pM, SI bitmap: 0x%04x\n", i,
+  keye->mac_addr, le16_to_cpu(maft_data.cfge.si_bitmap));
+   }
+
+   return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(enetc_mac_filter);
+
+void enetc_create_debugfs(struct enetc_si *si)
+{
+   struct net_device *ndev = si->ndev;
+   struct dentry *root;
+
+   root = debugfs_create_dir(netdev_name(ndev), NULL);
+   if (IS_ERR(root))
+   return;
+
+   si->debugfs_root = root;
+
+   debugfs_create_file("mac_filter", 0444, root, si, &enetc_mac_filter_fops);
+}
+
+vo

Re: [PATCH] book3s64/radix : Align section vmemmap start address to PAGE_SIZE

2025-03-03 Thread Donet Tom



On 3/3/25 18:32, Aneesh Kumar K.V wrote:

Donet Tom  writes:


A vmemmap altmap is a device-provided region used to provide
backing storage for struct pages. For each namespace, the altmap
should belong to that same namespace. If the namespaces are
created unaligned, there is a chance that the section vmemmap
start address could also be unaligned. If the section vmemmap
start address is unaligned, the altmap page allocated from the
current namespace might be used by the previous namespace also.
During the free operation, since the altmap is shared between two
namespaces, the previous namespace may detect that the page does
not belong to its altmap and incorrectly assume that the page is a
normal page. It then attempts to free the normal page, which leads
to a kernel crash.

In this patch, we are aligning the section vmemmap start address
to PAGE_SIZE. After alignment, the start address will not be
part of the current namespace, and a normal page will be allocated
for the vmemmap mapping of the current section. For the remaining
sections, altmaps will be allocated. During the free operation,
the normal page will be correctly freed.

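Without this patch
==================
NS1 start               NS2 start
 _________________________________________________
|         NS1           |           NS2           |
 -------------------------------------------------
| Altmap| Altmap | .... |Altmap| Altmap | ...
|  NS1  |  NS1   |      | NS2  |  NS2   |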


 ^^^ this should be allocated in ram?



Yes, it should be allocated from RAM. However, in the current
implementation, an altmap page gets allocated. This is because the
NS2 vmemmap section's start address is unaligned. There is an
altmap_cross_boundary() check. Here, from the vmemmap section
start, we identify the namespace start and check if the namespace start
is within the boundary. Since it is within the boundary, it returns false,
causing an altmap page to be allocated. During the PTE update, the
vmemmap start address is aligned down to PAGE_SIZE, and the PTE is
updated. As a result, the altmap page is shared between the current
and previous namespaces.

If we had aligned the vmemmap start address, the
altmap_cross_boundary() function would return true because the
vmemmap section's start address belongs to the previous
namespace. Therefore, a normal page gets allocated. During the
PTE set operation, since the address is already aligned, the
PTE will be updated.
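
The alignment arithmetic behind this can be sketched in user space as
follows. The numbers are assumptions for illustration only (64K pages and a
64-byte struct page, so 1024 struct pages per backing page), and the EX_
macros and helper names are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE        UINT64_C(65536)	/* assumption: 64K pages */
#define EX_STRUCT_PAGE_SIZE UINT64_C(64)	/* assumption: sizeof(struct page) */
/* Round x down to a multiple of a; a must be a power of two. */
#define EX_ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/* Byte offset into the vmemmap array of the struct page describing pfn. */
static uint64_t vmemmap_off(uint64_t pfn)
{
	return pfn * EX_STRUCT_PAGE_SIZE;
}

/*
 * Return 1 if the backing page that holds the first struct page of a
 * namespace starting at ns_start_pfn also holds struct pages for lower
 * PFNs, i.e. for whatever precedes the namespace.
 */
static int vmemmap_page_shared(uint64_t ns_start_pfn)
{
	uint64_t start = vmemmap_off(ns_start_pfn);

	return EX_ALIGN_DOWN(start, EX_PAGE_SIZE) != start;
}

/* First PFN described by the backing page that covers ns_start_pfn. */
static uint64_t first_pfn_in_backing_page(uint64_t ns_start_pfn)
{
	uint64_t page = EX_ALIGN_DOWN(vmemmap_off(ns_start_pfn), EX_PAGE_SIZE);

	return page / EX_STRUCT_PAGE_SIZE;
}
```

With these numbers, a namespace starting at PFN 1536 has its first struct
page in a backing page that starts at PFN 1024, so that backing page is
shared with the preceding range. A namespace starting at PFN 2048 begins
exactly on a backing-page boundary and shares nothing.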



In the above scenario, NS1 and NS2 are two namespaces. The vmemmap
for NS1 comes from Altmap NS1, which belongs to NS1, and the
vmemmap for NS2 comes from Altmap NS2, which belongs to NS2.

The vmemmap start for NS2 is not aligned, so Altmap NS2 is shared
by both NS1 and NS2. During the free operation in NS1, Altmap NS2
is not part of NS1's altmap, causing it to attempt to free an
invalid page.

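With this patch
===============
NS1 start               NS2 start
 _________________________________________________
|         NS1           |           NS2           |
 -------------------------------------------------
| Altmap| Altmap | .... | Normal | Altmap | Altmap |...
|  NS1  |  NS1   |      |  Page  |  NS2   |  NS2   |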

If the vmemmap start for NS2 is not aligned then we are allocating
a normal page. NS1 and NS2 vmemmap will be freed correctly.

Fixes: 368a0590d954 ("powerpc/book3s64/vmemmap: switch radix to use a different vmemmap handling function")
Co-developed-by: Ritesh Harjani (IBM) 
Signed-off-by: Ritesh Harjani (IBM) 
Signed-off-by: Donet Tom 
---
  arch/powerpc/mm/book3s64/radix_pgtable.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 311e2112d782..b22d5f6147d2 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1120,6 +1120,8 @@ int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end, in
pmd_t *pmd;
pte_t *pte;
  
+	start = ALIGN_DOWN(start, PAGE_SIZE);
+
for (addr = start; addr < end; addr = next) {
next = pmd_addr_end(addr, end);
  
--

2.43.5