Re: [PATCH v4 6/6] Add Propeller configuration for kernel build.

2024-10-23 Thread Masahiro Yamada
On Wed, Oct 23, 2024 at 4:25 PM Arnd Bergmann  wrote:
>
> On Wed, Oct 23, 2024, at 07:06, Masahiro Yamada wrote:
> > On Tue, Oct 22, 2024 at 9:00 AM Rong Xu  wrote:
> >
> >> > > +===
> >> > > +
> >> > > +Configure the kernel with::
> >> > > +
> >> > > +   CONFIG_AUTOFDO_CLANG=y
> >> >
> >> >
> >> > This is automatically met due to "depends on AUTOFDO_CLANG".
> >>
> >> Agreed. But we will remove the dependency from PROPELLER_CLANG to
> >> AUTOFDO_CLANG.
> >> So we will keep this part.
> >
> >
> > You can replace "depends on AUTOFDO_CLANG" with
> > "imply AUTOFDO_CLANG" if it is sensible.
> >
> > Up to you.
>
> I don't think we should ever encourage the use of 'imply'
> because it is almost always used incorrectly.

If we are able to delete the 'imply' keyword, Kconfig would be a bit cleaner.

In most cases, it can be replaced with 'default'.
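For illustration, a minimal sketch (FOO and BAR are made-up symbols, not
taken from this series):

    config FOO
            bool "Enable foo"
            imply BAR

    config BAR
            bool "Enable bar"

can usually become:

    config FOO
            bool "Enable foo"

    config BAR
            bool "Enable bar"
            default y if FOO

Either way BAR defaults to enabled when FOO is set, and the user can
still turn BAR off.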



-- 
Best Regards
Masahiro Yamada



Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()

2024-10-23 Thread David Hildenbrand

On 23.10.24 09:42, Heiko Carstens wrote:

> On Mon, Oct 21, 2024 at 04:45:59PM +0200, David Hildenbrand wrote:
>> For my purpose (virtio-mem), it's sufficient to only support "kexec
>> triggered kdump" either way, so I don't care.
>>
>> So for me it's good enough to have
>>
>> bool is_kdump_kernel(void)
>> {
>> 	return oldmem_data.start;
>> }
>>
>> And trying to document the situation in a comment like powerpc does :)
>
> Then let's go forward with this, since as Alexander wrote, this is returning
> what is actually happening. If this is not sufficient or something breaks we
> can still address this.



Yes, I'll send this change separately from the other virtio-mem stuff 
out today.


--
Cheers,

David / dhildenb




Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()

2024-10-23 Thread Heiko Carstens
On Mon, Oct 21, 2024 at 04:45:59PM +0200, David Hildenbrand wrote:
> For my purpose (virtio-mem), it's sufficient to only support "kexec
> triggered kdump" either way, so I don't care.
> 
> So for me it's good enough to have
> 
> bool is_kdump_kernel(void)
> {
>   return oldmem_data.start;
> }
> 
> And trying to document the situation in a comment like powerpc does :)

Then let's go forward with this, since as Alexander wrote, this is returning
what is actually happening. If this is not sufficient or something breaks we
can still address this.



Re: [PATCH net] Documentation: ieee802154: fix grammar

2024-10-23 Thread Miquel Raynal
Hi Leo,

leocst...@gmail.com wrote on Tue, 22 Oct 2024 21:12:01 -0700:

> Fix grammar where it improves readability.
> 
> Signed-off-by: Leo Stone 

Reviewed-by: Miquel Raynal 

Thanks,
Miquèl



Re: [PATCH v4 6/6] Add Propeller configuration for kernel build.

2024-10-23 Thread Masahiro Yamada
On Tue, Oct 22, 2024 at 9:00 AM Rong Xu  wrote:

> > > +===
> > > +
> > > +Configure the kernel with::
> > > +
> > > +   CONFIG_AUTOFDO_CLANG=y
> >
> >
> > This is automatically met due to "depends on AUTOFDO_CLANG".
>
> Agreed. But we will remove the dependency from PROPELLER_CLANG to
> AUTOFDO_CLANG.
> So we will keep this part.


You can replace "depends on AUTOFDO_CLANG" with
"imply AUTOFDO_CLANG" if it is sensible.

Up to you.



--
Best Regards
Masahiro Yamada



Re: [PATCH v4 6/6] Add Propeller configuration for kernel build.

2024-10-23 Thread Arnd Bergmann
On Wed, Oct 23, 2024, at 07:06, Masahiro Yamada wrote:
> On Tue, Oct 22, 2024 at 9:00 AM Rong Xu  wrote:
>
>> > > +===
>> > > +
>> > > +Configure the kernel with::
>> > > +
>> > > +   CONFIG_AUTOFDO_CLANG=y
>> >
>> >
>> > This is automatically met due to "depends on AUTOFDO_CLANG".
>>
>> Agreed. But we will remove the dependency from PROPELLER_CLANG to
>> AUTOFDO_CLANG.
>> So we will keep this part.
>
>
> You can replace "depends on AUTOFDO_CLANG" with
> "imply AUTOFDO_CLANG" if it is sensible.
>
> Up to you.

I don't think we should ever encourage the use of 'imply'
because it is almost always used incorrectly.

   Arnd



Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()

2024-10-23 Thread Alexander Egorenkov
Hi David,

David Hildenbrand  writes:


> Staring at the powerpc implementation:
>
> /*
>   * Return true only when kexec based kernel dump capturing method is used.
>   * This ensures all restrictions applied for kdump case are not automatically
>   * applied for fadump case.
>   */
> bool is_kdump_kernel(void)
> {
>   return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX;
> }
> EXPORT_SYMBOL_GPL(is_kdump_kernel);

Thanks for the pointer.

I would say power's version is semantically equivalent to what I have in
mind for s390 :) If a dump kernel is running, but not a stand-alone
one (apart from the stand-alone kdump case), then it's a kdump kernel.
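Concretely, taking David's snippet and a powerpc-style comment, the
sketch would be:

/*
 * Return true only when a kexec-based kernel dump capturing method
 * is used (sketch only, mirroring the powerpc comment quoted above).
 */
bool is_kdump_kernel(void)
{
	return oldmem_data.start;
}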

Regards
Alex



Re: [PATCH v4 6/6] Add Propeller configuration for kernel build.

2024-10-23 Thread Rong Xu
While Propeller often works best with AutoFDO (or the instrumentation
based FDO), it's not required. One can use Propeller (or a similar
post-link optimizer, like BOLT) on plain kernel builds.

So I will remove "depends on AUTOFDO_CLANG". I will not use "imply" --
simpler is better here.

-Rong

On Wed, Oct 23, 2024 at 12:29 AM Masahiro Yamada  wrote:
>
> On Wed, Oct 23, 2024 at 4:25 PM Arnd Bergmann  wrote:
> >
> > On Wed, Oct 23, 2024, at 07:06, Masahiro Yamada wrote:
> > > On Tue, Oct 22, 2024 at 9:00 AM Rong Xu  wrote:
> > >
> > >> > > +===
> > >> > > +
> > >> > > +Configure the kernel with::
> > >> > > +
> > >> > > +   CONFIG_AUTOFDO_CLANG=y
> > >> >
> > >> >
> > >> > This is automatically met due to "depends on AUTOFDO_CLANG".
> > >>
> > >> Agreed. But we will remove the dependency from PROPELLER_CLANG to
> > >> AUTOFDO_CLANG.
> > >> So we will keep this part.
> > >
> > >
> > > You can replace "depends on AUTOFDO_CLANG" with
> > > "imply AUTOFDO_CLANG" if it is sensible.
> > >
> > > Up to you.
> >
> > I don't think we should ever encourage the use of 'imply'
> > because it is almost always used incorrectly.
>
> If we are able to delete the 'imply' keyword, Kconfig would be a bit cleaner.
>
> In most cases, it can be replaced with 'default'.
>
>
>
> --
> Best Regards
> Masahiro Yamada



Re: [PATCH v6 3/6] KVM: arm64: Add support for PSCI v1.2 and v1.3

2024-10-23 Thread Miguel Luis



> On 19 Oct 2024, at 17:15, David Woodhouse  wrote:
> 
> From: David Woodhouse 
> 
> As with PSCI v1.1 in commit 512865d83fd9 ("KVM: arm64: Bump guest PSCI
> version to 1.1"), expose v1.3 to the guest by default. The SYSTEM_OFF2
> call which is exposed by doing so is compatible for userspace because
> it's just a new flag in the event that KVM raises, in precisely the same
> way that SYSTEM_RESET2 was compatible when v1.1 was enabled by default.
> 
> Signed-off-by: David Woodhouse 
> ---
> arch/arm64/kvm/hypercalls.c | 2 ++
> arch/arm64/kvm/psci.c   | 6 +-
> include/kvm/arm_psci.h  | 4 +++-
> 3 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 5763d979d8ca..9c6267ca2b82 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -575,6 +575,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const 
> struct kvm_one_reg *reg)
> case KVM_ARM_PSCI_0_2:
> case KVM_ARM_PSCI_1_0:
> case KVM_ARM_PSCI_1_1:
> + case KVM_ARM_PSCI_1_2:
> + case KVM_ARM_PSCI_1_3:
> if (!wants_02)
> return -EINVAL;
> vcpu->kvm->arch.psci_version = val;
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index df834f2e928e..6c24a9252fa3 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -328,7 +328,7 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 
> minor)
> 
> switch(psci_fn) {
> case PSCI_0_2_FN_PSCI_VERSION:
> - val = minor == 0 ? KVM_ARM_PSCI_1_0 : KVM_ARM_PSCI_1_1;
> + val = PSCI_VERSION(1, minor);
> break;
> case PSCI_1_0_FN_PSCI_FEATURES:
> arg = smccc_get_arg1(vcpu);
> @@ -493,6 +493,10 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
> }
> 
> switch (version) {
> + case KVM_ARM_PSCI_1_3:
> + return kvm_psci_1_x_call(vcpu, 3);
> + case KVM_ARM_PSCI_1_2:
> + return kvm_psci_1_x_call(vcpu, 2);
> case KVM_ARM_PSCI_1_1:
> return kvm_psci_1_x_call(vcpu, 1);
> case KVM_ARM_PSCI_1_0:
> diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> index e8fb624013d1..cbaec804eb83 100644
> --- a/include/kvm/arm_psci.h
> +++ b/include/kvm/arm_psci.h
> @@ -14,8 +14,10 @@
> #define KVM_ARM_PSCI_0_2 PSCI_VERSION(0, 2)
> #define KVM_ARM_PSCI_1_0 PSCI_VERSION(1, 0)
> #define KVM_ARM_PSCI_1_1 PSCI_VERSION(1, 1)
> +#define KVM_ARM_PSCI_1_2 PSCI_VERSION(1, 2)
> +#define KVM_ARM_PSCI_1_3 PSCI_VERSION(1, 3)
> 
> -#define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_1
> +#define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_3
> 

Reviewed-by: Miguel Luis 

> static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> {
> -- 
> 2.44.0
> 




Re: [PATCH v6 2/6] KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation

2024-10-23 Thread Miguel Luis
Hi David,

> On 19 Oct 2024, at 17:15, David Woodhouse  wrote:
> 
> From: David Woodhouse 
> 
> The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function
> which is analogous to ACPI S4 state. This will allow hosting
> environments to determine that a guest is hibernated rather than just
> powered off, and ensure that they preserve the virtual environment
> appropriately to allow the guest to resume safely (or bump the
> hardware_signature in the FACS to trigger a clean reboot instead).
> 
> This feature is safe to enable unconditionally (in a subsequent commit)
> because it is exposed to userspace through the existing
> KVM_SYSTEM_EVENT_SHUTDOWN event, just with an additional flag which
> userspace can use to know that the instance intended hibernation instead
> of a plain power-off.
> 
> As with SYSTEM_RESET2, there is only one type available (in this case
> HIBERNATE_OFF), and it is not explicitly reported to userspace through
> the event; userspace can get it from the registers if it cares.
> 
> Signed-off-by: David Woodhouse 
> ---
> Documentation/virt/kvm/api.rst| 11 
> arch/arm64/include/uapi/asm/kvm.h |  6 +
> arch/arm64/kvm/psci.c | 44 +++
> 3 files changed, 61 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index e32471977d0a..1ec076d806e6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6855,6 +6855,10 @@ the first `ndata` items (possibly zero) of the data 
> array are valid.
>the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
>specification.
> 
> + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
> +   if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
> +   specification.
> +
>  - for RISC-V, data[0] is set to the value of the second argument of the
>``sbi_system_reset`` call.
> 
> @@ -6888,6 +6892,13 @@ either:
>  - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
>"Caller responsibilities" for possible return values.
> 
> +Hibernation using the PSCI SYSTEM_OFF2 call is enabled when PSCI v1.3
> +is enabled. If a guest invokes the PSCI SYSTEM_OFF2 function, KVM will
> +exit to userspace with the KVM_SYSTEM_EVENT_SHUTDOWN event type and with
> +data[0] set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only
> +supported hibernate type for the SYSTEM_OFF2 function is HIBERNATE_OFF
> +(0x0).

I don’t think that ‘0x0’ adds anything to what’s already explained, IMO.
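As an aside, a minimal sketch of how a VMM could consume this, using the
existing KVM_EXIT_SYSTEM_EVENT ABI (error handling omitted):

	/* after ioctl(vcpu_fd, KVM_RUN, 0) returns */
	if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
	    run->system_event.type == KVM_SYSTEM_EVENT_SHUTDOWN) {
		bool hibernate = run->system_event.ndata >= 1 &&
			(run->system_event.data[0] &
			 KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
		/* preserve the VM image if hibernate, else plain power off */
	}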

> +
> ::
> 
> /* KVM_EXIT_IOAPIC_EOI */
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 964df31da975..66736ff04011 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -484,6 +484,12 @@ enum {
>  */
> #define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (1ULL << 0)
> 
> +/*
> + * Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call.
> + * Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN.
> + */
> +#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 (1ULL << 0)
> +
> /* run->fail_entry.hardware_entry_failure_reason codes. */
> #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED (1ULL << 0)
> 
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 1f69b667332b..df834f2e928e 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -194,6 +194,12 @@ static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
> kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0);
> }
> 
> +static void kvm_psci_system_off2(struct kvm_vcpu *vcpu)
> +{
> + kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN,
> + KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
> +}
> +
> static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> {
> kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0);
> @@ -358,6 +364,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 
> minor)
> if (minor >= 1)
> val = 0;
> break;
> + case PSCI_1_3_FN_SYSTEM_OFF2:
> + case PSCI_1_3_FN64_SYSTEM_OFF2:
> + if (minor >= 3)
> + val = PSCI_1_3_OFF_TYPE_HIBERNATE_OFF;
> + break;
> }
> break;
> case PSCI_1_0_FN_SYSTEM_SUSPEND:
> @@ -392,6 +403,39 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 
> minor)
> break;
> }
> break;
> + case PSCI_1_3_FN_SYSTEM_OFF2:
> + kvm_psci_narrow_to_32bit(vcpu);
> + fallthrough;
> + case PSCI_1_3_FN64_SYSTEM_OFF2:
> + if (minor < 3)
> + break;
> +
> + arg = smccc_get_arg1(vcpu);
> + /*
> + * PSCI v1.3 issue F.b requires that zero be accepted to mean
> + * HIBERNATE_OFF (in line with pre-publication versions of the
> + * spec, and thus some actual implementations in the wild).
> + * The second argument must be zero.
> + */
> + if ((arg && arg != PSCI_1_3_OFF_TYPE_HIBERNATE_OFF) ||
> +smccc_get_arg2(vcpu) != 0) {
> + val = PSCI_RET_INVALID_PARAMS;
> + break;
> + }
> + kvm_psci_system_off2(vcpu);
> + /*
> + * We shouldn't be going back to guest VCPU after
> + *

Re: [PATCH 0/1] remoteproc documentation changes

2024-10-23 Thread Jonathan Corbet
anish kumar  writes:

> This patch series transitions the documentation
> for remoteproc from the staging directory to the
> mainline kernel. It introduces both kernel and
> user-space APIs, enhancing the overall documentation
> quality.
>
> V4:
> Fixed compilation errors and moved documentation to
> driver-api directory.
>
> V3:
> Seperated out the patches further to make the intention
> clear for each patch.
>
> V2:
> Reported-by: kernel test robot 
> Closes: 
> https://lore.kernel.org/oe-kbuild-all/202410161444.jokmsogs-...@intel.com/

So I think you could make better use of kerneldoc comments for a number
of your APIs and structures - a project for the future.  I can't judge
the remoteproc aspects of this, but from a documentation mechanics point
of view, this looks about ready to me.  In the absence of objections
I'll apply it in the near future.

Thanks,

jon



[PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function

2024-10-23 Thread Suren Baghdasaryan
Implement a helper function to disable memory allocation profiling and
use it when creation of /proc/allocinfo fails.
Ensure /proc/allocinfo does not get created when memory allocation
profiling is disabled.

Signed-off-by: Suren Baghdasaryan 
---
 lib/alloc_tag.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 81e5f9a70f22..435aa837e550 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -8,6 +8,14 @@
 #include 
 #include 
 
+#define ALLOCINFO_FILE_NAME"allocinfo"
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+static bool mem_profiling_support __meminitdata = true;
+#else
+static bool mem_profiling_support __meminitdata;
+#endif
+
 static struct codetag_type *alloc_tag_cttype;
 
 DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
@@ -144,9 +152,26 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, 
size_t count, bool can_sl
return nr;
 }
 
+static void __init shutdown_mem_profiling(void)
+{
+   if (mem_alloc_profiling_enabled())
+   static_branch_disable(&mem_alloc_profiling_key);
+
+   if (!mem_profiling_support)
+   return;
+
+   mem_profiling_support = false;
+}
+
 static void __init procfs_init(void)
 {
-   proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
+   if (!mem_profiling_support)
+   return;
+
+   if (!proc_create_seq(ALLOCINFO_FILE_NAME, 0400, NULL, 
&allocinfo_seq_op)) {
+   pr_err("Failed to create %s file\n", ALLOCINFO_FILE_NAME);
+   shutdown_mem_profiling();
+   }
 }
 
 static bool alloc_tag_module_unload(struct codetag_type *cttype,
@@ -174,12 +199,6 @@ static bool alloc_tag_module_unload(struct codetag_type 
*cttype,
return module_unused;
 }
 
-#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
-static bool mem_profiling_support __meminitdata = true;
-#else
-static bool mem_profiling_support __meminitdata;
-#endif
-
 static int __init setup_early_mem_profiling(char *str)
 {
bool enable;
-- 
2.47.0.105.g07ac214952-goog




Re: [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper

2024-10-23 Thread Pasha Tatashin
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan  wrote:
>
> Add mas_for_each_rev() function to iterate maple tree nodes in reverse
> order.
>
> Suggested-by: Liam R. Howlett 
> Signed-off-by: Suren Baghdasaryan 
> Reviewed-by: Liam R. Howlett 

Reviewed-by: Pasha Tatashin 

> ---
>  include/linux/maple_tree.h | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index 61c236850ca8..cbbcd18d4186 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state 
> *mas)
>  #define mas_for_each(__mas, __entry, __max) \
> while (((__entry) = mas_find((__mas), (__max))) != NULL)
>
> +/**
> + * mas_for_each_rev() - Iterate over a range of the maple tree in reverse 
> order.
> + * @__mas: Maple Tree operation state (maple_state)
> + * @__entry: Entry retrieved from the tree
> + * @__min: minimum index to retrieve from the tree
> + *
> + * When returned, mas->index and mas->last will hold the entire range for the
> + * entry.
> + *
> + * Note: may return the zero entry.
> + */
> +#define mas_for_each_rev(__mas, __entry, __min) \
> +   while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
> +
>  #ifdef CONFIG_DEBUG_MAPLE_TREE
>  enum mt_dump_format {
> mt_dump_dec,
> --
> 2.47.0.105.g07ac214952-goog
>



Re: [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references

2024-10-23 Thread Pasha Tatashin
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan  wrote:
>
> To simplify later changes to page tag references, introduce new
> pgtag_ref_handle type. This allows easy replacement of page_ext
> as a storage of page allocation tags.
>
> Signed-off-by: Suren Baghdasaryan 

Reviewed-by: Pasha Tatashin 



Re: [PATCH 0/1] remoteproc documentation changes

2024-10-23 Thread Jonathan Corbet
Jonathan Corbet  writes:

> anish kumar  writes:
>
>> This patch series transitions the documentation
>> for remoteproc from the staging directory to the
>> mainline kernel. It introduces both kernel and
>> user-space APIs, enhancing the overall documentation
>> quality.
>>
>> V4:
>> Fixed compilation errors and moved documentation to
>> driver-api directory.
>>
>> V3:
>> Seperated out the patches further to make the intention
>> clear for each patch.
>>
>> V2:
>> Reported-by: kernel test robot 
>> Closes: 
>> https://lore.kernel.org/oe-kbuild-all/202410161444.jokmsogs-...@intel.com/
>
> So I think you could make better use of kerneldoc comments for a number
> of your APIs and structures - a project for the future.  I can't judge
> the remoteproc aspects of this, but from a documentation mechanics point
> of view, this looks about ready to me.  In the absence of objections
> I'll apply it in the near future.

One other question, actually - what kernel version did you make these
patches against?  It looks like something rather old...?

Thanks,

jon



Re: [PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function

2024-10-23 Thread Pasha Tatashin
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan  wrote:
>
> Implement a helper function to disable memory allocation profiling and
> use it when creation of /proc/allocinfo fails.
> Ensure /proc/allocinfo does not get created when memory allocation
> profiling is disabled.
>
> Signed-off-by: Suren Baghdasaryan 

Reviewed-by: Pasha Tatashin 



Re: [PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory

2024-10-23 Thread Pasha Tatashin
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan  wrote:
>
> When a module gets unloaded there is a possibility that some of the
> allocations it made are still used and therefore the allocation tags
> corresponding to these allocations are still referenced. As such, the
> memory for these tags can't be freed. This is currently handled as an
> abnormal situation and module's data section is not being unloaded.
> To handle this situation without keeping module's data in memory,
> allow codetags with longer lifespan than the module to be loaded into
> their own separate memory. The in-use memory areas and gaps after
> module unloading in this separate memory are tracked using maple trees.
> Allocation tags arrange their separate memory so that it is virtually
> contiguous and that will allow simple allocation tag indexing later on
> in this patchset. The size of this virtually contiguous memory is set
> to store up to 10 allocation tags.
>
> Signed-off-by: Suren Baghdasaryan 

Reviewed-by: Pasha Tatashin 

> ---
>  include/asm-generic/codetag.lds.h |  19 +++
>  include/linux/alloc_tag.h |  13 +-
>  include/linux/codetag.h   |  37 -
>  kernel/module/main.c  |  80 ++
>  lib/alloc_tag.c   | 249 +++---
>  lib/codetag.c | 100 +++-
>  scripts/module.lds.S  |   5 +-
>  7 files changed, 441 insertions(+), 62 deletions(-)
>
> diff --git a/include/asm-generic/codetag.lds.h 
> b/include/asm-generic/codetag.lds.h
> index 64f536b80380..372c320c5043 100644
> --- a/include/asm-generic/codetag.lds.h
> +++ b/include/asm-generic/codetag.lds.h
> @@ -11,4 +11,23 @@
>  #define CODETAG_SECTIONS() \
> SECTION_WITH_BOUNDARIES(alloc_tags)
>
> +/*
> + * Module codetags which aren't used after module unload, therefore have the
> + * same lifespan as the module and can be safely unloaded with the module.
> + */
> +#define MOD_CODETAG_SECTIONS()
> +
> +#define MOD_SEPARATE_CODETAG_SECTION(_name)\
> +   .codetag.##_name : {\
> +   SECTION_WITH_BOUNDARIES(_name)  \
> +   }
> +
> +/*
> + * For codetags which might be used after module unload, therefore might stay
> + * longer in memory. Each such codetag type has its own section so that we 
> can
> + * unload them individually once unused.
> + */
> +#define MOD_SEPARATE_CODETAG_SECTIONS()\
> +   MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
> +
>  #endif /* __ASM_GENERIC_CODETAG_LDS_H */
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index 1f0a9ff23a2c..7431757999c5 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -30,6 +30,13 @@ struct alloc_tag {
> struct alloc_tag_counters __percpu  *counters;
>  } __aligned(8);
>
> +struct alloc_tag_module_section {
> +   unsigned long start_addr;
> +   unsigned long end_addr;
> +   /* used size */
> +   unsigned long size;
> +};
> +
>  #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
>
>  #define CODETAG_EMPTY  ((void *)1)
> @@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref 
> *ref) {}
>
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> +#define ALLOC_TAG_SECTION_NAME "alloc_tags"
> +
>  struct codetag_bytes {
> struct codetag *ct;
> s64 bytes;
> @@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, 
> _shared_alloc_tag);
>
>  #define DEFINE_ALLOC_TAG(_alloc_tag) 
>   \
> static struct alloc_tag _alloc_tag __used __aligned(8)
>   \
> -   __section("alloc_tags") = {   
>   \
> +   __section(ALLOC_TAG_SECTION_NAME) = { 
>   \
> .ct = CODE_TAG_INIT,  
>   \
> .counters = &_shared_alloc_tag };
>
> @@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, 
> _shared_alloc_tag);
>  #define DEFINE_ALLOC_TAG(_alloc_tag) 
>   \
> static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);
>   \
> static struct alloc_tag _alloc_tag __used __aligned(8)
>   \
> -   __section("alloc_tags") = {   
>   \
> +   __section(ALLOC_TAG_SECTION_NAME) = { 
>   \
> .ct = CODE_TAG_INIT,  
>   \
> .counters = &_alloc_tag_cntr };
>
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index c2a579ccd455..d10bd9810d32 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -35,8 +35,15 @@ struct codetag_type_desc {
> size_t tag_size;
> void (*module_load)(struct codetag_type *cttype,
> struct codetag_module *cmod);
> -   bool (*module_unload)(s

Re: [PATCH v4 6/6] alloc_tag: support for page allocation tag compression

2024-10-23 Thread Pasha Tatashin
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan  wrote:
>
> Implement support for storing page allocation tag references directly
> in the page flags instead of page extensions. sysctl.vm.mem_profiling
> boot parameter it extended to provide a way for a user to request this
> mode. Enabling compression eliminates memory overhead caused by page_ext
> and results in better performance for page allocations. However this
> mode will not work if the number of available page flag bits is
> insufficient to address all kernel allocations. Such a condition can
> happen during boot or when loading a module. If this condition is
> detected, memory allocation profiling gets disabled with an appropriate
> warning. By default compression mode is disabled.
>
> Signed-off-by: Suren Baghdasaryan 

Thank you very much Suren for doing this work. This is a very
significant improvement for the fleet users.

Reviewed-by: Pasha Tatashin 



Re: [PATCH v4 4/6] alloc_tag: populate memory for module tags as needed

2024-10-23 Thread Pasha Tatashin
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan  wrote:
>
> The memory reserved for module tags does not need to be backed by
> physical pages until there are tags to store there. Change the way
> we reserve this memory to allocate only the virtual area for the tags
> and populate it with physical pages as needed when we load a module.
>
> Signed-off-by: Suren Baghdasaryan 

Reviewed-by: Pasha Tatashin 



[PATCH v4 0/6] page allocation tag compression

2024-10-23 Thread Suren Baghdasaryan
This patchset implements several improvements:
1. Gracefully handles module unloading while there are used allocations
allocated from that module;
2. Provides an option to store page allocation tag references in the
page flags, removing dependency on page extensions and eliminating the
memory overhead from storing page allocation references (~0.2% of total
system memory). This also improves page allocation performance when
CONFIG_MEM_ALLOC_PROFILING is enabled by eliminating page extension
lookup. Page allocation performance overhead is reduced from 41% to 5.5%.

Patch #1 introduces mas_for_each_rev() helper function.

Patch #2 introduces shutdown_mem_profiling() helper function to be used
when disabling memory allocation profiling.

Patch #3 copies module tags into virtually contiguous memory which
serves two purposes:
- Lets us deal with the situation when module is unloaded while there
are still live allocations from that module. Since we are using a copy
version of the tags we can safely unload the module. Space and gaps in
this contiguous memory are managed using a maple tree.
- Enables simple indexing of the tags in the later patches.

Patch #4 changes the way we allocate virtually contiguous memory for
module tags to reserve only the virtual area and populate physical pages
only as needed at module load time.

Patch #5 abstracts page allocation tag reference to simplify later
changes.

Patch #6 adds compression option to the sysctl.vm.mem_profiling boot
parameter for storing page allocation tag references inside page flags
if they fit. If the number of available page flag bits is insufficient
to address all kernel allocations, memory allocation profiling gets
disabled with an appropriate warning.

Patchset applies to mm-unstable.

Changes since v3 [1]:
- rebased over Mike's patchset in mm-unstable
- added Reviewed-by, per Liam Howlett
- limited execmem_vmap to work with EXECMEM_MODULE_DATA only,
per Mike Rapoport
- moved __get_vm_area_node() declaration into mm/internal.h,
per Mike Rapoport
- split parts of reserve_module_tags() into helper functions to make it
more readable, per Mike Rapoport
- introduced shutdown_mem_profiling() to be used when disabling memory
allocation profiling
- replaced CONFIG_PGALLOC_TAG_USE_PAGEFLAGS with a new boot parameter
option, per Michal Hocko
- minor code cleanups and refactoring to make the code more readable
- added VMALLOC and MODULE SUPPORT reviewers I missed before

[1] https://lore.kernel.org/all/20241014203646.1952505-1-sur...@google.com/

Suren Baghdasaryan (6):
  maple_tree: add mas_for_each_rev() helper
  alloc_tag: introduce shutdown_mem_profiling helper function
  alloc_tag: load module tags into separate contiguous memory
  alloc_tag: populate memory for module tags as needed
  alloc_tag: introduce pgtag_ref_handle to abstract page tag references
  alloc_tag: support for page allocation tag compression

 Documentation/mm/allocation-profiling.rst |   7 +-
 include/asm-generic/codetag.lds.h |  19 +
 include/linux/alloc_tag.h |  21 +-
 include/linux/codetag.h   |  40 +-
 include/linux/execmem.h   |  10 +
 include/linux/maple_tree.h|  14 +
 include/linux/mm.h|  25 +-
 include/linux/page-flags-layout.h |   7 +
 include/linux/pgalloc_tag.h   | 197 +++--
 include/linux/vmalloc.h   |   3 +
 kernel/module/main.c  |  80 ++--
 lib/alloc_tag.c   | 467 --
 lib/codetag.c | 104 -
 mm/execmem.c  |  16 +
 mm/internal.h |   6 +
 mm/mm_init.c  |   5 +-
 mm/vmalloc.c  |   4 +-
 scripts/module.lds.S  |   5 +-
 18 files changed, 903 insertions(+), 127 deletions(-)


base-commit: b5d43fad926a3f542cd06f3c9d286f6f489f7129
-- 
2.47.0.105.g07ac214952-goog




[PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper

2024-10-23 Thread Suren Baghdasaryan
Add mas_for_each_rev() function to iterate maple tree nodes in reverse
order.

Suggested-by: Liam R. Howlett 
Signed-off-by: Suren Baghdasaryan 
Reviewed-by: Liam R. Howlett 
---
 include/linux/maple_tree.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index 61c236850ca8..cbbcd18d4186 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
 #define mas_for_each(__mas, __entry, __max) \
while (((__entry) = mas_find((__mas), (__max))) != NULL)
 
+/**
+ * mas_for_each_rev() - Iterate over a range of the maple tree in reverse 
order.
+ * @__mas: Maple Tree operation state (maple_state)
+ * @__entry: Entry retrieved from the tree
+ * @__min: minimum index to retrieve from the tree
+ *
+ * When returned, mas->index and mas->last will hold the entire range for the
+ * entry.
+ *
+ * Note: may return the zero entry.
+ */
+#define mas_for_each_rev(__mas, __entry, __min) \
+   while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
+
 #ifdef CONFIG_DEBUG_MAPLE_TREE
 enum mt_dump_format {
mt_dump_dec,
-- 
2.47.0.105.g07ac214952-goog
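For reference, a minimal usage sketch of the new helper (hypothetical;
assumes a populated maple tree 'mt' and <linux/maple_tree.h>):

	void *entry;
	MA_STATE(mas, &mt, ULONG_MAX, ULONG_MAX);

	rcu_read_lock();
	mas_for_each_rev(&mas, entry, 0) {
		/* entries are visited from the highest index down to 0;
		 * mas.index/mas.last hold the range of each entry */
	}
	rcu_read_unlock();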




[PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory

2024-10-23 Thread Suren Baghdasaryan
When a module gets unloaded there is a possibility that some of the
allocations it made are still used and therefore the allocation tags
corresponding to these allocations are still referenced. As such, the
memory for these tags can't be freed. This is currently handled as an
abnormal situation and module's data section is not being unloaded.
To handle this situation without keeping module's data in memory,
allow codetags with longer lifespan than the module to be loaded into
their own separate memory. The in-use memory areas and gaps after
module unloading in this separate memory are tracked using maple trees.
Allocation tags arrange their separate memory so that it is virtually
contiguous and that will allow simple allocation tag indexing later on
in this patchset. The size of this virtually contiguous memory is set
to store up to 10 allocation tags.

Signed-off-by: Suren Baghdasaryan 
---
 include/asm-generic/codetag.lds.h |  19 +++
 include/linux/alloc_tag.h |  13 +-
 include/linux/codetag.h   |  37 -
 kernel/module/main.c  |  80 ++
 lib/alloc_tag.c   | 249 +++---
 lib/codetag.c | 100 +++-
 scripts/module.lds.S  |   5 +-
 7 files changed, 441 insertions(+), 62 deletions(-)

diff --git a/include/asm-generic/codetag.lds.h 
b/include/asm-generic/codetag.lds.h
index 64f536b80380..372c320c5043 100644
--- a/include/asm-generic/codetag.lds.h
+++ b/include/asm-generic/codetag.lds.h
@@ -11,4 +11,23 @@
 #define CODETAG_SECTIONS() \
SECTION_WITH_BOUNDARIES(alloc_tags)
 
+/*
+ * Module codetags which aren't used after module unload, therefore have the
+ * same lifespan as the module and can be safely unloaded with the module.
+ */
+#define MOD_CODETAG_SECTIONS()
+
+#define MOD_SEPARATE_CODETAG_SECTION(_name)\
+   .codetag.##_name : {\
+   SECTION_WITH_BOUNDARIES(_name)  \
+   }
+
+/*
+ * For codetags which might be used after module unload, therefore might stay
+ * longer in memory. Each such codetag type has its own section so that we can
+ * unload them individually once unused.
+ */
+#define MOD_SEPARATE_CODETAG_SECTIONS()\
+   MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
+
 #endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 1f0a9ff23a2c..7431757999c5 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,13 @@ struct alloc_tag {
struct alloc_tag_counters __percpu  *counters;
 } __aligned(8);
 
+struct alloc_tag_module_section {
+   unsigned long start_addr;
+   unsigned long end_addr;
+   /* used size */
+   unsigned long size;
+};
+
 #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
 
 #define CODETAG_EMPTY  ((void *)1)
@@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) 
{}
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+#define ALLOC_TAG_SECTION_NAME "alloc_tags"
+
 struct codetag_bytes {
struct codetag *ct;
s64 bytes;
@@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 
 #define DEFINE_ALLOC_TAG(_alloc_tag)   
\
static struct alloc_tag _alloc_tag __used __aligned(8)  
\
-   __section("alloc_tags") = { 
\
+   __section(ALLOC_TAG_SECTION_NAME) = {   
\
.ct = CODE_TAG_INIT,
\
.counters = &_shared_alloc_tag };
 
@@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
 #define DEFINE_ALLOC_TAG(_alloc_tag)   
\
static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);  
\
static struct alloc_tag _alloc_tag __used __aligned(8)  
\
-   __section("alloc_tags") = { 
\
+   __section(ALLOC_TAG_SECTION_NAME) = {   
\
.ct = CODE_TAG_INIT,
\
.counters = &_alloc_tag_cntr };
 
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c2a579ccd455..d10bd9810d32 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -35,8 +35,15 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
-   bool (*module_unload)(struct codetag_type *cttype,
+   void (*module_unload)(struct codetag_type *cttype,
  struct codetag_module *cmod);
+#ifdef CONFIG_MODULES
+   void (*module_replaced)(struct module *mod, struct module *new_mod);
+   bool (*needs_section_mem)(struct module *mod, unsigned long size);
+

[PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references

2024-10-23 Thread Suren Baghdasaryan
To simplify later changes to page tag references, introduce new
pgtag_ref_handle type. This allows easy replacement of page_ext
as a storage of page allocation tags.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h  | 25 +-
 include/linux/pgalloc_tag.h | 92 ++---
 2 files changed, 67 insertions(+), 50 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5cd22303fbc0..8efb4a6a1a70 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4180,37 +4180,38 @@ static inline void pgalloc_tag_split(struct folio 
*folio, int old_order, int new
return;
 
for (i = nr_pages; i < (1 << old_order); i += nr_pages) {
-   union codetag_ref *ref = get_page_tag_ref(folio_page(folio, i));
+   union pgtag_ref_handle handle;
+   union codetag_ref ref;
 
-   if (ref) {
+   if (get_page_tag_ref(folio_page(folio, i), &ref, &handle)) {
/* Set new reference to point to the original tag */
-   alloc_tag_ref_set(ref, tag);
-   put_page_tag_ref(ref);
+   alloc_tag_ref_set(&ref, tag);
+   update_page_tag_ref(handle, &ref);
+   put_page_tag_ref(handle);
}
}
 }
 
 static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
 {
+   union pgtag_ref_handle handle;
+   union codetag_ref ref;
struct alloc_tag *tag;
-   union codetag_ref *ref;
 
tag = pgalloc_tag_get(&old->page);
if (!tag)
return;
 
-   ref = get_page_tag_ref(&new->page);
-   if (!ref)
+   if (!get_page_tag_ref(&new->page, &ref, &handle))
return;
 
/* Clear the old ref to the original allocation tag. */
clear_page_tag_ref(&old->page);
/* Decrement the counters of the tag on get_new_folio. */
-   alloc_tag_sub(ref, folio_nr_pages(new));
-
-   __alloc_tag_ref_set(ref, tag);
-
-   put_page_tag_ref(ref);
+   alloc_tag_sub(&ref, folio_nr_pages(new));
+   __alloc_tag_ref_set(&ref, tag);
+   update_page_tag_ref(handle, &ref);
+   put_page_tag_ref(handle);
 }
 #else /* !CONFIG_MEM_ALLOC_PROFILING */
 static inline void pgalloc_tag_split(struct folio *folio, int old_order, int 
new_order)
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 59a3deb792a8..b13cd3313a88 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -11,46 +11,59 @@
 
 #include 
 
+union pgtag_ref_handle {
+   union codetag_ref *ref; /* reference in page extension */
+};
+
 extern struct page_ext_operations page_alloc_tagging_ops;
 
-static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext 
*page_ext)
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
+   union pgtag_ref_handle *handle)
 {
-   return (union codetag_ref *)page_ext_data(page_ext, 
&page_alloc_tagging_ops);
-}
+   struct page_ext *page_ext;
+   union codetag_ref *tmp;
 
-static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref 
*ref)
-{
-   return (void *)ref - page_alloc_tagging_ops.offset;
+   if (!page)
+   return false;
+
+   page_ext = page_ext_get(page);
+   if (!page_ext)
+   return false;
+
+   tmp = (union codetag_ref *)page_ext_data(page_ext, 
&page_alloc_tagging_ops);
+   ref->ct = tmp->ct;
+   handle->ref = tmp;
+   return true;
 }
 
-/* Should be called only if mem_alloc_profiling_enabled() */
-static inline union codetag_ref *get_page_tag_ref(struct page *page)
+static inline void put_page_tag_ref(union pgtag_ref_handle handle)
 {
-   if (page) {
-   struct page_ext *page_ext = page_ext_get(page);
+   if (WARN_ON(!handle.ref))
+   return;
 
-   if (page_ext)
-   return codetag_ref_from_page_ext(page_ext);
-   }
-   return NULL;
+   page_ext_put((void *)handle.ref - page_alloc_tagging_ops.offset);
 }
 
-static inline void put_page_tag_ref(union codetag_ref *ref)
+static inline void update_page_tag_ref(union pgtag_ref_handle handle,
+  union codetag_ref *ref)
 {
-   if (WARN_ON(!ref))
+   if (WARN_ON(!handle.ref || !ref))
return;
 
-   page_ext_put(page_ext_from_codetag_ref(ref));
+   handle.ref->ct = ref->ct;
 }
 
 static inline void clear_page_tag_ref(struct page *page)
 {
if (mem_alloc_profiling_enabled()) {
-   union codetag_ref *ref = get_page_tag_ref(page);
+   union pgtag_ref_handle handle;
+   union codetag_ref ref;
 
-   if (ref) {
-   set_codetag_empty(ref);
-   put_pag

[PATCH v4 4/6] alloc_tag: populate memory for module tags as needed

2024-10-23 Thread Suren Baghdasaryan
The memory reserved for module tags does not need to be backed by
physical pages until there are tags to store there. Change the way
we reserve this memory to allocate only the virtual area for the tags
and populate it with physical pages as needed when we load a module.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/execmem.h | 10 ++
 include/linux/vmalloc.h |  3 ++
 lib/alloc_tag.c | 73 -
 mm/execmem.c| 16 +
 mm/internal.h   |  6 
 mm/vmalloc.c|  4 +--
 6 files changed, 101 insertions(+), 11 deletions(-)

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 1517fa196bf7..5a5e2917f870 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -139,6 +139,16 @@ void *execmem_alloc(enum execmem_type type, size_t size);
  */
 void execmem_free(void *ptr);
 
+/**
+ * execmem_vmap - create virtual mapping for EXECMEM_MODULE_DATA memory
+ * @size: size of the virtual mapping in bytes
+ *
+ * Maps virtually contiguous area in the range suitable for 
EXECMEM_MODULE_DATA.
+ *
+ * Return: the area descriptor on success or %NULL on failure.
+ */
+struct vm_struct *execmem_vmap(size_t size);
+
 /**
  * execmem_update_copy - copy an update to executable memory
  * @dst:  destination address to update
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 27408f21e501..31e9ffd936e3 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -202,6 +202,9 @@ extern int remap_vmalloc_range_partial(struct 
vm_area_struct *vma,
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
 
+int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
+struct page **pages, unsigned int page_shift);
+
 /*
  * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED 
values
  * and let generic vmalloc and ioremap code know when 
arch_sync_kernel_mappings()
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d9f51169ffeb..061e43196247 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -8,14 +8,15 @@
 #include 
 #include 
 #include 
+#include 
 
 #define ALLOCINFO_FILE_NAME"allocinfo"
 #define MODULE_ALLOC_TAG_VMAP_SIZE (10UL * sizeof(struct alloc_tag))
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
-static bool mem_profiling_support __meminitdata = true;
+static bool mem_profiling_support = true;
 #else
-static bool mem_profiling_support __meminitdata;
+static bool mem_profiling_support;
 #endif
 
 static struct codetag_type *alloc_tag_cttype;
@@ -154,7 +155,7 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, 
size_t count, bool can_sl
return nr;
 }
 
-static void __init shutdown_mem_profiling(void)
+static void shutdown_mem_profiling(void)
 {
if (mem_alloc_profiling_enabled())
static_branch_disable(&mem_alloc_profiling_key);
@@ -179,6 +180,7 @@ static void __init procfs_init(void)
 #ifdef CONFIG_MODULES
 
 static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, 
MT_FLAGS_ALLOC_RANGE);
+static struct vm_struct *vm_module_tags;
 /* A dummy object used to indicate an unloaded module */
 static struct module unloaded_mod;
 /* A dummy object used to indicate a module prepended area */
@@ -252,6 +254,33 @@ static bool find_aligned_area(struct ma_state *mas, 
unsigned long section_size,
return false;
 }
 
+static int vm_module_tags_populate(void)
+{
+   unsigned long phys_size = vm_module_tags->nr_pages << PAGE_SHIFT;
+
+   if (phys_size < module_tags.size) {
+   struct page **next_page = vm_module_tags->pages + 
vm_module_tags->nr_pages;
+   unsigned long addr = module_tags.start_addr + phys_size;
+   unsigned long more_pages;
+   unsigned long nr;
+
+   more_pages = ALIGN(module_tags.size - phys_size, PAGE_SIZE) >> 
PAGE_SHIFT;
+   nr = alloc_pages_bulk_array_node(GFP_KERNEL | __GFP_NOWARN,
+NUMA_NO_NODE, more_pages, 
next_page);
+   if (nr < more_pages ||
+   vmap_pages_range(addr, addr + (nr << PAGE_SHIFT), 
PAGE_KERNEL,
+next_page, PAGE_SHIFT) < 0) {
+   /* Clean up and error out */
+   for (int i = 0; i < nr; i++)
+   __free_page(next_page[i]);
+   return -ENOMEM;
+   }
+   vm_module_tags->nr_pages += nr;
+   }
+
+   return 0;
+}
+
 static void *reserve_module_tags(struct module *mod, unsigned long size,
 unsigned int prepend, unsigned long align)
 {
@@ -310,8 +339,18 @@ static void *reserve_module_tags(struct module *mod, 
unsigned long size,
if (IS_ERR(ret))
return ret;
 
-   if (module_tags

Re: [PATCH v6 1/6] firmware/psci: Add definitions for PSCI v1.3 specification

2024-10-23 Thread Miguel Luis



> On 19 Oct 2024, at 17:15, David Woodhouse  wrote:
> 
> From: David Woodhouse 
> 
> The v1.3 PSCI spec (https://developer.arm.com/documentation/den0022) adds
> the SYSTEM_OFF2 function. Add definitions for it and its hibernation type
> parameter.
> 
> Signed-off-by: David Woodhouse 
> ---
> include/uapi/linux/psci.h | 5 +
> 1 file changed, 5 insertions(+)
> 
> diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
> index 42a40ad3fb62..81759ff385e6 100644
> --- a/include/uapi/linux/psci.h
> +++ b/include/uapi/linux/psci.h
> @@ -59,6 +59,7 @@
> #define PSCI_1_1_FN_SYSTEM_RESET2 PSCI_0_2_FN(18)
> #define PSCI_1_1_FN_MEM_PROTECT PSCI_0_2_FN(19)
> #define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN(20)
> +#define PSCI_1_3_FN_SYSTEM_OFF2 PSCI_0_2_FN(21)
> 
> #define PSCI_1_0_FN64_CPU_DEFAULT_SUSPEND PSCI_0_2_FN64(12)
> #define PSCI_1_0_FN64_NODE_HW_STATE PSCI_0_2_FN64(13)
> @@ -68,6 +69,7 @@
> 
> #define PSCI_1_1_FN64_SYSTEM_RESET2 PSCI_0_2_FN64(18)
> #define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN64(20)
> +#define PSCI_1_3_FN64_SYSTEM_OFF2 PSCI_0_2_FN64(21)
> 
> /* PSCI v0.2 power state encoding for CPU_SUSPEND function */
> #define PSCI_0_2_POWER_STATE_ID_MASK 0x
> @@ -100,6 +102,9 @@
> #define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET 0
> #define PSCI_1_1_RESET_TYPE_VENDOR_START 0x8000U
> 
> +/* PSCI v1.3 hibernate type for SYSTEM_OFF2 */
> +#define PSCI_1_3_OFF_TYPE_HIBERNATE_OFF BIT(0)
> +

Reviewed-by: Miguel Luis 

> /* PSCI version decoding (independent of PSCI version) */
> #define PSCI_VERSION_MAJOR_SHIFT 16
> #define PSCI_VERSION_MINOR_MASK \
> -- 
> 2.44.0
> 




Re: [PATCH 07/12] huge_memory: Allow mappings of PMD sized pages

2024-10-23 Thread Alistair Popple


Alistair Popple  writes:

> Alistair Popple wrote:
>> Dan Williams  writes:

[...]

>>> +
>>> +   return VM_FAULT_NOPAGE;
>>> +}
>>> +EXPORT_SYMBOL_GPL(dax_insert_pfn_pmd);
>>
>> Like I mentioned before, lets make the exported function
>> vmf_insert_folio() and move the pte, pmd, pud internal private / static
>> details of the implementation. The "dax_" specific aspect of this was
>> removed at the conversion of a dax_pfn to a folio.
>
> Ok, let me try that. Note that vmf_insert_pfn{_pmd|_pud} will have to
> stick around though.

Creating a single vmf_insert_folio() seems somewhat difficult because it
needs to be called from multiple fault paths (either PTE, PMD or PUD
fault) and do something different for each.

Specifically the issue I ran into is that DAX does not downgrade PMD
entries to PTE entries if they are backed by storage. So the PTE fault
handler will get a PMD-sized DAX entry and therefore a PMD size folio.

The way I tried implementing vmf_insert_folio() was to look at
folio_order() to determine which internal implementation to call. But
that doesn't work for a PTE fault, because there's no way to determine
if we should PTE map a subpage or PMD map the entire folio.

We could pass down some context as to what type of fault we're handling,
or add it to the vmf struct, but that seems excessive given callers
already know this and could just call a specific
vmf_insert_page_{pte|pmd|pud}.
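For what it's worth, the dead end looks roughly like this (sketch only;
the vmf_insert_folio_{pte,pmd,pud} helpers are hypothetical names):

	vm_fault_t vmf_insert_folio(struct vm_fault *vmf, struct folio *folio,
				    bool write)
	{
		switch (folio_order(folio)) {
		case 0:
			return vmf_insert_folio_pte(vmf, folio, write);
		case HPAGE_PMD_ORDER:
			/*
			 * Ambiguous: a PTE fault can also be handed a
			 * PMD-sized DAX folio, and folio_order() alone
			 * cannot say whether to PTE-map one subpage or
			 * PMD-map the whole folio.
			 */
			return vmf_insert_folio_pmd(vmf, folio, write);
		case HPAGE_PUD_ORDER:
			return vmf_insert_folio_pud(vmf, folio, write);
		default:
			return VM_FAULT_FALLBACK;
		}
	}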



[PATCH v5 1/7] Add AutoFDO support for Clang build

2024-10-23 Thread Rong Xu
Add the build support for using Clang's AutoFDO. Building the kernel
with AutoFDO does not reduce the optimization level from the
compiler. AutoFDO uses hardware sampling to gather information about
the frequency of execution of different code paths within a binary.
This information is then used to guide the compiler's optimization
decisions, resulting in a more efficient binary. Experiments
showed that kernel latency can improve by up to 10%.

The support requires a Clang compiler after LLVM 17. This submission
is limited to x86 platforms that support PMU features like LBR on
Intel machines and AMD Zen3 BRS. Support for SPE on ARM, and BRBE on
ARM, is part of planned future work.

Here is an example workflow for AutoFDO kernel:

1) Build the kernel on the host machine with LLVM enabled, for example,
   $ make menuconfig LLVM=1
Turn on AutoFDO build config:
  CONFIG_AUTOFDO_CLANG=y
With a configuration that has LLVM enabled, use the following
command:
   scripts/config -e AUTOFDO_CLANG
After getting the config, build with
  $ make LLVM=1

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number,
   like 59, for this purpose.
   For Intel platforms:
  $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <period> \
-o <perf_file> -- <loadtest>
   For AMD platforms:
  The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
 For Zen3:
  $ cat /proc/cpuinfo | grep " brs"
  For Zen4:
  $ cat /proc/cpuinfo | grep amd_lbr_v2
  $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
-N -b -c <period> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) To generate an AutoFDO profile, two offline tools are available:
   create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part
   of the AutoFDO project and can be found on GitHub
   (https://github.com/google/autofdo), version v0.30.1 or later. The
   llvm_profgen tool is included in the LLVM compiler itself. It's
   important to note that the version of llvm_profgen doesn't need to
   match the version of Clang. It needs to be the LLVM 19 release or
   later, or from the LLVM trunk.
  $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> \
-o <profile_file>
   or
  $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
--format=extbinary --out=<profile_file>

   Note that multiple AutoFDO profile files can be merged into one via:
  $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ...

6) Rebuild the kernel using the AutoFDO profile file with the same config
   as step 1, (Note CONFIG_AUTOFDO_CLANG needs to be enabled):
  $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>

Co-developed-by: Han Shen 
Signed-off-by: Han Shen 
Signed-off-by: Rong Xu 
Suggested-by: Sriraman Tallam 
Suggested-by: Krzysztof Pszeniczny 
Suggested-by: Nick Desaulniers 
Suggested-by: Stephane Eranian 
Tested-by: Yonghong Song 
---
 Documentation/dev-tools/autofdo.rst | 167 
 Documentation/dev-tools/index.rst   |   1 +
 MAINTAINERS |   7 ++
 Makefile|   1 +
 arch/Kconfig|  20 
 arch/x86/Kconfig|   1 +
 scripts/Makefile.autofdo|  22 
 scripts/Makefile.lib|  10 ++
 tools/objtool/check.c   |   1 +
 9 files changed, 230 insertions(+)
 create mode 100644 Documentation/dev-tools/autofdo.rst
 create mode 100644 scripts/Makefile.autofdo

diff --git a/Documentation/dev-tools/autofdo.rst 
b/Documentation/dev-tools/autofdo.rst
new file mode 100644
index ..9d90e6d79781
--- /dev/null
+++ b/Documentation/dev-tools/autofdo.rst
@@ -0,0 +1,167 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Using AutoFDO with the Linux kernel
+===
+
+This enables AutoFDO build support for the kernel when using
+the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
+is a type of profile-guided optimization (PGO) used to enhance the
+performance of binary executables. It gathers information about the
+frequency of execution of various code paths within a binary using
+hardware sampling. This data is then used to guide the compiler's
+optimization decisions, resulting in a more efficient binary. AutoFDO
+is a powerful optimization technique, and data indicates that it can
+significantly improve kernel performance. It's especially beneficial
+for workloads affected by front-end stalls.
+
+For AutoFDO builds, unlike non-FDO builds, the user must supply a
+profile. Acquiring an AutoFDO profile can be done in several ways.
+AutoFDO profiles are created by converting hardware sampling using
+the "perf" tool. It is crucial that the workload used to create these
+perf files is representative; they must exhibit runtime
+characteristics similar to the workloads that are intended to be
+optimized. Failure to do so

Re: [PATCH 07/12] huge_memory: Allow mappings of PMD sized pages

2024-10-23 Thread Dan Williams
Alistair Popple wrote:
> 
> Alistair Popple  writes:
> 
> > Alistair Popple wrote:
> >> Dan Williams  writes:
> 
> [...]
> 
> >>> +
> >>> + return VM_FAULT_NOPAGE;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(dax_insert_pfn_pmd);
> >>
> >> Like I mentioned before, lets make the exported function
> >> vmf_insert_folio() and move the pte, pmd, pud internal private / static
> >> details of the implementation. The "dax_" specific aspect of this was
> >> removed at the conversion of a dax_pfn to a folio.
> >
> > Ok, let me try that. Note that vmf_insert_pfn{_pmd|_pud} will have to
> > stick around though.
> 
> Creating a single vmf_insert_folio() seems somewhat difficult because it
> needs to be called from multiple fault paths (either PTE, PMD or PUD
> fault) and do something different for each.
> 
> Specifically the issue I ran into is that DAX does not downgrade PMD
> entries to PTE entries if they are backed by storage. So the PTE fault
> handler will get a PMD-sized DAX entry and therefore a PMD size folio.
> 
> The way I tried implementing vmf_insert_folio() was to look at
> folio_order() to determine which internal implementation to call. But
> that doesn't work for a PTE fault, because there's no way to determine
> if we should PTE map a subpage or PMD map the entire folio.

Ah, that conflict makes sense.

> We could pass down some context as to what type of fault we're handling,
> or add it to the vmf struct, but that seems excessive given callers
> already know this and could just call a specific
> vmf_insert_page_{pte|pmd|pud}.

Ok, I think it is good to capture that "because dax does not downgrade
entries it may satisfy PTE faults with PMD inserts", or something like
that in comment or changelog.



[PATCH v5 0/7] Add AutoFDO and Propeller support for Clang build

2024-10-23 Thread Rong Xu
Hi,

This patch series is to integrate AutoFDO and Propeller support into
the Linux kernel. AutoFDO is a profile-guided optimization technique
that leverages hardware sampling to enhance binary performance.
Unlike Instrumentation-based FDO (iFDO), AutoFDO offers a user-friendly
and straightforward application process. While iFDO generally yields
superior profile quality and performance, our findings reveal that
AutoFDO achieves remarkable effectiveness, bringing performance close
to iFDO for benchmark applications.

Propeller is a profile-guided, post-link optimizer that improves
the performance of large-scale applications compiled with LLVM. It
operates by relinking the binary based on an additional round of runtime
profiles, enabling precise optimizations that are not possible at
compile time.  Similar to AutoFDO, Propeller too utilizes hardware
sampling to collect profiles and apply post-link optimizations to improve
the benchmark’s performance over and above AutoFDO.

Our empirical data demonstrates significant performance improvements
with AutoFDO and Propeller, up to 10% on microbenchmarks and up to 5%
on large warehouse-scale benchmarks. This makes a strong case for their
inclusion as supported features in the upstream kernel.

Background

A significant fraction of fleet processing cycles (excluding idle time)
from data center workloads are attributable to the kernel. Warehouse-
scale workloads maximize performance by optimizing the production kernel
using iFDO (a.k.a instrumented PGO, Profile Guided Optimization).

iFDO can significantly enhance application performance but its use
within the kernel has raised concerns. AutoFDO is a variant of FDO that
uses the hardware’s Performance Monitoring Unit (PMU) to collect
profiling data. While AutoFDO typically yields smaller performance
gains than iFDO, it presents unique benefits for optimizing kernels.

AutoFDO eliminates the need for instrumented kernels, allowing a single
optimized kernel to serve both execution and profile collection. It also
minimizes slowdown during profile collection, potentially yielding
higher-fidelity profiling, especially for time-sensitive code, compared
to iFDO. Additionally, AutoFDO profiles can be obtained from production
environments via the hardware’s PMU whereas iFDO profiles require
carefully curated load tests that are representative of real-world
traffic.

AutoFDO facilitates profile collection across diverse targets.
Preliminary studies indicate significant variation in kernel hot spots
within Google’s infrastructure, suggesting potential performance gains
through target-specific kernel customization.

Furthermore, other advanced compiler optimization techniques, including
ThinLTO and Propeller, can be stacked on top of AutoFDO, similar to iFDO.
ThinLTO achieves better runtime performance through whole-program
analysis and cross module optimizations. The main difference between
traditional LTO and ThinLTO is that the latter is scalable in time and
memory.

This patch series adds AutoFDO and Propeller support to the kernel. The
actual solution comes in six parts:

[P 1] Add the build support for using AutoFDO in Clang

  Add the basic support for AutoFDO build and provide the
  instructions for using AutoFDO.

[P 2] Fix objtool for bogus warnings when -ffunction-sections is enabled

[P 3] Change the subsection ordering when -ffunction-sections is enabled

[P 4] Add markers for text_unlikely and text_hot sections

[P 5] Enable -ffunction-sections for the AutoFDO build

[P 6] Enable Machine Function Split (MFS) optimization for AutoFDO

[P 7] Add Propeller configuration to the kernel build

Patch 1 provides basic AutoFDO build support. Patches 2 to 6 further
enhance the performance of AutoFDO builds and are functionally dependent
on Patch 1. Patch 7 enables support for Propeller and is dependent on
patches 2 to 4.

Caveats

AutoFDO is compatible with both GCC and Clang, but the patches in this
series are exclusively applicable to LLVM 17 or newer for AutoFDO and
LLVM 19 or newer for Propeller. For profile conversion, two different
tools can be used: llvm-profgen or create_llvm_prof. llvm-profgen
requires LLVM 19 or newer (or LLVM trunk). Alternatively,
create_llvm_prof v0.30.1 or newer can be used instead of llvm-profgen.
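
As a rough illustration (treat the exact flags as assumptions based on
the LLVM 19 tools), the conversion step looks like:

  # with llvm-profgen:
  $ llvm-profgen --kernel --binary=vmlinux --perfdata=<perf_file> \
      -o <autofdo_profile>
  # or with create_llvm_prof:
  $ create_llvm_prof --binary=vmlinux --profile=<perf_file> \
      --format=extbinary --out=<autofdo_profile>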

Additionally, the build is only supported on x86 platforms equipped
with PMU capabilities, such as LBR on Intel machines. More
specifically:
 * Intel platforms: works on every platform that supports LBR;
   we have tested on Skylake.
 * AMD platforms: tested on AMD Zen3 with the BRS feature. The kernel
   needs to be configured with "CONFIG_PERF_EVENTS_AMD_BRS=y". To
   check, use
   $ cat /proc/cpuinfo | grep " brs"
   For AMD Zen4, AMD LBRv2 is supported, but we suspect a bug in the
   AMD LBRv2 implementation in Genoa which blocks its usage.

For ARM, we plan to send patches for SPE-based Propeller when
AutoFDO for Arm is ready.

Experiments and Results

Experiments 

[PATCH v5 2/7] objtool: Fix unreachable instruction warnings for weak functions

2024-10-23 Thread Rong Xu
In the presence of both weak and strong function definitions, the
linker drops the weak symbol in favor of a strong symbol, but
leaves the code in place. Code in ignore_unreachable_insn() has
some heuristics to suppress the warning, but it does not work when
-ffunction-sections is enabled.

Suppose function foo has both strong and weak definitions.
Case 1: The strong definition has an annotated section name,
like .init.text. Only the weak definition will be placed into
.text.foo. But since the section has no symbols, there will be no
"hole" in the section.

Case 2: Both sections are without an annotated section name.
Both will be placed into .text.foo section, but there will be only one
symbol (the strong one). If the weak code is before the strong code,
the hole is not detected, because the lookup fails to find the
right-most symbol before the offset.

The fix is to use the first node to compute the hole if hole.sym
is empty. If there is no symbol in the section, the first node
will be NULL, in which case, -1 is returned to skip the whole
section.
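
A minimal illustration of case 2, across two translation units
(hypothetical function name):

	/* lib.c: weak definition; its code stays in .text.foo */
	void __weak foo(void) { }

	/* main.c: strong definition, also placed in .text.foo; the
	 * linker keeps only this symbol, but both code copies remain
	 * in the output section. */
	void foo(void) { }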

Co-developed-by: Han Shen 
Signed-off-by: Han Shen 
Signed-off-by: Rong Xu 
Suggested-by: Sriraman Tallam 
Suggested-by: Krzysztof Pszeniczny 
Tested-by: Yonghong Song 
---
 tools/objtool/elf.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 3d27983dc908..6f64d611faea 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -224,12 +224,17 @@ int find_symbol_hole_containing(const struct section *sec, unsigned long offset)
if (n)
return 0; /* not a hole */
 
-   /* didn't find a symbol for which @offset is after it */
-   if (!hole.sym)
-   return 0; /* not a hole */
+   /*
+* @offset >= sym->offset + sym->len, find symbol after it.
+* When hole.sym is empty, use the first node to compute the hole.
+* If there is no symbol in the section, the first node will be NULL,
+* in which case, -1 is returned to skip the whole section.
+*/
+   if (hole.sym)
+   n = rb_next(&hole.sym->node);
+   else
+   n = rb_first_cached(&sec->symbol_tree);
 
-   /* @offset >= sym->offset + sym->len, find symbol after it */
-   n = rb_next(&hole.sym->node);
if (!n)
return -1; /* until end of address space */
 
-- 
2.47.0.105.g07ac214952-goog




[PATCH v5 7/7] Add Propeller configuration for kernel build

2024-10-23 Thread Rong Xu
Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.

The support requires a Clang compiler LLVM 19 or later, and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1). This
commit is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.

Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:

1) Build the kernel on the host machine, with AutoFDO and Propeller
   build config
  CONFIG_AUTOFDO_CLANG=y
  CONFIG_PROPELLER_CLANG=y
   then
  $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>

"<autofdo_profile>" is the profile collected when doing a non-Propeller
AutoFDO build. This step builds a kernel that has the same optimization
level as AutoFDO, plus a metadata section that records basic block
information. This kernel image runs as fast as an AutoFDO optimized
kernel.

2) Install the kernel on test/production machines.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number,
   like 59, for this purpose.
   For Intel platforms:
  $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
    -o <perf_file> -- <loadtest>
   For AMD platforms:
   The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
   # To see if Zen3 supports BRS:
   $ cat /proc/cpuinfo | grep " brs"
   # To see if Zen4 supports amd_lbr_v2:
   $ cat /proc/cpuinfo | grep amd_lbr_v2
   # If the result is yes, then collect the profile using:
   $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
     -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) Generate Propeller profile:
   $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
     --format=propeller --propeller_output_module_name \
     --out=<propeller_profile_prefix>_cc_profile.txt \
     --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   “create_llvm_prof” is the profile conversion tool, and a prebuilt
   binary for linux can be found on
   https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build
   from source).

   "" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "_cc_profile.txt" and
   "_ld_profile.txt".

6) Rebuild the kernel using the AutoFDO and Propeller profile files.
  CONFIG_AUTOFDO_CLANG=y
  CONFIG_PROPELLER_CLANG=y
   and
   $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
     CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
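
To sanity-check the first-phase build, the basic block metadata can be
inspected (assuming it is emitted as the .llvm_bb_addr_map ELF section):

   $ readelf -S vmlinux | grep llvm_bb_addr_map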

Co-developed-by: Han Shen 
Signed-off-by: Han Shen 
Signed-off-by: Rong Xu 
Suggested-by: Sriraman Tallam 
Suggested-by: Krzysztof Pszeniczny 
Suggested-by: Nick Desaulniers 
Suggested-by: Stephane Eranian 
Tested-by: Yonghong Song 
---
 Documentation/dev-tools/index.rst |   1 +
 Documentation/dev-tools/propeller.rst | 162 ++
 MAINTAINERS   |   7 ++
 Makefile  |   1 +
 arch/Kconfig  |  19 +++
 arch/x86/Kconfig  |   1 +
 arch/x86/kernel/vmlinux.lds.S |   4 +
 include/asm-generic/vmlinux.lds.h |   6 +-
 scripts/Makefile.lib  |  10 ++
 scripts/Makefile.propeller|  28 +
 tools/objtool/check.c |   1 +
 11 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/dev-tools/propeller.rst
 create mode 100644 scripts/Makefile.propeller

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 6945644f7008..3c0ac08b2709 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -35,6 +35,7 @@ Documentation/dev-tools/testing-overview.rst
checkuapi
gpio-sloppy-logic-analyzer
autofdo
+   propeller
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/propeller.rst b/Documentation/dev-tools/propeller.rst
new file mode 100644
index ..92195958e3db
--- /dev/null
+++ b/Documentation/dev-tools/propeller.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=
+Using Propeller with the Linux kernel
+=
+
+This enables Propeller build support for the kernel when using the Clang
+compiler. Propeller is a profile-guided optimization (PGO) method used
+to optimize binary executables. Like AutoFDO, it utilizes hardware
+sampling to gather information about the frequency of execution of
+different code paths within a binary. Unlike AutoFDO, this information
+is then used right before the linking phase to optimize (among others)
+block layout within and across functions.
+
+A few important notes about adopting Propeller optimization:
+
+#. Although it can be used as a standalone optimization step, i

[PATCH v5 5/7] AutoFDO: Enable -ffunction-sections for the AutoFDO build

2024-10-23 Thread Rong Xu
Enable -ffunction-sections by default for the AutoFDO build.

With -ffunction-sections, the compiler places each function in its own
section named .text.function_name instead of placing all functions in
the .text section. In the AutoFDO build, this allows the linker to
utilize profile information to reorganize functions for improved
utilization of iCache and iTLB.
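
As an illustration (not part of the patch), with -ffunction-sections:

	/* foo.c */
	int foo(void) { return 0; }	/* emitted into .text.foo */
	int bar(void) { return 1; }	/* emitted into .text.bar */

which can be confirmed with, e.g., "readelf -S foo.o | grep '\.text\.'".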

Co-developed-by: Han Shen 
Signed-off-by: Han Shen 
Signed-off-by: Rong Xu 
Suggested-by: Sriraman Tallam 
Tested-by: Yonghong Song 
---
 include/asm-generic/vmlinux.lds.h | 11 +--
 scripts/Makefile.autofdo  |  2 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e02973f3b418..bd64fdedabd2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -95,18 +95,25 @@
  * With LTO_CLANG, the linker also splits sections by default, so we need
  * these macros to combine the sections during the final link.
  *
+ * With AUTOFDO_CLANG, by default, the linker splits text sections and
+ * regroups functions into subsections.
+ *
  * RODATA_MAIN is not used because existing code already defines .rodata.x
  * sections to be brought in with rodata.
  */
-#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
+defined(CONFIG_AUTOFDO_CLANG)
 #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
+#else
+#define TEXT_MAIN .text
+#endif
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
 #define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral* .data.$__unnamed_* .data.$L*
 #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
 #define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L*
 #define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..L* .bss..compoundliteral*
 #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
 #else
-#define TEXT_MAIN .text
 #define DATA_MAIN .data
 #define SDATA_MAIN .sdata
 #define RODATA_MAIN .rodata
diff --git a/scripts/Makefile.autofdo b/scripts/Makefile.autofdo
index ff96a63fea7c..6155d6fc4ca7 100644
--- a/scripts/Makefile.autofdo
+++ b/scripts/Makefile.autofdo
@@ -9,7 +9,7 @@ ifndef CONFIG_DEBUG_INFO
 endif
 
 ifdef CLANG_AUTOFDO_PROFILE
-  CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE)
+  CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE) -ffunction-sections
 endif
 
 ifdef CONFIG_LTO_CLANG_THIN
-- 
2.47.0.105.g07ac214952-goog




[PATCH v5 3/7] Change the symbol order when -ffunction-sections is enabled

2024-10-23 Thread Rong Xu
When the -ffunction-sections compiler option is enabled, each function
is placed in a separate section named .text.function_name rather than
putting all functions in a single .text section.

However, using -ffunction-sections can cause problems with the
linker script. The comments included in include/asm-generic/vmlinux.lds.h
note these issues:
  "TEXT_MAIN here will match .text.fixup and .text.unlikely if dead
   code elimination is enabled, so these sections should be converted
   to use ".." first."

It is unclear whether there is a straightforward method for converting
a suffix to "..".

This patch modifies the order of subsections within the text output
section. Specifically, it repositions sections with certain fixed patterns
(for example .text.unlikely) before TEXT_MAIN, ensuring that they are
grouped and matched together. It also places the .text.hot section at
the beginning of a page to help TLB performance.

Note that the limitation arises because the linker script employs glob
patterns instead of regular expressions for string matching. While there
is a method to maintain the current order using complex patterns, this
significantly complicates the pattern and increases the likelihood of
errors.
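
A reduced sketch of the constraint (illustrative, not the actual
linker script):

	.text : {
		*(.text.unlikely .text.unlikely.*)	/* grouped first */
		*(.text.hot .text.hot.*)
		*(.text .text.[0-9a-zA-Z_]*)	/* the TEXT_MAIN glob; it
						 * would otherwise also
						 * match .text.unlikely.foo */
	}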

Co-developed-by: Han Shen 
Signed-off-by: Han Shen 
Signed-off-by: Rong Xu 
Suggested-by: Sriraman Tallam 
Suggested-by: Krzysztof Pszeniczny 
Tested-by: Yonghong Song 
---
 include/asm-generic/vmlinux.lds.h | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index eeadbaeccf88..fd901951549c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -553,19 +553,24 @@
  * .text section. Map to function alignment to avoid address changes
  * during second ld run in second ld pass when generating System.map
  *
- * TEXT_MAIN here will match .text.fixup and .text.unlikely if dead
- * code elimination is enabled, so these sections should be converted
- * to use ".." first.
+ * TEXT_MAIN here will match symbols with a fixed pattern (for example,
+ * .text.hot or .text.unlikely) if dead code elimination or
+ * function-sections is enabled. Match these symbols first before
+ * TEXT_MAIN to ensure they are grouped together.
+ *
+ * Also placing the .text.hot section at the beginning of a page
+ * helps TLB performance.
  */
 #define TEXT_TEXT  \
ALIGN_FUNCTION();   \
+   *(.text.asan.* .text.tsan.*)\
+   *(.text.unknown .text.unknown.*)\
+   *(.text.unlikely .text.unlikely.*)  \
+   . = ALIGN(PAGE_SIZE);   \
*(.text.hot .text.hot.*)\
*(TEXT_MAIN .text.fixup)\
-   *(.text.unlikely .text.unlikely.*)  \
-   *(.text.unknown .text.unknown.*)\
NOINSTR_TEXT\
-   *(.ref.text)\
-   *(.text.asan.* .text.tsan.*)
+   *(.ref.text)
 
 
 /* sched.text is aling to function alignment to secure we have same
-- 
2.47.0.105.g07ac214952-goog




[PATCH v5 6/7] AutoFDO: Enable machine function split optimization for AutoFDO

2024-10-23 Thread Rong Xu
Enable the machine function split optimization for AutoFDO in Clang.

Machine function split (MFS) is a pass in the Clang compiler that
splits a function into hot and cold parts. The linker groups all
cold blocks across functions together. This decreases hot code
fragmentation and improves iCache and iTLB utilization.

MFS requires a profile so this is enabled only for the AutoFDO builds.
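
Schematically (assumed behavior; the function and its cold tail are
illustrative):

	int foo(int x)
	{
		if (likely(x))		/* hot part stays in .text.foo */
			return x + 1;
		/* the profile-cold tail is split off as foo.cold and
		 * placed in .text.split by -fsplit-machine-functions */
		WARN_ON_ONCE(1);
		return -EINVAL;
	}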

Co-developed-by: Han Shen 
Signed-off-by: Han Shen 
Signed-off-by: Rong Xu 
Suggested-by: Sriraman Tallam 
Suggested-by: Krzysztof Pszeniczny 
Tested-by: Yonghong Song 
---
 include/asm-generic/vmlinux.lds.h | 7 ++-
 scripts/Makefile.autofdo  | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bd64fdedabd2..8a0bb3946cf0 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -556,6 +556,11 @@ defined(CONFIG_AUTOFDO_CLANG)
__cpuidle_text_end = .; \
__noinstr_text_end = .;
 
+#define TEXT_SPLIT \
+   __split_text_start = .; \
+   *(.text.split .text.split.[0-9a-zA-Z_]*)\
+   __split_text_end = .;
+
 #define TEXT_UNLIKELY  \
__unlikely_text_start = .;  \
*(.text.unlikely .text.unlikely.*)  \
@@ -582,6 +587,7 @@ defined(CONFIG_AUTOFDO_CLANG)
ALIGN_FUNCTION();   \
*(.text.asan.* .text.tsan.*)\
*(.text.unknown .text.unknown.*)\
+   TEXT_SPLIT  \
TEXT_UNLIKELY   \
. = ALIGN(PAGE_SIZE);   \
TEXT_HOT\
@@ -589,7 +595,6 @@ defined(CONFIG_AUTOFDO_CLANG)
NOINSTR_TEXT\
*(.ref.text)
 
-
 /* sched.text is aling to function alignment to secure we have same
  * address even at second ld pass when generating System.map */
 #define SCHED_TEXT \
diff --git a/scripts/Makefile.autofdo b/scripts/Makefile.autofdo
index 6155d6fc4ca7..1caf2457e585 100644
--- a/scripts/Makefile.autofdo
+++ b/scripts/Makefile.autofdo
@@ -10,6 +10,7 @@ endif
 
 ifdef CLANG_AUTOFDO_PROFILE
   CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE) -ffunction-sections
+  CFLAGS_AUTOFDO_CLANG += -fsplit-machine-functions
 endif
 
 ifdef CONFIG_LTO_CLANG_THIN
@@ -17,6 +18,7 @@ ifdef CONFIG_LTO_CLANG_THIN
 KBUILD_LDFLAGS += --lto-sample-profile=$(CLANG_AUTOFDO_PROFILE)
   endif
   KBUILD_LDFLAGS += --mllvm=-enable-fs-discriminator=true --mllvm=-improved-fs-discriminator=true -plugin-opt=thinlto
+  KBUILD_LDFLAGS += -plugin-opt=-split-machine-functions
 endif
 
 export CFLAGS_AUTOFDO_CLANG
-- 
2.47.0.105.g07ac214952-goog




[PATCH v5 4/7] Add markers for text_unlikely and text_hot sections

2024-10-23 Thread Rong Xu
Add markers like __hot_text_start, __hot_text_end, __unlikely_text_start,
and __unlikely_text_end which will be included in System.map. These markers
indicate how the compiler groups functions, providing valuable information
to developers about the layout and optimization of the code.
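
For example, the resulting grouping can be inspected directly from
System.map:

	$ grep -E '__(hot|unlikely)_text_(start|end)' System.map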

Co-developed-by: Han Shen 
Signed-off-by: Han Shen 
Signed-off-by: Rong Xu 
Suggested-by: Sriraman Tallam 
Tested-by: Yonghong Song 
---
 include/asm-generic/vmlinux.lds.h | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index fd901951549c..e02973f3b418 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -549,6 +549,16 @@
__cpuidle_text_end = .; \
__noinstr_text_end = .;
 
+#define TEXT_UNLIKELY  \
+   __unlikely_text_start = .;  \
+   *(.text.unlikely .text.unlikely.*)  \
+   __unlikely_text_end = .;
+
+#define TEXT_HOT   \
+   __hot_text_start = .;   \
+   *(.text.hot .text.hot.*)\
+   __hot_text_end = .;
+
 /*
  * .text section. Map to function alignment to avoid address changes
  * during second ld run in second ld pass when generating System.map
@@ -565,9 +575,9 @@
ALIGN_FUNCTION();   \
*(.text.asan.* .text.tsan.*)\
*(.text.unknown .text.unknown.*)\
-   *(.text.unlikely .text.unlikely.*)  \
+   TEXT_UNLIKELY   \
. = ALIGN(PAGE_SIZE);   \
-   *(.text.hot .text.hot.*)\
+   TEXT_HOT\
*(TEXT_MAIN .text.fixup)\
NOINSTR_TEXT\
*(.ref.text)
-- 
2.47.0.105.g07ac214952-goog




Re: [PATCH 0/1] remoteproc documentation changes

2024-10-23 Thread Mathieu Poirier
On Wed, 23 Oct 2024 at 07:53, Jonathan Corbet  wrote:
>
> anish kumar  writes:
>
> > This patch series transitions the documentation
> > for remoteproc from the staging directory to the
> > mainline kernel. It introduces both kernel and
> > user-space APIs, enhancing the overall documentation
> > quality.
> >
> > V4:
> > Fixed compilation errors and moved documentation to
> > driver-api directory.
> >
> > V3:
> > Separated out the patches further to make the intention
> > clear for each patch.
> >
> > V2:
> > Reported-by: kernel test robot 
> > Closes: 
> > https://lore.kernel.org/oe-kbuild-all/202410161444.jokmsogs-...@intel.com/
>
> So I think you could make better use of kerneldoc comments for a number
> of your APIs and structures - a project for the future.  I can't judge
> the remoteproc aspects of this, but from a documentation mechanics point
> of view, this looks about ready to me.  In the absence of objections
> I'll apply it in the near future.
>

Please hold off before applying, I will review the content in the coming days.

Thanks,
Mathieu



Re: [PATCH v4 5/6] AutoFDO: Enable machine function split optimization for AutoFDO

2024-10-23 Thread Rong Xu
On Tue, Oct 22, 2024 at 11:50 PM Masahiro Yamada  wrote:
>
> On Tue, Oct 22, 2024 at 8:28 AM Rong Xu  wrote:
> >
> > On Sun, Oct 20, 2024 at 8:18 PM Masahiro Yamada  
> > wrote:
> > >
> > > On Tue, Oct 15, 2024 at 6:33 AM Rong Xu  wrote:
> > > >
> > > > Enable the machine function split optimization for AutoFDO in Clang.
> > > >
> > > > Machine function split (MFS) is a pass in the Clang compiler that
> > > > splits a function into hot and cold parts. The linker groups all
> > > > cold blocks across functions together. This decreases hot code
> > > > fragmentation and improves iCache and iTLB utilization.
> > > >
> > > > MFS requires a profile so this is enabled only for the AutoFDO builds.
> > > >
> > > > Co-developed-by: Han Shen 
> > > > Signed-off-by: Han Shen 
> > > > Signed-off-by: Rong Xu 
> > > > Suggested-by: Sriraman Tallam 
> > > > Suggested-by: Krzysztof Pszeniczny 
> > > > ---
> > > >  include/asm-generic/vmlinux.lds.h | 6 ++
> > > >  scripts/Makefile.autofdo  | 2 ++
> > > >  2 files changed, 8 insertions(+)
> > > >
> > > > diff --git a/include/asm-generic/vmlinux.lds.h 
> > > > b/include/asm-generic/vmlinux.lds.h
> > > > index ace617d1af9b..20e46c0917db 100644
> > > > --- a/include/asm-generic/vmlinux.lds.h
> > > > +++ b/include/asm-generic/vmlinux.lds.h
> > > > @@ -565,9 +565,14 @@ defined(CONFIG_AUTOFDO_CLANG)
> > > > __unlikely_text_start = .;  \
> > > > *(.text.unlikely .text.unlikely.*)  \
> > > > __unlikely_text_end = .;
> > > > +#define TEXT_SPLIT \
> > > > +   __split_text_start = .; \
> > > > +   *(.text.split .text.split.[0-9a-zA-Z_]*)\
> > > > +   __split_text_end = .;
> > > >  #else
> > > >  #define TEXT_HOT *(.text.hot .text.hot.*)
> > > >  #define TEXT_UNLIKELY *(.text.unlikely .text.unlikely.*)
> > > > +#define TEXT_SPLIT
> > > >  #endif
> > >
> > >
> > > Why conditional?
> >
> > The condition is to ensure that we don't change the default kernel
> > build by any means.
> > The new code will introduce a few new symbols.
>
>
> Same.
>
> Adding two __split_text_start and __split_text_end markers
> do not affect anything. It just increases the kallsyms table slightly.
>
> You can do it unconditionally.

Got it.

>
>
>
> >
> > >
> > >
> > > Where are __unlikely_text_start and __unlikely_text_end used?
> >
> > These new symbols are currently unreferenced within the kernel source tree.
> > However, they provide a valuable means of identifying hot and cold
sections of text, and how large they are. I think this is useful
information.
>
>
> Should be explained in the commit description.

Will explain it in the commit message.

>
>
>
> --
> Best Regards
> Masahiro Yamada



[PATCH v4 6/6] alloc_tag: support for page allocation tag compression

2024-10-23 Thread Suren Baghdasaryan
Implement support for storing page allocation tag references directly
in the page flags instead of page extensions. The sysctl.vm.mem_profiling
boot parameter is extended to provide a way for a user to request this
mode. Enabling compression eliminates memory overhead caused by page_ext
and results in better performance for page allocations. However this
mode will not work if the number of available page flag bits is
insufficient to address all kernel allocations. Such a condition can
happen during boot or when loading a module. If this condition is
detected, memory allocation profiling gets disabled with an appropriate
warning. By default, compression mode is disabled.
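
For example (per the documentation change below), the compressed mode
would be requested with the following kernel command line option:

  sysctl.vm.mem_profiling=1,compressed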

Signed-off-by: Suren Baghdasaryan 
---
 Documentation/mm/allocation-profiling.rst |   7 +-
 include/linux/alloc_tag.h |  10 +-
 include/linux/codetag.h   |   3 +
 include/linux/page-flags-layout.h |   7 ++
 include/linux/pgalloc_tag.h   | 145 +++---
 lib/alloc_tag.c   | 142 +++--
 lib/codetag.c |   4 +-
 mm/mm_init.c  |   5 +-
 8 files changed, 290 insertions(+), 33 deletions(-)

diff --git a/Documentation/mm/allocation-profiling.rst b/Documentation/mm/allocation-profiling.rst
index ffd6655b7be2..316311240e6a 100644
--- a/Documentation/mm/allocation-profiling.rst
+++ b/Documentation/mm/allocation-profiling.rst
@@ -18,12 +18,17 @@ kconfig options:
   missing annotation
 
 Boot parameter:
-  sysctl.vm.mem_profiling=0|1|never
+  sysctl.vm.mem_profiling={0|1|never}[,compressed]
 
   When set to "never", memory allocation profiling overhead is minimized and it
   cannot be enabled at runtime (sysctl becomes read-only).
   When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y, default value is "1".
   When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n, default value is "never".
+  "compressed" optional parameter will try to store page tag references in a
+  compact format, avoiding page extensions. This results in improved performance
+  and memory consumption; however, it might fail depending on system configuration.
+  If compression fails, a warning is issued and memory allocation profiling gets
+  disabled.
 
 sysctl:
   /proc/sys/vm/mem_profiling
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 7431757999c5..4f811ec0ffe0 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,8 +30,16 @@ struct alloc_tag {
struct alloc_tag_counters __percpu  *counters;
 } __aligned(8);
 
+struct alloc_tag_kernel_section {
+   struct alloc_tag *first_tag;
+   unsigned long count;
+};
+
 struct alloc_tag_module_section {
-   unsigned long start_addr;
+   union {
+   unsigned long start_addr;
+   struct alloc_tag *first_tag;
+   };
unsigned long end_addr;
/* used size */
unsigned long size;
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index d10bd9810d32..d14dbd26b370 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -13,6 +13,9 @@ struct codetag_module;
 struct seq_buf;
 struct module;
 
+#define CODETAG_SECTION_START_PREFIX   "__start_"
+#define CODETAG_SECTION_STOP_PREFIX"__stop_"
+
 /*
  * An instance of this structure is created in a special ELF section at every
  * code location being tagged.  At runtime, the special section is treated as
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index 7d79818dc065..4f5c9e979bb9 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -111,5 +111,12 @@
ZONES_WIDTH - LRU_GEN_WIDTH - SECTIONS_WIDTH - \
NODES_WIDTH - KASAN_TAG_WIDTH - LAST_CPUPID_WIDTH)
 
+#define NR_NON_PAGEFLAG_BITS   (SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH + \
+   LAST_CPUPID_SHIFT + KASAN_TAG_WIDTH + \
+   LRU_GEN_WIDTH + LRU_REFS_WIDTH)
+
+#define NR_UNUSED_PAGEFLAG_BITS(BITS_PER_LONG - \
+   (NR_NON_PAGEFLAG_BITS + NR_PAGEFLAGS))
+
 #endif
 #endif /* _LINUX_PAGE_FLAGS_LAYOUT */
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index b13cd3313a88..1fe63b52e5e5 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -11,29 +11,118 @@
 
 #include 
 
+extern struct page_ext_operations page_alloc_tagging_ops;
+extern unsigned long alloc_tag_ref_mask;
+extern int alloc_tag_ref_offs;
+extern struct alloc_tag_kernel_section kernel_tags;
+
+DECLARE_STATIC_KEY_FALSE(mem_profiling_compressed);
+
+typedef u16 pgalloc_tag_idx;
+
 union pgtag_ref_handle {
union codetag_ref *ref; /* reference in page extension */
+   struct page *page;  /* reference in page flags */
 };
 
-extern struct page_ext_operations page_alloc_tagging_ops;
+/* Reserved indexes */
+#defi

Re: [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references

2024-10-23 Thread Andrew Morton
On Wed, 23 Oct 2024 10:07:58 -0700 Suren Baghdasaryan  wrote:

> To simplify later changes to page tag references, introduce new
> pgtag_ref_handle type. This allows easy replacement of page_ext
> as a storage of page allocation tags.
> 
> ...
>
>  static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
>  {
> + union pgtag_ref_handle handle;
> + union codetag_ref ref;
>   struct alloc_tag *tag;
> - union codetag_ref *ref;
>  
>   tag = pgalloc_tag_get(&old->page);
>   if (!tag)
>   return;
>  
> - ref = get_page_tag_ref(&new->page);
> - if (!ref)
> + if (!get_page_tag_ref(&new->page, &ref, &handle))
>   return;
>  
>   /* Clear the old ref to the original allocation tag. */
>   clear_page_tag_ref(&old->page);
>   /* Decrement the counters of the tag on get_new_folio. */
> - alloc_tag_sub(ref, folio_nr_pages(new));
> -
> - __alloc_tag_ref_set(ref, tag);
> -
> - put_page_tag_ref(ref);
> + alloc_tag_sub(&ref, folio_nr_pages(new));

mm-stable has folio_size(new) here, fixed up.

I think we already discussed this, but there's a crazy amount of
inlining here.  pgalloc_tag_split() is huge, and has four callsites.



Re: [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references

2024-10-23 Thread Suren Baghdasaryan
On Wed, Oct 23, 2024 at 2:00 PM Andrew Morton  wrote:
>
> On Wed, 23 Oct 2024 10:07:58 -0700 Suren Baghdasaryan  
> wrote:
>
> > To simplify later changes to page tag references, introduce new
> > pgtag_ref_handle type. This allows easy replacement of page_ext
> > as a storage of page allocation tags.
> >
> > ...
> >
> >  static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
> >  {
> > + union pgtag_ref_handle handle;
> > + union codetag_ref ref;
> >   struct alloc_tag *tag;
> > - union codetag_ref *ref;
> >
> >   tag = pgalloc_tag_get(&old->page);
> >   if (!tag)
> >   return;
> >
> > - ref = get_page_tag_ref(&new->page);
> > - if (!ref)
> > + if (!get_page_tag_ref(&new->page, &ref, &handle))
> >   return;
> >
> >   /* Clear the old ref to the original allocation tag. */
> >   clear_page_tag_ref(&old->page);
> >   /* Decrement the counters of the tag on get_new_folio. */
> > - alloc_tag_sub(ref, folio_nr_pages(new));
> > -
> > - __alloc_tag_ref_set(ref, tag);
> > -
> > - put_page_tag_ref(ref);
> > + alloc_tag_sub(&ref, folio_nr_pages(new));
>
> mm-stable has folio_size(new) here, fixed up.

Oh, right. You merged that patch tonight and I formatted my patchset
yesterday :)
Thanks for the fixup.

>
> I think we already discussed this, but there's a crazy amount of
> inlining here.  pgalloc_tag_split() is huge, and has four callsites.

I must have missed that discussion, but I am happy to uninline this
function. I think splitting is a heavy enough operation that the
uninlining would not be noticeable.
Thanks!