Re: [PATCH v3 4/6] kvm powerpc/book3s-apiv2: Introduce kvm-hv specific PMU
> On 23 Jan 2025, at 5:37 PM, Vaibhav Jain wrote:
>
> Introduce a new PMU named 'kvm-hv' to report Book3s kvm-hv specific
> performance counters. This will expose KVM-HV specific performance
> attributes to user-space via the kernel's PMU infrastructure and would
> enable users to monitor active kvm-hv based guests.
>
> The patch creates the necessary scaffolding for the new PMU callbacks and
> introduces two new exports kvmppc_{,un}register_pmu() that are called from
> the kvm-hv init and exit functions to perform initialization and cleanup
> for the 'kvm-hv' PMU. The patch doesn't introduce any perf-events yet;
> those will be introduced in later patches.
>
> Signed-off-by: Vaibhav Jain
>
> ---
> Changelog
>
> v2->v3:
> * Fixed a build warning reported by kernel build robot.
>   Link:
>   https://lore.kernel.org/oe-kbuild-all/202501171030.3x0gqw8g-...@intel.com
>
> v1->v2:
> * Fixed an issue of kvm-hv not loading on baremetal kvm [Gautam]
> ---
>  arch/powerpc/include/asm/kvm_book3s.h |  20
>  arch/powerpc/kvm/Makefile             |   6 ++
>  arch/powerpc/kvm/book3s_hv.c          |   9 ++
>  arch/powerpc/kvm/book3s_hv_pmu.c      | 133 ++
>  4 files changed, 168 insertions(+)
>  create mode 100644 arch/powerpc/kvm/book3s_hv_pmu.c
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index e1ff291ba891..7a7854c65ebb 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -348,6 +348,26 @@ static inline bool kvmhv_is_nestedv1(void)
>
>  #endif
>
> +/* kvm-ppc pmu registration */
> +#if IS_ENABLED(CONFIG_KVM_BOOK3S_64_HV)
> +#ifdef CONFIG_PERF_EVENTS
> +int kvmppc_register_pmu(void);
> +void kvmppc_unregister_pmu(void);
> +
> +#else
> +
> +static inline int kvmppc_register_pmu(void)
> +{
> +	return 0;
> +}
> +
> +static inline void kvmppc_unregister_pmu(void)
> +{
> +	/* do nothing */
> +}
> +#endif /* CONFIG_PERF_EVENTS */
> +#endif /* CONFIG_KVM_BOOK3S_64_HV */
> +
>  int __kvmhv_nestedv2_reload_ptregs(struct kvm_vcpu *vcpu,
> 				   struct pt_regs *regs);
>  int __kvmhv_nestedv2_mark_dirty_ptregs(struct kvm_vcpu *vcpu, struct pt_regs *regs);
>  int __kvmhv_nestedv2_mark_dirty(struct kvm_vcpu *vcpu, u16 iden);
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 4bd9d1230869..7645307ff277 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -92,6 +92,12 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
>  	$(kvm-book3s_64-builtin-tm-objs-y) \
>  	$(kvm-book3s_64-builtin-xics-objs-y)
>
> +# enable kvm_hv perf events
> +ifdef CONFIG_PERF_EVENTS
> +kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
> +	book3s_hv_pmu.o
> +endif
> +
>  obj-$(CONFIG_GUEST_STATE_BUFFER_TEST) += test-guest-state-buffer.o
>  endif
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 25429905ae90..6365b8126574 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -6662,6 +6662,14 @@ static int kvmppc_book3s_init_hv(void)
>  		return r;
>  	}
>
> +	r = kvmppc_register_pmu();
> +	if (r == -EOPNOTSUPP) {
> +		pr_info("KVM-HV: PMU not supported %d\n", r);
> +	} else if (r) {
> +		pr_err("KVM-HV: Unable to register PMUs %d\n", r);
> +		goto err;
> +	}
> +
>  	kvm_ops_hv.owner = THIS_MODULE;
>  	kvmppc_hv_ops = &kvm_ops_hv;
>
> @@ -6676,6 +6684,7 @@ static int kvmppc_book3s_init_hv(void)
>
>  static void kvmppc_book3s_exit_hv(void)
>  {
> +	kvmppc_unregister_pmu();
>  	kvmppc_uvmem_free();
>  	kvmppc_free_host_rm_ops();
>  	if (kvmppc_radix_possible())
> diff --git a/arch/powerpc/kvm/book3s_hv_pmu.c b/arch/powerpc/kvm/book3s_hv_pmu.c
> new file mode 100644
> index ..8c6ed30b7654
> --- /dev/null
> +++ b/arch/powerpc/kvm/book3s_hv_pmu.c
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Description: PMUs specific to running nested KVM-HV guests
> + * on Book3S processors (specifically POWER9 and later).
> + */
> +
> +#define pr_fmt(fmt) "kvmppc-pmu: " fmt

Hi Vaibhav,

All PMU-specific code is under "arch/powerpc/perf" in the kernel source.
Since we are introducing a kvm-hv specific PMU here, can we please have
it in arch/powerpc/perf?

Thanks,
Athira

> +
> +#include "asm-generic/local64.h"
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +enum kvmppc_pmu_eventid {
> +	KVMPPC_EVENT_MAX,
> +};
> +
> +static struct attribute *kvmppc_pmu_events_attr[] = {
> +	NULL,
> +};
> +
> +static const struct attribute_group kvmppc_pmu_events_group = {
> +	.name = "events",
> +	.attrs = kvmppc_pmu_events_attr,
> +};
> +
> +PMU_FORMAT_ATTR(event, "config:0");
> +static struct attribute *kvmppc_pmu_format_attr[] = {
> +	&format_attr_event.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group kvmppc
Re: [PATCH 00/15] cpufreq: simplify boost handling
On Fri, Jan 24, 2025 at 9:58 AM Viresh Kumar wrote:
>
> Hello,
>
> The boost feature can be controlled at two levels currently, driver
> level (applies to all policies) and per-policy.
>
> Currently most of the drivers enable driver level boost support from the
> per-policy ->init() callback, which isn't really efficient as that gets
> called for each policy, and then there is the online/offline path too
> where this gets done unnecessarily.
>
> Also it is possible to have a scenario where not all cpufreq policies
> support boost frequencies. And letting sysfs (or other parts of the
> kernel) enable the boost feature for such a policy isn't correct.
>
> Simplify and clean up the handling of boost to solve these issues.

I guess this depends on the previous series?

> Pushed here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/boost
>
> Rebased over a few dependencies from the PM tree; will push to the
> arm-cpufreq tree after the merge window is closed.
>
> Viresh Kumar (15):
>   cpufreq: staticize cpufreq_boost_trigger_state()
>   cpufreq: Export cpufreq_boost_set_sw()
>   cpufreq: Introduce policy->boost_supported flag
>   cpufreq: acpi: Set policy->boost_supported
>   cpufreq: amd: Set policy->boost_supported
>   cpufreq: cppc: Set policy->boost_supported
>   cpufreq: Restrict enabling boost on policies with no boost frequencies
>   cpufreq: apple: Set .set_boost directly
>   cpufreq: loongson: Set .set_boost directly
>   cpufreq: powernv: Set .set_boost directly
>   cpufreq: scmi: Set .set_boost directly
>   cpufreq: dt: Set .set_boost directly
>   cpufreq: qcom: Set .set_boost directly
>   cpufreq: staticize policy_has_boost_freq()
>   cpufreq: Remove cpufreq_enable_boost_support()
>
>  drivers/cpufreq/acpi-cpufreq.c      |  3 +++
>  drivers/cpufreq/amd-pstate.c        |  4 ++--
>  drivers/cpufreq/apple-soc-cpufreq.c | 10 +-
>  drivers/cpufreq/cppc_cpufreq.c      |  9 +
>  drivers/cpufreq/cpufreq-dt.c        | 14 +-
>  drivers/cpufreq/cpufreq.c           | 30 -
>  drivers/cpufreq/freq_table.c        |  7 +--
>  drivers/cpufreq/loongson3_cpufreq.c | 10 +-
>  drivers/cpufreq/powernv-cpufreq.c   |  5 +
>  drivers/cpufreq/qcom-cpufreq-hw.c   |  7 +--
>  drivers/cpufreq/scmi-cpufreq.c      | 11 +--
>  include/linux/cpufreq.h             | 20 ++-
>
>  12 files changed, 35 insertions(+), 95 deletions(-)
>
> --
> 2.31.1.272.g89b43f80a514
[PATCH 00/15] cpufreq: simplify boost handling
Hello,

The boost feature can be controlled at two levels currently, driver level
(applies to all policies) and per-policy.

Currently most of the drivers enable driver level boost support from the
per-policy ->init() callback, which isn't really efficient as that gets
called for each policy, and then there is the online/offline path too where
this gets done unnecessarily.

Also it is possible to have a scenario where not all cpufreq policies
support boost frequencies. And letting sysfs (or other parts of the kernel)
enable the boost feature for such a policy isn't correct.

Simplify and clean up the handling of boost to solve these issues.

Pushed here:

git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/boost

Rebased over a few dependencies from the PM tree; will push to the
arm-cpufreq tree after the merge window is closed.

Viresh Kumar (15):
  cpufreq: staticize cpufreq_boost_trigger_state()
  cpufreq: Export cpufreq_boost_set_sw()
  cpufreq: Introduce policy->boost_supported flag
  cpufreq: acpi: Set policy->boost_supported
  cpufreq: amd: Set policy->boost_supported
  cpufreq: cppc: Set policy->boost_supported
  cpufreq: Restrict enabling boost on policies with no boost frequencies
  cpufreq: apple: Set .set_boost directly
  cpufreq: loongson: Set .set_boost directly
  cpufreq: powernv: Set .set_boost directly
  cpufreq: scmi: Set .set_boost directly
  cpufreq: dt: Set .set_boost directly
  cpufreq: qcom: Set .set_boost directly
  cpufreq: staticize policy_has_boost_freq()
  cpufreq: Remove cpufreq_enable_boost_support()

 drivers/cpufreq/acpi-cpufreq.c      |  3 +++
 drivers/cpufreq/amd-pstate.c        |  4 ++--
 drivers/cpufreq/apple-soc-cpufreq.c | 10 +-
 drivers/cpufreq/cppc_cpufreq.c      |  9 +
 drivers/cpufreq/cpufreq-dt.c        | 14 +-
 drivers/cpufreq/cpufreq.c           | 30 -
 drivers/cpufreq/freq_table.c        |  7 +--
 drivers/cpufreq/loongson3_cpufreq.c | 10 +-
 drivers/cpufreq/powernv-cpufreq.c   |  5 +
 drivers/cpufreq/qcom-cpufreq-hw.c   |  7 +--
 drivers/cpufreq/scmi-cpufreq.c      | 11 +--
 include/linux/cpufreq.h             | 20 ++-

 12 files changed, 35 insertions(+), 95 deletions(-)

--
2.31.1.272.g89b43f80a514
[PATCH 10/15] cpufreq: powernv: Set .set_boost directly
The boost feature can be controlled at two levels currently, driver level
(applies to all policies) and per-policy. Currently the driver enables
driver level boost support from the per-policy ->init() callback, which
isn't really efficient as that gets called for each policy, and then there
is the online/offline path too where this gets done unnecessarily.

Instead, set the .set_boost field directly and always enable the boost
support. If a policy doesn't support the boost feature, the core will not
enable it for that policy.

Keep the initial state of driver level boost disabled and let the user
enable it if required, as ideally the boost frequencies must be used only
when really required.

Signed-off-by: Viresh Kumar
---
 drivers/cpufreq/powernv-cpufreq.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 0c3e907c58bc..4d3e891ff508 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -1125,7 +1125,7 @@ static int __init powernv_cpufreq_init(void)
 		goto out;
 
 	if (powernv_pstate_info.wof_enabled)
-		powernv_cpufreq_driver.boost_enabled = true;
+		powernv_cpufreq_driver.set_boost = cpufreq_boost_set_sw;
 	else
 		powernv_cpu_freq_attr[SCALING_BOOST_FREQS_ATTR_INDEX] = NULL;
 
@@ -1135,9 +1135,6 @@ static int __init powernv_cpufreq_init(void)
 		goto cleanup;
 	}
 
-	if (powernv_pstate_info.wof_enabled)
-		cpufreq_enable_boost_support();
-
 	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
 	opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
-- 
2.31.1.272.g89b43f80a514
Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
On Fri, Jan 24, 2025 at 01:43:22AM +0200, Dmitry V. Levin wrote:
> On Thu, Jan 23, 2025 at 08:28:15PM +0200, Dmitry V. Levin wrote:
> > On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> > > On 14/01/2025 at 18:04, Dmitry V. Levin wrote:
> > > > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> > > >> On 13/01/2025 at 18:10, Dmitry V. Levin wrote:
> > > >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> > > >>> and let the upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > > >>>
> > > >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > > >>> syscall_set_return_value()").
> > > >>
> > > >> There is a clear detailed explanation in that commit of why it needs
> > > >> to be done.
> > > >>
> > > >> If you think that commit is wrong, you have to explain why with at
> > > >> least the same level of details.
> > > >
> > > > OK, please have a look whether this explanation is clear and detailed
> > > > enough:
> > > >
> > > > ===
> > > > powerpc: properly negate error in syscall_set_return_value()
> > > >
> > > > When syscall_set_return_value() is used to set an error code, the
> > > > caller specifies it as a negative value in -ERRORCODE form.
> > > >
> > > > In the !trap_is_scv case the error code is traditionally stored as
> > > > follows: gpr[3] contains a positive ERRORCODE, and ccr has the 0x1000
> > > > flag set.
> > > >
> > > > Here are a few examples to illustrate this convention. The first one
> > > > is from syscall_get_error():
> > > > 	/*
> > > > 	 * If the system call failed,
> > > > 	 * regs->gpr[3] contains a positive ERRORCODE.
> > > > 	 */
> > > > 	return (regs->ccr & 0x1000UL) ? -regs->gpr[3] : 0;
> > > >
> > > > The second example is from regs_return_value():
> > > > 	if (is_syscall_success(regs))
> > > > 		return regs->gpr[3];
> > > > 	else
> > > > 		return -regs->gpr[3];
> > > >
> > > > The third example is from check_syscall_restart():
> > > > 	regs->result = -EINTR;
> > > > 	regs->gpr[3] = EINTR;
> > > > 	regs->ccr |= 0x1000;
> > > >
> > > > Compared with these examples, the failure of syscall_set_return_value()
> > > > to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
> > > > 	/*
> > > > 	 * In the general case it's not obvious that we must deal with
> > > > 	 * CCR here, as the syscall exit path will also do that for us.
> > > > 	 * However there are some places, eg. the signal code, which
> > > > 	 * check ccr to decide if the value in r3 is actually an error.
> > > > 	 */
> > > > 	if (error) {
> > > > 		regs->ccr |= 0x1000L;
> > > > 		regs->gpr[3] = error;
> > > > 	} else {
> > > > 		regs->ccr &= ~0x1000L;
> > > > 		regs->gpr[3] = val;
> > > > 	}
> > > >
> > > > This fix brings syscall_set_return_value() in sync with
> > > > syscall_get_error() and lets the upcoming ptrace/set_syscall_info
> > > > selftest pass on powerpc.
> > > >
> > > > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in
> > > > syscall_set_return_value()")
> > > > ===
> > >
> > > I think there is still something going wrong.
> > >
> > > do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
> > >
> > > Then it calls __secure_computing() which returns what __seccomp_filter()
> > > returns.
> > >
> > > In case of error, __seccomp_filter() calls syscall_set_return_value()
> > > with a negative value then returns -1.
> > >
> > > do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
> > > do_seccomp() doesn't return 0.
> > >
> > > do_syscall_trace_enter() is called by system_call_exception() and
> > > returns -1, so syscall_exception() returns regs->gpr[3].
> > >
> > > In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
> > > called with the return of syscall_exception() as first parameter, which
> > > leads to:
> > >
> > > 	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
> > > 		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> > > 			r3 = -r3;
> > > 			regs->ccr |= 0x1000; /* Set SO bit in CR */
> > > 		}
> > > 	}
> > >
> > > By chance, because you have already changed the sign of gpr[3], the
> > > above test fails and nothing is done to r3, and because you have also
> > > already set regs->ccr it works.
> > >
> > > But all this looks inconsistent with the fact that do_seccomp() sets
> > > -ENOSYS as default value.
> > >
> > > Also, when do_seccomp() returns 0, do_syscall_trace_enter() checks the
> > > syscall number and when it is wrong it goes to skip:, which sets
> > > regs->gpr[3] = -ENOSYS;
> > >
> > > So really I think it is not in line with your changes to set positive
> > > value in gpr[3].
> > >
> > > Maybe your change is still correct
Re: [PATCH v2] powerpc/hugetlb: Disable gigantic hugepages if fadump is active
On 24/01/2025 at 11:32, Sourabh Jain wrote:
> The fadump kernel boots with limited memory solely to collect the kernel
> core dump. Having gigantic hugepages in the fadump kernel is of no use.
> Many times, the fadump kernel encounters OOM (Out of Memory) issues if
> gigantic hugepages are allocated.
>
> To address this, disable gigantic hugepages if fadump is active by
> returning early from arch_hugetlb_valid_size() using
> hugepages_supported(). hugepages_supported() returns false if fadump is
> active.
>
> Returning early from arch_hugetlb_valid_size() not only disables
> gigantic hugepages but also avoids unnecessary hstate initialization for
> every hugepage size supported by the platform.
>
> kernel logs related to hugepages with this patch included:
> kernel argument passed: hugepagesz=1G hugepages=1
>
> First kernel: gigantic hugepage got allocated
> =============================================
>
> dmesg | grep -i "hugetlb"
> -------------------------
> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
> HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
>
> $ cat /proc/meminfo | grep -i "hugetlb"
> ---------------------------------------
> Hugetlb: 1048576 kB
>
> Fadump kernel: gigantic hugepage not allocated
> ==============================================
>
> dmesg | grep -i "hugetlb"
> -------------------------
> [0.00] HugeTLB: unsupported hugepagesz=1G
> [0.00] HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
> [0.706375] HugeTLB support is disabled!
> [0.773530] hugetlbfs: disabling because there are no supported hugepage sizes
>
> $ cat /proc/meminfo | grep -i "hugetlb"
> ---------------------------------------
>
> Cc: Hari Bathini
> Cc: Madhavan Srinivasan
> Cc: Mahesh Salgaonkar
> Cc: Michael Ellerman
> Cc: Ritesh Harjani (IBM)
> Signed-off-by: Sourabh Jain
> ---
> Changelog:
>
> v1: https://lore.kernel.org/all/20250121150419.1342794-1-sourabhj...@linux.ibm.com/
>
> v2:
>  - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()
> ---
>  arch/powerpc/mm/hugetlbpage.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 6b043180220a..087a8df32416 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -135,8 +135,12 @@ int __init alloc_bootmem_huge_page(struct hstate *h, int nid)
>
>  bool __init arch_hugetlb_valid_size(unsigned long size)
>  {
> -	int shift = __ffs(size);
> -	int mmu_psize;
> +	int shift, mmu_psize;
> +
> +	if (!hugepages_supported())
> +		return false;
> +
> +	shift = __ffs(size);

Why change the declaration/init of shift? It should be enough to leave
things as they are and just add:

	if (!hugepages_supported())
		return false;

	/* Check that it is a page size supported by the hardware and
	 * that it fits within pagetable and slice limits. */
Re: [PATCH v3 0/6] kvm powerpc/book3s-hv: Expose Hostwide counters as perf-events
I tested this series on both LPAR and bare metal, LGTM.

For the series:
Tested-by: Gautam Menghani
Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
On Fri, Jan 24, 2025 at 04:18:10PM +0100, Alexey Gladkov wrote:
> On Fri, Jan 24, 2025 at 01:43:22AM +0200, Dmitry V. Levin wrote:
> > On Thu, Jan 23, 2025 at 08:28:15PM +0200, Dmitry V. Levin wrote:
> > > On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> > > > On 14/01/2025 at 18:04, Dmitry V. Levin wrote:
> > > > > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> > > > >> On 13/01/2025 at 18:10, Dmitry V. Levin wrote:
> > > > >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> > > > >>> and let the upcoming ptrace/set_syscall_info selftest pass on
> > > > >>> powerpc.
> > > > >>>
> > > > >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > > > >>> syscall_set_return_value()").
> > > > >>
> > > > >> There is a clear detailed explanation in that commit of why it
> > > > >> needs to be done.
> > > > >>
> > > > >> If you think that commit is wrong, you have to explain why with at
> > > > >> least the same level of details.
> > > > >
> > > > > OK, please have a look whether this explanation is clear and
> > > > > detailed enough:
> > > > >
> > > > > ===
> > > > > powerpc: properly negate error in syscall_set_return_value()
> > > > >
> > > > > When syscall_set_return_value() is used to set an error code, the
> > > > > caller specifies it as a negative value in -ERRORCODE form.
> > > > >
> > > > > In the !trap_is_scv case the error code is traditionally stored as
> > > > > follows: gpr[3] contains a positive ERRORCODE, and ccr has the
> > > > > 0x1000 flag set.
> > > > >
> > > > > Here are a few examples to illustrate this convention. The first
> > > > > one is from syscall_get_error():
> > > > > 	/*
> > > > > 	 * If the system call failed,
> > > > > 	 * regs->gpr[3] contains a positive ERRORCODE.
> > > > > 	 */
> > > > > 	return (regs->ccr & 0x1000UL) ? -regs->gpr[3] : 0;
> > > > >
> > > > > The second example is from regs_return_value():
> > > > > 	if (is_syscall_success(regs))
> > > > > 		return regs->gpr[3];
> > > > > 	else
> > > > > 		return -regs->gpr[3];
> > > > >
> > > > > The third example is from check_syscall_restart():
> > > > > 	regs->result = -EINTR;
> > > > > 	regs->gpr[3] = EINTR;
> > > > > 	regs->ccr |= 0x1000;
> > > > >
> > > > > Compared with these examples, the failure of
> > > > > syscall_set_return_value() to assign a positive ERRORCODE into
> > > > > regs->gpr[3] is clearly visible:
> > > > > 	/*
> > > > > 	 * In the general case it's not obvious that we must deal with
> > > > > 	 * CCR here, as the syscall exit path will also do that for us.
> > > > > 	 * However there are some places, eg. the signal code, which
> > > > > 	 * check ccr to decide if the value in r3 is actually an error.
> > > > > 	 */
> > > > > 	if (error) {
> > > > > 		regs->ccr |= 0x1000L;
> > > > > 		regs->gpr[3] = error;
> > > > > 	} else {
> > > > > 		regs->ccr &= ~0x1000L;
> > > > > 		regs->gpr[3] = val;
> > > > > 	}
> > > > >
> > > > > This fix brings syscall_set_return_value() in sync with
> > > > > syscall_get_error() and lets the upcoming ptrace/set_syscall_info
> > > > > selftest pass on powerpc.
> > > > >
> > > > > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in
> > > > > syscall_set_return_value()")
> > > > > ===
> > > >
> > > > I think there is still something going wrong.
> > > >
> > > > do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
> > > >
> > > > Then it calls __secure_computing() which returns what
> > > > __seccomp_filter() returns.
> > > >
> > > > In case of error, __seccomp_filter() calls syscall_set_return_value()
> > > > with a negative value then returns -1.
> > > >
> > > > do_seccomp() is called by do_syscall_trace_enter() which returns -1
> > > > when do_seccomp() doesn't return 0.
> > > >
> > > > do_syscall_trace_enter() is called by system_call_exception() and
> > > > returns -1, so syscall_exception() returns regs->gpr[3].
> > > >
> > > > In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
> > > > called with the return of syscall_exception() as first parameter,
> > > > which leads to:
> > > >
> > > > 	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
> > > > 		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> > > > 			r3 = -r3;
> > > > 			regs->ccr |= 0x1000; /* Set SO bit in CR */
> > > > 		}
> > > > 	}
> > > >
> > > > By chance, because you have already changed the sign of gpr[3], the
> > > > above test fails and nothing is done to r3, and because you have also
> > > > already set regs->ccr it works.
> > > >
> > > > But all this looks inconsistent with the fact that do_seccomp() sets
[powerpc:next] BUILD SUCCESS 17391cb2613b82f8c405570fea605af3255ff8d2
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 17391cb2613b82f8c405570fea605af3255ff8d2  powerpc/pseries/iommu: Don't unset window if it was never set

elapsed time: 1173m

configs tested: 252
configs skipped: 8

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs (arch — config — compilers):
alpha:   allnoconfig (gcc-14.2.0); allyesconfig (clang-20, gcc-14.2.0); defconfig (gcc-14.2.0)
arc:     alldefconfig (gcc-13.2.0); allmodconfig (clang-18, gcc-13.2.0); allnoconfig (gcc-13.2.0); allyesconfig (clang-18, gcc-13.2.0); defconfig (gcc-14.2.0); nsim_700_defconfig (gcc-13.2.0); nsimosci_hs_smp_defconfig (gcc-13.2.0); randconfig-001-20250124, randconfig-001-20250125, randconfig-002-20250124, randconfig-002-20250125 (gcc-13.2.0)
arm:     allmodconfig (clang-18, gcc-14.2.0); allnoconfig (clang-17); allyesconfig (clang-18, gcc-14.2.0); at91_dt_defconfig (clang-20); defconfig (gcc-14.2.0); ep93xx_defconfig (gcc-14.2.0); randconfig-001-20250124 (clang-17); randconfig-001-20250125 (gcc-14.2.0); randconfig-002-20250124 (gcc-14.2.0); randconfig-002-20250125 (gcc-14.2.0); randconfig-003-20250124 (gcc-14.2.0); randconfig-003-20250125 (clang-18); randconfig-004-20250124 (clang-19); randconfig-004-20250125 (clang-20); sp7021_defconfig (gcc-14.2.0); vt8500_v6_v7_defconfig (gcc-14.2.0)
arm64:   allmodconfig (clang-18); allnoconfig (gcc-14.2.0); defconfig (gcc-14.2.0); randconfig-001-20250124 (clang-20); randconfig-001-20250125 (gcc-14.2.0); randconfig-002-20250124 (clang-20); randconfig-002-20250125 (gcc-14.2.0); randconfig-003-20250124 (clang-19); randconfig-003-20250125 (gcc-14.2.0); randconfig-004-20250124 (clang-20); randconfig-004-20250125 (gcc-14.2.0)
csky:    allnoconfig (gcc-14.2.0); defconfig (gcc-14.2.0); randconfig-001-20250124, randconfig-001-20250125, randconfig-002-20250124, randconfig-002-20250125 (gcc-14.2.0)
hexagon: allmodconfig (clang-20); allnoconfig (clang-20); allyesconfig (clang-20); defconfig (gcc-14.2.0); randconfig-001-20250124 (clang-20); randconfig-001-20250125 (clang-20, gcc-14.2.0); randconfig-002-20250124 (clang-14); randconfig-002-20250125 (clang-20, gcc-14.2.0)
i386:    allmodconfig (clang-19, gcc-12); allnoconfig (clang-19, gcc-12); allyesconfig (clang-19, gcc-12); buildonly-randconfig-001-20250124 (clang-19); buildonly-randconfig-002-20250124 (clang-19); buildonly-randconfig-003-20250124 (gcc-12); buildonly-randconfig-004-20250124 (gcc-12); buildonly-randconfig-005-20250124 (gcc-12); buildonly-randconfig-006-20250124 (gcc-12); defconfig (clang-19); randconfig-001-20250125 (clang-19); randconfig-002-20250125 (clang-19); randconfig-003-20250125 (clang-19)
Re: [PATCH v2 6/6] crash: option to let arch decide mem range is usable
Hello Hari,

On 24/01/25 15:22, Hari Bathini wrote:
> Hi Sourabh,
>
> On 21/01/25 5:24 pm, Sourabh Jain wrote:
>> On PowerPC, the memory reserved for the crashkernel can contain
>> components like RTAS, TCE, OPAL, etc., which should be avoided when
>> loading kexec segments into crashkernel memory. Due to these special
>> components, PowerPC has its own set of functions to locate holes in
>> the crashkernel memory for loading kexec segments for kdump. However,
>> for loading kexec segments in the kexec case, PowerPC uses generic
>> functions to locate holes.
>>
>> So, let's use generic functions to locate memory holes for kdump on
>> PowerPC by adding an arch hook to handle such special regions while
>> loading kexec segments, and remove the PowerPC functions to locate
>> holes.
>>
>> Cc: Andrew Morton
>> Cc: Baoquan he
>> Cc: Hari Bathini
>> Cc: Madhavan Srinivasan
>> Cc: Mahesh Salgaonkar
>> Cc: Michael Ellerman
>> Cc: ke...@lists.infradead.org
>> Cc: linux-ker...@vger.kernel.org
>> Signed-off-by: Sourabh Jain
>> ---
>>  arch/powerpc/include/asm/kexec.h  |   6 +-
>>  arch/powerpc/kexec/file_load_64.c | 259 ++
>>  include/linux/kexec.h             |   9 ++
>>  kernel/kexec_file.c               |  12 ++
>>  4 files changed, 34 insertions(+), 252 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
>> index 64741558071f..5e4680f9ff35 100644
>> --- a/arch/powerpc/include/asm/kexec.h
>> +++ b/arch/powerpc/include/asm/kexec.h
>> @@ -95,8 +95,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long
>>  int arch_kimage_file_post_load_cleanup(struct kimage *image);
>>  #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
>>
>> -int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
>> -#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole
>> +int arch_check_excluded_range(struct kimage *image, unsigned long start,
>> +			      unsigned long end);
>> +#define arch_check_excluded_range arch_check_excluded_range
>> +
>>  int load_crashdump_segments_ppc64(struct kimage *image, struct kexec_buf *kbuf);
>>
>> diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
>> index dc65c1391157..e7ef8b2a2554 100644
>> --- a/arch/powerpc/kexec/file_load_64.c
>> +++ b/arch/powerpc/kexec/file_load_64.c
>> @@ -49,201 +49,18 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
>>  	NULL
>>  };
>>
>> -/**
>> - * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
>> - *                              in the memory regions between buf_min & buf_max
>> - *                              for the buffer. If found, sets kbuf->mem.
>> - * @kbuf:    Buffer contents and memory parameters.
>> - * @buf_min: Minimum address for the buffer.
>> - * @buf_max: Maximum address for the buffer.
>> - *
>> - * Returns 0 on success, negative errno on error.
>> - */
>> -static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
>> -				      u64 buf_min, u64 buf_max)
>> -{
>> -	int ret = -EADDRNOTAVAIL;
>> -	phys_addr_t start, end;
>> -	u64 i;
>> -
>> -	for_each_mem_range_rev(i, &start, &end) {
>> -		/*
>> -		 * memblock uses [start, end) convention while it is
>> -		 * [start, end] here. Fix the off-by-one to have the
>> -		 * same convention.
>> -		 */
>> -		end -= 1;
>> -
>> -		if (start > buf_max)
>> -			continue;
>> -
>> -		/* Memory hole not found */
>> -		if (end < buf_min)
>> -			break;
>> -
>> -		/* Adjust memory region based on the given range */
>> -		if (start < buf_min)
>> -			start = buf_min;
>> -		if (end > buf_max)
>> -			end = buf_max;
>> -
>> -		start = ALIGN(start, kbuf->buf_align);
>> -		if (start < end && (end - start + 1) >= kbuf->memsz) {
>> -			/* Suitable memory range found. Set kbuf->mem */
>> -			kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1,
>> -					       kbuf->buf_align);
>> -			ret = 0;
>> -			break;
>> -		}
>> -	}
>> -
>> -	return ret;
>> -}
>> -
>> -/**
>> - * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a
>> - *                                  suitable buffer with top down approach.
>> - * @kbuf:    Buffer contents and memory parameters.
>> - * @buf_min: Minimum address for the buffer.
>> - * @buf_max: Maximum address for the buffer.
>> - * @emem:    Exclude memory ranges.
>> - *
>> - * Returns 0 on success, negative errno on error.
>> - */
>> -static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
>> -					  u64 buf_min, u64 buf_max,
>> -					  const struct crash_mem *emem)
>> +int arch_check_excluded_range(struct kimage *image, unsigned long start,
>> +			      unsigned long end)
>>  {
>> -	int i, ret = 0, err = -EADDRNOTAVAIL;
>> -	u64 start, end, tmin, tmax;
>> -
>> -	tmax = buf_max;
>> -	for (i = (emem->nr_ranges - 1); i >= 0; i--) {
>> -		start = emem->ranges[i].start;
>> -
[PATCH v2] powerpc/hugetlb: Disable gigantic hugepages if fadump is active
The fadump kernel boots with limited memory solely to collect the kernel
core dump. Having gigantic hugepages in the fadump kernel is of no use.
Many times, the fadump kernel encounters OOM (Out of Memory) issues if
gigantic hugepages are allocated.

To address this, disable gigantic hugepages if fadump is active by
returning early from arch_hugetlb_valid_size() using
hugepages_supported(). hugepages_supported() returns false if fadump is
active.

Returning early from arch_hugetlb_valid_size() not only disables gigantic
hugepages but also avoids unnecessary hstate initialization for every
hugepage size supported by the platform.

kernel logs related to hugepages with this patch included:
kernel argument passed: hugepagesz=1G hugepages=1

First kernel: gigantic hugepage got allocated
=============================================

dmesg | grep -i "hugetlb"
-------------------------
HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page

$ cat /proc/meminfo | grep -i "hugetlb"
---------------------------------------
Hugetlb: 1048576 kB

Fadump kernel: gigantic hugepage not allocated
==============================================

dmesg | grep -i "hugetlb"
-------------------------
[0.00] HugeTLB: unsupported hugepagesz=1G
[0.00] HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
[0.706375] HugeTLB support is disabled!
[0.773530] hugetlbfs: disabling because there are no supported hugepage sizes

$ cat /proc/meminfo | grep -i "hugetlb"
---------------------------------------

Cc: Hari Bathini
Cc: Madhavan Srinivasan
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Ritesh Harjani (IBM)
Signed-off-by: Sourabh Jain
---
Changelog:

v1: https://lore.kernel.org/all/20250121150419.1342794-1-sourabhj...@linux.ibm.com/

v2:
 - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()
---
 arch/powerpc/mm/hugetlbpage.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 6b043180220a..087a8df32416 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -135,8 +135,12 @@ int __init alloc_bootmem_huge_page(struct hstate *h, int nid)
 
 bool __init arch_hugetlb_valid_size(unsigned long size)
 {
-	int shift = __ffs(size);
-	int mmu_psize;
+	int shift, mmu_psize;
+
+	if (!hugepages_supported())
+		return false;
+
+	shift = __ffs(size);
 
 	/* Check that it is a page size supported by the hardware and
 	 * that it fits within pagetable and slice limits. */
-- 
2.48.1
Re: [PATCH v2 6/6] crash: option to let arch decide mem range is usable
Hi Sourabh,

On 21/01/25 5:24 pm, Sourabh Jain wrote:
> On PowerPC, the memory reserved for the crashkernel can contain
> components like RTAS, TCE, OPAL, etc., which should be avoided when
> loading kexec segments into crashkernel memory. Due to these special
> components, PowerPC has its own set of functions to locate holes in
> the crashkernel memory for loading kexec segments for kdump. However,
> for loading kexec segments in the kexec case, PowerPC uses generic
> functions to locate holes.
>
> So, let's use generic functions to locate memory holes for kdump on
> PowerPC by adding an arch hook to handle such special regions while
> loading kexec segments, and remove the PowerPC functions to locate
> holes.
>
> Cc: Andrew Morton
> Cc: Baoquan He
> Cc: Hari Bathini
> Cc: Madhavan Srinivasan
> Cc: Mahesh Salgaonkar
> Cc: Michael Ellerman
> Cc: ke...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Sourabh Jain
> ---
>  arch/powerpc/include/asm/kexec.h  |   6 +-
>  arch/powerpc/kexec/file_load_64.c | 259 ++
>  include/linux/kexec.h             |   9 ++
>  kernel/kexec_file.c               |  12 ++
>  4 files changed, 34 insertions(+), 252 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index 64741558071f..5e4680f9ff35 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -95,8 +95,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long
>  int arch_kimage_file_post_load_cleanup(struct kimage *image);
>  #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
>  
> -int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
> -#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole
> +int arch_check_excluded_range(struct kimage *image, unsigned long start,
> +			      unsigned long end);
> +#define arch_check_excluded_range arch_check_excluded_range
> +
>  int load_crashdump_segments_ppc64(struct kimage *image, struct kexec_buf *kbuf);
>
> diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
> index dc65c1391157..e7ef8b2a2554 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -49,201 +49,18 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
>  	NULL
>  };
>  
> -/**
> - * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
> - *                              in the memory regions between buf_min & buf_max
> - *                              for the buffer. If found, sets kbuf->mem.
> - * @kbuf:    Buffer contents and memory parameters.
> - * @buf_min: Minimum address for the buffer.
> - * @buf_max: Maximum address for the buffer.
> - *
> - * Returns 0 on success, negative errno on error.
> - */
> -static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
> -				      u64 buf_min, u64 buf_max)
> -{
> -	int ret = -EADDRNOTAVAIL;
> -	phys_addr_t start, end;
> -	u64 i;
> -
> -	for_each_mem_range_rev(i, &start, &end) {
> -		/*
> -		 * memblock uses [start, end) convention while it is
> -		 * [start, end] here. Fix the off-by-one to have the
> -		 * same convention.
> -		 */
> -		end -= 1;
> -
> -		if (start > buf_max)
> -			continue;
> -
> -		/* Memory hole not found */
> -		if (end < buf_min)
> -			break;
> -
> -		/* Adjust memory region based on the given range */
> -		if (start < buf_min)
> -			start = buf_min;
> -		if (end > buf_max)
> -			end = buf_max;
> -
> -		start = ALIGN(start, kbuf->buf_align);
> -		if (start < end && (end - start + 1) >= kbuf->memsz) {
> -			/* Suitable memory range found. Set kbuf->mem */
> -			kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1,
> -					       kbuf->buf_align);
> -			ret = 0;
> -			break;
> -		}
> -	}
> -
> -	return ret;
> -}
> -
> -/**
> - * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a
> - *                                  suitable buffer with top down approach.
> - * @kbuf:    Buffer contents and memory parameters.
> - * @buf_min: Minimum address for the buffer.
> - * @buf_max: Maximum address for the buffer.
> - * @emem:    Exclude memory ranges.
> - *
> - * Returns 0 on success, negative errno on error.
> - */
> -static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
> -					  u64 buf_min, u64 buf_max,
> -					  const struct crash_mem *emem)
> +int arch_check_excluded_range(struct kimage *image, unsigned long start,
> +			      unsigned long end)
>  {
> -	int i, ret = 0, err = -EADDRNOTAVAIL;
> -	u64 start, end, tmin, tmax;
> -
> -	tmax = buf_max;
> -	for (i = (emem->nr_ranges - 1); i >= 0; i--) {
> -		start = emem->ranges[i].start;
Re: [PATCH v2] fs: introduce getfsxattrat and setfsxattrat syscalls
On Wed, Jan 22, 2025 at 03:18:34PM +0100, Andrey Albershteyn wrote:
> From: Andrey Albershteyn
>
> Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> extended attributes/flags. The syscalls take a parent directory FD
> and a path to the child together with struct fsxattr.
>
> This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the
> difference that the file doesn't need to be opened. By having this we
> can manipulate

By that you mean that you can use absolute or relative paths instead of
file descriptors?

> inode extended attributes not only on normal files but also on
> special ones. This is not possible with the FS_IOC_FSSETXATTR ioctl,
> as opening special files returns the VFS special inode instead of the
> underlying filesystem one.

I'm not following this argument currently. In what sense does opening
special files return a VFS special inode, and how does that prevent
FS_IOC_FSSETXATTR from working? The inode in

	static int ioctl_fssetxattr(struct file *file, void __user *argp)
	{
		struct mnt_idmap *idmap = file_mnt_idmap(file);
		struct dentry *dentry = file->f_path.dentry;

d_inode(dentry) and your:

	error = user_path_at(dfd, filename, lookup_flags, &filepath);
	if (error)
		goto out;

d_inode(filepath.dentry) is the same.

> This patch adds two new syscalls which allow userspace to set
> extended inode attributes on special files by using the parent
> directory to open the FS inode.
>
> Also, as vfs_fileattr_set() will now be called on special files too,
> let's forbid any other attributes except projid and nextents (a
> symlink can have an extent).
>
> CC: linux-...@vger.kernel.org
> CC: linux-fsde...@vger.kernel.org
> CC: linux-...@vger.kernel.org
> Signed-off-by: Andrey Albershteyn
> ---
> v1:
> https://lore.kernel.org/linuxppc-dev/20250109174540.893098-1-aalbe...@kernel.org/
>
> Previous discussion:
> https://lore.kernel.org/linux-xfs/20240520164624.665269-2-aalbe...@redhat.com/
>
> XFS has project quotas which can be attached to a directory. All
> new inodes in these directories inherit the project ID set on the
> parent directory.
>
> The project is created from userspace by opening and calling
> FS_IOC_FSSETXATTR on each inode. This is not possible for special
> files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
> with an empty project ID. Those inodes are then not shown in the
> quota accounting but still exist in the directory. Moreover, when
> special files are created in a directory with an already existing
> project quota, these inodes inherit the extended attributes. This
> then leaves them with these attributes without the possibility to
> clear them out, which in turn prevents userspace from re-creating
> the quota project on these existing files.
> ---
>  arch/alpha/kernel/syscalls/syscall.tbl      |  2 +
>  arch/arm/tools/syscall.tbl                  |  2 +
>  arch/arm64/tools/syscall_32.tbl             |  2 +
>  arch/m68k/kernel/syscalls/syscall.tbl       |  2 +
>  arch/microblaze/kernel/syscalls/syscall.tbl |  2 +
>  arch/mips/kernel/syscalls/syscall_n32.tbl   |  2 +
>  arch/mips/kernel/syscalls/syscall_n64.tbl   |  2 +
>  arch/mips/kernel/syscalls/syscall_o32.tbl   |  2 +
>  arch/parisc/kernel/syscalls/syscall.tbl     |  2 +
>  arch/powerpc/kernel/syscalls/syscall.tbl    |  2 +
>  arch/s390/kernel/syscalls/syscall.tbl       |  2 +
>  arch/sh/kernel/syscalls/syscall.tbl         |  2 +
>  arch/sparc/kernel/syscalls/syscall.tbl      |  2 +
>  arch/x86/entry/syscalls/syscall_32.tbl      |  2 +
>  arch/x86/entry/syscalls/syscall_64.tbl      |  2 +
>  arch/xtensa/kernel/syscalls/syscall.tbl     |  2 +
>  fs/inode.c                                  | 99 +
>  fs/ioctl.c                                  | 16 -
>  include/linux/fileattr.h                    |  1 +
>  include/linux/syscalls.h                    |  4 ++
>  include/uapi/asm-generic/unistd.h           |  8 ++-
>  21 files changed, 157 insertions(+), 3 deletions(-)
>
> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
> index c59d53d6d3f3490f976ca179ddfe02e69265ae4d..4b9e687494c16b60c6fd6ca1dc4d6564706a7e25 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -506,3 +506,5 @@
>  574	common	getxattrat			sys_getxattrat
>  575	common	listxattrat			sys_listxattrat
>  576	common	removexattrat			sys_removexattrat
> +577	common	getfsxattrat			sys_getfsxattrat
> +578	common	setfsxattrat			sys_setfsxattrat
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index 49eeb2ad8dbd8e074c6240417693f23fb328afa8..66466257f3c2debb3e2299f0b608c6740c98cab2 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -481,3 +481,5 @@
>  464	common	getxattrat			sys_getxattrat
>  465	common	listxattrat
Re: [PATCH v3 1/6] powerpc: Document APIv2 KVM hcall spec for Hostwide counters
On Thu, Jan 23, 2025 at 05:37:43PM +0530, Vaibhav Jain wrote:
> diff --git a/Documentation/arch/powerpc/kvm-nested.rst b/Documentation/arch/powerpc/kvm-nested.rst
> index 5defd13cc6c1..574592505604 100644
> --- a/Documentation/arch/powerpc/kvm-nested.rst
> +++ b/Documentation/arch/powerpc/kvm-nested.rst
> @@ -208,13 +208,9 @@ associated values for each ID in the GSB::
>     flags:
>        Bit 0: getGuestWideState: Request state of the Guest instead
>          of an individual VCPU.
> -      Bit 1: takeOwnershipOfVcpuState Indicate the L1 is taking
> -        over ownership of the VCPU state and that the L0 can free
> -        the storage holding the state. The VCPU state will need to
> -        be returned to the Hypervisor via H_GUEST_SET_STATE prior
> -        to H_GUEST_RUN_VCPU being called for this VCPU. The data
> -        returned in the dataBuffer is in a Hypervisor internal
> -        format.
> +      Bit 1: getHostWideState: Request stats of the Host. This causes
> +        the guestId and vcpuId parameters to be ignored and attempting
> +        to get the VCPU/Guest state will cause an error.
>        Bits 2-63: Reserved
>     guestId: ID obtained from H_GUEST_CREATE
>     vcpuId: ID of the vCPU pass to H_GUEST_CREATE_VCPU
> @@ -406,9 +402,10 @@ the partition like the timebase offset and partition scoped page
>  table information.
>  
>  +------------+-----------+----+--------+----------------------------------+
> -| ID         | Size      | RW | Thread | Details                          |
> -|            | Bytes     |    | Guest  |                                  |
> -|            |           |    | Scope  |                                  |
> +| ID         | Size      | RW |(H)ost  | Details                          |
> +|            | Bytes     |    |(G)uest |                                  |
> +|            |           |    |(T)hread|                                  |
> +|            |           |    |Scope   |                                  |
>  +============+===========+====+========+==================================+
>  | 0x0000     |           | RW | TG     | NOP element                      |
>  +------------+-----------+----+--------+----------------------------------+
> @@ -434,6 +431,29 @@ table information.
>  |            |           |    |        |    - 0x8 Table size.             |
>  +------------+-----------+----+--------+----------------------------------+
>  | 0x0007-    |           |    |        | Reserved                         |
> +| 0x07FF     |           |    |        |                                  |
> ++------------+-----------+----+--------+----------------------------------+
> +| 0x0800     | 0x08      | R  | H      | Current usage in bytes of the    |
> +|            |           |    |        | L0's Guest Management Space      |
> +|            |           |    |        | for an L1-Lpar.                  |
> ++------------+-----------+----+--------+----------------------------------+
> +| 0x0801     | 0x08      | R  | H      | Max bytes available in the       |
> +|            |           |    |        | L0's Guest Management Space for  |
> +|            |           |    |        | an L1-Lpar                       |
> ++------------+-----------+----+--------+----------------------------------+
> +| 0x0802     | 0x08      | R  | H      | Current usage in bytes of the    |
> +|            |           |    |        | L0's Guest Page Table Management |
> +|            |           |    |        | Space for an L1-Lpar             |
> ++------------+-----------+----+--------+----------------------------------+
> +| 0x0803     | 0x08      | R  | H      | Max bytes available in the L0's  |
> +|            |           |    |        | Guest Page Table Management      |
> +|            |           |    |        | Space for an L1-Lpar             |
> ++------------+-----------+----+--------+----------------------------------+
> +| 0x0804     | 0x08      | R  | H      | Cumulative Reclaimed bytes from  |
> +|            |           |    |        | L0 Guest's Page Table Management |
> +|            |           |    |        | Space due to overcommit          |
> ++------------+-----------+----+--------+----------------------------------+
> +| 0x0805-    |           |    |        | Reserved                         |
>  | 0x0BFF     |           |    |        |                                  |
>  +------------+-----------+----+--------+----------------------------------+
>  | 0x0C00     | 0x10      | RW | T      | Run vCPU Input Buffer:           |

The doc LGTM, thanks!

Reviewed-by: Bagas Sanjaya

-- 
An old man doll... just what I always wanted! - Clara