date:20241022

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Peter Zijlstra

On Mon, Oct 21, 2024 at 08:30:43PM -0700, Paul E. McKenney wrote:
> Thoughts?

Thanks Paul!

Acked-by: Peter Zijlstra (Intel) 

> 
> 
> commit 4433b7d3785d8d2a700f5ed5ca234c64bc63180e
> Author: Paul E. McKenney 
> Date:   Mon Oct 21 15:09:39 2024 -0700
> 
> srcu: Guarantee non-negative return value from srcu_read_lock()
> 
> For almost 20 years, the int return value from srcu_read_lock() has
> been always either zero or one.  This commit therefore documents the
> fact that it will be non-negative, and does the same for the underlying
> __srcu_read_lock().
> 
> [ paulmck: Apply Andrii Nakryiko feedback. ]
> 
> Signed-off-by: Paul E. McKenney 
> Cc: Peter Zijlstra 
> Acked-by: Andrii Nakryiko 
> 
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index bab1dae3f69e6..512a8c54ba5ba 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -238,13 +238,14 @@ void srcu_check_read_flavor(struct srcu_struct *ssp, 
> int read_flavor);
>   * a mutex that is held elsewhere while calling synchronize_srcu() or
>   * synchronize_srcu_expedited().
>   *
> - * The return value from srcu_read_lock() must be passed unaltered
> - * to the matching srcu_read_unlock().  Note that srcu_read_lock() and
> - * the matching srcu_read_unlock() must occur in the same context, for
> - * example, it is illegal to invoke srcu_read_unlock() in an irq handler
> - * if the matching srcu_read_lock() was invoked in process context.  Or,
> - * for that matter to invoke srcu_read_unlock() from one task and the
> - * matching srcu_read_lock() from another.
> + * The return value from srcu_read_lock() is guaranteed to be
> + * non-negative.  This value must be passed unaltered to the matching
> + * srcu_read_unlock().  Note that srcu_read_lock() and the matching
> + * srcu_read_unlock() must occur in the same context, for example, it is
> + * illegal to invoke srcu_read_unlock() in an irq handler if the matching
> + * srcu_read_lock() was invoked in process context.  Or, for that matter to
> + * invoke srcu_read_unlock() from one task and the matching srcu_read_lock()
> + * from another.
>   */
>  static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
>  {
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 07147efcb64d3..ae17c214e0de5 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -738,7 +738,8 @@ EXPORT_SYMBOL_GPL(srcu_check_read_flavor);
>  /*
>   * Counts the new reader in the appropriate per-CPU element of the
>   * srcu_struct.
> - * Returns an index that must be passed to the matching srcu_read_unlock().
> + * Returns a guaranteed non-negative index that must be passed to the
> + * matching __srcu_read_unlock().
>   */
>  int __srcu_read_lock(struct srcu_struct *ssp)
>  {

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Peter Zijlstra

On Mon, Oct 21, 2024 at 11:51:40PM -0700, Christoph Hellwig wrote:
> On Mon, Oct 21, 2024 at 03:13:05PM -0700, Paul E. McKenney wrote:
> > For almost 20 years, the int return value from srcu_read_lock() has
> > been always either zero or one.  This commit therefore documents the
> > fact that it will be non-negative.
> 
> If it is always zero or one, wouldn't bool the better return value
> type?

What is returned is an array index -- and SRCU is currently built using
an array of size 2. Using larger arrays is conceivable (IIRC some
versions of preemptible RCU used up to 4 or something).

So while the values 0,1 are possible inside bool, that does not reflect
the nature of the numbers, which is an array index. Mapping that onto
bool would be slightly confusing (and limit possible future extention of
using larger arrays for SRCU).

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Christoph Hellwig

On Tue, Oct 22, 2024 at 09:06:35AM +0200, Peter Zijlstra wrote:
> What is returned is an array index -- and SRCU is currently built using
> an array of size 2. Using larger arrays is conceivable (IIRC some
> versions of preemptible RCU used up to 4 or something).
> 
> So while the values 0,1 are possible inside bool, that does not reflect
> the nature of the numbers, which is an array index. Mapping that onto
> bool would be slightly confusing (and limit possible future extention of
> using larger arrays for SRCU).

Ok, make sense.  Maybe add this to the comment if we're updating іt.
But using an unsigned return value might still be useful.

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Peter Zijlstra

On Tue, Oct 22, 2024 at 12:07:35AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 22, 2024 at 09:06:35AM +0200, Peter Zijlstra wrote:
> > What is returned is an array index -- and SRCU is currently built using
> > an array of size 2. Using larger arrays is conceivable (IIRC some
> > versions of preemptible RCU used up to 4 or something).
> > 
> > So while the values 0,1 are possible inside bool, that does not reflect
> > the nature of the numbers, which is an array index. Mapping that onto
> > bool would be slightly confusing (and limit possible future extention of
> > using larger arrays for SRCU).
> 
> Ok, make sense.  Maybe add this to the comment if we're updating іt.
> But using an unsigned return value might still be useful.

Ah, well, the thing that got us here is that we (Andrii and me) wanted
to use -1 as an 'invalid' value to indicate SRCU is not currently in
use.

So it all being int is really rather convenient :-)

Re: [PATCH net-next v5 2/3] connector/cn_proc: Kunit tests for threads hash table

2024-10-22 Thread Anjali Kulkarni



> On Oct 17, 2024, at 5:55 PM, Stanislav Fomichev  wrote:
> 
> On 10/18, Anjali Kulkarni wrote:
>> 
>> 
>>> On Oct 17, 2024, at 5:13 PM, Stanislav Fomichev  
>>> wrote:
>>> 
>>> On 10/17, Anjali Kulkarni wrote:
 Kunit tests to test hash table add, delete, duplicate add and delete.
 Add following configs and compile kernel code:
 
 CONFIG_CONNECTOR=y
 CONFIG_PROC_EVENTS=y
 CONFIG_NET=y
 CONFIG_KUNIT=m
 CONFIG_CN_HASH_KUNIT_TEST=m
 
 To run kunit tests:
 sudo modprobe cn_hash_test
 
 Output of kunit tests and hash table contents are displayed in
 /var/log/messages (at KERN_DEBUG level).
 
 Signed-off-by: Anjali Kulkarni 
 ---
 drivers/connector/cn_hash.c   |  40 
 drivers/connector/connector.c |  12 +++
 include/linux/connector.h |   4 +
 lib/Kconfig.debug |  17 
 lib/Makefile  |   1 +
 lib/cn_hash_test.c| 167 ++
 lib/cn_hash_test.h|  10 ++
 7 files changed, 251 insertions(+)
 create mode 100644 lib/cn_hash_test.c
 create mode 100644 lib/cn_hash_test.h
 
 diff --git a/drivers/connector/cn_hash.c b/drivers/connector/cn_hash.c
 index a079e9bcea6d..40099b5908ac 100644
 --- a/drivers/connector/cn_hash.c
 +++ b/drivers/connector/cn_hash.c
 @@ -170,6 +170,46 @@ int cn_hash_get_exval(struct cn_hash_dev *hdev, pid_t 
 pid)
 return -EINVAL;
 }
 
 +int cn_hash_display_hlist(struct cn_hash_dev *hdev, pid_t pid, int 
 max_len,
 + int *hkey, int *key_display)
 +{
 + struct uexit_pid_hnode *hnode;
 + int key, count = 0;
 +
 + mutex_lock(&hdev->uexit_hash_lock);
 + key = hash_min(pid, HASH_BITS(hdev->uexit_pid_htable));
 + pr_debug("Bucket: %d\n", key);
 +
 + hlist_for_each_entry(hnode,
 + &hdev->uexit_pid_htable[key],
 + uexit_pid_hlist) {
 + if (key_display[key] != 1) {
 + if (hnode->uexit_pid_hlist.next == NULL)
 + pr_debug("pid %d ", hnode->pid);
 + else
 + pr_debug("pid %d --> ", hnode->pid);
 + }
 + count++;
 + }
 +
 + mutex_unlock(&hdev->uexit_hash_lock);
 +
 + if ((key_display[key] != 1) && !count)
 + pr_debug("(empty)\n");
 +
 + pr_debug("\n");
 +
 + *hkey = key;
 +
 + if (count > max_len) {
 + pr_err("%d entries in hlist for key %d, expected %d\n",
 + count, key, max_len);
 + return -EINVAL;
 + }
 +
 + return 0;
 +}
 +
 bool cn_hash_table_empty(struct cn_hash_dev *hdev)
 {
 bool is_empty;
 diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
 index c1c0dcec53c0..2be2fe1adc12 100644
 --- a/drivers/connector/connector.c
 +++ b/drivers/connector/connector.c
 @@ -304,6 +304,18 @@ int cn_get_exval(pid_t pid)
 }
 EXPORT_SYMBOL_GPL(cn_get_exval);
 
 +int cn_display_hlist(pid_t pid, int max_len, int *hkey, int *key_display)
 +{
 + struct cn_dev *dev = &cdev;
 +
 + if (!cn_already_initialized)
 + return 0;
 +
 + return cn_hash_display_hlist(dev->hdev, pid, max_len,
 + hkey, key_display);
 +}
 +EXPORT_SYMBOL_GPL(cn_display_hlist);
 +
 bool cn_table_empty(void)
 {
 struct cn_dev *dev = &cdev;
 diff --git a/include/linux/connector.h b/include/linux/connector.h
 index 5384e4bb98e8..a75c3fcf182a 100644
 --- a/include/linux/connector.h
 +++ b/include/linux/connector.h
 @@ -168,4 +168,8 @@ int cn_get_exval(pid_t pid);
 bool cn_table_empty(void);
 bool cn_hash_table_empty(struct cn_hash_dev *hdev);
 
 +int cn_display_hlist(pid_t pid, int max_len, int *hkey, int *key_display);
 +int cn_hash_display_hlist(struct cn_hash_dev *hdev, pid_t pid, int 
 max_len,
 + int *hkey, int *key_display);
 +
 #endif /* __CONNECTOR_H */
 diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
 index 7315f643817a..290cf0a6befa 100644
 --- a/lib/Kconfig.debug
 +++ b/lib/Kconfig.debug
 @@ -2705,6 +2705,23 @@ config HASHTABLE_KUNIT_TEST
 
 If unsure, say N.
 
 +config CN_HASH_KUNIT_TEST
 + tristate "KUnit Test for connector hashtable code" if !KUNIT_ALL_TESTS
 + depends on KUNIT
 + default KUNIT_ALL_TESTS
 + help
 +  This builds the hashtable KUnit test suite.
 +  It tests the basic functionality of the API defined in
 +  drivers/connector/cn_hash.c.
 +  CONFIG_CONNECTOR=y, CONFIG_PROC_EVENTS=y and CONFIG_NET=y needs
 +  to be enabled along with CONFIG_CN_HASH_KUNIT_TEST=m and
 +  CONFIG_KUNIT=m in .config file to compile and then test as a kernel
 +  module with "modprobe cn_hash_test".
 +  For more information on KUnit and unit tests in general please
 +  refer to the KUnit documentation in Documentation/dev-tools/kunit/.
 +
 +  If unsure, say N.
 +
>>> 
>>> Looks like this

Re: [PATCH 1/1] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.

2024-10-22 Thread Frederic Weisbecker

Le Tue, Oct 22, 2024 at 05:34:21PM +0200, Sebastian Andrzej Siewior a écrit :
> On 2024-10-22 15:28:56 [+0200], Frederic Weisbecker wrote:
> > > Once the ksoftirqd is marked as pending (or is running) it will collect
> > > all raised softirqs. This in turn means that a softirq which would have
> > > been processed at the end of the threaded interrupt, which runs at an
> > > elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
> > > priority and competes with every regular task for CPU resources.
> > 
> > But for ksoftirqd to collect other non-timers softirqs, those vectors must
> > have been raised before from a threaded interrupt, right? So why those
> > vectors didn't get a chance to execute at the end of that threaded 
> > interrupt?
> 
> This statement is no longer accurate since
>   d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")
> 
> So the "collect all" part is no longer.
> 
> > OTOH one problem I can imagine is a threaded interrupt preempting ksoftirqd
> > which must wait for ksoftirqd to complete due to the local_bh_disable()
> > in the beginning of irq_forced_thread_fn(). But then isn't there some
> > PI involved on the local lock?
> 
> Yes, there is PI involved on the local lock. So this will happen.
> 
> > Sorry I'm probably missing something...
> 
> Try again without the "ksoftirqd will collect it all" since this won't
> happen since the revert I mentioned.

I still don't get it, this makes:

"""
Once the ksoftirqd is marked as pending (or is running), a softirq which
would have been processed at the end of the threaded interrupt, which runs
at an elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
priority and competes with every regular task for CPU resources.
"""

ksoftirqd raised for timers still doesn't prevent a threaded IRQ from running
softirqs, unless it preempts ksoftirqd and waits with PI. So is it what you're
trying to solve?

Or is the problem that timer softirqs are executed with SCHED_NORMAL?

> 
> > > This introduces long delays on heavy loaded systems and is not desired
> > > especially if the system is not overloaded by the softirqs.
> > > 
> > > Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
> > > timers thread and let it run at the lowest SCHED_FIFO priority.
> > > Wake-ups for RT tasks happen from hardirq context so only timer_list 
> > > timers
> > > and hrtimers for "regular" tasks are processed here.
> > 
> > That last sentence confuses me. How are timers for non regular task 
> > processed
> > elsewhere? Ah you mean RT tasks rely on hard hrtimers?
> 
> Correct. A clock_nanosleep() for a RT task will result in wake_up() from
> hardirq. A clock_nanosleep() for a !RT task will result in wake_up()
> from ksoftirqd or now the suggested ktimersd.

Oh I see now, that's all in __hrtimer_init_sleeper().

> 
> Quick question: Do we want this in forced-threaded mode, too? The timer
> (timer_list timer) and all HRTIMER_MODE_SOFT are handled in ksoftirqd.

It's hard to tell for me as I don't know the !RT usecases for forced-threaded 
mode.
"If in doubt say N" ;-)

> > > +#ifdef CONFIG_PREEMPT_RT
> > > +static void timersd_setup(unsigned int cpu)
> > > +{
> > 
> > That also needs a comment.
> 
> Why we want the priority I guess.

Yes.

> 
> …
> > > +void raise_hrtimer_softirq(void)
> > > +{
> > > + raise_ktimers_thread(HRTIMER_SOFTIRQ);
> > > +}
> > > +
> > > +void raise_timer_softirq(void)
> > > +{
> > > + unsigned long flags;
> > > +
> > > + local_irq_save(flags);
> > > + raise_ktimers_thread(TIMER_SOFTIRQ);
> > > + wake_timersd();
> > 
> > This is supposed to be called from hardirq only, right?
> > Can't irq_exit_rcu() take care of it? Why is it different
> > from HRTIMER_SOFTIRQ ?
> 
> Good question. This shouldn't be any different compared to the hrtimer
> case. This is only raised in hardirq, so yes, the irq_save can go away
> and the wake call, too.

Cool. You can add lockdep_assert_in_irq() within raise_ktimers_thread() for
some well deserved relief :-)

Thanks.

> 
> > Thanks.
> 
> Sebastian

Re: [PATCH V13 03/14] KVM: x86: Fix Intel PT Host/Guest mode when host tracing also

2024-10-22 Thread Sean Christopherson

On Tue, Oct 22, 2024, Adrian Hunter wrote:
> On 22/10/24 19:30, Sean Christopherson wrote:
> >>> LOL, yeah, this needs to be burned with fire.  It's wildly broken.  So 
> >>> for stable@,
> >>
> >> It doesn't seem wildly broken.  Just the VMM passing invalid CPUID
> >> and KVM not validating it.
> > 
> > Heh, I agree with "just", but unfortunately "just ... not validating" a 
> > large
> > swath of userspace inputs is pretty widly broken.  More importantly, it's 
> > not
> > easy to fix.  E.g. KVM could require the inputs to exactly match hardware, 
> > but
> > that creates an ABI that I'm not entirely sure is desirable in the long 
> > term.
> 
> Although the CPUID ABI does not really change.  KVM does not support
> emulating Intel PT, so accepting CPUID that the hardware cannot support
> seems like a bit of a lie.

But it's not all or nothing, e.g. KVM should support exposing fewer address 
ranges
than are supported by hardware, so that the same virtual CPU model can be run on
different generations of hardware.

> Aren't there other features that KVM does not support if the hardware support
> is not there?

Many.  But either features are one-off things without configurable properties,
or KVM does the right thing (usually).  E.g. nested virtualization heavily 
relies
on hardware, and has a plethora of knobs, but KVM (usually) honors and validates
the configuration provided by userspace.

> To some degree, a testing and debugging feature does not have to be
> available in 100% of cases because it can still be useful when it is
> available.

I don't disagree, but "works on my machine" is how KVM has gotten into so many
messes with such features.  I also don't necessarily disagree with supporting a
very limited subset of use cases, but I want such support to come as 
well-defined
package with proper guard rails, docs, and ideally tests.

> >>> I'll post a patch to hide the module param if CONFIG_BROKEN=n (and will 
> >>> omit
> >>> stable@ for the previous patch).
> >>>
> >>> Going forward, if someone actually cares about virtualizing PT enough to 
> >>> want to
> >>> fix KVM's mess, then they can put in the effort to fix all the bugs, 
> >>> write all
> >>> the tests, and in general clean up the implementation to meet KVM's 
> >>> current
> >>> standards.  E.g. KVM usage of intel_pt_validate_cap() instead of KVM's 
> >>> guest CPUID
> >>> and capabilities infrastructure needs to go.
> >>
> >> The problem below seems to be caused by not validating against the *host*
> >> CPUID.  KVM's CPUID information seems to be invalid.
> > 
> > Yes.
> > 
> >>> My vote is to queue the current code for removal, and revisit support 
> >>> after the
> >>> mediated PMU has landed.  Because I don't see any point in supporting 
> >>> Intel PT
> >>> without a mediated PMU, as host/guest mode really only makes sense if the 
> >>> entire
> >>> PMU is being handed over to the guest.
> >>
> >> Why?
> > 
> > To simplify the implementation, and because I don't see how virtualizing 
> > Intel PT
> > without also enabling the mediated PMU makes any sense.
> > 
> > Conceptually, KVM's PT implementation is very, very similar to the mediated 
> > PMU.
> > They both effectively give the guest control of hardware when the vCPU 
> > starts
> > running, and take back control when the vCPU stops running.
> > 
> > If KVM allows Intel PT without the mediated PMU, then KVM and perf have to 
> > support
> > two separate implementations for the same model.  If virtualizing Intel PT 
> > is
> > allowed if and only if the mediated PMU is enabled, then 
> > .handle_intel_pt_intr()
> > goes away.  And on the flip side, it becomes super obvious that host usage 
> > of
> > Intel PT needs to be mutually exclusive with the mediated PMU.
> 
> And forgo being able to trace mediated passthough with Intel PT ;-)

It can't work, generally.  Anything that generates a ToPA PMI will go sideways.
In the worst case scenario, the spurious PMI could crash the guest.

And when the mediated PMU supports PEBS, that would likely break too.

[PATCH 1/2] selftests/intel_pstate: fix operand expected

2024-10-22 Thread Alessandro Zanni

Running "make kselftest TARGETS=intel_pstate" results in
the following errors:
- ./run.sh: line 90: / 1000: syntax error: operand expected
(error token is "/ 1000")
- ./run.sh: line 92: / 1000: syntax error: operand expected
(error token is "/ 1000")

This fix allows to have cross-platform compatibility when
using arithmetic expression with command substitutions.

Signed-off-by: Alessandro Zanni 
---
 tools/testing/selftests/intel_pstate/run.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/intel_pstate/run.sh 
b/tools/testing/selftests/intel_pstate/run.sh
index e7008f614ad7..0c1b6c1308a4 100755
--- a/tools/testing/selftests/intel_pstate/run.sh
+++ b/tools/testing/selftests/intel_pstate/run.sh
@@ -87,9 +87,9 @@ mkt_freq=${_mkt_freq}0
 
 # Get the ranges from cpupower
 _min_freq=$(cpupower frequency-info -l | tail -1 | awk ' { print $1 } ')
-min_freq=$(($_min_freq / 1000))
+min_freq=$((_min_freq / 1000))
 _max_freq=$(cpupower frequency-info -l | tail -1 | awk ' { print $2 } ')
-max_freq=$(($_max_freq / 1000))
+max_freq=$((_max_freq / 1000))
 
 
 [ $EVALUATE_ONLY -eq 0 ] && for freq in `seq $max_freq -100 $min_freq`
-- 
2.43.0


>From d6500cf7c800bd39ae3ef2930cece5b3be460c0b Mon Sep 17 00:00:00 2001
From: Alessandro Zanni 
Date: Tue, 22 Oct 2024 17:22:13 +0200
Subject: [PATCH 2/2] selftests/intel_pstate: cpupower command not found
To: sh...@kernel.org
Cc: linux-kselft...@vger.kernel.org,
linux-kernel@vger.kernel.org,
sk...@linuxfoundation.org,
anupnewsm...@gmail.com

Running "make kselftest TARGETS=intel_pstate" results in the following errors:
- ./run.sh: line 89: cpupower: command not found
- ./run.sh: line 91: cpupower: command not found

if the cpupower is not installed.

Since the test depends on cpupower, this patch stops the test if the cpupower
is not installed.

Signed-off-by: Alessandro Zanni 
---
 tools/testing/selftests/intel_pstate/run.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/intel_pstate/run.sh 
b/tools/testing/selftests/intel_pstate/run.sh
index 0c1b6c1308a4..6a3b8503264e 100755
--- a/tools/testing/selftests/intel_pstate/run.sh
+++ b/tools/testing/selftests/intel_pstate/run.sh
@@ -44,6 +44,11 @@ if [ $UID != 0 ] && [ $EVALUATE_ONLY == 0 ]; then
 exit $ksft_skip
 fi
 
+if ! command -v cpupower &> /dev/null; then
+   echo $msg cpupower could not be found, please install it >&2
+   exit $ksft_skip
+fi
+
 max_cpus=$(($(nproc)-1))
 
 function run_test () {
-- 
2.43.0

Re: [PATCH V13 03/14] KVM: x86: Fix Intel PT Host/Guest mode when host tracing also

2024-10-22 Thread Adrian Hunter

On 22/10/24 19:30, Sean Christopherson wrote:
> On Tue, Oct 22, 2024, Adrian Hunter wrote:
>> On 14/10/24 21:25, Sean Christopherson wrote:
 Fixes: 2ef444f1600b ("KVM: x86: Add Intel PT context switch for each vcpu")
 Cc: sta...@vger.kernel.org
>>>
>>> This is way, way too big for stable@.  Given that host/guest mode is 
>>> disabled by
>>> default and that no one has complained about this, I think it's safe to say 
>>> that
>>> unless we can provide a minimal patch, fixing this in LTS kernels isn't a 
>>> priority.
>>>
>>> Alternatively, I'm tempted to simply drop support for host/guest mode.  It 
>>> clearly
>>> hasn't been well tested, and given the lack of bug reports, likely doesn't 
>>> have
>>> many, if any, users.  And I'm guessing the overhead needed to context 
>>> switch all
>>> the RTIT MSRs makes tracing in the guest relatively useless.
>>
>> As a control flow trace, it is not affected by context switch overhead.
> 
> Out of curiosity, how much is Intel PT used purely for control flow tracing, 
> i.e.
> without caring _at all_ about perceived execution time?

It can be used to try to understand how code went wrong.

But timestamps would still indicate what was happening on different VCPUs
at about the same time.

> 
>> Intel PT timestamps are also not affected by that.
> 
> Timestamps are affected because the guest will see inexplicable jumps in time.

Tracing a task on the host sees jumps in time for scheduler
context switches anyway.

> Those gaps are unavoidable to some degree, but context switching on every 
> entry
> and exit is 
> 
>> This patch reduces the MSR switching.
> 
> To be clear, I'm not objecting to any of the ideas in this patch, I'm 
> "objecting"
> to trying to put band-aids on KVM's existing implementation, which is clearly
> buggy and, like far too many PMU-ish features in KVM, was probably developed
> without any thought as to how it would affect use cases beyond the host admin
> and the VM owner being a single person.  And I'm also objecting, vehemently, 
> to
> sending anything of this magnitude and complexity to LTS kernels.

That's your call for sure.

> 
>>> /me fiddles around
>>>
>>> LOL, yeah, this needs to be burned with fire.  It's wildly broken.  So for 
>>> stable@,
>>
>> It doesn't seem wildly broken.  Just the VMM passing invalid CPUID
>> and KVM not validating it.
> 
> Heh, I agree with "just", but unfortunately "just ... not validating" a large
> swath of userspace inputs is pretty widly broken.  More importantly, it's not
> easy to fix.  E.g. KVM could require the inputs to exactly match hardware, but
> that creates an ABI that I'm not entirely sure is desirable in the long term.

Although the CPUID ABI does not really change.  KVM does not support
emulating Intel PT, so accepting CPUID that the hardware cannot support
seems like a bit of a lie.  Aren't there other features that KVM does not
support if the hardware support is not there?

To some degree, a testing and debugging feature does not have to be
available in 100% of cases because it can still be useful when it is
available.

> 
>>> I'll post a patch to hide the module param if CONFIG_BROKEN=n (and will omit
>>> stable@ for the previous patch).
>>>
>>> Going forward, if someone actually cares about virtualizing PT enough to 
>>> want to
>>> fix KVM's mess, then they can put in the effort to fix all the bugs, write 
>>> all
>>> the tests, and in general clean up the implementation to meet KVM's current
>>> standards.  E.g. KVM usage of intel_pt_validate_cap() instead of KVM's 
>>> guest CPUID
>>> and capabilities infrastructure needs to go.
>>
>> The problem below seems to be caused by not validating against the *host*
>> CPUID.  KVM's CPUID information seems to be invalid.
> 
> Yes.
> 
>>> My vote is to queue the current code for removal, and revisit support after 
>>> the
>>> mediated PMU has landed.  Because I don't see any point in supporting Intel 
>>> PT
>>> without a mediated PMU, as host/guest mode really only makes sense if the 
>>> entire
>>> PMU is being handed over to the guest.
>>
>> Why?
> 
> To simplify the implementation, and because I don't see how virtualizing 
> Intel PT
> without also enabling the mediated PMU makes any sense.
> 
> Conceptually, KVM's PT implementation is very, very similar to the mediated 
> PMU.
> They both effectively give the guest control of hardware when the vCPU starts
> running, and take back control when the vCPU stops running.
> 
> If KVM allows Intel PT without the mediated PMU, then KVM and perf have to 
> support
> two separate implementations for the same model.  If virtualizing Intel PT is
> allowed if and only if the mediated PMU is enabled, then 
> .handle_intel_pt_intr()
> goes away.  And on the flip side, it becomes super obvious that host usage of
> Intel PT needs to be mutually exclusive with the mediated PMU.

And forgo being able to trace mediated passthough with Intel PT ;-)

> 
>> Intel PT PMU is programmed separately from the x8

Re: [PATCH] kunit: Introduce autorun option

2024-10-22 Thread Rae Moar

On Thu, Oct 17, 2024 at 5:34 PM Stanislav Kinsburskii
 wrote:
>
> The new option controls tests run on boot or module load. With the new
> debugfs "run" dentry allowing to run tests on demand, an ability to disable
> automatic tests run becomes a useful option in case of intrusive tests.
>
> The option is set to true by default to preserve the existent behavior. It
> can be overridden by either the corresponding module option or by the
> corresponding config build option.
>
> Signed-off-by: Stanislav Kinsburskii 

Hello!

This patch looks good to me! The functionality seems to be working
well. I do have one comment below.

Thanks!
-Rae

> ---
>  include/kunit/test.h |4 +++-
>  lib/kunit/Kconfig|   12 
>  lib/kunit/debugfs.c  |2 +-
>  lib/kunit/executor.c |   18 +-
>  lib/kunit/test.c |6 --
>  5 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/include/kunit/test.h b/include/kunit/test.h
> index 34b71e42fb10..58dbab60f853 100644
> --- a/include/kunit/test.h
> +++ b/include/kunit/test.h
> @@ -312,6 +312,7 @@ static inline void kunit_set_failure(struct kunit *test)
>  }
>
>  bool kunit_enabled(void);
> +bool kunit_autorun(void);
>  const char *kunit_action(void);
>  const char *kunit_filter_glob(void);
>  char *kunit_filter(void);
> @@ -334,7 +335,8 @@ kunit_filter_suites(const struct kunit_suite_set 
> *suite_set,
> int *err);
>  void kunit_free_suite_set(struct kunit_suite_set suite_set);
>
> -int __kunit_test_suites_init(struct kunit_suite * const * const suites, int 
> num_suites);
> +int __kunit_test_suites_init(struct kunit_suite * const * const suites, int 
> num_suites,
> +bool run_tests);
>
>  void __kunit_test_suites_exit(struct kunit_suite **suites, int num_suites);
>
> diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
> index 34d7242d526d..a97897edd964 100644
> --- a/lib/kunit/Kconfig
> +++ b/lib/kunit/Kconfig
> @@ -81,4 +81,16 @@ config KUNIT_DEFAULT_ENABLED
>   In most cases this should be left as Y. Only if additional opt-in
>   behavior is needed should this be set to N.
>
> +config KUNIT_AUTORUN_ENABLED
> +   bool "Default value of kunit.autorun"
> +   default y
> +   help
> + Sets the default value of kunit.autorun. If set to N then KUnit
> + tests will not run after initialization unless kunit.autorun=1 is
> + passed to the kernel command line. The test can still be run 
> manually
> + via debugfs interface.
> +
> + In most cases this should be left as Y. Only if additional opt-in
> + behavior is needed should this be set to N.
> +
>  endif # KUNIT
> diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
> index d548750a325a..9df064f40d98 100644
> --- a/lib/kunit/debugfs.c
> +++ b/lib/kunit/debugfs.c
> @@ -145,7 +145,7 @@ static ssize_t debugfs_run(struct file *file,
> struct inode *f_inode = file->f_inode;
> struct kunit_suite *suite = (struct kunit_suite *) f_inode->i_private;
>
> -   __kunit_test_suites_init(&suite, 1);
> +   __kunit_test_suites_init(&suite, 1, true);
>
> return count;
>  }
> diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
> index 34b7b6833df3..340723571b0f 100644
> --- a/lib/kunit/executor.c
> +++ b/lib/kunit/executor.c
> @@ -29,6 +29,22 @@ const char *kunit_action(void)
> return action_param;
>  }
>
> +/*
> + * Run KUnit tests after initialization
> + */
> +#ifdef CONFIG_KUNIT_AUTORUN_ENABLED
> +static bool autorun_param = true;
> +#else
> +static bool autorun_param;
> +#endif
> +module_param_named(autorun, autorun_param, bool, 0);
> +MODULE_PARM_DESC(autorun, "Run KUnit tests after initialization");
> +
> +bool kunit_autorun(void)
> +{
> +   return autorun_param;
> +}
> +
>  static char *filter_glob_param;
>  static char *filter_param;
>  static char *filter_action_param;
> @@ -266,7 +282,7 @@ void kunit_exec_run_tests(struct kunit_suite_set 
> *suite_set, bool builtin)
> pr_info("1..%zu\n", num_suites);

When using this feature, I still see some KTAP output that are printed
from this function (kunit_exec_run_tests). I think it would be great
if we could remove this output as to not clutter the kernel log.

At first, I was confused as to why we needed to call this function and
initialize the tests but I realized the debugfs suites need to be
created.

So instead, could we check for kunit_autorun() here instead as a
condition before printing the output?

> }
>
> -   __kunit_test_suites_init(suite_set->start, num_suites);
> +   __kunit_test_suites_init(suite_set->start, num_suites, 
> kunit_autorun());
>  }
>
>  void kunit_exec_list_tests(struct kunit_suite_set *suite_set, bool 
> include_attr)
> diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> index 089c832e3cdb..146d1b48a096 100644
> --- a/lib/kunit/test.c
> +++ b/lib/kunit/test.c
> @@ -708,7 +708,8 @@ bool kunit_enabled(void)
> return

[PATCH] kselftest/arm64: Log fp-stress child startup errors to stdout

2024-10-22 Thread Mark Brown

Currently if we encounter an error between fork() and exec() of a child
process we log the error to stderr. This means that the errors don't get
annotated with the child information which makes diagnostics harder and
means that if we miss the exit signal from the child we can deadlock
waiting for output from the child. Improve robustness and output quality
by logging to stdout instead.

Signed-off-by: Mark Brown 
---
 tools/testing/selftests/arm64/fp/fp-stress.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c 
b/tools/testing/selftests/arm64/fp/fp-stress.c
index 
faac24bdefeb9436e2daf20b7250d0ae25ca23a7..80f22789504d661efc52a90d4b0893fbebec42f8
 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -79,7 +79,7 @@ static void child_start(struct child_data *child, const char 
*program)
 */
ret = dup2(pipefd[1], 1);
if (ret == -1) {
-   fprintf(stderr, "dup2() %d\n", errno);
+   printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
 
@@ -89,7 +89,7 @@ static void child_start(struct child_data *child, const char 
*program)
 */
ret = dup2(startup_pipe[0], 3);
if (ret == -1) {
-   fprintf(stderr, "dup2() %d\n", errno);
+   printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
 
@@ -107,16 +107,15 @@ static void child_start(struct child_data *child, const 
char *program)
 */
ret = read(3, &i, sizeof(i));
if (ret < 0)
-   fprintf(stderr, "read(startp pipe) failed: %s (%d)\n",
-   strerror(errno), errno);
+   printf("read(startp pipe) failed: %s (%d)\n",
+  strerror(errno), errno);
if (ret > 0)
-   fprintf(stderr, "%d bytes of data on startup pipe\n",
-   ret);
+   printf("%d bytes of data on startup pipe\n", ret);
close(3);
 
ret = execl(program, program, NULL);
-   fprintf(stderr, "execl(%s) failed: %d (%s)\n",
-   program, errno, strerror(errno));
+   printf("execl(%s) failed: %d (%s)\n",
+  program, errno, strerror(errno));
 
exit(EXIT_FAILURE);
} else {

---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241017-arm64-fp-stress-exec-fail-d074ec82cf43

Best regards,
-- 
Mark Brown

Re: [PATCH V13 03/14] KVM: x86: Fix Intel PT Host/Guest mode when host tracing also

2024-10-22 Thread Andi Kleen

> 
> Out of curiosity, how much is Intel PT used purely for control flow tracing, 
> i.e.
> without caring _at all_ about perceived execution time?

It is very common, e.g. one major use of PT is control flow discovery in
feedback fuzzers.

-Andi

Re: [PATCH net-next v5 2/3] connector/cn_proc: Kunit tests for threads hash table

2024-10-22 Thread Stanislav Fomichev

On 10/22, Anjali Kulkarni wrote:
> 
> 
> > On Oct 17, 2024, at 5:55 PM, Stanislav Fomichev  
> > wrote:
> > 
> > On 10/18, Anjali Kulkarni wrote:
> >> 
> >> 
> >>> On Oct 17, 2024, at 5:13 PM, Stanislav Fomichev  
> >>> wrote:
> >>> 
> >>> On 10/17, Anjali Kulkarni wrote:
>  Kunit tests to test hash table add, delete, duplicate add and delete.
>  Add following configs and compile kernel code:
>  
>  CONFIG_CONNECTOR=y
>  CONFIG_PROC_EVENTS=y
>  CONFIG_NET=y
>  CONFIG_KUNIT=m
>  CONFIG_CN_HASH_KUNIT_TEST=m
>  
>  To run kunit tests:
>  sudo modprobe cn_hash_test
>  
>  Output of kunit tests and hash table contents are displayed in
>  /var/log/messages (at KERN_DEBUG level).
>  
>  Signed-off-by: Anjali Kulkarni 
>  ---
>  drivers/connector/cn_hash.c   |  40 
>  drivers/connector/connector.c |  12 +++
>  include/linux/connector.h |   4 +
>  lib/Kconfig.debug |  17 
>  lib/Makefile  |   1 +
>  lib/cn_hash_test.c| 167 ++
>  lib/cn_hash_test.h|  10 ++
>  7 files changed, 251 insertions(+)
>  create mode 100644 lib/cn_hash_test.c
>  create mode 100644 lib/cn_hash_test.h
>  
>  diff --git a/drivers/connector/cn_hash.c b/drivers/connector/cn_hash.c
>  index a079e9bcea6d..40099b5908ac 100644
>  --- a/drivers/connector/cn_hash.c
>  +++ b/drivers/connector/cn_hash.c
>  @@ -170,6 +170,46 @@ int cn_hash_get_exval(struct cn_hash_dev *hdev, 
>  pid_t pid)
>  return -EINVAL;
>  }
>  
>  +int cn_hash_display_hlist(struct cn_hash_dev *hdev, pid_t pid, int 
>  max_len,
>  + int *hkey, int *key_display)
>  +{
>  + struct uexit_pid_hnode *hnode;
>  + int key, count = 0;
>  +
>  + mutex_lock(&hdev->uexit_hash_lock);
>  + key = hash_min(pid, HASH_BITS(hdev->uexit_pid_htable));
>  + pr_debug("Bucket: %d\n", key);
>  +
>  + hlist_for_each_entry(hnode,
>  + &hdev->uexit_pid_htable[key],
>  + uexit_pid_hlist) {
>  + if (key_display[key] != 1) {
>  + if (hnode->uexit_pid_hlist.next == NULL)
>  + pr_debug("pid %d ", hnode->pid);
>  + else
>  + pr_debug("pid %d --> ", hnode->pid);
>  + }
>  + count++;
>  + }
>  +
>  + mutex_unlock(&hdev->uexit_hash_lock);
>  +
>  + if ((key_display[key] != 1) && !count)
>  + pr_debug("(empty)\n");
>  +
>  + pr_debug("\n");
>  +
>  + *hkey = key;
>  +
>  + if (count > max_len) {
>  + pr_err("%d entries in hlist for key %d, expected %d\n",
>  + count, key, max_len);
>  + return -EINVAL;
>  + }
>  +
>  + return 0;
>  +}
>  +
>  bool cn_hash_table_empty(struct cn_hash_dev *hdev)
>  {
>  bool is_empty;
>  diff --git a/drivers/connector/connector.c 
>  b/drivers/connector/connector.c
>  index c1c0dcec53c0..2be2fe1adc12 100644
>  --- a/drivers/connector/connector.c
>  +++ b/drivers/connector/connector.c
>  @@ -304,6 +304,18 @@ int cn_get_exval(pid_t pid)
>  }
>  EXPORT_SYMBOL_GPL(cn_get_exval);
>  
>  +int cn_display_hlist(pid_t pid, int max_len, int *hkey, int 
>  *key_display)
>  +{
>  + struct cn_dev *dev = &cdev;
>  +
>  + if (!cn_already_initialized)
>  + return 0;
>  +
>  + return cn_hash_display_hlist(dev->hdev, pid, max_len,
>  + hkey, key_display);
>  +}
>  +EXPORT_SYMBOL_GPL(cn_display_hlist);
>  +
>  bool cn_table_empty(void)
>  {
>  struct cn_dev *dev = &cdev;
>  diff --git a/include/linux/connector.h b/include/linux/connector.h
>  index 5384e4bb98e8..a75c3fcf182a 100644
>  --- a/include/linux/connector.h
>  +++ b/include/linux/connector.h
>  @@ -168,4 +168,8 @@ int cn_get_exval(pid_t pid);
>  bool cn_table_empty(void);
>  bool cn_hash_table_empty(struct cn_hash_dev *hdev);
>  
>  +int cn_display_hlist(pid_t pid, int max_len, int *hkey, int 
>  *key_display);
>  +int cn_hash_display_hlist(struct cn_hash_dev *hdev, pid_t pid, int 
>  max_len,
>  + int *hkey, int *key_display);
>  +
>  #endif /* __CONNECTOR_H */
>  diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>  index 7315f643817a..290cf0a6befa 100644
>  --- a/lib/Kconfig.debug
>  +++ b/lib/Kconfig.debug
>  @@ -2705,6 +2705,23 @@ config HASHTABLE_KUNIT_TEST
>  
>  If unsure, say N.
>  
>  +config CN_HASH_KUNIT_TEST
>  + tristate "KUnit Test for connector hashtable code" if !KUNIT_ALL_TESTS
>  + depends on KUNIT
>  + default KUNIT_ALL_TESTS
>  + help
>  +  This builds the hashtable KUnit test suite.
>  +  It tests the basic functionality of the API defined in
>  +  drivers/connector/cn_hash.c.
>  +  CONFIG_CONNECTOR=y, CONFIG_PROC_EVENTS=y and CONFIG_NET=y needs
>  +  to be enabled along with CONFIG_C

Re: [PATCH v4 1/4] pidfd: extend pidfd_get_pid() and de-duplicate pid lookup

2024-10-22 Thread Shakeel Butt

On Thu, Oct 17, 2024 at 10:05:49PM GMT, Lorenzo Stoakes wrote:
> The means by which a pid is determined from a pidfd is duplicated, with
> some callers holding a reference to the (pid)fd, and others explicitly
> pinning the pid.
> 
> Introduce __pidfd_get_pid() which narrows this to one approach of pinning
> the pid, with an optional output parameters for file->f_flags to avoid the
> need to hold onto a file to retrieve this.
> 
> Additionally, allow the ability to open a pidfd by opening a /proc/
> directory, utilised by the pidfd_send_signal() system call, providing a
> pidfd_get_pid_proc() helper function to do so.
> 
> Doing this allows us to eliminate open-coded pidfd pid lookup and to
> consistently handle this in one place.
> 
> This lays the groundwork for a subsequent patch which adds a new sentinel
> pidfd to explicitly reference the current process (i.e. thread group
> leader) without the need for a pidfd.
> 
> Signed-off-by: Lorenzo Stoakes 

Reviewed-by: Shakeel Butt

Re: [PATCH] rcu/nocb: Fix rcuog wake-up from offline softirq

2024-10-22 Thread Paul E. McKenney

On Thu, Oct 10, 2024 at 06:36:09PM +0200, Frederic Weisbecker wrote:
> After a CPU has set itself offline and before it eventually calls
> rcutree_report_cpu_dead(), there are still opportunities for callbacks
> to be enqueued, for example from a softirq. When that happens on NOCB,
> the rcuog wake-up is deferred through an IPI to an online CPU in order
> not to call into the scheduler and risk arming the RT-bandwidth after
> hrtimers have been migrated out and disabled.
> 
> But performing a synchronized IPI from a softirq is buggy as reported in
> the following scenario:
> 
> WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
> Modules linked in: rcutorture torture
> CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 
> 6.11.0-rc1-00012-g9139f93209d1 #1
> Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
> RIP: 0010:smp_call_function_single
> 
> swake_up_one_online
> __call_rcu_nocb_wake
> __call_rcu_common
> ? rcu_torture_one_read
> call_timer_fn
> __run_timers
> run_timer_softirq
> handle_softirqs
> irq_exit_rcu
> ? tick_handle_periodic
> sysvec_apic_timer_interrupt
> 
> 
> Fix this with forcing deferred rcuog wake up through the NOCB timer when
> the CPU is offline. The actual wake up will happen from
> rcutree_report_cpu_dead().
> 
> Reported-by: kernel test robot 
> Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-...@intel.com
> Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline 
> CPU")
> Reviewed-by: Joel Fernandes (Google) 
> Signed-off-by: Frederic Weisbecker 

Reviewed-by: Paul E. McKenney 

> ---
>  kernel/rcu/tree_nocb.h | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 97b99cd06923..16865475120b 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -554,13 +554,19 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, 
> bool was_alldone,
>   rcu_nocb_unlock(rdp);
>   wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
>  TPS("WakeLazy"));
> - } else if (!irqs_disabled_flags(flags)) {
> + } else if (!irqs_disabled_flags(flags) && cpu_online(rdp->cpu)) 
> {
>   /* ... if queue was empty ... */
>   rcu_nocb_unlock(rdp);
>   wake_nocb_gp(rdp, false);
>   trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>   TPS("WakeEmpty"));
>   } else {
> + /*
> +  * Don't do the wake-up upfront on fragile paths.
> +  * Also offline CPUs can't call swake_up_one_online() 
> from
> +  * (soft-)IRQs. Rely on the final deferred wake-up from
> +  * rcutree_report_cpu_dead()
> +  */
>   rcu_nocb_unlock(rdp);
>   wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
>  TPS("WakeEmptyIsDeferred"));
> -- 
> 2.46.0
>

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Andrii Nakryiko

On Tue, Oct 22, 2024 at 7:26 AM Paul E. McKenney  wrote:
>
> On Tue, Oct 22, 2024 at 12:13:12AM -0700, Christoph Hellwig wrote:
> > On Tue, Oct 22, 2024 at 09:10:18AM +0200, Peter Zijlstra wrote:
> > > Ah, well, the thing that got us here is that we (Andrii and me) wanted
> > > to use -1 as an 'invalid' value to indicate SRCU is not currently in
> > > use.
> > >
> > > So it all being int is really rather convenient :-)
> >
> > Then please document that use.  Maybe even with a symolic name for
> > -1 that clearly describes these uses.
>
> Would this work?
>
> #define SRCU_INVALID_INDEX -1
>

But why? It's a nice property to have an int-returning API where valid
values are only >= 0, so callers are free to use the entire negative
range (not just -1) for whatever they need to store in case there is
no srcu_idx value.

Why are we complicating something that's simple and straightforward?
It's similar with any fd- and id- returning API.

Marking it as u16 would be fine, but unusual (and probably suboptimal
due to common u16 to int conversion).

Marking it as unsigned int would be bad, because it implies that all
32 bits can be used for valid value, making it impossible to use <0
for something else.

Just documenting that valid int values are always >= is good and
convenient, why not sticking to just that?

> Whatever the name, maybe Peter and Andrii define this under #ifndef
> right now, and we get it into include/linux/srcu.h over time.
>
> Or is there a better way?  Or name, for that matter.
>
> Thanx, Paul

[PATCH v2] selftests: tc-testing: Fix typo error

2024-10-22 Thread Karan Sanghavi

Correct the typo errors in json files

- "diffferent" is corrected to "different".
- "muliple" and "miltiple" is corrected to "multiple".

Reviewed-by: Simon Horman 
Reviewed-by: Shuah Khan 
Signed-off-by: Karan Sanghavi 
---
Changes in v2:
- Rewrote short log and commit message
- Link to v1: 
https://lore.kernel.org/r/20241019-multiple_spell_error-v1-1-facff43b5...@gmail.com
---
 tools/testing/selftests/tc-testing/tc-tests/filters/basic.json  | 6 +++---
 tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json | 6 +++---
 tools/testing/selftests/tc-testing/tc-tests/filters/flow.json   | 2 +-
 tools/testing/selftests/tc-testing/tc-tests/filters/route.json  | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/basic.json 
b/tools/testing/selftests/tc-testing/tc-tests/filters/basic.json
index d1278de8ebc3..c9309a44a87e 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/basic.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/basic.json
@@ -67,7 +67,7 @@
 },
 {
 "id": "4943",
-"name": "Add basic filter with cmp ematch u32/link layer and miltiple 
actions",
+"name": "Add basic filter with cmp ematch u32/link layer and multiple 
actions",
 "category": [
 "filter",
 "basic"
@@ -155,7 +155,7 @@
 },
 {
 "id": "32d8",
-"name": "Add basic filter with cmp ematch u32/network layer and 
miltiple actions",
+"name": "Add basic filter with cmp ematch u32/network layer and 
multiple actions",
 "category": [
 "filter",
 "basic"
@@ -243,7 +243,7 @@
 },
 {
 "id": "62d7",
-"name": "Add basic filter with cmp ematch u32/transport layer and 
miltiple actions",
+"name": "Add basic filter with cmp ematch u32/transport layer and 
multiple actions",
 "category": [
 "filter",
 "basic"
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json 
b/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json
index 03723cf84379..35c9a7dbe1c4 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json
@@ -67,7 +67,7 @@
 },
 {
 "id": "0234",
-"name": "Add cgroup filter with cmp ematch u32/link layer and miltiple 
actions",
+"name": "Add cgroup filter with cmp ematch u32/link layer and multiple 
actions",
 "category": [
 "filter",
 "cgroup"
@@ -155,7 +155,7 @@
 },
 {
 "id": "2733",
-"name": "Add cgroup filter with cmp ematch u32/network layer and 
miltiple actions",
+"name": "Add cgroup filter with cmp ematch u32/network layer and 
multiple actions",
 "category": [
 "filter",
 "cgroup"
@@ -1189,7 +1189,7 @@
 },
 {
 "id": "4319",
-"name": "Replace cgroup filter with diffferent match",
+"name": "Replace cgroup filter with different match",
 "category": [
 "filter",
 "cgroup"
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json 
b/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json
index 58189327f644..996448afe31b 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json
@@ -507,7 +507,7 @@
 },
 {
 "id": "4341",
-"name": "Add flow filter with muliple ops",
+"name": "Add flow filter with multiple ops",
 "category": [
 "filter",
 "flow"
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/route.json 
b/tools/testing/selftests/tc-testing/tc-tests/filters/route.json
index 8d8de8f65aef..05cedca67cca 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/route.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/route.json
@@ -111,7 +111,7 @@
 },
 {
 "id": "7994",
-"name": "Add route filter with miltiple actions",
+"name": "Add route filter with multiple actions",
 "category": [
 "filter",
 "route"

---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241017-multiple_spell_error-8b267e47

Best regards,
-- 
Karan Sanghavi

Re: [PATCH v4 2/4] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

2024-10-22 Thread Shakeel Butt

On Thu, Oct 17, 2024 at 10:05:50PM GMT, Lorenzo Stoakes wrote:
> It is useful to be able to utilise the pidfd mechanism to reference the
> current thread or process (from a userland point of view - thread group
> leader from the kernel's point of view).
> 
> Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
> 
> For convenience and to avoid confusion from userland's perspective we alias
> these:
> 
> * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
>   the user will want to use, as they would find it surprising if for
>   instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
>   and that failed.
> 
> * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
>   have no concept of thread groups or what a thread group leader is, and
>   from userland's perspective and nomenclature this is what userland
>   considers to be a process.

Should users use PIDFD_SELF_PROCESS in process_madvise() for self
madvise() (once the support is added)?

> 
[...]
>  
> +static struct pid *pidfd_get_pid_self(unsigned int pidfd, unsigned int 
> *flags)
> +{
> + bool is_thread = pidfd == PIDFD_SELF_THREAD;
> + enum pid_type type = is_thread ? PIDTYPE_PID : PIDTYPE_TGID;
> + struct pid *pid = *task_pid_ptr(current, type);
> +
> + /* The caller expects an elevated reference count. */
> + get_pid(pid);

Do you want this helper to work for scenarios where pid is used across
context? Otherwise can't we get rid of this get and later put for self?

> + return pid;
> +}
> +

Overall looks good to me.

Reviewed-by: Shakeel Butt

Re: [PATCH 2/2] soc: qcom: pmic_glink: Handle GLINK intent allocation rejections

2024-10-22 Thread Johan Hovold

On Tue, Oct 22, 2024 at 04:17:12AM +, Bjorn Andersson wrote:
> Some versions of the pmic_glink firmware does not allow dynamic GLINK
> intent allocations, attempting to send a message before the firmware has
> allocated its receive buffers and announced these intent allocations
> will fail. When this happens something like this showns up in the log:
> 
>   [9.799719] pmic_glink_altmode.pmic_glink_altmode 
> pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>   [9.812446] pmic_glink_altmode.pmic_glink_altmode 
> pmic_glink.altmode.0: failed to request altmode notifications: -125
>   [9.831796] ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to 
> send UCSI read request: -125

I think you should drop the time stamps here, and also add the battery
notification error to make the patch easier to find when searching for
these errors:

qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed 
to request power notifications

> GLINK has been updated to distinguish between the cases where the remote
> is going down (-ECANCELLED) and the intent allocation being rejected
> (-EAGAIN).
> 
> Retry the send until intent buffers becomes available, or an actual
> error occur.
> 
> To avoid infinitely waiting for the firmware in the event that this
> misbehaves and no intents arrive, an arbitrary 10 second timeout is
> used.
> 
> This patch was developed with input from Chris Lew.
> 
> Reported-by: Johan Hovold 
> Closes: https://lore.kernel.org/all/zqet8iinndhnx...@hovoldconsulting.com/#t

This indeed seems to fix the -ECANCELED related errors I reported above,
but the audio probe failure still remains as expected:

PDR: avs/audio get domain list txn wait failed: -110
PDR: service lookup for avs/audio failed: -110

I hit it on the third reboot and then again after another 75 reboots
(and have never seen it with the user space pd-mapper over several
hundred boots).

Do you guys have any theories as to what is causing the above with the
in-kernel pd-mapper (beyond the obvious changes in timing)?

> Cc: sta...@vger.kernel.org

Can you add a Fixes tag here?

This patch depends on the former, but that is not necessarily obvious
for someone backporting this (and the previous patch is only going to be
backported to 6.4).

Perhaps you can use the stable tag dependency annotation or even mark
the previous patch so that it is backported far enough.

> Signed-off-by: Bjorn Andersson 

Tested-by: Johan Hovold 

> ---
>  drivers/soc/qcom/pmic_glink.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
> index 
> 9606222993fd78e80d776ea299cad024a0197e91..221639f3da149da1f967dbc769a97d327ffd6c63
>  100644
> --- a/drivers/soc/qcom/pmic_glink.c
> +++ b/drivers/soc/qcom/pmic_glink.c
> @@ -13,6 +13,8 @@
>  #include 
>  #include 
>  
> +#define PMIC_GLINK_SEND_TIMEOUT (10*HZ)

nit: spaces around *

Ten seconds seems a little excessive; are there any reasons for not
picking something shorter like 5 s (also used by USB but that comes from
spec)?

> +
>  enum {
>   PMIC_GLINK_CLIENT_BATT = 0,
>   PMIC_GLINK_CLIENT_ALTMODE,
> @@ -112,13 +114,23 @@ EXPORT_SYMBOL_GPL(pmic_glink_client_register);
>  int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
>  {
>   struct pmic_glink *pg = client->pg;
> + unsigned long start;
> + bool timeout_reached = false;

No need to initialise.

>   int ret;
>  
>   mutex_lock(&pg->state_lock);
> - if (!pg->ept)
> + if (!pg->ept) {
>   ret = -ECONNRESET;
> - else
> - ret = rpmsg_send(pg->ept, data, len);
> + } else {
> + start = jiffies;
> + do {
> + timeout_reached = time_after(jiffies, start + 
> PMIC_GLINK_SEND_TIMEOUT);
> + ret = rpmsg_send(pg->ept, data, len);

Add a delay here to avoid hammering the remote side with requests in a
tight loop for 10 s?

> + } while (ret == -EAGAIN && !timeout_reached);
> +
> + if (ret == -EAGAIN && timeout_reached)
> + ret = -ETIMEDOUT;
> + }
>   mutex_unlock(&pg->state_lock);
>  
>   return ret;

Looks good to me otherwise: 

Reviewed-by: Johan Hovold 

Johan

Re: [PATCH v2 1/2] selftest: rtc: Add to check rtc alarm status for alarm related test

2024-10-22 Thread Alexandre Belloni

On 20/10/2024 20:22:13-0700, Joseph Jang wrote:
> In alarm_wkalm_set and alarm_wkalm_set_minute test, they use different
> ioctl (RTC_ALM_SET/RTC_WKALM_SET) for alarm feature detection. They will
> skip testing if RTC_ALM_SET/RTC_WKALM_SET ioctl returns an EINVAL error
> code. This design may miss detecting real problems when the
> efi.set_wakeup_time() return errors and then RTC_ALM_SET/RTC_WKALM_SET
> ioctl returns an EINVAL error code with RTC_FEATURE_ALARM enabled.
> 
> In order to make rtctest more explicit and robust, we propose to use
> RTC_PARAM_GET ioctl interface to check rtc alarm feature state before
> running alarm related tests. If the kernel does not support RTC_PARAM_GET
> ioctl interface, we will fallback to check the error number of
> (RTC_ALM_SET/RTC_WKALM_SET) ioctl call for alarm feature detection.
> 
> Requires commit 101ca8d05913b ("rtc: efi: Enable SET/GET WAKEUP services
> as optional")
> 
> Reviewed-by: Koba Ko 
> Reviewed-by: Matthew R. Ochs 
> Signed-off-by: Joseph Jang 

Acked-by: Alexandre Belloni 

> ---
> Changes in v2:
> - Changed to use $(top_srcdir) instead of hardcoding the path.
> 
>  tools/testing/selftests/rtc/Makefile  |  2 +-
>  tools/testing/selftests/rtc/rtctest.c | 64 +++
>  2 files changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/rtc/Makefile 
> b/tools/testing/selftests/rtc/Makefile
> index 55198ecc04db..6e3a98fb24ba 100644
> --- a/tools/testing/selftests/rtc/Makefile
> +++ b/tools/testing/selftests/rtc/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -CFLAGS += -O3 -Wl,-no-as-needed -Wall
> +CFLAGS += -O3 -Wl,-no-as-needed -Wall -I$(top_srcdir)/usr/include
>  LDLIBS += -lrt -lpthread -lm
>  
>  TEST_GEN_PROGS = rtctest
> diff --git a/tools/testing/selftests/rtc/rtctest.c 
> b/tools/testing/selftests/rtc/rtctest.c
> index 63ce02d1d5cc..2b12497eb30d 100644
> --- a/tools/testing/selftests/rtc/rtctest.c
> +++ b/tools/testing/selftests/rtc/rtctest.c
> @@ -25,6 +25,12 @@
>  
>  static char *rtc_file = "/dev/rtc0";
>  
> +enum rtc_alarm_state {
> + RTC_ALARM_UNKNOWN,
> + RTC_ALARM_ENABLED,
> + RTC_ALARM_DISABLED,
> +};
> +
>  FIXTURE(rtc) {
>   int fd;
>  };
> @@ -82,6 +88,24 @@ static void nanosleep_with_retries(long ns)
>   }
>  }
>  
> +static enum rtc_alarm_state get_rtc_alarm_state(int fd)
> +{
> + struct rtc_param param = { 0 };
> + int rc;
> +
> + /* Validate kernel reflects unsupported RTC alarm state */
> + param.param = RTC_PARAM_FEATURES;
> + param.index = 0;
> + rc = ioctl(fd, RTC_PARAM_GET, ¶m);
> + if (rc < 0)
> + return RTC_ALARM_UNKNOWN;
> +
> + if ((param.uvalue & _BITUL(RTC_FEATURE_ALARM)) == 0)
> + return RTC_ALARM_DISABLED;
> +
> + return RTC_ALARM_ENABLED;
> +}
> +
>  TEST_F_TIMEOUT(rtc, date_read_loop, READ_LOOP_DURATION_SEC + 2) {
>   int rc;
>   long iter_count = 0;
> @@ -197,11 +221,16 @@ TEST_F(rtc, alarm_alm_set) {
>   fd_set readfds;
>   time_t secs, new;
>   int rc;
> + enum rtc_alarm_state alarm_state = RTC_ALARM_UNKNOWN;
>  
>   if (self->fd == -1 && errno == ENOENT)
>   SKIP(return, "Skipping test since %s does not exist", rtc_file);
>   ASSERT_NE(-1, self->fd);
>  
> + alarm_state = get_rtc_alarm_state(self->fd);
> + if (alarm_state == RTC_ALARM_DISABLED)
> + SKIP(return, "Skipping test since alarms are not supported.");
> +
>   rc = ioctl(self->fd, RTC_RD_TIME, &tm);
>   ASSERT_NE(-1, rc);
>  
> @@ -210,6 +239,11 @@ TEST_F(rtc, alarm_alm_set) {
>  
>   rc = ioctl(self->fd, RTC_ALM_SET, &tm);
>   if (rc == -1) {
> + /*
> +  * Report error if rtc alarm was enabled. Fallback to check 
> ioctl
> +  * error number if rtc alarm state is unknown.
> +  */
> + ASSERT_EQ(RTC_ALARM_UNKNOWN, alarm_state);
>   ASSERT_EQ(EINVAL, errno);
>   TH_LOG("skip alarms are not supported.");
>   return;
> @@ -255,11 +289,16 @@ TEST_F(rtc, alarm_wkalm_set) {
>   fd_set readfds;
>   time_t secs, new;
>   int rc;
> + enum rtc_alarm_state alarm_state = RTC_ALARM_UNKNOWN;
>  
>   if (self->fd == -1 && errno == ENOENT)
>   SKIP(return, "Skipping test since %s does not exist", rtc_file);
>   ASSERT_NE(-1, self->fd);
>  
> + alarm_state = get_rtc_alarm_state(self->fd);
> + if (alarm_state == RTC_ALARM_DISABLED)
> + SKIP(return, "Skipping test since alarms are not supported.");
> +
>   rc = ioctl(self->fd, RTC_RD_TIME, &alarm.time);
>   ASSERT_NE(-1, rc);
>  
> @@ -270,6 +309,11 @@ TEST_F(rtc, alarm_wkalm_set) {
>  
>   rc = ioctl(self->fd, RTC_WKALM_SET, &alarm);
>   if (rc == -1) {
> + /*
> +  * Report error if rtc alarm was enabled. Fallback to check 
> ioctl
> +  * error number if rtc alarm state is unknown.
> +  */
>

Re: [PATCH v2 1/2] mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()

2024-10-22 Thread SeongJae Park

Hi Zheng,


We Cc kunit folks for any DAMON kunit test changes, so I Cc-ed them.

On Tue, 22 Oct 2024 16:39:26 +0800 Zheng Yejian  
wrote:

> According to the logic of damon_va_evenly_split_region(), currently
> following split case would not meet the expectation:
> 
>   Suppose DAMON_MIN_REGION=0x1000,
>   Case: Split [0x0, 0x3000) into 2 pieces, then the result would be
> acutually 3 regions:
>   [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000)
> but NOT the expected 2 regions:
>   [0x0, 0x1000), [0x1000, 0x3000) !!!
> 
> The root cause is that when calculating size of each split piece in
> damon_va_evenly_split_region():
> 
>   `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);`
> 
> both the dividing and the ALIGN_DOWN may cause loss of precision,
> then each time split one piece of size 'sz_piece' from origin 'start' to
> 'end' would cause more pieces are split out than expected!!!
> 
> To fix it, count for each piece split and make sure no more than
> 'nr_pieces'. In addition, add above case into damon_test_split_evenly().
> 
> After this patch, damon-operations test passed:

Just for a clarification.  damon-operations test doesn't fail without this
patch.  This patch introduces two changes.  A new kunit test, and a bug fix.
Without the bug fix, the new kunit test fails.

I usually prefer separating test changes from fixes (introduc a fix first, and
then the test for it, to avoid unnecessary test failures).  But, given the
small size and the simplicity of the kunit change for this patch, I think
introducing it together with the fix is ok.

> 
>  # ./tools/testing/kunit/kunit.py run damon-operations
>  [...]
>  == damon-operations (6 subtests) ===
>  [PASSED] damon_test_three_regions_in_vmas
>  [PASSED] damon_test_apply_three_regions1
>  [PASSED] damon_test_apply_three_regions2
>  [PASSED] damon_test_apply_three_regions3
>  [PASSED] damon_test_apply_three_regions4
>  [PASSED] damon_test_split_evenly
>   [PASSED] damon-operations =
> 
> Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory 
> address spaces")
> Signed-off-by: Zheng Yejian 

Reviewed-by: SeongJae Park 


Thanks,
SJ

[...]

Re: [PATCH v2 2/2] mm/damon/vaddr: Add 'nr_piece == 1' check in damon_va_evenly_split_region()

2024-10-22 Thread SeongJae Park

Hi Zheng,


Cc-ed kunit folks, as we usually do for DAMON kunit test changes.

On Tue, 22 Oct 2024 16:39:27 +0800 Zheng Yejian  
wrote:

> As discussed in [1], damon_va_evenly_split_region() is called to
> size-evenly split a region into 'nr_pieces' small regions,
> when nr_pieces == 1, no actual split is required. Check that case
> for better code readability and add a simple kunit testcase.
> 
> [1] https://lore.kernel.org/all/20241021163316.12443-1...@kernel.org/
> 
> Signed-off-by: Zheng Yejian 

Reviewed-by: SeongJae Park 


Thanks,
SJ

[...]

Re: [PATCH V13 03/14] KVM: x86: Fix Intel PT Host/Guest mode when host tracing also

2024-10-22 Thread Sean Christopherson

On Tue, Oct 22, 2024, Adrian Hunter wrote:
> On 14/10/24 21:25, Sean Christopherson wrote:
> >> Fixes: 2ef444f1600b ("KVM: x86: Add Intel PT context switch for each vcpu")
> >> Cc: sta...@vger.kernel.org
> > 
> > This is way, way too big for stable@.  Given that host/guest mode is 
> > disabled by
> > default and that no one has complained about this, I think it's safe to say 
> > that
> > unless we can provide a minimal patch, fixing this in LTS kernels isn't a 
> > priority.
> > 
> > Alternatively, I'm tempted to simply drop support for host/guest mode.  It 
> > clearly
> > hasn't been well tested, and given the lack of bug reports, likely doesn't 
> > have
> > many, if any, users.  And I'm guessing the overhead needed to context 
> > switch all
> > the RTIT MSRs makes tracing in the guest relatively useless.
> 
> As a control flow trace, it is not affected by context switch overhead.

Out of curiosity, how much is Intel PT used purely for control flow tracing, 
i.e.
without caring _at all_ about perceived execution time?

> Intel PT timestamps are also not affected by that.

Timestamps are affected because the guest will see inexplicable jumps in time.
Those gaps are unavoidable to some degree, but context switching on every entry
and exit is 

> This patch reduces the MSR switching.

To be clear, I'm not objecting to any of the ideas in this patch, I'm 
"objecting"
to trying to put band-aids on KVM's existing implementation, which is clearly
buggy and, like far too many PMU-ish features in KVM, was probably developed
without any thought as to how it would affect use cases beyond the host admin
and the VM owner being a single person.  And I'm also objecting, vehemently, to
sending anything of this magnitude and complexity to LTS kernels.

> > /me fiddles around
> > 
> > LOL, yeah, this needs to be burned with fire.  It's wildly broken.  So for 
> > stable@,
> 
> It doesn't seem wildly broken.  Just the VMM passing invalid CPUID
> and KVM not validating it.

Heh, I agree with "just", but unfortunately "just ... not validating" a large
swath of userspace inputs is pretty widly broken.  More importantly, it's not
easy to fix.  E.g. KVM could require the inputs to exactly match hardware, but
that creates an ABI that I'm not entirely sure is desirable in the long term.

> > I'll post a patch to hide the module param if CONFIG_BROKEN=n (and will omit
> > stable@ for the previous patch).
> > 
> > Going forward, if someone actually cares about virtualizing PT enough to 
> > want to
> > fix KVM's mess, then they can put in the effort to fix all the bugs, write 
> > all
> > the tests, and in general clean up the implementation to meet KVM's current
> > standards.  E.g. KVM usage of intel_pt_validate_cap() instead of KVM's 
> > guest CPUID
> > and capabilities infrastructure needs to go.
> 
> The problem below seems to be caused by not validating against the *host*
> CPUID.  KVM's CPUID information seems to be invalid.

Yes.

> > My vote is to queue the current code for removal, and revisit support after 
> > the
> > mediated PMU has landed.  Because I don't see any point in supporting Intel 
> > PT
> > without a mediated PMU, as host/guest mode really only makes sense if the 
> > entire
> > PMU is being handed over to the guest.
> 
> Why?

To simplify the implementation, and because I don't see how virtualizing Intel 
PT
without also enabling the mediated PMU makes any sense.

Conceptually, KVM's PT implementation is very, very similar to the mediated PMU.
They both effectively give the guest control of hardware when the vCPU starts
running, and take back control when the vCPU stops running.

If KVM allows Intel PT without the mediated PMU, then KVM and perf have to 
support
two separate implementations for the same model.  If virtualizing Intel PT is
allowed if and only if the mediated PMU is enabled, then .handle_intel_pt_intr()
goes away.  And on the flip side, it becomes super obvious that host usage of
Intel PT needs to be mutually exclusive with the mediated PMU.

> Intel PT PMU is programmed separately from the x86 PMU.

Except for the minor detail that Intel PT generates PMIs, and that PEBS can log
to PT buffers.  Oh, and giving the guest control of the PMU means host usage of
Intel PT will break the host *and* guest.  The host won't get PMIs, while the
guest will see spurious PMIs.

So I don't see any reason to try to separate the two.

> > [ 1458.686107] [ cut here ]
> > [ 1458.690766] Invalid MSR 588, please adapt vmx_possible_passthrough_msrs[]
> 
> VMM is trying to set a non-existent MSR.  Looks like it has
> decided there are more PT address filter MSRs that are architecturally
> possible.
> 
> I had no idea QEMU was so broken.  

It's not QEMU that's broken, it's KVM that's broken.  

> I always just use -cpu host.

Yes, and that's exactly the problem.  The only people that have ever touched 
this
likely only ever use `-cpu host`, and so KVM's flaws have gone unnoticed.

> W

Re: [PATCH] selftests/sched: add basic test for psi

2024-10-22 Thread Pintu Agarwal

Hi Johannes,

On Tue, 22 Oct 2024 at 19:36, Johannes Weiner  wrote:
>
> On Tue, Oct 22, 2024 at 05:51:58PM +0530, Pintu Kumar wrote:
> > There is a psi module that exists under kernel/sched/psi.
> > Add a basic test to test the psi.
>
> I'm not sure this is a valuable use of test cycles. The mere existence
> and basic format of the files is very unlikely to be buggy, and such a
> bug wouldn't hide for very long either.
>
Yes, I agree this is just a basic version to prepare for the test framework.
Once the framework is available more tests can be added, such as trigger poll.
If you have any other suggestions for the test please let me know.

> > @@ -18548,10 +18548,12 @@ F:  include/uapi/linux/pps.h
> >  PRESSURE STALL INFORMATION (PSI)
> >  M:   Johannes Weiner 
> >  M:   Suren Baghdasaryan 
> > +M:   Pintu Kumar 
>
> $ git log --oneline --author='Pintu Kumar' kernel/sched/psi.c | wc -l
> 0
>
> Really? ;)
>
oh sorry.
This maintainer was added for tools/testing/sefttests/sched/psi_test
after referring to existing once.
Otherwise, checkpatch was giving below warning:

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#51:
new file mode 100644
total: 0 errors, 1 warnings, 134 lines checked

Please let me know what is the correct way.

Thanks,
Pintu

Re: [PATCH 1/1] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.

2024-10-22 Thread Sebastian Andrzej Siewior

On 2024-10-22 15:28:56 [+0200], Frederic Weisbecker wrote:
> Le Fri, Oct 04, 2024 at 12:17:04PM +0200, Sebastian Andrzej Siewior a écrit :
> > A timer/ hrtimer softirq is raised in-IRQ context. With threaded
> > interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd
> > for the processing of the softirq.
> 
> It took me some time to understand the actual problem (yeah I know...)
> 
> Can this be rephrased as: "Timer and hrtimer softirq vectors are special
> in that they are always raised in-IRQ context whereas other vectors are
> more likely to be raised from threaded interrupts or any regular tasks
> when threaded interrupts or PREEMPT_RT are enabled. This leads to
> waking ksoftirqd for the processing of the softirqs whenever timer
> vectors are involved.

Oki.

> > Once the ksoftirqd is marked as pending (or is running) it will collect
> > all raised softirqs. This in turn means that a softirq which would have
> > been processed at the end of the threaded interrupt, which runs at an
> > elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
> > priority and competes with every regular task for CPU resources.
> 
> But for ksoftirqd to collect other non-timers softirqs, those vectors must
> have been raised before from a threaded interrupt, right? So why those
> vectors didn't get a chance to execute at the end of that threaded interrupt?

This statement is no longer accurate since
d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")

So the "collect all" part is no longer.

> OTOH one problem I can imagine is a threaded interrupt preempting ksoftirqd
> which must wait for ksoftirqd to complete due to the local_bh_disable()
> in the beginning of irq_forced_thread_fn(). But then isn't there some
> PI involved on the local lock?

Yes, there is PI involved on the local lock. So this will happen.

> Sorry I'm probably missing something...

Try again without the "ksoftirqd will collect it all" since this won't
happen since the revert I mentioned.

> > This introduces long delays on heavy loaded systems and is not desired
> > especially if the system is not overloaded by the softirqs.
> > 
> > Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
> > timers thread and let it run at the lowest SCHED_FIFO priority.
> > Wake-ups for RT tasks happen from hardirq context so only timer_list timers
> > and hrtimers for "regular" tasks are processed here.
> 
> That last sentence confuses me. How are timers for non regular task processed
> elsewhere? Ah you mean RT tasks rely on hard hrtimers?

Correct. A clock_nanosleep() for a RT task will result in wake_up() from
hardirq. A clock_nanosleep() for a !RT task will result in wake_up()
from ksoftirqd or now the suggested ktimersd.

Quick question: Do we want this in forced-threaded mode, too? The timer
(timer_list timer) and all HRTIMER_MODE_SOFT are handled in ksoftirqd.

> > The higher priority
> > ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
> > 
> > Using a dedicated variable to store the pending softirq bits values
> > ensure that the timer are not accidentally picked up by ksoftirqd and
> > other threaded interrupts.
> > It shouldn't be picked up by ksoftirqd since it runs at lower priority.
> > However if ksoftirqd is already running while a timer fires, then
> > ksoftird will be PI-boosted due to the BH-lock to ktimer's priority.
> > Ideally we try to avoid having ksoftirqd running.
> > 
> > The timer thread can pick up pending softirqs from ksoftirqd but only
> > if the softirq load is high. It is not be desired that the picked up
> > softirqs are processed at SCHED_FIFO priority under high softirq load
> > but this can already happen by a PI-boost by a force-threaded interrupt.
> > 
> > [ frede...@kernel.org: rcutorture.c fixes, storm fix by introduction of
> >   local_pending_timers() for tick_nohz_next_event() ]
> > 
> > [ junxiao.ch...@intel.com: Ensure ktimersd gets woken up even if a
> >   softirq is currently served. ]
> > 
> > Signed-off-by: Sebastian Andrzej Siewior 
> > ---
> >  include/linux/interrupt.h | 29 ++
> >  kernel/rcu/rcutorture.c   |  6 +++
> >  kernel/softirq.c  | 82 ++-
> >  kernel/time/hrtimer.c |  4 +-
> >  kernel/time/tick-sched.c  |  2 +-
> >  kernel/time/timer.c   |  2 +-
> >  6 files changed, 120 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index 457151f9f263d..4a4f367cd6864 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -616,6 +616,35 @@ extern void __raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq(unsigned int nr);
> >  
> > +#ifdef CONFIG_PREEMPT_RT
> 
> This needs a comment section to explain why a dedicated
> timers processing is needed.

Okay.

> > +DECLARE_PER_CPU(struct task_struct *, timersd);

Re: [PATCH v11 7/7] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-10-22 Thread Mathieu Poirier

On Wed, Oct 09, 2024 at 10:01:08AM +0200, Arnaud Pouliquen wrote:
> The new TEE remoteproc driver is used to manage remote firmware in a
> secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is
> introduced to delegate the loading of the firmware to the trusted
> execution context. In such cases, the firmware should be signed and
> adhere to the image format defined by the TEE.
> 
> Signed-off-by: Arnaud Pouliquen 
> ---
> updates vs v9 revision:
> - rename tee_interface to tee_rproc_itf
> - in stm32_rproc_probe(), test and use rproc->tee_rproc_itf instead of
>   trproc in the tee_rproc_unregister() call
> - initialize release_fw ops
> ---
>  drivers/remoteproc/stm32_rproc.c | 63 ++--
>  1 file changed, 60 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/remoteproc/stm32_rproc.c 
> b/drivers/remoteproc/stm32_rproc.c
> index 288bd70c7861..cb7093de41df 100644
> --- a/drivers/remoteproc/stm32_rproc.c
> +++ b/drivers/remoteproc/stm32_rproc.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -255,6 +256,19 @@ static int stm32_rproc_release(struct rproc *rproc)
>   return 0;
>  }
>  
> +static int stm32_rproc_tee_stop(struct rproc *rproc)
> +{
> + int err;
> +
> + stm32_rproc_request_shutdown(rproc);
> +
> + err = tee_rproc_stop(rproc);
> + if (err)
> + return err;
> +
> + return stm32_rproc_release(rproc);
> +}
> +
>  static int stm32_rproc_prepare(struct rproc *rproc)
>  {
>   struct device *dev = rproc->dev.parent;
> @@ -691,8 +705,20 @@ static const struct rproc_ops st_rproc_ops = {
>   .get_boot_addr  = rproc_elf_get_boot_addr,
>  };
>  
> +static const struct rproc_ops st_rproc_tee_ops = {
> + .prepare= stm32_rproc_prepare,
> + .start  = tee_rproc_start,
> + .stop   = stm32_rproc_tee_stop,
> + .kick   = stm32_rproc_kick,
> + .load   = tee_rproc_load_fw,
> + .parse_fw   = tee_rproc_parse_fw,
> + .find_loaded_rsc_table = tee_rproc_find_loaded_rsc_table,
> + .release_fw = tee_rproc_release_fw,
> +};
> +
>  static const struct of_device_id stm32_rproc_match[] = {
>   { .compatible = "st,stm32mp1-m4" },
> + { .compatible = "st,stm32mp1-m4-tee" },
>   {},
>  };
>  MODULE_DEVICE_TABLE(of, stm32_rproc_match);
> @@ -851,17 +877,42 @@ static int stm32_rproc_probe(struct platform_device 
> *pdev)
>   struct device *dev = &pdev->dev;
>   struct stm32_rproc *ddata;
>   struct device_node *np = dev->of_node;
> + struct tee_rproc *trproc = NULL;

The cleaner this patchset get, the more obvious it is (at least to me) that
struct tee_rproc needs to be changed to struct rproc_tee.  Otherwise I keep
wondering if this is coming from the TEE subsystem or the remoteproc subsystem.

>   struct rproc *rproc;
>   unsigned int state;
> + u32 proc_id;
>   int ret;
>  
>   ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(32));
>   if (ret)
>   return ret;
>  
> - rproc = devm_rproc_alloc(dev, np->name, &st_rproc_ops, NULL, 
> sizeof(*ddata));
> - if (!rproc)
> - return -ENOMEM;
> + if (of_device_is_compatible(np, "st,stm32mp1-m4-tee")) {
> + /*
> +  * Delegate the firmware management to the secure context.
> +  * The firmware loaded has to be signed.
> +  */
> + ret = of_property_read_u32(np, "st,proc-id", &proc_id);
> + if (ret) {
> + dev_err(dev, "failed to read st,rproc-id property\n");
> + return ret;
> + }
> +
> + rproc = devm_rproc_alloc(dev, np->name, &st_rproc_tee_ops, 
> NULL, sizeof(*ddata));
> + if (!rproc)
> + return -ENOMEM;
> +
> + trproc = tee_rproc_register(dev, rproc, proc_id);

This should return an integer rather than a struct tee_rproc * since the latter
is available through rproc->tee_rproc_itf.

In line with my comment above, this should be changed to rproc_tee_register()
since it belongs to the remoteproc subsystem.  Before when I asked for
tee_remoteproc.c to be changed to remoteproc_tee.c, I thought we could get by
without changing the inside but now I think it is clear that we can't - this
needs to be addressed.  

> + if (IS_ERR(trproc)) {
> + dev_err_probe(dev, PTR_ERR(trproc),
> +   "signed firmware not supported by TEE\n");
> + return PTR_ERR(trproc);

return dev_err_probe(...);
> + }
> + } else {
> + rproc = devm_rproc_alloc(dev, np->name, &st_rproc_ops, NULL, 
> sizeof(*ddata));
> + if (!rproc)
> + return -ENOMEM;
> + }
>  
>   ddata = rproc->priv;
>  
> @@ -913,6 +964,9 @@ static int stm32_rproc_probe(struct platform_device *pdev

Re: [PATCH v5 0/3] selftests: livepatch: test livepatching a kprobed function

2024-10-22 Thread Petr Mladek

On Thu 2024-10-17 22:01:29, Michael Vetter wrote:
> Thanks for all the reviews.
> 
> V5:
> Replace /sys/kernel/livepatch also in other/already existing tests.
> Improve commit message of 3rd patch.

JFYI, I have committed the patchset into livepatching.git,
branch for-6.13/selftests-kprobe.

I have fixed the wrong substitution in the 1st patch as suggested by Joe.

Also I have made two substitutions in test-syscall.sh in the 2nd patch
as suggested by Miroslav.

I would personally prefer to do the substitutions of existing paths
in a separate patch (same as Miroslav). But I do not think that
it was worth another respin. It would just increase risk of more
typos. And we have already spent enough time on this ;-)

Best Regards,
Petr

Re: [PATCH 1/2] rpmsg: glink: Handle rejected intent request better

2024-10-22 Thread Johan Hovold

On Tue, Oct 22, 2024 at 04:17:11AM +, Bjorn Andersson wrote:
> The initial implementation of request intent response handling dealt
> with two outcomes; granted allocations, and all other cases being
> considered -ECANCELLED (likely from "cancelling the operation as the
> remote is going down").

For the benefit of casual reviewers and contributors, could you add
introductory comment about what "intents" are?

> But on some channels intent allocation is not supported, instead the
> remote will pre-allocate and announce a fixed number of intents for the
> sender to use. If for such channels an rpmsg_send() is being invoked
> before any channels have been announced, an intent request will be
> issued and as this comes back rejected the call is failed with
> -ECANCELLED.

It's actually the one L -ECANCELED

s/is failed/fails/ ?
 
> Given that this is reported in the same way as the remote being shut
> down, there's no way for the client to differentiate the two cases.
> 
> In line with the original GLINK design, change the return value to
> -EAGAIN for the case where the remote rejects an intent allocation
> request.
> 
> It's tempting to handle this case in the GLINK core, as we expect
> intents to show up in this case. But there's no way to distinguish
> between this case and a rejection for a too big allocation, nor is it
> possible to predict if a currently used (and seeminly suitable) intent

seemingly

> will be returned for reuse or not. As such, returning the error to the
> client and allow it to react seems to be the only sensible solution.

s/allow/allowing/ ?

> In addition to this, commit 'c05dfce0b89e ("rpmsg: glink: Wait for
> intent, not just request ack")' changed the logic such that the code
> always wait for an intent request response and an intent. This works out
> in most cases, but in the event that a intent request is rejected and no

an intent

> further intent arrives (e.g. client asks for a too big intent), the code
> will stall for 10 seconds and then return -ETIMEDOUT; instead of a more
> suitable error.
> 
> This change also resulted in intent requests racing with the shutdown of
> the remote would be exposed to this same problem, unless some intent
> happens to arrive. A patch for this was developed and posted by Sarannya
> S [1], and has been incorporated here.
> 
> To summarize, the intent request can end in 4 ways:
> - Timeout, no response arrived => return -ETIMEDOUT
> - Abort TX, the edge is going away => return -ECANCELLED
> - Intent request was rejected => return -EAGAIN
> - Intent request was accepted, and an intent arrived => return 0
> 
> This patch was developed with input from Sarannya S, Deepak Kumar Singh,
> and Chris Lew.
> 
> [1] 
> https://lore.kernel.org/all/20240925072328.1163183-1-quic_dee...@quicinc.com/
> 
> Fixes: c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Bjorn Andersson 

Nit picks aside, this was all nice and clear.

Tested-by: Johan Hovold

Re: [PATCH V13 03/14] KVM: x86: Fix Intel PT Host/Guest mode when host tracing also

2024-10-22 Thread Adrian Hunter

On 14/10/24 21:25, Sean Christopherson wrote:
> On Mon, Oct 14, 2024, Adrian Hunter wrote:
>> Ensure Intel PT tracing is disabled before VM-Entry in Intel PT Host/Guest
>> mode.
>>
>> Intel PT has 2 modes for tracing virtual machines. The default is System
>> mode whereby host and guest output to the host trace buffer. The other is
>> Host/Guest mode whereby host and guest output to their own buffers.
>> Host/Guest mode is selected by kvm_intel module parameter pt_mode=1.
>>
>> In Host/Guest mode, the following rule must be followed:
> 
> This is misleading and arguably wrong.  The following "rule" must _always_ be
> followed.  If I weren't intimately familiar with the distinctive style of the
> SDM's consistency checks, odds are good I wouldn't have any idea where this 
> rule
> came from.
> 
>>  If the logical processor is operating with Intel PT enabled
>>  (if IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the
>>  "load IA32_RTIT_CTL" VM-entry control must be 0.
> 
>> However, "load IA32_RTIT_CTL" VM-entry control is always 1 in Host/Guest
>> mode, so IA32_RTIT_CTL.TraceEn must always be 0 at VM entry, irrespective
>> of whether guest IA32_RTIT_CTL.TraceEn is 1.
> 
> Explicitly state what the bad behavior is, _somewhere_.  Similar to the 
> previous
> patch, their is a lot of information to wade through just to understand that 
> this
> results in a failed VM-Entry.

Sorry for the slow reply, been away.  Yes, the commit message fails to call
out that the issue is failed VM-Entry.

> 
> Furthermore, nothing in here spells out exactly under what conditions this bug
> surfaces, which makes it unnecessarily difficult to understand what can go 
> wrong,
> and when.
> 
>> Fix by stopping host Intel PT tracing always at VM entry in Host/Guest
> 
> It's not _at_ VM-Entry.  The language matters, because this makes it sound 
> like
> PT tracing is being disabled as part of VM-Entry.
> 
>> mode.
>>
>> That also fixes the issue whereby the Intel PT NMI handler would
>> set IA32_RTIT_CTL.TraceEn back to 1 after KVM has just set it to 0.
> 
> In theory, this should be an entirely separate fix.  In practice, simply 
> clearing
> MSR_IA32_RTIT_CTL before VM-Enter if tracing is enabled doesn't help much, 
> i.e.
> re-enabling in the NMI handler isn't all that rare.

The commit message also fails to make clear that there are 2 ways that
VM-Entry can fail.

1. Not setting MSR_IA32_RTIT_CTL to zero _always_ in host/guest mode.
This is the common case.  Current code sets MSR_IA32_RTIT_CTL to zero
only if the guest has TraceEn, so if the guest is not tracing but the
host is tracing, then VM-Entry fails.

2. More rarely, the PT NMI might set TraceEn again before VM-Entry.
It isn't that easy to hit, but the selftest in patch 3 usually
manages it by using a small buffer size and trying many times gradually
increasing the amount of trace data.

>  That absolutely needs to
> be called out in the changelog.
> 
>> Fixes: 2ef444f1600b ("KVM: x86: Add Intel PT context switch for each vcpu")
>> Cc: sta...@vger.kernel.org
> 
> This is way, way too big for stable@.  Given that host/guest mode is disabled 
> by
> default and that no one has complained about this, I think it's safe to say 
> that
> unless we can provide a minimal patch, fixing this in LTS kernels isn't a 
> priority.
> 
> Alternatively, I'm tempted to simply drop support for host/guest mode.  It 
> clearly
> hasn't been well tested, and given the lack of bug reports, likely doesn't 
> have
> many, if any, users.  And I'm guessing the overhead needed to context switch 
> all
> the RTIT MSRs makes tracing in the guest relatively useless.

As a control flow trace, it is not affected by context switch overhead.
Intel PT timestamps are also not affected by that.

This patch reduces the MSR switching.

> 
> /me fiddles around
> 
> LOL, yeah, this needs to be burned with fire.  It's wildly broken.  So for 
> stable@,

It doesn't seem wildly broken.  Just the VMM passing invalid CPUID
and KVM not validating it.

> I'll post a patch to hide the module param if CONFIG_BROKEN=n (and will omit
> stable@ for the previous patch).
> 
> Going forward, if someone actually cares about virtualizing PT enough to want 
> to
> fix KVM's mess, then they can put in the effort to fix all the bugs, write all
> the tests, and in general clean up the implementation to meet KVM's current
> standards.  E.g. KVM usage of intel_pt_validate_cap() instead of KVM's guest 
> CPUID
> and capabilities infrastructure needs to go.

The problem below seems to be caused by not validating against the *host*
CPUID.  KVM's CPUID information seems to be invalid.

> 
> My vote is to queue the current code for removal, and revisit support after 
> the
> mediated PMU has landed.  Because I don't see any point in supporting Intel PT
> without a mediated PMU, as host/guest mode really only makes sense if the 
> entire
> PMU is being handed over to the g

Re: [PATCH] kselftest/arm64: Fail the overall fp-stress test if any test fails

2024-10-22 Thread Catalin Marinas

On Thu, 17 Oct 2024 18:43:31 +0100, Mark Brown wrote:
> Currently fp-stress does not report a top level test result if it runs to
> completion, it always exits with a return code 0. Use the ksft_finished()
> helper to ensure that the exit code for the top level program reports a
> failure if any of the individual tests has failed.
> 
> 

Applied to arm64 (for-next/kselftest), thanks!

[1/1] kselftest/arm64: Fail the overall fp-stress test if any test fails
  https://git.kernel.org/arm64/c/7a08cb9b4bb9

-- 
Catalin

Re: [PATCH v4 05/19] gendwarfksyms: Add a cache for processed DIEs

2024-10-22 Thread Petr Pavlu

On 10/8/24 20:38, Sami Tolvanen wrote:
> Basic types in DWARF repeat frequently and traversing the DIEs using
> libdw is relatively slow. Add a simple hashtable based cache for the
> processed DIEs.
> 
> Signed-off-by: Sami Tolvanen 
> Acked-by: Neal Gompa 

Looks ok to me, free free to add:

Reviewed-by: Petr Pavlu 

-- Petr

Re: [PATCH net-next v3 1/3] selftests: nic_link_layer: Add link layer selftest for NIC driver

2024-10-22 Thread Paolo Abeni

On 10/16/24 23:50, Mohan Prasad J wrote:
> Add selftest file for the link layer tests of a NIC driver.
> Test for auto-negotiation is added.
> Add LinkConfig class for changing link layer configs.
> Selftest makes use of ksft modules and ethtool.
> Include selftest file in the Makefile.
> 
> Signed-off-by: Mohan Prasad J 
> ---
>  .../testing/selftests/drivers/net/hw/Makefile |   1 +
>  .../drivers/net/hw/lib/py/__init__.py |   1 +
>  .../drivers/net/hw/lib/py/linkconfig.py   | 220 ++
>  .../drivers/net/hw/nic_link_layer.py  |  84 +++
>  4 files changed, 306 insertions(+)
>  create mode 100644 
> tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
>  create mode 100644 tools/testing/selftests/drivers/net/hw/nic_link_layer.py
> 
> diff --git a/tools/testing/selftests/drivers/net/hw/Makefile 
> b/tools/testing/selftests/drivers/net/hw/Makefile
> index c9f2f48fc..0dac40c4e 100644
> --- a/tools/testing/selftests/drivers/net/hw/Makefile
> +++ b/tools/testing/selftests/drivers/net/hw/Makefile
> @@ -10,6 +10,7 @@ TEST_PROGS = \
>   hw_stats_l3.sh \
>   hw_stats_l3_gre.sh \
>   loopback.sh \
> + nic_link_layer.py \
>   pp_alloc_fail.py \
>   rss_ctx.py \
>   #
> diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py 
> b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
> index b58288578..399789a96 100644
> --- a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
> +++ b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
> @@ -9,6 +9,7 @@ try:
>  sys.path.append(KSFT_DIR.as_posix())
>  from net.lib.py import *
>  from drivers.net.lib.py import *
> +from .linkconfig import LinkConfig
>  except ModuleNotFoundError as e:
>  ksft_pr("Failed importing `net` library from kernel sources")
>  ksft_pr(str(e))
> diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py 
> b/tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
> new file mode 100644
> index 0..86cbf10a3
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
> @@ -0,0 +1,220 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +from lib.py import cmd
> +from lib.py import ethtool
> +from lib.py import ksft_pr, ksft_eq
> +import re
> +import time
> +import json
> +
> +#The LinkConfig class is implemented to handle the link layer configurations.
> +#Required minimum ethtool version is 6.10
> +#The ethtool and ip would require authentication to make changes, so better
> +# to check them for sudo privileges for interruption free testing.
> +
> +class LinkConfig:
> +"""Class for handling the link layer configurations"""
> +def __init__(self, cfg):
> +self.cfg = cfg
> +self.partner_netif = self.get_partner_netif_name()
> +
> +"""Get the initial link configuration of local interface"""
> +self.common_link_modes = self.get_common_link_modes()
> +
> +def get_partner_netif_name(self):

You should use type hints for both the arguments and the return type.

> +partner_netif = None
> +try:
> +"""Get partner interface name"""
> +partner_cmd = f"ip -o -f inet addr show | grep 
> '{self.cfg.remote_addr}' " + "| awk '{print $2}'"

It's better if you use json output even here

[...]
> +def reset_interface(self, local=True, remote=True):
> +ksft_pr("Resetting interfaces in local and remote")
> +if remote:
> +if self.partner_netif is not None:
> +ifname = self.partner_netif
> +link_up_cmd = f"sudo ip link set up {ifname}"
> +link_down_cmd = f"sudo ip link set down {ifname}"
> +reset_cmd = f"{link_down_cmd} && sleep 5 && {link_up_cmd}"
> +try:
> +cmd(f"{reset_cmd}", host=self.cfg.remote)
> +except Exception as e:
> +ksft_pr("Check sudo permission for ip command")
> +ksft_pr(f"Unexpected error occurred: {e}")

Please, don't use sudo, just run the command directly. The selftests
should be executed only by privileged users.

[...]
> +def check_autoneg_supported(self, remote=False):
> +if remote==False:
> +local_autoneg = 
> self.get_ethtool_field("supports-auto-negotiation")
> +if local_autoneg is None:
> +ksft_pr(f"Unable to fetch auto-negotiation status for 
> interface {self.cfg.ifname}")
> +"""Return autoneg status of the local interface"""
> +status = True if local_autoneg == True else False
> +return status

Out of sheer ignorance, in't

return local_autoneg

enough?


> +else:
> +"""Check remote auto-negotiation support status"""
> +partner_autoneg = False
> +if self.partner_netif is not None:
> +partner_autoneg = 
> self.get_ethtool_field("supports-auto-negoti

Re: [PATCH v8 3/3] irqchip/loongson-eiointc: Add extioi virt extension support

2024-10-22 Thread maobibo




Got it, thanks.

Regards
Bibo Mao

On 2024/10/22 下午5:45, Huacai Chen wrote:

On Tue, Oct 22, 2024 at 5:17 PM maobibo  wrote:


Hi Huacai/Thomas,

Sorry for the ping message :(

Can this patch be applied int next RC version?

Queued for the next release.

Huacai



Regards
Bibo Mao

On 2024/10/2 下午9:42, Thomas Gleixner wrote:

On Wed, Sep 11 2024 at 17:11, Huacai Chen wrote:

Hi, Thomas,

On Fri, Aug 30, 2024 at 5:32 PM Bibo Mao  wrote:


Interrupts can be routed to maximal four virtual CPUs with one HW
EIOINTC interrupt controller model, since interrupt routing is encoded with
CPU bitmap and EIOINTC node combined method. Here add the EIOINTC virt
extension support so that interrupts can be routed to 256 vCPUs on
hypervisor mode. CPU bitmap is replaced with normal encoding and EIOINTC
node type is removed, so there are 8 bits for cpu selection, at most 256
vCPUs are supported for interrupt routing.

This patch is OK for me now, but seems it depends on the first two,
and the first two will get upstream via loongarch-kvm tree. So is that
possible to also apply this one to loongarch-kvm with your Acked-by?


Go ahead.

Reviewed-by: Thomas Gleixner

Re: [PATCH net-next v3 3/3] selftests: nic_performance: Add selftest for performance of NIC driver

2024-10-22 Thread Paolo Abeni

On 10/16/24 23:50, Mohan Prasad J wrote:
> Add selftest case to check the send and receive throughput.
> Supported link modes between local NIC driver and partner
> are varied. Then send and receive throughput is captured
> and verified. Test uses iperf3 tool.
> 
> Signed-off-by: Mohan Prasad J 
> ---
>  .../testing/selftests/drivers/net/hw/Makefile |   1 +
>  .../drivers/net/hw/nic_performance.py | 121 ++
>  2 files changed, 122 insertions(+)
>  create mode 100644 tools/testing/selftests/drivers/net/hw/nic_performance.py
> 
> diff --git a/tools/testing/selftests/drivers/net/hw/Makefile 
> b/tools/testing/selftests/drivers/net/hw/Makefile
> index 0dac40c4e..289512092 100644
> --- a/tools/testing/selftests/drivers/net/hw/Makefile
> +++ b/tools/testing/selftests/drivers/net/hw/Makefile
> @@ -11,6 +11,7 @@ TEST_PROGS = \
>   hw_stats_l3_gre.sh \
>   loopback.sh \
>   nic_link_layer.py \
> + nic_performance.py \
>   pp_alloc_fail.py \
>   rss_ctx.py \
>   #
> diff --git a/tools/testing/selftests/drivers/net/hw/nic_performance.py 
> b/tools/testing/selftests/drivers/net/hw/nic_performance.py
> new file mode 100644
> index 0..152c62511
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/net/hw/nic_performance.py
> @@ -0,0 +1,121 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0
> +
> +#Introduction:
> +#This file has basic performance test for generic NIC drivers.
> +#The test comprises of throughput check for TCP and UDP streams.
> +#
> +#Setup:
> +#Connect the DUT PC with NIC card to partner pc back via ethernet medium of 
> your choice(RJ45, T1)
> +#
> +#DUT PC  Partner PC
> +#┌───┐ 
> ┌──┐
> +#│   │ │ 
>  │
> +#│   │ │ 
>  │
> +#│   ┌───┐ │ 
>  │
> +#│   │DUT NIC│ Eth │ 
>  │
> +#│   │Interface ─┼─┼─any eth Interface   
>  │
> +#│   └───┘ │ 
>  │
> +#│   │ │ 
>  │
> +#│   │ │ 
>  │
> +#└───┘ 
> └──┘
> +#
> +#Configurations:
> +#To prevent interruptions, Add ethtool, ip to the sudoers list in remote PC 
> and get the ssh key from remote.
> +#Required minimum ethtool version is 6.10
> +#Change the below configuration based on your hw needs.
> +# """Default values"""
> +time_delay = 8 #time taken to wait for transitions to happen, in seconds.
> +test_duration = 10  #performance test duration for the throughput check, in 
> seconds.
> +send_throughput_threshold = 80 #percentage of send throughput required to 
> pass the check
> +receive_throughput_threshold = 50 #percentage of receive throughput required 
> to pass the check

Please allow the user to override this parameters with env variable
and/or with the command line.

> +
> +import time
> +import json
> +from lib.py import ksft_run, ksft_exit, ksft_pr, ksft_true
> +from lib.py import KsftFailEx, KsftSkipEx
> +from lib.py import NetDrvEpEnv
> +from lib.py import cmd
> +from lib.py import LinkConfig
> +
> +def verify_throughput(cfg, link_config) -> None:
> +protocols = ["TCP", "UDP"]
> +common_link_modes = link_config.common_link_modes
> +speeds, duplex_modes = 
> link_config.get_speed_duplex_values(common_link_modes)
> +"""Test duration in seconds"""
> +duration = test_duration
> +target_ip = cfg.remote_addr
> +
> +for protocol in protocols:
> +ksft_pr(f"{protocol} test")
> +test_type = "-u" if protocol == "UDP" else ""
> +send_throughput = []
> +receive_throughput = []
> +for idx in range(0, len(speeds)):
> +bit_rate = f"-b {speeds[idx]}M" if protocol == "UDP" else ""

Always use '-b 0'. Will work with both TCP and UDP and is usually more
efficient than forcing a specific speed.

> +if link_config.set_speed_and_duplex(speeds[idx], 
> duplex_modes[idx]) == False:
> +raise KsftFailEx(f"Not able to set speed and duplex 
> parameters for {cfg.ifname}")
> +time.sleep(time_delay)
> +if link_config.verify_link_up() == False:
> +raise KsftSkipEx(f"Link state of interface {cfg.ifname} is 
> DOWN")
> +send_command=f"iperf3 {test_type} -c {target_ip} {bit_rate} -t 
> {duration} --json"
> +receive_command=f"iperf3 {test_type} -c {target_ip} {bit_rate} 
> -t {duration} --reverse --json"
> +send_result = cmd(send_command)
> +receive

Re: [PATCH v8 3/3] irqchip/loongson-eiointc: Add extioi virt extension support

2024-10-22 Thread Huacai Chen

On Tue, Oct 22, 2024 at 5:17 PM maobibo  wrote:
>
> Hi Huacai/Thomas,
>
> Sorry for the ping message :(
>
> Can this patch be applied int next RC version?
Queued for the next release.

Huacai

>
> Regards
> Bibo Mao
>
> On 2024/10/2 下午9:42, Thomas Gleixner wrote:
> > On Wed, Sep 11 2024 at 17:11, Huacai Chen wrote:
> >> Hi, Thomas,
> >>
> >> On Fri, Aug 30, 2024 at 5:32 PM Bibo Mao  wrote:
> >>>
> >>> Interrupts can be routed to maximal four virtual CPUs with one HW
> >>> EIOINTC interrupt controller model, since interrupt routing is encoded 
> >>> with
> >>> CPU bitmap and EIOINTC node combined method. Here add the EIOINTC virt
> >>> extension support so that interrupts can be routed to 256 vCPUs on
> >>> hypervisor mode. CPU bitmap is replaced with normal encoding and EIOINTC
> >>> node type is removed, so there are 8 bits for cpu selection, at most 256
> >>> vCPUs are supported for interrupt routing.
> >> This patch is OK for me now, but seems it depends on the first two,
> >> and the first two will get upstream via loongarch-kvm tree. So is that
> >> possible to also apply this one to loongarch-kvm with your Acked-by?
> >
> > Go ahead.
> >
> > Reviewed-by: Thomas Gleixner 
> >
>

[PATCH] rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu

2024-10-22 Thread Uladzislau Rezki (Sony)

KCSAN reports a data race when access the krcp->monitor_work.timer.expires
variable in the schedule_delayed_monitor_work() function:


BUG: KCSAN: data-race in __mod_timer / kvfree_call_rcu

read to 0x888237d1cce8 of 8 bytes by task 10149 on cpu 1:
 schedule_delayed_monitor_work kernel/rcu/tree.c:3520 [inline]
 kvfree_call_rcu+0x3b8/0x510 kernel/rcu/tree.c:3839
 trie_update_elem+0x47c/0x620 kernel/bpf/lpm_trie.c:441
 bpf_map_update_value+0x324/0x350 kernel/bpf/syscall.c:203
 generic_map_update_batch+0x401/0x520 kernel/bpf/syscall.c:1849
 bpf_map_do_batch+0x28c/0x3f0 kernel/bpf/syscall.c:5143
 __sys_bpf+0x2e5/0x7a0
 __do_sys_bpf kernel/bpf/syscall.c:5741 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5739 [inline]
 __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5739
 x64_sys_call+0x2625/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:322
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0x888237d1cce8 of 8 bytes by task 56 on cpu 0:
 __mod_timer+0x578/0x7f0 kernel/time/timer.c:1173
 add_timer_global+0x51/0x70 kernel/time/timer.c:1330
 __queue_delayed_work+0x127/0x1a0 kernel/workqueue.c:2523
 queue_delayed_work_on+0xdf/0x190 kernel/workqueue.c:2552
 queue_delayed_work include/linux/workqueue.h:677 [inline]
 schedule_delayed_monitor_work kernel/rcu/tree.c:3525 [inline]
 kfree_rcu_monitor+0x5e8/0x660 kernel/rcu/tree.c:3643
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3310
 worker_thread+0x51d/0x6f0 kernel/workqueue.c:3391
 kthread+0x1d1/0x210 kernel/kthread.c:389
 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 56 Comm: kworker/u8:4 Not tainted 
6.12.0-rc2-syzkaller-00050-g5b7c893ed5ed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
09/13/2024
Workqueue: events_unbound kfree_rcu_monitor


kfree_rcu_monitor() rearms the work if a "krcp" has to be still
offloaded and this is done without holding krcp->lock, whereas
the kvfree_call_rcu() holds it.

Fix it by acquiring the "krcp->lock" for kfree_rcu_monitor() so
both functions do not race anymore.

Reported-by: syzbot+061d370693bdd99f9...@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/ZxZ68KmHDQYU0yfD@pc636/T/
Fixes: 8fc5494ad5fa ("rcu/kvfree: Move need_offload_krc() out of krcp->lock")
Signed-off-by: Uladzislau Rezki (Sony) 
---
 kernel/rcu/tree.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b1f883fcd918..3e486ccaa4ca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3511,7 +3511,7 @@ static int krc_count(struct kfree_rcu_cpu *krcp)
 }
 
 static void
-schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+__schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 {
long delay, delay_left;
 
@@ -3525,6 +3525,16 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
 }
 
+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&krcp->lock, flags);
+   __schedule_delayed_monitor_work(krcp);
+   raw_spin_unlock_irqrestore(&krcp->lock, flags);
+}
+
 static void
 kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
 {
@@ -3836,7 +3846,7 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 
// Set timer to drain after KFREE_DRAIN_JIFFIES.
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
-   schedule_delayed_monitor_work(krcp);
+   __schedule_delayed_monitor_work(krcp);
 
 unlock_return:
krc_this_cpu_unlock(krcp, flags);
-- 
2.39.5

Re: [PATCH 1/1] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.

2024-10-22 Thread Frederic Weisbecker

Le Fri, Oct 04, 2024 at 12:17:04PM +0200, Sebastian Andrzej Siewior a écrit :
> A timer/ hrtimer softirq is raised in-IRQ context. With threaded
> interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd
> for the processing of the softirq.

It took me some time to understand the actual problem (yeah I know...)

Can this be rephrased as: "Timer and hrtimer softirq vectors are special
in that they are always raised in-IRQ context whereas other vectors are
more likely to be raised from threaded interrupts or any regular tasks
when threaded interrupts or PREEMPT_RT are enabled. This leads to
waking ksoftirqd for the processing of the softirqs whenever timer
vectors are involved.

>
> Once the ksoftirqd is marked as pending (or is running) it will collect
> all raised softirqs. This in turn means that a softirq which would have
> been processed at the end of the threaded interrupt, which runs at an
> elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
> priority and competes with every regular task for CPU resources.

But for ksoftirqd to collect other non-timers softirqs, those vectors must
have been raised before from a threaded interrupt, right? So why those
vectors didn't get a chance to execute at the end of that threaded interrupt?

OTOH one problem I can imagine is a threaded interrupt preempting ksoftirqd
which must wait for ksoftirqd to complete due to the local_bh_disable()
in the beginning of irq_forced_thread_fn(). But then isn't there some
PI involved on the local lock?

Sorry I'm probably missing something...

>
> This introduces long delays on heavy loaded systems and is not desired
> especially if the system is not overloaded by the softirqs.
> 
> Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
> timers thread and let it run at the lowest SCHED_FIFO priority.
> Wake-ups for RT tasks happen from hardirq context so only timer_list timers
> and hrtimers for "regular" tasks are processed here.

That last sentence confuses me. How are timers for non regular task processed
elsewhere? Ah you mean RT tasks rely on hard hrtimers?

> The higher priority
> ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
> 
> Using a dedicated variable to store the pending softirq bits values
> ensure that the timer are not accidentally picked up by ksoftirqd and
> other threaded interrupts.
> It shouldn't be picked up by ksoftirqd since it runs at lower priority.
> However if ksoftirqd is already running while a timer fires, then
> ksoftird will be PI-boosted due to the BH-lock to ktimer's priority.
> Ideally we try to avoid having ksoftirqd running.
> 
> The timer thread can pick up pending softirqs from ksoftirqd but only
> if the softirq load is high. It is not be desired that the picked up
> softirqs are processed at SCHED_FIFO priority under high softirq load
> but this can already happen by a PI-boost by a force-threaded interrupt.
> 
> [ frede...@kernel.org: rcutorture.c fixes, storm fix by introduction of
>   local_pending_timers() for tick_nohz_next_event() ]
> 
> [ junxiao.ch...@intel.com: Ensure ktimersd gets woken up even if a
>   softirq is currently served. ]
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
>  include/linux/interrupt.h | 29 ++
>  kernel/rcu/rcutorture.c   |  6 +++
>  kernel/softirq.c  | 82 ++-
>  kernel/time/hrtimer.c |  4 +-
>  kernel/time/tick-sched.c  |  2 +-
>  kernel/time/timer.c   |  2 +-
>  6 files changed, 120 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 457151f9f263d..4a4f367cd6864 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -616,6 +616,35 @@ extern void __raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  
> +#ifdef CONFIG_PREEMPT_RT

This needs a comment section to explain why a dedicated
timers processing is needed.

> +DECLARE_PER_CPU(struct task_struct *, timersd);
> +DECLARE_PER_CPU(unsigned long, pending_timer_softirq);
> +
> +extern void raise_timer_softirq(void);
> +extern void raise_hrtimer_softirq(void);
> +
> +static inline unsigned int local_pending_timers(void)

Let's align with local_softirq_pending() naming and rather
have local_timers_pending() ?

> +{
> + return __this_cpu_read(pending_timer_softirq);
> +}
> +
> +#ifdef CONFIG_PREEMPT_RT
> +static void timersd_setup(unsigned int cpu)
> +{

That also needs a comment.

> + sched_set_fifo_low(current);
> +}
> +
> +static int timersd_should_run(unsigned int cpu)
> +{
> + return local_pending_timers();
> +}
> +
> +static void run_timersd(unsigned int cpu)
> +{
> + unsigned int timer_si;
> +
> + ksoftirqd_run_begin();
> +
> + timer_si = local_pending_timers();
> + __this_cpu_write(pending_timer_softirq, 0);
> + or_softirq_pending(timer_si);
> +
> + _

Re: [PATCH v4 02/11] iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct

2024-10-22 Thread Baolu Lu


On 2024/10/22 21:15, Jason Gunthorpe wrote:

On Tue, Oct 22, 2024 at 04:59:07PM +0800, Baolu Lu wrote:


Is it feasible to make vIOMMU object more generic, rather than strictly
tying it to nested translation? For example, a normal paging domain that
translates gPAs to hPAs could also have a vIOMMU object associated with
it.

While we can only support vIOMMU object allocation uAPI for S2 paging
domains in the context of this series, we could consider leaving the
option open to associate a vIOMMU object with other normal paging
domains that are not a nested parent?

Why? The nested parent flavour of the domain is basically free to
create, what reason would be to not do that?


Above addressed my question. The software using vIOMMU should allocate a
domain of nested parent type.



If the HW doesn't support it, then does the HW really need/support a
VIOMMU?

I suppose it could be made up to the driver, but for now I think we
should leave it as is in the core code requiring it until we have a
reason to relax it.


Yes. In such cases, the iommu driver could always allow nested parent
domain allocation, but fails to allocate a nested domain if the hardware
is not capable of nesting translation.

Thanks,
baolu

[PATCH v7 1/3] modules: Support extended MODVERSIONS info

2024-10-22 Thread Matthew Maurer

Adds a new format for MODVERSIONS which stores each field in a separate
ELF section. This initially adds support for variable length names, but
could later be used to add additional fields to MODVERSIONS in a
backwards compatible way if needed. Any new fields will be ignored by
old user tooling, unlike the current format where user tooling cannot
tolerate adjustments to the format (for example making the name field
longer).

Since PPC munges its version records to strip leading dots, we reproduce
the munging for the new format. Other architectures do not appear to
have architecture-specific usage of this information.

Signed-off-by: Matthew Maurer 
---
 arch/powerpc/kernel/module_64.c | 24 ++-
 kernel/module/internal.h| 11 +
 kernel/module/main.c| 92 +
 kernel/module/version.c | 45 
 4 files changed, 162 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 
e9bab599d0c2745e4d2b5cae04f2c56395c24654..02ada0b057cef6b2f29fa7519a5d52acac740ee5
 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -355,6 +355,24 @@ static void dedotify_versions(struct modversion_info *vers,
}
 }
 
+/* Same as normal versions, remove a leading dot if present. */
+static void dedotify_ext_version_names(char *str_seq, unsigned long size)
+{
+   unsigned long out = 0;
+   unsigned long in;
+   char last = '\0';
+
+   for (in = 0; in < size; in++) {
+   /* Skip one leading dot */
+   if (last == '\0' && str_seq[in] == '.')
+   in++;
+   last = str_seq[in];
+   str_seq[out++] = last;
+   }
+   /* Zero the trailing portion of the names table for robustness */
+   memset(&str_seq[out], 0, size - out);
+}
+
 /*
  * Undefined symbols which refer to .funcname, hack to funcname. Make .TOC.
  * seem to be defined (value set later).
@@ -424,10 +442,12 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
me->arch.toc_section = i;
if (sechdrs[i].sh_addralign < 8)
sechdrs[i].sh_addralign = 8;
-   }
-   else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
+   } else if (strcmp(secstrings + sechdrs[i].sh_name, 
"__versions") == 0)
dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
  sechdrs[i].sh_size);
+   else if (strcmp(secstrings + sechdrs[i].sh_name, 
"__version_ext_names") == 0)
+   dedotify_ext_version_names((void *)hdr + 
sechdrs[i].sh_offset,
+  sechdrs[i].sh_size);
 
if (sechdrs[i].sh_type == SHT_SYMTAB)
dedotify((void *)hdr + sechdrs[i].sh_offset,
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 
daef2be8390222c0e2f168baa8d35ad531b9..59959c21b205bf91c0073260885743098c4022cf
 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -86,6 +86,8 @@ struct load_info {
unsigned int vers;
unsigned int info;
unsigned int pcpu;
+   unsigned int vers_ext_crc;
+   unsigned int vers_ext_name;
} index;
 };
 
@@ -389,6 +391,15 @@ void module_layout(struct module *mod, struct 
modversion_info *ver, struct kerne
   struct kernel_symbol *ks, struct tracepoint * const *tp);
 int check_modstruct_version(const struct load_info *info, struct module *mod);
 int same_magic(const char *amagic, const char *bmagic, bool has_crcs);
+struct modversion_info_ext {
+   size_t remaining;
+   const s32 *crc;
+   const char *name;
+};
+void modversion_ext_start(const struct load_info *info, struct 
modversion_info_ext *ver);
+void modversion_ext_advance(struct modversion_info_ext *ver);
+#define for_each_modversion_info_ext(ver, info) \
+   for (modversion_ext_start(info, &ver); ver.remaining > 0; 
modversion_ext_advance(&ver))
 #else /* !CONFIG_MODVERSIONS */
 static inline int check_version(const struct load_info *info,
const char *symname,
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 
b40b632f00a65e66ed73c4b386ef1f323a5b790c..9a9feca344f8bb06408d350e13f759bb909962cd
 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2039,6 +2039,82 @@ static int elf_validity_cache_index_str(struct load_info 
*info)
return 0;
 }
 
+/**
+ * elf_validity_cache_index_versions() - Validate and cache version indices
+ * @info:  Load info to cache version indices in.
+ * Must have &load_info->sechdrs and &load_info->secstrings populated.
+ * @flags: Load flags, relevant to suppress version loading, see
+ * uapi/linux/module.h
+ *
+ * If we're i

[PATCH v7 2/3] modpost: Produce extended MODVERSIONS information

2024-10-22 Thread Matthew Maurer

Generate both the existing modversions format and the new extended one
when running modpost. Presence of this metadata in the final .ko is
guarded by CONFIG_EXTENDED_MODVERSIONS.

We no longer generate an error on long symbols in modpost if
CONFIG_EXTENDED_MODVERSIONS is set, as they can now be appropriately
encoded in the extended section. These symbols will be skipped in the
previous encoding. An error will still be generated if
CONFIG_EXTENDED_MODVERSIONS is not set.

Signed-off-by: Matthew Maurer 
---
 kernel/module/Kconfig| 10 
 scripts/Makefile.modpost |  1 +
 scripts/mod/modpost.c| 65 +---
 3 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
index 
e6b2427e5c190aacf7b9c5c1bb57fca39d311564..a31c617cd67d3d66b24d2fba34cbd5cc9c53ab78
 100644
--- a/kernel/module/Kconfig
+++ b/kernel/module/Kconfig
@@ -208,6 +208,16 @@ config ASM_MODVERSIONS
  assembly. This can be enabled only when the target architecture
  supports it.
 
+config EXTENDED_MODVERSIONS
+   bool "Extended Module Versioning Support"
+   depends on MODVERSIONS
+   help
+ This enables extended MODVERSIONs support, allowing long symbol
+ names to be versioned.
+
+ The most likely reason you would enable this is to enable Rust
+ support. If unsure, say N.
+
 config MODULE_SRCVERSION_ALL
bool "Source checksum for all modules"
help
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 
44936ebad161e914cbcc40ac74a2d651596d7b07..765da63d592be56fe93c0f4a35f1bfbcb924541a
 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -43,6 +43,7 @@ MODPOST = scripts/mod/modpost
 modpost-args = 
\
$(if $(CONFIG_MODULES),-M)  
\
$(if $(CONFIG_MODVERSIONS),-m)  
\
+   $(if $(CONFIG_EXTENDED_MODVERSIONS),-x) 
\
$(if $(CONFIG_MODULE_SRCVERSION_ALL),-a)
\
$(if $(CONFIG_SECTION_MISMATCH_WARN_ONLY),,-E)  
\
$(if $(KBUILD_MODPOST_WARN),-w) 
\
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 
107393a8c48a5993dbe456702fec0652a967ee86..bd38f33fd41fbd98bce34f8924b2fb0ac04297ee
 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -32,6 +32,8 @@ static bool module_enabled;
 static bool modversions;
 /* Is CONFIG_MODULE_SRCVERSION_ALL set? */
 static bool all_versions;
+/* Is CONFIG_EXTENDED_MODVERSIONS set? */
+static bool extended_modversions;
 /* If we are modposting external module set to 1 */
 static bool external_module;
 /* Only warn about unresolved symbols */
@@ -1817,6 +1819,52 @@ static void add_exported_symbols(struct buffer *buf, 
struct module *mod)
}
 }
 
+/**
+ * Record CRCs for unresolved symbols, supporting long names
+ */
+static void add_extended_versions(struct buffer *b, struct module *mod)
+{
+   struct symbol *s;
+
+   if (!extended_modversions)
+   return;
+
+   buf_printf(b, "\n");
+   buf_printf(b, "static const s32 version_ext_crcs[]\n");
+   buf_printf(b, "__used __section(\"__version_ext_crcs\") = {\n");
+   list_for_each_entry(s, &mod->unresolved_symbols, list) {
+   if (!s->module)
+   continue;
+   if (!s->crc_valid) {
+   /*
+* We already warned on this when producing the legacy
+* modversions table.
+*/
+   continue;
+   }
+   buf_printf(b, "\t%#8x,\n", s->crc);
+   }
+   buf_printf(b, "};\n");
+
+   buf_printf(b, "static const char version_ext_names[]\n");
+   buf_printf(b, "__used __section(\"__version_ext_names\") =\n");
+   list_for_each_entry(s, &mod->unresolved_symbols, list) {
+   if (!s->module)
+   continue;
+   if (!s->crc_valid) {
+   /*
+* We already warned on this when producing the legacy
+* modversions table.
+* We need to skip its name too, as the indexes in
+* both tables need to align.
+*/
+   continue;
+   }
+   buf_printf(b, "\t\"%s\\0\"\n", s->name);
+   }
+   buf_printf(b, ";\n");
+}
+
 /**
  * Record CRCs for unresolved symbols
  **/
@@ -1840,9 +1888,14 @@ static void add_versions(struct buffer *b, struct module 
*mod)
continue;
}
if (strlen(s->name) >= MODULE_NA

[PATCH v7 3/3] rust: Use gendwarfksyms + extended modversions for CONFIG_MODVERSIONS

2024-10-22 Thread Matthew Maurer

From: Sami Tolvanen 

Previously, two things stopped Rust from using MODVERSIONS:
1. Rust symbols are occasionally too long to be represented in the
   original versions table
2. Rust types cannot be properly hashed by the existing genksyms
   approach because:
* Looking up type definitions in Rust is more complex than C
* Type layout is potentially dependent on the compiler in Rust,
  not just the source type declaration.

CONFIG_EXTENDED_MODVERSIONS addresses the first point, and
CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow
it to do so by selecting both features.

Signed-off-by: Sami Tolvanen 
Co-developed-by: Matthew Maurer 
Signed-off-by: Matthew Maurer 
---
 init/Kconfig  |  3 ++-
 rust/Makefile | 32 ++--
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 
530a382ee0feb391b4717abdba3672e584a462d0..f5cce579f29b2ed89e97f8075a3bf70e32e71ad0
 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1942,7 +1942,8 @@ config RUST
bool "Rust support"
depends on HAVE_RUST
depends on RUST_IS_AVAILABLE
-   depends on !MODVERSIONS
+   select EXTENDED_MODVERSIONS if MODVERSIONS
+   depends on (GENDWARFKSYMS || !MODVERSIONS)
depends on !GCC_PLUGIN_RANDSTRUCT
depends on !RANDSTRUCT
depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
diff --git a/rust/Makefile b/rust/Makefile
index 
b5e0a73b78f3e58fc8fb8c9fab8fb5792406c6d8..b80bc4eb98202f774c493da89c2caee322cffc91
 100644
--- a/rust/Makefile
+++ b/rust/Makefile
@@ -303,10 +303,11 @@ $(obj)/bindings/bindings_helpers_generated.rs: private 
bindgen_target_extra = ;
 $(obj)/bindings/bindings_helpers_generated.rs: $(src)/helpers/helpers.c FORCE
$(call if_changed_dep,bindgen)
 
+rust_exports = $(NM) -p --defined-only $(1) | awk '$$2~/(T|R|D|B)/ && 
$$3!~/__cfi/ { printf $(2),$(3) }'
+
 quiet_cmd_exports = EXPORTS $@
   cmd_exports = \
-   $(NM) -p --defined-only $< \
-   | awk '$$2~/(T|R|D|B)/ && $$3!~/__cfi/ {printf 
"EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3}' > $@
+   $(call rust_exports,$<,"EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3) > $@
 
 $(obj)/exports_core_generated.h: $(obj)/core.o FORCE
$(call if_changed,exports)
@@ -378,11 +379,35 @@ ifneq ($(or $(CONFIG_ARM64),$(and 
$(CONFIG_RISCV),$(CONFIG_64BIT))),)
__ashlti3 __lshrti3
 endif
 
+ifdef CONFIG_MODVERSIONS
+cmd_gendwarfksyms = $(if $(skip_gendwarfksyms),, \
+   $(call rust_exports,$@,"%s\n",$$3) | \
+   scripts/gendwarfksyms/gendwarfksyms \
+   $(if $(KBUILD_SYMTYPES), --symtypes $(@:.o=.symtypes),) \
+   $@ >> $(dot-target).cmd)
+endif
+
 define rule_rustc_library
$(call cmd_and_fixdep,rustc_library)
$(call cmd,gen_objtooldep)
+   $(call cmd,gendwarfksyms)
 endef
 
+define rule_rust_cc_library
+   $(call if_changed_rule,cc_o_c)
+   $(call cmd,force_checksrc)
+   $(call cmd,gendwarfksyms)
+endef
+
+# helpers.o uses the same export mechanism as Rust libraries, so ensure symbol
+# versions are calculated for the helpers too.
+$(obj)/helpers/helpers.o: $(src)/helpers/helpers.c $(recordmcount_source) FORCE
+   +$(call if_changed_rule,rust_cc_library)
+
+# Disable symbol versioning for exports.o to avoid conflicts with the actual
+# symbol versions generated from Rust objects.
+$(obj)/exports.o: private skip_gendwarfksyms = 1
+
 $(obj)/core.o: private skip_clippy = 1
 $(obj)/core.o: private skip_flags = -Wunreachable_pub
 $(obj)/core.o: private rustc_objcopy = $(foreach 
sym,$(redirect-intrinsics),--redefine-sym $(sym)=__rust$(sym))
@@ -394,6 +419,7 @@ ifneq ($(or $(CONFIG_X86_64),$(CONFIG_X86_32)),)
 $(obj)/core.o: scripts/target.json
 endif
 
+$(obj)/compiler_builtins.o: private skip_gendwarfksyms = 1
 $(obj)/compiler_builtins.o: private rustc_objcopy = -w -W '__*'
 $(obj)/compiler_builtins.o: $(src)/compiler_builtins.rs $(obj)/core.o FORCE
+$(call if_changed_rule,rustc_library)
@@ -404,6 +430,7 @@ $(obj)/alloc.o: private rustc_target_flags = $(alloc-cfgs)
 $(obj)/alloc.o: $(RUST_LIB_SRC)/alloc/src/lib.rs $(obj)/compiler_builtins.o 
FORCE
+$(call if_changed_rule,rustc_library)
 
+$(obj)/build_error.o: private skip_gendwarfksyms = 1
 $(obj)/build_error.o: $(src)/build_error.rs $(obj)/compiler_builtins.o FORCE
+$(call if_changed_rule,rustc_library)
 
@@ -413,6 +440,7 @@ $(obj)/bindings.o: $(src)/bindings/lib.rs \
 $(obj)/bindings/bindings_helpers_generated.rs FORCE
+$(call if_changed_rule,rustc_library)
 
+$(obj)/uapi.o: private skip_gendwarfksyms = 1
 $(obj)/uapi.o: $(src)/uapi/lib.rs \
 $(obj)/compiler_builtins.o \
 $(obj)/uapi/uapi_generated.rs FORCE

-- 
2.47.0.105.g07ac214952-goog

Re: [PATCH v2 0/3] modules: few of alignment fixes

2024-10-22 Thread Masahiro Yamada

On Tue, Oct 22, 2024 at 5:07 AM Helge Deller  wrote:
>
> On 10/21/24 21:22, Luis Chamberlain wrote:
> > On Fri, Feb 02, 2024 at 10:23:21AM -0800, Luis Chamberlain wrote:
> >> On Sat, Feb 03, 2024 at 12:20:38AM +0900, Masahiro Yamada wrote:
> >>> On Fri, Feb 2, 2024 at 3:05 AM Luis Chamberlain  wrote:
> 
>  On Wed, Jan 31, 2024 at 02:11:44PM -0800, Luis Chamberlain wrote:
> > On Mon, Jan 29, 2024 at 11:26:39AM -0800, Luis Chamberlain wrote:
> >> Masahiro, if there no issues feel free to take this or I can take them 
> >> in
> >> too via the modules-next tree. Lemme know!
> >
> > I've queued this onto modules-testing to get winder testing [0]
> 
>  I've moved it to modules-next as I've found no issues.
> 
> Luis
> >>>
> >>>
> >>> I believe this patch series is wrong.
> >>>
> >>> I thought we agreed that the alignment must be added to
> >>> individual asm code, not to the linker script.
> >>>
> >>> I am surprised that you came back to this.
> >>
> >> I misseed the dialog on the old cover letter, sorry. I've yanked these 
> >> patches
> >> out. I'd expect a respin from Helge.
> >
> > Just goind down memory lane -- Helge, the work here pending was to move
> > this to the linker script. Were you going to follow up on this?
>
> Masahiro mentions above, that the alignment should be added
> to the individual asm code. This happened in the meantime for parisc, but
> I'm not sure if all platforms get this right.
> So in addition, I still believe that adding the alignment to the linker
> script too is another right thing to do.
>
> Helge

Yes, I believe the proper alignment should be specified in asm code.

-- 
Best Regards
Masahiro Yamada

Re: [PATCH 1/1] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.

2024-10-22 Thread Sebastian Andrzej Siewior

On 2024-10-23 00:27:34 [+0200], Frederic Weisbecker wrote:
> > Try again without the "ksoftirqd will collect it all" since this won't
> > happen since the revert I mentioned.
> 
> I still don't get it, this makes:
> 
> """
> Once the ksoftirqd is marked as pending (or is running), a softirq which
> would have been processed at the end of the threaded interrupt, which runs
> at an elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
> priority and competes with every regular task for CPU resources.
> """
> 
> ksoftirqd raised for timers still doesn't prevent a threaded IRQ from running
> softirqs, unless it preempts ksoftirqd and waits with PI. So is it what you're
> trying to solve?
> 
> Or is the problem that timer softirqs are executed with SCHED_NORMAL?

Exactly. It runs at SCHED_NORMAL and competes with everything else. It
can delay tasks wakes depending on system load.

> > Quick question: Do we want this in forced-threaded mode, too? The timer
> > (timer_list timer) and all HRTIMER_MODE_SOFT are handled in ksoftirqd.
> 
> It's hard to tell for me as I don't know the !RT usecases for forced-threaded 
> mode.
> "If in doubt say N" ;-)

Oki.

> > > > +void raise_timer_softirq(void)
> > > > +{
> > > > +   unsigned long flags;
> > > > +
> > > > +   local_irq_save(flags);
> > > > +   raise_ktimers_thread(TIMER_SOFTIRQ);
> > > > +   wake_timersd();
> > > 
> > > This is supposed to be called from hardirq only, right?
> > > Can't irq_exit_rcu() take care of it? Why is it different
> > > from HRTIMER_SOFTIRQ ?
> > 
> > Good question. This shouldn't be any different compared to the hrtimer
> > case. This is only raised in hardirq, so yes, the irq_save can go away
> > and the wake call, too.
> 
> Cool. You can add lockdep_assert_in_irq() within raise_ktimers_thread() for
> some well deserved relief :-)

If you want to, sure. I would add them to both raise functions.

> Thanks.

Sebastian

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Christoph Hellwig

On Tue, Oct 22, 2024 at 10:29:13AM -0700, Andrii Nakryiko wrote:
> >
> > Would this work?
> >
> > #define SRCU_INVALID_INDEX -1
> >
> 
> But why?

Becaue it very clearly documents what is going on.

>It's a nice property to have an int-returning API where valid
> values are only >= 0, so callers are free to use the entire negative
> range (not just -1) for whatever they need to store in case there is
> no srcu_idx value.

Well, if you have a concrete use case for that we can probably live
with it, but I'd rather have that use case extremely well documented,
as it will be very puzzling to the reader.

Re: [PATCH net-next v5 2/3] connector/cn_proc: Kunit tests for threads hash table

2024-10-22 Thread Anjali Kulkarni



> On Oct 22, 2024, at 4:50 PM, Stanislav Fomichev  wrote:
> 
> On 10/22, Anjali Kulkarni wrote:
>> 
>> 
>>> On Oct 17, 2024, at 5:55 PM, Stanislav Fomichev  
>>> wrote:
>>> 
>>> On 10/18, Anjali Kulkarni wrote:
 
 
> On Oct 17, 2024, at 5:13 PM, Stanislav Fomichev  
> wrote:
> 
> On 10/17, Anjali Kulkarni wrote:
>> Kunit tests to test hash table add, delete, duplicate add and delete.
>> Add following configs and compile kernel code:
>> 
>> CONFIG_CONNECTOR=y
>> CONFIG_PROC_EVENTS=y
>> CONFIG_NET=y
>> CONFIG_KUNIT=m
>> CONFIG_CN_HASH_KUNIT_TEST=m
>> 
>> To run kunit tests:
>> sudo modprobe cn_hash_test
>> 
>> Output of kunit tests and hash table contents are displayed in
>> /var/log/messages (at KERN_DEBUG level).
>> 
>> Signed-off-by: Anjali Kulkarni 
>> ---
>> drivers/connector/cn_hash.c   |  40 
>> drivers/connector/connector.c |  12 +++
>> include/linux/connector.h |   4 +
>> lib/Kconfig.debug |  17 
>> lib/Makefile  |   1 +
>> lib/cn_hash_test.c| 167 ++
>> lib/cn_hash_test.h|  10 ++
>> 7 files changed, 251 insertions(+)
>> create mode 100644 lib/cn_hash_test.c
>> create mode 100644 lib/cn_hash_test.h
>> 
>> diff --git a/drivers/connector/cn_hash.c b/drivers/connector/cn_hash.c
>> index a079e9bcea6d..40099b5908ac 100644
>> --- a/drivers/connector/cn_hash.c
>> +++ b/drivers/connector/cn_hash.c
>> @@ -170,6 +170,46 @@ int cn_hash_get_exval(struct cn_hash_dev *hdev, 
>> pid_t pid)
>> return -EINVAL;
>> }
>> 
>> +int cn_hash_display_hlist(struct cn_hash_dev *hdev, pid_t pid, int 
>> max_len,
>> + int *hkey, int *key_display)
>> +{
>> + struct uexit_pid_hnode *hnode;
>> + int key, count = 0;
>> +
>> + mutex_lock(&hdev->uexit_hash_lock);
>> + key = hash_min(pid, HASH_BITS(hdev->uexit_pid_htable));
>> + pr_debug("Bucket: %d\n", key);
>> +
>> + hlist_for_each_entry(hnode,
>> + &hdev->uexit_pid_htable[key],
>> + uexit_pid_hlist) {
>> + if (key_display[key] != 1) {
>> + if (hnode->uexit_pid_hlist.next == NULL)
>> + pr_debug("pid %d ", hnode->pid);
>> + else
>> + pr_debug("pid %d --> ", hnode->pid);
>> + }
>> + count++;
>> + }
>> +
>> + mutex_unlock(&hdev->uexit_hash_lock);
>> +
>> + if ((key_display[key] != 1) && !count)
>> + pr_debug("(empty)\n");
>> +
>> + pr_debug("\n");
>> +
>> + *hkey = key;
>> +
>> + if (count > max_len) {
>> + pr_err("%d entries in hlist for key %d, expected %d\n",
>> + count, key, max_len);
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> bool cn_hash_table_empty(struct cn_hash_dev *hdev)
>> {
>> bool is_empty;
>> diff --git a/drivers/connector/connector.c 
>> b/drivers/connector/connector.c
>> index c1c0dcec53c0..2be2fe1adc12 100644
>> --- a/drivers/connector/connector.c
>> +++ b/drivers/connector/connector.c
>> @@ -304,6 +304,18 @@ int cn_get_exval(pid_t pid)
>> }
>> EXPORT_SYMBOL_GPL(cn_get_exval);
>> 
>> +int cn_display_hlist(pid_t pid, int max_len, int *hkey, int 
>> *key_display)
>> +{
>> + struct cn_dev *dev = &cdev;
>> +
>> + if (!cn_already_initialized)
>> + return 0;
>> +
>> + return cn_hash_display_hlist(dev->hdev, pid, max_len,
>> + hkey, key_display);
>> +}
>> +EXPORT_SYMBOL_GPL(cn_display_hlist);
>> +
>> bool cn_table_empty(void)
>> {
>> struct cn_dev *dev = &cdev;
>> diff --git a/include/linux/connector.h b/include/linux/connector.h
>> index 5384e4bb98e8..a75c3fcf182a 100644
>> --- a/include/linux/connector.h
>> +++ b/include/linux/connector.h
>> @@ -168,4 +168,8 @@ int cn_get_exval(pid_t pid);
>> bool cn_table_empty(void);
>> bool cn_hash_table_empty(struct cn_hash_dev *hdev);
>> 
>> +int cn_display_hlist(pid_t pid, int max_len, int *hkey, int 
>> *key_display);
>> +int cn_hash_display_hlist(struct cn_hash_dev *hdev, pid_t pid, int 
>> max_len,
>> + int *hkey, int *key_display);
>> +
>> #endif /* __CONNECTOR_H */
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index 7315f643817a..290cf0a6befa 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -2705,6 +2705,23 @@ config HASHTABLE_KUNIT_TEST
>> 
>> If unsure, say N.
>> 
>> +config CN_HASH_KUNIT_TEST
>> + tristate "KUnit Test for connector hashtable code" if !KUNIT_ALL_TESTS
>> + depends on KUNIT
>> + default KUNIT_ALL_TESTS
>> + help
>> +  This builds the hashtable KUnit test suite.
>> +  It tests the basic functionality of the API defined in
>> +  drivers/connector/cn_hash.c.
>> +  CONFIG_CONNECTOR=y, CONFIG_PROC_EVENTS=y

Re: [PATCH net-next v5 2/3] connector/cn_proc: Kunit tests for threads hash table

2024-10-22 Thread Anjali Kulkarni

[…snip…]

 
 Yes, make sure all required options are picked up by
 "./tools/testing/kunit/kunit.py run" instead of manually adding options
 and doing modprobe.
>>> 
>>> The environment issues are resolved and I am able to run kunit.py, but my 
>>> tests
>>> are not invoked without giving options via —kconfig-add. Other tests are 
>>> also not
>>> invoked. Running with the manual options runs 413 tests, and with just 
>>> kunit.py
>>> runs 389 tests. (I have added 6). Any idea how I can make it run my tests?
>> 
>> The runner does: ./tools/testing/kunit/kunit.py run --alltests
>> Is it not enough in your case? What options do you pass via
>> --kconfig-add? Is it because CONNECTOR stuff is disabled by default?
> 
> No, it still does not run.
> However, I added to tools/testing/kunit/configs/all_tests.config:
> 
> CONFIG_CONNECTOR=y
> CONFIG_PROC_EVENTS=y
> CONFIG_NET=y
> CONFIG_CN_HASH_KUNIT_TEST=y
> 
> And now it does run.
> Should I make the change above? I will also check with the kunit guys.
> But I do not understand how it ran for you(and run into the error), or did
> it just try to compile?

I see this in comments on top of all_tests.config.

# The config is manually maintained, though it uses KUNIT_ALL_TESTS=y to enable
# any tests whose dependencies are already satisfied. Please feel free to add
# more options if they any new tests.

So I suppose if a test needs more dependencies, it needs to be added here.

Anjali

Re: [syzbot] [virt?] KCSAN: data-race in virtqueue_disable_cb / virtqueue_disable_cb (5)

2024-10-22 Thread Jason Wang

On Tue, Oct 22, 2024 at 6:07 PM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:3d5ad2d4eca3 Merge tag 'bpf-fixes' of git://git.kernel.org..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=154b748798
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fd83253b74c9c570
> dashboard link: https://syzkaller.appspot.com/bug?extid=9d46c74b27b961b244a9
> compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
> 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/9d112e37e26b/disk-3d5ad2d4.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/d8f036bdcdb7/vmlinux-3d5ad2d4.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/61727d52db48/bzImage-3d5ad2d4.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+9d46c74b27b961b24...@syzkaller.appspotmail.com
>
> ==
> BUG: KCSAN: data-race in virtqueue_disable_cb / virtqueue_disable_cb
>
> read to 0x888102c18178 of 2 bytes by interrupt on cpu 1:
>  virtqueue_disable_cb_split drivers/virtio/virtio_ring.c:885 [inline]

static void virtqueue_disable_cb_split(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);

if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;

Seems indeed racy as it could write notification callbacks directly
without any protection.

It looks to me we can only fix this by using atomic operation,
Michael, do you have other idea?

Thanks

>  virtqueue_disable_cb+0x63/0x180 drivers/virtio/virtio_ring.c:2446
>  skb_xmit_done+0x5f/0x140 drivers/net/virtio_net.c:715
>  vring_interrupt+0x161/0x190 drivers/virtio/virtio_ring.c:2595
>  __handle_irq_event_percpu+0x95/0x490 kernel/irq/handle.c:158
>  handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
>  handle_irq_event+0x64/0xf0 kernel/irq/handle.c:210
>  handle_edge_irq+0x16d/0x5b0 kernel/irq/chip.c:831
>  generic_handle_irq_desc include/linux/irqdesc.h:173 [inline]
>  handle_irq arch/x86/kernel/irq.c:247 [inline]
>  call_irq_handler arch/x86/kernel/irq.c:259 [inline]
>  __common_interrupt+0x58/0xe0 arch/x86/kernel/irq.c:285
>  common_interrupt+0x7c/0x90 arch/x86/kernel/irq.c:278
>  asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
>  kcsan_setup_watchpoint+0x404/0x410 kernel/kcsan/core.c:705
>  format_decode+0x1d1/0x8a0 lib/vsprintf.c:2554
>  vsnprintf+0xc6/0xe30 lib/vsprintf.c:2755
>  sprintf+0x89/0xb0 lib/vsprintf.c:3007
>  print_caller kernel/printk/printk.c:1373 [inline]
>  info_print_prefix+0x142/0x1a0 kernel/printk/printk.c:1390
>  record_print_text kernel/printk/printk.c:1437 [inline]
>  printk_get_next_message+0x446/0x6f0 kernel/printk/printk.c:2978
>  console_emit_next_record kernel/printk/printk.c:3046 [inline]
>  console_flush_all+0x28a/0x770 kernel/printk/printk.c:3180
>  __console_flush_and_unlock kernel/printk/printk.c:3239 [inline]
>  console_unlock+0xab/0x330 kernel/printk/printk.c:3279
>  vprintk_emit+0x3f4/0x680 kernel/printk/printk.c:2407
>  vprintk_default+0x26/0x30 kernel/printk/printk.c:2422
>  vprintk+0x75/0x80 kernel/printk/printk_safe.c:68
>  _printk+0x7a/0xa0 kernel/printk/printk.c:2432
>  batadv_check_known_mac_addr+0x153/0x180 net/batman-adv/hard-interface.c:528
>  batadv_hard_if_event+0x4b0/0x1010 net/batman-adv/hard-interface.c:998
>  notifier_call_chain kernel/notifier.c:93 [inline]
>  raw_notifier_call_chain+0x6f/0x1d0 kernel/notifier.c:461
>  call_netdevice_notifiers_info+0xae/0x100 net/core/dev.c:1996
>  call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
>  call_netdevice_notifiers net/core/dev.c:2048 [inline]
>  dev_set_mac_address+0x1ff/0x260 net/core/dev.c:9104
>  dev_set_mac_address_user+0x31/0x50 net/core/dev.c:9118
>  do_setlink+0x513/0x2490 net/core/rtnetlink.c:2884
>  __rtnl_newlink net/core/rtnetlink.c:3725 [inline]
>  rtnl_newlink+0x11a3/0x1690 net/core/rtnetlink.c:3772
>  rtnetlink_rcv_msg+0x6aa/0x710 net/core/rtnetlink.c:6675
>  netlink_rcv_skb+0x12c/0x230 net/netlink/af_netlink.c:2551
>  rtnetlink_rcv+0x1c/0x30 net/core/rtnetlink.c:6693
>  netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
>  netlink_unicast+0x599/0x670 net/netlink/af_netlink.c:1357
>  netlink_sendmsg+0x5cc/0x6e0 net/netlink/af_netlink.c:1901
>  sock_sendmsg_nosec net/socket.c:729 [inline]
>  __sock_sendmsg+0x140/0x180 net/socket.c:744
>  __sys_sendto+0x1d6/0x260 net/socket.c:2214
>  __do_sys_sendto net/socket.c:2226 [inline]
>  __se_sys_sendto net/socket.c: [inline]
>  __x64_sys_sendto+0x78/0x90 net/socket.c:
>  x64_sys_call+0x2959/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:45
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xc9/0x1c0 a

[PATCH v7 0/3] Extended MODVERSIONS Support

2024-10-22 Thread Matthew Maurer

This patch series is intended for use alongside the Implement DWARF
modversions series [1] to enable RUST and MODVERSIONS at the same
time.

Elsewhere, we've seen a desire for long symbol name support for LTO
symbol names [2], and the previous series came up [3] as a possible
solution rather than hashing, which some have objected [4] to.

This series adds a MODVERSIONS format which uses a section per column.
This avoids userspace tools breaking if we need to make a similar change
to the format in the future - we would do so by adding a new section,
rather than editing the struct definition. In the new format, the name
section is formatted as a concatenated sequence of NUL-terminated
strings, which allows for arbitrary length names.

Emitting the extended format is guarded by CONFIG_EXTENDED_MODVERSIONS,
but the kernel always knows how to validate both the original and
extended formats.

Selecting RUST and MODVERSIONS is now possible if GENDWARFKSYMS is
selected, and will implicitly select EXTENDED_MODVERSIONS.

This series depends upon the module verification refactor patches [5]
that were split off of v5, and DWARF-based versions [1].

linuxppc-dev is requested to look at the ppc-specific munging,
as Luis would like some eyes on there [6].

[1] 
https://lore.kernel.org/lkml/20241008183823.36676-21-samitolva...@google.com/
[2] https://lore.kernel.org/lkml/20240605032120.3179157-1-s...@kernel.org/
[3] https://lore.kernel.org/lkml/zoxbeesk40asi...@bombadil.infradead.org/
[4] https://lore.kernel.org/lkml/0b2697fd-7ab4-469f-83a6-ec9ebc701...@suse.com/
[5] 
https://lore.kernel.org/linux-modules/20241015231651.3851138-1-mmau...@google.com/T/#t
[6] https://lore.kernel.org/lkml/zxahdv5zkdm__...@bombadil.infradead.org/

Changes in v7:
- Fix modpost to detect EXTENDED_MODVERSIONS based on a flag
- Drop patches to fix export_report.pl
- Switch from conditional compilation in .mod.c to conditional emission
  in modpost
- Factored extended modversion emission into its own function
- Allow RUST + MODVERSIONS if GENDWARFKSYMS is enabled by selecting
  EXTENDED_MODVERSIONS

v6: https://lore.kernel.org/lkml/20241015231925.3854230-1-mmau...@google.com/
- Splits verification refactor Luis requested out to a separate change
- Clarifies commits around export_report.pl repairs
- Add CONFIG_EXTENDED_MODVERSIONS to control whether extended
  information is included in the module, per Luis's request.

v5: https://lore.kernel.org/all/20240925233854.90072-1-mmau...@google.com/
- Addresses Sami's comments from v3 that I missed in v4 (missing early
  return, extra parens)

v4: https://lore.kernel.org/asahi/20240924212024.540574-1-mmau...@google.com/
- Fix incorrect dot munging in PPC

v3: https://lore.kernel.org/lkml/87le0w2hop.fsf@mail.lhotse/T/
- Split up the module verification refactor into smaller patches, per
  Greg K-H's suggestion.

v2: https://lore.kernel.org/all/20231118025748.2778044-1-mmau...@google.com/
- Add loading/verification refactor before modifying, per Luis's request

v1: 
https://lore.kernel.org/rust-for-linux/20231115185858.2110875-1-mmau...@google.com/

Matthew Maurer (5):
  export_report: Rehabilitate script
  modules: Support extended MODVERSIONS info
  export_report: Tolerate additional `.mod.c` content
  modpost: Produce extended MODVERSIONS information
  export_report: Use new version info format

 arch/powerpc/kernel/module_64.c | 23 -
 kernel/module/Kconfig   |  8 +++
 kernel/module/internal.h| 11 
 kernel/module/main.c| 92 ++---
 kernel/module/version.c | 45 
 scripts/export_report.pl| 17 +++---
 scripts/mod/modpost.c   | 41 +++
 7 files changed, 220 insertions(+), 17 deletions(-)

--
2.47.0.rc1.288.g06298d1525-goog

---
Matthew Maurer (2):
  modules: Support extended MODVERSIONS info
  modpost: Produce extended MODVERSIONS information

Sami Tolvanen (1):
  rust: Use gendwarfksyms + extended modversions for CONFIG_MODVERSIONS

 arch/powerpc/kernel/module_64.c | 24 ++-
 init/Kconfig|  3 +-
 kernel/module/Kconfig   | 10 +
 kernel/module/internal.h| 11 +
 kernel/module/main.c| 92 +
 kernel/module/version.c | 45 
 rust/Makefile   | 32 +-
 scripts/Makefile.modpost|  1 +
 scripts/mod/modpost.c   | 65 +++--
 9 files changed, 266 insertions(+), 17 deletions(-)
---
base-commit: 2295cf87ed5a6da4564034e4f8ebcce0a0a021ed
change-id: 20241022-extended-modversions-a7b44dfbfff1
prerequisite-message-id: <20241008183823.36676-21-samitolva...@google.com>
prerequisite-patch-id: 08b46e0d1e37c262c08da6db4a87728d7b3047cc
prerequisite-patch-id: 97f307e05ec4b7a653f1ec68f825e8d5bd622b05
prerequisite-patch-id: a4519fb5eef33d692b918529ae09484

Re: [PATCH] selftests: livepatch: add test cases of stack_order sysfs interface

2024-10-22 Thread zhang warden




> On Oct 11, 2024, at 23:11, Wardenjohn  wrote:
> 
> Add selftest test cases to sysfs attribute 'stack_order'.
> 
> Signed-off-by: Wardenjohn 
> ---
> .../testing/selftests/livepatch/test-sysfs.sh | 74 +++
> 1 file changed, 74 insertions(+)
> 
> diff --git a/tools/testing/selftests/livepatch/test-sysfs.sh 
> b/tools/testing/selftests/livepatch/test-sysfs.sh
> index 05a14f5a7bfb..71a2e95636b1 100755
> --- a/tools/testing/selftests/livepatch/test-sysfs.sh
> +++ b/tools/testing/selftests/livepatch/test-sysfs.sh
> @@ -5,6 +5,8 @@
> . $(dirname $0)/functions.sh
> 
> MOD_LIVEPATCH=test_klp_livepatch
> +MOD_LIVEPATCH2=test_klp_callbacks_demo
> +MOD_LIVEPATCH3=test_klp_syscall
> 
> setup_config
> 
> @@ -131,4 +133,76 @@ livepatch: '$MOD_LIVEPATCH': completing unpatching 
> transition
> livepatch: '$MOD_LIVEPATCH': unpatching complete
> % rmmod $MOD_LIVEPATCH"
> 
> +start_test "sysfs test stack_order read"
> +
> +load_lp $MOD_LIVEPATCH
> +
> +check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
> +check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
> +
> +load_lp $MOD_LIVEPATCH2
> +
> +check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
> +check_sysfs_value  "$MOD_LIVEPATCH2" "stack_order" "2"
> +
> +load_lp $MOD_LIVEPATCH3
> +
> +check_sysfs_rights "$MOD_LIVEPATCH3" "stack_order" "-r--r--r--"
> +check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "3"
> +
> +disable_lp $MOD_LIVEPATCH2
> +unload_lp $MOD_LIVEPATCH2
> +
> +check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
> +check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
> +check_sysfs_rights "$MOD_LIVEPATCH3" "stack_order" "-r--r--r--"
> +check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "2"
> +
> +disable_lp $MOD_LIVEPATCH3
> +unload_lp $MOD_LIVEPATCH3
> +
> +disable_lp $MOD_LIVEPATCH
> +unload_lp $MOD_LIVEPATCH
> +
> +check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
> +livepatch: enabling patch '$MOD_LIVEPATCH'
> +livepatch: '$MOD_LIVEPATCH': initializing patching transition
> +livepatch: '$MOD_LIVEPATCH': starting patching transition
> +livepatch: '$MOD_LIVEPATCH': completing patching transition
> +livepatch: '$MOD_LIVEPATCH': patching complete
> +% insmod test_modules/$MOD_LIVEPATCH2.ko
> +livepatch: enabling patch '$MOD_LIVEPATCH2'
> +livepatch: '$MOD_LIVEPATCH2': initializing patching transition
> +$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
> +livepatch: '$MOD_LIVEPATCH2': starting patching transition
> +livepatch: '$MOD_LIVEPATCH2': completing patching transition
> +$MOD_LIVEPATCH2: post_patch_callback: vmlinux
> +livepatch: '$MOD_LIVEPATCH2': patching complete
> +% insmod test_modules/$MOD_LIVEPATCH3.ko
> +livepatch: enabling patch '$MOD_LIVEPATCH3'
> +livepatch: '$MOD_LIVEPATCH3': initializing patching transition
> +livepatch: '$MOD_LIVEPATCH3': starting patching transition
> +livepatch: '$MOD_LIVEPATCH3': completing patching transition
> +livepatch: '$MOD_LIVEPATCH3': patching complete
> +% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH2/enabled
> +livepatch: '$MOD_LIVEPATCH2': initializing unpatching transition
> +$MOD_LIVEPATCH2: pre_unpatch_callback: vmlinux
> +livepatch: '$MOD_LIVEPATCH2': starting unpatching transition
> +livepatch: '$MOD_LIVEPATCH2': completing unpatching transition
> +$MOD_LIVEPATCH2: post_unpatch_callback: vmlinux
> +livepatch: '$MOD_LIVEPATCH2': unpatching complete
> +% rmmod $MOD_LIVEPATCH2
> +% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH3/enabled
> +livepatch: '$MOD_LIVEPATCH3': initializing unpatching transition
> +livepatch: '$MOD_LIVEPATCH3': starting unpatching transition
> +livepatch: '$MOD_LIVEPATCH3': completing unpatching transition
> +livepatch: '$MOD_LIVEPATCH3': unpatching complete
> +% rmmod $MOD_LIVEPATCH3
> +% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH/enabled
> +livepatch: '$MOD_LIVEPATCH': initializing unpatching transition
> +livepatch: '$MOD_LIVEPATCH': starting unpatching transition
> +livepatch: '$MOD_LIVEPATCH': completing unpatching transition
> +livepatch: '$MOD_LIVEPATCH': unpatching complete
> +% rmmod $MOD_LIVEPATCH"
> +
> exit 0
> -- 
> 2.43.5
> 

Hi, maintainers.

Any suggestions of this selftest case?

Regards.
Wardenjohn

Re: [PATCH] remoteproc: qcom: pas: Make remoteproc name human friendly

2024-10-22 Thread Dmitry Baryshkov

On Tue, Oct 22, 2024 at 04:21:03AM +, Bjorn Andersson wrote:
> The remoteproc "name" property is supposed to present the "human
> readable" name of the remoteproc, while using the device name is
> readable, it's not "friendly".
> 
> Instead, use the "sysmon_name" as the identifier for the remoteproc
> instance. It matches the typical names used when we speak about each
> instance, while still being unique.
> 
> Signed-off-by: Bjorn Andersson 
> ---
>  drivers/remoteproc/qcom_q6v5_pas.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reviewed-by: Dmitry Baryshkov 

-- 
With best wishes
Dmitry

[syzbot] [virt?] KCSAN: data-race in virtqueue_disable_cb / virtqueue_disable_cb (5)

2024-10-22 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:3d5ad2d4eca3 Merge tag 'bpf-fixes' of git://git.kernel.org..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=154b748798
kernel config:  https://syzkaller.appspot.com/x/.config?x=fd83253b74c9c570
dashboard link: https://syzkaller.appspot.com/bug?extid=9d46c74b27b961b244a9
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/9d112e37e26b/disk-3d5ad2d4.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/d8f036bdcdb7/vmlinux-3d5ad2d4.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/61727d52db48/bzImage-3d5ad2d4.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9d46c74b27b961b24...@syzkaller.appspotmail.com

==
BUG: KCSAN: data-race in virtqueue_disable_cb / virtqueue_disable_cb

read to 0x888102c18178 of 2 bytes by interrupt on cpu 1:
 virtqueue_disable_cb_split drivers/virtio/virtio_ring.c:885 [inline]
 virtqueue_disable_cb+0x63/0x180 drivers/virtio/virtio_ring.c:2446
 skb_xmit_done+0x5f/0x140 drivers/net/virtio_net.c:715
 vring_interrupt+0x161/0x190 drivers/virtio/virtio_ring.c:2595
 __handle_irq_event_percpu+0x95/0x490 kernel/irq/handle.c:158
 handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
 handle_irq_event+0x64/0xf0 kernel/irq/handle.c:210
 handle_edge_irq+0x16d/0x5b0 kernel/irq/chip.c:831
 generic_handle_irq_desc include/linux/irqdesc.h:173 [inline]
 handle_irq arch/x86/kernel/irq.c:247 [inline]
 call_irq_handler arch/x86/kernel/irq.c:259 [inline]
 __common_interrupt+0x58/0xe0 arch/x86/kernel/irq.c:285
 common_interrupt+0x7c/0x90 arch/x86/kernel/irq.c:278
 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
 kcsan_setup_watchpoint+0x404/0x410 kernel/kcsan/core.c:705
 format_decode+0x1d1/0x8a0 lib/vsprintf.c:2554
 vsnprintf+0xc6/0xe30 lib/vsprintf.c:2755
 sprintf+0x89/0xb0 lib/vsprintf.c:3007
 print_caller kernel/printk/printk.c:1373 [inline]
 info_print_prefix+0x142/0x1a0 kernel/printk/printk.c:1390
 record_print_text kernel/printk/printk.c:1437 [inline]
 printk_get_next_message+0x446/0x6f0 kernel/printk/printk.c:2978
 console_emit_next_record kernel/printk/printk.c:3046 [inline]
 console_flush_all+0x28a/0x770 kernel/printk/printk.c:3180
 __console_flush_and_unlock kernel/printk/printk.c:3239 [inline]
 console_unlock+0xab/0x330 kernel/printk/printk.c:3279
 vprintk_emit+0x3f4/0x680 kernel/printk/printk.c:2407
 vprintk_default+0x26/0x30 kernel/printk/printk.c:2422
 vprintk+0x75/0x80 kernel/printk/printk_safe.c:68
 _printk+0x7a/0xa0 kernel/printk/printk.c:2432
 batadv_check_known_mac_addr+0x153/0x180 net/batman-adv/hard-interface.c:528
 batadv_hard_if_event+0x4b0/0x1010 net/batman-adv/hard-interface.c:998
 notifier_call_chain kernel/notifier.c:93 [inline]
 raw_notifier_call_chain+0x6f/0x1d0 kernel/notifier.c:461
 call_netdevice_notifiers_info+0xae/0x100 net/core/dev.c:1996
 call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
 call_netdevice_notifiers net/core/dev.c:2048 [inline]
 dev_set_mac_address+0x1ff/0x260 net/core/dev.c:9104
 dev_set_mac_address_user+0x31/0x50 net/core/dev.c:9118
 do_setlink+0x513/0x2490 net/core/rtnetlink.c:2884
 __rtnl_newlink net/core/rtnetlink.c:3725 [inline]
 rtnl_newlink+0x11a3/0x1690 net/core/rtnetlink.c:3772
 rtnetlink_rcv_msg+0x6aa/0x710 net/core/rtnetlink.c:6675
 netlink_rcv_skb+0x12c/0x230 net/netlink/af_netlink.c:2551
 rtnetlink_rcv+0x1c/0x30 net/core/rtnetlink.c:6693
 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
 netlink_unicast+0x599/0x670 net/netlink/af_netlink.c:1357
 netlink_sendmsg+0x5cc/0x6e0 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:729 [inline]
 __sock_sendmsg+0x140/0x180 net/socket.c:744
 __sys_sendto+0x1d6/0x260 net/socket.c:2214
 __do_sys_sendto net/socket.c:2226 [inline]
 __se_sys_sendto net/socket.c: [inline]
 __x64_sys_sendto+0x78/0x90 net/socket.c:
 x64_sys_call+0x2959/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:45
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0x888102c18178 of 2 bytes by task 3255 on cpu 0:
 virtqueue_disable_cb_split drivers/virtio/virtio_ring.c:886 [inline]
 virtqueue_disable_cb+0x85/0x180 drivers/virtio/virtio_ring.c:2446
 start_xmit+0x14b/0x1260 drivers/net/virtio_net.c:3067
 __netdev_start_xmit include/linux/netdevice.h:4916 [inline]
 netdev_start_xmit include/linux/netdevice.h:4925 [inline]
 xmit_one net/core/dev.c:3588 [inline]
 dev_hard_start_xmit+0x119/0x3f0 net/core/dev.c:3604
 sch_direct_xmit+0x1a9/0x580 net/sched/sch_generic.c:343
 __dev_xmit_skb net/core/dev.c:3821 [inline]
 __dev_queue_xmit+0xf1a/0x2040 net/core/dev.

Re: [PATCH v4 02/11] iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct

2024-10-22 Thread Jason Gunthorpe

On Tue, Oct 22, 2024 at 04:59:07PM +0800, Baolu Lu wrote:

> Is it feasible to make vIOMMU object more generic, rather than strictly
> tying it to nested translation? For example, a normal paging domain that
> translates gPAs to hPAs could also have a vIOMMU object associated with
> it.
> 
> While we can only support vIOMMU object allocation uAPI for S2 paging
> domains in the context of this series, we could consider leaving the
> option open to associate a vIOMMU object with other normal paging
> domains that are not a nested parent?

Why? The nested parent flavour of the domain is basically free to
create, what reason would be to not do that?

If the HW doesn't support it, then does the HW really need/support a
VIOMMU?

I suppose it could be made up to the driver, but for now I think we
should leave it as is in the core code requiring it until we have a
reason to relax it.

Jason

[PATCH] selftests/sched: add basic test for psi

2024-10-22 Thread Pintu Kumar

There is a psi module that exists under kernel/sched/psi.
Add a basic test to test the psi.
This test just add the basic support to check cpu/memory/io interface.
Further test will be added on top of this.

Signed-off-by: Pintu Kumar 
---
 MAINTAINERS  |  2 +
 tools/testing/selftests/sched/.gitignore |  1 +
 tools/testing/selftests/sched/Makefile   |  4 +-
 tools/testing/selftests/sched/config |  1 +
 tools/testing/selftests/sched/psi_test.c | 85 
 tools/testing/selftests/sched/run_psi.sh | 36 ++
 6 files changed, 127 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/sched/psi_test.c
 create mode 100755 tools/testing/selftests/sched/run_psi.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 84a73e90cfe8..d84ff9ca36a9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18548,10 +18548,12 @@ F:include/uapi/linux/pps.h
 PRESSURE STALL INFORMATION (PSI)
 M: Johannes Weiner 
 M: Suren Baghdasaryan 
+M: Pintu Kumar 
 R: Peter Ziljstra 
 S: Maintained
 F: include/linux/psi*
 F: kernel/sched/psi.c
+F: tools/testing/selftests/sched/psi_test.c
 
 PRINTK
 M: Petr Mladek 
diff --git a/tools/testing/selftests/sched/.gitignore 
b/tools/testing/selftests/sched/.gitignore
index 6996d4654d92..2b15c11b93e6 100644
--- a/tools/testing/selftests/sched/.gitignore
+++ b/tools/testing/selftests/sched/.gitignore
@@ -1 +1,2 @@
 cs_prctl_test
+psi_test
diff --git a/tools/testing/selftests/sched/Makefile 
b/tools/testing/selftests/sched/Makefile
index 099ee9213557..795f6613eb2c 100644
--- a/tools/testing/selftests/sched/Makefile
+++ b/tools/testing/selftests/sched/Makefile
@@ -8,7 +8,7 @@ CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -Wl,-rpath=./ \
  $(CLANG_FLAGS)
 LDLIBS += -lpthread
 
-TEST_GEN_FILES := cs_prctl_test
-TEST_PROGS := cs_prctl_test
+TEST_GEN_FILES := cs_prctl_test psi_test
+TEST_PROGS := cs_prctl_test run_psi.sh
 
 include ../lib.mk
diff --git a/tools/testing/selftests/sched/config 
b/tools/testing/selftests/sched/config
index e8b09aa7c0c4..287cccd434fd 100644
--- a/tools/testing/selftests/sched/config
+++ b/tools/testing/selftests/sched/config
@@ -1 +1,2 @@
 CONFIG_SCHED_DEBUG=y
+CONFIG_PSI=y
diff --git a/tools/testing/selftests/sched/psi_test.c 
b/tools/testing/selftests/sched/psi_test.c
new file mode 100644
index ..eeba138d2b39
--- /dev/null
+++ b/tools/testing/selftests/sched/psi_test.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+
+
+struct load_avg {
+   float avg10;
+   float avg60;
+   float avg300;
+   unsigned long long total;
+};
+
+struct pressure {
+   struct load_avg some;
+   struct load_avg full;
+};
+
+
+int psi_get_data_from_proc_pressure(const char *path, struct pressure *p)
+{
+   FILE *fp;
+   int rc = -1;
+   int ret = 0;
+
+   if (path == NULL || p == NULL)
+   return -1;
+
+   fp = fopen(path, "r");
+   if (fp == NULL)
+   return -1;
+
+   while (!feof(fp)) {
+   rc = fscanf(fp, "some avg10=%f avg60=%f avg300=%f total=%llu\n",
+   &p->some.avg10, &p->some.avg60, &p->some.avg300, 
&p->some.total);
+   if (rc < 1) {
+   ret = -1;
+   break;
+   }
+
+   /* Note: In some cases (cpu) full may not exists */
+   rc = fscanf(fp, "full avg10=%f avg60=%f avg300=%f total=%llu\n",
+   &p->full.avg10, &p->full.avg60, &p->full.avg300, 
&p->full.total);
+   /* We don't care about full case. This is needed to avoid 
warnings */
+   rc = 0;
+   }
+
+   fclose(fp);
+
+   return ret;
+}
+
+int main(int argc, char *argv[])
+{
+   int ret;
+   struct pressure rs = {0,};
+   char path[32];
+
+   if (argc < 2) {
+   fprintf(stderr, "usage: %s \n", argv[0]);
+   return -1;
+   }
+
+   memset(&rs, 0, sizeof(rs));
+   printf("Pressure data: %s\n", argv[1]);
+   snprintf(path, sizeof(path)-1, "/proc/pressure/%s", argv[1]);
+
+   ret = psi_get_data_from_proc_pressure(path, &rs);
+   if (ret < 0) {
+   printf("PSI <%s>: FAIL\n", argv[1]);
+   return -1;
+   }
+   printf("Some Avg10   = %5.2f\n", rs.some.avg10);
+   printf("Some Avg60   = %5.2f\n", rs.some.avg60);
+   printf("Some Avg300  = %5.2f\n", rs.some.avg300);
+   printf("Some Total  = %llu\n", rs.some.total);
+   printf("Full Avg10  = %5.2f\n", rs.full.avg10);
+   printf("Full Avg60  = %5.2f\n", rs.full.avg60);
+   printf("Full Avg300 = %5.2f\n", rs.full.avg300);
+   printf("Full Total  = %llu\n", rs.full.total);
+
+
+   return 0;
+}
diff --git a/tools/testing/selftests/sched/run_psi.sh 
b/tools/testing/selftests/sched/run_psi.sh
new file mode 100755
index ..d0b1c7ae3736
--- /dev/null
+++ b/tools/t

Re: [PATCH v5 2/3] selftests: livepatch: save and restore kprobe state

2024-10-22 Thread Miroslav Benes

Hi,

On Thu, 17 Oct 2024, Michael Vetter wrote:

> Save the state of /sys/kernel/debug/kprobes/enabled
> during setup_config() and restore it during cleanup().
> 
> This is in preparation for a future commit that will add a test
> that should confirm that we cannot livepatch a kprobed function
> if that kprobe has a post handler.
> 
> Signed-off-by: Michael Vetter 
> ---
>  tools/testing/selftests/livepatch/functions.sh | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/livepatch/functions.sh 
> b/tools/testing/selftests/livepatch/functions.sh
> index e0e7f8db894c..e78e0e16ad4d 100644
> --- a/tools/testing/selftests/livepatch/functions.sh
> +++ b/tools/testing/selftests/livepatch/functions.sh
> @@ -6,7 +6,10 @@
>  
>  MAX_RETRIES=600
>  RETRY_INTERVAL=".1"  # seconds
> -SYSFS_KLP_DIR="/sys/kernel/livepatch"
> +SYSFS_KERNEL_DIR="/sys/kernel"
> +SYSFS_KLP_DIR="$SYSFS_KERNEL_DIR/livepatch"
> +SYSFS_DEBUG_DIR="$SYSFS_KERNEL_DIR/debug"
> +SYSFS_KPROBES_DIR="$SYSFS_DEBUG_DIR/kprobes"

there are still two occurrences of /sys/kernel/ in test-syscall.sh which 
can be replaced with $SYSFS_KERNEL_DIR now. I suppose that Petr can fix 
that when applying as well.

The patch also contains two separate things... the cleanup and the kprobe 
state handling. Ideally, it should be split but unless someone else also 
speaks up, I can certainly live with the current state.

With that

Reviewed-by: Miroslav Benes 

M

Re: [PATCH v5 3/3] selftests: livepatch: test livepatching a kprobed function

2024-10-22 Thread Miroslav Benes

On Thu, 17 Oct 2024, Michael Vetter wrote:

> The test proves that a function that is being kprobed and uses a
> post_handler cannot be livepatched.
> 
> Only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be registered
> to any given function at a time.
> 
> Note that the conflicting kprobe could not be created using the
> tracefs interface, see Documentation/trace/kprobetrace.rst.
> This interface uses only the pre_handler(), see alloc_trace_kprobe().
> But FTRACE_OPS_FL_IPMODIFY is used only when the kprobe is using a
> post_handler, see arm_kprobe_ftrace().
> 
> Signed-off-by: Michael Vetter 

Nice test.

Reviewed-by: Miroslav Benes 

Thank you,
Miroslav

Re: [PATCH] KVM: selftests: Fix build on on non-x86 architectures

2024-10-22 Thread Vitaly Kuznetsov

Mark Brown  writes:

> Commit 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX
> instructions") unconditionally added -march=x86-64-v2 to the CFLAGS used
> to build the KVM selftests which does not work on non-x86 architectures:
>
>   cc1: error: unknown value ‘x86-64-v2’ for ‘-march’
>
> Fix this by making the addition of this x86 specific command line flag
> conditional on building for x86.
>
> Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions")
> Signed-off-by: Mark Brown 
> ---
>  tools/testing/selftests/kvm/Makefile | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index 
> e6b7e01d57080b304b21120f0d47bda260ba6c43..156fbfae940feac649f933dc6e048a2e2926542a
>  100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -244,11 +244,13 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 
> -g -std=gnu99 \
>   -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
>   -I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
>   -I$( - -march=x86-64-v2 \
>   $(KHDR_INCLUDES)
>  ifeq ($(ARCH),s390)
>   CFLAGS += -march=z10
>  endif
> +ifeq ($(ARCH),x86)
> + CFLAGS += -march=x86-64-v2
> +endif
>  ifeq ($(ARCH),arm64)
>  tools_dir := $(top_srcdir)/tools
>  arm64_tools_dir := $(tools_dir)/arch/arm64/tools/
>
> ---
> base-commit: d129377639907fce7e0a27990e590e4661d3ee02
> change-id: 20241021-kvm-build-break-495abedc51e0
>
> Best regards,

I see this patch is already in Linus' tree, but anyway

Reviewed-by: Vitaly Kuznetsov 

Thanks for the quick fixup!

-- 
Vitaly

Re: [PATCH] selftests/sched: add basic test for psi

2024-10-22 Thread Johannes Weiner

On Tue, Oct 22, 2024 at 05:51:58PM +0530, Pintu Kumar wrote:
> There is a psi module that exists under kernel/sched/psi.
> Add a basic test to test the psi.

I'm not sure this is a valuable use of test cycles. The mere existence
and basic format of the files is very unlikely to be buggy, and such a
bug wouldn't hide for very long either.

> @@ -18548,10 +18548,12 @@ F:  include/uapi/linux/pps.h
>  PRESSURE STALL INFORMATION (PSI)
>  M:   Johannes Weiner 
>  M:   Suren Baghdasaryan 
> +M:   Pintu Kumar 

$ git log --oneline --author='Pintu Kumar' kernel/sched/psi.c | wc -l
0

Really? ;)

>  R:   Peter Ziljstra 
>  S:   Maintained
>  F:   include/linux/psi*
>  F:   kernel/sched/psi.c
> +F:   tools/testing/selftests/sched/psi_test.c
>  
>  PRINTK
>  M:   Petr Mladek

Re: [PATCH v2] rcu/nocb: Fix the WARN_ON_ONCE() in rcu_nocb_rdp_deoffload()

2024-10-22 Thread Frederic Weisbecker

Le Tue, Oct 22, 2024 at 11:41:17AM +0800, Zqiang a écrit :
> Currently, running rcutorture test with torture_type=rcu fwd_progress=8
> n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
> test_boost=2, will trigger the following warning:
> 
> WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 
> rcu_nocb_rdp_deoffload+0x292/0x2a0
> RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
> [18839.537322] Call Trace:
> [18839.538006]  
> [18839.538596]  ? __warn+0x7e/0x120
> [18839.539491]  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
> [18839.540757]  ? report_bug+0x18e/0x1a0
> [18839.541805]  ? handle_bug+0x3d/0x70
> [18839.542837]  ? exc_invalid_op+0x18/0x70
> [18839.543959]  ? asm_exc_invalid_op+0x1a/0x20
> [18839.545165]  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
> [18839.546547]  rcu_nocb_cpu_deoffload+0x70/0xa0
> [18839.547814]  rcu_nocb_toggle+0x136/0x1c0
> [18839.548960]  ? __pfx_rcu_nocb_toggle+0x10/0x10
> [18839.550073]  kthread+0xd1/0x100
> [18839.550958]  ? __pfx_kthread+0x10/0x10
> [18839.552008]  ret_from_fork+0x2f/0x50
> [18839.553002]  ? __pfx_kthread+0x10/0x10
> [18839.553968]  ret_from_fork_asm+0x1a/0x30
> [18839.555038]  
> 
> CPU0   CPU2  CPU3
> //rcu_nocb_toggle //nocb_cb_wait   //rcutorture
> 
> // deoffload CPU1 // process CPU1's rdp
> rcu_barrier()
> rcu_segcblist_entrain()
> rcu_segcblist_add_len(1);
> // len == 2
> // enqueue barrier
> // callback to CPU1's
> // rdp->cblist
>  rcu_do_batch()
>  // invoke CPU1's rdp->cblist
>  // callback
>  rcu_barrier_callback()
>  rcu_barrier()
>
> mutex_lock(&rcu_state.barrier_mutex);
>// still see 
> len == 2
>// enqueue 
> barrier callback
>// to CPU1's 
> rdp->cblist
>
> rcu_segcblist_entrain()
>
> rcu_segcblist_add_len(1);
>// len == 3
>  // decrement len
>  rcu_segcblist_add_len(-2);
>  kthread_parkme()
> 
> // CPU1's rdp->cblist len == 1
> // Warn because there is
> // still a pending barrier
> // trigger warning
> WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> cpus_read_unlock();
> 
> // wait CPU1 
> comes online
> // invoke 
> barrier callback on
> // CPU1 
> rdp's->cblist
> 
> wait_for_completion(&rcu_state.barrier_completion);
> // deoffload CPU4
> cpus_read_lock()
>   rcu_barrier()
> mutex_lock(&rcu_state.barrier_mutex);
> // block on barrier_mutex
> // wait rcu_barrier() on
> // CPU3 to unlock barrier_mutex
> // but CPU3 unlock barrier_mutex
> // need to wait CPU1 comes online
> // when CPU1 going online will block on cpus_write_lock
> 
> The above scenario will not only trigger WARN_ON_ONCE(), but also
> trigger deadlock, this commit therefore check rdp->nocb_cb_sleep
> flags before invoke kthread_parkme(), and the kthread_parkme() is
> not invoke until there are no pending callbacks and set 
> rdp->nocb_cb_sleep is true.
> 
> Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine")
> Suggested-by: Frederic Weisbecker 
> Signed-off-by: Zqiang 

Applied with the below wordsmithing, thanks a lot!

---
From: Zqiang 
Date: Tue, 22 Oct 2024 11:41:17 +0800
Subject: [PATCH] rcu/nocb: Fix missed RCU barrier on deoffloading

Currently, running rcutorture test with torture_type=rcu fwd_progress=8
n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
test_boost=2, will trigger the following warning:

WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 
rcu_nocb_rdp_deoffload+0x292/0x2a0
RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
 Call Trace:
  
  ? __warn+0x7e/0x120
  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
  ? report_bug+0x18e/0x1a0
  ? handle_bug+0x3d/0x70
  ? exc_invalid_op+0x18/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
  rcu_nocb_cpu_deoffload+0x70/0xa0
  rcu_nocb_toggle+0x136/0x1c0
  ? __pfx_rcu_nocb_toggle+0x10/0x10
  k

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Paul E. McKenney

On Tue, Oct 22, 2024 at 12:13:12AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 22, 2024 at 09:10:18AM +0200, Peter Zijlstra wrote:
> > Ah, well, the thing that got us here is that we (Andrii and me) wanted
> > to use -1 as an 'invalid' value to indicate SRCU is not currently in
> > use.
> > 
> > So it all being int is really rather convenient :-)
> 
> Then please document that use.  Maybe even with a symolic name for
> -1 that clearly describes these uses.

Would this work?

#define SRCU_INVALID_INDEX -1

Whatever the name, maybe Peter and Andrii define this under #ifndef
right now, and we get it into include/linux/srcu.h over time.

Or is there a better way?  Or name, for that matter.

Thanx, Paul

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Paul E. McKenney

On Tue, Oct 22, 2024 at 09:07:17AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2024 at 08:30:43PM -0700, Paul E. McKenney wrote:
> > Thoughts?
> 
> Thanks Paul!
> 
> Acked-by: Peter Zijlstra (Intel) 

Thank you, Peter!  I will apply this on my next rebase.

Thanx, Paul

> > 
> > 
> > commit 4433b7d3785d8d2a700f5ed5ca234c64bc63180e
> > Author: Paul E. McKenney 
> > Date:   Mon Oct 21 15:09:39 2024 -0700
> > 
> > srcu: Guarantee non-negative return value from srcu_read_lock()
> > 
> > For almost 20 years, the int return value from srcu_read_lock() has
> > been always either zero or one.  This commit therefore documents the
> > fact that it will be non-negative, and does the same for the underlying
> > __srcu_read_lock().
> > 
> > [ paulmck: Apply Andrii Nakryiko feedback. ]
> > 
> > Signed-off-by: Paul E. McKenney 
> > Cc: Peter Zijlstra 
> > Acked-by: Andrii Nakryiko 
> > 
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index bab1dae3f69e6..512a8c54ba5ba 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -238,13 +238,14 @@ void srcu_check_read_flavor(struct srcu_struct *ssp, 
> > int read_flavor);
> >   * a mutex that is held elsewhere while calling synchronize_srcu() or
> >   * synchronize_srcu_expedited().
> >   *
> > - * The return value from srcu_read_lock() must be passed unaltered
> > - * to the matching srcu_read_unlock().  Note that srcu_read_lock() and
> > - * the matching srcu_read_unlock() must occur in the same context, for
> > - * example, it is illegal to invoke srcu_read_unlock() in an irq handler
> > - * if the matching srcu_read_lock() was invoked in process context.  Or,
> > - * for that matter to invoke srcu_read_unlock() from one task and the
> > - * matching srcu_read_lock() from another.
> > + * The return value from srcu_read_lock() is guaranteed to be
> > + * non-negative.  This value must be passed unaltered to the matching
> > + * srcu_read_unlock().  Note that srcu_read_lock() and the matching
> > + * srcu_read_unlock() must occur in the same context, for example, it is
> > + * illegal to invoke srcu_read_unlock() in an irq handler if the matching
> > + * srcu_read_lock() was invoked in process context.  Or, for that matter to
> > + * invoke srcu_read_unlock() from one task and the matching 
> > srcu_read_lock()
> > + * from another.
> >   */
> >  static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
> >  {
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 07147efcb64d3..ae17c214e0de5 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -738,7 +738,8 @@ EXPORT_SYMBOL_GPL(srcu_check_read_flavor);
> >  /*
> >   * Counts the new reader in the appropriate per-CPU element of the
> >   * srcu_struct.
> > - * Returns an index that must be passed to the matching srcu_read_unlock().
> > + * Returns a guaranteed non-negative index that must be passed to the
> > + * matching __srcu_read_unlock().
> >   */
> >  int __srcu_read_lock(struct srcu_struct *ssp)
> >  {

Re: [PATCH v4 14/19] gendwarfksyms: Add support for kABI rules

2024-10-22 Thread Petr Pavlu

On 10/8/24 20:38, Sami Tolvanen wrote:
> Distributions that want to maintain a stable kABI need the ability
> to make ABI compatible changes to kernel without affecting symbol
> versions, either because of LTS updates or backports.
> 
> With genksyms, developers would typically hide these changes from
> version calculation with #ifndef __GENKSYMS__, which would result
> in the symbol version not changing even though the actual type has
> changed.  When we process precompiled object files, this isn't an
> option.
> 
> To support this use case, add a --stable command line flag that
> gates kABI stability features that are not needed in mainline
> kernels, but can be useful for distributions, and add support for
> kABI rules, which can be used to restrict gendwarfksyms output.
> 
> The rules are specified as a set of null-terminated strings stored
> in the .discard.gendwarfksyms.kabi_rules section. Each rule consists
> of four strings as follows:
> 
>   "version\0type\0target\0value"
> 
> The version string ensures the structure can be changed in a
> backwards compatible way. The type string indicates the type of the
> rule, and target and value strings contain rule-specific data.
> 
> Initially support two simple rules:
> 
>   1. Declaration-only structures
> 
>  A structure declaration can change into a full definition when
>  additional includes are pulled in to the TU, which changes the
>  versions of any symbol that references the struct. Add support
>  for defining declaration-only structs whose definition is not
>  expanded during versioning.
> 
>   2. Ignored enum fields
> 
>  It's possible to add new enum fields without changing the ABI,
>  but as the fields are included in symbol versioning, this would
>  change the versions. Add support for ignoring specific fields.
> 
> Add examples for using the rules under the examples/ directory.

Thanks for coming up with this approach. It makes sense to me.

> 
> Signed-off-by: Sami Tolvanen 
> Acked-by: Neal Gompa 
> ---
>  scripts/gendwarfksyms/Makefile  |   1 +
>  scripts/gendwarfksyms/dwarf.c   |  19 +-
>  scripts/gendwarfksyms/examples/kabi.h   |  61 ++
>  scripts/gendwarfksyms/examples/kabi_rules.c |  56 +
>  scripts/gendwarfksyms/gendwarfksyms.c   |  11 +-
>  scripts/gendwarfksyms/gendwarfksyms.h   |  57 ++
>  scripts/gendwarfksyms/kabi.c| 214 
>  7 files changed, 415 insertions(+), 4 deletions(-)
>  create mode 100644 scripts/gendwarfksyms/examples/kabi.h
>  create mode 100644 scripts/gendwarfksyms/examples/kabi_rules.c
>  create mode 100644 scripts/gendwarfksyms/kabi.c
> 
> diff --git a/scripts/gendwarfksyms/Makefile b/scripts/gendwarfksyms/Makefile
> index 6540282dc746..27258c31e839 100644
> --- a/scripts/gendwarfksyms/Makefile
> +++ b/scripts/gendwarfksyms/Makefile
> @@ -5,6 +5,7 @@ gendwarfksyms-objs += gendwarfksyms.o
>  gendwarfksyms-objs += cache.o
>  gendwarfksyms-objs += die.o
>  gendwarfksyms-objs += dwarf.o
> +gendwarfksyms-objs += kabi.o
>  gendwarfksyms-objs += symbols.o
>  gendwarfksyms-objs += types.o
>  
> diff --git a/scripts/gendwarfksyms/dwarf.c b/scripts/gendwarfksyms/dwarf.c
> index a47a3a0f7a69..b15f1a5db452 100644
> --- a/scripts/gendwarfksyms/dwarf.c
> +++ b/scripts/gendwarfksyms/dwarf.c
> @@ -80,11 +80,12 @@ static bool match_export_symbol(struct state *state, 
> Dwarf_Die *die)
>   return !!state->sym;
>  }
>  
> -static bool is_declaration(Dwarf_Die *die)
> +static bool is_declaration(struct die *cache, Dwarf_Die *die)
>  {
>   bool value;
>  
> - return get_flag_attr(die, DW_AT_declaration, &value) && value;
> + return (get_flag_attr(die, DW_AT_declaration, &value) && value) ||
> +kabi_is_struct_declonly(cache->fqn);
>  }
>  
>  /*
> @@ -472,10 +473,11 @@ static void __process_structure_type(struct state 
> *state, struct die *cache,
>   process(cache, " {");
>   process_linebreak(cache, 1);
>  
> - is_decl = is_declaration(die);
> + is_decl = is_declaration(cache, die);
>  
>   if (!is_decl && state->expand.expand) {
>   cache_mark_expanded(&state->expansion_cache, die->addr);
> + state->expand.current_fqn = cache->fqn;
>   check(process_die_container(state, cache, die, process_func,
>   match_func));
>   }
> @@ -508,6 +510,15 @@ static void process_enumerator_type(struct state *state, 
> struct die *cache,
>  {
>   Dwarf_Word value;
>  
> + if (stable) {
> + /* Get the fqn before we process anything */
> + update_fqn(cache, die);
> +
> + if (kabi_is_enumerator_ignored(state->expand.current_fqn,
> +cache->fqn))
> + return;
> + }
> +
>   process_list_comma(state, cache);
>   process(cache, "enumerator");
>   process_fqn(cache, die);
> @@ -580,6 +591,7 @@ static v

Re: [PATCH v4 02/11] iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct

2024-10-22 Thread Baolu Lu


On 2024/10/22 12:40, Nicolin Chen wrote:

On Tue, Oct 22, 2024 at 10:28:30AM +0800, Baolu Lu wrote:

On 2024/10/22 8:19, Nicolin Chen wrote:

+ * @viommu_alloc: Allocate an iommufd_viommu on a physical IOMMU instance 
behind
+ *the @dev, as the set of virtualization resources 
shared/passed
+ *to user space IOMMU instance. And associate it with a nesting
+ *@parent_domain. The @viommu_type must be defined in the 
header
+ *include/uapi/linux/iommufd.h
+ *It is suggested to call iommufd_viommu_alloc() helper for
+ *a bundled allocation of the core and the driver structures,
+ *using the given @ictx pointer.
* @pgsize_bitmap: bitmap of all possible supported page sizes
* @owner: Driver module providing these ops
* @identity_domain: An always available, always attachable identity
@@ -591,6 +601,10 @@ struct iommu_ops {
   void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid,
struct iommu_domain *domain);

+ struct iommufd_viommu *(*viommu_alloc)(
+ struct device *dev, struct iommu_domain *parent_domain,
+ struct iommufd_ctx *ictx, unsigned int viommu_type);

Is the vIOMMU object limited to a parent domain?

Yes, for each vIOMMU (a slice of physical IOMMU per VM), one S2
parent domain is enough. Typically, it has the mappings between
gPAs and hPAs. If its format/compatibility allows, this single
parent domain can be shared with other vIOMMUs.


Is it feasible to make vIOMMU object more generic, rather than strictly
tying it to nested translation? For example, a normal paging domain that
translates gPAs to hPAs could also have a vIOMMU object associated with
it.

While we can only support vIOMMU object allocation uAPI for S2 paging
domains in the context of this series, we could consider leaving the
option open to associate a vIOMMU object with other normal paging
domains that are not a nested parent?

Thanks,
baolu

Re: [PATCH v5 0/3] selftests: livepatch: test livepatching a kprobed function

2024-10-22 Thread Petr Mladek

On Thu 2024-10-17 22:01:29, Michael Vetter wrote:
> Thanks for all the reviews.
> 
> V5:
> Replace /sys/kernel/livepatch also in other/already existing tests.
> Improve commit message of 3rd patch.

With the syntax error fixed by Joe:

Reviewed-by: Petr Mladek 
Tested-by: Petr Mladek 

Note: I could fix the syntax error when pushing the patchset.
  There is no need to send v6 if this is the only problem.

Best Regards,
Petr

Re: [PATCH v4 09/19] gendwarfksyms: Expand structure types

2024-10-22 Thread Petr Pavlu

On 10/8/24 20:38, Sami Tolvanen wrote:
> Recursively expand DWARF structure types, i.e. structs, unions, and
> enums. Also include relevant DWARF attributes in type strings to
> encode structure layout, for example.
> 
> Example output with --dump-dies:
> 
>   subprogram (
> formal_parameter structure_type &str {
>   member pointer_type {
> base_type u8 byte_size(1) encoding(7)
>   } data_ptr data_member_location(0) ,
>   member base_type usize byte_size(8) encoding(7) length 
> data_member_location(8)
> } byte_size(16) alignment(8) msg
>   )
>   -> base_type void
> 
> Signed-off-by: Sami Tolvanen 
> Acked-by: Neal Gompa 

Looks ok to me, feel free to add:

Reviewed-by: Petr Pavlu 

-- Petr

Re: [PATCH v4 13/19] gendwarfksyms: Add symbol versioning

2024-10-22 Thread Petr Pavlu

On 10/8/24 20:38, Sami Tolvanen wrote:
> Calculate symbol versions from the fully expanded type strings in
> type_map, and output the versions in a genksyms-compatible format.
> 
> Signed-off-by: Sami Tolvanen 
> Acked-by: Neal Gompa 
> ---
>  scripts/gendwarfksyms/dwarf.c |  25 +-
>  scripts/gendwarfksyms/gendwarfksyms.c |  11 ++-
>  scripts/gendwarfksyms/gendwarfksyms.h |  13 ++-
>  scripts/gendwarfksyms/symbols.c   |  59 +
>  scripts/gendwarfksyms/types.c | 122 +-
>  5 files changed, 222 insertions(+), 8 deletions(-)
> 
> diff --git a/scripts/gendwarfksyms/dwarf.c b/scripts/gendwarfksyms/dwarf.c
> index e1a9e9061b1d..a47a3a0f7a69 100644
> --- a/scripts/gendwarfksyms/dwarf.c
> +++ b/scripts/gendwarfksyms/dwarf.c
> @@ -723,12 +723,33 @@ static int process_type(struct state *state, struct die 
> *parent, Dwarf_Die *die)
>  /*
>   * Exported symbol processing
>   */
> +static struct die *get_symbol_cache(struct state *state, Dwarf_Die *die)
> +{
> + struct die *cache;
> +
> + cache = die_map_get(die, DIE_SYMBOL);
> +
> + if (cache->state != DIE_INCOMPLETE)
> + return NULL; /* We already processed a symbol for this DIE */
> +
> + cache->tag = dwarf_tag(die);
> + return cache;
> +}
> +
>  static void process_symbol(struct state *state, Dwarf_Die *die,
>  die_callback_t process_func)
>  {
> + struct die *cache;
> +
> + symbol_set_die(state->sym, die);
> +
> + cache = get_symbol_cache(state, die);
> + if (!cache)
> + return;
> +
>   debug("%s", state->sym->name);
> - check(process_func(state, NULL, die));
> - state->sym->state = SYMBOL_MAPPED;
> + check(process_func(state, cache, die));
> + cache->state = DIE_SYMBOL;
>   if (dump_dies)
>   fputs("\n", stderr);
>  }
> diff --git a/scripts/gendwarfksyms/gendwarfksyms.c 
> b/scripts/gendwarfksyms/gendwarfksyms.c
> index 24c87523fc3a..e90d909d259b 100644
> --- a/scripts/gendwarfksyms/gendwarfksyms.c
> +++ b/scripts/gendwarfksyms/gendwarfksyms.c
> @@ -23,6 +23,8 @@ int dump_dies;
>  int dump_die_map;
>  /* Print out type strings (i.e. type_map) */
>  int dump_types;
> +/* Print out expanded type strings used for symbol versions */
> +int dump_versions;
>  /* Write a symtypes file */
>  int symtypes;
>  static const char *symtypes_file;
> @@ -35,6 +37,7 @@ static void usage(void)
> "  --dump-dies  Dump DWARF DIE contents\n"
> "  --dump-die-map   Print debugging information about die_map 
> changes\n"
> "  --dump-types Dump type strings\n"
> +   "  --dump-versions  Dump expanded type strings used for 
> symbol versions\n"
> "  -T, --symtypes file  Write a symtypes file\n"
> "  -h, --help   Print this message\n"
> "\n",
> @@ -69,9 +72,10 @@ static int process_module(Dwfl_Module *mod, void 
> **userdata, const char *name,
>   } while (cu);
>  
>   /*
> -  * Use die_map to expand type strings and write them to `symfile`.
> +  * Use die_map to expand type strings, write them to `symfile`, and
> +  * calculate symbol versions.
>*/
> - generate_symtypes(symfile);
> + generate_symtypes_and_versions(symfile);
>   die_map_free();
>  
>   return DWARF_CB_OK;
> @@ -92,6 +96,7 @@ int main(int argc, char **argv)
>{ "dump-dies", 0, &dump_dies, 1 },
>{ "dump-die-map", 0, &dump_die_map, 1 },
>{ "dump-types", 0, &dump_types, 1 },
> +  { "dump-versions", 0, &dump_versions, 1 },
>{ "symtypes", 1, NULL, 'T' },
>{ "help", 0, NULL, 'h' },
>{ 0, 0, NULL, 0 } };
> @@ -167,5 +172,7 @@ int main(int argc, char **argv)
>   if (symfile)
>   check(fclose(symfile));
>  
> + symbol_print_versions();
> +
>   return 0;
>  }
> diff --git a/scripts/gendwarfksyms/gendwarfksyms.h 
> b/scripts/gendwarfksyms/gendwarfksyms.h
> index e47b5e967520..814f53ef799e 100644
> --- a/scripts/gendwarfksyms/gendwarfksyms.h
> +++ b/scripts/gendwarfksyms/gendwarfksyms.h
> @@ -26,6 +26,7 @@ extern int debug;
>  extern int dump_dies;
>  extern int dump_die_map;
>  extern int dump_types;
> +extern int dump_versions;
>  extern int symtypes;
>  
>  /*
> @@ -98,6 +99,7 @@ static inline unsigned int addr_hash(uintptr_t addr)
>  enum symbol_state {
>   SYMBOL_UNPROCESSED,
>   SYMBOL_MAPPED,
> + SYMBOL_PROCESSED
>  };
>  
>  struct symbol_addr {
> @@ -112,6 +114,7 @@ struct symbol {
>   struct hlist_node name_hash;
>   enum symbol_state state;
>   uintptr_t die_addr;
> + unsigned long crc;
>  };
>  
>  typedef void (*symbol_callback_t)(struct symbol *, void *arg);
> @@ -119,6 +122,10 @@ typedef void (*symbol_callback_t)(struct symbol

Re: [PATCH v8 3/3] irqchip/loongson-eiointc: Add extioi virt extension support

2024-10-22 Thread maobibo


Hi Huacai/Thomas,

Sorry for the ping message :(

Can this patch be applied int next RC version?

Regards
Bibo Mao

On 2024/10/2 下午9:42, Thomas Gleixner wrote:

On Wed, Sep 11 2024 at 17:11, Huacai Chen wrote:

Hi, Thomas,

On Fri, Aug 30, 2024 at 5:32 PM Bibo Mao  wrote:


Interrupts can be routed to maximal four virtual CPUs with one HW
EIOINTC interrupt controller model, since interrupt routing is encoded with
CPU bitmap and EIOINTC node combined method. Here add the EIOINTC virt
extension support so that interrupts can be routed to 256 vCPUs on
hypervisor mode. CPU bitmap is replaced with normal encoding and EIOINTC
node type is removed, so there are 8 bits for cpu selection, at most 256
vCPUs are supported for interrupt routing.

This patch is OK for me now, but seems it depends on the first two,
and the first two will get upstream via loongarch-kvm tree. So is that
possible to also apply this one to loongarch-kvm with your Acked-by?


Go ahead.

Reviewed-by: Thomas Gleixner

Re: [PATCH rcu] srcu: Guarantee non-negative return value from srcu_read_lock()

2024-10-22 Thread Christoph Hellwig

On Tue, Oct 22, 2024 at 09:10:18AM +0200, Peter Zijlstra wrote:
> Ah, well, the thing that got us here is that we (Andrii and me) wanted
> to use -1 as an 'invalid' value to indicate SRCU is not currently in
> use.
> 
> So it all being int is really rather convenient :-)

Then please document that use.  Maybe even with a symolic name for
-1 that clearly describes these uses.

67 matches

Mail list logo