On 03/07/2025 10:01 am, Jan Beulich wrote:
> On 02.07.2025 16:41, Andrew Cooper wrote:
>> In order to elide IPIs, we must be able to identify whether a target CPU is in
>> MWAIT at the point it is woken up.  i.e. the store to wake it up must also
>> identify the state.
>>
>> Create a new in_mwait variable beside __softirq_pending, so we can use a
>> CMPXCHG to set the softirq while also observing the status safely.  Implement
>> an x86 version of arch_pend_softirq() which does this.
>>
>> In mwait_idle_with_hints(), advertise in_mwait, with an explanation of
>> precisely what it means.  X86_BUG_MONITOR can be accounted for simply by not
>> advertising in_mwait.
>>
>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
>> ---
>> CC: Jan Beulich <jbeul...@suse.com>
>> CC: Roger Pau Monné <roger....@citrix.com>
>> CC: Anthony PERARD <anthony.per...@vates.tech>
>> CC: Michal Orzel <michal.or...@amd.com>
>> CC: Julien Grall <jul...@xen.org>
>> CC: Stefano Stabellini <sstabell...@kernel.org>
>>
>> This is modelled after Linux's TIF_NEED_RESCHED (single bit equivalent of all
>> of __softirq_pending), and TIF_POLLING_NRFLAG (arch-neutral "in_mwait").
>>
>> In Linux, they're both in the same flags field, which adds complexity.  In
>> Xen, __softirq_pending is already unsigned long for everything other than
>> x86, so adding an arch-neutral "in_mwait" would need wider changes.
> Why would the flag need to be arch-neutral?

Because it's not about mwait; it's about the ability to notice the
rising edge of TIF_NEED_RESCHED, and is implemented by more than just
x86 in Linux.

>> +     */
>> +    alternative_io("movb $1, %[in_mwait]",
>> +                   "", X86_BUG_MONITOR,
>> +                   [in_mwait] "=m" (stat->in_mwait));
>>  
>>      monitor(this_softirq_pending, 0, 0);
>>      smp_mb();
> Unlike alternative(), alternative_io() has no memory clobber. To the compiler
> the two resulting asm() therefore have no dependency operand-wise[1]. The
> sole ordering criteria then is that they're both volatile asm()-s. It not
> being explicitly said (anywhere that I could find) that volatile guarantees
> such ordering, I wonder if we wouldn't better have an operand-based
> dependency between the two. Otoh it may well be that we rely on such ordering
> elsewhere already, in which case having one more instance probably would be
> okay(ish).
>
> [1] It's actually worse than that, I think: monitor() also doesn't specify
> the location as a (memory) input.

The GCC bugzilla has plenty of statements that volatiles (which have
survived optimisation passes) can't be reordered.

>
>> @@ -452,6 +466,10 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>>          mwait(eax, ecx);
>>          spec_ctrl_exit_idle(info);
>>      }
>> +
>> +    alternative_io("movb $0, %[in_mwait]",
>> +                   "", X86_BUG_MONITOR,
>> +                   [in_mwait] "=m" (stat->in_mwait));
> This doesn't strictly need to be an alternative, does it? We can save a
> memory write in the buggy case, but that likely makes at most a marginal
> difference.

It doesn't *need* to be an alternative.  It could be:

if ( !cpu_bug_monitor )
    ACCESS_ONCE(stat->in_mwait) = false;

but getting rid of the branch is an advantage too.

>> --- a/xen/arch/x86/include/asm/hardirq.h
>> +++ b/xen/arch/x86/include/asm/hardirq.h
>> @@ -5,7 +5,19 @@
>>  #include <xen/types.h>
>>  
>>  typedef struct {
>> -    unsigned int __softirq_pending;
>> +    /*
>> +     * The layout is important.  Any CPU can set bits in __softirq_pending,
>> +     * but in_mwait is a status bit owned by the CPU.  softirq_mwait_raw must
>> +     * cover both, and must be in a single cacheline.
>> +     */
>> +    union {
>> +        struct {
>> +            unsigned int __softirq_pending;
>> +            bool in_mwait;
>> +        };
>> +        uint64_t softirq_mwait_raw;
>> +    };
> To guard against someone changing e.g. __softirq_pending to unsigned long
> while ignoring the comment, how about adding a BUILD_BUG_ON() checking both
> parts of the union are the same size (which of course would require naming
> either the union field within the containing struct or its struct member)?

That turns out to be very hard because of the definition of
softirq_pending() being common.  More below.
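For illustration, the check Jan asks for could look like the following.
This is a hypothetical mock (`irq_cpustat_mock_t`, with the union named
`u`), not the real asm/hardirq.h layout, precisely because naming the
union is what the common softirq_pending() definition gets in the way of:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock of the union under discussion; the real layout
 * lives in xen/arch/x86/include/asm/hardirq.h. */
typedef struct {
    union {
        struct {
            unsigned int __softirq_pending;
            bool in_mwait;
        };
        uint64_t softirq_mwait_raw;
    } u;
} irq_cpustat_mock_t;

/* The guard: if someone widens __softirq_pending to unsigned long, the
 * struct view outgrows softirq_mwait_raw, the union grows past 8 bytes,
 * and the build fails here instead of the CMPXCHG silently no longer
 * covering both fields. */
_Static_assert(sizeof(((irq_cpustat_mock_t *)0)->u) ==
               sizeof(((irq_cpustat_mock_t *)0)->u.softirq_mwait_raw),
               "softirq_mwait_raw must cover the whole union");
```

The assertion passes with the current layout (4-byte counter plus 1-byte
flag, padded to 8) and trips as soon as the struct half no longer fits
in the raw view.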


>
>> @@ -9,4 +11,36 @@
>>  #define HVM_DPCI_SOFTIRQ       (NR_COMMON_SOFTIRQS + 4)
>>  #define NR_ARCH_SOFTIRQS       5
>>  
>> +/*
>> + * Ensure softirq @nr is pending on @cpu.  Return true if an IPI can be
>> + * skipped, false if the IPI cannot be skipped.
>> + *
>> + * We use a CMPXCHG covering both __softirq_pending and in_mwait, in order
>> + * to set softirq @nr while also observing in_mwait in a race-free way.
>> + */
>> +static always_inline bool arch_pend_softirq(unsigned int nr, unsigned int cpu)
>> +{
>> +    uint64_t *ptr = &irq_stat[cpu].softirq_mwait_raw;
>> +    uint64_t old, new;
>> +    unsigned int softirq = 1U << nr;
>> +
>> +    old = ACCESS_ONCE(*ptr);
>> +
>> +    do {
>> +        if ( old & softirq )
>> +            /* Softirq already pending, nothing to do. */
>> +            return true;
>> +
>> +        new = old | softirq;
>> +
>> +    } while ( (old = cmpxchg(ptr, old, new)) != new );
> Don't you mean
>
>     } while ( (new = cmpxchg(ptr, old, new)) != old );
>
> here

No. I'm pretty sure I don't.

> (with new latched ahead of the loop and old set from new inside it),
> like we have it elsewhere when we avoid the use of yet another variable,
> e.g. in inc_linear_{entries,uses}()?

Eww, no.  Having new and old backwards like that is borderline
obfuscation, and is unnecessary cognitive complexity for the reader.


>
>> +    /*
>> +     * We have caused the softirq to become pending.  If in_mwait was set,
>> +     * the target CPU will notice the modification and act on it.
>> +     */
>> +    return new & (1UL << 32);
> Hmm, this requires the layout to allow for even less re-arrangement than the
> comment ahead of the union says: You strictly require in_mwait to follow
> __softirq_pending immediately, and the (assembly) write to strictly write 1
> into the field. Could I talk you into at least
>
>     return new & (1UL << (offsetof(..., in_mwait) * 8));
>
> ? This way the field being inspected would also be mentioned in the access
> itself (i.e. become grep-able beyond the mentioning in the comment). And I
> sincerely dislike hard-coded literal numbers when they (relatively) easily
> can be expressed another way.

The nice way to do this would be a named union and a proper field, but
that doesn't work because of how softirq_pending() is declared.

I suppose I can use an offsetof() expression.
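Against a hypothetical mock of the layout (again not the real header,
and with a made-up macro name `IN_MWAIT_MASK`), Jan's suggested spelling
would come out as:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the layout; the real one is in asm/hardirq.h. */
typedef struct {
    union {
        struct {
            unsigned int __softirq_pending;
            bool in_mwait;
        };
        uint64_t softirq_mwait_raw;
    };
} irq_cpustat_mock_t;

/* Derive the mask from the field's byte offset, so "in_mwait" is
 * grep-able at the point of use instead of a bare (1UL << 32) literal.
 * The low bit of the byte is what the movb $1 store sets. */
#define IN_MWAIT_MASK \
    (1ULL << (offsetof(irq_cpustat_mock_t, in_mwait) * 8))
```

With the standard x86-64 ABI the offset is 4 bytes, so the mask is the
same 1UL << 32 as the literal, but it now tracks the field if the layout
legitimately moves.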

~Andrew
