On 18.06.2024 16:50, Roger Pau Monné wrote:
> On Tue, Jun 18, 2024 at 04:34:50PM +0200, Jan Beulich wrote:
>> On 18.06.2024 13:30, Roger Pau Monné wrote:
>>> On Mon, Jun 17, 2024 at 03:41:12PM +0200, Jan Beulich wrote:
>>>> On 13.06.2024 18:56, Roger Pau Monne wrote:
>>>>> @@ -2686,11 +2705,27 @@ void fixup_irqs(const cpumask_t *mask, bool 
>>>>> verbose)
>>>>>          if ( desc->handler->disable )
>>>>>              desc->handler->disable(desc);
>>>>>  
>>>>> +        /*
>>>>> +         * If the current CPU is going offline and is (one of) the 
>>>>> target(s) of
>>>>> +         * the interrupt, signal to check whether there are any pending 
>>>>> vectors
>>>>> +         * to be handled in the local APIC after the interrupt has been 
>>>>> moved.
>>>>> +         */
>>>>> +        if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, 
>>>>> desc->arch.cpu_mask) )
>>>>> +            check_irr = true;
>>>>> +
>>>>>          if ( desc->handler->set_affinity )
>>>>>              desc->handler->set_affinity(desc, affinity);
>>>>>          else if ( !(warned++) )
>>>>>              set_affinity = false;
>>>>>  
>>>>> +        if ( check_irr && apic_irr_read(vector) )
>>>>> +            /*
>>>>> +             * Forward pending interrupt to the new destination, this 
>>>>> CPU is
>>>>> +             * going offline and otherwise the interrupt would be lost.
>>>>> +             */
>>>>> +            send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
>>>>> +                          desc->arch.vector);
>>>>
>>>> Hmm, IRR may become set right after the IRR read (unlike in the other 
>>>> cases,
>>>> where new IRQs ought to be surfacing only at the new destination). Doesn't
>>>> this want moving ...
>>>>
>>>>>          if ( desc->handler->enable )
>>>>>              desc->handler->enable(desc);
>>>>
>>>> ... past the actual affinity change?
>>>
>>> Hm, but the ->enable() hook is just unmasking the interrupt, the
>>> actual affinity change is done in ->set_affinity(), and hence after
>>> the call to ->set_affinity() no further interrupts should be delivered
>>> to the CPU regardless of whether the source is masked?
>>>
>>> Or is it possible for the device/interrupt controller to not switch to
>>> use the new destination until the interrupt is unmasked, and hence
>>> could have pending masked interrupts still using the old destination?
>>> IIRC For MSI-X it's required that the device updates the destination
>>> target once the entry is unmasked.
>>
>> That's all not relevant here, I think. IRR gets set when an interrupt is
>> signaled, no matter whether it's masked.
> 
> I'm kind of lost here, what does signaling mean in this context?
> 
> I would expect the interrupt vector to not get set in IRR if the MSI-X
> entry is masked, as at that point the state of the address/data fields
> might not be consistent (that's the whole point of masking it right?)
> 
>> It's its handling which the
>> masking would prevent, i.e. the "moving" of the set bit from IRR to ISR.
> 
> My understanding was that the masking would prevent the message write to
> the APIC from happening, and hence no vector should get set in IRR.

Hmm, yes, looks like I was confused. The masking is at the source side
(IO-APIC RTE, MSI-X entry, or - if supported - in the MSI capability).
So the sole case to worry about is MSI without mask-bit support then.

>> Plus we run with IRQs off here anyway if I'm not mistaken, so no
>> interrupt can be delivered to the local CPU. IOW whatever IRR bits it
>> has set (including ones becoming set between the IRR read and the actual
>> vector change), those would never be serviced. Hence the reading of the
>> bit ought to occur after the vector change: It's only then that we know
>> the IRR bit corresponding to the old vector can't become set anymore.
> 
> Right, and the vector change happens in ->set_affinity(), not
> ->enable().  See for example set_msi_affinity() and the
> write_msi_msg(), that's where the vector gets changed.
> 
>> And even then we're assuming that no interrupt signals might still be
>> "on their way" from the IO-APIC or a posted MSI message write by a
>> device to the LAPIC (I have no idea how to properly fence that, or
>> whether there are guarantees for this to never occur).
> 
> Yeah, those I expect would be completed in the window between the
> write of the new vector/destination and the reading of IRR.

Except we have no idea on the latencies.

Jan

Reply via email to