On 12.06.2024 10:47, Roger Pau Monné wrote:
> On Tue, Jun 11, 2024 at 02:45:09PM +0200, Jan Beulich wrote:
>> On 10.06.2024 16:20, Roger Pau Monne wrote:
>>> Given the current logic it's possible for ->arch.old_cpu_mask to get out of
>>> sync: if a CPU set in old_cpu_mask is offlined and then onlined
>>> again without old_cpu_mask having been updated the data in the mask will no
>>> longer be accurate, as when brought back online the CPU will no longer have
>>> old_vector configured to handle the old interrupt source.
>>>
>>> If there's an interrupt movement in progress, and the to be offlined CPU 
>>> (which
>>> is the call context) is in the old_cpu_mask clear it and update the mask, 
>>> so it
>>> doesn't contain stale data.
>>
>> This imo is too __cpu_disable()-centric. In the code you cover the
>> smp_send_stop() case afaict, where it's all _other_ CPUs which are being
>> offlined. As we're not meaning to bring CPUs online again in that case,
>> dealing with the situation likely isn't needed. Yet the description should
>> imo at least make clear that the case was considered.
> 
> What about adding the following paragraph:

Sounds good, just maybe one small adjustment:

> Note that when the system is going down fixup_irqs() will be called by
> smp_send_stop() from CPU 0 with a mask with only CPU 0 on it,
> effectively asking to move all interrupts to the current caller (CPU
> 0) which is the only CPU online.  In that case we don't care to

"... the only CPU to remain online."

> migrate interrupts that are in the process of being moved, as it's
> likely we won't be able to move all interrupts to CPU 0 due to vector
> shortage anyway.
> 
>>
>>> @@ -2589,6 +2589,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>                                 affinity);
>>>          }
>>>  
>>> +        if ( desc->arch.move_in_progress &&
>>> +             !cpumask_test_cpu(cpu, &cpu_online_map) &&
>>
>> This part of the condition is, afaict, what covers (excludes) the
>> smp_send_stop() case. Might be nice to have a brief comment here, thus
>> also clarifying ...
> 
> Would you be fine with:
> 
>         if ( desc->arch.move_in_progress &&
>              /*
>             * Only attempt to migrate if the current CPU is going
>             * offline, otherwise the whole system is going down and
>             * leaving stale interrupts is fine.
>             */
>              !cpumask_test_cpu(cpu, &cpu_online_map) &&
>              cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )

Sure, this is even more verbose (i.e. better) than I was after.

Jan

Reply via email to