On 12.06.2024 12:39, Roger Pau Monné wrote:
> On Tue, Jun 11, 2024 at 03:18:32PM +0200, Jan Beulich wrote:
>> On 10.06.2024 16:20, Roger Pau Monne wrote:
>>> Currently there's logic in fixup_irqs() that attempts to prevent
>>> _assign_irq_vector() from failing, as fixup_irqs() is required to evacuate 
>>> all
>>> interrupts from the CPUs not present in the input mask.  The current logic 
>>> in
>>> fixup_irqs() is incomplete, as it doesn't deal with interrupts that have
>>> move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.
>>>
>>> Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so 
>>> that
>>> _assign_irq_vector() cannot fail, introduce logic in _assign_irq_vector()
>>> to deal with interrupts that have either move_{in_progress,cleanup_count} 
>>> set
>>> and no remaining online CPUs in ->arch.cpu_mask.
>>>
>>> If _assign_irq_vector() is requested to move an interrupt in the state
>>> described above, first attempt to see if ->arch.old_cpu_mask contains any 
>>> valid
>>> CPUs that could be used as fallback, and if that's the case do move the
>>> interrupt back to the previous destination.  Note this is easier because the
>>> vector hasn't been released yet, so there's no need to allocate and setup a 
>>> new
>>> vector on the destination.
>>>
>>> Due to the logic in fixup_irqs() that clears offline CPUs from
>>> ->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) 
>>> it
>>> shouldn't be possible to get into _assign_irq_vector() with
>>> ->arch.move_{in_progress,cleanup_count} set but no online CPUs in
>>> ->arch.old_cpu_mask.
>>>
>>> However if ->arch.move_{in_progress,cleanup_count} is set and the interrupt 
>>> has
>>> also changed affinity, it's possible the members of ->arch.old_cpu_mask are 
>>> no
>>> longer part of the affinity set,
>>
>> I'm having trouble relating this (->arch.old_cpu_mask related) to ...
>>
>>> move the interrupt to a different CPU part of
>>> the provided mask
>>
>> ... this (->arch.cpu_mask related).
> 
> No, the "provided mask" here is the "mask" parameter, not
> ->arch.cpu_mask.

Oh, so this describes the case of "hitting" the comment at the very bottom of
the first hunk then? (I probably was misreading this because I was expecting
it to describe a code change, rather than the case where original behavior
needs retaining. IOW - all fine here then.)

>>> and keep the current ->arch.old_{cpu_mask,vector} for the
>>> pending interrupt movement to be completed.
>>
>> Right, that's to clean up state from before the initial move. What isn't
>> clear to me is what's to happen with the state of the intermediate
>> placement. Description and code changes leave me with the impression that
>> it's okay to simply abandon, without any cleanup, yet I can't quite figure
>> why that would be an okay thing to do.
> 
> There isn't much we can do with the intermediate placement, as the CPU
> is going offline.  However we can drain any pending interrupts from
> IRR after the new destination has been set, since setting the
> destination is done from the CPU that's the current target of the
> interrupts.  So we can ensure the draining is done strictly after the
> target has been switched, hence ensuring no further interrupts from
> this source will be delivered to the current CPU.

Hmm, I'm afraid I still don't follow: I'm specifically in trouble with
the ...

>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -544,7 +544,53 @@ static int _assign_irq_vector(struct irq_desc *desc, 
>>> const cpumask_t *mask)
>>>      }
>>>  
>>>      if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
>>> -        return -EAGAIN;
>>> +    {
>>> +        /*
>>> +         * If the current destination is online refuse to shuffle.  Retry 
>>> after
>>> +         * the in-progress movement has finished.
>>> +         */
>>> +        if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) )
>>> +            return -EAGAIN;
>>> +
>>> +        /*
>>> +         * Due to the logic in fixup_irqs() that clears offlined CPUs from
>>> +         * ->arch.old_cpu_mask it shouldn't be possible to get here with
>>> +         * ->arch.move_{in_progress,cleanup_count} set and no online CPUs 
>>> in
>>> +         * ->arch.old_cpu_mask.
>>> +         */
>>> +        ASSERT(valid_irq_vector(desc->arch.old_vector));
>>> +        ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, 
>>> &cpu_online_map));
>>> +
>>> +        if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
>>> +        {
>>> +            /*
>>> +             * Fallback to the old destination if moving is in progress 
>>> and the
>>> +             * current destination is to be offlined.  This is only 
>>> possible if
>>> +             * the CPUs in old_cpu_mask intersect with the affinity mask 
>>> passed
>>> +             * in the 'mask' parameter.
>>> +             */
>>> +            desc->arch.vector = desc->arch.old_vector;
>>> +            cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, 
>>> mask);

... replacing of vector (and associated mask), without any further accounting.

>>> +            /* Undo any possibly done cleanup. */
>>> +            for_each_cpu(cpu, desc->arch.cpu_mask)
>>> +                per_cpu(vector_irq, cpu)[desc->arch.vector] = irq;
>>> +
>>> +            /* Cancel the pending move. */
>>> +            desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
>>> +            cpumask_clear(desc->arch.old_cpu_mask);
>>> +            desc->arch.move_in_progress = 0;
>>> +            desc->arch.move_cleanup_count = 0;
>>> +
>>> +            return 0;
>>> +        }
>>
>> In how far is this guaranteed to respect the (new) affinity that was set,
>> presumably having led to the movement in the first place?
> 
> The 'mask' parameter should account for the new affinity, hence the
> cpumask_intersects() check guarantees we are moving to a CPU still in
> the affinity mask.

Ah, right, I must have been confused.

>>> @@ -600,7 +646,17 @@ next:
>>>          current_vector = vector;
>>>          current_offset = offset;
>>>  
>>> -        if ( valid_irq_vector(old_vector) )
>>> +        if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
>>> +        {
>>> +            ASSERT(!cpumask_intersects(desc->arch.cpu_mask, 
>>> &cpu_online_map));
>>> +            /*
>>> +             * Special case when evacuating an interrupt from a CPU to be
>>> +             * offlined and the interrupt was already in the process of 
>>> being
>>> +             * moved.  Leave ->arch.old_{vector,cpu_mask} as-is and just
>>> +             * replace ->arch.{cpu_mask,vector} with the new destination.
>>> +             */
>>
>> And where's the cleaning up of ->arch.old_* going to be taken care of then?
> 
> Such cleaning will be handled normally by the interrupt still having
> ->arch.move_{in_progress,cleanup_count} set.  The CPUs in
> ->arch.old_cpu_mask must not all be offline, otherwise the logic in
> fixup_irqs() would have already released the old vector.

Maybe add "Cleanup will be done normally" to the comment?

Jan

Reply via email to