On 26.07.2019 11:46, Andrew Cooper wrote:
> On 24/07/2019 12:26, Juergen Gross wrote:
>> @@ -182,30 +178,24 @@ static void __prepare_to_wait(struct waitqueue_vcpu 
>> *wqv)
>>   static void __finish_wait(struct waitqueue_vcpu *wqv)
>>   {
>>       wqv->esp = NULL;
>> -    (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity);
>> +    vcpu_temporary_affinity(current, NR_CPUS, VCPU_AFFINITY_WAIT);
>>   }
>>   
>>   void check_wakeup_from_wait(void)
>>   {
>> -    struct waitqueue_vcpu *wqv = current->waitqueue_vcpu;
>> +    struct vcpu *curr = current;
>> +    struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;
>>   
>>       ASSERT(list_empty(&wqv->list));
>>   
>>       if ( likely(wqv->esp == NULL) )
>>           return;
>>   
>> -    /* Check if we woke up on the wrong CPU. */
>> -    if ( unlikely(smp_processor_id() != wqv->wakeup_cpu) )
>> +    /* Check if we are still pinned. */
>> +    if ( unlikely(!(curr->affinity_broken & VCPU_AFFINITY_WAIT)) )
>>       {
>> -        /* Re-set VCPU affinity and re-enter the scheduler. */
>> -        struct vcpu *curr = current;
>> -        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
>> -        if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
>> -        {
>> -            gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
>> -            domain_crash(current->domain);
>> -        }
>> -        wait(); /* takes us back into the scheduler */
>> +        gdprintk(XENLOG_ERR, "vcpu affinity lost\n");
>> +        domain_crash(curr->domain);
>>       }
> 
> I'm sorry to retract my R-by after the fact, but I've only just noticed
> (while rebasing some of my pending work over this) that it is buggy.
> 
> The reason wait() was called is because it is not safe to leave that
> if() clause.
> 
> With this change in place, we'll arrange for the VM to be crashed, then
> longjump back into the stack from from the waiting vCPU, on the wrong
> CPU.  Any caller with smp_processor_id() or thread-local variables cache
> by pointer on the stack will then cause memory corruption.
> 
> Its not immediately obvious how to fix this, but bear in mind that as
> soon as the vm-event interface is done, I plan to delete this whole
> waitqueue infrastructure anyway.

In which case - should we revert the commit until this is resolved?

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to