[ re-send to Darren Hart ]

----- On Aug 23, 2021, at 11:18 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> ----- On Aug 20, 2021, at 6:50 PM, Sean Christopherson sea...@google.com 
> wrote:
> 
>> Add a test to verify an rseq's CPU ID is updated correctly if the task is
>> migrated while the kernel is handling KVM_RUN.  This is a regression test
>> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
>> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
>> without updating rseq, leading to a stale CPU ID and other badness.
>> 
> 
> [...]
> 
> +#define RSEQ_SIG 0xdeadbeef
> 
> Is there any reason for defining a custom signature rather than including
> tools/testing/selftests/rseq/rseq.h ? This should take care of including
> the proper architecture header which will define the appropriate signature.
> 
> Arguably you don't define rseq critical sections in this test per se, but
> I'm wondering why the custom signature here.
> 
> [...]
> 
>> +
>> +static void *migration_worker(void *ign)
>> +{
>> +    cpu_set_t allowed_mask;
>> +    int r, i, nr_cpus, cpu;
>> +
>> +    CPU_ZERO(&allowed_mask);
>> +
>> +    nr_cpus = CPU_COUNT(&possible_mask);
>> +
>> +    for (i = 0; i < 20000; i++) {
>> +            cpu = i % nr_cpus;
>> +            if (!CPU_ISSET(cpu, &possible_mask))
>> +                    continue;
>> +
>> +            CPU_SET(cpu, &allowed_mask);
>> +
>> +            /*
>> +             * Bump the sequence count twice to allow the reader to detect
>> +             * that a migration may have occurred in between rseq and sched
>> +             * CPU ID reads.  An odd sequence count indicates a migration
>> +             * is in-progress, while a completely different count indicates
>> +             * a migration occurred since the count was last read.
>> +             */
>> +            atomic_inc(&seq_cnt);
> 
> So technically this atomic_inc contains the required barriers because the
> selftests implementation uses "__sync_add_and_fetch(&addr->val, 1)". But it's
> rather odd that the semantics differ from the kernel implementation in terms
> of memory barriers: the kernel implementation of atomic_inc guarantees no
> memory barriers, but this one happens to provide full barriers pretty much by
> accident (selftests futex/include/atomic.h documents no such guarantee).
> 
> If this full barrier guarantee is indeed provided by the selftests atomic.h
> header, I would really like a comment stating that in the atomic.h header so
> the carpet is not pulled from under our feet by a future optimization.
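
To make the request above concrete, here is a rough sketch of what a documented
guarantee could look like. The atomic_t type below is a hypothetical stand-in
written for this example, not a copy of the selftests header; the only part
taken from the thread is the use of __sync_add_and_fetch(), which GCC documents
as a full barrier in most cases.

```c
/* Hypothetical stand-in for the selftests atomic_t; an assumption, not the real header. */
typedef struct {
	volatile int val;
} atomic_t;

/*
 * atomic_inc() - atomically increment *addr and return the new value.
 *
 * Memory ordering: implies a full memory barrier, because it is built on
 * __sync_add_and_fetch(). Callers may depend on this ordering, so do not
 * switch to a relaxed operation without auditing them first.
 */
static inline int atomic_inc(atomic_t *addr)
{
	return __sync_add_and_fetch(&addr->val, 1);
}
```

With a comment like this in place, a later change to a relaxed increment would
at least have to update the documented contract.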
> 
> 
>> +            r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
>> +            TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
>> +                        errno, strerror(errno));
>> +            atomic_inc(&seq_cnt);
>> +
>> +            CPU_CLR(cpu, &allowed_mask);
>> +
>> +            /*
>> +             * Let the read-side get back into KVM_RUN to improve the odds
>> +             * of task migration coinciding with KVM's run loop.
> 
> This comment should be about increasing the odds of letting the seqlock
> read-side complete. Otherwise, the delay between the two back-to-back
> atomic_inc is so small that the seqlock read-side may never have time to
> complete reading the rseq cpu id and the sched_getcpu() call, and can
> retry forever.
> 
> I'm wondering if 1 microsecond is sufficient on other architectures as well.
> One alternative way to make this depend less on the architecture's
> implementation of sched_getcpu (whether it's a vDSO, or goes through a
> syscall) would be to read the rseq cpu id and call sched_getcpu a few times
> (e.g. 3 times) in the migration thread rather than use usleep, and throw away
> the value read. This would ensure the delay is appropriate on all
> architectures.
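
A minimal sketch of that alternative, under stated assumptions: the helper name
delay_for_reader(), the NR_DELAY_READS constant, and the fake_rseq_cpu_id
variable are all hypothetical and exist only for this example. In the actual
test the pointer would refer to __rseq.cpu_id.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* needed for sched_getcpu() in glibc */
#endif
#include <sched.h>
#include <stdint.h>

/* Hypothetical: number of throw-away reads, per the "e.g. 3 times" suggestion. */
#define NR_DELAY_READS 3

/* Hypothetical stand-in for __rseq.cpu_id so the sketch builds on its own. */
static volatile uint32_t fake_rseq_cpu_id;

/*
 * Delay the migration thread by doing the same work as the reader: read the
 * rseq cpu id and call sched_getcpu(), discarding both values. Returns the
 * number of read pairs performed.
 */
static int delay_for_reader(const volatile uint32_t *rseq_cpu_id)
{
	int i, reads = 0;

	for (i = 0; i < NR_DELAY_READS; i++) {
		(void)*rseq_cpu_id;	/* volatile load, value thrown away */
		(void)sched_getcpu();	/* vDSO or syscall, depending on arch */
		reads++;
	}
	return reads;
}
```

The migration loop would then call delay_for_reader() where it currently calls
usleep(1), so the delay scales with the cost of sched_getcpu() on each
architecture rather than relying on a fixed sleep.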
> 
> Thanks!
> 
> Mathieu
> 
>> +             */
>> +            usleep(1);
>> +    }
>> +    done = true;
>> +    return NULL;
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +    struct kvm_vm *vm;
>> +    u32 cpu, rseq_cpu;
>> +    int r, snapshot;
>> +
>> +    /* Tell stdout not to buffer its content */
>> +    setbuf(stdout, NULL);
>> +
>> +    r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
>> +    TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
>> +                strerror(errno));
>> +
>> +    if (CPU_COUNT(&possible_mask) < 2) {
>> +            print_skip("Only one CPU, task migration not possible\n");
>> +            exit(KSFT_SKIP);
>> +    }
>> +
>> +    sys_rseq(0);
>> +
>> +    /*
>> +     * Create and run a dummy VM that immediately exits to userspace via
>> +     * GUEST_SYNC, while concurrently migrating the process by setting its
>> +     * CPU affinity.
>> +     */
>> +    vm = vm_create_default(VCPU_ID, 0, guest_code);
>> +
>> +    pthread_create(&migration_thread, NULL, migration_worker, 0);
>> +
>> +    while (!done) {
>> +            vcpu_run(vm, VCPU_ID);
>> +            TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
>> +                        "Guest failed?");
>> +
>> +            /*
>> +             * Verify rseq's CPU matches sched's CPU.  Ensure migration
>> +             * doesn't occur between sched_getcpu() and reading the rseq
>> +             * cpu_id by rereading both if the sequence count changes, or
>> +             * if the count is odd (migration in-progress).
>> +             */
>> +            do {
>> +                    /*
>> +                     * Drop bit 0 to force a mismatch if the count is odd,
>> +                     * i.e. if a migration is in-progress.
>> +                     */
>> +                    snapshot = atomic_read(&seq_cnt) & ~1;
>> +                    smp_rmb();
>> +                    cpu = sched_getcpu();
>> +                    rseq_cpu = READ_ONCE(__rseq.cpu_id);
>> +                    smp_rmb();
>> +            } while (snapshot != atomic_read(&seq_cnt));
>> +
>> +            TEST_ASSERT(rseq_cpu == cpu,
>> +                        "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
>> +    }
>> +
>> +    pthread_join(migration_thread, NULL);
>> +
>> +    kvm_vm_free(vm);
>> +
>> +    sys_rseq(RSEQ_FLAG_UNREGISTER);
>> +
>> +    return 0;
>> +}
>> --
>> 2.33.0.rc2.250.ged5fa647cd-goog
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
