On Fri, 18 Nov 2016 06:08:45 -0800 "Paul E. McKenney" <paul...@linux.vnet.ibm.com> wrote: > However, let's first take a look at the overflow issue. > > If a given program could have ULONG_MAX or more readers at any given > time, there would of course be overflow. However, each read must have > an srcu_read_lock() outstanding, and the resulting four-byte return > value must be stored somewhere. Because the full address space is at > most ULONG_MAX, the maximum number of outstanding readers is at most > ULONG_MAX/4, even in the degenerate case where a single CPU/task invokes > srcu_read_lock() in a tight loop. And even this assumes that the entire > address space can somehow be devoted to srcu_read_lock() return values. > ULONG_MAX/4 is therefore a hard upper bound on the number of outstanding > SRCU readers. I agree that there should be at most ULONG_MAX/4 active readers at a time, as long as the compiler doesn't do something crazy like noticing that srcu_read_lock() always returns 0 or 1 and then packing the return values into smaller variables.
> Now srcu_readers_active_idx_check() checks for strict equality between > the number of locks and unlocks, so we can in theory tolerate ULONG_MAX-1 > readers. So, the question is whether ULONG_MAX/4 readers can result > in the updater seeing ULONG_MAX reads, due to memory misordering and > other issues. > > Because there are no memory barriers surrounding srcu_flip(), the updater > could miss an extremely large number of srcu_read_unlock()s. However, > each missed srcu_read_unlock() must have a corresponding srcu_read_lock(), > and there is a full memory barrier between between the srcu_flip() and > the read of the lock count. There is also a full barrier between any > srcu_read_lock()'s increment of the lock count and that CPU's/task's next > srcu_read_lock()'s fetch of the index. Therefore, if the updater misses > counting a given srcu_read_lock(), that CPU's/task's next srcu_read_lock() > must see the new value of the index. Because srcu_read_lock() disables > preemption across the index fetch and the lock increment, there can be at > most NR_CPUS-1 srcu_read_lock() calls that missed the recent srcu_flip()'s > update to the index. (I said NR_CPUS earlier, but Mathieu is correct > in pointing out that srcu_flip() has to run somewhere.) The trouble is that disabling preemption is not enough to ensure that there is at most one srcu_read_lock() call per CPU that missed the srcu_flip(). Define a reader to be an SRCU lock+unlock pair. A reader is called active if it has incremented ->lock_count[] but hasn't incremented ->unlock_count[] yet, and completed if it has incremented ->unlock_count[]. I think that we only want to limit the number of active readers and the number of CPUs. In particular, I don't think there should be a limit on the rate of completion of read side critical section. The goal of srcu_readers_active_idx_check() is to verify that there were zero active readers on the inactive index at some time during its execution. To do this, it totals the unlock counts, executes a memory barrier, totals the lock counts, and takes the difference. This difference counts the readers that are active when srcu_readers_lock_idx() gets to their CPU, plus the readers that completed after srcu_readers_unlock_idx() and before srcu_readers_lock_idx(). If the true (infinite precision) value of the difference is zero, then there were no active readers at some point while srcu_readers_lock_idx() is running. However, the difference is only stored in a long, so there is a potential for overflow if too many readers complete during srcu_readers_active_idx_check(). For example, let's say there are three threads, each running on their own CPU: int data, flag; struct srcu_struct *sp = /* ... */; Thread 0: data = 1; synchronize_srcu(sp); flag = 1; Thread 1: int data_r, flag_r; int idx = srcu_read_lock(sp); data_r = data; flag_r = flag; srcu_read_unlock(sp, idx); BUG_ON(flag_r == 1 && data_r == 0); Thread 2: while (true) { int idx = srcu_read_lock(sp); srcu_read_unlock(sp, idx); } Let's take the following execution order. Thread 1 increments the CPU 1 version of sp->lock_count[0], sets idx to zero, and loads data (0) into data_r. Thread 0 then sets data to be 1, verifies that there are no readers on index 1, and increments sp->completed, but the CPU actually doesn't preform the last operation, putting it off until the next memory barrier. Thread 0 then calls srcu_readers_active_idx_check() on index 0, which runs srcu_readers_unlock_idx() (returning 0). Right after srcu_readers_unlock_idx() completes, thread 2 starts running. Since Thread 0 hasn't actually incremented sp->completed in any way that is visible to thread 2, srcu_read_lock() will still return 0. Thread 2 can then run for ULONG_MAX iterations, setting the CPU 2 version of sp->unlock_count[0] to ULONG_MAX. CPU 0 then finally gets around to incrementing sp->completed, runs its memory barrier, and then reads the lock counts: 1, 0, ULONG_MAX. The total of ULONG_MAX+1 will overflow to 0 and compare equal with earlier unlock count. Thread 0 will then think that the grace period is over and set flag to one. Thread 1 can then read flag (1) into flag_r and run srcu_read_unlock(). The BUG_ON statement will then fail. Although ULONG_MAX readers completed during srcu_readers_active_idx_check(), there were at most 2 active readers at any time, so this program doesn't run into any limit. I hope that was clear enough. Thanks, Lance