Hi James, If I understand the issue correctly, then the race is:
step 1: cpu 1: starts a new rcu batch (i.e. rcp->cur++, smb_mb) step 2: cpu 2: completes the quiet state step 3: cpu 2: reads pointer 0x123 (ptr to a rcu protected struct) step 4: cpu 3: call_rcu(0x123): rcu protected struct added to rdp->nxtlist step 5: cpu 3: moves a new batch into rdp->curlist, rdp->batch = rcp->cur+1. xxxxxxxxxxxxxxx Problem: where is the smp_rmb() that guarantees that xxxxxxxxxxxxxxx update to rcp->cur from step 1 is seen by cpu 3? step 6: cpu 3: completes quiet state step 7: cpu 3: struct 0x123 destroyed step 8: cpu 2: accesses pointer 0x123, but the struct is already destroyed James: Is that the race? I agree with Paul, there are smb_rmb's on cpu 3 between Step 1 and Step 5: Either the test_and_set_bit in tasklet_action for rcu_process_callback if step 4 happens before the tasklet or somewhere in the irq handler path if step 4 happens in an irq handler that interrupted rcu_process_callback. Thus theoretically no additional smb_rmb() should be necessary. What is missing is proper documentation. I'm analyzing the code right now: Is it really true that typically a cpu only completes data in every other rcu cycle? I.e. that most structures are stored in the rcu callback list until two quiet states happened? I've tried to track the values of rcp->cur and rdp->batch. If next_pending is set, then cpu_quiet() immetiately starts the next rcu cycle and a cpu cannot both complete the currently pending rcu callbacks and add new callbacks to the next cycle, thus a cpu only takes part in every other rcu cycle. The oocalc file is at http://www.colorfullife.com/~manfred/rcu.ods http://www.colorfullife.com/~manfred/rcu.pdf Is that analysis correct? Perhaps the whole code should be rewritten? -- Manfred - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/