2015-07-27 18:46, rsanford2 at gmail.com:
> From: Robert Sanford <rsanford at akamai.com>
>
> This patchset fixes a bug in timer stress test 2, adds a new stress test
> to expose a race condition bug in API rte_timer_manage(), and then fixes
> the rte_timer_manage() bug.
>
> Description of rte_timer_manage() race condition bug: Through code
> inspection, we notice a potential problem in rte_timer_manage() that
> leads to corruption of per-lcore pending-lists (implemented as
> skip-lists). The race condition occurs when rte_timer_manage() expires
> multiple timers on lcore A, while lcore B simultaneously invokes
> rte_timer_reset() for one of the expiring timers (other than the first
> one).
>
> Lcore A splits its pending-list, creating a local list of expired timers
> linked through their sl_next[0] pointers, and sets the first expired
> timer to the RUNNING state, all during one list-lock round trip.
> Lcore A then unlocks the list-lock to run the first callback, and that
> is when A and B can have different interpretations of the subsequent
> expired timers' true state. Lcore B sees an expired timer still in the
> PENDING state, atomically changes the timer to the CONFIG state, locks
> lcore A's list-lock, and reinserts the timer into A's pending-list.
> The two lcores try to use the same next-pointers to maintain both lists!
>
> v2 changes:
> Move patch descriptions to their respective patches.
> Correct checkpatch warnings.
Applied, thanks