From: Robert Sanford <rsanf...@akamai.com>

This patchset fixes a bug in timer stress test 2, adds a new stress test
to expose a race condition bug in API rte_timer_manage(), and then fixes
the rte_timer_manage() bug.

Description of rte_timer_manage() race condition bug:

Through code inspection, we notice a potential problem in
rte_timer_manage() that leads to corruption of per-lcore pending-lists
(implemented as skip-lists). The race condition occurs when
rte_timer_manage() expires multiple timers on lcore A, while lcore B
simultaneously invokes rte_timer_reset() for one of the expiring timers
(other than the first one).

Lcore A splits its pending-list, creating a local list of expired timers
linked through their sl_next[0] pointers, and sets the first expired
timer to the RUNNING state, all during one list-lock round trip. Lcore A
then unlocks the list-lock to run the first callback, and that is when
A and B can have different interpretations of the subsequent expired
timers' true state. Lcore B sees an expired timer still in the PENDING
state, atomically changes the timer to the CONFIG state, locks lcore A's
list-lock, and reinserts the timer into A's pending-list. The two lcores
try to use the same next-pointers to maintain both lists!

v2 changes:
Move patch descriptions to their respective patches.
Correct checkpatch warnings.

Robert Sanford (3):
  fix stress test 2 sync bug
  add timer manage race condition test
  fix race condition in rte_timer_manage

 app/test/Makefile              |   1 +
 app/test/test_timer.c          | 154 +++++++++++++++++++++++------
 app/test/test_timer_racecond.c | 209 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer.c   |  56 +++++++----
 4 files changed, 366 insertions(+), 54 deletions(-)
 create mode 100644 app/test/test_timer_racecond.c
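
For readers who want to see the interleaving from the race description concretely, here is a minimal single-threaded replay of it. This is only a sketch under stated assumptions, not the code in lib/librte_timer/rte_timer.c: every sketch_* and SK_* name is hypothetical, the skip-list is reduced to its level-0 links (one next pointer standing in for sl_next[0]), locks are omitted because the steps are replayed in a fixed order, and the removal step of a reset is skipped on the assumption that it finds nothing to unlink once the timer has left the pending-list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the timer states. */
enum sketch_state { SK_STOP, SK_PENDING, SK_RUNNING, SK_CONFIG };

struct sketch_timer {
	enum sketch_state state;
	unsigned long expire;
	struct sketch_timer *next;	/* plays the role of sl_next[0] */
};

struct sketch_result {
	int run_len;		/* timers reachable from A's run list */
	int pending_len;	/* timers reachable from A's pending list */
	int on_both;		/* 1 if the reset timer is on both lists */
};

/* Sorted insert into a singly linked pending list (level 0 only). */
static void sketch_insert(struct sketch_timer **head, struct sketch_timer *tim)
{
	struct sketch_timer **pp = head;

	while (*pp != NULL && (*pp)->expire <= tim->expire)
		pp = &(*pp)->next;
	tim->next = *pp;
	*pp = tim;
}

/* Count the nodes reachable from head; report whether tim is among them. */
static int sketch_walk(const struct sketch_timer *head,
		       const struct sketch_timer *tim, int *found)
{
	int n = 0;

	*found = 0;
	for (; head != NULL; head = head->next) {
		if (head == tim)
			*found = 1;
		n++;
	}
	return n;
}

/* Replay the interleaving: A splits its list, then B resets timer t[1]. */
struct sketch_result sketch_replay_race(void)
{
	struct sketch_timer t[4];
	struct sketch_timer *pending = NULL, *run_list;
	struct sketch_result r;
	int i, in_run, in_pending;

	/* Pending list on lcore A: t0(10) -> t1(20) -> t2(30) -> t3(40). */
	for (i = 0; i < 4; i++) {
		t[i].state = SK_PENDING;
		t[i].expire = 10 * (i + 1);
		t[i].next = NULL;
		sketch_insert(&pending, &t[i]);
	}

	/* Lcore A at time 35: split off expired timers t0..t2 (lock held). */
	run_list = pending;
	pending = t[2].next;
	t[2].next = NULL;
	run_list->state = SK_RUNNING;
	/* A now drops the lock to run t0's callback. */

	/* Lcore B: t1 still looks PENDING, so the reset goes ahead. */
	if (t[1].state == SK_PENDING) {
		t[1].state = SK_CONFIG;
		t[1].expire = 50;
		sketch_insert(&pending, &t[1]);	/* overwrites t1.next */
		t[1].state = SK_PENDING;
	}

	/* A's run list is now t0 -> t1 and t2 is no longer reachable. */
	r.run_len = sketch_walk(run_list, &t[1], &in_run);
	r.pending_len = sketch_walk(pending, &t[1], &in_pending);
	r.on_both = in_run && in_pending;
	return r;
}
```

Calling sketch_replay_race() from a small harness shows the shared next-pointer problem: the reset timer ends up linked into both lists, and the run list loses the expired timer that followed it.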