When a cpu requests broadcasting, before starting the tick broadcast hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide the required synchronization when the callback is active on other core. The callback could have already executed tick_handle_oneshot_broadcast() and could have also returned. But still there is a small time window where the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns without doing anything, but the next_event of the tick broadcast clock device is already set to a timeout value.
In the race condition diagram below, CPU #1 is running the timer callback and CPU #2 is entering idle state and so calls bc_set_next(). In the worst case, the next_event will contain an expiry time, but the hrtimer will not be started which happens when the racing callback returns HRTIMER_NORESTART. The hrtimer might never recover if all further requests from the CPUs to subscribe to tick broadcast have timeout greater than the next_event of tick broadcast clock device. This leads to cascading of failures and finally noticed as rcu stall warnings. Here is a depiction of the race condition CPU #1 (Running timer callback) CPU #2 (Enter idle and subscribe to tick broadcast) --------------------- --------------------- __run_hrtimer() tick_broadcast_enter() bc_handler() __tick_broadcast_oneshot_control() tick_handle_oneshot_broadcast() raw_spin_lock(&tick_broadcast_lock); dev->next_event = KTIME_MAX; //wait for tick_broadcast_lock //next_event for tick broadcast clock set to KTIME_MAX since no other cores subscribed to tick broadcasting raw_spin_unlock(&tick_broadcast_lock); if (dev->next_event == KTIME_MAX) return HRTIMER_NORESTART // callback function exits without restarting the hrtimer //tick_broadcast_lock acquired raw_spin_lock(&tick_broadcast_lock); tick_broadcast_set_event() clockevents_program_event() dev->next_event = expires; bc_set_next() hrtimer_try_to_cancel() //returns -1 since the timer callback is active. Exits without restarting the timer cpu_base->running = NULL; In the fix, if hrtimer_try_to_cancel returns callback is active, an additional check is added in bc_set_next() to know the state of the timer callback. If the timer callback has already released the tick_broadcast_lock, then the bc_set_next() can safely call hrtimer_cancel() to cancel the timer and restart it. When there are no more cpus subscribed to broadcast, the timer callback might not set the expiry time for hrtimer. Therefore the callback timer function is modified to set the state of broadcast clock to CLOCK_EVT_STATE_ONESHOT_STOPPED which in turn will set the expiry time of hrtimer to KTIME_MAX. Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanan...@mentor.com> --- kernel/time/tick-broadcast-hrtimer.c | 58 ++++++++++++++++++++++------ kernel/time/tick-broadcast.c | 2 + 2 files changed, 48 insertions(+), 12 deletions(-) diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c index c1f5bb590b5e..09644403a320 100644 --- a/kernel/time/tick-broadcast-hrtimer.c +++ b/kernel/time/tick-broadcast-hrtimer.c @@ -42,18 +42,13 @@ static int bc_shutdown(struct clock_event_device *evt) */ static int bc_set_next(ktime_t expires, struct clock_event_device *bc) { + ktime_t now; + ktime_t bc_expiry; int bc_moved; + /* - * We try to cancel the timer first. If the callback is on - * flight on some other cpu then we let it handle it. If we - * were able to cancel the timer nothing can rearm it as we - * own broadcast_lock. - * - * However we can also be called from the event handler of - * ce_broadcast_hrtimer itself when it expires. We cannot - * restart the timer because we are in the callback, but we - * can set the expiry time and let the callback return - * HRTIMER_RESTART. + * We try to cancel the timer first. If we were able to cancel + * the timer nothing can rearm it as we own broadcast_lock. * * Since we are in the idle loop at this point and because * hrtimer_{start/cancel} functions call into tracing, @@ -72,9 +67,47 @@ static int bc_set_next(ktime_t expires, struct clock_event_device *bc) if (bc_moved) { /* Bind the "device" to the cpu */ bc->bound_on = smp_processor_id(); - } else if (bc->bound_on == smp_processor_id()) { - hrtimer_set_expires(&bctimer, expires); + } else { + if (bc->bound_on == smp_processor_id()) { + /* We are called from the callback timer itself, we + * just set expiry time and let the callback return + * HRTIMER_RESTART + */ + hrtimer_set_expires(&bctimer, expires); + } else { + now = ktime_get(); + bc_expiry = hrtimer_get_expires(&bctimer); + + /* If the expiry time of the tick broadcast hrtimer is + * in future means that hrtimer callback is running on + * another core which has set the expiry time for the + * broadcast timer. We are holding the broadcast_lock + * means the timer callback has already released it. + * We can now safely cancel the timer and restart it. + */ + if (bc_expiry > now) { + RCU_NONIDLE( + { + hrtimer_cancel(&bctimer); + hrtimer_start(&bctimer, expires, + HRTIMER_MODE_ABS_PINNED_HARD); + } + ); + bc->bound_on = smp_processor_id(); + } + /* Else the hrtimer callback is waiting on the other + * core for the broadcast_lock. There is no need for + * us to do anything + */ + } } + + return 0; +} + +static int bc_set_oneshot_stopped(struct clock_event_device *bc) +{ + hrtimer_set_expires(&bctimer, KTIME_MAX); return 0; } @@ -85,6 +118,7 @@ static struct clock_event_device ce_broadcast_hrtimer = { .features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_KTIME | CLOCK_EVT_FEAT_HRTIMER, + .set_state_oneshot_stopped = bc_set_oneshot_stopped, .rating = 0, .bound_on = -1, .min_delta_ns = 1, diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index e51778c312f1..28f04527ff54 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -672,6 +672,8 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev) */ if (next_event != KTIME_MAX) tick_broadcast_set_event(dev, next_cpu, next_event); + else + clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED); raw_spin_unlock(&tick_broadcast_lock); -- 2.17.1