On Wed, 20 Feb 2013, Jason Liu wrote:
> void arch_idle(void)
> {
>         ....
>         clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>
>         enter_the_wait_mode();
>
>         clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> }
>
> When the broadcast timer interrupt arrives (this interrupt just wakes
> up the ARM core; the ARM has no chance to handle it since local irqs
> are disabled, in fact they are disabled in cpu_idle() of
> arch/arm/kernel/process.c), the broadcast timer interrupt will wake up
> the CPU and run:
>
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
>     -> tick_broadcast_oneshot_control(...);
>         -> tick_program_event(dev->next_event, 1);
>             -> tick_dev_program_event(dev, expires, force);
>                 ->
>                 for (i = 0;;) {
>                         int ret = clockevents_program_event(dev, expires, now);
>                         if (!ret || !force)
>                                 return ret;
>
>                         dev->retries++;
>                         ....
>                         now = ktime_get();
>                         expires = ktime_add_ns(now, dev->min_delta_ns);
>                 }
>
> clockevents_program_event(dev, expires, now):
>
>         delta = ktime_to_ns(ktime_sub(expires, now));
>
>         if (delta <= 0)
>                 return -ETIME;
>
> When the broadcast timer interrupt arrives, the last local timer has
> expired too, so clockevents_program_event() will return -ETIME, which
> causes dev->retries++ when retrying to program the already-expired
> timer.
>
> In the worst case, after re-programming the expired timer the CPU
> enters idle again quickly, before the re-programmed timer expires,
> which makes the system ping-pong forever.
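[For readers following the call chain quoted above, here is a condensed userspace sketch of the force-retry path being described. It is only a model: plain int64_t nanoseconds stand in for ktime_t, fake_now stands in for ktime_get(), and the structures are stripped to the fields used here; the real functions live in kernel/time/ and differ in detail.]

/*
 * Condensed userspace model of the retry path quoted above. This is a
 * sketch, not kernel code: int64_t nanoseconds stand in for ktime_t,
 * fake_now stands in for ktime_get(), and the struct is stripped to
 * the fields used here.
 */
#include <stdio.h>
#include <stdint.h>
#include <errno.h>

struct clock_event_device {
	int64_t       min_delta_ns;
	int64_t       next_event;
	unsigned long retries;
};

static int64_t fake_now;	/* stands in for ktime_get() */

static int clockevents_program_event(struct clock_event_device *dev,
				     int64_t expires, int64_t now)
{
	int64_t delta = expires - now;

	if (delta <= 0)
		return -ETIME;	/* the event is already in the past */

	dev->next_event = expires;
	return 0;
}

/* Mirrors the force-retry loop in tick_dev_program_event() */
static int tick_dev_program_event(struct clock_event_device *dev,
				  int64_t expires, int force)
{
	for (;;) {
		int ret = clockevents_program_event(dev, expires, fake_now);

		if (!ret || !force)
			return ret;

		/* Expired event plus force: count the retry and push the
		 * expiry min_delta_ns past "now", then try again. */
		dev->retries++;
		expires = fake_now + dev->min_delta_ns;
	}
}

int main(void)
{
	struct clock_event_device dev = { .min_delta_ns = 1000 };

	fake_now = 5000;
	tick_dev_program_event(&dev, 4000, 1);	/* already-expired event, force=1 */
	printf("retries=%lu, next_event=%lld\n",
	       dev.retries, (long long)dev.next_event);
	return 0;
}

[Programming an already-expired event with force set bumps dev->retries once and pushes the expiry min_delta_ns past "now", which is the dev->retries++ path quoted above.]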
That's nonsense. The timer IPI brings the core out of the deep idle state, so after returning from enter_the_wait_mode() and after calling clockevents_notify() it returns from arch_idle() to cpu_idle(). In cpu_idle() interrupts are re-enabled, so the timer IPI handler is invoked. That calls the event_handler of the per-cpu local clockevent device (the one which stops in C3). That ends up in the generic timer code, which expires timers and reprograms the local clock event device with the next pending timer. So you cannot go idle again before the expired timers of this event are handled and their callbacks invoked.

Now the reprogramming itself is a different issue. It's necessary because we have no information on whether the wakeup was caused by the timer IPI or by something else. We might try to be clever and store the pending IPI in tick_handle_oneshot_broadcast() and avoid the reprogramming in that case.

Completely untested patch below.

Thanks,

	tglx

Index: linux-2.6/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-broadcast.c
+++ linux-2.6/kernel/time/tick-broadcast.c
@@ -29,6 +29,7 @@
 static struct tick_device tick_broadcast_device;
 /* FIXME: Use cpumask_var_t. */
 static DECLARE_BITMAP(tick_broadcast_mask, NR_CPUS);
+static DECLARE_BITMAP(tick_broadcast_pending, NR_CPUS);
 static DECLARE_BITMAP(tmpmask, NR_CPUS);
 static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
 static int tick_broadcast_force;
@@ -417,9 +418,10 @@ again:
 	/* Find all expired events */
 	for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
 		td = &per_cpu(tick_cpu_device, cpu);
-		if (td->evtdev->next_event.tv64 <= now.tv64)
+		if (td->evtdev->next_event.tv64 <= now.tv64) {
 			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+			set_bit(cpu, tick_broadcast_pending);
+		} else if (td->evtdev->next_event.tv64 < next_event.tv64)
 			next_event.tv64 = td->evtdev->next_event.tv64;
 	}

@@ -493,7 +495,8 @@ void tick_broadcast_oneshot_control(unsi
 			cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
-			if (dev->next_event.tv64 != KTIME_MAX)
+			if (dev->next_event.tv64 != KTIME_MAX &&
+			    !test_and_clear_bit(cpu, tick_broadcast_pending))
 				tick_program_event(dev->next_event, 1);
 		}
 	}
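[To make the intent of the patch above easier to follow, here is a toy userspace model of the pending-bit idea. It is purely illustrative: the names only loosely follow the kernel ones, a plain bool array replaces the bitmap and test_and_clear_bit(), and the locking, cpumask handling and the real broadcast device are omitted.]

/*
 * Toy userspace model of the pending-bit idea in the patch above.
 * Purely illustrative: a bool array replaces the bitmap ops, and the
 * locking, cpumasks and the real broadcast device are left out.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NR_CPUS 4

struct tick_device {
	int64_t next_event;		/* expiry of the per-cpu local device */
};

static struct tick_device tick_cpu_device[NR_CPUS];
static bool tick_broadcast_pending[NR_CPUS];

/* Broadcast interrupt: mark the CPUs whose local event has expired
 * (these are the CPUs the broadcast code is about to IPI). */
static void tick_handle_oneshot_broadcast(int64_t now)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (tick_cpu_device[cpu].next_event <= now)
			tick_broadcast_pending[cpu] = true;
	}
}

/* CLOCK_EVT_NOTIFY_BROADCAST_EXIT path for one CPU */
static void broadcast_exit(int cpu)
{
	/* open-coded test_and_clear_bit() */
	bool pending = tick_broadcast_pending[cpu];

	tick_broadcast_pending[cpu] = false;

	if (pending) {
		printf("cpu%d: timer IPI pending, skip reprogramming the expired event\n",
		       cpu);
		return;
	}
	printf("cpu%d: woken by something else, reprogram local device for %lld\n",
	       cpu, (long long)tick_cpu_device[cpu].next_event);
}

int main(void)
{
	tick_cpu_device[0].next_event = 100;	/* already expired at now = 150 */
	tick_cpu_device[1].next_event = 500;	/* still in the future */

	tick_handle_oneshot_broadcast(150);	/* broadcast fires at now = 150 */
	broadcast_exit(0);			/* woken by its own expired event */
	broadcast_exit(1);			/* woken for some other reason */
	return 0;
}

[With the pending bit set, the CPU that was woken by its own expired event skips the pointless reprogram and lets the timer IPI handler take care of it, while a CPU woken for any other reason still reprograms its local device as before.]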