I've been tracing this problem on sparc64 now that it uses hi-res timers, and I finally figured it out.

The symptom is that ksoftirqd on most of the cpus of an idle SMP system chews up around 10% of cpu time.

raise_softirq_irqoff() is constantly being invoked via tick_nohz_stop_sched_tick(), but why?

Tracing revealed that delta_jiffies is enormous, which is correct because no timers are pending on the cpu, so we should schedule the hrtimer as far into the future as possible.

tick_nohz_stop_sched_tick() then proceeds to start the hrtimer, and then it checks hrtimer_active(), but for some reason this is always false. Why?

The reason is that the expires calculation here:

	expires = ktime_add_ns(last_update, tick_period.tv64 * delta_jiffies);

overflows on 64-bit systems. On 32-bit systems, LONG_MAX is a 32-bit quantity, so even the largest possible delta_jiffies doesn't cause an overflow in this multiply (which is 64-bit).

(LONG_MAX determines how large a value will be returned to indicate "infinity" from get_next_timer_interrupt(); specifically, it's the initialization of the local variable 'expires' in __next_timer_interrupt().)

Because of this, 'expires' is zero, and of course that causes the hrtimer to not get scheduled at all.

I'm surprised this problem is not seen with the x86_64 hrtimer patches applied :-)

I'm not exactly sure how best to fix this. We could either make __next_timer_interrupt() use INT_MAX, or we could make tick_nohz_stop_sched_tick() detect and handle the overflow properly. Neither of those solutions is fully satisfactory in my opinion :-)

FWIW, I've verified that using INT_MAX in __next_timer_interrupt() makes the problem go away.
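
To make the arithmetic concrete, here is a minimal userspace sketch of the overflow check, not kernel code. The helper names tick_mul_overflows() and tick_mul_saturating() are hypothetical, and int64_t stands in for the 64-bit tv64 value. With an assumed tick of 1,000,000 ns (HZ=1000), a delta the size of a 64-bit LONG_MAX does not fit in the product, while INT_MAX does. The saturating variant only illustrates what "handle the overflow" could look like; it is not a proposed patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: returns 1 when
 * tick_ns * delta_jiffies would exceed the signed 64-bit range.
 * Both inputs are assumed non-negative, tick_ns strictly positive.
 */
static int tick_mul_overflows(int64_t tick_ns, int64_t delta_jiffies)
{
	assert(tick_ns > 0 && delta_jiffies >= 0);
	/* t * d > MAX  <=>  d > MAX / t  (integer division, t > 0) */
	return delta_jiffies > INT64_MAX / tick_ns;
}

/*
 * Hypothetical helper: the same multiply, but clamped to INT64_MAX
 * instead of wrapping when the product does not fit.
 */
static int64_t tick_mul_saturating(int64_t tick_ns, int64_t delta_jiffies)
{
	if (tick_mul_overflows(tick_ns, delta_jiffies))
		return INT64_MAX;
	return tick_ns * delta_jiffies;
}
```

Calling tick_mul_overflows(1000000, INT64_MAX) reports an overflow, whereas tick_mul_overflows(1000000, INT32_MAX) does not, which matches the observation that INT_MAX as the "infinity" value avoids the problem.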