On Fri, Jun 12, 2015 at 09:36:50AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 11, 2015 at 07:36:07PM +0200, Frederic Weisbecker wrote:
> > +static void tick_nohz_full_update_dependencies(void)
> > +{
> > +	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> > +
> > +	if (!posix_cpu_timers_can_stop_tick(current))
> > +		ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
> > +
> > +	if (!perf_event_can_stop_tick())
> > +		ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
> > +
> > +	if (!sched_can_stop_tick())
> > +		ts->tick_needed |= TICK_NEEDED_SCHED;
> >
> >  #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> >  	/*
> > +	 * sched_clock_tick() needs us?
> > +	 *
> >  	 * TODO: kick full dynticks CPUs when
> >  	 * sched_clock_stable is set.
> >  	 */
> >  	if (!sched_clock_stable()) {
> > +		ts->tick_needed |= TICK_NEEDED_CLOCK_UNSTABLE;
> >  		/*
> >  		 * Don't allow the user to think they can get
> >  		 * full NO_HZ with this machine.
> >  		 */
> >  		WARN_ONCE(tick_nohz_full_running,
> >  			  "NO_HZ FULL will not work with unstable sched clock");
> >  	}
> >  #endif
> >  }
>
> Colour me confused; why does this function exist at all? Should not
> these bits be managed by those respective subsystems?
So we have two choices here:

1) Something changes in a subsystem which needs the tick, and that
   subsystem sends an IPI to the concerned CPU so that it updates its
   tick dependency state.

   pros: The dependency bits are always modified and read locally.

   cons: We also need to check the subsystems on task switch, because
         the next task may have different dependencies than prev. So
         that's context switch overhead.

2) Whenever a subsystem changes its dependency on the tick (needs it or
   doesn't need it anymore), that subsystem remotely changes the
   dependency bits, then sends an IPI in case we switched from "tick
   needed" to "tick not needed".

   pros: Less context switch overhead.

   cons: This works well for subsystems whose dependency is per CPU
         (the scheduler). Others, whose dependency is exclusively per
         task or system wide, need more complicated treatment: posix
         cpu timers would then need to switch to a separate global
         flag, and perf depends on both a global state and a per cpu
         state.

         The flags are read remotely. This involves some ordering, but
         no full barrier since we have the IPI.

This patchset takes the simple 1) way, which can definitely be improved.
Perhaps we should do 2) with one global mask and one per cpu mask, with
all flags atomically and remotely set and cleared by the relevant
subsystems.
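For illustration only, option 2 (one global mask plus one per cpu mask, with bits set and cleared atomically by the subsystems) could be modeled in userspace C roughly as below. This is a hedged sketch, not kernel code: every name (model_dep_set_cpu(), DEP_SCHED, model_kick_cpu(), MODEL_NR_CPUS, ...) is hypothetical, the IPI is just a counter, plain sequentially consistent C11 atomics stand in for whatever ordering the real code would use, and kicking the CPU on both the empty-to-non-empty and non-empty-to-empty transitions is an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of option 2; not an existing kernel API. */
#define MODEL_NR_CPUS 4

#define DEP_POSIX_CPU_TIMER (1u << 0) /* system wide in this model */
#define DEP_PERF_EVENT      (1u << 1)
#define DEP_SCHED           (1u << 2) /* per CPU */
#define DEP_CLOCK_UNSTABLE  (1u << 3)

/* One system-wide mask plus one mask per CPU. */
static atomic_uint dep_global;
static atomic_uint dep_cpu[MODEL_NR_CPUS];

/* Counts "IPIs": stands in for kicking a CPU so it re-evaluates its tick. */
static int ipi_count[MODEL_NR_CPUS];

static void model_kick_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ipi_count[cpu]++;
}

/* Read side: evaluated on @cpu to decide whether its tick may stop. */
static bool model_can_stop_tick(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (atomic_load(&dep_global) | atomic_load(&dep_cpu[cpu])) == 0;
}

/* Per-CPU dependency (e.g. the scheduler), updated remotely. */
static void model_dep_set_cpu(int cpu, unsigned int bit)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	unsigned int old = atomic_fetch_or(&dep_cpu[cpu], bit);

	/* Kick only when the mask goes from empty to non-empty. */
	if (old == 0)
		model_kick_cpu(cpu);
}

static void model_dep_clear_cpu(int cpu, unsigned int bit)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	unsigned int old = atomic_fetch_and(&dep_cpu[cpu], ~bit);

	/* Kick only when the last set bit has just been cleared. */
	if (old != 0 && (old & ~bit) == 0)
		model_kick_cpu(cpu);
}

/* System-wide dependency: a transition must be broadcast to every CPU. */
static void model_dep_set_global(unsigned int bit)
{
	if (atomic_fetch_or(&dep_global, bit) == 0)
		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			model_kick_cpu(cpu);
}

static void model_dep_clear_global(unsigned int bit)
{
	unsigned int old = atomic_fetch_and(&dep_global, ~bit);

	if (old != 0 && (old & ~bit) == 0)
		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			model_kick_cpu(cpu);
}
```

In this model a per cpu dependency only ever kicks its own CPU, while a global one (the posix cpu timer case) has to kick all of them, which is the extra treatment the cons of 2) refer to. Nothing needs to be rechecked at context switch, since the masks already hold the current state.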