On Tue, Mar 02 2021 at 20:06, Feng Tang wrote:
> On Tue, Mar 02, 2021 at 10:16:37AM +0100, Peter Zijlstra wrote:
>> On Tue, Mar 02, 2021 at 10:54:24AM +0800, Feng Tang wrote:
>> > clocksource watchdog runs every 500ms, which creates some OS noise.
>> > As the clocksource wreckage (especially for those that has per-cpu
>> > reading hook) usually happens shortly after CPU is brought up or
>> > after system resumes from sleep state, so add a time limit for
>> > clocksource watchdog to only run for a period of time, and make
>> > sure it run at least twice for each CPU.
>> >
>> > Regarding performance data, there is no improvement data with the
>> > micro-benchmarks we have like hackbench/netperf/fio/will-it-scale
>> > etc. But it obviously reduces periodic timer interrupts, and may
>> > help in following cases:
>> > * When some CPUs are isolated to only run scientific or high
>> >   performance computing tasks on a NOHZ_FULL kernel, where there
>> >   is almost no interrupts, this could make it more quiet
>> > * On a cluster which runs a lot of systems in parallel with
>> >   barriers there are always enough systems which run the watchdog
>> >   and make everyone else wait
>> >
>> > Signed-off-by: Feng Tang <feng.t...@intel.com>
>>
>> Urgh.. so this hopes and prays that the TSC wrackage happens in the
>> first 10 minutes after boot.

which is wishful thinking....

> Yes, the 10 minutes part is only based on our past experience and we
> can make it longer. But if there was real case that the wrackage happened
> long after CPU is brought up like days, then this patch won't help
> much.

It really depends on the BIOS wreckage. On one of my machines it takes up
to a day depending on the workload. Anything pre TSC_ADJUST wants the
watchdog on. With TSC_ADJUST available we can probably avoid it.

There is a caveat though. If the machine never goes idle then TSC_ADJUST
is not able to detect a potential wreckage. OTOH, most of the broken
BIOSes tweak TSC only by a few cycles and that is usually detectable
during boot. So we might be clever about it and schedule a check every
hour when during the first 10 minutes a modification of TSC_ADJUST is
seen on any CPU.

Where is this TSC_DISABLE_WRITE bit again?

Thanks,

        tglx
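
One possible reading of the scheme floated above, written as a small userspace model rather than kernel code. Every name in it (struct wd_policy, wd_note_adjust(), wd_next_check_delay_ms(), the *_MS constants) is hypothetical and does not exist in the kernel. The 500ms period, the 10 minute window and the hourly recheck are the numbers from the thread. Stopping checks entirely when TSC_ADJUST stayed untouched during the window is an assumption about what "avoid it" would mean in practice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, taken from the numbers mentioned in the thread. */
#define WATCHDOG_PERIOD_MS 500ULL                /* current watchdog period */
#define EARLY_WINDOW_MS    (10ULL * 60 * 1000)   /* "first 10 minutes" */
#define RECHECK_PERIOD_MS  (60ULL * 60 * 1000)   /* "a check every hour" */

/* Hypothetical policy state, not a kernel structure. */
struct wd_policy {
	bool has_tsc_adjust;         /* CPU provides TSC_ADJUST */
	bool adjust_modified_early;  /* modification seen on any CPU in the window */
};

/*
 * Record one observation of TSC_ADJUST on some CPU. Only a mismatch seen
 * inside the early window marks the machine as suspicious.
 */
static void wd_note_adjust(struct wd_policy *p, uint64_t now_ms,
			   int64_t expected, int64_t observed)
{
	if (now_ms < EARLY_WINDOW_MS && observed != expected)
		p->adjust_modified_early = true;
}

/*
 * Return the delay until the next check in milliseconds, or 0 when no
 * further check would be scheduled.
 */
static uint64_t wd_next_check_delay_ms(const struct wd_policy *p,
				       uint64_t now_ms)
{
	/* Anything pre TSC_ADJUST keeps the periodic watchdog. */
	if (!p->has_tsc_adjust)
		return WATCHDOG_PERIOD_MS;

	/* Inside the early window: keep checking at the normal period. */
	if (now_ms < EARLY_WINDOW_MS)
		return WATCHDOG_PERIOD_MS;

	/* After the window: hourly recheck only if the BIOS was caught early. */
	return p->adjust_modified_early ? RECHECK_PERIOD_MS : 0;
}
```

The model does not cover the caveat about a machine which never goes idle. In that case nothing would call wd_note_adjust() and a late modification stays invisible.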