On 10/18/19, 2:41 PM, "Thomas Gleixner" <t...@linutronix.de> wrote:
> On Wed, 2 Oct 2019, Michael Zhivich wrote: > > Introduction of clocksource_tsc_early broke functionality of "tsc=reliable" > > and "tsc=nowatchdog" boot params, since clocksource_tsc_early is *always* > > registered with CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list. > > > > If CPU is very busy during boot, the watchdog clocksource may not be > > read frequently enough, resulting in a large difference that is treated as > > "negative" by clocksource_delta() and incorrectly truncated to 0. > > What? > > > clocksource: timekeeping watchdog on CPU1: Marking clocksource > > 'tsc-early' as unstable because the skew is too large: > > clocksource: 'refined-jiffies' wd_now: fffb7019 wd_last: fffb6e28 > > 0xfffb7019 - 0xfffb6e28 = 497 > > What's treated negative there? And what would truncate that to 0? > > > mask: ffffffff > > A 'negative delta' value can only happen when the clocksource is not read > before it advanced more than mask/2. For jiffies this means 2^31 > ticks. That would be ~248 days for HZ=100 or ~24 days for HZ=1000. > > > clocksource: 'tsc-early' cs_now: 52c3918b89d6 cs_last: 52c31d736d2e > > 0x52c3918b89d6 - 0x52c31d736d2e = 1.94774e+09 > > Again nothing is treated negative here, but the point is that the TSC > advanced by ~2e9 cycles while jiffies advanced by 497. > > How that's related, I can't tell because I don't know the TSC frequency of > your machine. HZ is probably 1000 as the watchdog period is HZ/2. > > > mask: ffffffffffffffff > > tsc: Marking TSC unstable due to clocksource watchdog > > Even if the watchdog is not read out for a quite some time, the math in > there and in clocksource_delta() can handle pretty large deltas. > > The watchdog triggers when > > abs(delta_watchdog - delta_tsc) > 0.0625 seconds > > So that has absolutely nothing to do with CPU being busy and watchdog not > being serviced. jiffies and TSC drift apart for some reason, and that > reason wants to be documented in the changelog. > > That said, I have no objections against the patch itself, but I'm not going > to apply a patch with a fairy tale changelog. > > Thanks, > > tglx > Thanks for taking a look. I agree that the commit message explanation is bogus; I had incorrectly assumed that the issue was similar to what I've seen with acpi_pm based watchdog before and didn't check the numbers carefully. I'll rework the commit message and resubmit. Thanks, ~ Michael
smime.p7s
Description: S/MIME cryptographic signature