Hi,

As a simple query: is there a way to skip the currently available clock source (hpet) and let the kernel pick the next one? I think this would solve our problem.
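
As an illustration of the selection being asked for, here is a minimal userspace sketch in C. It assumes the list is a space-separated string in the form shown by available_clocksource (for example "tsc hpet acpi_pm") and that "next" means the entry listed right after the one to skip. The function name next_clocksource is hypothetical. This is not kernel code and does not by itself change which clock source is used during early boot.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy into `out` the name that follows `skip`
 * in a whitespace-separated clock source list.
 * Returns 0 on success, -1 if `skip` is absent, is the last entry,
 * or `out` is too small to hold the name.
 */
int next_clocksource(const char *available, const char *skip,
                     char *out, size_t outlen)
{
    const char *p = available;
    size_t skip_len = strlen(skip);
    int found = 0;

    while (*p) {
        size_t len;

        p += strspn(p, " \t\n");     /* step over separators */
        len = strcspn(p, " \t\n");   /* length of the current name */
        if (len == 0)
            break;                   /* only trailing whitespace was left */

        if (found) {
            /* First name after the skipped one: this is the result. */
            if (len >= outlen)
                return -1;
            memcpy(out, p, len);
            out[len] = '\0';
            return 0;
        }

        if (len == skip_len && strncmp(p, skip, len) == 0)
            found = 1;               /* take the next name we see */

        p += len;
    }
    return -1;
}
```

With the list from this machine, next_clocksource("tsc hpet acpi_pm", "hpet", buf, sizeof(buf)) stores "acpi_pm" in buf. On a running system that name could then be written to current_clocksource in the same sysfs directory (as root), though that does not help before userspace starts.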

Thanks,
Pintu

On Fri, Apr 6, 2018 at 8:37 PM, Pintu Kumar <pintu.p...@gmail.com> wrote:
> Hi,
>
> First, a few details:
> Kernel: 4.9.20
> Machine: x86_64 (AMD)
> Model: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
> Cores: 8
> Available clock source:
> # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> tsc hpet acpi_pm
>
> Problem:
> [ 28.027409] NMI watchdog: Watchdog detected hard LOCKUP on cpu
> 1dModules linked in:c
> [ 28.136317] RIP: 0010:[<ffffffff98058c43>] c [<ffffffff98058c43>]
> read_hpet+0xb3/0x120
> [...]
>
> ------------------
> This lockup happens during boot, when the CPU is stuck for about 28 seconds.
> It is caused by our internal code changes.
> During our init function we run some calibration loops
> 10,000,000 (10MHz) times, twice.
> The LOCKUP is caused by this loop.
>
> However, we observed that the main issue is the clock source that is
> available at that time.
> At the time this loop is executed, the available clock source is HPET (not
> TSC).
> With HPET the loop runs slower. It takes almost 28 seconds to complete
> with the HPET clock source. Hence the boot time also increases by 28
> seconds.
> Whereas with TSC the loop completes in less than 4 seconds. So, with
> TSC we don't get the LOCKUP.
>
> Thus, the lockup happens only because the loop executes with the HPET
> clock source.
>
> To fix the problem, I tried the following approaches:
> 1) Use late_initcall for our driver init to delay the call until the TSC
> clock source is ready.
>    => With this there is no LOCKUP trace and no impact on boot time.
>       This is because the loop executes with TSC.
>
> 2) We have 2 loops. So I split the local_irq_save/restore part for
> each loop separately.
>    => With this also there is no backtrace seen.
>    => But boot time is increased.
>
> 3) I used delayed_workqueue to delay the execution of the loop by 5
> seconds, until TSC is ready.
>    => With this there is no backtrace and boot time is also normal.
>    => But if we disable TSC then we still get the backtrace.
>
> 4) Disabled HPET from the kernel command line using: hpet=disable
>    => This also works, as the loop executes with the next available
>       clock source: acpi_pm
>    => But changing boot args is not recommended in our case.
>
> 5) Disable HPET related configs in the kernel
>    => CONFIG_HPET=n
>    => CONFIG_HPET_TIMER=n
>    => This method does not work, as we were not able to disable
>       HPET_TIMER on x86_64.
>
> 6) Use hpet_disable() from our code.
>    => This method also does not work. It does not actually disable the
>       HPET clock source.
>
>
> -----------------------------
> Thus we wanted to know your opinion on which is the right solution to fix
> this lockup during boot time.
>
> Is there a way to purposefully fall back to the next available clock source
> (acpi_pm) instead of hpet, from the source code, before executing our
> loop?
>
>
> Please let me know if there are alternate options.
>
>
>
> Thanks,
> Pintu