While studying why it takes 0.06s to bring up every cpu, which accounts to 15.36s on 256 cpu system, I determined that it is all because of calibrate_delay() call.
After, studying code further I found that there are bugs in the current code: If tsc is enabled, and cpu has TSC_CONSTANT feature, and a cpu is in the same core has already been calibrated, we do not need to calibrate again: This check is done here: calibrate_delay() calibrate_delay_is_known() But, calibrate_delay() is called before topology for new cpu is updated, so we never actually take the optimized path. The second bug, is that inside calibrate_delay_is_known() there is branch like this: if (!tsc_disabled && !cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC)) return 0; But the logic is broken, it should be: if (tsc_disabled || !cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC)) return 0; Fixes: c25323c07345 ("x86/tsc: Use topology functions") Signed-off-by: Pavel Tatashin <pasha.tatas...@oracle.com> --- arch/x86/kernel/smpboot.c | 13 ++++++++----- arch/x86/kernel/tsc.c | 6 ++---- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index ad59edd84de7..e7a3bab6818b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -193,6 +193,14 @@ static void smp_callin(void) */ smp_store_cpu_info(cpuid); + /* + * This must be done before setting cpu_online_mask + * or calling notify_cpu_starting. + * And also before calibrate_delay(), as the information about topology + * is used to determine if calibration is needed. + */ + set_cpu_sibling_map(raw_smp_processor_id()); + /* * Get our bogomips. * Update loops_per_jiffy in cpu_data. Previous call to @@ -203,11 +211,6 @@ static void smp_callin(void) cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy; pr_debug("Stack at about %p\n", &cpuid); - /* - * This must be done before setting cpu_online_mask - * or calling notify_cpu_starting. - */ - set_cpu_sibling_map(raw_smp_processor_id()); wmb(); notify_cpu_starting(cpuid); diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 796d96bb0821..a99cde96201f 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1346,12 +1346,10 @@ void __init tsc_init(void) unsigned long calibrate_delay_is_known(void) { int sibling, cpu = smp_processor_id(); + int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC); struct cpumask *mask = topology_core_cpumask(cpu); - if (!tsc_disabled && !cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC)) - return 0; - - if (!mask) + if (tsc_disabled || !constant_tsc || !mask) return 0; sibling = cpumask_any_but(mask, cpu); -- 2.14.3