On 11/09/2016 06:35 PM, Thomas Gleixner wrote:
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID. 
> 
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map.
> 
> There exist Virtualbox and Xen implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.
> 
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. If the physical
> package IDs differ use the package information from the ACPI/MP tables so
> the existing logical package map just works.
> 
> Reported-by: "Charles (Chas) Williams" <ciwil...@brocade.com>,
> Reported-by: M. Vefa Bicakci <m....@runbox.com>
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>

Hello Thomas and Sebastian,

Sorry for the delay in reporting what I have found out and for the delay in 
testing this patch, which has been due to my health.

I have found that your patch unfortunately does not improve the situation for 
me. Here is an excerpt obtained from the dmesg of a kernel compiled with this 
patch *as well as* Sebastian's patch:

=== 8< ===
[    0.002561] CPU: Physical Processor ID: 0

[    0.002566] CPU: Processor Core ID: 0

[    0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 2

[    0.002577] [Firmware Bug]: CPU0: Using firmware package id 4095 instead of 0

[    0.002586] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8

[    0.002591] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0

[    0.033319] ftrace: allocating 28753 entries in 113 pages

[    0.040121] cpu 0 spinlock event irq 1

[    0.040145] smpboot: Max logical packages: 1

[    0.040155] VPMU disabled by hypervisor.

[    0.040181] Performance Events: unsupported p6 CPU model 42 no PMU driver, 
software events only.

[    0.047050] NMI watchdog: disabled (cpu0): hardware events not enabled

[    0.047065] NMI watchdog: Shutting down hard lockup detector on all cpus

[    0.052015] installing Xen timer for CPU 1

[    0.052074] SMP alternatives: switching to SMP code

[    0.002000] Disabled fast string operations

[    0.002000] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: ffff CPUID: 2

[    0.002000] [Firmware Bug]: CPU1: Using firmware package id 4095 instead of 0

[    0.002000] smpboot: APIC(ffff) Converting physical 4095 to logical package 0

[    0.078061] cpu 1 spinlock event irq 13

...
[    0.216404] Freeing initrd memory: 4340K (ffff880001fa7000 - 
ffff8800023e4000)

[    0.216487] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 
65535

[    0.217572] futex hash table entries: 512 (order: 3, 32768 bytes)

[    0.218293] Initialise system trusted keyrings

...
[    0.216404] Freeing initrd memory: 4340K (ffff880001fa7000 - 
ffff8800023e4000)

[    0.216487] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 
65535

[    0.217572] futex hash table entries: 512 (order: 3, 32768 bytes)

...
[    2.974474] intel_rapl: Found RAPL domain package

[    2.974489] intel_rapl: Found RAPL domain core

[    2.974498] intel_rapl: Found RAPL domain uncore

[    2.974518] intel_rapl: RAPL package 4095 domain package locked by BIOS

=== >8 ===

As you can see, your patch unfortunately does not correct the issue with the 
virtual CPU package identifiers in Xen-based virtual machines using 
para-virtualization.

In summary, the root cause of the issue for me appears to be that the boot-up 
code in the init_apic_mappings function switches the APIC 'ops' structure from 
Xen's 'ops' structure to the no-op ops structure. Due to this, the 
smp_init_package_map uses the no-op APIC ops structure's 
cpu_present_to_to_apicid function, even though it should use the corresponding 
method from Xen's APIC ops structure (i.e., xen_cpu_present_to_apicid).

Here is a dmesg excerpt with a kernel patched with Sebastian's patch (and some 
debugging code), exhibiting the issue I have just explained: (Note the 
'switched to apic NOOP' line.)

=== 8< ===
(early) [    0.000000] SFI: Simple Firmware Interface v0.81 
http://simplefirmware.org
(early) [    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
(early) [    0.000000] No local APIC present
(early) [    0.000000] APIC: disable apic facility
(early) [    0.000000] APIC: switched to apic NOOP
(early) [    0.000000] e820: [mem 0xfa000000-0xffffffff] available for PCI 
devices
(early) [    0.000000] Booting paravirtualized kernel on Xen
...
[    0.034082] mvb: kernel_init_freeable:1007: About to call smp_prepare_cpus...
[    0.034123] cpu 0 spinlock event irq 1
[    0.034138] smpboot: mvb: smp_init_package_map:372: max_physical_pkg_id, 
after set: 4096
[    0.034146] mvb: __default_cpu_present_to_apicid:612: Returning 65535! 
mps_cpu: 0, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
[    0.034155] smpboot: mvb: smp_init_package_map:379: apicid: 65535, 
apic_id_valid(apicid):0
[    0.034162] smpboot: Max logical packages: 1
[    0.034169] VPMU disabled by hypervisor.
[    0.034187] Performance Events: unsupported p6 CPU model 42 no PMU driver, 
software events only.
[    0.041131] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.041142] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.046024] installing Xen timer for CPU 1
[    0.046121] SMP alternatives: switching to SMP code
[    0.002000] Disabled fast string operations
[    0.002000] mvb: detect_ht: CPU has hyper-threading capability
[    0.002000] mvb: CPU: Physical Processor ID: 0
[    0.002000] mvb: CPU: Processor Core ID: 0
[    0.002000] mvb: identify_cpu:1112: c: ffff880013b0a040, c->logical_proc_id: 
65535
[    0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535! 
mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
[    0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 4095
[    0.002000] smpboot: APIC(ffff) Converting physical 4095 to logical package 0
[    0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, 
cpu_data(cpu).logical_proc_id: 0
...
[    0.266540] RAPL PMU: mvb: init_rapl_pmus:686: rapl pmu: max package: 1
[    0.266547] RAPL PMU: mvb: rapl pmu: CPU0 (ffff880013a0a040) belongs to 
package ID 65535.
[    0.266559] RAPL PMU: mvb: rapl pmu: Package ID >= max package for CPU0 
(ffff880013a0a040). This is an error.
[    0.266569] RAPL PMU: mvb: rapl pmu: CPU1 (ffff880013b0a040) belongs to 
package ID 0.
=== >8 ===

Through some debugging, last week I came up with the patch at the end of this 
e-mail, which does correct the issue for me. Once again, I am sorry for 
reporting this too late.

With a kernel built with the patch below, Sebastian's patch and some debugging 
code, I obtain the following dmesg output, which appears to be correct and does 
not emit RAPL-related errors during boot-up:

=== 8< ===
[    0.032083] mvb: kernel_init_freeable:1007: About to call smp_prepare_cpus...

[    0.032106] cpu 0 spinlock event irq 1

[    0.032119] smpboot: mvb: smp_init_package_map:372: max_physical_pkg_id, 
after set: 4096

[    0.032127] mvb: xen_cpu_present_to_apicid:151: cpu: 0, ret: 0

[    0.032132] smpboot: mvb: topology_update_package_map:270: cpu: 0, pkg: 0

[    0.032137] smpboot: APIC(0) Converting physical 0 to logical package 0

[    0.032142] smpboot: mvb: topology_update_package_map:305: cpu: 0, 
cpu_data(cpu).logical_proc_id: 0

[    0.032149] smpboot: mvb: smp_init_package_map:385: 
topology_update_package_map successful.

[    0.032155] smpboot: Max logical packages: 1

[    0.032161] VPMU disabled by hypervisor.

[    0.032177] Performance Events: unsupported p6 CPU model 42 no PMU driver, 
software events only.

[    0.039061] NMI watchdog: disabled (cpu0): hardware events not enabled

[    0.039102] NMI watchdog: Shutting down hard lockup detector on all cpus

[    0.045048] installing Xen timer for CPU 1

[    0.045145] SMP alternatives: switching to SMP code

[    0.002000] Disabled fast string operations

[    0.002000] mvb: detect_ht: CPU has hyper-threading capability

[    0.002000] mvb: CPU: Physical Processor ID: 0

[    0.002000] mvb: CPU: Processor Core ID: 1

[    0.002000] mvb: identify_cpu:1112: c: ffff88001790a040, c->logical_proc_id: 0

[    0.002000] mvb: xen_cpu_present_to_apicid:151: cpu: 1, ret: 0

[    0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 0

[    0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, 
cpu_data(cpu).logical_proc_id: 0

[    0.067149] cpu 1 spinlock event irq 13

...
[    0.208395] RAPL PMU: mvb: init_rapl_pmus:686: rapl pmu: max package: 1

[    0.208400] RAPL PMU: mvb: rapl pmu: CPU0 (ffff88001780a040) belongs to 
package ID 0.

[    0.208409] RAPL PMU: mvb: rapl pmu: CPU1 (ffff88001790a040) belongs to 
package ID 0.

[    0.208491] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 163840 ms 
ovfl timer

[    0.208507] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules

[    0.208512] RAPL PMU: hw unit of domain package 2^-16 Joules

[    0.208517] RAPL PMU: hw unit of domain pp1-gpu 2^-16 Joules

[    0.209083] futex hash table entries: 512 (order: 3, 32768 bytes)

=== >8 ===

My patch may not be a very elegant correction for the issue at hand, 
unfortunately (and it probably does not comply with the kernel patch submission 
guidelines).

In addition, I think Sebastian's patch should be included in the mainline 
kernel, so that potential bugs do not bring down the kernel with a RAPL-related 
oops at boot-up time with virtualization.

Please note that I will need to be offline for most of the day today (my 
current timezone is UTC+03:00), and as a result my responses to your replies 
will most likely be a bit late.

Thank you,

Vefa


>From b7097820285ab6a8588879969d74c56d890d4fd4 Mon Sep 17 00:00:00 2001
From: "M. Vefa Bicakci" <m....@runbox.com>
Date: Fri, 4 Nov 2016 10:01:19 +0300
Subject: [PATCH] Do not reset apic to apic_noop with Xen

---
 arch/x86/include/asm/apic.h | 3 +++
 arch/x86/kernel/apic/apic.c | 7 +++++--
 arch/x86/xen/apic.c         | 4 +++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b3d4c042e610..8c37580b7eb7 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -359,6 +359,9 @@ struct apic {
  */
 extern struct apic *apic;
 
+/* Indicates whether a hypervisor has already set 'apic'. */
+extern int apic_set_by_hypervisor;
+
 /*
  * APIC drivers are probed based on how they are listed in the .apicdrivers
  * section. So the order is important and enforced by the ordering
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 076c315cdf18..a3a1d4570acf 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -172,6 +172,8 @@ unsigned long mp_lapic_addr;
 int disable_apic;
 /* Disable local APIC timer from the kernel commandline or via dmi quirk */
 static int disable_apic_timer __initdata;
+/* Indicates whether a hypervisor has already set 'apic'. */
+int apic_set_by_hypervisor;
 /* Local APIC timer works in C2 */
 int local_apic_timer_c2_ok;
 EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok);
@@ -1788,8 +1790,9 @@ void __init init_apic_mappings(void)
                return;
        }
 
-       /* If no local APIC can be found return early */
-       if (!smp_found_config && detect_init_APIC()) {
+       if (apic_set_by_hypervisor) {
+               /* Hypervisor has already taken care of the APIC. */
+       } else if (!smp_found_config && detect_init_APIC()) {
                /* lets NOP'ify apic operations */
                pr_info("APIC: disable apic facility\n");
                apic_disable();
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 5ac792a1d419..b79a003c13a5 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -229,8 +229,10 @@ void __init xen_init_apic(void)
        x86_io_apic_ops.read = xen_io_apic_read;
        /* On PV guests the APIC CPUID bit is disabled so none of the
         * routines end up executing. */
-       if (!xen_initial_domain())
+       if (!xen_initial_domain()) {
                apic = &xen_pv_apic;
+               apic_set_by_hypervisor = 1;
+       }
 
        x86_platform.apic_post_init = xen_apic_check;
 }
-- 
2.5.5

Reply via email to