** Patch added: "0001-UBUNTU-SAUCE-arm64-Kconfig-Disable-ACPI_HOTPLUG_CPU.patch"
   
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2088047/+attachment/5840165/+files/0001-UBUNTU-SAUCE-arm64-Kconfig-Disable-ACPI_HOTPLUG_CPU.patch

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2088047

Title:
  log_check / kernel_tainted test from ubuntu_boot failed on Oracular
  AWS a1.metal

Status in ubuntu-kernel-tests:
  New

Bug description:
  Found on Oracular/6.11.0-11.11 boot testing on AWS a1.metal instance.
  The relevant console log excerpts:

  -----(snip)-----
  06:55:12 INFO | 2024-11-09T06:51:17.584884+00:00 ip-172-31-6-235 kernel: cpuinfo: failed to register hotplug callbacks.
  -----(snip)-----
  06:55:12 INFO | 2024-11-09T06:51:17.584978+00:00 ip-172-31-6-235 kernel: ------------[ cut here ]------------
  06:55:12 INFO | 2024-11-09T06:51:17.584980+00:00 ip-172-31-6-235 kernel: WARNING: CPU: 7 PID: 1 at fs/sysfs/group.c:128 internal_create_group+0xc4/0x380
  06:55:12 INFO | 2024-11-09T06:51:17.584981+00:00 ip-172-31-6-235 kernel: Modules linked in:
  06:55:12 INFO | 2024-11-09T06:51:17.584983+00:00 ip-172-31-6-235 kernel: CPU: 7 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-11-generic #11-Ubuntu
  06:55:12 INFO | 2024-11-09T06:51:17.584984+00:00 ip-172-31-6-235 kernel: Hardware name: Amazon EC2 a1.metal/Not Specified, BIOS 1.0 10/16/2017
  06:55:12 INFO | 2024-11-09T06:51:17.584985+00:00 ip-172-31-6-235 kernel: pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  06:55:12 INFO | 2024-11-09T06:51:17.584987+00:00 ip-172-31-6-235 kernel: pc : internal_create_group+0xc4/0x380
  06:55:12 INFO | 2024-11-09T06:51:17.584989+00:00 ip-172-31-6-235 kernel: lr : sysfs_create_group+0x24/0x50
  06:55:12 INFO | 2024-11-09T06:51:17.584993+00:00 ip-172-31-6-235 kernel: sp : ffff80008009bb90
  06:55:12 INFO | 2024-11-09T06:51:17.584995+00:00 ip-172-31-6-235 kernel: x29: ffff80008009bba0 x28: 0000000000000000 x27: ffff19093bd33ca8
  06:55:12 INFO | 2024-11-09T06:51:17.584997+00:00 ip-172-31-6-235 kernel: x26: 0000000000000000 x25: ffff436d28704000 x24: ffffd59c11b04a88
  06:55:12 INFO | 2024-11-09T06:51:17.584998+00:00 ip-172-31-6-235 kernel: x23: 0000000000000000 x22: ffffd59c14046768 x21: ffffd59c1362fca8
  06:55:12 INFO | 2024-11-09T06:51:17.585000+00:00 ip-172-31-6-235 kernel: x20: 0000000000000036 x19: 0000000000000004 x18: ffff800080095060
  06:55:12 INFO | 2024-11-09T06:51:17.585001+00:00 ip-172-31-6-235 kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  06:55:12 INFO | 2024-11-09T06:51:17.585003+00:00 ip-172-31-6-235 kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  06:55:12 INFO | 2024-11-09T06:51:17.585006+00:00 ip-172-31-6-235 kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : ffffd59c1128fc4c
  06:55:12 INFO | 2024-11-09T06:51:17.585008+00:00 ip-172-31-6-235 kernel: x8 : 0101010101010101 x7 : 0000000000000000 x6 : 0000000000000000
  06:55:12 INFO | 2024-11-09T06:51:17.585010+00:00 ip-172-31-6-235 kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff1902003fa280
  06:55:12 INFO | 2024-11-09T06:51:17.585011+00:00 ip-172-31-6-235 kernel: x2 : ffffd59c12648f88 x1 : 0000000000000000 x0 : 0000000000000000
  06:55:12 INFO | 2024-11-09T06:51:17.585012+00:00 ip-172-31-6-235 kernel: Call trace:
  06:55:12 INFO | 2024-11-09T06:51:17.585013+00:00 ip-172-31-6-235 kernel:  internal_create_group+0xc4/0x380
  06:55:12 INFO | 2024-11-09T06:51:17.585014+00:00 ip-172-31-6-235 kernel:  sysfs_create_group+0x24/0x50
  06:55:12 INFO | 2024-11-09T06:51:17.585015+00:00 ip-172-31-6-235 kernel:  topology_add_dev+0x28/0x50
  06:55:12 INFO | 2024-11-09T06:51:17.585016+00:00 ip-172-31-6-235 kernel:  cpuhp_invoke_callback+0x200/0x780
  06:55:12 INFO | 2024-11-09T06:51:17.585021+00:00 ip-172-31-6-235 kernel:  cpuhp_issue_call+0x100/0x198
  06:55:12 INFO | 2024-11-09T06:51:17.585023+00:00 ip-172-31-6-235 kernel:  __cpuhp_setup_state_cpuslocked+0x128/0x330
  06:55:12 INFO | 2024-11-09T06:51:17.585024+00:00 ip-172-31-6-235 kernel:  __cpuhp_setup_state+0x5c/0xa8
  06:55:12 INFO | 2024-11-09T06:51:17.585025+00:00 ip-172-31-6-235 kernel:  topology_sysfs_init+0x40/0x78
  06:55:12 INFO | 2024-11-09T06:51:17.585026+00:00 ip-172-31-6-235 kernel:  do_one_initcall+0x64/0x3a0
  06:55:12 INFO | 2024-11-09T06:51:17.585027+00:00 ip-172-31-6-235 kernel:  do_initcalls+0x19c/0x210
  06:55:12 INFO | 2024-11-09T06:51:17.585028+00:00 ip-172-31-6-235 kernel:  kernel_init_freeable+0x18c/0x1e8
  06:55:12 INFO | 2024-11-09T06:51:17.585029+00:00 ip-172-31-6-235 kernel:  kernel_init+0x3c/0x190
  06:55:12 INFO | 2024-11-09T06:51:17.585031+00:00 ip-172-31-6-235 kernel:  ret_from_fork+0x10/0x20
  06:55:12 INFO | 2024-11-09T06:51:17.585035+00:00 ip-172-31-6-235 kernel: ---[ end trace 0000000000000000 ]---
  06:55:12 INFO | 2024-11-09T06:51:17.585037+00:00 ip-172-31-6-235 kernel: sysfs: cannot create duplicate filename '/devices/cache'
  06:55:12 INFO | 2024-11-09T06:51:17.585038+00:00 ip-172-31-6-235 kernel: CPU: 5 UID: 0 PID: 47 Comm: cpuhp/5 Tainted: G        W          6.11.0-11-generic #11-Ubuntu
  06:55:12 INFO | 2024-11-09T06:51:17.585039+00:00 ip-172-31-6-235 kernel: Tainted: [W]=WARN
  06:55:12 INFO | 2024-11-09T06:51:17.585040+00:00 ip-172-31-6-235 kernel: Hardware name: Amazon EC2 a1.metal/Not Specified, BIOS 1.0 10/16/2017
  06:55:12 INFO | 2024-11-09T06:51:17.585041+00:00 ip-172-31-6-235 kernel: Call trace:
  06:55:12 INFO | 2024-11-09T06:51:17.585146+00:00 ip-172-31-6-235 kernel:  dump_backtrace+0x104/0x160
  06:55:12 INFO | 2024-11-09T06:51:17.585149+00:00 ip-172-31-6-235 kernel:  show_stack+0x24/0x50
  06:55:12 INFO | 2024-11-09T06:51:17.585150+00:00 ip-172-31-6-235 kernel:  dump_stack_lvl+0x84/0xc0
  06:55:12 INFO | 2024-11-09T06:51:17.585155+00:00 ip-172-31-6-235 kernel:  dump_stack+0x1c/0x40
  06:55:12 INFO | 2024-11-09T06:51:17.585191+00:00 ip-172-31-6-235 kernel:  sysfs_warn_dup+0xa8/0xf0
  06:55:12 INFO | 2024-11-09T06:51:17.585193+00:00 ip-172-31-6-235 kernel:  sysfs_create_dir_ns+0x124/0x150
  06:55:12 INFO | 2024-11-09T06:51:17.585194+00:00 ip-172-31-6-235 kernel:  create_dir+0x30/0x120
  06:55:12 INFO | 2024-11-09T06:51:17.585215+00:00 ip-172-31-6-235 kernel:  kobject_add_internal+0x90/0x240
  06:55:12 INFO | 2024-11-09T06:51:17.585218+00:00 ip-172-31-6-235 kernel:  kobject_add+0xa0/0x140
  06:55:12 INFO | 2024-11-09T06:51:17.585234+00:00 ip-172-31-6-235 kernel:  device_add+0xd8/0x748
  06:55:12 INFO | 2024-11-09T06:51:17.585236+00:00 ip-172-31-6-235 kernel:  cpu_device_create+0x19c/0x1c0
  06:55:12 INFO | 2024-11-09T06:51:17.585238+00:00 ip-172-31-6-235 kernel:  cache_add_dev+0x84/0x428
  06:55:12 INFO | 2024-11-09T06:51:17.585252+00:00 ip-172-31-6-235 kernel:  cacheinfo_cpu_online+0x90/0x138
  06:55:12 INFO | 2024-11-09T06:51:17.585254+00:00 ip-172-31-6-235 kernel:  cpuhp_invoke_callback+0x200/0x780
  06:55:12 INFO | 2024-11-09T06:51:17.585256+00:00 ip-172-31-6-235 kernel:  cpuhp_thread_fun+0x140/0x358
  06:55:12 INFO | 2024-11-09T06:51:17.585281+00:00 ip-172-31-6-235 kernel:  smpboot_thread_fn+0x224/0x250
  06:55:12 INFO | 2024-11-09T06:51:17.585287+00:00 ip-172-31-6-235 kernel:  kthread+0xf4/0x108
  06:55:12 INFO | 2024-11-09T06:51:17.585289+00:00 ip-172-31-6-235 kernel:  ret_from_fork+0x10/0x20
  06:55:12 INFO | 2024-11-09T06:51:17.585299+00:00 ip-172-31-6-235 kernel: kobject: kobject_add_internal failed for cache with -EEXIST, don't try to register things with the same name in the same directory.
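
  As a rough illustration of what a taint check over such a log could look
  like, the sketch below pulls the taint flag letters out of the "CPU: ..."
  status lines. This is not the actual ubuntu_boot test code; the function
  name and the regular expression are assumptions made for this example.

```python
import re

# Matches the flag field the kernel prints after "Tainted:" in a status line
# such as "... Comm: cpuhp/5 Tainted: G        W          6.11.0-11-generic".
# The field is assumed to be followed by a release string starting with a digit.
TAINT_RE = re.compile(r"Tainted: ([A-Z ]+?)\s+\d")


def taint_flags(log_lines):
    """Return the set of taint flag letters found in the given log lines.

    Hypothetical helper for illustration only.
    """
    flags = set()
    for line in log_lines:
        match = TAINT_RE.search(line)
        if match:
            flags.update(ch for ch in match.group(1) if ch != " ")
    return flags


SAMPLE = [
    "kernel: CPU: 7 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-11-generic #11-Ubuntu",
    "kernel: CPU: 5 UID: 0 PID: 47 Comm: cpuhp/5 Tainted: G        W          6.11.0-11-generic #11-Ubuntu",
    "kernel: Tainted: [W]=WARN",
]

print(sorted(taint_flags(SAMPLE)))
```

  In the log above the only real taint is W (a WARN was hit); G is the
  placeholder the kernel prints when no proprietary module is loaded.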

  This was also observed on 6.11.0-1004-aws and 6.11.0-1005-aws.
  Noble is not affected; see the [Affected Versions] section for details.

  -------------------------------------

  [Summary]

    - This is not a regression; it is caused by a problematic ACPI table on a1.metal.
    - If the ACPI table will not be fixed soon, adding a workaround, at least in
      our tree, might be an option. See the [Solution] section for details.

  [Cause]

    According to the warning messages, the following two cpuhp_setup_state() calls are failing:
    * cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/cpuinfo:online",
                        cpuid_cpu_online, cpuid_cpu_offline)
    * cpuhp_setup_state(CPUHP_AP_BASE_CACHEINFO_ONLINE, "base/cacheinfo:online",
                        cacheinfo_cpu_online, cacheinfo_cpu_pre_down)

    Other cpuhp callbacks are failing as well, as boot-time tracing of
    cpuhp:* events reveals:

    4)               |  /* cpuhp_enter: cpu: 0004 target: 238 step: 199 (cpu_capacity_sysctl_add) */
    4)               |  /* cpuhp_exit:  cpu: 0004  state: 238 step: 199 ret: -2 */

    4)               |  /* cpuhp_enter: cpu: 0004 target: 238 step: 199 (cpuid_cpu_online) */
    4)               |  /* cpuhp_exit:  cpu: 0004  state: 238 step: 199 ret: -19 */

    5)               |  /* cpuhp_enter: cpu: 0004 target: 238 step:  54 (topology_add_dev) */
    5)               |  /* cpuhp_exit:  cpu: 0004  state: 238 step:  54 ret: -22 */

    5)               |  /* cpuhp_enter: cpu: 0005 target: 238 step: 193 (cacheinfo_cpu_online) */
    5)               |  /* cpuhp_exit:  cpu: 0005  state: 238 step: 193 ret: -17 */
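
    The ret values in the cpuhp_exit lines are negated Linux errno codes. A
    small sketch that maps them to their symbolic names (the table is
    hard-coded from the generic Linux errno definitions; the helper name is
    made up for this example):

```python
# Generic Linux errno values (include/uapi/asm-generic/errno-base.h).
LINUX_ERRNO = {2: "ENOENT", 17: "EEXIST", 19: "ENODEV", 22: "EINVAL"}

# (callback, ret) pairs copied from the cpuhp_exit trace lines above.
TRACE_RESULTS = [
    ("cpu_capacity_sysctl_add", -2),
    ("cpuid_cpu_online", -19),
    ("topology_add_dev", -22),
    ("cacheinfo_cpu_online", -17),
]


def errno_name(ret):
    """Translate a negative kernel return value into its errno name."""
    return LINUX_ERRNO.get(-ret, "unknown")


for callback, ret in TRACE_RESULTS:
    print(f"{callback}: ret={ret} ({errno_name(ret)})")
```

    The -17 (EEXIST) from cacheinfo_cpu_online matches the "kobject_add_internal
    failed for cache with -EEXIST" message in the console log.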

    These failures occur because CPU#4-15 are not enabled, even though they are
    in cpu_possible_mask and online.
    The underlying issue is that acpi_get_phys_id() fails to obtain a phys_id
    for the processor devices of CPU#4-15 because of discrepancies in the ACPI
    table.

      -> acpi_processor_get_info
        -> acpi_get_phys_id
          -> map_mat_entry
          -> map_madt_entry

    The _UIDs of the Processor Devices are sequential numbers starting from 0,
    while the Processor UIDs in MADT/PPTT are non-sequential
    (0x0, 0x1, 0x2, 0x3, 0x100, 0x101, 0x102, 0x103, 0x200, 0x201, ...).
    As a result, map_madt_entry() finds no MADT entry matching the _UID and
    fails for CPU#4-15.
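
    The mismatch can be modelled with a few lines of Python. This is only a toy
    model of the UID lookup, not the kernel's map_madt_entry() logic; the
    variable names are invented, the device _UID range 0-15 is inferred from
    "CPU#4-15", and the MADT list contains only the UIDs quoted above (the rest
    are elided in this report and are deliberately not filled in).

```python
# Processor Device _UIDs: sequential, one per CPU (CPU#0-15 assumed).
DEVICE_UIDS = list(range(16))

# Processor UIDs from MADT/PPTT, limited to the values quoted in this report.
MADT_UIDS_LISTED = [0x0, 0x1, 0x2, 0x3, 0x100, 0x101, 0x102, 0x103, 0x200, 0x201]


def unmatched_device_uids(device_uids, madt_uids):
    """Return the device _UIDs that have no entry with the same UID in the MADT."""
    known = set(madt_uids)
    return [uid for uid in device_uids if uid not in known]


print(unmatched_device_uids(DEVICE_UIDS, MADT_UIDS_LISTED))
```

    Only _UIDs 0-3 find a matching entry; 4-15 do not, which lines up with the
    CPUs for which the lookup fails.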

  [Affected Versions]

    * All Oracular kernels are currently affected.
    * No Noble kernels are currently affected.

    This is because only Oracular sets CONFIG_ACPI_HOTPLUG_CPU=y, as a result
    of two upstream commits that are included in its master kernel:
      9d0873892f4d ("arm64: Kconfig: Enable hotplug CPU on arm64 if ACPI_PROCESSOR is enabled.")
      46800e38ef0e ("arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU")
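
    To check how a given kernel was configured, the option can be looked up in
    that kernel's config file. The sketch below parses config text in the usual
    Kconfig output format; the helper name is hypothetical, the sample strings
    are made up for the example rather than copied from a real config, and
    reading the text from /boot/config-<release> is an assumption about where
    the installed kernel's config lives.

```python
def config_value(config_text, option):
    """Return the value of a Kconfig option from config file text.

    Returns "n" for "# OPTION is not set", the text after "=" when the
    option is assigned, and None when the option does not appear at all.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Illustrative inputs only.
ENABLED = "CONFIG_ACPI_PROCESSOR=y\nCONFIG_ACPI_HOTPLUG_CPU=y\n"
DISABLED = "CONFIG_ACPI_PROCESSOR=y\n# CONFIG_ACPI_HOTPLUG_CPU is not set\n"

print(config_value(ENABLED, "CONFIG_ACPI_HOTPLUG_CPU"))
print(config_value(DISABLED, "CONFIG_ACPI_HOTPLUG_CPU"))
```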

  [Solution]

    There are several options:

    (a) Override the ACPI table (while waiting for a firmware update)
    (b) Apply a workaround patch for o:aws
    (c) Set CONFIG_ACPI_HOTPLUG_CPU=n in some way

  [Experiment]

    Regarding (b), I put together a workaround patch (a dirty hack) and
    confirmed that, with it applied, acpi_processor_get_info() succeeds for all
    of CPU#4-15 and the warning messages disappear. See the attached patch.
