Joseph, I've just tested 4.15-rc4. The script crashed, and the system
became responsive to only the simplest commands when bringing CPU 9 back
up, accompanied by this dmesg output:

[  166.722460] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[  166.722540] RIP: 0010:__kmalloc_track_caller+0xc5/0x210
[  166.722578] RSP: 0000:ffffb75e8c7cbb08 EFLAGS: 00010206
[  166.722615] RAX: 0000000000000000 RBX: 43ea0882f873c0e8 RCX: 00000000000001bf
[  166.722663] RDX: 00000000000001be RSI: 0000000000000000 RDI: 0000000000021040
[  166.722711] RBP: ffffb75e8c7cbb40 R08: ffff9cc35d341eaa R09: ffff9ca3ff807c00
[  166.722757] R10: ffffb75e8c7cbd08 R11: bc159441a547de42 R12: ffff9cc35d341eaa
[  166.722805] R13: 00000000014000c0 R14: 0000000000000007 R15: ffff9ca3ff807c00
[  166.722852] FS:  0000000000000000(0000) GS:ffff9cc3ff240000(0000) knlGS:0000000000000000
[  166.722905] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  166.722945] CR2: 0000000000000000 CR3: 0000001be7e09001 CR4: 00000000001606e0
[  166.722992] Call Trace:
[  166.723020]  ? idr_alloc_cmn+0x97/0xd0
[  166.723051]  ? kstrdup_const+0x23/0x30
[  166.723081]  kstrdup+0x31/0x60
[  166.723107]  kstrdup_const+0x23/0x30
[  166.723137]  __kernfs_new_node+0x2c/0x120
[  166.723168]  kernfs_new_node+0x28/0x50
[  166.723197]  kernfs_create_dir_ns+0x34/0x90
[  166.723229]  sysfs_create_dir_ns+0x40/0x90
[  166.723261]  kobject_add_internal+0xac/0x2b0
[  166.723294]  kobject_add+0x71/0xd0
[  166.723323]  ? device_private_init+0x23/0x70
[  166.723356]  device_add+0x12c/0x680
[  166.723385]  cpu_device_create+0xe1/0x100
[  166.723418]  ? __slab_alloc+0x20/0x40
[  166.723449]  ? _cond_resched+0x19/0x40
[  166.723481]  cacheinfo_cpu_online+0x29a/0x3f0
[  166.723515]  ? get_cpu_cacheinfo+0x50/0x50
[  166.723549]  cpuhp_invoke_callback+0x9b/0x550
[  166.723587]  ? padata_replace+0xf0/0xf0
[  166.725151]  cpuhp_thread_fun+0xc4/0x150
[  166.726682]  smpboot_thread_fn+0xec/0x160
[  166.728221]  kthread+0x11e/0x140
[  166.729701]  ? sort_range+0x30/0x30
[  166.731145]  ? kthread_create_worker_on_cpu+0x70/0x70
[  166.732551]  ret_from_fork+0x1f/0x30
[  166.733906] Code: 4d 01 e0 4d 8b 18 4d 33 99 40 01 00 00 4c 89 c3 4c 31 db 65 48 0f c7 0f 0f 94 c0 84 c0 74 ac 4d 39 d8 74 14 49 63 41 20 48 01 c3 <48> 33 1b 49 33 99 40 01 00 00 0f 18 0b 41 f7 c5 00 80 00 00 0f
[  166.736776] RIP: __kmalloc_track_caller+0xc5/0x210 RSP: ffffb75e8c7cbb08
[  166.738188] ---[ end trace 39ce10746b0f4324 ]---

If you want direct access to the affected hardware, that can be
arranged. (If you've already got access to the certification network in
1SS, the affected system on which I've been doing most of the testing is
boldore.) I'm also happy to run tests using test kernels that you give
me.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1733662

Title:
  System hang with Linux kernel 4.13, not with 4.10

Status in linux package in Ubuntu:
  Triaged
Status in linux-hwe package in Ubuntu:
  New
Status in linux source package in Artful:
  Triaged
Status in linux-hwe source package in Artful:
  New
Status in linux source package in Bionic:
  Triaged
Status in linux-hwe source package in Bionic:
  New

Bug description:
  In doing Ubuntu 17.10 regression testing, we've encountered one
  computer (boldore, a Cisco UCS C240 M4 [VIC]) that hangs about one in
  four times when running our cpu_offlining test. This test attempts to
  take all the CPU cores offline except one, then brings them back
  online again. The test ran successfully on boldore with previous
  releases, but with 17.10 the system hangs in about one in four runs.
  Reverting to Ubuntu 16.04.3, I found no problems; but when I upgraded
  the 16.04.3 installation to linux-image-4.13.0-16-generic, the
  problem appeared again, so I'm confident this is a problem with the
  kernel. I'm attaching two files, dmesg-output-4.10.txt and
  dmesg-output-4.13.txt, which show the dmesg output that appears when
  running the cpu_offlining test with the 4.10.0-38 and 4.13.0-16
  kernels, respectively; the system hung on the 4.13 run. (I was
  running "dmesg -w" in a second SSH login; the files are
  cut-and-pasted from that.)
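
For readers unfamiliar with the procedure, the offline/online cycle described above can be sketched roughly as follows. This is not the actual cpu_offlining test script; it is a minimal illustration that assumes the standard sysfs hotplug interface (/sys/devices/system/cpu/cpuN/online). The function names (parse_cpu_list, set_cpu_online, cycle_cpus) and the choice to keep CPU 0 online are assumptions made for the example. Running it against the real sysfs tree requires root.

```python
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def set_cpu_online(sysfs_cpu_dir, cpu, online):
    """Write '1' or '0' to cpuN/online to bring a CPU up or take it down."""
    path = os.path.join(sysfs_cpu_dir, f"cpu{cpu}", "online")
    with open(path, "w") as f:
        f.write("1" if online else "0")


def cycle_cpus(sysfs_cpu_dir="/sys/devices/system/cpu", keep=0):
    """Offline every currently online CPU except `keep`, then online them again.

    Assumption: `keep` defaults to 0 because CPU 0 often cannot be
    hot-unplugged on x86 and may have no 'online' file at all.
    Returns the list of CPUs that were cycled.
    """
    with open(os.path.join(sysfs_cpu_dir, "online")) as f:
        cpus = parse_cpu_list(f.read())
    targets = [c for c in cpus if c != keep]
    for cpu in targets:
        set_cpu_online(sysfs_cpu_dir, cpu, False)
    for cpu in targets:
        set_cpu_online(sysfs_cpu_dir, cpu, True)
    return targets
```

The sysfs directory is a parameter so the logic can be exercised against a scratch directory instead of the live system. In the failure reported here, the hang occurs during the second loop, while CPUs are being brought back online.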

  I initiated this bug report from an Ubuntu 16.04.3 installation
  running a 4.10 kernel; but as I said, the problem itself occurs with
  the 4.13 kernel.

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.10.0-38-generic 4.10.0-38.42~16.04.1
  ProcVersionSignature: Ubuntu 4.10.0-38.42~16.04.1-generic 4.10.17
  Uname: Linux 4.10.0-38-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: amd64
  Date: Tue Nov 21 17:36:06 2017
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-hwe
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1733662/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
