This article sheds some light on things:

https://lwn.net/Articles/518953/

"Second, the greater the number of idle CPUs, the more work RCU must do
when forcing quiescent states. Yes, the busier the system, the less work
RCU needs to do! The reason for the extra work is that RCU is not
permitted to disturb idle CPUs for energy-efficiency reasons. RCU must
therefore probe a per-CPU data structure to read out idleness state
during each grace period, likely incurring a cache miss on each such
probe."

Just to add: I am running the VM with, say, 4 CPUs, all of which are idle.

In my experiments with the 3.19 and 4.2 kernels, KVM is not being used
on the host, so QEMU is emulating N CPUs with just 1 host CPU. On top of
that, a loaded host means this single CPU is busy, so there are
potentially large latencies in servicing the N virtual CPUs in the VM. I
think that's part of the issue: large host latencies combined with an
N-to-1 virtual-to-host CPU mapping mean that we are tripping RCU's
grace-period stall detection.
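Since the N-to-1 mapping comes from QEMU emulating the vCPUs in software
rather than using KVM, a quick check on the host for a usable KVM device
can rule that case in or out. This is a minimal sketch that assumes the
conventional /dev/kvm device node. The helper name `kvm_usable` is
hypothetical.

```shell
# Hypothetical helper: succeed if the KVM device node exists as a
# character device that the current user can open read-write.
# Defaults to the conventional /dev/kvm path.
kvm_usable() {
    dev=${1:-/dev/kvm}
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

# Example: report whether QEMU could use hardware acceleration here.
if kvm_usable; then
    echo "KVM is usable on this host"
else
    echo "no usable KVM: QEMU will fall back to software emulation"
fi
```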

To try to keep the RCU kthreads from suffering such delays, I added the
following kernel parameters to the VM:

rcu_nocb_poll rcutree.kthread_prio=90 rcuperf.verbose=1
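To confirm after a reboot that the guest actually booted with these
parameters, you can check them against /proc/cmdline. This is a minimal
sketch. The helper name `check_cmdline_params` is hypothetical, and the
file argument exists only so the helper can be tried on a saved copy.

```shell
# Hypothetical helper: report whether each given kernel parameter appears,
# as a whole word, in a kernel command-line file (normally /proc/cmdline).
# Returns 0 only if every parameter is present.
check_cmdline_params() {
    cmdline_file=$1
    shift
    [ -r "$cmdline_file" ] || { echo "cannot read $cmdline_file" >&2; return 2; }
    # Pad with spaces so a parameter only matches a whole word,
    # e.g. "rcu_nocb_poll" does not match "rcu_nocb_poll_foo".
    cmdline=" $(cat "$cmdline_file") "
    missing=0
    for param in "$@"; do
        case $cmdline in
            *" $param "*) echo "present: $param" ;;
            *) echo "MISSING: $param"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

After rebooting the VM, running
`check_cmdline_params /proc/cmdline rcu_nocb_poll rcutree.kthread_prio=90 rcuperf.verbose=1`
prints one present/MISSING line per parameter.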

With these, I was able to run an 8 CPU VM without any RCU issues while
the host CPU was being hammered to death with stress-ng. I then also
cranked the RCU CPU stall timeout down to just 5 seconds, to see how
easily I could trip the issue with this more extreme setting, using:

echo 5 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

and again, no RCU issues.
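To make the 5-second experiment easy to rerun, you can wrap the sysfs
write so that the value is validated and read back. This is a minimal
sketch. The helper name `set_stall_timeout` is hypothetical, and the
file argument is only there so the helper can be tried on a scratch
file.

```shell
# Hypothetical helper: write an RCU CPU stall timeout (in seconds) to the
# given parameter file and read it back to confirm the write took effect.
# Normally the file is /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout.
set_stall_timeout() {
    param_file=$1
    seconds=$2
    # Reject anything that is not a plain non-negative integer.
    case $seconds in
        ''|*[!0-9]*)
            echo "invalid timeout: '$seconds'" >&2
            return 2
            ;;
    esac
    echo "$seconds" > "$param_file" || return 1
    [ "$(cat "$param_file")" = "$seconds" ]
}
```

As root, run:
`set_stall_timeout /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout 5`.
A sysfs write like this is lost at reboot. To set the same value at
boot, pass `rcupdate.rcu_cpu_stall_timeout=5` on the kernel command line.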

@Martin,

Can you try using the following kernel parameters on the VM and see if
they help:

rcu_nocb_poll rcutree.kthread_prio=90 rcuperf.verbose=1
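The `BOOT_IMAGE=` entry in the bug report's ProcKernelCmdLine suggests
the guest boots via GRUB. If so, one way to make these parameters
persistent is to append them to GRUB_CMDLINE_LINUX_DEFAULT in
/etc/default/grub and then run update-grub. This is a minimal sketch.
The helper name `add_grub_params` is hypothetical. A `*.cfg` file under
/etc/default/grub.d/, which cloud images may ship, is read afterwards
and may override this value.

```shell
# Hypothetical helper: append kernel parameters to the existing
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of a GRUB defaults file.
# The file is rewritten in place (via cat, so its permissions are kept).
# Parameters must not contain '|', '&' or '\', which sed treats specially here.
add_grub_params() {
    grub_file=$1
    shift
    params=$*
    if ! grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$grub_file"; then
        echo "no GRUB_CMDLINE_LINUX_DEFAULT line in $grub_file" >&2
        return 1
    fi
    grub_tmp=$(mktemp) || return 1
    # Keep the current value (\1) and add the new parameters after it.
    sed "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $params\"|" \
        "$grub_file" > "$grub_tmp" && cat "$grub_tmp" > "$grub_file"
    status=$?
    rm -f "$grub_tmp"
    return "$status"
}
```

Run it as root against /etc/default/grub with the three parameters
above, then run `update-grub` and reboot the VM.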

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1531768

Title:
  [arm64] lockups some time after booting

Status in Auto Package Testing:
  Triaged
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I created an 8 CPU arm64 instance on Canonical's Scalingstack (which I
  want to use for armhf autopkgtesting in LXD). I started with wily as
  that has lxd available (it's not yet available for arm64 in either
  trusty or the PPA).

  However, pretty much any LXD task that I do (I haven't tried much
  else) on this machine takes unbearably long. A simple "lxc profile set
  default raw.lxc lxc.seccomp=" or "lxc list" takes several minutes.

  I see tons of

  [ 1020.971955] rcu_sched kthread starved for 6000 jiffies! g1095 c1094 f0x0
  [ 1121.166926] INFO: task fsnotify_mark:69 blocked for more than 120 seconds.

  in dmesg (the attached apport info has the complete dmesg).

  ProblemType: Bug
  DistroRelease: Ubuntu 15.10
  Package: linux-image-4.2.0-22-generic 4.2.0-22.27
  ProcVersionSignature: Ubuntu 4.2.0-22.27-generic 4.2.6
  Uname: Linux 4.2.0-22-generic aarch64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jan  7 09:18 seq
   crw-rw---- 1 root audio 116, 33 Jan  7 09:18 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.19.1-0ubuntu5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  Date: Thu Jan  7 09:24:01 2016
  IwConfig:
   eth0      no wireless extensions.

   lo        no wireless extensions.

   lxcbr0    no wireless extensions.
  Lspci:
   00:00.0 Host bridge [0600]: Red Hat, Inc. Device [1b36:0008]
    Subsystem: Red Hat, Inc Device [1af4:1100]
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  PciMultimedia:

  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-22-generic root=LABEL=cloudimg-rootfs earlyprintk
  RelatedPackageVersions:
   linux-restricted-modules-4.2.0-22-generic N/A
   linux-backports-modules-4.2.0-22-generic  N/A
   linux-firmware                            1.149.3
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/auto-package-testing/+bug/1531768/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
