@mluypaert

Thanks for the details.

Do you know whether the problem reproduces with any bowtie2 example that
I could run myself?  I'm not familiar with it.

There appears to be a workaround, if you're willing to test it:
disabling NUMA balancing.

This _might_ impact performance on some memory-intensive workloads that use
memory on multiple NUMA nodes.
If you previously used kernel versions 3.13.0-156 to 3.13.0-158, there should
be no performance impact, as NUMA balancing was disabled in those versions to
work around a regression in 3.13.0-155. It was re-enabled by default in
3.13.0-159.

You can disable NUMA balancing on a running system with this command:

$ sudo sysctl -w kernel.numa_balancing=0

And then make the change persistent across reboots with this command:

$ echo 'kernel.numa_balancing = 0' | sudo tee /etc/sysctl.d/99-numa-balancing.conf
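
To confirm that the setting took effect, something like the sketch below
should work. It assumes the knob is exposed at
/proc/sys/kernel/numa_balancing (the procfs counterpart of the sysctl
above); the function name numa_balancing_state is made up for
illustration only.

```shell
#!/bin/sh
# Report the automatic NUMA balancing state.
# Assumption: the knob lives at /proc/sys/kernel/numa_balancing; the file
# is absent on kernels built without NUMA balancing support.
# numa_balancing_state is a hypothetical helper name, not an existing tool.
numa_balancing_state() {
    # Optional first argument: alternate file to read (handy for testing).
    f="${1:-/proc/sys/kernel/numa_balancing}"
    if [ ! -r "$f" ]; then
        echo "unavailable"
        return 0
    fi
    case "$(cat "$f")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

numa_balancing_state
```

After running the sysctl command above it should report "disabled", and
it should still do so after a reboot once the sysctl.d file is in place.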

If you'd like to try that, could you please let us know how it goes,
maybe in a few months? :-)

Thanks for your response anyway.
cheers,
Mauricio

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1813018

Title:
  Kernel Oops - unable to handle kernel paging request; RIP is at
  wait_migrate_huge_page+0x51/0x70

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Kernel oops occurs randomly every now and then, seemingly when running
  memory-intensive processes (so far, it happened to me when using
  bowtie2 or STAR).

  Running Ubuntu 14.04 LTS on AWS EC2 instances (m4.* and c4.* family
  classes). After the error occurs, the server stays accessible through
  SSH, but the commands w, htop, ps (and maybe others) seem to hang,
  while commands like ls, cd, top and others keep working. Whatever
  process was running and (probably) caused the crash seems to go into a
  sleeping mode.

  Rebooting (sudo reboot) makes the instance refuse all connections
  (more than an hour after initiating the reboot). Stopping the (AWS
  EC2) instance and starting again makes the instance function normally
  again.

  Restarting the task that was running when the instance crashed on the
  newly (re)started instance usually works with no more problems.
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jan 23 12:49 seq
   crw-rw---- 1 root audio 116, 33 Jan 23 12:49 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.29
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  DistroRelease: Ubuntu 14.04
  Ec2AMI: ami-4473183b
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1c
  Ec2InstanceType: m4.16xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-164-generic root=UUID=d4f2aafc-946a-4514-930d-4c45e676f198 ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 3.13.0-164.214-generic 3.13.11-ckt39
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-164-generic N/A
   linux-backports-modules-3.13.0-164-generic  N/A
   linux-firmware                              N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty ec2-images
  Uname: Linux 3.13.0-164-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: sudo
  WifiSyslog:
   
  _MarkForUpload: True
  dmi.bios.date: 08/24/2006
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813018/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp