I've applied the udev rules change and it doesn't seem to make a
difference on the instances I'm testing with:

(after applying the change, reloading udev, rebooting, etc.)
ubuntu@hot-i3-muguasak:~$ cat /proc/zoneinfo
...
Node 0, zone   Normal
  pages free     14714755
        min      7663
        low      22874
        high     38085
   node_scanned  0
        spanned  15499264
        present  15499264
        managed  15212046


ubuntu@hot-i3-muguasak:~$ cat /lib/udev/rules.d/40-vm-hotadd.rules 
# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"

LABEL="vm_hotadd_apply"

# Memory hotadd request
#SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"

# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"

LABEL="vm_hotadd_end"
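
A quick way to check whether the commented-out memory rule is actually
taking effect might be to count memory blocks by state. This is only a
sketch: the function name count_memory_states is made up for illustration,
and it assumes the usual memory-hotplug sysfs layout where each memoryN
directory has a "state" file containing "online" or "offline".

```shell
# Count memory blocks by state under a sysfs-style directory.
# Assumption: each memoryN directory holds a "state" file whose content is
# "online" or "offline"; any other state is ignored by this sketch.
count_memory_states() {
    root="${1:-/sys/devices/system/memory}"
    online=0
    offline=0
    for f in "$root"/memory[0-9]*/state; do
        # Skip the unexpanded glob pattern or unreadable entries.
        [ -r "$f" ] || continue
        case "$(cat "$f")" in
            online)  online=$((online + 1)) ;;
            offline) offline=$((offline + 1)) ;;
        esac
    done
    echo "online=$online offline=$offline"
}

count_memory_states
```

If the rule were still auto-onlining hot-added memory, I would expect the
offline count to stay at zero even as new blocks appear.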

Errors are the same:
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.319668] EXT4-fs warning (device nvme0n1): ext4_end_bio:314: I/O error -5 writing to inode 108921589 (offset 4185915392 size 8388608 starting block 95900672)
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.319670] buffer_io_error: 246 callbacks suppressed
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.319671] Buffer I/O error on device nvme0n1, logical block 95900416
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.322738] Buffer I/O error on device nvme0n1, logical block 95900417
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.325826] Buffer I/O error on device nvme0n1, logical block 95900418
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.329083] Buffer I/O error on device nvme0n1, logical block 95900419
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.332017] Buffer I/O error on device nvme0n1, logical block 95900420
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.334949] Buffer I/O error on device nvme0n1, logical block 95900421
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.337913] Buffer I/O error on device nvme0n1, logical block 95900422
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.340918] Buffer I/O error on device nvme0n1, logical block 95900423
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.343829] Buffer I/O error on device nvme0n1, logical block 95900424
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.346815] Buffer I/O error on device nvme0n1, logical block 95900425
Mar 20 22:51:04 ip-172-30-4-8 kernel: [ 6797.826561] JBD2: Detected IO errors while flushing file data on nvme0n1-8
Mar 20 22:51:26 ip-172-30-4-8 kernel: [ 6820.697487] JBD2: Detected IO errors while flushing file data on nvme0n1-8
Mar 20 22:51:36 ip-172-30-4-8 kernel: [ 6830.697208] JBD2: Detected IO errors while flushing file data on nvme0n1-8

Am I missing something obvious?


** Attachment added: "kern.log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+attachment/4841381/+files/kern.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1668129

Title:
  Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
  Triaged
Status in linux-aws package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Triaged
Status in linux-aws source package in Xenial:
  Fix Committed

Bug description:
  On the AWS i3 instance class, putting the new NVMe storage disks under
  high I/O load produces data corruption and errors in dmesg:

  [  662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
  [  662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
  [  662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
  [  662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
  [  662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
  [  662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
  [  662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
  [  662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
  [  662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
  [  662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
  [  662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
  [  663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
  <snip>
  [ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
  [ 1012.755396] buffer_io_error: 194552 callbacks suppressed
  [ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
  [ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
  [ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
  [ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
  [ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

  Able to replicate this with a bonnie++ stress test.

  bonnie++ -d /mnt/test/ -r 1000

  Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Feb 27 02:12 seq
   crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  DistroRelease: Ubuntu 16.04
  Ec2AMI: ami-bc62b2aa
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: i3.2xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  JournalErrors:
   Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
         Users in the 'systemd-journal' group can see all messages. Pass -q to
         turn off this notice.
   No journal files were opened due to insufficient permissions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial ec2-images
  Uname: Linux 4.4.0-64-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  WifiSyslog:
   
  _MarkForUpload: True
  dmi.bios.date: 12/12/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+subscriptions
