I've applied the udev rules change and it doesn't seem to make a difference on the instances I'm testing with:
(after applying the change and reloading udev, rebooting, etc)

ubuntu@hot-i3-muguasak:~$ cat /proc/zoneinfo
...
Node 0, zone   Normal
  pages free     14714755
        min      7663
        low      22874
        high     38085
        node_scanned  0
        spanned  15499264
        present  15499264
        managed  15212046

ubuntu@hot-i3-muguasak:~$ cat /lib/udev/rules.d/40-vm-hotadd.rules
# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"

LABEL="vm_hotadd_apply"

# Memory hotadd request
#SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"

# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"

LABEL="vm_hotadd_end"

Errors are the same:

Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.319668] EXT4-fs warning (device nvme0n1): ext4_end_bio:314: I/O error -5 writing to inode 108921589 (offset 4185915392 size 8388608 starting block 95900672)
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.319670] buffer_io_error: 246 callbacks suppressed
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.319671] Buffer I/O error on device nvme0n1, logical block 95900416
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.322738] Buffer I/O error on device nvme0n1, logical block 95900417
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.325826] Buffer I/O error on device nvme0n1, logical block 95900418
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.329083] Buffer I/O error on device nvme0n1, logical block 95900419
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.332017] Buffer I/O error on device nvme0n1, logical block 95900420
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.334949] Buffer I/O error on device nvme0n1, logical block 95900421
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.337913] Buffer I/O error on device nvme0n1, logical block 95900422
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.340918] Buffer I/O error on device nvme0n1, logical block 95900423
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.343829] Buffer I/O error on device nvme0n1, logical block 95900424
Mar 20 22:51:03 ip-172-30-4-8 kernel: [ 6797.346815] Buffer I/O error on device nvme0n1, logical block 95900425
Mar 20 22:51:04 ip-172-30-4-8 kernel: [ 6797.826561] JBD2: Detected IO errors while flushing file data on nvme0n1-8
Mar 20 22:51:26 ip-172-30-4-8 kernel: [ 6820.697487] JBD2: Detected IO errors while flushing file data on nvme0n1-8
Mar 20 22:51:36 ip-172-30-4-8 kernel: [ 6830.697208] JBD2: Detected IO errors while flushing file data on nvme0n1-8

Am I missing something obvious?

** Attachment added: "kern.log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+attachment/4841381/+files/kern.log

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1668129

Title:
  Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
  Triaged
Status in linux-aws package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Triaged
Status in linux-aws source package in Xenial:
  Fix Committed

Bug description:
  On the AWS i3 instance class - when putting the new NVME storage disks
  under high IO load - seeing data corruption and errors in dmesg

  [ 662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
  [ 662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
  [ 662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
  [ 662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
  [ 662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
  [ 662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
  [ 662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
  [ 662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
  [ 662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
  [ 662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
  [ 662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
  [ 663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
  <snip>
  [ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
  [ 1012.755396] buffer_io_error: 194552 callbacks suppressed
  [ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
  [ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
  [ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
  [ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
  [ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

  Able to replicate this with a bonnie++ stress test.

  bonnie++ -d /mnt/test/ -r 1000

  Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Feb 27 02:12 seq
   crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  DistroRelease: Ubuntu 16.04
  Ec2AMI: ami-bc62b2aa
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: i3.2xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  JournalErrors:
   Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
         Users in the 'systemd-journal' group can see all messages. Pass -q to
         turn off this notice.
   No journal files were opened due to insufficient permissions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial ec2-images
  Uname: Linux 4.4.0-64-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  WifiSyslog:

  _MarkForUpload: True
  dmi.bios.date: 12/12/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp