> We see an address of 0xfc7ffb000
Hi Matt,
I don't think you're accounting for the additional pages added by the Xen
balloon, are you? Those increase the guest's physical memory after boot. If
you check /proc/zoneinfo, look at the Normal zone's spanned pages and
start_pfn, e.g.:
Node 0, zone   Normal
  pages free     15116671
        min      7661
        low      22873
        high     38085
        node_scanned  0
        spanned  15499264
        present  15499264
        managed  15212161
  ...
  start_pfn:     1048576
and so,
$ printf "%x\n" $(( 1048576 + 15499264 ))
fc8000
meaning the address you see (pfn 0xfc7ffb) falls just below the end of the
zone, among the pages in the balloon memory region...
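For anyone who wants to repeat the arithmetic, here is a rough sketch in POSIX
shell. The function names (addr_to_pfn, zone_end_pfn, pfn_in_range) are made up
for illustration, and it assumes 4 KiB pages, the usual x86_64 base page size.
The 0xfc0000 lower bound is the boot-time max pfn mentioned elsewhere in this
message.

```shell
# Convert a physical address to its page frame number (assumes 4 KiB pages).
addr_to_pfn() {
    printf '%x\n' $(( $1 >> 12 ))
}

# End pfn of a zone = start_pfn + spanned pages (both decimal, as in zoneinfo).
zone_end_pfn() {
    printf '%x\n' $(( $1 + $2 ))
}

# Succeed if the pfn of address $1 satisfies $2 <= pfn < $3.
pfn_in_range() {
    [ "$(( ($1 >> 12) >= $2 && ($1 >> 12) < $3 ))" -eq 1 ]
}

addr_to_pfn 0xfc7ffb000          # prints fc7ffb
zone_end_pfn 1048576 15499264    # prints fc8000
if pfn_in_range 0xfc7ffb000 0xfc0000 0xfc8000; then
    echo "address is above the boot-time max pfn, inside the zone"
fi
```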
I disabled Ubuntu's memory hotadd (commented it out in
/lib/udev/rules.d/40-vm-hotadd.rules) and rebooted. The Normal zone's
present pages were reduced so that the zone now ends at pfn fc0000, matching
the boot-time max pfn. I then tried to reproduce the problem and it seems
gone!
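To re-check the zone end before and after such a change without doing the sum
by hand, something like the following should work. It is only a sketch: the
function name normal_zone_end_pfn is hypothetical, and it assumes the zoneinfo
layout shown earlier, where "spanned" appears before "start_pfn:" within each
zone.

```shell
# Print the end pfn (hex) of each node's Normal zone.
# Reads /proc/zoneinfo unless another file is given as $1.
normal_zone_end_pfn() {
    awk '
        # A "Node N, zone NAME" header starts a new zone section.
        /^Node/                         { in_normal = ($4 == "Normal") }
        in_normal && $1 == "spanned"    { spanned = $2 }
        # start_pfn follows spanned, so the sum can be printed here.
        in_normal && $1 == "start_pfn:" { printf "%x\n", $2 + spanned }
    ' "${1:-/proc/zoneinfo}"
}
```

Running normal_zone_end_pfn with no argument reads the live /proc/zoneinfo; on
a multi-node machine it prints one line per node.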
So I think that must be the issue; the hypervisor's NVMe driver isn't
expecting any pages from the Xen ballooned region. I checked on Amazon
Linux, and saw why it isn't affected:
$ grep XEN_BALLOON /boot/config-4.4.41-36.55.amzn1.x86_64
# CONFIG_XEN_BALLOON is not set
I suspect that skips quite a lot of problems for Amazon Linux, as the
Xen ballooning is quite annoying (see bug 1518457 comment 126, for
example).
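The same check can be run against whatever kernel is currently booted. This is
a sketch under a couple of assumptions: the helper name xen_balloon_state is
made up, and it expects the distribution to install the kernel config as
/boot/config-<release>, as in the grep above.

```shell
# Report whether CONFIG_XEN_BALLOON is enabled in a kernel config file.
# Defaults to the config of the running kernel; pass a path to override.
xen_balloon_state() {
    cfg="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$cfg" ]; then
        echo unknown
        return 1
    fi
    # CONFIG_XEN_BALLOON is a bool option, so it is either "=y" or not set.
    # Anchoring on "=y" avoids matching CONFIG_XEN_BALLOON_MEMORY_HOTPLUG.
    if grep -q '^CONFIG_XEN_BALLOON=y' "$cfg"; then
        echo enabled
    else
        echo disabled
    fi
}
```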
Maybe Ubuntu should disable Xen ballooning for AWS as well? If not, then
this seems to be a hypervisor bug; it needs to allow pages from the
ballooned region too.
--
https://bugs.launchpad.net/bugs/1668129
Title:
Amazon I3 Instance Buffer I/O error on dev nvme0n1
Status in linux package in Ubuntu:
Triaged
Status in linux source package in Xenial:
Triaged
Bug description:
On the AWS i3 instance class, putting the new NVMe storage disks under
high I/O load causes data corruption and errors in dmesg:
[ 662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
[ 662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
[ 662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
[ 662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
[ 662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
[ 662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
[ 662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
[ 662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
[ 662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
[ 662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
[ 662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
[ 663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
<snip>
[ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
[ 1012.755396] buffer_io_error: 194552 callbacks suppressed
[ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
[ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
[ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
[ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
[ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write
Able to replicate this with a bonnie++ stress test.
bonnie++ -d /mnt/test/ -r 1000
Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Feb 27 02:12 seq
crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
DistroRelease: Ubuntu 16.04
Ec2AMI: ami-bc62b2aa
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: i3.2xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory
JournalErrors:
Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000']
failed with exit code 1: Hint: You are currently not seeing messages from other
users and the system.
Users in the 'systemd-journal' group can see all messages. Pass -q to
turn off this notice.
No journal files were opened due to insufficient permissions.
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic
root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
RelatedPackageVersions:
linux-restricted-modules-4.4.0-64-generic N/A
linux-backports-modules-4.4.0-64-generic N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial ec2-images
Uname: Linux 4.4.0-64-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
WifiSyslog:
_MarkForUpload: True
dmi.bios.date: 12/12/2016
dmi.bios.vendor: Xen
dmi.bios.version: 4.2.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias:
dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.2.amazon
dmi.sys.vendor: Xen