Hi,
I am trying to migrate my domU instances from kernel v4.1.44 to v4.4.88,
and it seems that setting e820_host = 1 in the domU configuration is what
triggers the following stack trace. Please note that I have #define
MC_DEBUG 1 in arch/x86/xen/multicall.c so that the failed hypercall is
logged. I'm unsure which side of the kernel/Xen boundary this really
falls on.
Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 4.4.88 #157
Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
Sep 25 22:02:50 [kernel] 0000000000000000 ffff88001e31fa78 ffffffff812f9a28 ffff88001f80a220
Sep 25 22:02:50 [kernel] ffff88001f80a238 ffff88001e31fab0 ffffffff81004d79 0000000000115bb7
Sep 25 22:02:50 [kernel] ffff88001f80a270 ffff88001f80b330 ffff880195bb7000 0000000000000000
Sep 25 22:02:50 [kernel] Call Trace:
Sep 25 22:02:50 [kernel] [<ffffffff812f9a28>] dump_stack+0x61/0x7e
Sep 25 22:02:50 [kernel] [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
Sep 25 22:02:50 [kernel] [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
Sep 25 22:02:50 [kernel] [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
Sep 25 22:02:50 [kernel] [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
Sep 25 22:02:50 [kernel] [<ffffffff81546022>] kernel_physical_mapping_init+0x15e/0x233
Sep 25 22:02:50 [kernel] [<ffffffff81542694>] init_memory_mapping+0x1c7/0x264
Sep 25 22:02:50 [kernel] [<ffffffff810411be>] arch_add_memory+0x50/0xda
Sep 25 22:02:50 [kernel] [<ffffffff81543191>] add_memory_resource+0x9c/0x12d
Sep 25 22:02:50 [kernel] [<ffffffff8137462f>] reserve_additional_memory+0x125/0x16b
Sep 25 22:02:50 [kernel] [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
Sep 25 22:02:50 [kernel] [<ffffffff8107df27>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Sep 25 22:02:50 [kernel] [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
Sep 25 22:02:50 [kernel] [<ffffffff8106162a>] worker_thread+0x27d/0x36e
Sep 25 22:02:50 [kernel] [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
Sep 25 22:02:50 [kernel] [<ffffffff8106575b>] kthread+0xda/0xe2
Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel] [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0_xen_alloc_pte+0x81/0x18e
Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1_xen_alloc_pte+0xd7/0x18e
Sep 25 22:02:50 [kernel] ------------[ cut here ]------------
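For anyone reading the two "call N/M" lines above, here is a minimal
sketch of how they can be decoded. The op numbers are mapped to names
using the hypercall numbering in Xen's public xen.h (14 is
update_va_mapping, 26 is mmuext_op); only a handful of entries are
listed. The helper name decode_multicalls and the regular expression are
my own illustration, not part of any kernel or Xen tooling, and a
negative result is treated as a failure, which is how xen_mc_flush
counts failed calls.

```python
import re

# A few hypercall numbers from Xen's public xen.h (not the full table).
HYPERCALL_NAMES = {
    1: "mmu_update",
    12: "memory_op",
    13: "multicall",
    14: "update_va_mapping",
    26: "mmuext_op",
}

# Matches the MC_DEBUG lines; the text after the result (caller symbol)
# is ignored.
CALL_RE = re.compile(
    r"call\s+(\d+)/(\d+): op=(\d+) arg=\[([0-9a-f]+)\]\s+result=(-?\d+)"
)


def decode_multicalls(log_text):
    """Return one dict per 'call N/M' entry found in the log text."""
    # Mail wrapping may split an entry over two lines, so collapse
    # all whitespace before matching.
    flat = " ".join(log_text.split())
    calls = []
    for m in CALL_RE.finditer(flat):
        op = int(m.group(3))
        result = int(m.group(5))
        calls.append({
            "index": int(m.group(1)),
            "total": int(m.group(2)),
            "op": op,
            "name": HYPERCALL_NAMES.get(op, "unknown"),
            "arg": int(m.group(4), 16),
            "result": result,
            "failed": result < 0,
        })
    return calls


LOG = """\
call 1/2: op=14 arg=[ffff880115bb7000]
result=0_xen_alloc_pte+0x81/0x18e
call 2/2: op=26 arg=[ffff88001f80b330]
result=-1_xen_alloc_pte+0xd7/0x18e
"""

for call in decode_multicalls(LOG):
    status = "FAILED" if call["failed"] else "ok"
    print(f'{call["index"]}/{call["total"]} {call["name"]} '
          f'arg={call["arg"]:#x} result={call["result"]} {status}')
```

Read this way, the update_va_mapping call succeeded and the mmuext_op
call issued from xen_alloc_pte is the one that returned -1.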
The Xen version is 4.8.1-r3 from Gentoo; the dom0 kernel is 4.1.44. I
have seen the same trace logged in an Ubuntu 16.04 guest with a 4.4
kernel. I don't have a specific test case which triggers this; it
usually appears within 24 hours, depending on how much work the domU has
been performing (so probably on how much ballooning it has been doing).
Setting e820_host = 0 in the config seems to prevent this from happening.
In the kernel tree, git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c
shows some commits which seem to relate to the failed hypervisor
operation and to working around the e820 map. I have not done a bisect
to isolate this more definitively. I suspect this could be a more
general balloon issue that is revealed more easily with tmem, as the
rate of ballooning up/down is higher than with occasional manual changes.
This is the guest /proc/iomem with e820_host = 0:
KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
01000000-015509ad : Kernel code
015509ae-01807ebf : Kernel data
01914000-019c1fff : Kernel bss
fee00000-fee00fff : Local APIC
And with e820_host = 1:
KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-1fffffff : System RAM
01000000-015509ad : Kernel code
015509ae-01807ebf : Kernel data
01914000-019c1fff : Kernel bss
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM
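To compare the two layouts, here is a rough sketch that sums the "System
RAM" ranges in each /proc/iomem listing and checks where the address
from the update_va_mapping call (ffff880115bb7000) lands. It assumes the
usual x86-64 direct-map base of 0xffff880000000000 for a 4.4 kernel, so
that subtracting it from the virtual address gives the physical address.
The function names are illustrative only, and this makes no claim about
the root cause.

```python
PAGE_SHIFT = 12
# Assumption: default x86-64 direct-map base (PAGE_OFFSET) on 4.4 kernels.
PAGE_OFFSET = 0xffff880000000000

IOMEM_HOST0 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
01000000-015509ad : Kernel code
015509ae-01807ebf : Kernel data
01914000-019c1fff : Kernel bss
fee00000-fee00fff : Local APIC
"""

IOMEM_HOST1 = """\
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-1fffffff : System RAM
01000000-015509ad : Kernel code
015509ae-01807ebf : Kernel data
01914000-019c1fff : Kernel bss
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM
"""


def parse_iomem(text):
    """Parse 'start-end : name' lines into (start, end, name) tuples."""
    ranges = []
    for line in text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep:
            continue  # not an iomem range line
        start, _, end = span.strip().partition("-")
        try:
            ranges.append((int(start, 16), int(end, 16), name.strip()))
        except ValueError:
            continue  # skip anything that is not a hex range
    return ranges


def system_ram_bytes(ranges):
    """Total size of all ranges labelled 'System RAM' (ends are inclusive)."""
    return sum(end - start + 1
               for start, end, name in ranges if name == "System RAM")


def containing_ranges(ranges, phys):
    """Names of every listed range that contains the physical address."""
    return [name for start, end, name in ranges if start <= phys <= end]


FAILING_VA = 0xffff880115bb7000  # arg of call 1/2 in the trace
phys = FAILING_VA - PAGE_OFFSET

for label, text in (("e820_host=0", IOMEM_HOST0),
                    ("e820_host=1", IOMEM_HOST1)):
    ranges = parse_iomem(text)
    print(label, "System RAM bytes:", system_ram_bytes(ranges),
          "pfn", hex(phys >> PAGE_SHIFT), "in:",
          containing_ranges(ranges, phys))
```

Under that assumption the page frame number is 0x115bb7, which is also
one of the values in the stack dump. Both listings report the same total
amount of System RAM; with e820_host = 1 the address falls inside the
100000000-11fffffff range, while with e820_host = 0 it lies outside
every range listed at boot.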
If other information about the environment is useful please let me know.
Thanks,
James