> I've checked the logs; so far I don't see anything suspicious there
> except for the "acpi PNP0C80:00: Already enumerated" lines.
> Raising the log level might show more info:
> + upload full logs
> + enable ACPI debug info so that the dimm device's _CRS would show up
> + QEMU's CLI that was used to produce such a log
>
> wrt migration:
> could you provide exact CLI args on source and destination along with
> used intermediate mem hotplug commands or even better if it's just
> reproduced with migration of cold-plugged dimm-s for simplification
> + steps to reproduce (and guest kernel versions).
Thanks Igor,

I have been using 3.10 and 3.16 guest kernels lately, but the issue
seems to hit every guest OS. It is not reproducible with cold-plugged
DIMMs at all, which is rather confusing: bearing in mind the race-like
behavior described previously, either the guest kernel is partially
responsible for the issue or its nature is going to be really weird.

You can take the full CLI arg set from the message containing 'Please
find the full cli args and two guest logs for DIMM', sent three days
ago in this thread. The destination emulator launch string is identical
to the source one, plus the device/object pairs in the args for the
hotplugged memory; memory devices are onlined automatically by a udev
script.

A colleague suggested that I disable CONFIG_SPARSEMEM_VMEMMAP to get
rid of the printk noise from the sparse hotplug mapping. With that
option disabled, nothing looks wrong with the per-DIMM memory
population map: the runtime and cold-plugged maps are identical in
this case.
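For reference, the automatic onlining mentioned above boils down to
writing "online" to the sysfs state file of every offline memory block.
Below is a minimal Python sketch of that idea, assuming the usual
/sys/devices/system/memory/memoryN/state layout. The function name and
the sysfs_root parameter are made up for illustration, and this is not
the udev script actually used in the guests.

```python
import glob
import os


def online_memory_blocks(sysfs_root="/sys/devices/system/memory"):
    """Write 'online' to every memory block whose state is 'offline'.

    sysfs_root is a hypothetical parameter so the function can be
    pointed at a fake directory tree for testing.
    Returns the names of the block directories that were switched.
    """
    switched = []
    pattern = os.path.join(sysfs_root, "memory[0-9]*", "state")
    for state_path in sorted(glob.glob(pattern)):
        with open(state_path) as f:
            state = f.read().strip()
        if state != "offline":
            # Already online (or going offline); leave it alone.
            continue
        with open(state_path, "w") as f:
            f.write("online")
        switched.append(os.path.basename(os.path.dirname(state_path)))
    return switched
```

Inside the guest this would be called as root with the default
argument, e.g. online_memory_blocks(), right after the DIMMs show up.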

Another trace with null IP is attached; it was produced by running fio.

The easiest way to set up the test bed and reproduce the issue is:
 1. launch a VM with the attached XML (add a disk and, optionally, a
    framebuffer for convenience), ripping out two or three DIMMs;
 2. stop libvirt and add those DIMMs back in the runtime config;
 3. start libvirt again and add those DIMMs;
 4. put a workload on the VM and migrate it with the live flag.

Or, if that is more acceptable for you, launch bare qemu with some
empty slots, plug in the appropriate objects and devices
(object_add memory-backend-ram,id=memX,size=512M, then
device_add pc-dimm,id=dimmX,node=0,memdev=memX) and migrate to a
receiver with the same DIMMs added to its args. Please do not forget
to online the DIMMs in the guest as well.
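To keep the source and destination in sync for the bare-qemu variant,
the monitor commands and the matching receiver arguments can be
generated from the same list of DIMM indices. This is only a sketch:
the helper names are hypothetical, and the size/node defaults are
simply the values used in the attached XML.

```python
def hmp_hotplug_commands(indices, size="512M", node=0):
    """HMP commands that hot-plug one DIMM per index on the source."""
    cmds = []
    for i in indices:
        cmds.append(f"object_add memory-backend-ram,id=mem{i},size={size}")
        cmds.append(f"device_add pc-dimm,id=dimm{i},node={node},memdev=mem{i}")
    return cmds


def destination_args(indices, size="512M", node=0):
    """Extra command-line args for the receiver, describing the same DIMMs."""
    args = []
    for i in indices:
        args += ["-object", f"memory-backend-ram,id=mem{i},size={size}"]
        args += ["-device", f"pc-dimm,id=dimm{i},node={node},memdev=mem{i}"]
    return args


if __name__ == "__main__":
    hotplugged = [0, 1, 2]  # example indices, pick whatever slots are free
    print("\n".join(hmp_hotplug_commands(hotplugged)))
    print(" ".join(destination_args(hotplugged)))
```

The first list is pasted into the source monitor after boot; the
second is appended to the destination launch string before starting it
with -incoming.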

I don't think it could be ACPI-related in any way; instead, it looks
like a race in vhost or a similar mm-touching mechanism. The repeated
hits you mentioned should indeed be fixed as well, but they can hardly
be the reason for this problem.
[ 76.906896] random: nonblocking pool is initialized
[ 89.508346] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 89.511075] IP: [<ffffffffa0150f6b>] mpage_process_page_bufs+0x2b/0x120
[ext4]
[ 89.512089] PGD 1b755f067 PUD 167e1e067 PMD 0
[ 89.512089] Oops: 0000 [#1] SMP
[ 89.512089] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd fscache sunrpc netconsole configfs loop crct10dif_pclmul crct10dif_common
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw psmouse gf128mul
glue_helper i2c_piix4 virtio_console ablk_helper serio_raw pcspkr parport_pc
pvpanic cryptd i2c_core evdev parport processor thermal_sys button ext4 crc16
[ 89.530893] mon-agent[2185]: segfault at 10 ip 00007fa05ae592a8 sp
00007fff1f5ed2c0 error 4
[ 89.530985] in libc-2.13.so[7fa05ade1000+181000]
[ 89.532418] fio[2710]: segfault at 0 ip 0000000000439970 sp 00007fffa3f4e808
error 6
[ 89.532452] in fio[400000+58000]
[ 89.512089] mbcache
[ 89.512089] jbd2 ata_generic virtio_blk virtio_net floppy crc32c_intel
xhci_hcd ata_piix usbcore libata usb_common virtio_pci virtio_ring virtio
scsi_mod
[ 89.512089] CPU: 4 PID: 2715 Comm: fio Not tainted 3.16.7-ckt9 #1
[ 89.512089] Hardware name: SuperMicro Virtual Appliance, BIOS 1.1
[ 89.512089] task: ffff8801b0d00210 ti: ffff8801b3f70000 task.ti:
ffff8801b3f70000
[ 89.512089] RIP: 0010:[<ffffffffa0150f6b>] [<ffffffffa0150f6b>]
mpage_process_page_bufs+0x2b/0x120 [ext4]
[ 89.512089] RSP: 0018:ffff8801b3f73cb8 EFLAGS: 00010213
[ 89.512089] RAX: 0000000000000000 RBX: 0000000000007824 RCX: 000000000000000c
[ 89.512089] RDX: 0000000000000000 RSI: ffff8804a7c01f98 RDI: ffff8801b3f73e58
[ 89.512089] RBP: 000000000000c000 R08: 0000000000000000 R09: 20004c74b3000000
[ 89.512089] R10: dfff3b8b601d2cc0 R11: ffff88001ffdbe00 R12: 000000000000bfff
[ 89.512089] R13: ffff8801b3f73d88 R14: ffff8801b3f73e58 R15: ffff8800131d2d00
[ 89.512089] FS: 00007fba4235e700(0000) GS:ffff88001fd00000(0000)
knlGS:0000000000000000
[ 89.512089] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 89.512089] CR2: 0000000000000000 CR3: 000000014e9a5000 CR4: 00000000000406e0
[ 89.512089] Stack:
[ 89.512089] ffff8800131d2b80 ffff8801b3f73d48 7fffffffffffcbdc
ffffffffa0151506
[ 89.512089] 0000000000000000 ffff8801b784f600 ffff8801b3f73d20
0000000000000000
[ 89.512089] 0000000000000004 000000000000782c 000000000000000e
0000000000000000
[ 89.512089] Call Trace:
[ 89.512089] [<ffffffffa0151506>] ? mpage_prepare_extent_to_map+0x1d6/0x280
[ext4]
[ 89.512089] [<ffffffffa0157ebf>] ? ext4_writepages+0x3ef/0xd00 [ext4]
[ 89.512089] [<ffffffff812508ef>] ? security_file_permission+0x2f/0xd0
[ 89.512089] [<ffffffff8114d601>] ? __filemap_fdatawrite_range+0x51/0x60
[ 89.512089] [<ffffffff811507fd>] ? SyS_fadvise64+0x24d/0x260
[ 89.512089] [<ffffffff8154d5cd>] ? system_call_fast_compare_end+0x10/0x15
[ 89.512089] Code: 66 66 66 66 90 55 bd 01 00 00 00 53 89 cb 48 83 ec 08 48
8b 07 8b 88 90 00 00 00 d3 e5 48 63 ed 48 03 68 50 48 83 ed 01 48 d3 fd <48> 8b
02 a8 04 0f 85 e7 00 00 00 39 eb 0f 83 a2 00 00 00 48 8b
[ 89.512089] RIP [<ffffffffa0150f6b>] mpage_process_page_bufs+0x2b/0x120
[ext4]
[ 89.512089] RSP <ffff8801b3f73cb8>
[ 89.512089] CR2: 0000000000000000
[ 89.609566] ---[ end trace 213ee878070f2ba5 ]---
[ 89.610884] ------------[ cut here ]------------
[ 89.612246] WARNING: CPU: 4 PID: 2715 at kernel/exit.c:669
do_exit+0x4d/0xa30()
[ 89.614252] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd fscache sunrpc netconsole configfs loop crct10dif_pclmul crct10dif_common
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw psmouse gf128mul
glue_helper i2c_piix4 virtio_console ablk_helper serio_raw pcspkr parport_pc
pvpanic cryptd i2c_core evdev parport processor thermal_sys button ext4 crc16
mbcache jbd2 ata_generic virtio_blk virtio_net floppy crc32c_intel xhci_hcd
ata_piix usbcore libata usb_common virtio_pci virtio_ring virtio scsi_mod
[ 89.635186] CPU: 4 PID: 2715 Comm: fio Tainted: G D 3.16.7-ckt9 #1
[ 89.637099] Hardware name: SuperMicro Virtual Appliance, BIOS 1.1
[ 89.638779] 0000000000000000 0000000000000009 ffffffff8154731e
0000000000000000
[ 89.641533] ffffffff8106cd0b 0000000000000009 ffff8801b3f73c08
0000000000000296
[ 89.644158] 0000000000000000 0000000000000296 ffffffff8106eecd
0000000000000000
[ 89.646974] Call Trace:
[ 89.647753] [<ffffffff8154731e>] ? dump_stack+0x41/0x51
[ 89.649242] [<ffffffff8106cd0b>] ? warn_slowpath_common+0x8b/0xc0
[ 89.650943] [<ffffffff8106eecd>] ? do_exit+0x4d/0xa30
[ 89.652391] [<ffffffff815449e6>] ? printk+0x54/0x59
[ 89.653824] [<ffffffff8101775b>] ? oops_end+0x9b/0xe0
[ 89.655234] [<ffffffff81543e2f>] ? no_context+0x2a4/0x2cf
[ 89.656773] [<ffffffff8105bcc3>] ? __do_page_fault+0x423/0x520
[ 89.658438] [<ffffffff812a1952>] ? blk_account_io_start+0x112/0x180
[ 89.660150] [<ffffffff8114f0f9>] ? mempool_alloc+0x69/0x190
[ 89.661753] [<ffffffff811b57bd>] ? mem_cgroup_update_page_stat+0x1d/0x60
[ 89.663595] [<ffffffff8154f618>] ? async_page_fault+0x28/0x30
[ 89.665214] [<ffffffffa0150f6b>] ? mpage_process_page_bufs+0x2b/0x120 [ext4]
[ 89.667172] [<ffffffffa0151030>] ? mpage_process_page_bufs+0xf0/0x120 [ext4]
[ 89.669104] [<ffffffffa0151506>] ? mpage_prepare_extent_to_map+0x1d6/0x280
[ext4]
[ 89.671132] [<ffffffffa0157ebf>] ? ext4_writepages+0x3ef/0xd00 [ext4]
[ 89.672909] [<ffffffff812508ef>] ? security_file_permission+0x2f/0xd0
[ 89.674683] [<ffffffff8114d601>] ? __filemap_fdatawrite_range+0x51/0x60
[ 89.676513] [<ffffffff811507fd>] ? SyS_fadvise64+0x24d/0x260
[ 89.678115] [<ffffffff8154d5cd>] ? system_call_fast_compare_end+0x10/0x15
[ 89.679936] ---[ end trace 213ee878070f2ba6 ]---
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<name>test</name>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
<os>
<type arch='x86_64' machine='pc'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic eoi='on'/>
<pae/>
</features>
<cpu mode='custom' match='exact'>
<model fallback='allow'>SandyBridge</model>
<vendor>Intel</vendor>
<topology sockets='1' cores='8' threads='8'/>
<numa>
<cell cpus='0-7' memory='524288'/>
</numa>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<controller type='usb' index='0' model='nec-xhci'>
</controller>
<controller type='pci' index='0' model='pci-root'/>
<controller type='virtio-serial' index='0'>
</controller>
<serial type='pty'>
<target type='isa-serial' port='0'/>
</serial>
<console type='pty'>
<target type='serial' port='0'/>
</console>
<memballoon model='none'/>
</devices>
<qemu:commandline>
<qemu:arg value='-object'/>
<qemu:arg value='iothread,id=vm33090blk0'/>
<qemu:arg value='-set'/>
<qemu:arg value='device.virtio-disk0.config-wce=off'/>
<qemu:arg value='-set'/>
<qemu:arg value='device.virtio-disk0.scsi=off'/>
<qemu:arg value='-set'/>
<qemu:arg value='device.virtio-disk0.iothread=vm33090blk0'/>
<qemu:arg value='-m'/>
<qemu:arg value='512,slots=31,maxmem=16384M'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem0,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm0,node=0,memdev=mem0'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem1,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm1,node=0,memdev=mem1'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem2,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm2,node=0,memdev=mem2'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem3,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm3,node=0,memdev=mem3'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem4,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm4,node=0,memdev=mem4'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem5,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm5,node=0,memdev=mem5'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem6,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm6,node=0,memdev=mem6'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem7,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm7,node=0,memdev=mem7'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem8,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm8,node=0,memdev=mem8'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem9,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm9,node=0,memdev=mem9'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem10,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm10,node=0,memdev=mem10'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem11,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm11,node=0,memdev=mem11'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem12,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm12,node=0,memdev=mem12'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem13,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm13,node=0,memdev=mem13'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem14,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm14,node=0,memdev=mem14'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem15,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm15,node=0,memdev=mem15'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem17,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm17,node=0,memdev=mem17'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem16,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm16,node=0,memdev=mem16'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem19,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm19,node=0,memdev=mem19'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem18,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm18,node=0,memdev=mem18'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem21,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm21,node=0,memdev=mem21'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem20,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm20,node=0,memdev=mem20'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem23,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm23,node=0,memdev=mem23'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem22,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm22,node=0,memdev=mem22'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem25,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm25,node=0,memdev=mem25'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem24,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm24,node=0,memdev=mem24'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem27,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm27,node=0,memdev=mem27'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem26,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm26,node=0,memdev=mem26'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem29,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm29,node=0,memdev=mem29'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem28,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm28,node=0,memdev=mem28'/>
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-ram,id=mem30,size=512M'/>
<qemu:arg value='-device'/>
<qemu:arg value='pc-dimm,id=dimm30,node=0,memdev=mem30'/>
</qemu:commandline>
</domain>