Hello Igor,

the current DIMM hotplug code effectively prevents a successful migration of a VM if memory was added after startup:
- start a VM with a number of empty memory slots,
- add some DIMMs and online them in the guest (I am going from 2G to 16G with 512M DIMMs),
- migrate the VM and observe a NULL pointer dereference in the guest (or a BSOD with reboot, for Windows).

The issue currently affects all stable versions and presumably master, as there have been no related fixes or RFCs since 2.3, which I'm currently using for testing. It is related to incorrect population of the regions during runtime hotplug; hopefully 2.4 will get the fix. Running some workload in the guest makes hitting the issue a one hundred percent certainty, for example fio with http://xdel.ru/downloads/fio.txt .

The QEMU arguments are similar to '... -m 512,slots=31,maxmem=16384M -object memory-backend-ram,id=mem0,size=512M -device pc-dimm,id=dimm0,node=0,memdev=mem0 -object memory-backend-ram,id=mem1,size=512M -device pc-dimm,id=dimm1,node=0,memdev=mem1 -object memory-backend-ram,id=mem2,size=512M -device pc-dimm,id=dimm2,node=0,memdev=mem2...'

Thanks for looking into this!
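For anyone reproducing this, the DIMM layout above follows a regular pattern and can be generated instead of typed by hand. The sketch below is a hypothetical Python helper: `qemu_memory_args` and `hotplug_commands` are illustrative names, not part of QEMU. It assumes the runtime DIMMs are added through the HMP monitor with `object_add` and `device_add`, and that the 2G starting point means a 512M base plus three DIMMs. It only builds strings and does not launch QEMU. With 512M DIMMs, a 512M base plus 31 slots gives exactly the 16384M `maxmem` from the report.

```python
from __future__ import annotations

# Sizes taken from the report's command line.
DIMM_MB = 512                           # size of each pc-dimm
BASE_MB = 512                           # initial -m value
SLOTS = 31                              # hotplug slots
MAXMEM_MB = BASE_MB + SLOTS * DIMM_MB   # 16384, matching maxmem=16384M


def dimm_args(index: int) -> list[str]:
    """Command-line arguments for one DIMM present at startup (hypothetical helper)."""
    return [
        "-object", f"memory-backend-ram,id=mem{index},size={DIMM_MB}M",
        "-device", f"pc-dimm,id=dimm{index},node=0,memdev=mem{index}",
    ]


def qemu_memory_args(initial_dimms: int) -> list[str]:
    """Memory-related QEMU arguments with `initial_dimms` DIMMs present at startup."""
    if not 0 <= initial_dimms <= SLOTS:
        raise ValueError("initial_dimms must fit in the available slots")
    args = ["-m", f"{BASE_MB},slots={SLOTS},maxmem={MAXMEM_MB}M"]
    for i in range(initial_dimms):
        args += dimm_args(i)
    return args


def hotplug_commands(first: int, count: int) -> list[str]:
    """HMP monitor commands adding `count` DIMMs at runtime, starting at index `first`.

    Assumes hotplug is driven via the human monitor (object_add / device_add).
    """
    if first < 0 or count < 0 or first + count > SLOTS:
        raise ValueError("requested DIMMs do not fit in the available slots")
    cmds = []
    for i in range(first, first + count):
        cmds.append(f"object_add memory-backend-ram,id=mem{i},size={DIMM_MB}M")
        cmds.append(f"device_add pc-dimm,id=dimm{i},node=0,memdev=mem{i}")
    return cmds
```

For example, `qemu_memory_args(3)` describes the 2G starting point, and `hotplug_commands(3, 28)` lists the monitor commands that fill the remaining 28 slots to reach 16G; the new memory still has to be onlined in the guest before migrating.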
Guest kernel log, 11 June 2015, 19:50:14 to 19:50:15:

[ 141.005630] fio[2742]: segfault at 0 ip (null) sp 00007f841ab5aeb8 error 14 in fio[400000+58000]
NULL pointer dereference at 0000000000000028
[ 141.006282] IP:
[ 141.006316] PGD 107ccc067 PUD 106056067
[ 141.006319] Oops: 0000 [#1] SMP
nfsd auth_rpcgss oid_registry nfs lockd fscache netconsole configfs crct10dif_pclmul crct10dif_common ghash_clmulni_intel aesni_intel lrw gf128mul ablk_helper psmouse parport_pc virtio_console serio_raw evdev pcspkr processor thermal_sys button ext4 mbcache ata_generic virtio_blk crc32c_intel floppy xhci_hcd libata virtio_ring usbcore usb_common
[ 141.006396] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt7-1~bpo70+1
[ 141.006397] Hardware name: SuperMicro Virtual Appliance, BIOS 1.1
[ 141.006403] RIP: 0010:[<ffffffffa015ba38>] [<ffffffffa015ba38>] ext4_finish_bio+0xd8/0x220 [ext4]
[ 141.006415] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001000
[ 141.006415] RDX: 000000000000000d RSI: ffffea0010ea6818 RDI: ffff88001c291300
[ 141.006417] R10: 0000000000000002 R11: 0000000000000040 R12: ffff8804a41aaf98
[ 141.006419] FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[ 141.006420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 141.006434] Stack:
[ 141.006435] ffffffff0000005e 0000000000000007 ffff88001fdd2ec0 ffff88001c291300 ffff88001d84c240
[ 141.006439] 0000000000000093 ffff88001d84c940 d794c666350eb7d9
[ 141.006442] <IRQ>
 [<ffffffffa015c036>] ? ext4_end_bio+0xc6/0x130 [ext4]
 [<ffffffff8129dcfb>] ? blk_update_request+0x9b/0x310
[ 141.006488]
[ 141.006494]
 [<ffffffff812a76c9>] ? __blk_mq_complete_request+0x79/0x110
 [<ffffffffa01461ed>] ? virtblk_done+0x4d/0xb0 [virtio_blk]
[ 141.006506]
[ 141.006512]
 [<ffffffff810c4f54>] ? handle_irq_event_percpu+0x54/0x1e0
 [<ffffffff810a73aa>] ? update_blocked_averages+0x24a/0x5f0
[ 141.006540]
[ 141.006542]
 [<ffffffff810c7f7d>] ? handle_edge_irq+0x7d/0x120
 [<ffffffff810175ed>] ? handle_irq+0x1d/0x30
[ 141.006559]
[ 141.006579]
[ 141.006582]
 [<ffffffff81072bf8>] ? __do_softirq+0x88/0x2e0
 [<ffffffff81072b8d>] ? __do_softirq+0x1d/0x2e0
[ 141.006592]
[ 141.006600]
 [<ffffffff810730a6>] ? irq_exit+0x86/0xb0
 [<ffffffff8154d1d6>] ? do_IRQ+0x66/0x110
 [<ffffffff8154b06d>] ? common_interrupt+0x6d/0x6d
[ 141.006609] <EOI>
[ 141.006616]
[ 141.006619]
 [<ffffffff8101f6b2>] ? default_idle+0x22/0xf0
 [<ffffffff810b1818>] ? cpu_startup_entry+0x2e8/0x4b0
[ 141.006624]
[ 141.006629]
 [<ffffffff81900a1a>] ? set_init_arg+0x4d/0x4d
 [<ffffffff81900120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff8190072b>] ? x86_64_start_kernel+0x150/0x15f
[ 141.006635] Code: 4c 89 e3 eb 26 66 0f 1f 44 00 00 48 03 43 20 48 39 c8 77 25 f0 80 63 01 fe 45 85 c0 0f 85 a1 00 00 00 48 8b 5b 08 49 39 dc 74 26 <48> 8b 43 28 25 ff 0f 00 00 4c 39 e8 73 d2 48 8b 03 48 8b 5b 08
 [<ffffffffa015ba38>] ext4_finish_bio+0xd8/0x220 [ext4]
[ 141.006682] CR2: 0000000000000028
[ 141.006705] Kernel panic - not syncing: Fatal exception in interrupt
[ 141.009665] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 141.009665] ---[ end Kernel panic - not syncing: Fatal exception in interrupt