Hello Igor,

the current DIMM hotplug code effectively prevents a VM from migrating
successfully if memory was added after startup:

- start a VM with a certain number of empty memory slots,
- add some DIMMs and online them in the guest (I am going from 2G
to 16G with 512M DIMMs),
- migrate the VM and observe a NULL pointer dereference in the guest
(or a BSOD with reboot, for Windows).
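A minimal sketch of the steps above, not a tested recipe. It assumes the HMP monitor, that mem0..mem2/dimm0..dimm2 are already on the startup command line (512M base + 3 x 512M = 2G), and that hotplugged IDs continue the memN/dimmN pattern with node=0. The function names and the migration target are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print HMP monitor commands that hotplug DIMMs of a given size.
# Assumption (hypothetical): IDs 0..2 are used at startup, so with
# slots=31 the free IDs are 3..30.
hotplug_cmds() {
    first=$1 last=$2 size=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "object_add memory-backend-ram,id=mem$i,size=$size"
        echo "device_add pc-dimm,id=dimm$i,node=0,memdev=mem$i"
        i=$((i + 1))
    done
}

# Guest side (run as root inside a Linux VM): online every memory
# block that is still offline after the hotplug.
online_guest_memory() {
    for state in /sys/devices/system/memory/memory*/state; do
        [ -f "$state" ] || continue   # glob matched nothing
        if [ "$(cat "$state")" = offline ]; then
            echo online > "$state"
        fi
    done
}

# 2G -> 16G: 28 more 512M DIMMs (IDs 3..30). Paste the output into the
# monitor, online the memory in the guest, then migrate, e.g.
# "migrate -d tcp:DEST_HOST:4444" (DEST_HOST is a placeholder).
hotplug_cmds 3 30 512M
```

The destination QEMU would typically need the hotplugged DIMMs on its own command line, plus `-incoming tcp:0:4444`, so that both sides have the same device configuration.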

The issue currently affects all stable versions and presumably master,
as there have been no related fixes/RFCs since 2.3, which I am currently
using for testing. The issue is related to incorrect population of the
regions during runtime hotplug; hopefully 2.4 will get the fix.

You can run some workload in the guest to hit the issue with one hundred
percent certainty, for example fio with
http://xdel.ru/downloads/fio.txt . QEMU args are similar to:

'... -m 512,slots=31,maxmem=16384M
-object memory-backend-ram,id=mem0,size=512M -device pc-dimm,id=dimm0,node=0,memdev=mem0
-object memory-backend-ram,id=mem1,size=512M -device pc-dimm,id=dimm1,node=0,memdev=mem1
-object memory-backend-ram,id=mem2,size=512M -device pc-dimm,id=dimm2,node=0,memdev=mem2...'
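The repeated -object/-device pairs follow a fixed pattern, so they can be generated. This is a hypothetical helper, assuming the same memN/dimmN naming, node=0 and 512M per DIMM as above; the count of three startup DIMMs is an assumption inferred from the 2G starting size, since the real command line is elided.

```shell
#!/bin/sh
# Sketch (hypothetical helper): emit COUNT pairs of
# "-object memory-backend-ram ... -device pc-dimm ..." on one line.
dimm_args() {
    count=$1 size=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf ' -object memory-backend-ram,id=mem%d,size=%s' "$i" "$size"
        printf ' -device pc-dimm,id=dimm%d,node=0,memdev=mem%d' "$i" "$i"
        i=$((i + 1))
    done
    printf '\n'
}

# Assumed: 3 DIMMs at startup, so 512M + 3 x 512M = 2G and 28 of the
# 31 slots stay free (512M + 31 x 512M = 16384M = maxmem).
printf '%s\n' "-m 512,slots=31,maxmem=16384M$(dimm_args 3 512M)"
```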

Thanks for looking into this!

11 June 2015, 19:50:14  [ 141.005630] fio[2742]: segfault at 0 ip (null) sp 00007f841ab5aeb8 error 14 in fio[400000+58000]
11 June 2015, 19:50:14  NULL pointer dereference at 0000000000000028
11 June 2015, 19:50:14  [ 141.006282] IP:
11 June 2015, 19:50:14  [ 141.006316] PGD 107ccc067 PUD 106056067
11 June 2015, 19:50:14  [ 141.006319] Oops: 0000 [#1] SMP
11 June 2015, 19:50:14  nfsd auth_rpcgss oid_registry nfs lockd fscache netconsole configfs crct10dif_pclmul crct10dif_common ghash_clmulni_intel aesni_intel lrw gf128mul ablk_helper psmouse parport_pc virtio_console serio_raw evdev pcspkr processor thermal_sys button ext4 mbcache ata_generic virtio_blk crc32c_intel floppy xhci_hcd libata virtio_ring usbcore usb_common
11 June 2015, 19:50:14  [ 141.006396] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt7-1~bpo70+1
11 June 2015, 19:50:14  [ 141.006397] Hardware name: SuperMicro Virtual Appliance, BIOS 1.1
11 June 2015, 19:50:14  [ 141.006403] RIP: 0010:[<ffffffffa015ba38>]  [<ffffffffa015ba38>] ext4_finish_bio+0xd8/0x220 [ext4]
11 June 2015, 19:50:14  [ 141.006415] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001000
11 June 2015, 19:50:14  [ 141.006415] RDX: 000000000000000d RSI: ffffea0010ea6818 RDI: ffff88001c291300
11 June 2015, 19:50:14  [ 141.006417] R10: 0000000000000002 R11: 0000000000000040 R12: ffff8804a41aaf98
11 June 2015, 19:50:14  [ 141.006419] FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
11 June 2015, 19:50:14  [ 141.006420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
11 June 2015, 19:50:14  [ 141.006434] Stack:
11 June 2015, 19:50:14  [ 141.006435] ffffffff0000005e
11 June 2015, 19:50:14  0000000000000007
11 June 2015, 19:50:14  ffff88001fdd2ec0
11 June 2015, 19:50:14  ffff88001c291300
11 June 2015, 19:50:14  ffff88001d84c240
11 June 2015, 19:50:14  [ 141.006439] 0000000000000093
11 June 2015, 19:50:14  ffff88001d84c940
11 June 2015, 19:50:14  d794c666350eb7d9
11 June 2015, 19:50:14  [ 141.006442] <IRQ>
11 June 2015, 19:50:14  [<ffffffffa015c036>] ? ext4_end_bio+0xc6/0x130 [ext4]
11 June 2015, 19:50:14  [<ffffffff8129dcfb>] ? blk_update_request+0x9b/0x310
11 June 2015, 19:50:14  [ 141.006488]
11 June 2015, 19:50:14  [ 141.006494]
11 June 2015, 19:50:14  [<ffffffff812a76c9>] ? __blk_mq_complete_request+0x79/0x110
11 June 2015, 19:50:14  [<ffffffffa01461ed>] ? virtblk_done+0x4d/0xb0 [virtio_blk]
11 June 2015, 19:50:14  [ 141.006506]
11 June 2015, 19:50:14  [ 141.006512]
11 June 2015, 19:50:14  [<ffffffff810c4f54>] ? handle_irq_event_percpu+0x54/0x1e0
11 June 2015, 19:50:14  [<ffffffff810a73aa>] ? update_blocked_averages+0x24a/0x5f0
11 June 2015, 19:50:14  [ 141.006540]
11 June 2015, 19:50:14  [ 141.006542]
11 June 2015, 19:50:14  [<ffffffff810c7f7d>] ? handle_edge_irq+0x7d/0x120
11 June 2015, 19:50:14  [<ffffffff810175ed>] ? handle_irq+0x1d/0x30
11 June 2015, 19:50:14  [ 141.006559]
11 June 2015, 19:50:14  [ 141.006579]
11 June 2015, 19:50:15  [ 141.006582]
11 June 2015, 19:50:15  [<ffffffff81072bf8>] ? __do_softirq+0x88/0x2e0
11 June 2015, 19:50:15  [<ffffffff81072b8d>] ? __do_softirq+0x1d/0x2e0
11 June 2015, 19:50:15  [ 141.006592]
11 June 2015, 19:50:15  [ 141.006600]
11 June 2015, 19:50:15  [<ffffffff810730a6>] ? irq_exit+0x86/0xb0
11 June 2015, 19:50:15  [<ffffffff8154d1d6>] ? do_IRQ+0x66/0x110
11 June 2015, 19:50:15  [<ffffffff8154b06d>] ? common_interrupt+0x6d/0x6d
11 June 2015, 19:50:15  [ 141.006609] <EOI>
11 June 2015, 19:50:15  [ 141.006616]
11 June 2015, 19:50:15  [ 141.006619]
11 June 2015, 19:50:15  [<ffffffff8101f6b2>] ? default_idle+0x22/0xf0
11 June 2015, 19:50:15  [<ffffffff810b1818>] ? cpu_startup_entry+0x2e8/0x4b0
11 June 2015, 19:50:15  [ 141.006624]
11 June 2015, 19:50:15  [ 141.006629]
11 June 2015, 19:50:15  [<ffffffff81900a1a>] ? set_init_arg+0x4d/0x4d
11 June 2015, 19:50:15  [<ffffffff81900120>] ? early_idt_handlers+0x120/0x120
11 June 2015, 19:50:15  [<ffffffff8190072b>] ? x86_64_start_kernel+0x150/0x15f
11 June 2015, 19:50:15  [ 141.006635] Code: 4c 89 e3 eb 26 66 0f 1f 44 00 00 48 03 43 20 48 39 c8 77 25 f0 80 63 01 fe 45 85 c0 0f 85 a1 00 00 00 48 8b 5b 08 49 39 dc 74 26 <48> 8b 43 28 25 ff 0f 00 00 4c 39 e8 73 d2 48 8b 03 48 8b 5b 08
11 June 2015, 19:50:15  [<ffffffffa015ba38>] ext4_finish_bio+0xd8/0x220 [ext4]
11 June 2015, 19:50:15  [ 141.006682] CR2: 0000000000000028
11 June 2015, 19:50:15  [ 141.006705] Kernel panic - not syncing: Fatal exception in interrupt
11 June 2015, 19:50:15  [ 141.009665] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
11 June 2015, 19:50:15  [ 141.009665] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
