On 12/04/17 00:45, Glenn Enright wrote:
> On 12/04/17 10:23, Andrew Cooper wrote:
>> On 11/04/2017 23:13, Glenn Enright wrote:
>>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>>> On Tuesday, 11 April 2017, 20:03:14, Glenn Enright wrote:
>>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> We are seeing an odd issue with domu domains from xl destroy, under
>>>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>>>
>>>>>> I guess this is the dom0 kernel version?
>>>>>>
>>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>>> commonality.
>>>>>>>
>>>>>>> 4.4.55 does not show this behavior.
>>>>>>>
>>>>>>> On my test machine I have the following packages installed under
>>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>>
>>>>>>> ~]# rpm -qa | grep xen
>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>>
>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>>
>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>>> following on the VM
>>>>>>>
>>>>>>> {
>>>>>>> while true; do
>>>>>>>   dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>>> done
>>>>>>> }
>>>>>>>
>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>>> This occurs with oxenstored and xenstored both.
>>>>>>>
>>>>>>> {
>>>>>>> xl sync 1
>>>>>>> xl destroy 1
>>>>>>> }
>>>>>>>
>>>>>>> xl list then renders something like ...
>>>>>>>
>>>>>>> (null)    1    4    4    --p--d    9.8    0
>>>>>>
>>>>>> Something is referencing the domain, e.g. some of its memory pages are
>>>>>> still mapped by dom0.
>>>>
>>>> You can try
>>>> # xl debug-keys q
>>>> and further
>>>> # xl dmesg
>>>> to see the output of the previous command. The 'q' dumps domain
>>>> (and guest debug) info.
>>>> # xl debug-keys h
>>>> prints all possible parameters for more information.
>>>>
>>>> Dietmar.
>>>>
>>>
>>> I've done this as requested, below is the output.
>>>
>>> <snip>
>>> (XEN) Memory pages belonging to domain 1:
>>> (XEN) DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>>
>> There are 16 pages still referenced from somewhere.
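For anyone repeating the check, the page count can be read off the `xl dmesg` output mechanically instead of by eye. This is only a minimal sketch: it assumes the 'q' debug-key output keeps the layout quoted above (a "Memory pages belonging to domain N:" header followed by DomPage lines), and `count_dompages` is a made-up helper name, not part of xl.

```shell
#!/bin/sh
# count_dompages DOMID
# Hypothetical helper: reads `xl dmesg`-style text on stdin and prints how
# many "DomPage" lines follow the "Memory pages belonging to domain DOMID:"
# header. Assumes the output format shown in the quoted log.
count_dompages() {
    awk -v id="$1" '
        /Memory pages belonging to domain/ {
            # Each header starts a new section; remember whether it is ours.
            in_dom = ($0 ~ ("domain " id ":"))
            next
        }
        in_dom && /DomPage/ { n++ }
        END { print n + 0 }
    '
}
```

Possible usage on dom0, after `xl debug-keys q`: `xl dmesg | count_dompages 1`.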
Just a wild guess: could you please try the attached kernel patch? This
might give us some more diagnostic data...

Juergen
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 8fe61b5dc5a6..304d5d130e0c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -313,7 +313,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 
 static void xen_blkif_free(struct xen_blkif *blkif)
 {
-	xen_blkif_disconnect(blkif);
+	WARN_ON(xen_blkif_disconnect(blkif));
 	xen_vbd_free(&blkif->vbd);
 
 	/* Make sure everything is drained before shutting down */
@@ -505,7 +505,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 	dev_set_drvdata(&dev->dev, NULL);
 
 	if (be->blkif)
-		xen_blkif_disconnect(be->blkif);
+		WARN_ON(xen_blkif_disconnect(be->blkif));
 
 	/* Put the reference we set in xen_blkif_alloc(). */
 	xen_blkif_put(be->blkif);
@@ -792,7 +792,7 @@ static void frontend_changed(struct xenbus_device *dev,
 			 * Clean up so that memory resources can be used by
 			 * other devices. connect_ring reported already error.
 			 */
-			xen_blkif_disconnect(be->blkif);
+			WARN_ON(xen_blkif_disconnect(be->blkif));
 			break;
 		}
 		xen_update_blkif_status(be->blkif);
@@ -803,7 +803,7 @@ static void frontend_changed(struct xenbus_device *dev,
 		break;
 
 	case XenbusStateClosed:
-		xen_blkif_disconnect(be->blkif);
+		WARN_ON(xen_blkif_disconnect(be->blkif));
 		xenbus_switch_state(dev, XenbusStateClosed);
 		if (xenbus_dev_is_online(dev))
 			break;
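With the patch applied, a non-zero return from xen_blkif_disconnect() should show up as a kernel warning in the dom0 log after the destroy. A rough way to look for it is sketched below. It assumes the usual "WARNING: ... at <file>:<line>" banner that WARN_ON prints, and `blkback_warnings` is just an illustrative name.

```shell
#!/bin/sh
# blkback_warnings
# Hypothetical helper: filters kernel log text on stdin down to WARN banners
# that point at the patched file. The banner pattern is an assumption about
# the log format; adjust it if your kernel prints something different.
blkback_warnings() {
    grep -E 'WARNING:.*drivers/block/xen-blkback/xenbus\.c'
}
```

Possible usage on dom0 after `xl destroy`: `dmesg | blkback_warnings`. No output (and a non-zero exit status from grep) means none of the patched call sites fired.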