On Fri, Apr 11, 2014 at 02:47:20PM +0200, Heinz Graalfs wrote:
> Hello Markus,
>
> I finally managed to reproduce the problem, at least once ...
>
> The scenario was:
> dd if=/dev/vdx1 of=partitionone
>
> followed by a virsh detach... (with the device_del() under the cover)
> during active dd processing
>
> dmesg shows:
>
> [79026.220718] User process fault: interruption code 0x40009 in
> qemu-system-s390x[80000000+43c000]
> [79026.220737] CPU: 3 PID: 20117 Comm: qemu-system-s39 Not tainted
> 3.10.33+ #4
> [79026.220742] task: 0000000085930000 ti: 00000000e0b74000 task.ti:
> 00000000e0b74000
> [79026.220746] User PSW : 0705000180000000 0000000080269722 (0x80269722)
> [79026.220774] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0
> CC:0 PM:0 EA:3
> User GPRS: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [79026.220827] 0000000000000000 0000000000000000
> 00000000803761dc 00000000809001d0
> [79026.220831] 000003fffd0391a8 00000000808ff560
> 00000000808eb020 000003ffffc03838
> [79026.220834] 0000000000000000 0000000080375e88
> 00000000802696f6 000003ffffc03838
> [79026.220847] User Code: 0000000080269716: b9040034 lgr %r3,%r4
> 000000008026971a: a7290000 lghi %r2,0
> #000000008026971e: b9870021 dlgr %r2,%r1
> >0000000080269722: b9040012 lgr %r1,%r2
> 0000000080269726: 5010b0a0 st %r1,160(%r11)
> 000000008026972a: 5820b0a0 l %r2,160(%r11)
> 000000008026972e: e310b0a80004 lg %r1,168(%r11)
> 0000000080269734: 58101000 l %r1,0(%r1)
> [79026.220875] Last Breaking-Event-Address:
> [79026.220878] [<000000008026904c>] 0x8026904c
>
> with
> PSW addr: 0000000080269722 pointing into virtio_queue_empty
>
> ...
> 000000008026a67c t virtio_notify_vector
> 00000000802694a4 T virtio_queue_empty
> 000000008026b440 T virtio_queue_get_addr
> ...
>
> and coming from:
> R14 00000000802696f6 virtqueue_fill
> ...
> 0000000080269f74 T virtqueue_avail_bytes
> 0000000080269540 T virtqueue_fill
> 000000008026979c T virtqueue_flush
> 0000000080269c2c T virtqueue_get_avail_bytes
> ...
>
> see below the complete core_backtrace ...
>
> On 02/04/14 19:40, Markus Armbruster wrote:
> >Heinz Graalfs <graa...@linux.vnet.ibm.com> writes:
> >
> >>On 01/04/14 17:48, Markus Armbruster wrote:
> >>>Heinz Graalfs <graa...@linux.vnet.ibm.com> writes:
> >>>
> >>>>Hi Kevin,
> >>>>
> >>>>doing a
> >>>>
> >>>> virsh detach-device ...
> >>>>
> >>>>ends up in the following QEMU monitor commands:
> >>>>
> >>>>1. device_del ...
> >>>>2. drive_del ...
> >>>>
> >>>>qmp_device_del() performs the device unplug path.
> >>>>In case of a block device do_drive_del() tries to
> >>>>prevent further IO against the host device.
> >>>>
> >>>>However, bdrv_find() during drive_del() results in
> >>>>an error, because the device is already gone. Due to
> >>>>this error all the bdrv_xxx calls to quiesce the block
> >>>>driver as well as all other processing is skipped.
> >>>>
> >>>>Is the sequence that libvirt triggers OK?
> >>>>Shouldn't drive_del be executed first?
> >>>
> >>>No.
> >>
> >>OK, I see. The drive is deleted implicitly (release_drive()).
> >>Doing a device_del() requires another drive_add() AND device_add().
> >
> >Yes. The automatic drive delete on unplug is a wart that will not be
> >preserved in the new interfaces we're building now.
> >
> >>(Doing just a device_add() complains about the missing drive.
> >>A subsequent info qtree lets QEMU abort.)
> >
> >Really? Got a reproducer for me?
> >
> >>>drive_del is nasty. Its purpose is to revoke access to an image even
> >>>when the guest refuses to cooperate. To the guest, this looks like
> >>>hardware failure.
> >>
> >>Deleting a device during active IO is nasty and it should look like a
> >>hardware failure. I would expect lots of errors.
> >>
> >>>
> >>>If you drive_del before device_del, even a perfectly well-behaved guest
> >>>guest is exposed to a terminally broken device between drive_del and
> >>>completion of unplug.
> >>
> >>The early drive_del() would mean that no further IO would be
> >>possible.
> >
> >A polite way to describe the effect of drive_del on the guest.
> >
> >drive_del makes all attempts at actual I/O error out without any
> >forewarning whatsoever. If you do that to your guest, and something
> >breaks, you get to keep the pieces. Even if you "only" do it for a
> >short period of time.
> >
> >>>Always try a device_del first, and only if that does not complete within
> >>>reasonable time, and you absolutely must revoke access to the image,
> >>>then whack it over the head with drive_del.
> >>
> >>What is this reasonable time?
> >
> >Depends on how heavily loaded host and guest are. Suggest to measure
> >with a suitably thrashing guest, then shift left to taste.
> >
> >>On 390 we experience problems (QEMU abort) when asynch block IO
> >>completes and the virtqueues are already gone. I suppose the
> >>bdrv_drain_all() in bdrv_close() is a little late. I don't see such
> >>problems with an early bdrv_drain_all() (drive_del) and an unplug
> >>(device_del) afterwards.
> >
> >Sounds like a bug. Got a reproducer?
> >
>
>
> here is the complete core_backtrace:
>
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269722 virtqueue_fill
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x269924 virtqueue_push
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a400
> virtio_blk_req_complete
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x23a6a4
> virtio_blk_rw_complete
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x34686 bdrv_co_em_bh
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xcd40 aio_bh_poll
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xc7f0 aio_poll
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2ac9c bdrv_drain_all
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2a718 bdrv_close
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2b97e bdrv_delete
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x36db4 bdrv_unref
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa02ea drive_uninit
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xa0448 drive_put_ref
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x9fae6 blockdev_auto_del
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd0bf0 release_drive
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c158
> object_property_del_all
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c4e4 object_finalize
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18d6d2 object_unref
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x18c382 object_unparent
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x25ca14
> virtio_ccw_busdev_unplug
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0xd6c88 qdev_unplug
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15c5f0 qmp_device_del
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x182b92
> qmp_marshal_input_device_del
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fa36 qmp_call_cmd
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fd2c handle_qmp_command
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x2fff3c
> json_message_process_token
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32adfe
> json_lexer_feed_char
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x32b08c json_lexer_feed
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x3000ee
> json_message_parser_feed
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x28fe4c
> monitor_control_read
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x15cd66 qemu_chr_be_write
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x162bfe tcp_chr_read
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> b8d9d8c328fccab2bc69bfa6d17bcb2f46dfb163 0x4ff58
> g_main_context_dispatch /lib64/libglib-2.0.so.0 -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b654 glib_pollfds_poll
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b7e0
> os_host_main_loop_wait
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x10b934 main_loop_wait
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1e6a7e main_loop
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
> 7d40b30ece2f256dfd8600446a9da7b0a1962730 0x1ef642 main
> /test/qemu/build/s390x-softmmu/qemu-system-s390x -
>
> I suppose that asynchronous block IO processing completes after the
> guest already did the virtqueue cleanup.
> Both, machine check injection and device release processing are
> triggered in the unplug callback virtio_ccw_busdev_unplug().
> I suppose the device release processing (with its wait for
> asynchronous processing to complete) happens too late.
>
> What do you think?
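
If I read the backtrace right, the pending request's completion runs from
bdrv_drain_all() inside the unplug path itself, i.e. after the vring has
already been torn down, and virtqueue_fill() then operates on dead ring
state. Here is a minimal stand-alone model of that class of race; the
struct and function names are invented for illustration and this is not
QEMU code:

/*
 * Toy model of the ordering problem (not QEMU code; all names invented).
 * A pending completion still references the ring after teardown has
 * zeroed it, so the modulo in toy_complete() divides by zero (the dlgr
 * fault above looks like the same class of failure).
 */
#include <stdio.h>

typedef struct {
    unsigned int num;       /* number of ring entries; 0 after teardown */
    unsigned int used_idx;
} ToyRing;

typedef struct {
    ToyRing *ring;
    int done;
} ToyRequest;

/* Completion handler, loosely analogous to virtqueue_push()/virtqueue_fill(). */
static void toy_complete(ToyRequest *req)
{
    /* Typically dies with SIGFPE if the ring was torn down (num == 0). */
    unsigned int slot = req->ring->used_idx % req->ring->num;
    printf("completed into slot %u\n", slot);
    req->done = 1;
}

/* Drain: run any outstanding completion before the caller continues. */
static void toy_drain(ToyRequest *req)
{
    if (!req->done) {
        toy_complete(req);
    }
}

int main(void)
{
    ToyRing ring = { .num = 128, .used_idx = 5 };
    ToyRequest req = { .ring = &ring, .done = 0 };

    /* Buggy ordering, as in the backtrace: tear down first, drain later. */
    ring.num = 0;      /* virtqueue cleanup has already happened */
    toy_drain(&req);   /* completion now runs against a dead ring */

    /* Safe ordering would be: toy_drain(&req); then ring.num = 0; */
    return 0;
}
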
This is interesting and definitely something that should never happen. QEMU cannot rely on well-behaved guests, so we need to ensure that hot unplug works while guest I/O is still in flight.

Does this reproducer trigger on qemu.git/master? I'm asking because of the BlockDriverState refcounting and the virtio device model refcounts, which I think are sane in qemu.git/master but maybe not before 2.0.0.

Stefan
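
P.S. Below is a minimal model of what sane refcounting buys in this
situation; the names are invented and this is not the actual
BlockDriverState/VirtIODevice API. The idea is that an in-flight request
takes a reference at submission and drops it at completion, so even if
unplug releases its own reference first, the completion callback still
sees a live object. As Heinz points out, draining before tearing down the
vring is still needed on top of that.

/*
 * Refcounting sketch (invented names, not QEMU's API): in-flight I/O
 * pins the device object, so completion never touches freed memory.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int refcnt;
    unsigned int ring_num;
} ToyDev;

static ToyDev *toy_dev_new(unsigned int ring_num)
{
    ToyDev *dev = calloc(1, sizeof(*dev));
    if (!dev) {
        abort();
    }
    dev->refcnt = 1;
    dev->ring_num = ring_num;
    return dev;
}

static void toy_dev_ref(ToyDev *dev)
{
    dev->refcnt++;
}

static void toy_dev_unref(ToyDev *dev)
{
    if (--dev->refcnt == 0) {
        printf("device freed\n");
        free(dev);
    }
}

/* Submission takes a reference on behalf of the request ... */
static void toy_req_submit(ToyDev *dev)
{
    toy_dev_ref(dev);
}

/* ... and completion drops it again. */
static void toy_req_complete(ToyDev *dev)
{
    printf("completion sees a live device, ring_num=%u\n", dev->ring_num);
    toy_dev_unref(dev);
}

int main(void)
{
    ToyDev *dev = toy_dev_new(128);

    toy_req_submit(dev);    /* in-flight I/O pins the device */
    toy_dev_unref(dev);     /* unplug drops its reference early */
    toy_req_complete(dev);  /* still safe: the request's ref kept it alive */
    return 0;
}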