Re: [Qemu-devel] [PULL for-2.5 2/4] block: Don't wait serialising for non-COR read requests

Kevin Wolf Tue, 08 Dec 2015 06:12:08 -0800

Am 08.12.2015 um 14:58 hat Christian Borntraeger geschrieben:
> On 12/08/2015 02:45 PM, Kevin Wolf wrote:
> > Am 08.12.2015 um 14:28 hat Christian Borntraeger geschrieben:
> >> On 12/08/2015 01:30 PM, Christian Borntraeger wrote:
> >>> On 12/08/2015 01:00 PM, Cornelia Huck wrote:
> >>>> On Tue, 8 Dec 2015 10:59:54 +0100
> >>>> Kevin Wolf <kw...@redhat.com> wrote:
> >>>>
> >>>>> Am 07.12.2015 um 17:42 hat Cornelia Huck geschrieben:
> >>>>>> On Mon, 7 Dec 2015 11:02:51 +0100
> >>>>>> Cornelia Huck <cornelia.h...@de.ibm.com> wrote:
> >>>>>>
> >>>>>>> On Thu,  3 Dec 2015 13:00:00 +0800
> >>>>>>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
> >>>>>>>
> >>>>>>>> From: Fam Zheng <f...@redhat.com>
> >>>>>>>>
> >>>>>>>> The assertion problem was noticed in 06c3916b35a, but it wasn't
> >>>>>>>> completely fixed, because even though the req is not marked as
> >>>>>>>> serialising, it still gets serialised by wait_serialising_requests
> >>>>>>>> against other serialising requests, which could lead to the same
> >>>>>>>> assertion failure.
> >>>>>>>>
> >>>>>>>> Fix it by even more explicitly skipping the serialising for this
> >>>>>>>> specific case.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Fam Zheng <f...@redhat.com>
> >>>>>>>> Message-id: 1448962590-2842-2-git-send-email-f...@redhat.com
> >>>>>>>> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> >>>>>>>> ---
> >>>>>>>>  block/backup.c        |  2 +-
> >>>>>>>>  block/io.c            | 12 +++++++-----
> >>>>>>>>  include/block/block.h |  4 ++--
> >>>>>>>>  trace-events          |  2 +-
> >>>>>>>>  4 files changed, 11 insertions(+), 9 deletions(-)
> >>>>>>>
> >>>>>>> This one causes segfaults for me:
> >>>>>>>
> >>>>>>> Program received signal SIGSEGV, Segmentation fault.
> >>>>>>> bdrv_is_inserted (bs=0x800000000000) at 
> >>>>>>> /data/git/yyy/qemu/block.c:3071
> >>>>>>> 3071      if (!drv) {
> >>>>>>>
> >>>>>>> (gdb) bt
> >>>>>>> #0  bdrv_is_inserted (bs=0x800000000000) at 
> >>>>>>> /data/git/yyy/qemu/block.c:3071
> >>>>>
> >>>>> This looks like some kind of memory corruption that hit blk->bs. It's
> >>>>> most definitely not a valid pointer anyway.
> >>>>>
> >>>>>>> #1  0x0000000080216974 in blk_is_inserted (blk=<optimized out>)
> >>>>>>>     at /data/git/yyy/qemu/block/block-backend.c:986
> >>>>>>> #2  0x00000000802169c6 in blk_is_available 
> >>>>>>> (blk=blk@entry=0x3ffb17e7960)
> >>>>>>>     at /data/git/yyy/qemu/block/block-backend.c:991
> >>>>>>> #3  0x0000000080216d12 in blk_check_byte_request 
> >>>>>>> (blk=blk@entry=0x3ffb17e7960, 
> >>>>>>>     offset=offset@entry=4928966656, size=16384)
> >>>>>>>     at /data/git/yyy/qemu/block/block-backend.c:558
> >>>>>>> #4  0x0000000080216df2 in blk_check_request 
> >>>>>>> (blk=blk@entry=0x3ffb17e7960, 
> >>>>>>>     sector_num=sector_num@entry=9626888, 
> >>>>>>> nb_sectors=nb_sectors@entry=32)
> >>>>>>>     at /data/git/yyy/qemu/block/block-backend.c:589
> >>>>>>> #5  0x0000000080217ee8 in blk_aio_readv (blk=0x3ffb17e7960, 
> >>>>>>> sector_num=
> >>>>>>>     9626888, iov=0x8098c658, nb_sectors=<optimized out>, cb=
> >>>>>>>     0x80081150 <virtio_blk_rw_complete>, opaque=0x80980620)
> >>>>>>>     at /data/git/yyy/qemu/block/block-backend.c:727
> >>>>>>> #6  0x000000008008186e in submit_requests (niov=<optimized out>, 
> >>>>>>>     num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized 
> >>>>>>> out>, 
> >>>>>>>     blk=<optimized out>) at 
> >>>>>>> /data/git/yyy/qemu/hw/block/virtio-blk.c:366
> >>>>>>> #7  virtio_blk_submit_multireq (mrb=<optimized out>, blk=<optimized 
> >>>>>>> out>)
> >>>>>>>     at /data/git/yyy/qemu/hw/block/virtio-blk.c:444
> >>>>>>> #8  virtio_blk_submit_multireq (blk=0x3ffb17e7960, mrb=0x3ffffffeb58)
> >>>>>>>     at /data/git/yyy/qemu/hw/block/virtio-blk.c:389
> >>>>>>> #9  0x00000000800823ee in virtio_blk_handle_output (vdev=<optimized 
> >>>>>>> out>, 
> >>>>>>>     vq=<optimized out>) at 
> >>>>>>> /data/git/yyy/qemu/hw/block/virtio-blk.c:615
> >>>>>>> #10 0x00000000801e367e in aio_dispatch (ctx=0x80918520)
> >>>>>>>     at /data/git/yyy/qemu/aio-posix.c:326
> >>>>>>> #11 0x00000000801d28b0 in aio_ctx_dispatch (source=<optimized out>, 
> >>>>>>>     callback=<optimized out>, user_data=<optimized out>)
> >>>>>>>     at /data/git/yyy/qemu/async.c:231
> >>>>>>> #12 0x000003fffd36a05a in g_main_context_dispatch ()
> >>>>>>>    from /lib64/libglib-2.0.so.0
> >>>>>>> #13 0x00000000801e0ffa in glib_pollfds_poll ()
> >>>>>>>     at /data/git/yyy/qemu/main-loop.c:211
> >>>>>>> #14 os_host_main_loop_wait (timeout=<optimized out>)
> >>>>>>>     at /data/git/yyy/qemu/main-loop.c:256
> >>>>>>> #15 main_loop_wait (nonblocking=<optimized out>)
> >>>>>>>     at /data/git/yyy/qemu/main-loop.c:504
> >>>>>>> #16 0x00000000800148a6 in main_loop () at /data/git/yyy/qemu/vl.c:1923
> >>>>>>> #17 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized 
> >>>>>>> out>)
> >>>>>>>     at /data/git/yyy/qemu/vl.c:4684
> >>>>>>>
> >>>>>>> Relevant part of command line:
> >>>>>>>
> >>>>>>> -drive 
> >>>>>>> file=/dev/sda,if=none,id=drive-virtio-disk0,format=raw,serial=ccwzfcp1,cache=none
> >>>>>>>  -device 
> >>>>>>> virtio-blk-ccw,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,scsi=off
> >>>>>>
> >>>>>> I played around a bit. The main part of this change seems to be calling
> >>>>>> wait_serialising_requests() conditionally; reverting this makes the
> >>>>>> guest boot again.
> >>>>>>
> >>>>>> I then tried to find out when wait_serialising_requests() was NOT
> >>>>>> called and added fprintfs: well, it was _always_ called. I then added a
> >>>>>> fprintf for flags at the beginning of the function: this produced a
> >>>>>> segfault no matter whether wait_serialising_requests() was called
> >>>>>> conditionally or unconditionally. Weird race?
> >>>>>>
> >>>>>> Anything further I can do? I guess this patch fixes a bug for someone,
> >>>>>> but it means insta-death for my setup...
> >>>>>
> >>>>> If it happens immediately, perhaps running under valgrind is possible
> >>>>> and could give some hints about what happened with blk->bs?
> >>>>
> >>>> Just a quick update: This triggers on a qemu built with a not-so-fresh
> >>>> gcc 4.7.2 (and it seems to depend on compiler optimizations as well).
> >>>> We can't trigger it with newer gccs. No hints from valgrind, sadly.
> >>>> Investigation ongoing.
> >>>>
> >>>>
> >>>
> >>> Some updates after looking at Cornelias system. It seem to be related to 
> >>> the F18 gcc that is still on that test system.
> >>>
> >>> Problem happens when hw/block/virtio-blk.c is compiled
> >>> with -O2 and goes away with -O1 and -O0 (I trimmed that down to
> >>> -fexpensive-optimizations) 
> >>>
> >>> The system calls virtio_blk_submit_multireq 26 times and then it messes
> >>> up the blk pointer:
> >>>
> >>> (gdb) display blk->name
> >>> 1: blk->name = 0x80894300 "drive-virtio-disk0"
> >>> (gdb) next
> >>> 419                   if (niov + req->qiov.niov > IOV_MAX) {
> >>> 1: blk->name = 0x80894300 "drive-virtio-disk0"
> >>> (gdb) 
> >>> 424                   if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors 
> >>> > max_xfer_len) {
> >>> 1: blk->name = 0x80894300 "drive-virtio-disk0"
> >>> (gdb) 
> >>> 419                   if (niov + req->qiov.niov > IOV_MAX) {
> >>> 1: blk->name = 0x80894300 "drive-virtio-disk0"
> >>> (gdb) 
> >>> 429                   if (sector_num + nb_sectors != req->sector_num) {
> >>> 1: blk->name = 0x80894300 "drive-virtio-disk0"
> >>> (gdb) 
> >>> 434                       submit_requests(blk, mrb, start, num_reqs, 
> >>> niov);
> >>> 1: blk->name = 0x80894300 "drive-virtio-disk0"
> >>> (gdb) 
> >>> 413           for (i = 0; i < mrb->num_reqs; i++) {
> >>> 1: blk->name = 0x8089ae40 ""
> >>>
> >>>
> >>> uninlining submit_request makes the problem go away, so might be a
> >>> gcc bug in Fedora18. I am now going to look at the disassembly to be
> >>> sure.
> >>
> >> Not a compiler bug. gcc uses a floating point register 8 to spill
> >> the pointer of blk (which is call saved) submit_request will later
> >> on call  qemu_coroutine_enter and after returning from 
> >> qemu_coroutine_enter, the fpr8 contains junk. Not sure yet, what happened.
> > 
> > Coroutines don't save the FPU state, so you're not supposed to use
> > floating point operations inside coroutines. That the compiler spills
> > some integer value into a floating point register is a bit nasty...
> 
> Just checked.  bdrv_aligned_preadv does also use fprs (also for filling
> and spilling). Some versions of gcc seem to like that as the LDGR and LGDR
> instructions are pretty cheap and move the content from/to fprs in a bitwise
> fashion. So this coroutine DOES trash floating point registers.
> 
> Without the patch gcc seems to be fine with the 16 gprs and does not
> spilling/filling from/to fprs in bdrv_aligned_preadv.


Actually, on closer look it seems that the reason why there is no code
for saving the floating point registers in setjmp() on x86 is that they
are caller-save registers anyway, so it doesn't have to. Otherwise the
internet seems to be of the opinion that longjmp() must indeed restore
floating point registers.

So this might be a libc bug on s390 then.

Kevin

Re: [Qemu-devel] [PULL for-2.5 2/4] block: Don't wait serialising for non-COR read requests

Reply via email to