On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote: > Folks, > > I had a crash of a qemu guest in tracked_request_begin. > The testcase was a guest with ramdisk/kernel that reboots in a > loop. (about 10 times per second) with a single null-co disk > attached. No idea how to reproduce this, seems to be a lucky hit. > > (gdb) bt > #0 0x00000000101db5ba in tracked_request_begin (req=req@entry=0x3ff90f1bdc0, > bs=bs@entry=0x42a39190, offset=offset@entry=0, bytes=bytes@entry=4096, > type=type@entry=BDRV_TRACKED_READ) > at /home/cborntra/REPOS/qemu/block/io.c:390 > #1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0, > bytes=4096, qiov=0x3ff7400cbd8, flags=<optimized out>, flags@entry=(unknown: > 0)) > at /home/cborntra/REPOS/qemu/block/io.c:1001 > #2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0), > qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, > bs=<optimized out>) > at /home/cborntra/REPOS/qemu/block/io.c:1024 > #3 bdrv_co_do_rw (opaque=0x3ff7400e370) at > /home/cborntra/REPOS/qemu/block/io.c:2173 > #4 0x000000001022d8f6 in coroutine_trampoline (i0=<optimized out>, > i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79 > #5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6 > > looking at the code we are at > > QLIST_INSERT_HEAD(&bs->tracked_requests, req, list); > which translates to > > if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL) > (&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next; > (&bs->tracked_requests)->lh_first = (req); > (req)->list.le_prev = &(&bs->tracked_requests)->lh_first; > > gdb says, that (&bs->tracked_requests)->lh_first) is zero in the corefile > (gdb) print /x bs->tracked_requests > $6 = {lh_first = 0x0} > > Now looking at the code I am asking myself if this can happen in parallel > to another code that touches tracked_requests, because gcc seems to read > &bs->tracked_requests)->lh_first twice (first to check the value, then > to use it as pointer)
tracked_requests is protected by AioContext. Perhaps something is doing I/O without acquiring AioContext? Luckily there is only 1 place where items are added and removed from tracked_requests. This might make debugging somewhat easier. > > 388 qemu_co_queue_init(&req->wait_queue); > 0x00000000101db594 <+76>: la %r2,72(%r13) > 0x00000000101db598 <+80>: brasl %r14,0x1022cdc0 <qemu_co_queue_init> > > 389 > 390 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list); > 0x00000000101db59e <+86>: lg %r1,12744(%r12) # r1 = > (&bs->tracked_requests)->lh_first) > 0x00000000101db5a4 <+92>: stg %r1,48(%r13) # > (req)->list.le_next = r1 > 0x00000000101db5aa <+98>: cgij %r1,0,8,0x101db5c0 ---+ # if r1==0 goto > 0x00000000101db5b0 <+104>: lg %r1,12744(%r12) | # r1 = > (&bs->tracked_requests)->lh_first) (again!!) > 0x00000000101db5b6 <+110>: la %r2,48(%r13) | > => 0x00000000101db5ba <+114>: stg %r2,56(%r1) | # r1==0 bang > 0x00000000101db5c0 <+120>: stg %r13,12744(%r12)<-----+ > 0x00000000101db5c6 <+126>: lay %r12,12744(%r12) > 0x00000000101db5cc <+132>: stg %r12,56(%r13) > > > Christian >
signature.asc
Description: PGP signature