On Mon, 30 Aug 2021 17:55:04 +0200 Christian Schoenebeck <qemu_...@crudebyte.com> wrote:
> Apparently commit 8d6cb100731c4d28535adbf2a3c2d1f29be3fef4 '9pfs: reduce
> latency of Twalk' has introduced occasional crashes.
>
> My first impression after looking at the backtrace: looks like the patch
> itself is probably not causing this, but rather unmasked this issue (i.e.
> increased the chance to be triggered).
>
> The crash is because of 'elem' is NULL in virtio_pdu_vunmarshal() (frame 0).
>

Ouch... this certainly isn't expected to happen :-\

elem is popped off the vq in handle_9p_output():

    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
    if (!elem) {
        goto out_free_pdu;
    }
    [...]
    v->elems[pdu->idx] = elem;

and cleared on PDU completion in virtio_9p_push_and_notify():

    v->elems[pdu->idx] = NULL;

I can't think of a way where push_and_notify() could collide
with handle_output()... both are supposed to be run sequentially
by the main event loop. Maybe activate some traces?

> bt taken with HEAD being 8d6cb100731c4d28535adbf2a3c2d1f29be3fef4:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  virtio_pdu_vunmarshal (pdu=0x55a93717cde8, offset=7,
>     fmt=0x55a9352766d1 "ddw", ap=0x7f38a9ad9cd0)
>     at ../hw/9pfs/virtio-9p-device.c:146
> 146         ret = v9fs_iov_vunmarshal(elem->out_sg, elem->out_num, offset, 1,
>                                       fmt, ap);
> [Current thread is 1 (Thread 0x7f3bddd2ac40 (LWP 7811))]
> (gdb) bt full
> #0  0x000055a934dfb9a7 in virtio_pdu_vunmarshal (pdu=0x55a93717cde8,
>     offset=7, fmt=0x55a9352766d1 "ddw", ap=0x7f38a9ad9cd0)
>     at ../hw/9pfs/virtio-9p-device.c:146
>         s = 0x55a93717b4b8
>         v = 0x55a93717aee0
>         elem = 0x0

So this is v->elems[pdu->idx]... what's the value of pdu->idx?

>         ret = <optimized out>
> #1  0x000055a934bf35e8 in pdu_unmarshal (pdu=pdu@entry=0x55a93717cde8,
>     offset=offset@entry=7, fmt=fmt@entry=0x55a9352766d1 "ddw")
>     at ../hw/9pfs/9p.c:71
>         ret = <optimized out>
>         ap = {{gp_offset = 24, fp_offset = 48,
>           overflow_arg_area = 0x7f38a9ad9db0, reg_save_area = 0x7f38a9ad9cf0}}
> #2  0x000055a934bf68db in v9fs_walk (opaque=0x55a93717cde8)
>     at ../hw/9pfs/9p.c:1720
>         name_idx = <optimized out>
>         qids = 0x0
>         i = <optimized out>
>         err = 0
>         dpath = {size = 0, data = 0x0}
>         path = {size = 0, data = 0x0}
>         pathes = 0x0
>         nwnames = 1
>         stbuf = {st_dev = 2050, st_ino = 1199848, st_nlink = 1,
>           st_mode = 41471, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0,
>           st_size = 13, st_blksize = 4096, st_blocks = 16, s}
>         fidst = {st_dev = 2050, st_ino = 1198183, st_nlink = 3,
>           st_mode = 16877, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0,
>           st_size = 12288, st_blksize = 4096, st_blocks = 32}
>         stbufs = 0x0
>         offset = 7
>         fid = 299
>         newfid = 687
>         wnames = 0x0
>         fidp = <optimized out>
>         newfidp = 0x0
>         pdu = 0x55a93717cde8
>         s = 0x55a93717b4b8
>         qid = {type = 2 '\002', version = 1556732739, path = 2399697}
> #3  0x000055a93505760b in coroutine_trampoline (i0=<optimized out>,
>     i1=<optimized out>) at ../util/coroutine-ucontext.c:173
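
To make the invariant concrete, here is a toy model of the slot lifecycle
described earlier in this mail: the slot is filled when the element is popped
in handle_9p_output() and cleared on completion in virtio_9p_push_and_notify().
This is not QEMU code. All Model* and model_* names, and MODEL_MAX_REQ, are
made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table size; only here so the model has a bound to check. */
#define MODEL_MAX_REQ 128

/* Hypothetical stand-ins for VirtQueueElement and the virtio 9p state. */
typedef struct ModelElem { int id; } ModelElem;
typedef struct ModelState { ModelElem *elems[MODEL_MAX_REQ]; } ModelState;

/* Models the store in handle_9p_output(): fill the slot after a pop. */
static void model_handle_output(ModelState *v, size_t idx, ModelElem *elem)
{
    assert(idx < MODEL_MAX_REQ);
    assert(v->elems[idx] == NULL);  /* slot is expected to be free here */
    v->elems[idx] = elem;
}

/* Models the clear in virtio_9p_push_and_notify(): PDU completion. */
static void model_push_and_notify(ModelState *v, size_t idx)
{
    assert(idx < MODEL_MAX_REQ);
    v->elems[idx] = NULL;
}

/* What virtio_pdu_vunmarshal() depends on: a non-NULL slot for a live PDU. */
static int model_slot_is_live(const ModelState *v, size_t idx)
{
    return idx < MODEL_MAX_REQ && v->elems[idx] != NULL;
}
```

In this model the slot is live only between model_handle_output() and
model_push_and_notify() for the same index. Anything that still reads the slot
after completion sees NULL, which would look like elem = 0x0 in frame 0.
Whether that is what actually happens here is only a guess until pdu->idx and
some traces are available.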