On 2012-02-29 22:48, Stefan Weil wrote:
> On 29.02.2012 22:33, Jan Kiszka wrote:
>> On 2012-02-29 22:00, Stefan Weil wrote:
>>> On 29.02.2012 20:15, Jan Kiszka wrote:
>>>> This is an alternative, more complete approach to fix the
>>>> requeuing-related crashes reported recently. See patch 2 for
>>>> details. The rest are simple cleanups.
>>>>
>>>> Please check carefully if I messed something up.
>>>>
>>>
>>> Hi Jan,
>>>
>>> here is the result of MIPS Malta with your patch series applied:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555577db5b in slirp_remque (a=0x555556cff360) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
>>> 39 ((struct quehead *)(element->qh_rlink))->qh_link =
>>> element->qh_link;
>>> (gdb) i s
>>> #0 0x000055555577db5b in slirp_remque (a=0x555556cff360) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
>>> #1 0x000055555577b7a2 in if_start (slirp=0x5555564bfb80) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:208
>>> #2 0x000055555577b607 in if_output (so=0x555556ea0b70,
>>> ifm=0x555556cff9e0) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:139
>>> #3 0x000055555577d040 in ip_output (so=0x555556ea0b70,
>>> m0=0x555556cff9e0) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/ip_output.c:84
>>> #4 0x00005555557865d6 in tcp_output (tp=0x555556ea0c20) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/tcp_output.c:456
>>> #5 0x000055555577ff5a in slirp_select_poll (readfds=0x7fffffffda10,
>>> writefds=0x7fffffffda90, xfds=0x7fffffffdb10, select_error=0)
>>> at /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/slirp.c:480
>>> #6 0x000055555572d8c0 in main_loop_wait (nonblocking=0) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/main-loop.c:469
>>> #7 0x0000555555721a61 in main_loop () at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:1558
>>> #8 0x00005555557284a2 in main (argc=25, argv=0x7fffffffdfe8,
>>> envp=0x7fffffffe0b8) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:3667
>>> (gdb) p element
>>> $1 = (struct quehead *) 0x555556cff360
>>> (gdb) p *element
>>> $2 = {qh_link = 0x555556cff360, qh_rlink = 0x0}
>>> (gdb) p (struct quehead *)(element->qh_rlink)
>>> $3 = (struct quehead *) 0x0
>>
>> Hmm. Two options:
>>
>> - you try to debug what happens to that mbuf, why its queue anchors
>> get corrupted (maybe while in if_encap?)
>> - you tell me how to reproduce it (image file, host characteristics)
>>
>> Jan
>
> I'm afraid that the first variant won't happen this or next week
> because of lack of time.
>
> This is my test environment:
>
> Debian Squeeze x86_64 host, Debian Squeeze mips guest.
>
> I use NFS root, and the latest crash happened during boot.
> All other crashes happened after the guest had booted
> when I started apt-get update, so maybe booting from a
> Debian CDROM might also reproduce the crash.
>
> I compiled QEMU with a default configuration, but used
> CFLAGS=-g (no optimization) and started QEMU like this:
>
> gdb --args
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/bin/debug/x86/mips-softmmu/qemu-system-mips
> --kernel /tftpboot/malta/boot/vmlinux-2.6.26-2-4kc-malta --initrd
> /tftpboot/malta/boot/initrd.img-2.6.26-2-4kc-malta --append "debug
> nohz=off root=/dev/nfs rw ip=::::malta::dhcp
> nfsroot=10.0.2.2:/tftpboot/malta -bootp abc -tftp /tftpboot/malta" -M
> malta --cpu 4KEc -m 256 --net nic,model=pcnet --net user,hostname=malta
> --redir tcp:5800::5800 --redir tcp:5900::5900 --redir tcp:10022::22
> --redir tcp:10080::80
>
> Kernel and initrd are from Debian Squeeze (mips).
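
For readers following the backtrace quoted above: the faulting statement writes through element->qh_rlink, and gdb shows that pointer is NULL for this element. The following is only a minimal sketch of that failure mode, not the slirp code. The two field names are taken from the gdb output; the helpers queue_init, queue_insert, queue_is_linked and queue_remove_checked are hypothetical names made up for this illustration, and clearing both anchors on removal is this sketch's own convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same two-field layout as the struct printed by gdb in the report. */
struct quehead {
    struct quehead *qh_link;   /* next element in the ring */
    struct quehead *qh_rlink;  /* previous element in the ring */
};

/* Hypothetical helper: an empty ring is a head linked to itself. */
static void queue_init(struct quehead *head)
{
    head->qh_link = head;
    head->qh_rlink = head;
}

/* Hypothetical helper: insert element directly after head. */
static void queue_insert(struct quehead *element, struct quehead *head)
{
    assert(head->qh_link != NULL && head->qh_rlink != NULL);
    element->qh_link = head->qh_link;
    element->qh_rlink = head;
    head->qh_link->qh_rlink = element;
    head->qh_link = element;
}

/* Hypothetical helper: true only if both anchors can be dereferenced. */
static bool queue_is_linked(const struct quehead *element)
{
    return element->qh_link != NULL && element->qh_rlink != NULL;
}

/*
 * Hypothetical checked removal: returns false instead of writing
 * through a NULL anchor, which is what the unchecked statement in
 * the report does when qh_rlink is NULL.
 */
static bool queue_remove_checked(struct quehead *element)
{
    if (!queue_is_linked(element)) {
        return false;
    }
    element->qh_rlink->qh_link = element->qh_link;
    element->qh_link->qh_rlink = element->qh_rlink;
    /* Sketch convention: mark the element as unlinked. */
    element->qh_link = NULL;
    element->qh_rlink = NULL;
    return true;
}
```

A check like this would only turn the crash into a detectable error; it does not explain how the mbuf's anchors ended up in that state, which is the open question in the thread.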

OK, thanks. Here is a last shot (on top of my queue) before I try to
reproduce:

diff --git a/slirp/if.c b/slirp/if.c
index 90bf398..d3bdf58 100644
--- a/slirp/if.c
+++ b/slirp/if.c
@@ -181,13 +181,12 @@ void if_start(Slirp *slirp)
             from_batchq = from_batchq_next;
         ifm_next = ifm->ifq_next;
-        if (!from_batchq) {
-            if (ifm_next == &slirp->if_fastq) {
-                /* No more packets in fastq, switch to batchq */
-                ifm_next = slirp->next_m;
-                from_batchq_next = true;
-            }
-        } else if (ifm_next == &slirp->if_batchq) {
+        if (ifm_next == &slirp->if_fastq) {
+            /* No more packets in fastq, switch to batchq */
+            ifm_next = slirp->next_m;
+            from_batchq_next = true;
+        }
+        if (ifm_next == &slirp->if_batchq) {
             /* end of batchq */
             ifm_next = NULL;
         }

>
> I had no slirp problems with that test environment during the last two
> years.

Yes, these regressions are unfortunate. Hope we can resolve them quickly.

Jan
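
To make the effect of the diff above easier to see, here is a small stand-alone model of the two versions of the next-pointer selection. It is a sketch under assumptions, not the slirp code: struct node and struct queues are simplified stand-ins for the mbuf and Slirp types, next_old and next_new are hypothetical names, and the sketch assumes that next_m points at the batchq head when the batch queue is empty. Under that assumption, the old else-if form returns the batchq head itself after switching away from the fast queue, while the new form maps it to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an mbuf: only the forward queue link. */
struct node {
    struct node *next;
};

/* Simplified stand-in for the relevant Slirp fields. */
struct queues {
    struct node fastq;    /* sentinel head of the fast queue */
    struct node batchq;   /* sentinel head of the batch queue */
    struct node *next_m;  /* assumed: &batchq when batchq is empty */
};

/* Selection as in the lines removed by the diff. */
static struct node *next_old(struct queues *q, struct node *cur,
                             bool from_batchq, bool *from_batchq_next)
{
    assert(cur != NULL);
    struct node *next = cur->next;
    if (!from_batchq) {
        if (next == &q->fastq) {
            /* No more packets in fastq, switch to batchq */
            next = q->next_m;
            *from_batchq_next = true;
        }
    } else if (next == &q->batchq) {
        /* end of batchq */
        next = NULL;
    }
    return next;
}

/* Selection as in the lines added by the diff. */
static struct node *next_new(struct queues *q, struct node *cur,
                             bool *from_batchq_next)
{
    assert(cur != NULL);
    struct node *next = cur->next;
    if (next == &q->fastq) {
        /* No more packets in fastq, switch to batchq */
        next = q->next_m;
        *from_batchq_next = true;
    }
    if (next == &q->batchq) {
        /* end of batchq, also checked right after the switch */
        next = NULL;
    }
    return next;
}
```

With one packet in the fast queue and an empty batch queue, next_old hands back the sentinel head as if it were a packet, whereas next_new ends the walk. Whether this is the path behind the reported crash is not established in the thread.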