On 29.09.2011, at 18:06, Amit Shah wrote:

> On (Wed) 03 Aug 2011 [13:24:22], Jan Kiszka wrote:
>> From: Fabien Chouteau <chout...@adacore.com>
>>
>> In the current implementation, if Slirp tries to send an IP packet to a
>> client with an unknown hardware address, the packet is simply dropped and
>> an ARP request is sent (if_encap in slirp/slirp.c).
>>
>> With this patch, Slirp will send the ARP request, re-queue the packet and try
>> to send it later. The packet is dropped after one second if the ARP reply is
>> not received.
>
> This patch causes a segfault when guests wake up from hibernate.
>
> Recipe:
> 1. Start guest with -net user -net nic,model=virtio
> 2. (guest) ping 10.0.2.2
> 3. (guest) echo "disk" > /sys/power/state
> 4. Re-start guest with same command line
> 5. Ping has stopped receiving replies.
> 6. Kill that ping process and start a new one. qemu segfaults.
>
> This needs the not-upstream-yet virtio S4 handling patches, found at
> http://thread.gmane.org/gmane.linux.kernel/1197141
>
> The backtrace is:
>
> (gdb) bt
> #0  0x00007ffff7e421f7 in slirp_insque (a=0x0, b=0x7ffff8f95d50)
>     at /home/amit/src/qemu/slirp/misc.c:27
> #1  0x00007ffff7e40738 in if_start (slirp=0x7ffff8a9cdf0)
>     at /home/amit/src/qemu/slirp/if.c:194
> #2  0x00007ffff7e44828 in slirp_select_poll (readfds=0x7fffffffd930,
>     writefds=0x7fffffffd9b0, xfds=0x7fffffffda30, select_error=0)
>     at /home/amit/src/qemu/slirp/slirp.c:588
> #3  0x00007ffff7e110f1 in main_loop_wait (nonblocking=<optimized out>)
>     at /home/amit/src/qemu/vl.c:1549
> #4  0x00007ffff7d7dc47 in main_loop ()
>     at /home/amit/src/qemu/vl.c:1579
> #5  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>     at /home/amit/src/qemu/vl.c:3574
>
>
> Reverting the patch keeps the ping going on after resume.
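The mechanism the quoted patch describes (send an ARP request, keep the packet queued, drop it if no reply arrives within one second) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code from the patch or from QEMU's slirp: every type and function here (`pending_pkt`, `arp_cache`, `enqueue_pending`, `flush_pending`, `queue_insert_after`) is hypothetical. The list insertion only imitates the insque()-style circular doubly linked list suggested by the `slirp_insque` frame in the backtrace; since such an insertion dereferences its first argument unconditionally, a NULL element (as in `slirp_insque (a=0x0, ...)`) faults at once, so the sketch states that precondition with an assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* HYPOTHETICAL sketch: none of these names are QEMU's slirp API. */

#define ARP_WAIT_MS   1000  /* drop a queued packet after 1 s without ARP reply */
#define ARP_CACHE_MAX 8

/* Hypothetical tiny ARP cache: the IPs whose hardware address is known. */
struct arp_cache {
    uint32_t ips[ARP_CACHE_MAX];
    int n;
};

static bool arp_cache_has(const struct arp_cache *c, uint32_t ip)
{
    for (int i = 0; i < c->n; i++)
        if (c->ips[i] == ip)
            return true;
    return false;
}

/* A packet waiting for ARP resolution, kept in a circular doubly linked list. */
struct pending_pkt {
    struct pending_pkt *next, *prev;
    uint32_t dst_ip;
    uint64_t expire_ms;  /* absolute deadline after which the packet is dropped */
};

static void queue_init(struct pending_pkt *head)
{
    head->next = head->prev = head;
}

/* insque()-style insertion of elem right after head. Both pointers are
 * dereferenced, so elem == NULL would crash here (checked in debug builds). */
static void queue_insert_after(struct pending_pkt *elem, struct pending_pkt *head)
{
    assert(elem != NULL && head != NULL);
    elem->next = head->next;
    head->next = elem;
    elem->prev = head;
    elem->next->prev = elem;
}

static void queue_remove(struct pending_pkt *elem)
{
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    elem->next = elem->prev = elem;
}

/* Queue a packet for dst_ip at time now_ms; returns NULL if allocation fails,
 * in which case nothing is linked into the list. */
static struct pending_pkt *enqueue_pending(struct pending_pkt *head,
                                           uint32_t dst_ip, uint64_t now_ms)
{
    struct pending_pkt *p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->dst_ip = dst_ip;
    p->expire_ms = now_ms + ARP_WAIT_MS;
    queue_insert_after(p, head->prev);  /* append at the tail */
    return p;
}

/* One poll pass: "send" packets whose destination is now resolved, drop the
 * ones past their deadline, keep the rest. Returns the number sent and
 * stores the number dropped in *dropped. */
static int flush_pending(struct pending_pkt *head, uint64_t now_ms,
                         const struct arp_cache *cache, int *dropped)
{
    int sent = 0;
    *dropped = 0;
    struct pending_pkt *p = head->next;
    while (p != head) {
        struct pending_pkt *next = p->next;  /* save before unlinking */
        if (arp_cache_has(cache, p->dst_ip)) {
            queue_remove(p);
            free(p);  /* stands in for "transmit, then release the buffer" */
            sent++;
        } else if (now_ms >= p->expire_ms) {
            queue_remove(p);
            free(p);
            (*dropped)++;
        }
        p = next;
    }
    return sent;
}
```

In this sketch a caller enqueues a packet whenever the destination's hardware address is unknown and calls `flush_pending` from its poll loop; anything still unresolved once `now_ms` reaches `expire_ms` is dropped. Saving `next` before unlinking keeps the walk valid while entries are freed.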
I get the same thing with yesterday's HEAD (close to 1.0-rc3), but without hibernation. I'm running KVM Autotest on PPC machines to check my ppc-next queue, and every single test failed because of segmentation faults in the slirp code. Reverting this patch (and the follow-up patch that fixes the struct mbuf definition) makes the segfaults go away in all tests, so I'm fairly sure this is the offending one :).

I'm not saying that the patch is actually wrong; maybe it merely exposes another bug that was hidden so far. Either way, the breakage looks pretty much like memory corruption to me.

Also, I'm having a hard time reproducing the problem manually. It triggers every time in Autotest, but never when I try it by hand. Essentially, Autotest is merely trying to connect to the guest via ssh every couple of seconds, so I don't know why I can't reproduce it outside of Autotest.

Please fix or revert this for 1.0.

Alex