Stefan Hajnoczi <stefa...@gmail.com> writes:

> In a case like this it might be most effective to catch a VM in the
> bad state and then go in with gdb to see what is broken. The basic
> approach would be putting breakpoints on the e1000 device model's
> transmit/receive paths to see if the guest is giving us packets and
> whether the tap device is transmitting/receiving. If guest and host
> appear to be working then QEMU's e1000 model must be in a bad state
> and it's a question of looking at the tx/rx rings and other hardware
> emulation state to figure out what went wrong.

Hi Stefan.

I tried setting a breakpoint on start_xmit, but the qemu blew up when I
hit it:

(gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:start_xmit
Function "start_xmit" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:528
Breakpoint 1 at 0x46dcd6: file /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c, line 528.
(gdb) cont
Continuing.

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.

I assume this is some subtlety with breakpointing threaded code?

However, along these lines, I note that the guest appears to have
received packets, though this count is stuck at 1993 bytes. The TX count
marches upwards as I ping outbound from the guest.

If I attach a tcpdump to tap1 on the host, I see the ARP requests going
out and apparently no reply:

0024# tcpdump -i tap1
tcpdump: WARNING: tap1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap1, link-type EN10MB (Ethernet), capture size 65535 bytes
12:08:35.654992 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:36.654976 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:37.654975 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:38.670933 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:39.670922 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:40.670908 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28

Looking on br0, I do seem to see the replies:

12:12:53.509471 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:53.509914 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:54.509455 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:54.509875 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:55.509447 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:55.509878 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:56.525424 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:56.525854 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:57.525408 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:57.525837 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46

but they never get to tap1 despite STP being disabled and no bridge port
filtering:

# ebtables -L
Bridge table: filter

Bridge chain: INPUT, entries: 0, policy: ACCEPT

Bridge chain: FORWARD, entries: 0, policy: ACCEPT

Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

# brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.002590224ffa       no              eth0

This looks uncannily like a kernel problem doesn't it? However, remove
the -usbdevice tablet, and it goes away, which is truly weird! I've just
done a hundred successful reboots without it once again to confirm to
myself that I'm definitely not imagining that behaviour.

> Have you tried unloading the e1000 kernel module inside the guest and
> then modprobing it again? Does this "fix" the issue?

Hadn't thought of that, but no, it apparently has no effect. It's still
broken after I rmmod it, modprobe it again, and reconfigure the
networking.

Cheers,

Chris.
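
For comparing captures like the tap1 and br0 ones above, here is a minimal sketch that tallies ARP requests and replies per IP address from saved tcpdump text output. The names summarize_arp and ARP_RE are hypothetical helpers invented for illustration, and the pattern is only assumed to cover the two ARP line shapes that appear in this message; it is not part of tcpdump or QEMU.

```python
import re

# Matches the ARP lines tcpdump printed above, with or without the
# "Ethernet (len 6), IPv4 (len 4)," detail. Assumption: only these two
# line shapes need to be handled.
ARP_RE = re.compile(
    r"ARP,.*?\b(?P<kind>Request who-has|Reply) (?P<ip>\d+\.\d+\.\d+\.\d+)"
)


def summarize_arp(capture_text):
    """Count ARP requests and replies per IP in tcpdump text output.

    For a request the IP is the address being asked about; for a reply
    it is the address that answered.
    """
    counts = {}
    for match in ARP_RE.finditer(capture_text):
        kind = "request" if match.group("kind").startswith("Request") else "reply"
        per_ip = counts.setdefault(match.group("ip"), {"request": 0, "reply": 0})
        per_ip[kind] += 1
    return counts
```

Fed the tap1 capture, this would show requests with zero replies for 84.45.8.129, while the br0 capture would show matching request and reply counts, which is the asymmetry described above.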