On 06/29/2015 06:39 PM, Stefan Hajnoczi wrote: > On Fri, Jun 26, 2015 at 05:06:01PM +0800, Jason Wang wrote: >> >> On 06/25/2015 05:18 PM, Stefan Hajnoczi wrote: >>> e1000_can_receive() checks the link up status register bit. If the bit >>> is clear, packets will be queued and the peer may disable receive to >>> avoid wasting CPU reading packets that cannot be delivered. The queue >>> must be flushed once the link comes back up again. >>> >>> This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests >>> and tap networking. Flushing the queue invokes the async send callback, >>> which re-enables tap fd read. >>> >>> Reported-by: Jonathan Liu <net...@gmail.com> >>> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> >>> --- >>> hw/net/e1000.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c >>> index bab8e2a..5c6bcd0 100644 >>> --- a/hw/net/e1000.c >>> +++ b/hw/net/e1000.c >>> @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) >>> { >>> s->mac_reg[STATUS] |= E1000_STATUS_LU; >>> s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; >>> + >>> + /* E1000_STATUS_LU is tested by e1000_can_receive() */ >>> + qemu_flush_queued_packets(qemu_get_queue(s->nic)); >>> } >>> >>> static bool >> This only solves the issue partially and just for e1000. After checking >> all can_receive() functions, another card with similar behaviour is vmxnet3. > No, this bug is specific to e1000 (details on that below in case 2b). > > There are more issues though, so additional patches are welcome. > >> Looking at commit a90a7425cf592a3afeff3eaf32f543b83050ee5c ("tap: Drop >> tap_can_send") again. The commit disable tap read poll when >> qemu_net_queue_send() returns zero. Which is usually the following cases: >> >> 1) queue->delivering is 1 >> 2) qemu_can_send_packet() returns zero, which is: >> 2a) vm_running is false >> or >> 2b) can_receive() return zero >> 3) qemu_net_queue_deliver() returns zero, which is: >> 3a) nc->receive_disabled is true >> or >> 3b) info->receive_raw() or nc->receive->receive() returns zero >> >> This means we should enable tap read poll when one of those conditions >> is not existed. This patch fixes 2b) only for e1000. >> >> for 1, I'm not quite sure it's a real problem or how to fix. > This is not a problem. When the delivery code returns it flushes the > queue. This will invoke send callback functions, so tap will re-enable > read poll. > >> for 2a, we may probably need a vm state change handler to flush the >> queue, otherwise network will stall after a stop and cont. > Yes, I'm surprised the code isn't doing this already. > >> for 2b, we can either fixes the card that check link status in its >> can_receive() or just simply can qemu_flush_queued_packets() in set_link. > No, this doesn't work because the e1000 link status bit is *not* > equivalent to nc->link_down. The link auto-negotiation code changes the > e1000 bit independently from nc->link_down using a timer. > > The net subsystem cannot know when this e1000-specific bit is supposed > to change. Therefore there is no general fix. > > Also, qemu_send_packet() drops packets while nc->link_down is true. > We're not supposed to queue so it shouldn't be necessary to flush > packets in set_link at the net subsystem level.
But qemu_send_packet() only check sender's nc->link_down. So tap read poll will be disabled and won't get enabled when its peers' link status is recovered. > >> 3a and 3b looks ok, since this happen when there's no space for rx, >> after guest refill, qemu will call qemu_flush_queued_packets() to flush >> and re-enable tap read poll. >> >> 2a and 2b does not exits before this commit, since tap_send check >> qemu_can_send_packet() before. > The delivery and queuing mechanism in QEMU is overcomplicated. The > semantics of .can_receive() and .receive() need to be clarified before > we can do real cleanup though: > > .receive() returning 0 is not quite the same as .can_receive() returning > false. > > When .receive() returns 0 this means the peer should wait before sending > more. In most cases the net queue only needs to hold 1 packet. But > there are cases like slirp where QEMU code generates packets and may not > be able to continue without calling qemu_send_packet() on more packets > (I'm not sure but I suspect this is the case). > > When .can_receive() returns false it also casues packets to be queued, > but NIC emulation code often includes more conditions in .can_receive() > than .receive() returning 0. Some NICs do even more tricks. E.g when receiver is disabled, 8139's can_receive() returns true and expect receive() to drop the packets. > > Cleaning this up requires auditing all the NICs. Maybe we can get rid > of .can_receive() and just have a simplified .receive(): > > false - queue packet and tell peer to stop sending. Flush must be > called later to re-enable sending. And call can_receive() in receive(). > true - packet delivered or dropped This looks good, but do you think we can do this in 2.4? Looks a little bit risky.