On Fri, 06/26 17:06, Jason Wang wrote:
>
>
> On 06/25/2015 05:18 PM, Stefan Hajnoczi wrote:
> > e1000_can_receive() checks the link up status register bit.  If the bit
> > is clear, packets will be queued and the peer may disable receive to
> > avoid wasting CPU reading packets that cannot be delivered.  The queue
> > must be flushed once the link comes back up again.
> >
> > This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests
> > and tap networking.  Flushing the queue invokes the async send callback,
> > which re-enables tap fd read.
> >
> > Reported-by: Jonathan Liu <net...@gmail.com>
> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> > ---
> >  hw/net/e1000.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > index bab8e2a..5c6bcd0 100644
> > --- a/hw/net/e1000.c
> > +++ b/hw/net/e1000.c
> > @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s)
> >  {
> >      s->mac_reg[STATUS] |= E1000_STATUS_LU;
> >      s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS;
> > +
> > +    /* E1000_STATUS_LU is tested by e1000_can_receive() */
> > +    qemu_flush_queued_packets(qemu_get_queue(s->nic));
> >  }
> >
> >  static bool
>
> This only solves the issue partially and just for e1000. After checking
> all can_receive() functions, another card with similar behaviour is vmxnet3.
>
> Looking at commit a90a7425cf592a3afeff3eaf32f543b83050ee5c ("tap: Drop
> tap_can_send") again. The commit disables tap read poll when
> qemu_net_queue_send() returns zero, which usually happens in the
> following cases:
>
> 1) queue->delivering is 1
> 2) qemu_can_send_packet() returns zero, which is:
>   2a) vm_running is false
>   or
>   2b) can_receive() returns zero
> 3) qemu_net_queue_deliver() returns zero, which is:
>   3a) nc->receive_disabled is true
>   or
>   3b) info->receive_raw() or nc->receive->receive() returns zero
>
> This means we should enable tap read poll once any of those conditions
> no longer holds. This patch fixes 2b) only for e1000.
>
> For 1, I'm not quite sure whether it's a real problem or how to fix it.
> For 2a, we probably need a vm state change handler to flush the
> queue, otherwise the network will stall after a stop and cont.
> For 2b, we can either fix the cards that check link status in their
> can_receive() or simply call qemu_flush_queued_packets() in set_link.
> 3a and 3b look ok, since they happen when there's no space for rx;
> after the guest refills, qemu will call qemu_flush_queued_packets() to
> flush and re-enable tap read poll.
>
> 2a and 2b did not exist before this commit, since tap_send used to check
> qemu_can_send_packet() first.
>
> Looks like netmap has the same issue.
>

Ouch! Good catch!

I will take a look at the devices today and see if we can fix the problem by
adding qemu_flush_queued_packets() calls at the state transition points
(vm_running, can_receive).

The worst case is reverting the whole series (0bc12c4f7..f4d248bdc3) for 2.4,
but dropping can_read is a worthwhile step for optimizing our event loop.

Thanks,
Fam
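
To make cases 2a and 2b concrete, here is a rough, self-contained model of
the "flush at the state transition points" idea. It is only a sketch and not
QEMU code: ModelNic and every model_*() function are hypothetical stand-ins
for the tap peer, qemu_net_queue_send(), qemu_flush_queued_packets(),
set_link and a vm state change handler, and the real queueing logic is more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 8

/* Hypothetical stand-in for a NIC plus its tap peer; not a QEMU type. */
typedef struct {
    bool link_up;     /* models E1000_STATUS_LU (case 2b) */
    bool vm_running;  /* models vm_running (case 2a) */
    bool read_poll;   /* models tap fd read polling */
    size_t queued;    /* packets parked in the queue */
    size_t delivered; /* packets handed to the guest */
} ModelNic;

/* Models qemu_can_send_packet(): both 2a and 2b must be clear. */
static bool model_can_receive(const ModelNic *n)
{
    return n->vm_running && n->link_up;
}

/*
 * Models the tap send path after "tap: Drop tap_can_send".
 * Returns 1 if delivered, 0 if queued (and read poll is disabled),
 * -1 if the peer is not reading at all.
 */
static int model_send(ModelNic *n)
{
    if (!n->read_poll) {
        return -1;
    }
    if (!model_can_receive(n)) {
        assert(n->queued < MODEL_QUEUE_MAX);
        n->queued++;
        n->read_poll = false; /* nobody re-enables this without a flush */
        return 0;
    }
    n->delivered++;
    return 1;
}

/* Models qemu_flush_queued_packets(): drain backlog, re-enable read poll. */
static void model_flush(ModelNic *n)
{
    if (!model_can_receive(n)) {
        return;
    }
    n->delivered += n->queued;
    n->queued = 0;
    n->read_poll = true;
}

/* Case 2b: flush when the link comes back up. */
static void model_set_link(ModelNic *n, bool up)
{
    n->link_up = up;
    if (up) {
        model_flush(n);
    }
}

/* Case 2a: flush when the VM resumes after a stop. */
static void model_vm_state_change(ModelNic *n, bool running)
{
    n->vm_running = running;
    if (running) {
        model_flush(n);
    }
}
```

In this model, a packet sent while the link is down is queued and read poll
goes off; without the flush in model_set_link() or model_vm_state_change()
nothing would ever turn it back on, which is the stall described above.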