On 31/05/2012 00:53, Luigi Rizzo wrote:
> The image contains my fast packet generator "pkt-gen" (a stock
> traffic generator such as netperf etc. is too slow to show the
> problem). pkt-gen can send about 1Mpps in this configuration using
> -net netmap in the backend. The qemu process in this case takes 100%
> CPU. On the receive side, i cannot receive more than 50Kpps, even if i
> flood the bridge with a huge amount of traffic. The qemu process stays
> at 5% cpu or less.
>
> Then i read on the docs in main-loop.h which says that one case where
> the qemu_notify_event() is needed is when using
> qemu_set_fd_handler2(), which is exactly what my backend uses
> (similar to tap.c)

The path is a bit involved, but I think Luigi is right. The docs say
"Remember to call qemu_notify_event whenever the [return value of the
fd_read_poll callback] may change from false to true." Now net/tap.c has

    static int tap_can_send(void *opaque)
    {
        TAPState *s = opaque;

        return qemu_can_send_packet(&s->nc);
    }

and (ignoring VLANs) qemu_can_send_packet is

    int qemu_can_send_packet(VLANClientState *sender)
    {
        if (sender->peer->receive_disabled) {
            return 0;
        } else if (sender->peer->info->can_receive &&
                   !sender->peer->info->can_receive(sender->peer)) {
            return 0;
        } else {
            return 1;
        }
    }

So whenever receive_disabled goes from 1 to 0 or can_receive goes from
0 to 1, the _peer_ has to call qemu_notify_event. In e1000.c we have

    static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
    {
        int bufs;
        /* Fast-path short packets */
        if (total_size <= s->rxbuf_size) {
            return s->mac_reg[RDH] != s->mac_reg[RDT] || !s->check_rxov;
        }
        if (s->mac_reg[RDH] < s->mac_reg[RDT]) {
            bufs = s->mac_reg[RDT] - s->mac_reg[RDH];
        } else if (s->mac_reg[RDH] > s->mac_reg[RDT] || !s->check_rxov) {
            bufs = s->mac_reg[RDLEN] / sizeof(struct e1000_rx_desc) +
                s->mac_reg[RDT] - s->mac_reg[RDH];
        } else {
            return false;
        }
        return total_size <= bufs * s->rxbuf_size;
    }

    static int e1000_can_receive(VLANClientState *nc)
    {
        E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;

        return (s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
    }

So as a conservative approximation, you need to fire qemu_notify_event
whenever you write to RDH, RDT, RDLEN and RCTL, or when check_rxov
becomes zero. In practice, only RDT, RCTL and check_rxov matter.
Luigi, does this patch work for you?

diff --git a/hw/e1000.c b/hw/e1000.c
index 4573f13..0069103 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -295,6 +295,7 @@ set_rx_control(E1000State *s, int index, uint32_t val)
     s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1;
     DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT],
            s->mac_reg[RCTL]);
+    qemu_notify_event();
 }
 
 static void
@@ -922,6 +923,7 @@ set_rdt(E1000State *s, int index, uint32_t val)
 {
     s->check_rxov = 0;
     s->mac_reg[index] = val & 0xffff;
+    qemu_notify_event();
 }
 
 static void

RDT is indeed written in the ISR. In the Linux driver,
e1000_clean_rx_irq calls adapter->alloc_rx_buf which is
e1000_alloc_rx_buffers. There you see this:

    if (likely(rx_ring->next_to_use != i)) {
        rx_ring->next_to_use = i;
        if (unlikely(i-- == 0))
            i = (rx_ring->count - 1);

        /* Force memory writes to complete before letting h/w
         * know there are new descriptors to fetch.  (Only
         * applicable for weak-ordered memory model archs,
         * such as IA-64). */
        wmb();
        writel(i, hw->hw_addr + rx_ring->rdt);
    }

Similarly for all other devices:

- cadence_gem -> GEM_NWCTRL
- dp8393x -> SONIC_CR, SONIC_ISR
- eepro100 -> set_ru_state
- mcf_fec -> mcf_fec_enable_rx
- milkymist-minimax2 -> R_STATE0, R_STATE1
- mipsnet -> MIPSNET_INT_CTL, MIPSNET_RX_DATA_BUFFER
- ne2000 -> EN0_STARTPG, EN0_STOPPG, E8390_CMD
- opencores_eth -> TX_BD_NUM, MODER, rx_desc
- pcnet -> pcnet_start, csr[5]
- rtl8139 -> RxBufPtr and Cfg9346
- smc91c111 -> RCR, smc91c111_release_packet
- spapr_llan -> h_add_logical_lan_buffer
- stellaris_enet -> RCTL, DATA
- xgmac -> DMA_CONTROL
- xilinx_axienet -> rcw[1]
- xilinx_ethlite -> R_RX_CTRL0

For Xen I think this is not possible at the moment because it doesn't
implement rx notification.

Paolo
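
Appendix (illustrative only, not part of the patch): the effect of an
RDT write on the can_receive result can be checked in isolation with a
small standalone model. This is a rough sketch, not QEMU code. The
struct rx_model, its fields and every model_* function are hypothetical
stand-ins for E1000State, the RDH/RDT/RDLEN/RCTL registers and
qemu_notify_event; the notification is reduced to a counter so that the
false-to-true transition is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the RX-related state in E1000State. */
struct rx_model {
    uint32_t rdh;          /* receive descriptor head (advanced by the device) */
    uint32_t rdt;          /* receive descriptor tail (written by the guest)   */
    uint32_t ring_len;     /* descriptor count, i.e. RDLEN / sizeof(desc)      */
    size_t   rxbuf_size;   /* bytes per receive buffer                         */
    bool     rx_enabled;   /* models RCTL & E1000_RCTL_EN                      */
    bool     check_rxov;   /* set once the ring has been found full            */
    unsigned notify_count; /* counts calls to the stand-in notifier            */
};

/* Stand-in for qemu_notify_event(): only records that it was called. */
void model_notify_event(struct rx_model *m)
{
    m->notify_count++;
}

/* Same decision structure as e1000_has_rxbufs() quoted in the mail. */
bool model_has_rxbufs(const struct rx_model *m, size_t total_size)
{
    uint32_t bufs;

    if (total_size <= m->rxbuf_size) {
        return m->rdh != m->rdt || !m->check_rxov;
    }
    if (m->rdh < m->rdt) {
        bufs = m->rdt - m->rdh;
    } else if (m->rdh > m->rdt || !m->check_rxov) {
        bufs = m->ring_len + m->rdt - m->rdh;
    } else {
        return false;
    }
    return total_size <= bufs * m->rxbuf_size;
}

bool model_can_receive(const struct rx_model *m)
{
    return m->rx_enabled && model_has_rxbufs(m, 1);
}

/* Guest write to RDT: may flip can_receive to true, so notify. */
void model_set_rdt(struct rx_model *m, uint32_t val)
{
    m->check_rxov = false;
    m->rdt = val & 0xffff;
    model_notify_event(m);
}

/* Walks through the transition discussed in the mail. */
void model_demo(void)
{
    struct rx_model m = {
        .rdh = 4, .rdt = 4, .ring_len = 8, .rxbuf_size = 2048,
        .rx_enabled = true, .check_rxov = true, .notify_count = 0,
    };

    assert(!model_can_receive(&m)); /* ring full: backend must stop polling */
    model_set_rdt(&m, 7);           /* guest hands back three descriptors   */
    assert(model_can_receive(&m));  /* result changed from false to true    */
    assert(m.notify_count == 1);    /* so the main loop has to be woken up  */
}
```

Calling model_demo() exercises the case above: with RDH == RDT and
check_rxov set the model refuses packets, and a single RDT write both
makes it accept them again and triggers exactly one notification.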