On Tue, Jul 9, 2024 at 10:41 AM Jason Wang <jasow...@redhat.com> wrote:

> On Mon, Jul 8, 2024 at 1:17 PM Yong Huang <yong.hu...@smartx.com> wrote:
> >
> >
> >
> > On Mon, Jul 8, 2024 at 11:21 AM Jason Wang <jasow...@redhat.com> wrote:
> >>
> >> On Sat, Jul 6, 2024 at 4:30 AM Hyman Huang <yong.hu...@smartx.com>
> wrote:
> >> >
> >> > Unexpected work by certain Windows guests equipped with the e1000
> >> > interface can cause the network to go down and never come back up
> >> > again unless the guest's interface is reset.
> >> >
> >> > To reproduce the failure:
> >> > 1. Set up two guests with a Windows 2016 or 2019 server operating
> >> >    system.
> >>
> >> I vaguely remember e1000 support for Windows has been deprecated for
> >> several years...
> >>
> >> That's why e1000e or igb is implemented in Qemu.
> >>
> >> > 2. Set up the e1000 interface for the guests.
> >> > 3. Pressurize the network slightly between two guests using the iPerf
> tool.
> >> >
> >> > The network goes down after a few days (2-5days), and the issue
> >> > is the result of not adhering to the e1000 specification. Refer
> >> > to the details of the specification at the following link:
> >> >
> https://www.intel.com/content/dam/doc/manual/pci-pci-x-family-gbe-controllers-software-dev-manual.pdf
> >> >
> >> > Chapter 3.2.6 describe the Receive Descriptor Tail register(RDT)
> >> > as following:
> >> > This register holds a value that is an offset from the base, and
> >> > identifies the location beyond the last descriptor hardware can
> >> > process. Note that tail should still point to an area in the
> >> > descriptor ring (somewhere between RDBA and RDBA + RDLEN).
> >> > This is because tail points to the location where software writes
> >> > the first new descriptor.
> >> >
> >> > This means that if the provider—in this case, QEMU—has not yet
> >> > loaded the packet,
> >>
> >> What do you mean by "load" here?
> >
> >
> > Sorry for failing to describe the details.
> >
> > The guest driver retrieves the packet from the receive ring buffer
> > after QEMU forwards it from the tun/tap interface in the e1000
> > emulation.
> >
> > I used "load" to express "putting packets into the receive ring buffer."
> >
> >>
> >>
> >> > RDT should never point to that place.
> >>
> >> And "that place"?
> >
> > If a descriptor in the receive ring buffer has not been filled with a
> > packet address by QEMU, the descriptor therefore doesn't have any
> > available packets. The location of the descriptor should not be referred
> > to by RDT because the location is in the range that "hardware" handles.
> >
> > "that place" means the location of the descriptor in the ring buffer
> > that QEMU hasn't set any available packets related to.
> >
> >>
> >>
> >> > When
> >> > implementing the emulation of the e1000 interface, QEMU evaluates
> >> > if the receive ring buffer is full once the RDT equals the RDH,
> >> > based on the assumption that guest drivers adhere to this
> >> > criterion strictly.
> >> >
> >> > We applied the following log patch to assist in analyzing the
> >> > issue and eventually obtained the unexpected information.
> >> >
> >> > Log patch:
> >> > -----------------------------------------------------------------
> >> > |--- a/hw/net/e1000.c
> >> > |+++ b/hw/net/e1000.c
> >> > |@@ -836,6 +836,9 @@ e1000_set_link_status(NetClientState *nc)
> >> > | static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
> >> > | {
> >> > |     int bufs;
> >> > |+    DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u,
> s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n",
> >> > |+           s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH],
> s->mac_reg[RDT]);
> >> > |+
> >> > |     /* Fast-path short packets */
> >> > |     if (total_size <= s->rxbuf_size) {
> >> > |         if (s->mac_reg[RDH] == s->mac_reg[RDT] && s->last_overrun)
> >> > |@@ -1022,6 +1025,9 @@ e1000_receive_iov(NetClientState *nc, const
> struct iovec *iov, int iovcnt)
> >> > |         s->rxbuf_min_shift)
> >> > |         n |= E1000_ICS_RXDMT0;
> >> > |
> >> > |+    DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u,
> s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n",
> >> > |+           s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH],
> s->mac_reg[RDT]);
> >> > |+
> >> > -----------------------------------------------------------------
> >> >
> >> > The last few logs of information when the network is down:
> >> >
> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384,
> s->mac_reg[RDH] = 897, s->mac_reg[RDT] = 885
> >> > <- the receive ring buffer is checked for fullness in the
> >> > e1000_has_rxbufs function, not full.
> >> >
> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384,
> s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885
> >> > <- RDT stays the same, RDH updates to 898, and 1 descriptor
> >> > utilized after putting the packet to ring buffer.
> >> >
> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384,
> s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885
> >> > <- the receive ring buffer is checked for fullness in the
> >> > e1000_has_rxbufs function, not full.
> >> >
> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384,
> s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885
> >> > <- RDT stays the same, RDH updates to 899, and 1 descriptor
> >> > utilized after putting the packet to ring buffer.
> >> >
> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384,
> s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885
> >> > <- the receive ring buffer is checked for fullness in the
> >> > e1000_has_rxbufs function, not full.
> >> >
> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384,
> s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 885
> >> > <- RDT stays the same, RDH updates to 900 , and 1 descriptor
> >> > utilized after putting the packet to ring buffer.
> >> >
> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384,
> s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 900
> >> > <- The ring is full, according to e1000_has_rxbufs, because
> >> > of the RDT update to 900 and equals RDH !
> >>
> >> Just to make sure I understand this, RDT==RDH means the ring is empty I
> think?
> >>
> >>
> >> See commit:
> >>
> >> commit e5b8b0d4ba29fe1268ba049519a1b0cf8552a21a
> >> Author: Dmitry Fleytman <dmi...@daynix.com>
> >> Date:   Fri Oct 19 07:56:55 2012 +0200
> >>
> >>     e1000: drop check_rxov, always treat RX ring with RDH == RDT as
> empty
> >>
> >>     Real HW always treats RX ring with RDH == RDT as empty.
> >>     Emulation is supposed to behave the same.
> >
> >
> > Indeed, I'm confused :(,  the description in the comment claims that RX
> > rings with RDH == RDT as empty, but in implementation, it treats that as
> > overrun.
> >
> > See the following 2 contexts:
> >
> > 1. e1000_can_receive:
> > static bool e1000_can_receive(NetClientState *nc)
> > {
> >     E1000State *s = qemu_get_nic_opaque(nc);
> >     // e1000_has_rxbufs return true means ring buffer has
> >     // available descriptors to use for QEMU.
> >     // false means ring buffer overrun and QEMU should queue the packet
> >     // and wait for the RDT update and available descriptors can be used.
> >
> >     return e1000x_rx_ready(&s->parent_obj, s->mac_reg) &&
> >         e1000_has_rxbufs(s, 1) && !timer_pending(s->flush_queue_timer);
> > }
>
> Well we had in e1000_has_rx_bufs
>
>     if (total_size <= s->rxbuf_size) {
>         return s->mac_reg[RDH] != s->mac_reg[RDT];
>     }
>
> RDT!=RDH means RX ring has available descriptors for hardware?
>

IMHO, Yes.


> Adding more people.
>
> Thanks
>
>

-- 
Best regards

Reply via email to