Great! Thanks, Alex. I'll prepare a new changeset that drops check_rxov completely. Also migration-related patch becomes unneeded with this solution.
On Thu, Oct 18, 2012 at 6:06 PM, Alexander Duyck <alexander.h.du...@intel.com> wrote: > On 10/18/2012 07:31 AM, Stefan Hajnoczi wrote: >> On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <dmi...@daynix.com> wrote: >>> The real purpose of check_rxov it a bit confusing indeed, mainly >>> because of unclear name (rename?), >>> however it works as following: >>> >>> There are 2 possible when RDT == RDH for RX ring: >>> 1. Device used all the buffers from ring, no empty buffers available >>> 2. Driver fully refilled the ring and all buffers are empty and ready >>> to use > > The 2nd case is not true. We should only have RDT == RDH when the ring > is empty. If RDT == RDH and the ring is full then we have a bug in the > driver. The driver should only ever allow RDT to be one less than head, > or ring size - 1 if head is 0. > >>> check_rxov is used to distinguish these 2 cases: >>> 1. It must be 1 initially (init, reset, etc.) >>> 2. It must be set to one when device uses buffer >>> 3. It must be set to 0 when driver adds buffer to the ring >>> check_rxov == 1 - ring is empty >>> check_rxov == 0 - ring is full >>> >>> Indeed, RX init sequence doesn't look logical, however this is the way >>> all Intel driver behave from e1000 and up to ixgbe. >>> Also see some explanation here: >>> http://permalink.gmane.org/gmane.linux.kernel/1375917 >>> >>> If we drop check_rxov and always treat RDH == RDT as empty ring we'll >>> probably get correct behavior for current Linux driver's code (needs >>> testing of course), >>> however we have no idea how Windows drivers work. > > The windows driver should work the same way. If RDH == RDT the hardware > will treat that as a empty ring and will hang. If there is a driver > that is setting RDH == RDT to indicate the ring is full please let us > know as that is likely a buggy driver. > >> Thanks, for the great explanation, Dmitry. >> >> Alexander: I CCed you because I hope you might be able to explain what >> the 82540EM card does when a driver sets RDT to the value of RDH. The >> QEMU NIC emulation code treats this as a full ring (i.e. the >> descriptors are valid and will be filled in by the hardware). Does >> the real hardware act like this or will it treat this condition as >> ring empty (i.e. if the driver sets RDT to the value of RDH then the >> hardware stops receive because there are no descriptors available)? >> >> I can't find a statement in the Intel datasheet about what happens >> when the driver sets RDT = RDH. The QEMU check_rxov variable is >> trying to distinguish between ring empty (RDH has moved to RDT) and >> ring full (driver has set RDH = RDT because the full descriptor ring >> is available). > > If RDT == RDH then we should stop receiving traffic. As far as I know > all of our e1000 hardware treat RDT == RDH as an empty ring state. All > of the drivers should have code in place to stop it. For example the > E1000_DESC_UNUSED macro should be returning ring size - 1 in the case of > RDT == RDH which will result in the head being 0 and the tail being ring > size - 2. > >> Dmitry: At this point we'd need to test what happens on real hardware >> when RDH = RDT in order to be able to remove check_rxov. As you >> mentioned, with the Linux e1000 driver we don't see ring full RDH = >> RDT: >> >> /* call E1000_DESC_UNUSED which always leaves >> * at least 1 descriptor unused to make sure >> * next_to_use != next_to_clean */ >> for (i = 0; i < adapter->num_rx_queues; i++) { >> struct e1000_rx_ring *ring = &adapter->rx_ring[i]; >> adapter->alloc_rx_buf(adapter, ring, >> E1000_DESC_UNUSED(ring)); >> } >> >> Here some sample output from a QEMU printf, notice how RDH is never >> the same as RDT once rx begins: >> >> set_rdt rdh=0 rdt_old=0 rdt_new=0 >> set_rdt rdh=0 rdt_old=0 rdt_new=254 >> set_rdt rdh=1 rdt_old=254 rdt_new=255 >> set_rdt rdh=2 rdt_old=255 rdt_new=0 >> set_rdt rdh=3 rdt_old=0 rdt_new=1 >> set_rdt rdh=4 rdt_old=1 rdt_new=2 >> set_rdt rdh=5 rdt_old=2 rdt_new=3 >> set_rdt rdh=6 rdt_old=3 rdt_new=4 >> set_rdt rdh=7 rdt_old=4 rdt_new=5 >> set_rdt rdh=9 rdt_old=5 rdt_new=7 >> set_rdt rdh=10 rdt_old=7 rdt_new=8 >> set_rdt rdh=11 rdt_old=8 rdt_new=9 >> set_rdt rdh=12 rdt_old=9 rdt_new=10 >> set_rdt rdh=13 rdt_old=10 rdt_new=11 >> set_rdt rdh=14 rdt_old=11 rdt_new=12 >> >> The iPXE 'intel' driver (supports e1000 cards) also does not set RDH = >> RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a >> time. >> >> The reason I'm digging into the need for check_rxov is because it's a >> dangerous piece of code to have. If check_rxov logic is ever out of >> sync we risk memory corruption. I'd really like to drop it >> completely. >> >> Stefan > > There should be no need for check_rxov. As far as I know none of our > drivers will ever set RDT == RDH if there are descriptors available on > the ring. > > Thanks, > > Alex -- Dmitry Fleytman Technology Expert and Consultant, Daynix Computing Ltd. Cell: +972-54-2819481 Skype: dmitry.fleytman