The following reply was made to PR kern/176446; it has been noted by GNATS.
From: Jack Vogel <jfvo...@gmail.com> To: John Baldwin <j...@freebsd.org> Cc: FreeBSD Net <freebsd-net@freebsd.org>, bug-follo...@freebsd.org, Mike Karels <m...@karels.net> Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST Date: Fri, 19 Apr 2013 12:32:59 -0700 --14dae9cdc48767db0904dabbc98d Content-Type: text/plain; charset=ISO-8859-1 Thanks John, I'm incorporating your changes into my source tree. I also plan on changing the "glue" between mq_start and mq_start_locked on igb after some UDP testing that was done, and believe ixgbe should follow suit. Results there have shown the latency is just too high if I only use the task_enqueue... What works best is to always queue to the buf ring, but then also always to do the TRY_LOCK. I will update HEAD as soon as I handle an internal firedrill I have today :) Jack On Fri, Apr 19, 2013 at 9:27 AM, John Baldwin <j...@freebsd.org> wrote: > A second patch. This is not something I mentioned before, but I had this > in > my checkout. In the legacy IRQ case this could also result in out-of-order > processing. It also fixes a potential OACTIVE-stuck type bug that we used > to > have in igb. I have no way to test this, so it would be good if some other > folks could test this. > > The patch changes ixgbe_txeof() return void and changes the few places that > checked its return value to ignore it. While it is true that ixgbe has a > tx > processing limit (which I think is dubious.. TX completion processing is > very > cheap unlike RX processing, so it seems to me like it should always run to > completion as in igb), in the common case I think the result will be to do > what igb used to do: poll the ring at 100% CPU (either in the interrupt > handler or in the task it keeps rescheduling) waiting for pending TX > packets > to be completed (which is pointless: the host CPU can't make the NIC > transmit > packets any faster by polling). > > It also changes the interrupt handlers to restart packet transmission > synchronously rather than always deferring that to a task (the former is > what > (nearly) all other drivers do). It also fixes the interrupt handlers to be > consistent (one looped on txeof but not the others). In the case of the > legacy interrupt handler it is possible it could fail to restart packet > transmission if there were no pending RX packets after rxeof returned and > txeof fully cleaned its ring without this change. > > It also fixes the legacy interrupt handler to not re-enable the interrupt > if > it schedules the task but to wait until the task completes (this could > result > in concurrent, out-of-order RX processing). > > Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c > =================================================================== > --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c (revision > 249553) > +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c (working > copy) > @@ -149,7 +149,7 @@ > static void ixgbe_enable_intr(struct adapter *); > static void ixgbe_disable_intr(struct adapter *); > static void ixgbe_update_stats_counters(struct adapter *); > -static bool ixgbe_txeof(struct tx_ring *); > +static void ixgbe_txeof(struct tx_ring *); > static bool ixgbe_rxeof(struct ix_queue *); > static void ixgbe_rx_checksum(u32, struct mbuf *, u32); > static void ixgbe_set_promisc(struct adapter *); > @@ -1431,7 +1414,10 @@ > } > > /* Reenable this interrupt */ > - ixgbe_enable_queue(adapter, que->msix); > + if (que->res != NULL) > + ixgbe_enable_queue(adapter, que->msix); > + else > + ixgbe_enable_intr(adapter); > return; > } > > @@ -1449,8 +1435,9 @@ > struct adapter *adapter = que->adapter; > struct ixgbe_hw *hw = &adapter->hw; > struct tx_ring *txr = adapter->tx_rings; > - bool more_tx, more_rx; > - u32 reg_eicr, loop = MAX_LOOP; > + struct ifnet *ifp = adapter->ifp; > + bool more; > + u32 reg_eicr; > > > reg_eicr = IXGBE_READ_REG(hw, IXGBE_EICR); > @@ -1461,17 +1448,19 @@ > return; > } > > - more_rx = ixgbe_rxeof(que); > + more = ixgbe_rxeof(que); > > IXGBE_TX_LOCK(txr); > - do { > - more_tx = ixgbe_txeof(txr); > - } while (loop-- && more_tx); > + ixgbe_txeof(txr); > +#if __FreeBSD_version >= 800000 > + if (!drbr_empty(ifp, txr->br)) > + ixgbe_mq_start_locked(ifp, txr, NULL); > +#else > + if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) > + ixgbe_start_locked(txr, ifp); > +#endif > IXGBE_TX_UNLOCK(txr); > > - if (more_rx || more_tx) > - taskqueue_enqueue(que->tq, &que->que_task); > - > /* Check for fan failure */ > if ((hw->phy.media_type == ixgbe_media_type_copper) && > (reg_eicr & IXGBE_EICR_GPI_SDP1)) { > @@ -1484,7 +1473,10 @@ > if (reg_eicr & IXGBE_EICR_LSC) > taskqueue_enqueue(adapter->tq, &adapter->link_task); > > - ixgbe_enable_intr(adapter); > + if (more) > + taskqueue_enqueue(que->tq, &que->que_task); > + else > + ixgbe_enable_intr(adapter); > return; > } > > @@ -1501,27 +1493,24 @@ > struct adapter *adapter = que->adapter; > struct tx_ring *txr = que->txr; > struct rx_ring *rxr = que->rxr; > - bool more_tx, more_rx; > + struct ifnet *ifp = adapter->ifp; > + bool more; > u32 newitr = 0; > > ixgbe_disable_queue(adapter, que->msix); > ++que->irqs; > > - more_rx = ixgbe_rxeof(que); > + more = ixgbe_rxeof(que); > > IXGBE_TX_LOCK(txr); > - more_tx = ixgbe_txeof(txr); > - /* > - ** Make certain that if the stack > - ** has anything queued the task gets > - ** scheduled to handle it. > - */ > + ixgbe_txeof(txr); > #ifdef IXGBE_LEGACY_TX > if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd)) > + ixgbe_start_locked(txr, ifp); > #else > - if (!drbr_empty(adapter->ifp, txr->br)) > + if (!drbr_empty(ifp, txr->br)) > + ixgbe_mq_start_locked(ifp, txr, NULL); > #endif > - more_tx = 1; > IXGBE_TX_UNLOCK(txr); > > /* Do AIM now? */ > @@ -1575,7 +1564,7 @@ > rxr->packets = 0; > > no_calc: > - if (more_tx || more_rx) > + if (more) > taskqueue_enqueue(que->tq, &que->que_task); > else /* Reenable this interrupt */ > ixgbe_enable_queue(adapter, que->msix); > @@ -3557,7 +3545,7 @@ > * tx_buffer is put back on the free queue. > * > **********************************************************************/ > -static bool > +static void > ixgbe_txeof(struct tx_ring *txr) > { > struct adapter *adapter = txr->adapter; > @@ -3605,13 +3593,13 @@ > IXGBE_CORE_UNLOCK(adapter); > IXGBE_TX_LOCK(txr); > } > - return FALSE; > + return; > } > #endif /* DEV_NETMAP */ > > if (txr->tx_avail == txr->num_desc) { > txr->queue_status = IXGBE_QUEUE_IDLE; > - return FALSE; > + return; > } > > /* Get work starting point */ > @@ -3705,12 +3693,8 @@ > if ((!processed) && ((ticks - txr->watchdog_time) > > IXGBE_WATCHDOG)) > txr->queue_status = IXGBE_QUEUE_HUNG; > > - if (txr->tx_avail == txr->num_desc) { > + if (txr->tx_avail == txr->num_desc) > txr->queue_status = IXGBE_QUEUE_IDLE; > - return (FALSE); > - } > - > - return TRUE; > } > > /********************************************************************* > > > -- > John Baldwin > --14dae9cdc48767db0904dabbc98d Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div dir=3D"ltr"><div><div><div><div>Thanks John, I'm incorporating you= r changes into my source tree. I also plan on changing the<br></div>"g= lue" between mq_start and mq_start_locked on igb after some UDP testin= g that was done, and<br> </div>believe ixgbe should follow suit. Results there have shown the latenc= y is just too high if I only use<br>the task_enqueue... What works best is = to always queue to the buf ring, but then also always to<br></div>do the TR= Y_LOCK. I will update HEAD as soon as I handle an internal firedrill I have= today :)<br> <br></div>Jack<br><br></div><div class=3D"gmail_extra"><br><br><div class= =3D"gmail_quote">On Fri, Apr 19, 2013 at 9:27 AM, John Baldwin <span dir=3D= "ltr"><<a href=3D"mailto:j...@freebsd.org" target=3D"_blank">jhb@freebsd.= org</a>></span> wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex">A second patch. =A0This is not something I m= entioned before, but I had this in<br> my checkout. =A0In the legacy IRQ case this could also result in out-of-ord= er<br> processing. =A0It also fixes a potential OACTIVE-stuck type bug that we use= d to<br> have in igb. =A0I have no way to test this, so it would be good if some oth= er<br> folks could test this.<br> <br> The patch changes ixgbe_txeof() return void and changes the few places that= <br> checked its return value to ignore it. =A0While it is true that ixgbe has a= tx<br> processing limit (which I think is dubious.. TX completion processing is ve= ry<br> cheap unlike RX processing, so it seems to me like it should always run to<= br> completion as in igb), in the common case I think the result will be to do<= br> what igb used to do: poll the ring at 100% CPU (either in the interrupt<br> handler or in the task it keeps rescheduling) waiting for pending TX packet= s<br> to be completed (which is pointless: the host CPU can't make the NIC tr= ansmit<br> packets any faster by polling).<br> <br> It also changes the interrupt handlers to restart packet transmission<br> synchronously rather than always deferring that to a task (the former is wh= at<br> (nearly) all other drivers do). =A0It also fixes the interrupt handlers to = be<br> consistent (one looped on txeof but not the others). =A0In the case of the<= br> legacy interrupt handler it is possible it could fail to restart packet<br> transmission if there were no pending RX packets after rxeof returned and<b= r> txeof fully cleaned its ring without this change.<br> <br> It also fixes the legacy interrupt handler to not re-enable the interrupt i= f<br> it schedules the task but to wait until the task completes (this could resu= lt<br> in concurrent, out-of-order RX processing).<br> <div class=3D"im"><br> Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c<br> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br> --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c =A0 =A0 =A0 (revi= sion 249553)<br> +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c =A0 =A0 =A0 (work= ing copy)<br> </div>@@ -149,7 +149,7 @@<br> =A0static void =A0 =A0 ixgbe_enable_intr(struct adapter *);<br> =A0static void =A0 =A0 ixgbe_disable_intr(struct adapter *);<br> =A0static void =A0 =A0 ixgbe_update_stats_counters(struct adapter *);<br> -static bool =A0 =A0ixgbe_txeof(struct tx_ring *);<br> +static void =A0 =A0ixgbe_txeof(struct tx_ring *);<br> =A0static bool =A0 =A0ixgbe_rxeof(struct ix_queue *);<br> =A0static void =A0 =A0ixgbe_rx_checksum(u32, struct mbuf *, u32);<br> =A0static void =A0 =A0 ixgbe_set_promisc(struct adapter *);<br> @@ -1431,7 +1414,10 @@<br> =A0 =A0 =A0 =A0 }<br> <br> =A0 =A0 =A0 =A0 /* Reenable this interrupt */<br> - =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);<br> + =A0 =A0 =A0 if (que->res !=3D NULL)<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);<br= > + =A0 =A0 =A0 else<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_intr(adapter);<br> =A0 =A0 =A0 =A0 return;<br> =A0}<br> <br> @@ -1449,8 +1435,9 @@<br> =A0 =A0 =A0 =A0 struct adapter =A0*adapter =3D que->adapter;<br> =A0 =A0 =A0 =A0 struct ixgbe_hw *hw =3D &adapter->hw;<br> =A0 =A0 =A0 =A0 struct =A0 =A0 =A0 =A0 =A0tx_ring *txr =3D adapter->tx_r= ings;<br> - =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more_tx, more_rx;<br> - =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 reg_eicr, loop =3D MAX_LOOP;<br> + =A0 =A0 =A0 struct ifnet =A0 =A0*ifp =3D adapter->ifp;<br> + =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more;<br> + =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 reg_eicr;<br> <br> <br> =A0 =A0 =A0 =A0 reg_eicr =3D IXGBE_READ_REG(hw, IXGBE_EICR);<br> @@ -1461,17 +1448,19 @@<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;<br> =A0 =A0 =A0 =A0 }<br> <br> - =A0 =A0 =A0 more_rx =3D ixgbe_rxeof(que);<br> + =A0 =A0 =A0 more =3D ixgbe_rxeof(que);<br> <br> =A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);<br> - =A0 =A0 =A0 do {<br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 more_tx =3D ixgbe_txeof(txr);<br> - =A0 =A0 =A0 } while (loop-- && more_tx);<br> + =A0 =A0 =A0 ixgbe_txeof(txr);<br> +#if __FreeBSD_version >=3D 800000<br> + =A0 =A0 =A0 if (!drbr_empty(ifp, txr->br))<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_mq_start_locked(ifp, txr, NULL);<br> +#else<br> + =A0 =A0 =A0 if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_start_locked(txr, ifp);<br> +#endif<br> =A0 =A0 =A0 =A0 IXGBE_TX_UNLOCK(txr);<br> <br> - =A0 =A0 =A0 if (more_rx || more_tx)<br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->qu= e_task);<br> -<br> =A0 =A0 =A0 =A0 /* Check for fan failure */<br> =A0 =A0 =A0 =A0 if ((hw->phy.media_type =3D=3D ixgbe_media_type_copper) = &&<br> =A0 =A0 =A0 =A0 =A0 =A0 (reg_eicr & IXGBE_EICR_GPI_SDP1)) {<br> @@ -1484,7 +1473,10 @@<br> =A0 =A0 =A0 =A0 if (reg_eicr & IXGBE_EICR_LSC)<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(adapter->tq, &adap= ter->link_task);<br> <br> - =A0 =A0 =A0 ixgbe_enable_intr(adapter);<br> + =A0 =A0 =A0 if (more)<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->qu= e_task);<br> + =A0 =A0 =A0 else<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_intr(adapter);<br> =A0 =A0 =A0 =A0 return;<br> =A0}<br> <br> @@ -1501,27 +1493,24 @@<br> =A0 =A0 =A0 =A0 struct adapter =A0*adapter =3D que->adapter;<br> =A0 =A0 =A0 =A0 struct tx_ring =A0*txr =3D que->txr;<br> =A0 =A0 =A0 =A0 struct rx_ring =A0*rxr =3D que->rxr;<br> - =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more_tx, more_rx;<br> + =A0 =A0 =A0 struct ifnet =A0 =A0*ifp =3D adapter->ifp;<br> + =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more;<br> =A0 =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 newitr =3D 0;<br> <br> =A0 =A0 =A0 =A0 ixgbe_disable_queue(adapter, que->msix);<br> =A0 =A0 =A0 =A0 ++que->irqs;<br> <br> - =A0 =A0 =A0 more_rx =3D ixgbe_rxeof(que);<br> + =A0 =A0 =A0 more =3D ixgbe_rxeof(que);<br> <br> =A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);<br> - =A0 =A0 =A0 more_tx =3D ixgbe_txeof(txr);<br> - =A0 =A0 =A0 /*<br> - =A0 =A0 =A0 ** Make certain that if the stack<br> - =A0 =A0 =A0 ** has anything queued the task gets<br> - =A0 =A0 =A0 ** scheduled to handle it.<br> - =A0 =A0 =A0 */<br> + =A0 =A0 =A0 ixgbe_txeof(txr);<br> =A0#ifdef IXGBE_LEGACY_TX<br> =A0 =A0 =A0 =A0 if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd))<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_start_locked(txr, ifp);<br> =A0#else<br> - =A0 =A0 =A0 if (!drbr_empty(adapter->ifp, txr->br))<br> + =A0 =A0 =A0 if (!drbr_empty(ifp, txr->br))<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_mq_start_locked(ifp, txr, NULL);<br> =A0#endif<br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 more_tx =3D 1;<br> =A0 =A0 =A0 =A0 IXGBE_TX_UNLOCK(txr);<br> <br> =A0 =A0 =A0 =A0 /* Do AIM now? */<br> @@ -1575,7 +1564,7 @@<br> =A0 =A0 =A0 =A0 =A0rxr->packets =3D 0;<br> <br> =A0no_calc:<br> - =A0 =A0 =A0 if (more_tx || more_rx)<br> + =A0 =A0 =A0 if (more)<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->= que_task);<br> =A0 =A0 =A0 =A0 else /* Reenable this interrupt */<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);<= br> @@ -3557,7 +3545,7 @@<br> =A0 * =A0tx_buffer is put back on the free queue.<br> =A0 *<br> =A0 **********************************************************************/= <br> -static bool<br> +static void<br> =A0ixgbe_txeof(struct tx_ring *txr)<br> =A0{<br> =A0 =A0 =A0 =A0 struct adapter =A0 =A0 =A0 =A0 =A0*adapter =3D txr->adap= ter;<br> @@ -3605,13 +3593,13 @@<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 IXGBE_CORE_UNLOCK(adapter);= <br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }<br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return FALSE;<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;<br> =A0 =A0 =A0 =A0 }<br> =A0#endif /* DEV_NETMAP */<br> <br> =A0 =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc) {<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_IDLE;<= br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return FALSE;<br> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;<br> =A0 =A0 =A0 =A0 }<br> <br> =A0 =A0 =A0 =A0 /* Get work starting point */<br> @@ -3705,12 +3693,8 @@<br> =A0 =A0 =A0 =A0 if ((!processed) && ((ticks - txr->watchdog_time= ) > IXGBE_WATCHDOG))<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_HUNG;<= br> <br> - =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc) {<br> + =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc)<br> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_IDLE;<= br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (FALSE);<br> - =A0 =A0 =A0 }<br> -<br> - =A0 =A0 =A0 return TRUE;<br> =A0}<br> <br> =A0/*********************************************************************<b= r> <span class=3D"HOEnZb"><font color=3D"#888888"><br> <br> --<br> John Baldwin<br> </font></span></blockquote></div><br></div> --14dae9cdc48767db0904dabbc98d-- _______________________________________________ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"