On 3/13/2024 5:54 AM, Ivan Vecera wrote: > As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts > during poll exit") I'm seeing the similar issue also with i40e driver. > > In certain situation when busy-loop is enabled together with adaptive > coalescing, the driver occasionally miss that there are outstanding > descriptors to clean when exiting busy poll. > > Try to catch the remaining work by triggering a software interrupt > when exiting busy poll. No extra interrupts will be generated when > busy polling is not used. > > The issue was found when running sockperf ping-pong tcp test with > adaptive coalescing and busy poll enabled (50 as value busy_pool > and busy_read sysctl knobs) and results in huge latency spikes > with more than 100000us.
I like the results of this fix! Thanks for working on it. > > The fix is inspired from the ice driver and do the following: > 1) During napi poll exit in case of busy-poll (napo_complete_done() > returns false) this is recorded to q_vector that we were in busy > loop. > 2) In i40e_update_enable_itr() > - updates refreshed ITR intervals directly using PFINT_ITRN register > - if we are exiting ordinary poll then just enables the interrupt > using PFINT_DYN_CTLN > - if we are exiting busy poll then enables the interrupt and > additionally triggers an immediate software interrupt to catch any > pending clean-ups > 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to > 20K interrupts per second to limit the number of these sw interrupts. This is a good idea. > > @@ -2702,8 +2716,8 @@ static inline void i40e_update_enable_itr(struct > i40e_vsi *vsi, > */ > if (q_vector->rx.target_itr < q_vector->rx.current_itr) { > /* Rx ITR needs to be reduced, this is highest priority */ > - intval = i40e_buildreg_itr(I40E_RX_ITR, > - q_vector->rx.target_itr); > + wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx), > + q_vector->rx.target_itr >> 1); so here you write (this is a new write) > q_vector->rx.current_itr = q_vector->rx.target_itr; > q_vector->itr_countdown = ITR_COUNTDOWN_START; > } else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) || > @@ -2712,25 +2726,33 @@ static inline void i40e_update_enable_itr(struct > i40e_vsi *vsi, > /* Tx ITR needs to be reduced, this is second priority > * Tx ITR needs to be increased more than Rx, fourth priority > */ > - intval = i40e_buildreg_itr(I40E_TX_ITR, > - q_vector->tx.target_itr); > + wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, q_vector->reg_idx), > + q_vector->tx.target_itr >> 1); > q_vector->tx.current_itr = q_vector->tx.target_itr; > q_vector->itr_countdown = ITR_COUNTDOWN_START; > } else if (q_vector->rx.current_itr != q_vector->rx.target_itr) { > /* Rx ITR needs to be increased, third priority */ > - intval = i40e_buildreg_itr(I40E_RX_ITR, > - q_vector->rx.target_itr); > + wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx), > + q_vector->rx.target_itr >> 1); or here (new write) > q_vector->rx.current_itr = q_vector->rx.target_itr; > q_vector->itr_countdown = ITR_COUNTDOWN_START; > } else { > /* No ITR update, lowest priority */ > - intval = i40e_buildreg_itr(I40E_ITR_NONE, 0); > if (q_vector->itr_countdown) > q_vector->itr_countdown--; > } > > - if (!test_bit(__I40E_VSI_DOWN, vsi->state)) > - wr32(hw, INTREG(q_vector->reg_idx), intval); The above used to be the *only* write. > + /* Do not enable interrupt if VSI is down */ > + if (test_bit(__I40E_VSI_DOWN, vsi->state)) > + return; > + > + if (!q_vector->in_busy_poll) { > + intval = i40e_buildreg_itr(I40E_ITR_NONE, 0); > + } else { > + q_vector->in_busy_poll = false; > + intval = i40e_buildreg_swint(I40E_SW_ITR); > + } > + wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), intval); and then you write again here. So this function will now regularly have two writes in hot-path. Before it was very carefully crafted to reduce the number of writes to 1. This is made possible because the PFINT_DYN_CTLN register can do multiple tasks at once with a single write. Can you just modify intval to *both* trigger a software interrupt, and update the ITR simultaneously? I'm really not sure that's even possible. It may make more sense to only do the second write when exiting busy poll, what do you think? If there is no way to get the functionality without the two writes everytime I'd put up with it, but we can have a lot of interrupts per second, per queue (especially with adaptive disabled and rate set to a small number like 2us or 0) and doubling the number of writes will have a performance effect.