On 6/3/2024 11:47 AM, Joshua Hay wrote:

There are several reasons for a TX completion to take longer than usual
to be written back by HW. For example, the completion for a packet that
misses a rule will have increased latency. The side effect of these
variable latencies for any given packet is out of order completions. The
stack sends packet X and Y. If packet X takes longer because of the rule
miss in the example above, but packet Y hits, it can go on the wire
immediately. Which also means it can be completed first.  The driver
will then receive a completion for packet Y before packet X.  The driver
will stash the buffers for packet X in a hash table to allow the tx send
queue descriptors for both packet X and Y to be reused. The driver will
receive the completion for packet X sometime later and have to search
the hash table for the associated packet.

The driver cleans packets directly on the ring first, i.e. not out of
order completions since they are to some extent considered "slow(er)
path". However, certain workloads can increase the frequency of out of
order completions thus introducing even more latency into the cleaning
path. Bump up the timeout value to account for these workloads.

Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Reviewed-by: Sridhar Samudrala <sridhar.samudr...@intel.com>
Signed-off-by: Joshua Hay <joshua.a....@intel.com>
---
  drivers/net/ethernet/intel/idpf/idpf_lib.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


We tested this patch with our intensive high-performance workloads that
have been able to reproduce the issue, and it was sufficient to avoid tx
timeouts. We also noticed that these longer timeouts are not unusual in
the smartnic space: we see 100s or 50s timeouts for a few NICs, and
other NICs receive this timeout as a hint from the fw.

Reviewed-by: David Decotigny <ddeco...@google.com>

Reply via email to