From: David Decotigny <ddeco...@google.com>
Date: Wed, 12 Jun 2024 11:01:46 -0700
> On Tue, Jun 11, 2024 at 11:13 AM Josh Hay <joshua.a....@intel.com> wrote:
>>
>> On 6/11/2024 3:44 AM, Alexander Lobakin wrote:
>>> From: David Decotigny <ddeco...@gmail.com>
>>> Date: Tue, 4 Jun 2024 16:34:48 -0700
>>>
>>>> On 6/3/2024 11:47 AM, Joshua Hay wrote:
>>>>>
>>>>> There are several reasons for a TX completion to take longer than usual
>>>>> to be written back by HW. For example, the completion for a packet that
>>>>> misses a rule will have increased latency. The side effect of these
>>>>> variable latencies for any given packet is out of order completions. The
>>>>> stack sends packet X and Y. If packet X takes longer because of the rule
>>>>> miss in the example above, but packet Y hits, it can go on the wire
>>>>> immediately. Which also means it can be completed first. The driver
>>>>> will then receive a completion for packet Y before packet X. The driver
>>>>> will stash the buffers for packet X in a hash table to allow the tx send
>>>>> queue descriptors for both packet X and Y to be reused. The driver will
>>>>> receive the completion for packet X sometime later and have to search
>>>>> the hash table for the associated packet.
>>>>>
>>>>> The driver cleans packets directly on the ring first, i.e. not out of
>>>>> order completions since they are to some extent considered "slow(er)
>>>>> path". However, certain workloads can increase the frequency of out of
>>>>> order completions thus introducing even more latency into the cleaning
>>>>> path. Bump up the timeout value to account for these workloads.
>>>>>
>>>>> Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
>>>>> Reviewed-by: Sridhar Samudrala <sridhar.samudr...@intel.com>
>>>>> Signed-off-by: Joshua Hay <joshua.a....@intel.com>
>>>>> ---
>>>>>  drivers/net/ethernet/intel/idpf/idpf_lib.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> We tested this patch with our intensive high-performance workloads that
>>>> have been able to reproduce the issue, and it was sufficient to avoid tx
>>>> timeouts. We also noticed that these longer timeouts are not unusual in
>>>> the smartnic space: we see 100s or 50s timeouts for a few NICs, and
>>>
>>> Example?
>
> a sample:
>
> drivers/net/ethernet/cavium/thunder/nic.h:#define NICVF_TX_TIMEOUT (50 * HZ)
> drivers/net/ethernet/cavium/thunder/nicvf_main.c: netdev->watchdog_timeo = NICVF_TX_TIMEOUT;
> drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h:#define OTX2_TX_TIMEOUT (100 * HZ)
> drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c: netdev->watchdog_timeo = OTX2_TX_TIMEOUT;
> drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c: netdev->watchdog_timeo = OTX2_TX_TIMEOUT;
> drivers/net/ethernet/amazon/ena/ena_netdev.c: netdev->watchdog_timeo = msecs_to_jiffies(hints->netdev_wd_timeout);

This one doesn't say anything at all :D

mlx5 has 15 seconds, but mlx4 has the same timeout as well, so this
might be some legacy stuff. Netronome has 5, QLogic has 5, Broadcom
has 5, etc.

These 50 and 100 second timeouts belong to one vendor (Cavium is
Marvell) and look like a hack to hide HW issues.

Re "some issues were observed" -- this patch only hides the symptoms,
at least from what I'm seeing currently. There are still no details
that would let me understand the reasons for it.

Thanks,
Olek
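
For reference, a minimal sketch of how a driver typically wires up the
TX watchdog discussed above -- the my_driver_* names and the 30 second
value are hypothetical and not taken from the idpf patch. watchdog_timeo
is given in jiffies, and the stack calls .ndo_tx_timeout once a TX queue
has not made progress within that window:

#include <linux/netdevice.h>

/* Hypothetical value for illustration only, not the idpf setting */
#define MY_DRIVER_TX_TIMEOUT	(30 * HZ)

/* Invoked by the stack when a TX queue exceeds watchdog_timeo */
static void my_driver_tx_timeout(struct net_device *netdev,
				 unsigned int txqueue)
{
	netdev_err(netdev, "TX timeout on queue %u\n", txqueue);
	/* typically: dump queue state and schedule a reset here */
}

static const struct net_device_ops my_driver_netdev_ops = {
	.ndo_tx_timeout	= my_driver_tx_timeout,
	/* ... other ops ... */
};

static void my_driver_netdev_setup(struct net_device *netdev)
{
	netdev->netdev_ops = &my_driver_netdev_ops;
	/* In jiffies; the 5/15/50/100 second values quoted above are
	 * all assigned the same way at probe/setup time.
	 */
	netdev->watchdog_timeo = MY_DRIVER_TX_TIMEOUT;
}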