On Tue, Jun 11, 2024 at 11:13 AM Josh Hay <joshua.a....@intel.com> wrote:
>
>
>
> On 6/11/2024 3:44 AM, Alexander Lobakin wrote:
> > From: David Decotigny <ddeco...@gmail.com>
> > Date: Tue, 4 Jun 2024 16:34:48 -0700
> >
> >>
> >>
> >> On 6/3/2024 11:47 AM, Joshua Hay wrote:
> >>>
> >>> There are several reasons for a TX completion to take longer than usual
> >>> to be written back by HW. For example, the completion for a packet that
> >>> misses a rule will have increased latency. The side effect of these
> >>> variable latencies for any given packet is out of order completions. The
> >>> stack sends packet X and Y. If packet X takes longer because of the rule
> >>> miss in the example above, but packet Y hits, it can go on the wire
> >>> immediately. Which also means it can be completed first. The driver
> >>> will then receive a completion for packet Y before packet X. The driver
> >>> will stash the buffers for packet X in a hash table to allow the tx send
> >>> queue descriptors for both packet X and Y to be reused. The driver will
> >>> receive the completion for packet X sometime later and have to search
> >>> the hash table for the associated packet.
> >>>
> >>> The driver cleans packets directly on the ring first, i.e. not out of
> >>> order completions since they are to some extent considered "slow(er)
> >>> path". However, certain workloads can increase the frequency of out of
> >>> order completions thus introducing even more latency into the cleaning
> >>> path. Bump up the timeout value to account for these workloads.
> >>>
> >>> Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
> >>> Reviewed-by: Sridhar Samudrala <sridhar.samudr...@intel.com>
> >>> Signed-off-by: Joshua Hay <joshua.a....@intel.com>
> >>> ---
> >>>  drivers/net/ethernet/intel/idpf/idpf_lib.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> We tested this patch with our intensive high-performance workloads that
> >> have been able to reproduce the issue, and it was sufficient to avoid tx
> >> timeouts. We also noticed that these longer timeouts are not unusual in
> >> the smartnic space: we see 100s or 50s timeouts for a few NICs, and
> >
> > Example?
a sample:

drivers/net/ethernet/cavium/thunder/nic.h:#define NICVF_TX_TIMEOUT (50 * HZ)
drivers/net/ethernet/cavium/thunder/nicvf_main.c: netdev->watchdog_timeo = NICVF_TX_TIMEOUT;
drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h:#define OTX2_TX_TIMEOUT (100 * HZ)
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c: netdev->watchdog_timeo = OTX2_TX_TIMEOUT;
drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c: netdev->watchdog_timeo = OTX2_TX_TIMEOUT;
drivers/net/ethernet/amazon/ena/ena_netdev.c: netdev->watchdog_timeo = msecs_to_jiffies(hints->netdev_wd_timeout);

> >
> >> other NICs receive this timeout as a hint from the fw.
> >>
> >> Reviewed-by: David Decotigny <ddeco...@google.com>
> >
> > We either need to teach watchdog core to account OOOs or there's
> > something really wrong in the driver. For example, how can we then
> > distinguish if > 5 sec delay happened just due to an OOO or due to that
> > something went bad with the HW?
> >
> > Note that there are several patches fixing Tx (incl. timeouts) in my
> > tree, including yours (Joshua's) which you somehow didn't send yet ._.
> > Maybe start from them first?
>
> I believe it was you who specifically asked our team to defer pushing
> any upstream patches while you were working on the XDP series "to avoid
> having to rebase", which was a reasonable request at the time. We also
> had no reason to believe the existing upstream idpf implementation was
> experiencing timeouts (it is being tested by numerous validation teams).
> So there was no urgency to get those patches upstream. Which patches in
> your tree do you believe fix specific timeout situations? It appears you
> pulled in some of the changes from the out-of-tree driver, but those
> were all enhancements. It wasn't until the workload that David mentioned
> was run on the current driver that we had any indication there were
> timeout issues.

Some issues were observed with high cpu/memory/network utilization
workloads such as a SAP HANA benchmark.

> >
> > I don't buy 30 seconds, at least for now. Maybe I'm missing something.
> >
> > Nacked-by: Alexander Lobakin <aleksander.loba...@intel.com>
>
>
> In the process of debugging the newly discovered timeout, our
> architecture team made it clear that 5 seconds is too low for this type
> of device, with a non deterministic pipeline where packets can take a
> number of exception/slow paths. Admittedly, we don't know the exact
> number, so the solution for the time being was to bump it up with a
> comfortable buffer. As we tune things and debug with various workloads,
> we can bring it back down. As David mentioned, there is precedent for an
> extended timeout for smartnics. Why is it suddenly unacceptable for
> Intel's device?
>
> >
> > Thanks,
> > Olek
>
> Thanks,
> Josh

--
David Decotigny
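
For readers not familiar with the out-of-order completion handling the
quoted commit message describes, here is a minimal sketch of the
stash-and-lookup idea. This is not the actual idpf code: the names
(tx_stash, compl_tag, the helpers) and the use of the generic kernel
hashtable are illustrative only.

#include <linux/hashtable.h>
#include <linux/skbuff.h>
#include <linux/slab.h>
#include <linux/types.h>

#define TX_STASH_BITS	8

struct tx_stash_entry {
	struct hlist_node hlist;
	u16 compl_tag;			/* key written back by HW */
	struct sk_buff *skb;		/* buffers to free on completion */
};

DEFINE_HASHTABLE(tx_stash, TX_STASH_BITS);

/*
 * Out-of-order case: packet Y completed before packet X, so packet X's
 * state is parked in the hash table and its ring descriptors can be
 * reused immediately.
 */
static int stash_ooo_packet(u16 compl_tag, struct sk_buff *skb)
{
	struct tx_stash_entry *e = kzalloc(sizeof(*e), GFP_ATOMIC);

	if (!e)
		return -ENOMEM;
	e->compl_tag = compl_tag;
	e->skb = skb;
	hash_add(tx_stash, &e->hlist, compl_tag);
	return 0;
}

/* The late completion arrives: search the stash and clean the packet */
static void clean_stashed_packet(u16 compl_tag)
{
	struct tx_stash_entry *e;

	hash_for_each_possible(tx_stash, e, hlist, compl_tag) {
		if (e->compl_tag != compl_tag)
			continue;
		dev_kfree_skb_any(e->skb);
		hash_del(&e->hlist);
		kfree(e);
		return;
	}
}

This lookup is the "slow(er) path" the commit message refers to: the more
completions arrive out of order, the more time cleaning spends searching
the stash instead of walking the ring.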
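The knob being debated is the per-netdev TX watchdog interval, which a
driver sets roughly as in the grep output above. Below is a sketch, not
the actual idpf change: the macro and function names are made up, and the
30 * HZ value stands in for the bump the patch proposes.

#include <linux/netdevice.h>

#define EXAMPLE_TX_TIMEOUT	(30 * HZ)

static void example_cfg_netdev(struct net_device *netdev)
{
	/*
	 * Give HW exception/slow paths this much headroom before the
	 * stack declares a TX hang and calls .ndo_tx_timeout().
	 */
	netdev->watchdog_timeo = EXAMPLE_TX_TIMEOUT;
}

The disagreement in the thread is not about the mechanism, but about
whether raising this value hides real HW/driver hangs behind the larger
window, versus accommodating a pipeline whose worst-case completion
latency is genuinely above 5 seconds.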