Dear Brian,
Thank you for your reply.
On 07.07.25 at 16:43, Brian Vazquez wrote:
On Mon, Jun 30, 2025 at 06:22:11PM +0200, Paul Menzel wrote:
On 30.06.25 at 18:08, Hay, Joshua A wrote:
On 25.06.25 at 18:11, Joshua Hay wrote:
This series fixes a stability issue in the flow scheduling Tx send/clean
path that results in a Tx timeout.
The existing guardrails in the Tx path were not sufficient to prevent
the driver from reusing completion tags that were still in flight (held
by the HW). This collision would cause the driver to erroneously clean
the wrong packet thus leaving the descriptor ring in a bad state.
The main point of this refactor is replace the flow scheduling buffer
… to replace …?
Thanks, will fix in v2
ring with a large pool/array of buffers. The completion tag then simply
is the index into this array. The driver tracks the free tags and pulls
the next free one from a refillq. The cleaning routines simply use the
completion tag from the completion descriptor to index into the array to
quickly find the buffers to clean.
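For readers following the thread, the pool-plus-refillq idea described above can be sketched in plain C. This is only an illustration under stated assumptions, not the idpf code from the series: the names `tag_pool`, `tx_buf`, `tag_alloc`, `tag_complete` and the size `POOL_SIZE` are hypothetical, and `pkt_id` stands in for the real buffer state. The point it shows is that a tag leaves the free ring when a packet is sent and only returns when its completion arrives, so a tag still held by the HW cannot be handed out a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 8u /* hypothetical pool size, chosen small for the example */

struct tx_buf {
	bool in_flight; /* true while the HW may still complete this tag */
	int pkt_id;     /* stand-in for the packet/buffer being tracked */
};

struct tag_pool {
	struct tx_buf bufs[POOL_SIZE];  /* completion tag == index into this array */
	uint16_t free_tags[POOL_SIZE];  /* ring of free tags (the "refillq" role) */
	uint32_t head;                  /* next ring slot to pull a free tag from */
	uint32_t count;                 /* number of free tags currently in the ring */
};

static void tag_pool_init(struct tag_pool *p)
{
	for (uint32_t i = 0; i < POOL_SIZE; i++) {
		p->bufs[i].in_flight = false;
		p->bufs[i].pkt_id = -1;
		p->free_tags[i] = (uint16_t)i;
	}
	p->head = 0;
	p->count = POOL_SIZE;
}

/* Send path: pull the next free tag. Returns false when no tag is free,
 * in which case a caller would have to stop the queue. */
static bool tag_alloc(struct tag_pool *p, int pkt_id, uint16_t *tag)
{
	if (p->count == 0)
		return false;

	*tag = p->free_tags[p->head];
	p->head = (p->head + 1) % POOL_SIZE;
	p->count--;

	p->bufs[*tag].in_flight = true;
	p->bufs[*tag].pkt_id = pkt_id;
	return true;
}

/* Clean path: the completion tag indexes the array directly, so the lookup
 * does not depend on completion order. The tag is then posted back to the
 * free ring. Returns the packet that was cleaned. */
static int tag_complete(struct tag_pool *p, uint16_t tag)
{
	assert(tag < POOL_SIZE && p->bufs[tag].in_flight);

	int pkt_id = p->bufs[tag].pkt_id;
	uint32_t tail = (p->head + p->count) % POOL_SIZE;

	p->bufs[tag].in_flight = false;
	p->free_tags[tail] = tag;
	p->count++;
	return pkt_id;
}
```

In this sketch, completing packets out of order simply returns their tags to the ring in a different order; a tag that has not completed yet is never in the ring and therefore cannot be reassigned.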
All of the code to support the refactor is added first to ensure traffic
still passes with each patch. The final patch then removes all of the
obsolete stashing code.
Do you have reproducers for the issue?
This issue cannot be reproduced without the customer-specific device
configuration, but it can impact any traffic once in place.
Interesting. Then it’d be great if you could describe that setup in more
detail.
The hardware can process packets and return completions out of order;
this depends on HW configuration that is difficult to replicate.
To match completions with packets, each packet with pending completions
must be associated with a unique ID. The previous code would occasionally
reassign the same ID to multiple pending packets, resulting in
resource leaks and eventually panics.
Thank you for describing the problem again. Too bad it’s not easily
reproducible.
The new code uses a much simpler data structure to assign IDs that
is immune to duplicate assignment, and also much more efficient at
runtime.
Maybe that could be added to the commit message too. How can the
efficiency claim be verified?
Joshua Hay (5):
idpf: add support for Tx refillqs in flow scheduling mode
idpf: improve when to set RE bit logic
idpf: replace flow scheduling buffer ring with buffer pool
idpf: stop Tx if there are insufficient buffer resources
idpf: remove obsolete stashing code
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 6 +-
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 626 ++++++------------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 76 +--
3 files changed, 239 insertions(+), 469 deletions(-)
Kind regards,
Paul