mlx5_tx_complete() polls the completion queue repeatedly until it encounters an invalid entry. Because Tx completions are suppressed by MLX5_TX_COMP_THRESH, it is a waste of cycles to expect multiple completions in a single poll. Freeing too many buffers in one call can also cause high jitter. This patch improves throughput a little.
What if the device generates a burst of completions? Leaving these completions unreaped can theoretically put resource pressure on the corresponding mempool(s). I totally get the need for a stopping condition, but is "loop once" the best one? Perhaps an adaptive budget (based on online stats) would perform better?
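
To make the adaptive-budget idea concrete, here is a rough sketch of what I have in mind. It is not mlx5 code: the mock_cq/mock_cqe types, mock_tx_complete(), CQ_SIZE and MAX_BUDGET are all made up for illustration, and the doubling/halving policy is just one possible heuristic that I have not measured.

```c
#include <stdbool.h>
#include <stdint.h>

#define CQ_SIZE    8u  /* hypothetical ring size, must be a power of two */
#define MAX_BUDGET 4u  /* hypothetical cap on completions reaped per call */

/* Mock completion entry: only a validity flag, unlike a real CQE. */
struct mock_cqe {
	bool valid;
};

/* Mock completion queue with a per-queue reap budget. */
struct mock_cq {
	struct mock_cqe ring[CQ_SIZE];
	uint32_t ci;     /* consumer index */
	uint32_t budget; /* max entries to reap in one call, adapted online */
};

/*
 * Reap at most cq->budget valid entries and return how many were reaped.
 * The budget doubles (up to MAX_BUDGET) when it was fully used, which hints
 * at a backlog, and halves (down to 1) when the queue ran dry first.
 */
static uint32_t
mock_tx_complete(struct mock_cq *cq)
{
	uint32_t n = 0;

	while (n < cq->budget) {
		struct mock_cqe *cqe = &cq->ring[cq->ci & (CQ_SIZE - 1)];

		if (!cqe->valid)
			break;
		cqe->valid = false; /* mark the entry as consumed */
		cq->ci++;
		n++;
	}
	if (n == cq->budget) {
		if (cq->budget < MAX_BUDGET)
			cq->budget *= 2;
	} else if (cq->budget > 1) {
		cq->budget /= 2;
	}
	return n;
}
```

With a budget that starts at 1, this behaves like "loop once" in the steady state, but a burst of completions is drained over a few calls instead of lingering, and the cap still bounds the number of buffers freed per call.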