On Wed, 6 Dec 2017, David Miller wrote:

> From: Govindarajulu Varadarajan <gvara...@cisco.com>
> Date: Tue,  5 Dec 2017 11:14:41 -0800
>
>> In case of tx cleanup, we set '-1' as the budget. This means we clean
>> up until the wq is empty or until 2^32 - 1 packets are cleaned. Under
>> heavy load this runs for a long time and causes a
>> "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
>>
>> This patch sets the wq cleanup budget to 256.
>>
>> Signed-off-by: Govindarajulu Varadarajan <gvara...@cisco.com>

> This driver, with all of its indirection and layers upon layers of
> macros for queue processing, is so difficult to read, and this can't
> be generating nice optimal code either...
>
> Anyways, I was walking over the driver to see if the logic is
> contributing to this.
>
> The limit you are proposing sounds unnecessary; nobody else I can
> see needs this, and that includes all of the most heavily used
> drivers under load.

I used 256 as the limit because most of the other drivers use it; they
all bound tx cleanup the same way (a sketch of the shared pattern
follows the list):

* mlx4 uses MLX4_EN_DEFAULT_TX_WORK as the tx budget in mlx4_en_process_tx_cq()
  Added in commit fbc6daf19745 ("net/mlx4_en: Ignore budget on TX napi polling")

* i40e/i40evf use vsi->work_limit as the tx budget in i40e_clean_tx_irq(), which is
  set to I40E_DEFAULT_IRQ_WORK. Added in commit
  a619afe814453 ("i40e/i40evf: Add support for bulk free in Tx cleanup")

* ixgbe uses q_vector->tx.work_limit as tx budget in ixgbe_clean_tx_irq(),
  which is set to IXGBE_DEFAULT_TX_WORK. Added in commit
  592245559e900 ("ixgbe: Change default Tx work limit size to 256 buffers")
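
A minimal sketch of that shared pattern (struct tx_queue,
tx_completion_pending() and friends are illustrative here, not any one
driver's code):

#define TX_WORK_LIMIT	256

static bool clean_tx_irq(struct tx_queue *txq)
{
	unsigned int budget = TX_WORK_LIMIT;

	while (budget && tx_completion_pending(txq)) {
		/* unmap and free one completed buffer,
		 * advance the sw consumer index ...
		 */
		budget--;
	}

	/* false keeps napi scheduled, so the remainder is cleaned
	 * on the next poll instead of looping here indefinitely.
	 */
	return budget != 0;
}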


> If I had to guess I'd say that the problem is that the queue loop
> keeps sampling the head and tail pointers, whereas it should just
> do that _once_, only process the TX entries found in that
> snapshot, and return to the poll() routine immediately afterwards.
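
If I understand the suggestion right, it would look roughly like this
(hypothetical names; the single hw read at the top is exactly the part
enic cannot do, see below):

struct tx_queue {
	void __iomem *hw_fetch_index;	/* hw-owned tail register */
	u32 to_clean;			/* sw consumer index */
	u32 desc_count;			/* ring size, power of two */
};

static void clean_tx_snapshot(struct tx_queue *txq)
{
	/* sample the hw tail exactly once per poll ... */
	u32 tail = readl(txq->hw_fetch_index);

	/* ... clean only the entries found in that snapshot ... */
	while (txq->to_clean != tail) {
		/* unmap and free the completed buffer ... */
		txq->to_clean = (txq->to_clean + 1) & (txq->desc_count - 1);
	}

	/* ... and return to the poll() routine immediately */
}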

The only way to know the tail pointer at the time napi is scheduled is to
read the hw fetch_index register, and doing that is discouraged by the hw
engineers.

We work around this by using a color bit. Every cq entry has a color bit,
either 0 or 1. Hw flips the bit each time it writes a new entry into a
slot, so a new cq entry always carries a different color than the one
that was previously in that slot. We reach the end of the queue when the
current cq entry's color is the same as the previous pass's color, i.e.
hw did not flip the bit, so it is not a new cq entry.

So the enic driver cannot know the tail pointer at the time napi is
scheduled; it only discovers the end of the queue by walking entries
until the color stops changing.
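
For illustration, a simplified sketch of that color-bit walk (loosely
modeled on the driver's cq service loop; struct cq, cq_desc_color() and
the exact bookkeeping are illustrative, not the real enic code):

struct cq {
	struct cq_desc *ring;	/* desc_count entries, written by hw */
	u32 to_clean;		/* next slot sw will inspect */
	u32 desc_count;
	u8 last_color;		/* color the slots held on the previous pass */
};

static unsigned int cq_service(struct cq *cq, unsigned int work_to_do)
{
	unsigned int work_done = 0;
	struct cq_desc *desc = &cq->ring[cq->to_clean];

	/* No tail pointer to compare against: a slot holds a new
	 * entry only if its color differs from last_color.
	 */
	while (cq_desc_color(desc) != cq->last_color &&
	       work_done < work_to_do) {
		/* process one completion ... */

		if (++cq->to_clean == cq->desc_count) {
			cq->to_clean = 0;
			/* hw writes the opposite color on its next pass */
			cq->last_color = !cq->last_color;
		}
		desc = &cq->ring[cq->to_clean];
		work_done++;
	}

	return work_done;
}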
