I have installed and booted this kernel and confirmed that no
new regression was introduced, although I could not reproduce the issue.


** Tags removed: 4.15.0-24-generic cosmic kernel verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic

** Description changed:

  [Impact]
  The i40e driver can get stalled on tx timeouts. This can happen when
  DCB is enabled on the connected switch. This can also trigger a
  second situation when a tx timeout occurs before the recovery of
  a previous timeout has completed due to CPU load, which is not
  handled correctly. This leads to networking delays, drops and
  application timeouts and hangs. Note that the first tx timeout
  cause is just one of the ways to end up in the second situation.
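
The second situation above (a new tx timeout arriving while the recovery
from an earlier one is still in progress) can be illustrated with a small
model. This is only a sketch of the general idea of refusing to start an
overlapping recovery; it is not the driver code, and the class and method
names (TimeoutRecovery, on_tx_timeout) are hypothetical. The actual fix is
the commit "i40e: prevent overlapping tx_timeout recover" referenced in
this description.

```python
import threading


class TimeoutRecovery:
    """Toy model of a recovery path that refuses to overlap with itself."""

    def __init__(self):
        # Stands in for a "recovery pending" marker (hypothetical name).
        self._pending = threading.Lock()
        self.started = 0
        self.skipped = 0

    def on_tx_timeout(self, recover):
        # Non-blocking acquire: if a recovery is already in flight,
        # skip this timeout instead of starting a second recovery.
        if not self._pending.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self.started += 1
            recover()
            return True
        finally:
            self._pending.release()


# Simulate a second timeout that fires while the first recovery is running.
rec = TimeoutRecovery()
nested_results = []
outer_result = rec.on_tx_timeout(
    lambda: nested_results.append(rec.on_tx_timeout(lambda: None))
)
```

In this model the nested call is skipped while the outer recovery runs, and
a later timeout is handled normally once the first recovery has finished.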
  
  This issue was seen on a heavily loaded Kafka broker node running
- the 4.15.0-38-generic kernel on Xenial. 
+ the 4.15.0-38-generic kernel on Xenial.
  
  Symptoms include messages in the kernel log of the form:
  
  ---
  [4733544.982116] i40e 0000:18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
  [4733544.982119] i40e 0000:18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6
  ----
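
To check a kernel log for these symptoms, the messages can be picked out
with a couple of regular expressions. The following is a minimal sketch
that assumes the message layout is exactly as in the sample above; the
function name parse_i40e_timeouts and the dictionary keys are hypothetical.
Whitespace is matched loosely, so log lines that were wrapped by a mail
client are still recognized.

```python
import re

_PREFIX = r"\[\s*(?P<ts>\d+\.\d+)\]\s+i40e\s+(?P<pci>\S+)\s+(?P<iface>[^\s:]+):\s+"
_HEX = r"0x[0-9a-fA-F]+"

# Matches the "tx_timeout: VSI_seid: ..." message.
TX_TIMEOUT = re.compile(
    _PREFIX
    + r"tx_timeout:\s+VSI_seid:\s*(?P<vsi>\d+),\s*Q\s+(?P<queue>\d+),\s*"
    + rf"NTC:\s*(?P<ntc>{_HEX}),\s*HWB:\s*(?P<hwb>{_HEX}),\s*"
    + rf"NTU:\s*(?P<ntu>{_HEX}),\s*TAIL:\s*(?P<tail>{_HEX}),\s*"
    + rf"INT:\s*(?P<intr>{_HEX})"
)

# Matches the "tx_timeout recovery level N, hung_queue M" message.
RECOVERY = re.compile(
    _PREFIX + r"tx_timeout recovery level (?P<level>\d+),\s*hung_queue (?P<queue>\d+)"
)


def parse_i40e_timeouts(text):
    """Return tx_timeout and recovery events found in text, ordered by timestamp."""
    events = []
    for m in TX_TIMEOUT.finditer(text):
        events.append({
            "kind": "tx_timeout",
            "ts": float(m["ts"]),
            "iface": m["iface"],
            "queue": int(m["queue"]),
            "ntc": int(m["ntc"], 16),
            "hwb": int(m["hwb"], 16),
        })
    for m in RECOVERY.finditer(text):
        events.append({
            "kind": "recovery",
            "ts": float(m["ts"]),
            "iface": m["iface"],
            "queue": int(m["queue"]),
            "level": int(m["level"]),
        })
    return sorted(events, key=lambda e: e["ts"])
```

The text to scan could come from dmesg output or a saved kernel log file.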
  
  With the test kernel provided in this LP bug, which has these
  two commits compiled in, the problem has not been seen again,
  and the kernel has been running successfully for several months:
  
- "i40e: Fix for Tx timeouts when interface is brought up if 
-  DCB is enabled"
+ "i40e: Fix for Tx timeouts when interface is brought up if
+  DCB is enabled"
  Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee
  
  "i40e: prevent overlapping tx_timeout recover"
  Commit: d5585b7b6846a6d0f9517afe57be3843150719da
  
  * The first commit is already in Disco, Cosmic
  * The second commit is already in Disco
  * Bionic needs both patches and Cosmic needs the second
  
  [Test Case]
  * We are considering the case of both issues above occurring.
  * Seen by reporter on a Kafka broker node with heavy traffic.
- * Not easy to reproduce as it requires something like the 
-   following example environment and heavy load:
+ * Not easy to reproduce as it requires something like the
+   following example environment and heavy load:
  
-   Kernel: 4.15.0-38-generic
-   Network driver: i40e
-         version: 2.1.14-k
-         firmware-version: 6.00 0x800034e6 18.3.6
-   NIC: Intel 40Gb XL710 
-   DCB enabled
- 
+   Kernel: 4.15.0-38-generic
+   Network driver: i40e
+         version: 2.1.14-k
+         firmware-version: 6.00 0x800034e6 18.3.6
+   NIC: Intel 40Gb XL710
+   DCB enabled
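
Whether a machine matches the driver and firmware listed above can be
checked from the output of "ethtool -i <interface>", which prints
"key: value" lines such as driver, version and firmware-version. The
sketch below only parses text that has already been captured; the function
names and the REPORTED dictionary are hypothetical, and the DCB setting of
the switch is not covered by this check.

```python
# Values taken from the example environment in this description.
REPORTED = {
    "driver": "i40e",
    "version": "2.1.14-k",
    "firmware-version": "6.00 0x800034e6 18.3.6",
}


def parse_ethtool_info(text):
    """Turn 'key: value' lines (as printed by ethtool -i) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def matches_reported(info, expected=REPORTED):
    """True if every expected key is present with the expected value."""
    return all(info.get(key) == value for key, value in expected.items())
```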
  
  [Regression Potential]
  Low, as the first commit only affects i40e DCB environments, and has
- been running for several months in production-load testing 
+ been running for several months in production-load testing
  successfully.
- 
  
  --- Original Description
  Today the Ubuntu 16.04 LTS Enablement Stack moved from kernel 4.13 to
  kernel 4.15.0-24-generic.
  
  On a "Dell PowerEdge R330" server with a network adapter "Intel Ethernet
  Converged Network Adapter X710-DA2" (driver i40e) the network card no
  longer works and continuously displays these three lines:
  
  [   98.012098] i40e 0000:01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
  [   98.012119] i40e 0000:01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8
  [   98.012125] i40e 0000:01:00.0 enp1s0f0: tx_timeout recovery unsuccessful

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1779756

Title:
  Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu
  18.04)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
