Public bug reported:

== Comment: #0 - Mauro Sergio Martins Rodrigues - 2017-02-22 06:48:42 ==
While investigating bug #145959 I got blocked in the reproduction process due
to the following issue during interface link bring up:

[    1.590591] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
[    1.590661] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
[    1.590669] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL

which prevented me from bringing the interface up and assigning an IP address
to it.

== Comment: #2 - Mauro Sergio Martins Rodrigues - 2017-02-22 07:26:36 ==
Some missing information: the kernel is Ubuntu's 4.4.0-62-generic.

When testing with 4.8.0-36-generic (from xenial's proposed) the device probe
works fine and no similar message is seen.

To obtain some more data on this I added some debug statements to see which TC
MAP was applied in a healthy probe (note that the other functions, such as
function 1, work fine, but those functions have no cable attached).

root@yangtze-lp1:~/_maurosr/linux-4.4.0/drivers/net/ethernet/intel/i40e# dmesg 
[52448.914605] i40e 0045:01:00.3: i40e_ptp_stop: removed PHC on enP69p1s0f3
[52448.981801] i40e 0045:01:00.2: i40e_ptp_stop: removed PHC on enP69p1s0f2
[52449.069793] i40e 0045:01:00.1: i40e_ptp_stop: removed PHC on enP69p1s0f1
[52449.173834] i40e 0045:01:00.0: i40e_ptp_stop: removed PHC on enP69p1s0f0
[52449.264462] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
[52449.264468] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[52449.264625] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[52449.286138] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[52449.505657] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[52449.508977] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
[52449.531210] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
[52449.531213] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
[52449.531217] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
[52449.544642] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
[52449.697424] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
[52449.727043] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 0 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
[52449.727098] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
[52449.748667] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[52449.976665] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
[52449.980685] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
[52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
[52450.015610] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
[52450.074479] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
[52450.080516] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA

Comparing function 0:
[52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
and function 1:
[52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
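
For anyone repeating this comparison on a longer dmesg capture, here is a
minimal sketch that pulls the per-function values out of the DEBUG DATA lines.
It assumes the ad-hoc message formats shown in this report (these lines come
from my local debug statements, not from the stock driver); the helper name
enabled_tc_by_function is made up for illustration.

```python
import re

# Matches the ad-hoc "DEBUG DATA" lines from this report. Two spellings occur:
#   "DEBUG DATA vsi > 399;enabled_tc > 255"  and  "DEBUG DATA vsi > 399, enabled_tc 1"
DEBUG_RE = re.compile(
    r"i40e (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"DEBUG DATA vsi > (?P<vsi>\d+)[;,]\s*enabled_tc(?: >)? (?P<tc>\d+)"
)


def enabled_tc_by_function(dmesg_text):
    """Return {pci_address: (vsi, enabled_tc)}; a later line overrides an earlier one."""
    result = {}
    for line in dmesg_text.splitlines():
        match = DEBUG_RE.search(line)
        if match:
            result[match.group("dev")] = (int(match.group("vsi")),
                                          int(match.group("tc")))
    return result


sample = """\
[52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
[52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
"""

for dev, (vsi, tc) in sorted(enabled_tc_by_function(sample).items()):
    print(f"{dev}: vsi={vsi} enabled_tc={tc} (0x{tc:02x})")
```

Piping real output through it (for example reading sys.stdin instead of the
sample string) makes it easy to spot which function ended up with a mask other
than 1.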


Then looking at 4.8:
[  123.425399] i40e: loading out-of-tree module taints kernel.
[  123.428958] i40e: module verification failed: signature and/or required key missing - tainting kernel
[  123.430690] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.6.11-k
[  123.430691] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[  123.430918] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[  123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[  123.664088] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[  123.667878] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[  123.681915] Non-contiguous TC - Disabling DCB
[  123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1
[  123.713262] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
[  123.864601] i40e 0045:01:00.0: Added LAN device PF0 bus=0x00 func=0x00
[  123.864611] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
[  123.893254] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
[  123.893321] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
[  123.914829] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[  124.152980] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
[  124.156999] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
[  124.171266] i40e 0045:01:00.1: DEBUG DATA vsi > 398, enabled_tc 1
[  124.196080] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
[  124.253353] i40e 0045:01:00.1: Added LAN device PF1 bus=0x00 func=0x01
[  124.253387] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
[  124.263908] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA


These 2 lines are important here:
[  123.681915] Non-contiguous TC - Disabling DCB
[  123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1

First it decided to disable the DCB feature due to the lack of contiguous
traffic classes, and then it used a TC MAP (enabled_tc in the device driver
code) of 1, the same value we already knew works. With that information in
hand I forced enabled_tc (TC MAP) to 1 in 4.4's code and it worked, so I
suspect a bad TC mask due to DCB being enabled.
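
To make the two masks easier to compare, here is a small sketch that decodes a
TC MAP value into the traffic classes it enables. It assumes the map is a plain
bitmask with bit N standing for TC N and at most eight traffic classes, which
matches the 0x000000ff printed in the error above; the function name
enabled_tcs is only illustrative.

```python
def enabled_tcs(tc_map, max_tc=8):
    """List the traffic class indices whose bit is set in tc_map.

    Assumption: bit N of the map corresponds to traffic class N and there
    are at most max_tc (eight) traffic classes.
    """
    return [tc for tc in range(max_tc) if tc_map & (1 << tc)]


# The two values seen so far: the failing map (255) and the working one (1).
for tc_map in (255, 1):
    print(f"tc_map {tc_map:3d} = 0x{tc_map:02x} -> TCs {enabled_tcs(tc_map)}")
```

Read this way, 255 asks for all eight traffic classes on the main VSI, while 1
asks for TC0 only.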

== Comment: #3 - Mauro Sergio Martins Rodrigues - 2017-02-23 11:24:41 ==
I tried 4.4's version of i40e but with dcbx disabled on the switch port, and
traffic class setup and function bring up worked fine! It used a TC MAP (or
traffic class mask) of 1. I do understand that this is just a workaround,
though; the device driver should deal with the case where the switch has this
feature enabled instead of leaving the device 'broken':

[  199.762738] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[  199.786589] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[  200.045270] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[  200.048955] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[  200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
[  200.069232] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 1
[  200.088056] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
[  200.240641] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
[  200.270717] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA

The line
[  200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
corresponds to the piece of code where the traffic class is defined (see: 
http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/i40e/i40e_main.c?v=4.4#L4563)

Another interesting discovery is that the device behaves well when we
turn dcbx on in the switch after it's already probed:

[  609.566786] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
[  609.566794] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
[  611.574987] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
[  611.574990] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
[  611.574994] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 31

This transition set the traffic class mask to 31 instead of 255. If we
unload and reload the module, it goes back to the original bad state we
experienced in this bug:

[  746.151068] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[  746.174695] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[  746.433649] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[  746.437552] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[  746.457815] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
[  746.457819] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
[  746.459537] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
[  746.459541] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
[  746.459550] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL

== Comment: #4 - Mauro Sergio Martins Rodrigues - 2017-02-23 14:25:30 ==
Things are going smoothly in kernel 4.8, even if dcbx is enabled on the port,
due to this commit
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c
which disables dcbx when TCs are not contiguous (that is not supported by the
device).

We should ask for a backport into 4.4.0 but I'm still investigating to
see if something else should be included since in comment #3 we can see
it transitioning into a valid state when dcbx is enabled in the switch.
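
To illustrate the idea of "disable DCB when the TCs are not contiguous", here
is a rough model in Python. This is not the driver's C code and not a
transcription of the commit: the function names, the priority-table input and
the fall back to TC0 only (suggested by enabled_tc being 1 in the 4.8 log in
comment #2) are my assumptions, and the example tables are hypothetical, not
what the switch actually advertised.

```python
MAX_TC = 8  # assumption: eight traffic classes, matching the 8-bit tc_map


def tc_bitmap(priority_table):
    """Build a bitmap with one bit per TC referenced by any user priority."""
    bitmap = 0
    for tc in priority_table:
        if not 0 <= tc < MAX_TC:
            raise ValueError(f"traffic class out of range: {tc}")
        bitmap |= 1 << tc
    return bitmap


def is_contiguous_from_tc0(bitmap):
    """True if the set bits form an unbroken run starting at TC0."""
    # A run of n low bits equals 2**n - 1, so adding 1 clears every set bit.
    return bitmap != 0 and (bitmap & (bitmap + 1)) == 0


def choose_tc_map(priority_table):
    """Return (tc_map, dcb_enabled); use TC0 only when the TCs have gaps."""
    bitmap = tc_bitmap(priority_table)
    if is_contiguous_from_tc0(bitmap):
        return bitmap, True
    return 0x01, False


# Hypothetical priority-to-TC tables (index = user priority, value = TC).
print(choose_tc_map([0, 0, 1, 1, 2, 2, 3, 3]))  # TCs 0..3, no gaps
print(choose_tc_map([0, 0, 0, 0, 1, 1, 7, 7]))  # TCs 0, 1 and 7: gap at 2..6
```

With a table that skips traffic classes, the model keeps only TC0 and reports
DCB as disabled, which is the behaviour the 4.8 log shows.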

== Comment: #5 - Mauro Sergio Martins Rodrigues - 2017-03-13 13:41:19 ==
Even though it was already clear that this was related to kernel code, since it
works on 4.8 and doesn't on 4.4, I decided to perform an NVM update, and it
didn't change the scenario.

Comment #2 shows the NVM version as:
> [  123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0

Current version is:
firmware-version: 5.05 0x8000289d 1.1568.0

and the issue is still reproducible.

As stated in comment #4, now I can confirm we need to backport
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c
to 4.4 to avoid getting into the broken state when probing the Intel X710
(driver i40e).

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bugnameltc-151930 severity-high 
targetmilestone-inin---

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1672550

Title:
  i40e Intel X710 error during device probe prevents link set up and ip
  association
