Leann,

This looks like a kernel patch for your team to evaluate.

Thanks.

                     Michael


On 03/13/2017 02:49 PM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:
>
> == Comment: #0 - Mauro Sergio Martins Rodrigues - 2017-02-22 06:48:42 ==
> While investigating bug #145959 I got blocked in the reproduction process by
> the following issue during interface link bring-up:
>
> [    1.590591] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
> [    1.590661] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
> [    1.590669] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
>
> which prevented me from bringing the interface up and associating an IP with it.
>
> == Comment: #2 - Mauro Sergio Martins Rodrigues - 2017-02-22 07:26:36 ==
> Some missing information: the kernel is Ubuntu's 4.4.0-62-generic.
>
> When testing with 4.8.0-36-generic (from xenial's proposed), the device
> probe works fine and no similar message is seen.
>
> To obtain some more data on this I added some statements to see which TC
> MAP was applied in a healthy probe (note that the other functions, like
> function 1, work fine, but those functions have no cable on them).
>
> root@yangtze-lp1:~/_maurosr/linux-4.4.0/drivers/net/ethernet/intel/i40e# dmesg
> [52448.914605] i40e 0045:01:00.3: i40e_ptp_stop: removed PHC on enP69p1s0f3
> [52448.981801] i40e 0045:01:00.2: i40e_ptp_stop: removed PHC on enP69p1s0f2
> [52449.069793] i40e 0045:01:00.1: i40e_ptp_stop: removed PHC on enP69p1s0f1
> [52449.173834] i40e 0045:01:00.0: i40e_ptp_stop: removed PHC on enP69p1s0f0
> [52449.264462] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
> [52449.264468] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
> [52449.264625] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
> [52449.286138] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
> [52449.505657] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
> [52449.508977] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
> [52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
> [52449.531210] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
> [52449.531213] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
> [52449.531217] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
> [52449.544642] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
> [52449.697424] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
> [52449.727043] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 0 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
> [52449.727098] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
> [52449.748667] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
> [52449.976665] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
> [52449.980685] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
> [52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
> [52450.015610] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
> [52450.074479] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
> [52450.080516] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
>
> Comparing function 0:
> [52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
> and function 1:
> [52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
>
>
> Then looking at 4.8:
> [  123.425399] i40e: loading out-of-tree module taints kernel.
> [  123.428958] i40e: module verification failed: signature and/or required key missing - tainting kernel
> [  123.430690] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.6.11-k
> [  123.430691] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
> [  123.430918] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
> [  123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
> [  123.664088] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
> [  123.667878] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
> [  123.681915] Non-contiguous TC - Disabling DCB
> [  123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1
> [  123.713262] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
> [  123.864601] i40e 0045:01:00.0: Added LAN device PF0 bus=0x00 func=0x00
> [  123.864611] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
> [  123.893254] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
> [  123.893321] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
> [  123.914829] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
> [  124.152980] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
> [  124.156999] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
> [  124.171266] i40e 0045:01:00.1: DEBUG DATA vsi > 398, enabled_tc 1
> [  124.196080] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
> [  124.253353] i40e 0045:01:00.1: Added LAN device PF1 bus=0x00 func=0x01
> [  124.253387] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
> [  124.263908] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
>
>
> These 2 lines are important here:
> [  123.681915] Non-contiguous TC - Disabling DCB
> [  123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1
>
> First it decided to disable the DCB feature due to the lack of contiguous
> traffic classes, and then it used TC MAP (enabled_tc in the device driver
> code) as 1, the same value we already knew works. With that information in
> hand I forced enabled_tc (TC MAP) to 1 in 4.4's code and it worked, so I
> suspect a bad TC mask due to DCB being enabled.
>
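
For readers following the enabled_tc values in the logs above: the error
line prints the same value as tc_map 0x000000ff, so it reads as a bitmask
with one bit per traffic class. The sketch below only illustrates that
arithmetic. The helper name decode_tc_map and the limit of 8 traffic
classes are assumptions made for this note, not taken from the driver.

```python
def decode_tc_map(tc_map, max_tc=8):
    """Return the traffic class indices whose bit is set in tc_map.

    max_tc=8 is an assumption for illustration only.
    """
    return [tc for tc in range(max_tc) if tc_map & (1 << tc)]


if __name__ == "__main__":
    # 255 and 1 are the two enabled_tc values seen in the logs above.
    for value in (255, 1):
        print(f"enabled_tc {value} ({value:#04x}) -> TCs {decode_tc_map(value)}")
```

Read this way, 255 asks for all eight traffic classes (0 through 7) and
1 asks for traffic class 0 only.
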
> == Comment: #3 - Mauro Sergio Martins Rodrigues - 2017-02-23 11:24:41 ==
> I tried the 4.4 version of i40e but with dcbx disabled on the switch
> port. Traffic class setup and function bring-up worked fine! It used TC MAP
> (or traffic class mask) 1. I do understand that this is just a workaround
> though; the device driver should deal with the case where the switch has
> this feature enabled instead of leaving the device 'broken':
>
> [  199.762738] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
> [  199.786589] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
> [  200.045270] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
> [  200.048955] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
> [  200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
> [  200.069232] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 1
> [  200.088056] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
> [  200.240641] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
> [  200.270717] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
>
> The line
> [  200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
> corresponds to the piece of code where the traffic class is defined (see: 
> http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/i40e/i40e_main.c?v=4.4#L4563)
>
> Another interesting discovery is that the device behaves well when we
> turn dcbx on in the switch after it's already probed:
>
> [  609.566786] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
> [  609.566794] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
> [  611.574987] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
> [  611.574990] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
> [  611.574994] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 31
>
> That transition set the traffic class mask to 31 instead of 255. If we
> unload and reload the module, it goes back to the original bad state we
> experienced in this bug:
>
> [  746.151068] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
> [  746.174695] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
> [  746.433649] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
> [  746.437552] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
> [  746.457815] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
> [  746.457819] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
> [  746.459537] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
> [  746.459541] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
> [  746.459550] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
>
> == Comment: #4 - Mauro Sergio Martins Rodrigues - 2017-02-23 14:25:30 ==
> Things are going smoothly in kernel 4.8 even if dcbx is enabled on the port,
> due to this commit:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c
> which disables DCB when TCs are not contiguous (that is not supported by
> the device).
>
> We should ask for a backport into 4.4.0 but I'm still investigating to
> see if something else should be included since in comment #3 we can see
> it transitioning into a valid state when dcbx is enabled in the switch.
>
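
A rough sketch of the behaviour described in comment #4, falling back to
a single traffic class when the enabled TCs are not contiguous. This is
not the driver's code. It assumes the enabled TCs are derived from a
priority-to-TC table, and the function names and both example tables
are hypothetical.

```python
MAX_TC = 8  # assumed number of traffic classes, for illustration only


def enabled_tc_bitmap(priority_table):
    """Build a bitmask with one bit set per TC referenced by the table."""
    bitmap = 0
    for tc in priority_table:
        bitmap |= 1 << tc
    return bitmap


def is_contiguous_from_tc0(bitmap, max_tc=MAX_TC):
    """True if the set bits form one unbroken run that starts at TC0."""
    seen_gap = False
    for tc in range(max_tc):
        if bitmap & (1 << tc):
            if seen_gap:
                return False  # a TC is enabled after an unused one
        else:
            seen_gap = True
    return bool(bitmap & 1)


def choose_tc_map(priority_table):
    """Use the derived bitmap if contiguous, else fall back to TC0 only."""
    bitmap = enabled_tc_bitmap(priority_table)
    return bitmap if is_contiguous_from_tc0(bitmap) else 0x1


if __name__ == "__main__":
    # Hypothetical tables: index is the user priority, value is its TC.
    print(choose_tc_map([0, 0, 0, 1, 1, 2, 2, 2]))  # TCs 0, 1, 2 are used
    print(choose_tc_map([0, 0, 0, 0, 0, 0, 0, 7]))  # TCs 0 and 7 only
```

With the second table the fallback yields a TC map of 1, the value that
the logs above show working on both kernels.
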
> == Comment: #5 - Mauro Sergio Martins Rodrigues - 2017-03-13 13:41:19 ==
> Even though it was already clear that this is related to kernel code, since
> it works on 4.8 and doesn't on 4.4, I decided to perform an NVM update. It
> didn't change the scenario.
>
> Comment #2 shows the NVM version as:
>> [  123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
> Current version is:
> firmware-version: 5.05 0x8000289d 1.1568.0
>
> and the issue is still reproducible.
>
> As stated in comment #4, now I can confirm we need to backport
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c
> to 4.4 to avoid getting into the broken state when probing Intel x710
> (driver i40e).
>
> ** Affects: ubuntu
>      Importance: Undecided
>      Assignee: Taco Screen team (taco-screen-team)
>          Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-151930 severity-high targetmilestone-inin---

-- 
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1672550

Title:
  i40e Intel X710 error during device probe prevents link set up and ip
  association

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672550/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
