https://bugs.dpdk.org/show_bug.cgi?id=388
Bug ID: 388
Summary: ixgbe: link state race condition can occur when starting a fiber port
Product: DPDK
Version: 19.08
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: mgsm...@netgate.com
Target Milestone: ---

Created attachment 81
  --> https://bugs.dpdk.org/attachment.cgi?id=81&action=edit
patch

Overview:

If the link is down when ports on an SFP+ X552 (device ID 0x15ac) are started, a race condition can occur that prevents them from working once the link peer becomes available and the link comes up.

If the 2 ports are started individually with some time in between, the issue is not observed. The race condition seems to occur only when one port is started and the other is started immediately afterwards (e.g. by a script or a control plane applying configuration programmatically).

Steps to reproduce:

1. Install the FD.IO VPP packages (available at https://packagecloud.io/fdio/release - vpp, vpp-lib, vpp-plugins needed) on a CentOS 7 system with X552 SFP+ devices attached.
2. If the X552 ports are bound to the kernel ixgbe driver, take them administratively down with '[sudo] ifdown eth0' so that VPP will take over management.
3. Start VPP with '[sudo] systemctl start vpp'.
4. Create a text file commands.txt containing the API commands that start the ports:

echo 'sw_interface_set_flags sw_if_index 1 admin-up
sw_interface_set_flags sw_if_index 2 admin-up' > commands.txt

5. Remove the SFP+ cables from the X552 ports so that link will not be established when they are brought up.
6. Start both ports in rapid succession with '[sudo] vpp_api_test in commands.txt'.
7. Check the link state by running '[sudo] vppctl show hardware-interface'. The link speed should be displayed as "Unknown" and the link state should be displayed as "no carrier".
8. Connect an SFP+ cable between the two ports.
9. Check the link state again. One port may show that it is up and report its link speed.
The other should still report Unknown/no carrier.

Actual results:

The second port started reports that its link is down and never recovers, even if the port is stopped and restarted.

Expected results:

The second port reports that its link is up and can forward and receive packets.

Build date and hardware:

Observed in DPDK 19.08 (VPP 20.01). The current DPDK master branch appears to have the same issue. Observed on a Xeon-D 1537 SoC with 2 copper i350 ports and 2 SFP+ X552 ports.

Additional information:

Attached gdb and found that when rte_eth_link_get_nowait() is called for the affected port, ixgbe_dev_link_update_share() returns before attempting to check the link state, because the IXGBE_FLAG_NEED_LINK_CONFIG flag is set on the struct ixgbe_interrupt for the device.

Further exploration showed that the following sequence of events occurred:

1. ixgbe_dev_link_update_share() sets the IXGBE_FLAG_NEED_LINK_CONFIG flag and schedules ixgbe_dev_setup_link_alarm_handler() to run after 10us.
2. ixgbe_dev_start() is executed and cancels the pending execution of ixgbe_dev_setup_link_alarm_handler().
3. Since ixgbe_dev_setup_link_alarm_handler() is where the IXGBE_FLAG_NEED_LINK_CONFIG flag would normally be cleared, and its execution was cancelled, the flag remains set. All subsequent calls to ixgbe_dev_link_update_share() return early and never actually check the link state again.

The attached patch seems to fix the issue.
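
The sequence of events described in this report can be illustrated with a small standalone model. This is only a sketch of the reported behaviour, not the real ixgbe code and not the attached patch: struct port_model, NEED_LINK_CONFIG, run_scenario() and the clear_flag_on_cancel switch are hypothetical names invented for the illustration, and clearing the flag when the alarm is cancelled is just one plausible remedy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IXGBE_FLAG_NEED_LINK_CONFIG; the value is arbitrary. */
#define NEED_LINK_CONFIG (1u << 0)

/* Toy model of one port. These are not the real ixgbe structures. */
struct port_model {
    uint32_t flags;        /* models the flags in struct ixgbe_interrupt */
    bool alarm_pending;    /* models the scheduled setup-link alarm */
    bool phys_link_up;     /* what the hardware would report if queried */
    bool reported_link_up; /* what the application last saw */
};

/* Models the alarm handler: the only place the flag is normally cleared. */
static void alarm_handler(struct port_model *p)
{
    p->flags &= ~NEED_LINK_CONFIG;
    p->alarm_pending = false;
}

/* Models the link update path as described in the report. */
static void link_update(struct port_model *p)
{
    if (p->flags & NEED_LINK_CONFIG)
        return; /* early return: the hardware is never queried */

    p->reported_link_up = p->phys_link_up;
    if (!p->phys_link_up) {
        /* Event 1: set the flag and schedule the alarm. */
        p->flags |= NEED_LINK_CONFIG;
        p->alarm_pending = true;
    }
}

/* Models the alarm cancellation performed when the port is started. */
static void dev_start(struct port_model *p, bool clear_flag_on_cancel)
{
    if (p->alarm_pending) {
        /* Event 2: the handler will now never run. */
        p->alarm_pending = false;
        if (clear_flag_on_cancel)
            p->flags &= ~NEED_LINK_CONFIG; /* assumed remedy, for illustration */
    }
}

/*
 * Replays the scenario and returns whether the application sees the link
 * come up after the cable is connected.
 */
bool run_scenario(bool alarm_fires_first, bool clear_flag_on_cancel)
{
    struct port_model p = {0};

    link_update(&p);         /* link is down: flag set, alarm scheduled */
    assert(p.alarm_pending);

    if (alarm_fires_first)
        alarm_handler(&p);   /* ports started with some time in between */

    dev_start(&p, clear_flag_on_cancel);

    p.phys_link_up = true;   /* cable connected, peer available */
    link_update(&p);
    return p.reported_link_up;
}
```

In this model, run_scenario(true, false) returns true (the alarm runs before the port start, matching the case where no issue is observed), run_scenario(false, false) returns false (the stale flag makes every later link update return early), and run_scenario(false, true) returns true.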