** Attachment added: "system_details.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+attachment/5272083/+files/system_details.txt
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Confirmed
Status in linux source package in FF-Series:
  Confirmed

Bug description:
  For the customer OpenStack deployment we deploy infra nodes on Dell
  R630 servers. The servers have an onboard Broadcom NetXtreme II
  BCM57800 NIC (quad port: 2x1G ports, 2x10G ports).

  For each port in the UP state, we observe 100% CPU load. So in total,
  we observe 4 CPUs with 100% load.

  perf report shows the function bnx2x_ptp_task taking up much of the
  CPU time:
  https://pastebin.canonical.com/p/kfrpd6Pwh5/

  Also, /var/log/syslog contains the following output every few seconds:

  [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped

  So, the problem seems to be in a "timestamped" TX packet: the driver,
  for some reason (yet to be understood), gets an unexpected value from
  a register and then, in that same function, reschedules itself to
  retry the register read. The read returns a bad value again, and so
  on indefinitely.
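
The rescheduling behaviour described above can be modelled outside the kernel. The following is a minimal user-space sketch, not driver code: `poll_tx_timestamp`, `struct poll_result`, `stuck_reader`, `delayed_reader` and the `max_attempts` limit are hypothetical names introduced only for illustration, and the reader callback stands in for the driver's register read. Only the 0x10000 valid bit is taken from the report. It shows how a retry cap turns a never-completing timestamp request into a bounded number of reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Valid bit tested on the sequence-ID register in the reported code. */
#define PTP_SEQID_VALID 0x10000u

/* Hypothetical stand-in for a register read from the device. */
typedef uint32_t (*seqid_reader)(void *ctx);

struct poll_result {
    bool valid;        /* true if the valid bit was observed */
    unsigned attempts; /* number of register reads performed */
};

/*
 * Poll for a valid Tx timestamp, giving up after max_attempts reads.
 * Assumption: in a real driver each attempt would be a separately
 * scheduled work item rather than one iteration of a loop.
 */
struct poll_result poll_tx_timestamp(seqid_reader read_seqid, void *ctx,
                                     unsigned max_attempts)
{
    struct poll_result res = { false, 0 };

    assert(read_seqid != NULL);

    while (res.attempts < max_attempts) {
        uint32_t val_seq = read_seqid(ctx);

        res.attempts++;
        if (val_seq & PTP_SEQID_VALID) {
            res.valid = true;
            break;
        }
    }
    return res;
}

/* Simulated hardware that never completes the timestamp request. */
uint32_t stuck_reader(void *ctx)
{
    (void)ctx;
    return 0;
}

/*
 * Simulated hardware that completes after a number of empty reads.
 * ctx points to an unsigned countdown of reads still to fail.
 */
uint32_t delayed_reader(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return PTP_SEQID_VALID | 0x2au;
}
```

With `stuck_reader`, the poll stops after exactly `max_attempts` reads and reports no valid timestamp, at which point a caller could drop the pending request instead of retrying forever. With `delayed_reader` and a countdown of 3, it succeeds on the fourth read.
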
  The rescheduling loop shows up on the system as kthreads at 100% CPU
  usage. The message "The device supports only a single outstanding
  packet to timestamp, this packet will not be timestamped" appears
  because the driver can only timestamp a single TX packet at a time
  and, since it is stuck retrying, it cannot accept another packet in
  this "queue".

  The infinite loop appears to be:

  static void bnx2x_ptp_task(struct work_struct *work)
  {
          struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
          int port = BP_PORT(bp);
          u32 val_seq;
          u64 timestamp, ns;
          struct skb_shared_hwtstamps shhwtstamps;

          /* Read Tx timestamp registers */
          val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
                           NIG_REG_P0_TLLH_PTP_BUF_SEQID);
          if (val_seq & 0x10000) {
                  [...]
          } else {
                  DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
                  /* Reschedule to keep checking for a valid timestamp value */
                  schedule_work(&bp->ptp_task);
          }

  It appears that val_seq & 0x10000 is never true, so the task
  reschedules itself immediately, every time it runs. Instrumenting the
  function shows that it is called more than 100,000 times per second.

  The REG_RD call does appear to be expensive (it is a register read
  from the device) and shows high in the perf report, but that by
  itself does not appear to be the root cause (i.e., it is not hanging
  forever in REG_RD).

  The cause appears to be that the driver is not prepared to deal with
  a PTP request that the hardware never completes. It is unclear why
  the request is not completing but, regardless, the driver should not
  loop forever here.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp