** Attachment added: "system_details.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+attachment/5272083/+files/system_details.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Confirmed
Status in linux source package in FF-Series:
  Confirmed

Bug description:
  For a customer OpenStack deployment, we deploy infra nodes on Dell
  R630 servers. The servers have an onboard Broadcom NetXtreme II
  BCM57800 NIC (quad port: 2x1G ports, 2x10G ports). For each port in
  the UP state, we observe 100% CPU load, so in total we observe 4 CPUs
  at 100% load.

  perf report shows the function bnx2x_ptp_task taking up much of the
  CPU time: https://pastebin.canonical.com/p/kfrpd6Pwh5/

  Also, /var/log/syslog contains the following messages every few
  seconds:

  [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped

  So the problem seems to involve a timestamped TX packet: the driver,
  for some reason yet to be understood, gets an unexpected value from a
  register and then, in that same function, reschedules itself to retry
  the register read. The read returns a bad value again, and so on
  indefinitely.

  This shows up in the system as kthreads using 100% CPU. The message
  "The device supports only a single outstanding packet to timestamp,
  this packet will not be timestamped" appears because the driver can
  only timestamp a single TX packet at a time and, while it is stuck
  retrying, it cannot accept another packet in this "queue".
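
  To illustrate why the warning repeats, here is a minimal userspace
  sketch of a single-slot timestamp "queue". It is not driver code:
  struct ts_slot, request_tx_timestamp() and complete_tx_timestamp()
  are hypothetical names made up for this example. The point is only
  that while one request is pending and never completes, every later
  request is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the device can hold one pending TX timestamp. */
struct ts_slot {
    const void *pending;  /* packet awaiting its timestamp, or NULL */
    int dropped;          /* requests refused because the slot was busy */
};

/* Try to claim the slot for pkt. Returns false (and counts a drop) when
 * another packet is still waiting, which is the case that produces the
 * "will not be timestamped" message in the log above. */
static bool request_tx_timestamp(struct ts_slot *slot, const void *pkt)
{
    assert(pkt != NULL);
    if (slot->pending != NULL) {
        slot->dropped++;
        return false;
    }
    slot->pending = pkt;
    return true;
}

/* Release the slot once the timestamp has been read (or abandoned). */
static void complete_tx_timestamp(struct ts_slot *slot)
{
    slot->pending = NULL;
}
```

  In this model, if complete_tx_timestamp() is never reached, the slot
  stays busy and the drop counter grows with each new request, which
  matches the repeating syslog message.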

  The infinite loop appears to be:

  static void bnx2x_ptp_task(struct work_struct *work)
  {
          struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
          int port = BP_PORT(bp);
          u32 val_seq;
          u64 timestamp, ns;
          struct skb_shared_hwtstamps shhwtstamps;

          /* Read Tx timestamp registers */
          val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
                           NIG_REG_P0_TLLH_PTP_BUF_SEQID);
          if (val_seq & 0x10000) {
                  [...]
          } else {
                  DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
                  /* Reschedule to keep checking for a valid timestamp value */
                  schedule_work(&bp->ptp_task);
          }

  It appears that val_seq & 0x10000 is never true, so the task
  constantly reschedules itself immediately. Instrumenting the function
  shows that it is being called in excess of 100,000 times per second.
  The REG_RD call does appear to be expensive (as it's a register read
  from the device) and shows high in the perf report, but that by itself
  doesn't appear to be the root cause (i.e., it's not hanging forever in
  the REG_RD).

  The cause appears to be that the driver is not prepared to deal with
  the PTP request never being completed by the hardware. It's unclear
  why it isn't completing, but regardless, the driver should not loop
  forever here.
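
  One possible way to bound the retries is sketched below as a
  standalone userspace simulation. This is an assumption about what a
  fix could look like, not the driver's actual code: struct fake_nic,
  fake_reg_rd() and poll_tx_timestamp() are hypothetical stand-ins, and
  the wait is only accumulated in a counter where a real driver would
  sleep. Only the 0x10000 valid bit is taken from the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQID_VALID 0x10000u  /* same bit the driver tests in val_seq */

/* Hypothetical device model: the valid bit shows up only after
 * ready_after reads; a negative value means it never completes. */
struct fake_nic {
    int reads;
    int ready_after;
};

/* Simulated register read, standing in for REG_RD() in this sketch. */
static uint32_t fake_reg_rd(struct fake_nic *nic)
{
    nic->reads++;
    if (nic->ready_after >= 0 && nic->reads > nic->ready_after)
        return SEQID_VALID | 0x2a;
    return 0;
}

/* Poll at most max_tries times, doubling the wait after each miss.
 * Returns true once the valid bit is seen, false after giving up.
 * *total_wait_ms records how long a real caller would have slept. */
static bool poll_tx_timestamp(struct fake_nic *nic, int max_tries,
                              unsigned *total_wait_ms)
{
    assert(max_tries > 0 && max_tries < 31);
    *total_wait_ms = 0;
    for (int i = 0; i < max_tries; i++) {
        if (fake_reg_rd(nic) & SEQID_VALID)
            return true;
        *total_wait_ms += 1u << i;  /* a driver would sleep here */
    }
    return false;
}
```

  With a bound like this, hardware that never sets the valid bit costs a
  fixed number of register reads per packet instead of a tight
  reschedule loop, and the caller can release the pending packet when
  the function returns false.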

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
