Public bug reported:

SRU Justification:

[Impact]

This is reproducible on systems that already carry heavy background traffic. On top of that traffic, the user issues one of the two docker pulls below:

docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
OR
docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa

The second one is a very large container (17GB).

While docker pull runs, the OOB interface stops responding to ping, and the docker pull either stalls for a very long time (more than 3 minutes) or times out.

[Fix]

* Update the RX_CQE_CI before updating the RX_PI to avoid a race condition in which we wrongly inform the HW that there is space for the WQE.
* Disable the RX DMA while we are handling incoming packets to avoid overflow.

[Test Case]

* Created a script which loops 200 times and does a docker pull in each loop:
docker pull nvcr.io/ea-doca-hbn/hbn/hbn:latest
OR
docker pull gitlab-master.nvidia.com:5005/dl/dgx/tritonserver:22.02-py3-qa

[Regression Potential]

* This could result in slower packet handling, since we disable and re-enable the DMA periodically.
* Although this fix has been tested by the people who opened the bug, QA needs to test it thoroughly to make sure the issue is no longer reproducible.

** Affects: linux-bluefield (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1964984

Title:
  Fix OOB handling RX packets in heavy traffic
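The loop described under [Test Case] could look roughly like the sketch below. This is not the script used by the reporters, which is not included in the bug. The function name pull_loop, the failure counter, and the summary line are hypothetical. The docker rmi step is also an assumption, added so that each iteration downloads the image again instead of reusing the local copy.

```shell
#!/bin/sh
# Sketch of the reproduction loop from [Test Case]; not the reporters' script.
# Usage: pull_loop IMAGE [COUNT]   (COUNT defaults to 200, as in the bug)
pull_loop() {
    image=$1
    count=${2:-200}
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Assumption: drop any cached copy so every iteration transfers data.
        # Errors are ignored because the image may not be present yet.
        docker rmi -f "$image" >/dev/null 2>&1
        if ! docker pull "$image"; then
            failures=$((failures + 1))
            echo "iteration $i: docker pull failed" >&2
        fi
        i=$((i + 1))
    done
    echo "$failures of $count pulls failed"
    # Return success only if every pull completed.
    [ "$failures" -eq 0 ]
}
```

After sourcing the file, one run per image would be started with, for example, pull_loop nvcr.io/ea-doca-hbn/hbn/hbn:latest 200, ideally while the background traffic is active and a ping to the OOB interface runs in another terminal.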