Hello i40e and testpmd maintainers,

A gentle reminder - would you please advise how to debug the issue
described below?

Thanks,
Juraj

On Fri, Jan 20, 2023 at 1:07 PM Juraj Linkeš <juraj.lin...@pantheon.tech> wrote:
>
> Adding the logfile.
>
>
>
> One thing that's in the logs but that I didn't explicitly mention is the
> DPDK version we've tried this with:
>
> EAL: RTE Version: 'DPDK 22.07.0'
>
>
>
> We also tried earlier versions going back to 21.08, with no luck. I also did 
> a quick check on 22.11, also with no luck.
>
>
>
> Juraj
>
>
>
> From: Juraj Linkeš
> Sent: Friday, January 20, 2023 12:56 PM
> To: 'aman.deep.si...@intel.com' <aman.deep.si...@intel.com>; 
> 'yuying.zh...@intel.com' <yuying.zh...@intel.com>; Xing, Beilei 
> <beilei.x...@intel.com>
> Cc: dev@dpdk.org; Ruifeng Wang <ruifeng.w...@arm.com>; 'Lijian Zhang' 
> <lijian.zh...@arm.com>; 'Honnappa Nagarahalli' <honnappa.nagaraha...@arm.com>
> Subject: Testpmd/l3fwd port shutdown failure on Arm Altra systems
>
>
>
> Hello i40e and testpmd maintainers,
>
>
>
> We're hitting an issue with DPDK testpmd on Ampere Altra servers in the
> FD.io lab.
>
>
>
> A bit of background: along with VPP performance tests (which use DPDK),
> we're running a small number of basic DPDK testpmd and l3fwd tests in FD.io
> as well. This is to catch any performance differences due to VPP updating its
> DPDK version.
>
>
>
> We're running both l3fwd tests and testpmd tests. The Altra servers are
> two-socket and the topology is TG -> DUT1 -> DUT2 -> TG. Traffic flows in
> both directions, but nothing gets forwarded (with a slight caveat - put a pin
> in this). There's nothing special in the tests, just forwarding traffic. The
> NIC we're testing is xl710-QDA2.
>
>
>
> The same tests are passing on all other testbeds - we have various two-node
> (1 DUT, 1 TG) and three-node (2 DUT, 1 TG) Intel and Arm testbeds with
> various NICs (Intel 700 and 800 series; the Intel testbeds use some Mellanox
> NICs as well). We don't have another three-node topology with this same NIC,
> though, so it looks like something specific to testpmd/l3fwd and xl710-QDA2
> on Altra servers.
>
>
>
> VPP performance tests are passing, but l3fwd and testpmd fail. This leads us
> to believe it's a software issue, but there could be something wrong with the
> hardware. I'll talk about testpmd from now on, but as far as we can tell, the
> behavior is the same for testpmd and l3fwd.
>
>
>
> Getting back to the caveat mentioned earlier, there seems to be something 
> wrong with port shutdown. When running testpmd on a testbed that hasn't been 
> used for a while it seems that all ports are up right away (we don't see any 
> "Port 0|1: link state change event") and the setup works fine (forwarding 
> works). After restarting testpmd (restarting on one server is sufficient), 
> the ports between DUT1 and DUT2 (but not between DUTs and TG) go down and are 
> not usable in DPDK, VPP or in Linux (with i40e kernel driver) for a while 
> (measured in minutes, sometimes dozens of minutes; the duration is seemingly 
> random). The ports eventually recover and can be used again, but there's 
> nothing in syslog suggesting what happened.
>
>
>
> What seems to be happening is that testpmd puts the ports into some faulty
> state. This only happens on the DUT1 -> DUT2 link though (the ports between
> the two testpmds), not on the TG -> DUT1 link (the TG port is left alone).
>
>
>
> Some more info:
>
> We've come across the issue with this configuration:
>
> OS: Ubuntu 20.04 with kernel 5.4.0-65-generic.
>
> Old NIC firmware, never upgraded: 6.01 0x800035da 1.1747.0.
>
> Driver versions: i40e 2.17.15 and iavf 4.3.19.
>
>
>
> As well as with this configuration:
>
> OS: Ubuntu 22.04 with kernel 5.15.0-46-generic.
>
> Updated firmware: 8.30 0x8000a4ae 1.2926.0.
>
> Drivers: i40e 2.19.3 and iavf 4.5.3.
>
>
>
> Unsafe noiommu mode is disabled:
>
> cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
>
> N
>
>
>
> We used DPDK 22.07 in manual testing and built it on the DUTs, using a
> generic build:
>
> meson -Dexamples=l3fwd -Dc_args=-DRTE_LIBRTE_I40E_16BYTE_RX_DESC=y 
> -Dplatform=generic build
>
>
>
> We're running testpmd with this command:
>
> sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0 
> --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1 
> --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768 --nb-ports=2 
> --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384 --nb-cores=1
>
>
>
> And l3fwd (with different MACs on the other server):
>
> sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a 
> 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype 
> --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1" 
> --config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3
>
>
>
> We tried adding logs with --log-level=pmd,debug and --no-lsc-interrupt, but
> that didn't reveal anything helpful, as far as we can tell - please have a
> look at the attached log. The faulty port is port 0 (it starts out as down;
> we then waited around 25 minutes for it to go up and then shut down
> testpmd).
>
>
>
> We'd like to ask for pointers on what could be the cause or how to debug this 
> issue further.
>
>
>
> Thanks,
> Juraj
