Hi Rob,

Thanks for documenting your steps. I can confirm most if not all of your problems on CentOS 7, USRP N321, Intel XL710.

@Ettus, can we get some attention on this issue? DPDK is marketed as a huge improvement for max-bandwidth applications, and I have failed to see any real testing or use cases of it working more than once in a row. It is certainly a barrier for my applications, forcing me to reduce the sample rate and simplify the use cases.

-Pat

On Wed, Feb 3, 2021 at 4:53 PM Rob Kossler via USRP-users <usrp-users@lists.ettus.com> wrote:

> I am now to the point where things are kind of working and I'm basically
> giving up trying to make them better. A few remarks for anyone who tries
> DPDK in the future (with N310, Ubuntu 20.04, Intel XL710 NIC, and UHD 4.0).
>
> 1) I can only get my application to run once and then I have to do some
> stuff (see NOTE 1 below) to run again.
> 2) I get occasional (but much too often) lock-ups of other applications
> running in Ubuntu. This was previously my experience using DPDK under 3.15
> (DPDK 17.11) but I had hoped things were better now. They are not. See
> below for more details (NOTE 2 below) on this. Note that these lockups do
> not occur even occasionally when not running with DPDK.
> 3) The instructions in the UHD manual are not nearly good enough to get
> things running.
> 4) I first got things working as "root" (as recommended), but this caused
> some ancillary issues with my apps. Fortunately, I was able to get it to
> run as a lowly user (see NOTE 3 below)
> 5) I could not get things working even once until I followed Aaron's
> advice of putting just a few symlinks in a folder and pointing to that
> folder from .config/uhd.conf (dpdk_driver=<folder>). See NOTE 4 below.
>
> Read on for the details if interested.
> Rob
>
> NOTE 1: After I run and exit my app, I notice that the link LEDs on the
> SFP ports of the N310 are not both on as they should be and I am unable to
> run a second time. The following sequence fixes this (perhaps there is a
> better sequence but I haven't found it yet) such that I am able to re-run
> successfully.
> - sudo dpdk-devbind -b i40e 03:00.0 03:00.1 # bind normal driver
> - sudo dpdk-devbind -b vfio-pci 03:00.0 03:00.1 # re-bind vfio-pci driver
> - physically, unplug & plug QSFP+ transceiver on XL710 (sometimes have to
> do this 2 or 3 times before it "fixes" the link LEDs on N310 SFPs)
>
> NOTE 2: The fact that DPDK takes over the CPU cores (at least 1 if not 2
> of them) seems to cause issues with other apps. In the past I have even
> had issues with keyboard/mouse input that became intolerably slow. I didn't
> have keyboard/mouse issues this time, but I did have issues with a
> companion application that I run alongside my c++/UHD application. This
> companion application (actually Matlab based control/display GUI) would
> lock up such that I couldn't even close it down. But, once I stop my
> c++/UHD application, everything starts behaving normally. Note that I
> NEVER have this issue when running the same applications without DPDK. I
> tried the grub update "isolcpus=N,M" but not sure if this helped or not. I
> also tried changing my DPDK corelist from 0,1 to 6,7 because in the past I
> had convinced myself (perhaps wrongly) that things behaved better if not
> using CPU 0. I have no hard evidence to support this. In the end, things
> mostly work, but these lockups are reason enough to avoid DPDK.
>
> NOTE 3: I did the following to run as lowly user rather than root.
> 1) updated /etc/security/limits.conf to use the following. I really have
> no idea if these are reasonable values or not. The DPDK docs indicated that
> these are the relevant settings to adjust but gave no advice on what they
> should be set to.
>    <username> - memlock 2000000
>    <username> - nofile 2000
>    <username> - locks 2000
> 2) after binding the vfio-pci driver using dpdk-devbind, I ran the
> following.
> The first two are commands I determined after running the DPDK
> usertools/dpdk-setup.sh utility and then looking at the source to see the
> exact chmod settings used by this utility (BTW, this utility was helpful).
> The third was recommended in the DPDK documentation.
>    sudo chmod a+x /dev/vfio
>    sudo chmod 0666 /dev/vfio/*
>    sudo chmod a+w /dev/hugepages/
>
> NOTE 4: The following are the few symlinks I put in a folder I created
> "/usr/local/lib/dpdk-pmds/". After pointing the dpdk_driver=<folder>
> setting in uhd.conf to this, I was able to run successfully.
> librte_mempool_ring.so, librte_pmd_i40e.so, librte_pmd_ixgbe.so, and
> librte_pmd_ring.so.
>
> On Wed, Feb 3, 2021 at 10:44 AM Rob Kossler <rkoss...@nd.edu> wrote:
>
>> Hi Aaron,
>> Two things:
>> 1) I am getting an error message at the conclusion of a successful run
>> (see below). Not sure if this is something I should be looking at or if it
>> is harmless.
>> 2) I figured out a sequence of steps that can "fix" my broken state
>> following a successful run. If I do the following, the links are fixed:
>>    a) dpdk-devbind -b i40e 03:00.0 03:00.1 // bind to the normal driver
>>    b) dpdk-devbind -b vfio-pci 03:00.0 03:00.1 // bind back to the vfio-pci driver
>>    c) physically unplug & plug the XL710 QSFP+ transceiver (mine is
>>       optical, but unplugging just the MTP does not do the trick - I need to
>>       unplug the full transceiver)
>>
>> Once I complete the sequence above, the link LEDs are back to normal and
>> I can complete another run of benchmark_rate. This is obviously a bad
>> solution so if you have any ideas, please let me know.
>> Rob
>>
>> [00:00:05.113788990] Testing receive rate 125.000000 Msps on 4 channels
>> [00:00:05.120454627] Testing transmit rate 125.000000 Msps on 4 channels
>> [00:00:15.373972384] Benchmark complete.
>>
>> Benchmark rate summary:
>>   Num received samples: 5099558824
>>   Num dropped samples: 0
>>   Num overruns detected: 0
>>   Num transmitted samples: 4999335588
>>   Num sequence errors (Tx): 0
>>   Num sequence errors (Rx): 0
>>   Num underruns detected: 0
>>   Num late commands: 0
>>   Num timeouts (Tx): 0
>>   Num timeouts (Rx): 0
>>
>>
>> Done!
>>
>> i40e_phy_conf_link(): Failed to get PHY capabilities: -7
>>
>>
>> On Wed, Feb 3, 2021 at 10:16 AM Rob Kossler <rkoss...@nd.edu> wrote:
>>
>>> Hi Aaron,
>>> Unfortunately, I already tried playing around with the link timeout
>>> increasing up to 10 seconds. No luck. But, I am presently troubleshooting
>>> the issue and trying to switch back and forth between DPDK and normal
>>> networking. I am finding that normal networking is not working after 1 run
>>> of DPDK. And, I'm noticing that link LEDs are messed up and normal pings
>>> are not working. I am playing around with disconnecting / reconnecting
>>> links in order to get the link LEDs back to normal. My guess is that
>>> things are not cleaning up as they should.
>>> Rob
>>>
>>> On Wed, Feb 3, 2021 at 9:51 AM Aaron Rossetto via USRP-users <usrp-users@lists.ettus.com> wrote:
>>>
>>>> I notice in the second and subsequent runs, you get this message from
>>>> UHD:
>>>>
>>>> [ERROR] [DPDK] All DPDK links did not report as up!
>>>>
>>>> One of the other issues I've noticed with DPDK (and unfortunately
>>>> don't have an answer for) is that link detection seems to have issues.
>>>> I'm not sure if this is an XL710-specific problem or whether it's more
>>>> widespread, but I added some code to try to mitigate things somewhat
>>>> in commit eada49e4d. This commit checks the link status at
>>>> 250-millisecond intervals for up to the link status timeout (default 1
>>>> second) in case the links take a while to register as up.
>>>> One thing you could try is overriding the default link status timeout
>>>> and increasing the value, which you can do by adding a
>>>> dpdk_link_timeout=X line to the [use_dpdk=1] section of your uhd.conf
>>>> file, where X is the new timeout in number of milliseconds.
>>>>
>>>> Best regards,
>>>> Aaron
>>>>
>>>> On Tue, Feb 2, 2021 at 1:47 PM Rob Kossler <rkoss...@nd.edu> wrote:
>>>> >
>>>> > Hi Aaron,
>>>> > This did indeed help. Now I am able to run ONCE successfully. After
>>>> > that I get an error. Same behavior on both systems. Not yet sure how to
>>>> > clear the error. I played with dpdk_link_timeout and even tried resetting
>>>> > the N310 using "overlay rm n310 && overlay add n310 && systemctl restart
>>>> > usrp-hwd". But no luck.
>>>> > Rob
>>>> >
>>>> > // First run succeeds
>>>> > root@irisheyes5-hp-z240-sff:~# uhd_image_loader --args="addr=192.168.1.88,type=n3xx,fpga=XG"
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_4.0.0.0-50-ge520e3ff
>>>> > [INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.1.88,type=n3xx,product=n310,serial=3144673,claimed=False,skip_init=1
>>>> > [WARNING] [MPM.RPCServer] A timeout event occured!
>>>> > [INFO] [MPMD] Claimed device without full initialization.
>>>> > [INFO] [MPMD IMAGE LOADER] Starting update. This may take a while.
>>>> > [INFO] [MPM.PeriphManager] Updating component `fpga'
>>>> > [INFO] [MPM.PeriphManager] Updating component `dts'
>>>> > [INFO] [MPM.RPCServer] Resetting peripheral manager.
>>>> > [INFO] [MPM.PeriphManager] Device serial number: 3144673
>>>> > [INFO] [MPM.PeriphManager] Initialized 2 daughterboard(s).
>>>> > [INFO] [MPM.PeriphManager] init() called with device args `clock_source=internal,time_source=internal'.
>>>> > [INFO] [MPMD IMAGE LOADER] Update component function succeeded.
>>>> > root@irisheyes5-hp-z240-sff:~# benchmark_rate --tx_rate=62.5e6 --rx_rate=62.5e6 --channels="0,1,2,3" --args="use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2"
>>>> >
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_4.0.0.0-50-ge520e3ff
>>>> > EAL: Detected 8 lcore(s)
>>>> > EAL: Detected 1 NUMA nodes
>>>> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>>> > EAL: Probing VFIO support...
>>>> > EAL: VFIO support initialized
>>>> > EAL: PCI device 0000:03:00.0 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: using IOMMU type 1 (Type 1)
>>>> > EAL: PCI device 0000:03:00.1 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.2 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.3 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > [00:00:00.000152] Creating the usrp device with: use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2...
>>>> > [INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.1.88,type=n3xx,product=n310,serial=3144673,claimed=False,use_dpdk=1,addr=192.168.60.2
>>>> > [INFO] [MPM.PeriphManager] init() called with device args `mgmt_addr=192.168.1.88,product=n310,use_dpdk=1,clock_source=internal,time_source=internal'.
>>>> > Using Device: Single USRP:
>>>> >   Device: N300-Series Device
>>>> >   Mboard 0: n310
>>>> >   RX Channel: 0
>>>> >     RX DSP: 0
>>>> >     RX Dboard: A
>>>> >     RX Subdev: Magnesium
>>>> >   RX Channel: 1
>>>> >     RX DSP: 1
>>>> >     RX Dboard: A
>>>> >     RX Subdev: Magnesium
>>>> >   RX Channel: 2
>>>> >     RX DSP: 2
>>>> >     RX Dboard: B
>>>> >     RX Subdev: Magnesium
>>>> >   RX Channel: 3
>>>> >     RX DSP: 3
>>>> >     RX Dboard: B
>>>> >     RX Subdev: Magnesium
>>>> >   TX Channel: 0
>>>> >     TX DSP: 0
>>>> >     TX Dboard: A
>>>> >     TX Subdev: Magnesium
>>>> >   TX Channel: 1
>>>> >     TX DSP: 1
>>>> >     TX Dboard: A
>>>> >     TX Subdev: Magnesium
>>>> >   TX Channel: 2
>>>> >     TX DSP: 2
>>>> >     TX Dboard: B
>>>> >     TX Subdev: Magnesium
>>>> >   TX Channel: 3
>>>> >     TX DSP: 3
>>>> >     TX Dboard: B
>>>> >     TX Subdev: Magnesium
>>>> >
>>>> > [00:00:03.21715319] Setting device timestamp to 0...
>>>> > [INFO] [MULTI_USRP] 1) catch time transition at pps edge
>>>> > [INFO] [MULTI_USRP] 2) set times next pps (synchronously)
>>>> > [WARNING] [0/Radio#0] Attempting to set tick rate to 0. Skipping.
>>>> > [WARNING] [0/Radio#1] Attempting to set tick rate to 0. Skipping.
>>>> > [WARNING] [0/Radio#1] Attempting to set tick rate to 0. Skipping.
>>>> > [WARNING] [0/Radio#0] Attempting to set tick rate to 0. Skipping.
>>>> > Setting TX spp to 1989
>>>> > [00:00:04.907401082] Testing receive rate 62.500000 Msps on 4 channels
>>>> > [00:00:04.914615576] Testing transmit rate 62.500000 Msps on 4 channels
>>>> > [00:00:15.167869894] Benchmark complete.
>>>> >
>>>> >
>>>> > Benchmark rate summary:
>>>> >   Num received samples: 2549794336
>>>> >   Num dropped samples: 0
>>>> >   Num overruns detected: 0
>>>> >   Num transmitted samples: 2499910452
>>>> >   Num sequence errors (Tx): 0
>>>> >   Num sequence errors (Rx): 0
>>>> >   Num underruns detected: 0
>>>> >   Num late commands: 0
>>>> >   Num timeouts (Tx): 0
>>>> >   Num timeouts (Rx): 0
>>>> >
>>>> >
>>>> > Done!
>>>> >
>>>> > // Second run fails
>>>> > root@irisheyes5-hp-z240-sff:~# benchmark_rate --tx_rate=62.5e6 --rx_rate=62.5e6 --channels="0,1,2,3" --args="use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2"
>>>> >
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_4.0.0.0-50-ge520e3ff
>>>> > EAL: Detected 8 lcore(s)
>>>> > EAL: Detected 1 NUMA nodes
>>>> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>>> > EAL: Probing VFIO support...
>>>> > EAL: VFIO support initialized
>>>> > EAL: PCI device 0000:03:00.0 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: using IOMMU type 1 (Type 1)
>>>> > EAL: PCI device 0000:03:00.1 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.2 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.3 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > [ERROR] [DPDK] All DPDK links did not report as up!
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: DPDK: All DPDK links did not report as up!
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with EAL initialization
>>>> > [00:00:00.000122] Creating the usrp device with: use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2...
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: Error with EAL initialization
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with EAL initialization
>>>> > Error: LookupError: KeyError: No devices found for ----->
>>>> > Device Address:
>>>> >     use_dpdk: 1
>>>> >     mgmt_addr: 192.168.1.88
>>>> >     addr: 192.168.60.2
>>>> >
>>>> > // Third run fails
>>>> > root@irisheyes5-hp-z240-sff:~# benchmark_rate --tx_rate=62.5e6 --rx_rate=62.5e6 --channels="0,1,2,3" --args="use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2"
>>>> >
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_4.0.0.0-50-ge520e3ff
>>>> > EAL: Detected 8 lcore(s)
>>>> > EAL: Detected 1 NUMA nodes
>>>> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>>> > EAL: Probing VFIO support...
>>>> > EAL: VFIO support initialized
>>>> > EAL: PCI device 0000:03:00.0 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: using IOMMU type 1 (Type 1)
>>>> > EAL: PCI device 0000:03:00.1 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.2 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.3 on NUMA socket -1
>>>> > EAL: Invalid NUMA socket, default to 0
>>>> > EAL: probe driver: 8086:1584 net_i40e
>>>> > [ERROR] [DPDK] All DPDK links did not report as up!
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: DPDK: All DPDK links did not report as up!
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with EAL initialization
>>>> > [00:00:00.000148] Creating the usrp device with: use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2...
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: Error with EAL initialization
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with EAL initialization
>>>> > Error: LookupError: KeyError: No devices found for ----->
>>>> > Device Address:
>>>> >     use_dpdk: 1
>>>> >     mgmt_addr: 192.168.1.88
>>>> >     addr: 192.168.60.2
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Feb 2, 2021 at 11:53 AM Aaron Rossetto via USRP-users <usrp-users@lists.ettus.com> wrote:
>>>> >>
>>>> >> On Mon, Feb 1, 2021 at 9:02 PM Rob Kossler via USRP-users
>>>> >> <usrp-users@lists.ettus.com> wrote:
>>>> >>
>>>> >> > Has anyone successfully used DPDK with Ubuntu 20.04, UHD 4.0, Intel XL710 NIC, and N310 (or X310)?
>>>> >>
>>>> >> If I remember correctly, I believe DPDK tries to dlopen() *everything*
>>>> >> in the directory specified by the dpdk_driver parameter in the DPDK
>>>> >> section of uhd.conf, leading to a lot of errors similar to yours
>>>> >> ('Invalid ELF header' and the like). Having the correct collection of
>>>> >> .so files in that directory is key.
>>>> >>
>>>> >> What's worked for me in the past when using DPDK with an Intel XL710
>>>> >> is creating a directory (I used /usr/local/lib/dpdk-pmds) and copying
>>>> >> a specific set of DPDK .so files into this directory:
>>>> >> * librte_mempool_ring.so
>>>> >> * librte_pdump.so (I think this one is optional--I had been trying to
>>>> >>   get packet dumps from DPDK a while back)
>>>> >> * librte_pmd_i40e.so
>>>> >> * librte_pmd_ixgbe.so (may be optional?)
>>>> >> * librte_pmd_pcap.so (this one is also optional, I think)
>>>> >> * librte_pmd_ring.so
>>>> >>
>>>> >> (Symlinking to the actual libraries wherever they get installed
>>>> >> instead of copying them into the directory would probably work as
>>>> >> well.)
>>>> >>
>>>> >> Then, make sure that the dpdk_driver key in the [use_dpdk=1] section
>>>> >> of uhd.conf points to that directory:
>>>> >> dpdk_driver = /usr/local/lib/dpdk-pmds
>>>> >>
>>>> >> Hopefully that will resolve the issue and get you a little further
>>>> >> down the road.
>>>> >>
>>>> >> Best regards,
>>>> >> Aaron
>>>> >>
>>>> >> _______________________________________________
>>>> >> USRP-users mailing list
>>>> >> USRP-users@lists.ettus.com
>>>> >> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
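
For anyone trying the PMD-folder approach described in this thread, here is a minimal sketch of building that folder with symlinks rather than copies. The four library names and /usr/local/lib/dpdk-pmds are taken from the thread. DPDK_LIB_DIR (where the DPDK .so files are actually installed), the run helper, and the DRY_RUN switch are assumptions made for illustration, so check the path on your own system. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: populate a PMD folder with the four symlinks listed in NOTE 4.
# DPDK_LIB_DIR is an ASSUMPTION; point it at wherever your DPDK .so files live.
DPDK_LIB_DIR="${DPDK_LIB_DIR:-/usr/lib/x86_64-linux-gnu}"
PMD_DIR="${PMD_DIR:-/usr/local/lib/dpdk-pmds}"   # folder name used in the thread
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands, 0 = execute

PMD_LIBS="librte_mempool_ring.so librte_pmd_i40e.so librte_pmd_ixgbe.so librte_pmd_ring.so"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

link_pmds() {
    run sudo mkdir -p "$PMD_DIR"
    # PMD_LIBS is left unquoted on purpose so it splits into one name per pass.
    for lib in $PMD_LIBS; do
        run sudo ln -sf "$DPDK_LIB_DIR/$lib" "$PMD_DIR/$lib"
    done
}

link_pmds
```

Run it with DRY_RUN=0 once the printed commands look right. uhd.conf then points at the folder, as described in the thread:

    [use_dpdk=1]
    dpdk_driver=/usr/local/lib/dpdk-pmds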
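
The driver re-bind half of the recovery sequence from NOTE 1 can be scripted, and the sketch below does only that. The PCI addresses 03:00.0 and 03:00.1 and the dpdk-devbind invocations are the ones quoted in the thread. The PORTS and DRY_RUN variables and the run helper are hypothetical conveniences. A script cannot replace physically re-seating the QSFP+ transceiver, which the reports above still found necessary.

```shell
#!/usr/bin/env bash
# Sketch only: software part of the NOTE 1 recovery sequence.
PORTS="${PORTS:-03:00.0 03:00.1}"   # PCI addresses quoted in the thread
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands, 0 = execute

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebind_ports() {
    # PORTS is left unquoted on purpose so each address is its own argument.
    run sudo dpdk-devbind -b i40e $PORTS       # back to the normal kernel driver
    run sudo dpdk-devbind -b vfio-pci $PORTS   # then back to vfio-pci for DPDK
    echo "If the N310 SFP link LEDs are still wrong, re-seat the QSFP+ transceiver on the XL710."
}

rebind_ports
```

Set DRY_RUN=0 to actually re-bind, and override PORTS if your NIC sits at different PCI addresses.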