Hi Rob,

Thanks for documenting your steps. I can confirm most, if not all, of your
problems on CentOS 7 with a USRP N321 and an Intel XL710. @Ettus, can we get
some attention on this issue? DPDK is marketed as a huge improvement for
maximum-bandwidth applications, yet I have not seen any real testing or use
case in which it works more than once in a row. It is certainly a barrier for
my applications, forcing me to reduce the sample rate and simplify the use
cases.

-Pat


On Wed, Feb 3, 2021 at 4:53 PM Rob Kossler via USRP-users <
usrp-users@lists.ettus.com> wrote:

> I am now to the point where things are kind of working and I'm basically
> giving up trying to make them better.  A few remarks for anyone who tries
> DPDK in the future (with N310, Ubuntu 20.04, Intel XL710 NIC, and UHD 4.0).
>
> 1) I can only get my application to run once and then I have to do some
> stuff (see NOTE 1 below) to run again.
> 2) I get occasional (but far too frequent) lock-ups of other applications
> running in Ubuntu.  This was also my experience using DPDK under UHD 3.15
> (DPDK 17.11), but I had hoped things were better now.  They are not.  See
> NOTE 2 below for more details. Note that these lock-ups never occur when
> not running with DPDK.
> 3) The instructions in the UHD manual are not nearly good enough to get
> things running.
> 4) I first got things working as "root" (as recommended), but this caused
> some ancillary issues with my apps. Fortunately, I was able to get it to
> run as a lowly user (see NOTE 3 below)
> 5) I could not get things working even once until I followed Aaron's
> advice of putting just a few symlinks in a folder and pointing to that
> folder from .config/uhd.conf (dpdk_driver=<folder>). See NOTE 4 below.
>
> Read on for the details if interested.
> Rob
>
> NOTE 1: After I run and exit my app, I notice that the link LEDs on the
> SFP ports of the N310 are not both on as they should be and I am unable to
> run a second time.  The following sequence fixes this (perhaps there is a
> better sequence but I haven't found it yet) such that I am able to re-run
> successfully.
> - sudo dpdk-devbind -b i40e 03:00.0 03:00.1  # bind normal driver
> - sudo dpdk-devbind -b vfio-pci 03:00.0 03:00.1 # re-bind vfio-pci driver
> - physically, unplug & plug QSFP+ transceiver on XL710 (sometimes have to
> do this 2 or 3 times before it "fixes" the link LEDs on N310 SFPs)
>
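For reference, the two rebinding commands in NOTE 1 can be wrapped in a small script. This is only a sketch: the `run` helper and the `DRY_RUN` switch are hypothetical names introduced for illustration, and the PCI addresses are the ones from the setup described above. With `DRY_RUN=1` (the default here) it only prints the commands. It does not replace physically reseating the QSFP+ transceiver.

```shell
#!/bin/sh
# Sketch of the software half of the NOTE 1 recovery sequence.
# Assumption: DRY_RUN and run() are illustrative helpers, not DPDK or UHD tools.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebind_ports() {
    # Hand the ports back to the kernel i40e driver, then to vfio-pci again.
    run sudo dpdk-devbind -b i40e "$@"
    run sudo dpdk-devbind -b vfio-pci "$@"
}

# PCI addresses taken from the thread; adjust for your own NIC.
rebind_ports 03:00.0 03:00.1
```

Setting `DRY_RUN=0` before running the script executes the commands instead of printing them.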
> NOTE 2: The fact that DPDK takes over the CPU cores (at least 1 if not 2
> of them) seems to cause issues with other apps.  In the past I have even
> had issues with keyboard/mouse input that became intolerably slow. I didn't
> have keyboard/mouse issues this time, but I did have issues with a
> companion application that I run alongside my C++/UHD application.  This
> companion application (actually a MATLAB-based control/display GUI) would
> lock up such that I couldn't even close it down.  But, once I stop my
> C++/UHD application, everything starts behaving normally.  Note that I
> NEVER have this issue when running the same applications without DPDK.  I
> tried the grub update "isolcpus=N,M" but I am not sure whether it helped.  I
> also tried changing my DPDK corelist from 0,1 to 6,7 because in the past I
> had convinced myself (perhaps wrongly) that things behaved better if not
> using CPU 0.  I have no hard evidence to support this.  In the end, things
> mostly work, but these lockups are reason enough to avoid DPDK.
>
> NOTE 3: I did the following to run as lowly user rather than root.
> 1) updated /etc/security/limits.conf to use the following. I really have
> no idea if these are reasonable values or not. The DPDK docs indicated that
> these are the relevant settings to adjust but gave no advice on what they
> should be set to.
> <username> - memlock 2000000
> <username> - nofile  2000
> <username> - locks   2000
> 2) after binding the vfio-pci driver using dpdk-devbind, I ran the
> following. The first two are commands I determined after running the DPDK
> usertools/dpdk-setup.sh utility and then looking at the source to see the
> exact chmod settings used by this utility (BTW, this utility was helpful).
> The third was recommended in the DPDK documentation.
> sudo chmod a+x /dev/vfio
> sudo chmod 0666 /dev/vfio/*
> sudo chmod a+w /dev/hugepages/
>
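A quick way to confirm that the limits.conf entries in NOTE 3 took effect is to print the limits of the current shell. This is a minimal sketch; `show_limits` is a hypothetical helper name. Note that limits.conf is applied at login, so the check is only meaningful in a fresh login session, and that the memlock value is expressed in KB.

```shell
#!/bin/sh
# Print the limits that correspond to the memlock and nofile entries.
# Assumption: show_limits is an illustrative helper, not a DPDK or UHD tool.
show_limits() {
    echo "memlock_kb=$(ulimit -l)"   # max locked memory, in KB (or "unlimited")
    echo "nofile=$(ulimit -n)"       # max number of open file descriptors
}

show_limits
```

If the values printed do not match what was written to /etc/security/limits.conf, the session most likely predates the change.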
> NOTE 4: The following are the few symlinks I put in a folder I created
> "/usr/local/lib/dpdk-pmds/".  After pointing the dpdk_driver=<folder>
> setting in uhd.conf to this, I was able to run successfully.
> librte_mempool_ring.so, librte_pmd_i40e.so, librte_pmd_ixgbe.so, and
> librte_pmd_ring.so.
>
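The symlink folder from NOTE 4 could be set up with something like the following. This is a sketch under assumptions: `link_pmds` is a hypothetical helper, and the directory holding the DPDK shared libraries depends on the distribution and DPDK version, so it is passed in as an argument rather than guessed. Libraries that are not found are skipped with a message.

```shell
#!/bin/sh
# link_pmds SRC_DIR DEST_DIR
# Symlink the four libraries named in NOTE 4 from SRC_DIR into DEST_DIR.
# Assumption: link_pmds is an illustrative helper, not part of DPDK or UHD.
link_pmds() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    for lib in librte_mempool_ring.so librte_pmd_i40e.so \
               librte_pmd_ixgbe.so librte_pmd_ring.so; do
        if [ -e "$src/$lib" ]; then
            ln -sf "$src/$lib" "$dest/$lib"
        else
            echo "skipping $lib: not found in $src" >&2
        fi
    done
}

# Example call (run with sufficient privileges; the first argument is a
# placeholder for wherever DPDK installed its .so files on your system):
#   link_pmds <dir-with-dpdk-libs> /usr/local/lib/dpdk-pmds
```

Keeping only these links in the folder avoids DPDK trying to load unrelated files from it.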
> On Wed, Feb 3, 2021 at 10:44 AM Rob Kossler <rkoss...@nd.edu> wrote:
>
>> Hi Aaron,
>> Two things:
>> 1) I am getting an error message at the conclusion of a successful run
>> (see below). Not sure if this is something I should be looking at or if it
>> is harmless.
>> 2) I figured out a sequence of steps that can "fix" my broken state
>> following a successful run. If I do the following, the links are fixed:
>>   a) dpdk-devbind -b i40e 03:00.0 03:00.1   // bind to the normal driver
>>   b) dpdk-devbind -b vfio-pci 03:00.0 03:00.1  // bind back to the
>> vfio-pci driver
>>   c) physically unplug & plug the XL710 QSFP+ transceiver (mine is
>> optical, but unplugging just the MTP does not do the trick - I need to
>> unplug the full transceiver)
>>
>> Once I complete the sequence above, the link LEDs are back to normal and
>> I can complete another run of benchmark_rate. This is obviously a bad
>> solution so if you have any ideas, please let me know.
>> Rob
>>
>> [00:00:05.113788990] Testing receive rate 125.000000 Msps on 4 channels
>> [00:00:05.120454627] Testing transmit rate 125.000000 Msps on 4 channels
>> [00:00:15.373972384] Benchmark complete.
>>
>> Benchmark rate summary:
>>   Num received samples:     5099558824
>>   Num dropped samples:      0
>>   Num overruns detected:    0
>>   Num transmitted samples:  4999335588
>>   Num sequence errors (Tx): 0
>>   Num sequence errors (Rx): 0
>>   Num underruns detected:   0
>>   Num late commands:        0
>>   Num timeouts (Tx):        0
>>   Num timeouts (Rx):        0
>>
>>
>> Done!
>>
>> i40e_phy_conf_link(): Failed to get PHY capabilities: -7
>>
>>
>> On Wed, Feb 3, 2021 at 10:16 AM Rob Kossler <rkoss...@nd.edu> wrote:
>>
>>> Hi Aaron,
>>> Unfortunately, I already tried playing around with the link timeout
>>> increasing up to 10 seconds.  No luck.  But, I am presently troubleshooting
>>> the issue and trying to switch back and forth between DPDK and normal
>>> networking. I am finding that normal networking is not working after 1 run
>>> of DPDK. And, I'm noticing that link LEDs are messed up and normal pings
>>> are not working.  I am playing around with disconnecting / reconnecting
>>> links in order to get the link LEDs back to normal.  My guess is that
>>> things are not cleaning up as they should.
>>> Rob
>>>
>>> On Wed, Feb 3, 2021 at 9:51 AM Aaron Rossetto via USRP-users <
>>> usrp-users@lists.ettus.com> wrote:
>>>
>>>> I notice in the second and subsequent runs, you get this message from
>>>> UHD:
>>>>
>>>> [ERROR] [DPDK] All DPDK links did not report as up!
>>>>
>>>> One of the other issues I've noticed with DPDK (and unfortunately
>>>> don't have an answer for) is that link detection seems to have issues.
>>>> I'm not sure if this is an XL710-specific problem or whether it's more
>>>> widespread, but I added some code to try to mitigate things somewhat
>>>> in commit eada49e4d. This commit checks the link status at
>>>> 250-millisecond intervals for up to the link status timeout (default 1
>>>> second) in case the links take a while to register as up. One thing
>>>> you could try is overriding the default link status timeout and
>>>> increasing the value, which you can do by adding a dpdk_link_timeout=X
>>>> line to the [use_dpdk=1] section of your uhd.conf file, where X is the
>>>> new timeout in number of milliseconds.
>>>>
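To make the setting concrete, the relevant part of uhd.conf could be generated as below. This is a sketch only: `print_dpdk_conf` is a hypothetical helper, the 5000 ms timeout is an arbitrary example value, and a working uhd.conf needs further DPDK settings (the corelist, for instance) that are not shown here. The section and key names are the ones mentioned in this thread.

```shell
#!/bin/sh
# print_dpdk_conf DRIVER_DIR LINK_TIMEOUT_MS
# Emit a partial [use_dpdk=1] section with the two keys discussed in the thread.
# Assumption: print_dpdk_conf is an illustrative helper, not a UHD tool.
print_dpdk_conf() {
    cat <<EOF
[use_dpdk=1]
dpdk_driver = $1
dpdk_link_timeout = $2
EOF
}

# Example: driver folder from the thread, 5-second link timeout (arbitrary).
print_dpdk_conf /usr/local/lib/dpdk-pmds 5000
```

The output is meant to be merged by hand into the existing [use_dpdk=1] section of .config/uhd.conf rather than appended blindly.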
>>>> Best regards,
>>>> Aaron
>>>>
>>>> On Tue, Feb 2, 2021 at 1:47 PM Rob Kossler <rkoss...@nd.edu> wrote:
>>>> >
>>>> > Hi Aaron,
>>>> > This did indeed help.  Now I am able to run ONCE successfully.  After
>>>> > that I get an error.  Same behavior on both systems.  Not yet sure how to
>>>> > clear the error.  I played with dpdk_link_timeout and even tried resetting
>>>> > the N310 using "overlay rm n310 && overlay add n310 && systemctl restart
>>>> > usrp-hwd".  But no luck.
>>>> > Rob
>>>> >
>>>> > // First run succeeds
>>>> > root@irisheyes5-hp-z240-sff:~# uhd_image_loader \
>>>> >     --args="addr=192.168.1.88,type=n3xx,fpga=XG"
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100;
>>>> UHD_4.0.0.0-50-ge520e3ff
>>>> > [INFO] [MPMD] Initializing 1 device(s) in parallel with args:
>>>> mgmt_addr=192.168.1.88,type=n3xx,product=n310,serial=3144673,claimed=False,skip_init=1
>>>> > [WARNING] [MPM.RPCServer] A timeout event occured!
>>>> > [INFO] [MPMD] Claimed device without full initialization.
>>>> > [INFO] [MPMD IMAGE LOADER] Starting update. This may take a while.
>>>> > [INFO] [MPM.PeriphManager] Updating component `fpga'
>>>> > [INFO] [MPM.PeriphManager] Updating component `dts'
>>>> > [INFO] [MPM.RPCServer] Resetting peripheral manager.
>>>> > [INFO] [MPM.PeriphManager] Device serial number: 3144673
>>>> > [INFO] [MPM.PeriphManager] Initialized 2 daughterboard(s).
>>>> > [INFO] [MPM.PeriphManager] init() called with device args
>>>> `clock_source=internal,time_source=internal'.
>>>> > [INFO] [MPMD IMAGE LOADER] Update component function succeeded.
>>>> > root@irisheyes5-hp-z240-sff:~# benchmark_rate --tx_rate=62.5e6 \
>>>> >     --rx_rate=62.5e6 --channels="0,1,2,3" \
>>>> >     --args="use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2"
>>>> >
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100;
>>>> UHD_4.0.0.0-50-ge520e3ff
>>>> > EAL: Detected 8 lcore(s)
>>>> > EAL: Detected 1 NUMA nodes
>>>> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>>> > EAL: Probing VFIO support...
>>>> > EAL: VFIO support initialized
>>>> > EAL: PCI device 0000:03:00.0 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL:   using IOMMU type 1 (Type 1)
>>>> > EAL: PCI device 0000:03:00.1 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.2 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.3 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > [00:00:00.000152] Creating the usrp device with:
>>>> use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2...
>>>> > [INFO] [MPMD] Initializing 1 device(s) in parallel with args:
>>>> mgmt_addr=192.168.1.88,type=n3xx,product=n310,serial=3144673,claimed=False,use_dpdk=1,addr=192.168.60.2
>>>> > [INFO] [MPM.PeriphManager] init() called with device args
>>>> `mgmt_addr=192.168.1.88,product=n310,use_dpdk=1,clock_source=internal,time_source=internal'.
>>>> > Using Device: Single USRP:
>>>> >   Device: N300-Series Device
>>>> >   Mboard 0: n310
>>>> >   RX Channel: 0
>>>> >     RX DSP: 0
>>>> >     RX Dboard: A
>>>> >     RX Subdev: Magnesium
>>>> >   RX Channel: 1
>>>> >     RX DSP: 1
>>>> >     RX Dboard: A
>>>> >     RX Subdev: Magnesium
>>>> >   RX Channel: 2
>>>> >     RX DSP: 2
>>>> >     RX Dboard: B
>>>> >     RX Subdev: Magnesium
>>>> >   RX Channel: 3
>>>> >     RX DSP: 3
>>>> >     RX Dboard: B
>>>> >     RX Subdev: Magnesium
>>>> >   TX Channel: 0
>>>> >     TX DSP: 0
>>>> >     TX Dboard: A
>>>> >     TX Subdev: Magnesium
>>>> >   TX Channel: 1
>>>> >     TX DSP: 1
>>>> >     TX Dboard: A
>>>> >     TX Subdev: Magnesium
>>>> >   TX Channel: 2
>>>> >     TX DSP: 2
>>>> >     TX Dboard: B
>>>> >     TX Subdev: Magnesium
>>>> >   TX Channel: 3
>>>> >     TX DSP: 3
>>>> >     TX Dboard: B
>>>> >     TX Subdev: Magnesium
>>>> >
>>>> > [00:00:03.21715319] Setting device timestamp to 0...
>>>> > [INFO] [MULTI_USRP]     1) catch time transition at pps edge
>>>> > [INFO] [MULTI_USRP]     2) set times next pps (synchronously)
>>>> > [WARNING] [0/Radio#0] Attempting to set tick rate to 0. Skipping.
>>>> > [WARNING] [0/Radio#1] Attempting to set tick rate to 0. Skipping.
>>>> > [WARNING] [0/Radio#1] Attempting to set tick rate to 0. Skipping.
>>>> > [WARNING] [0/Radio#0] Attempting to set tick rate to 0. Skipping.
>>>> > Setting TX spp to 1989
>>>> > [00:00:04.907401082] Testing receive rate 62.500000 Msps on 4 channels
>>>> > [00:00:04.914615576] Testing transmit rate 62.500000 Msps on 4
>>>> channels
>>>> > [00:00:15.167869894] Benchmark complete.
>>>> >
>>>> >
>>>> > Benchmark rate summary:
>>>> >   Num received samples:     2549794336
>>>> >   Num dropped samples:      0
>>>> >   Num overruns detected:    0
>>>> >   Num transmitted samples:  2499910452
>>>> >   Num sequence errors (Tx): 0
>>>> >   Num sequence errors (Rx): 0
>>>> >   Num underruns detected:   0
>>>> >   Num late commands:        0
>>>> >   Num timeouts (Tx):        0
>>>> >   Num timeouts (Rx):        0
>>>> >
>>>> >
>>>> > Done!
>>>> >
>>>> > // Second run fails
>>>> > root@irisheyes5-hp-z240-sff:~# benchmark_rate --tx_rate=62.5e6 \
>>>> >     --rx_rate=62.5e6 --channels="0,1,2,3" \
>>>> >     --args="use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2"
>>>> >
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100;
>>>> UHD_4.0.0.0-50-ge520e3ff
>>>> > EAL: Detected 8 lcore(s)
>>>> > EAL: Detected 1 NUMA nodes
>>>> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>>> > EAL: Probing VFIO support...
>>>> > EAL: VFIO support initialized
>>>> > EAL: PCI device 0000:03:00.0 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL:   using IOMMU type 1 (Type 1)
>>>> > EAL: PCI device 0000:03:00.1 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.2 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.3 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > [ERROR] [DPDK] All DPDK links did not report as up!
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: DPDK: All DPDK
>>>> links did not report as up!
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with
>>>> EAL initialization
>>>> > [00:00:00.000122] Creating the usrp device with:
>>>> use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2...
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: Error with EAL
>>>> initialization
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with
>>>> EAL initialization
>>>> > Error: LookupError: KeyError: No devices found for ----->
>>>> > Device Address:
>>>> >     use_dpdk: 1
>>>> >     mgmt_addr: 192.168.1.88
>>>> >     addr: 192.168.60.2
>>>> >
>>>> > // Third run fails
>>>> > root@irisheyes5-hp-z240-sff:~# benchmark_rate --tx_rate=62.5e6 \
>>>> >     --rx_rate=62.5e6 --channels="0,1,2,3" \
>>>> >     --args="use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2"
>>>> >
>>>> > [INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100;
>>>> UHD_4.0.0.0-50-ge520e3ff
>>>> > EAL: Detected 8 lcore(s)
>>>> > EAL: Detected 1 NUMA nodes
>>>> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>>> > EAL: Probing VFIO support...
>>>> > EAL: VFIO support initialized
>>>> > EAL: PCI device 0000:03:00.0 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL:   using IOMMU type 1 (Type 1)
>>>> > EAL: PCI device 0000:03:00.1 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.2 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > EAL: PCI device 0000:03:00.3 on NUMA socket -1
>>>> > EAL:   Invalid NUMA socket, default to 0
>>>> > EAL:   probe driver: 8086:1584 net_i40e
>>>> > [ERROR] [DPDK] All DPDK links did not report as up!
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: DPDK: All DPDK
>>>> links did not report as up!
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with
>>>> EAL initialization
>>>> > [00:00:00.000148] Creating the usrp device with:
>>>> use_dpdk=1,mgmt_addr=192.168.1.88,addr=192.168.60.2...
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [UHD] Device discovery error: RuntimeError: Error with EAL
>>>> initialization
>>>> > EAL: FATAL: already called initialization.
>>>> > EAL: already called initialization.
>>>> > [ERROR] [DPDK] Error with EAL initialization
>>>> > [ERROR] [X300] X300 Network discovery error RuntimeError: Error with
>>>> EAL initialization
>>>> > Error: LookupError: KeyError: No devices found for ----->
>>>> > Device Address:
>>>> >     use_dpdk: 1
>>>> >     mgmt_addr: 192.168.1.88
>>>> >     addr: 192.168.60.2
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Feb 2, 2021 at 11:53 AM Aaron Rossetto via USRP-users <
>>>> usrp-users@lists.ettus.com> wrote:
>>>> >>
>>>> >> On Mon, Feb 1, 2021 at 9:02 PM Rob Kossler via USRP-users
>>>> >> <usrp-users@lists.ettus.com> wrote:
>>>> >>
>>>> >> > Has anyone successfully used DPDK with Ubuntu 20.04, UHD 4.0,
>>>> >> > Intel XL710 NIC, and N310 (or X310)?
>>>> >>
>>>> >> If I remember correctly, I believe DPDK tries to dlopen() *everything*
>>>> >> in the directory specified by the dpdk_driver parameter in the DPDK
>>>> >> section of uhd.conf, leading to a lot of errors similar to yours
>>>> >> ('Invalid ELF header' and the like). Having the correct collection of
>>>> >> .so files in that directory is key.
>>>> >>
>>>> >> What's worked for me in the past when using DPDK with an Intel XL710
>>>> >> is creating a directory (I used /usr/local/lib/dpdk-pmds) and copying
>>>> >> a specific set of DPDK .so files into this directory:
>>>> >> * librte_mempool_ring.so
>>>> >> * librte_pdump.so (I think this one is optional--I had been trying to
>>>> >> get packet dumps from DPDK a while back)
>>>> >> * librte_pmd_i40e.so
>>>> >> * librte_pmd_ixgbe.so (may be optional?)
>>>> >> * librte_pmd_pcap.so (this one is also optional, I think)
>>>> >> * librte_pmd_ring.so
>>>> >>
>>>> >> (Symlinking to the actual libraries wherever they get installed
>>>> >> instead of copying them into the directory would probably work as
>>>> >> well.)
>>>> >>
>>>> >> Then, make sure that the dpdk_driver key in the [use_dpdk=1] section
>>>> >> of uhd.conf points to that directory:
>>>> >> dpdk_driver = /usr/local/lib/dpdk-pmds
>>>> >>
>>>> >> Hopefully that will resolve the issue and get you a little further
>>>> >> down the road.
>>>> >>
>>>> >> Best regards,
>>>> >> Aaron
>>>> >>
>>>> >> _______________________________________________
>>>> >> USRP-users mailing list
>>>> >> USRP-users@lists.ettus.com
>>>> >> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com