Thanks Marcus!

I thought I had permanently set the MTU of the interfaces in question to 9000. I was wrong: for some reason it had gone back to 1500. After setting it to 9000 again, things are working. I had totally forgotten about this configuration.
For reference:
`sudo ip link set [INTERFACENAME] mtu 9000`.

You may want to configure this permanently in `/etc/netplan/*.yaml` with an `mtu: 9000` entry for the interface in question.
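
As a rough sketch of what such an entry could look like, the helper below prints a netplan fragment for one interface. The function name `netplan_mtu_snippet` and the interface name `enp1s0f0` are made up for illustration, and the sketch assumes the interface is a plain `ethernets` entry. It only prints to stdout and does not touch `/etc`.

```shell
#!/bin/sh
# Print a netplan fragment that pins the MTU of one interface.
# Arguments: interface name, MTU (defaults to 9000).
# Nothing is written to /etc here; the fragment goes to stdout only.
netplan_mtu_snippet() {
    iface="$1"
    mtu="${2:-9000}"
    cat <<EOF
network:
  version: 2
  ethernets:
    $iface:
      mtu: $mtu
EOF
}

# Example call (hypothetical interface name):
#   netplan_mtu_snippet enp1s0f0 9000
```

After reviewing the output, it could be merged into an existing file under `/etc/netplan/` and activated with `sudo netplan apply`. An existing definition of the same interface in another netplan file may need to be taken into account.
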
I have a feature request:
Can UHD check the MTU settings on the host and the device during initialization and print a WARNING if there's a mismatch? That would be really awesome (and would've saved me a lot of time). It would be in line with the `sudo sysctl -w net.core.rmem_max=62500000` hint/warning.
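
Until something like that exists, the host side of such a check can be approximated by hand. The following is a minimal sketch, not anything UHD provides: the function name `check_mtu` is hypothetical, and it only looks at the host interface via Linux sysfs, not at the MTU configured on the device.

```shell
#!/bin/sh
# Compare the MTU of a host interface against an expected value.
# Arguments: interface name, expected MTU, optional sysfs root
# (the third argument exists only to make the function testable).
# Returns 0 on match, 1 on mismatch, 2 if the MTU cannot be read.
check_mtu() {
    iface="$1"
    expected="$2"
    sysfs_root="${3:-/sys/class/net}"
    mtu_file="$sysfs_root/$iface/mtu"

    if [ ! -r "$mtu_file" ]; then
        echo "ERROR: cannot read $mtu_file" >&2
        return 2
    fi

    actual=$(cat "$mtu_file")
    if [ "$actual" -ne "$expected" ]; then
        echo "WARNING: $iface has MTU $actual, expected $expected"
        return 1
    fi
    echo "OK: $iface has MTU $actual"
}

# Example call (hypothetical interface name):
#   check_mtu enp1s0f0 9000
```

Running this before starting a UHD application would at least catch the case where the host interface silently fell back to 1500.
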
Cheers
Johannes

On 28.10.20 18:10, Marcus D Leech wrote:
Check the network configuration of the cards on all hosts. Do they have the 
same MTUs? Are there errors showing on the interface?  Do regular pings work 
reliably?
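
One way to run these checks from a Linux host is sketched below. The helper name `icmp_payload_for_mtu` and the interface name `enp1s0f0` are placeholders. The sketch assumes IPv4 and the iputils `ping`, whose `-M do` option forbids fragmentation, so a full-size ping only succeeds if the whole path really carries the configured MTU.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits into a single IPv4 packet of the
# given MTU: MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Commands to run by hand (interface name is a placeholder, the address
# is the one used in this thread):
#   ip -s link show dev enp1s0f0     # MTU plus RX/TX error and drop counters
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 9000)" 192.168.20.213
```

If the full-size ping fails while a default-size ping works, an MTU mismatch somewhere between host and device is a likely candidate.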

Sent from my iPhone

On Oct 28, 2020, at 1:05 PM, Johannes Demel <de...@ant.uni-bremen.de> wrote:

Hi Marcus,

no, I didn't swap cables. I put this on the list of things I will try. Physical 
access is cumbersome this year.

Thanks for the hint.
Do you have more ideas on what to check?

Cheers
Johannes

On 28.10.20 17:49, Marcus D Leech wrote:
Have you tried swapping cables to see if the problem follows the cable?
Sent from my iPhone
On Oct 28, 2020, at 12:44 PM, Johannes Demel via USRP-users 
<usrp-users@lists.ettus.com> wrote:
Hi all,

we have a couple of N310s in our lab and some of them seem to fail to transmit 
reliably.

Each N310 is connected to a host via one of those SFP+ cables that came with 
them from Ettus. We have 3 N310s that are connected via said cables to one host 
each with an Intel X710 DA2 with an AMD TRX3970. All machines run Ubuntu 20.04 
with all updates.
I use the UHD 3.15LTS branch: UHD_3.15.0.0-7-g8d228dbe
I made sure to check out the very same commit and recompile and install it.

On 2 hosts I can run:
`./benchmark_rate --args "addr=192.168.20.213,master_clock_rate=122.88e6" --tx_rate 61.44e6 --tx_channels "3" --rx_rate 61.44e6 --rx_channels "0,1"`
The full output is attached at the bottom of this email.

What I observe:
- It runs fine with 2 hosts
- The third host fails.
-- On the third host, RX-only works.
-- On the third host, TX-only is haunted: cf. full test output.
- We have a server with Intel Xeon 6254 and X722 where I observe the same issue
- I switched USRPs between hosts, the issue seems to stick with the host.

It started with one host a couple of weeks back, but now our server has started 
to fail with the same error, even though the exact same setup used to work on 
that machine.
I have been looking into this for quite a while now, but I can't find the 
source of the issue.

Has anyone had experience with this? I'd really appreciate hints on how to 
debug it.


Cheers
Johannes


On the working hosts the benchmark rate summary looks like this:
---------
Benchmark rate summary:
  Num received samples:     1270556340
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  614440368
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0
---------

But on the third device:
---------
[....]
SUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSU[00:00:16.262123]
 Receiver error: ERROR_CODE_TIMEOUT, continuing...
SUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUU[00:00:16.565159]
 Benchmark complete.


Benchmark rate summary:
  Num received samples:     66501280
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  154312704
  Num sequence errors (Tx): 3149
  Num sequence errors (Rx): 0
  Num underruns detected:   3156
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        97
----------

We have a server with Intel X722 and Intel Xeon Gold 6252 that reports the same 
issue:
----------
UUUUUUUU[00:00:16.180094] Receiver error: ERROR_CODE_TIMEOUT, continuing...
US[00:00:16.382393] Benchmark complete.


Benchmark rate summary:
  Num received samples:     99763328
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  155804944
  Num sequence errors (Tx): 3180
  Num sequence errors (Rx): 0
  Num underruns detected:   164974
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        95
----------
Though, there are even more underruns.



Working output:
============
[INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; 
UHD_3.15.0.0-7-g8d228dbe
[00:00:00.000002] Creating the usrp device with: 
addr=192.168.20.213,master_clock_rate=122.88e6...
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: 
mgmt_addr=192.168.20.213,type=n3xx,product=n310,serial=319841B,claimed=False,addr=192.168.20.213,master_clock_rate=122.88e6
[INFO] [MPM.PeriphManager] init() called with device args 
`time_source=gpsdo,clock_source=gpsdo,mgmt_addr=192.168.20.213,product=n310,master_clock_rate=122.88e6'.
[INFO] [0/Replay_0] Initializing block control (NOC ID: 0x4E91A00000000004)
[INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD100000011312)
[INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD100000011312)
[INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0000000000000)
[INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0000000000000)
[INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0000000000002)
[INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0000000000002)
[INFO] [0/FIFO_0] Initializing block control (NOC ID: 0xF1F0000000000000)
[INFO] [0/FIFO_1] Initializing block control (NOC ID: 0xF1F0000000000000)
[INFO] [0/FIFO_2] Initializing block control (NOC ID: 0xF1F0000000000000)
[INFO] [0/FIFO_3] Initializing block control (NOC ID: 0xF1F0000000000000)
Using Device: Single USRP:
  Device: N300-Series Device
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: Magnesium
  RX Channel: 1
    RX DSP: 1
    RX Dboard: A
    RX Subdev: Magnesium
  RX Channel: 2
    RX DSP: 0
    RX Dboard: B
    RX Subdev: Magnesium
  RX Channel: 3
    RX DSP: 1
    RX Dboard: B
    RX Subdev: Magnesium
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: Magnesium
  TX Channel: 1
    TX DSP: 1
    TX Dboard: A
    TX Subdev: Magnesium
  TX Channel: 2
    TX DSP: 0
    TX Dboard: B
    TX Subdev: Magnesium
  TX Channel: 3
    TX DSP: 1
    TX Dboard: B
    TX Subdev: Magnesium

[00:00:04.045700] Setting device timestamp to 0...
[INFO] [MULTI_USRP]     1) catch time transition at pps edge
[INFO] [MULTI_USRP]     2) set times next pps (synchronously)
[00:00:05.689405] Testing receive rate 61.440000 Msps on 2 channels
[00:00:05.829315] Testing transmit rate 61.440000 Msps on 1 channels
[00:00:16.180163] Benchmark complete.


Benchmark rate summary:
  Num received samples:     1270556340
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  614440368
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0


Done!
=====================

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com