Hi all,

We have a couple of N310s in our lab, and some of them fail to transmit reliably.

Each N310 is connected to its own host via one of the SFP+ cables that Ettus ships with the devices. Three N310s are connected this way to one host each; each of these hosts has an Intel X710 DA2 and an AMD TRX3970. All machines run Ubuntu 20.04 with all updates installed.
I use the UHD 3.15 LTS branch: UHD_3.15.0.0-7-g8d228dbe
I made sure to check out the very same commit, then recompiled and installed it.

On 2 of the hosts I can run the following without errors:
`./benchmark_rate --args "addr=192.168.20.213,master_clock_rate=122.88e6" --tx_rate 61.44e6 --tx_channels "3" --rx_rate 61.44e6 --rx_channels "0,1"`
The full output is attached at the bottom of this email.

What I observe:
- It runs fine on 2 hosts.
- The third host fails.
-- On the third host, RX-only works.
-- On the third host, TX-only misbehaves as well: cf. full test output.
- We have a server with Intel Xeon 6254 and X722 where I observe the same issue.
- I switched USRPs between hosts; the issue seems to stick with the host.
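
Since the issue sticks with the host, one thing I can do is dump a few host settings on each machine and diff the results. Below is a minimal sketch, assuming the standard Linux procfs/sysfs paths. The interface name `enp1s0f0` is a placeholder, and the choice of settings is only my guess at what might matter for 10 GbE streaming, not a complete checklist.

```python
from pathlib import Path


def read_setting(path):
    """Return the stripped contents of a procfs/sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def host_snapshot(iface):
    """Collect a few network/CPU settings for one host as a dict of strings."""
    paths = {
        "rmem_max": "/proc/sys/net/core/rmem_max",
        "wmem_max": "/proc/sys/net/core/wmem_max",
        "mtu": f"/sys/class/net/{iface}/mtu",
        "cpu0_governor": "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
    }
    return {name: read_setting(path) for name, path in paths.items()}


if __name__ == "__main__":
    # "enp1s0f0" is a hypothetical interface name; replace it with the SFP+ port in use.
    for name, value in host_snapshot("enp1s0f0").items():
        print(f"{name}={value}")
```

Running this on a working and a failing host and comparing the printed lines shows at a glance whether any of these settings differ.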

It started with one host a couple of weeks ago. Now our server has started to fail with the same error, even though the exact same setup used to work on that machine. I have been looking into this for quite a while now, but I can't find the source of the issue.

Has anyone seen this before? I'd really appreciate hints on how to debug this.


Cheers
Johannes


On the working hosts the benchmark rate summary looks like this:
---------
Benchmark rate summary:
  Num received samples:     1270556340
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  614440368
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0
---------

But on the third host:
---------
[....]
SUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSU[00:00:16.262123] Receiver error: ERROR_CODE_TIMEOUT, continuing...
SUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUU[00:00:16.565159] Benchmark complete.


Benchmark rate summary:
  Num received samples:     66501280
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  154312704
  Num sequence errors (Tx): 3149
  Num sequence errors (Rx): 0
  Num underruns detected:   3156
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        97
----------

We have a server with Intel X722 and Intel Xeon Gold 6252 that reports the same issue:
----------
UUUUUUUU[00:00:16.180094] Receiver error: ERROR_CODE_TIMEOUT, continuing...
US[00:00:16.382393] Benchmark complete.


Benchmark rate summary:
  Num received samples:     99763328
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  155804944
  Num sequence errors (Tx): 3180
  Num sequence errors (Rx): 0
  Num underruns detected:   164974
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        95
----------
Here there are even more underruns, though.
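
To put numbers on how far the failing host falls short, the summaries can be parsed and compared against the working run. This is only a sketch: `parse_summary` and `throughput_ratio` are helper names I made up, and the inputs are the sample counts pasted above.

```python
import re


def parse_summary(text):
    """Return {counter name: int} for the 'Num ...' lines of a benchmark_rate summary."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Num [^:]+):\s+(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def throughput_ratio(failing, working):
    """Fraction of samples the failing host moved relative to the working host."""
    keys = ("Num received samples", "Num transmitted samples")
    return {key: failing[key] / working[key] for key in keys}


# Sample counts copied from the summaries in this email.
WORKING = """
  Num received samples:     1270556340
  Num transmitted samples:  614440368
"""
THIRD_HOST = """
  Num received samples:     66501280
  Num transmitted samples:  154312704
"""

ratios = throughput_ratio(parse_summary(THIRD_HOST), parse_summary(WORKING))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of the working host")
```

For the third host this comes out at roughly a quarter of the transmitted samples and about 5% of the received samples of the working run.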



Working output:
============
[INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_3.15.0.0-7-g8d228dbe
[00:00:00.000002] Creating the usrp device with: addr=192.168.20.213,master_clock_rate=122.88e6...
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.20.213,type=n3xx,product=n310,serial=319841B,claimed=False,addr=192.168.20.213,master_clock_rate=122.88e6
[INFO] [MPM.PeriphManager] init() called with device args `time_source=gpsdo,clock_source=gpsdo,mgmt_addr=192.168.20.213,product=n310,master_clock_rate=122.88e6'.
[INFO] [0/Replay_0] Initializing block control (NOC ID: 0x4E91A00000000004)
[INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD100000011312)
[INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD100000011312)
[INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0000000000000)
[INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0000000000000)
[INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0000000000002)
[INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0000000000002)
[INFO] [0/FIFO_0] Initializing block control (NOC ID: 0xF1F0000000000000)
[INFO] [0/FIFO_1] Initializing block control (NOC ID: 0xF1F0000000000000)
[INFO] [0/FIFO_2] Initializing block control (NOC ID: 0xF1F0000000000000)
[INFO] [0/FIFO_3] Initializing block control (NOC ID: 0xF1F0000000000000)
Using Device: Single USRP:
  Device: N300-Series Device
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: Magnesium
  RX Channel: 1
    RX DSP: 1
    RX Dboard: A
    RX Subdev: Magnesium
  RX Channel: 2
    RX DSP: 0
    RX Dboard: B
    RX Subdev: Magnesium
  RX Channel: 3
    RX DSP: 1
    RX Dboard: B
    RX Subdev: Magnesium
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: Magnesium
  TX Channel: 1
    TX DSP: 1
    TX Dboard: A
    TX Subdev: Magnesium
  TX Channel: 2
    TX DSP: 0
    TX Dboard: B
    TX Subdev: Magnesium
  TX Channel: 3
    TX DSP: 1
    TX Dboard: B
    TX Subdev: Magnesium

[00:00:04.045700] Setting device timestamp to 0...
[INFO] [MULTI_USRP]     1) catch time transition at pps edge
[INFO] [MULTI_USRP]     2) set times next pps (synchronously)
[00:00:05.689405] Testing receive rate 61.440000 Msps on 2 channels
[00:00:05.829315] Testing transmit rate 61.440000 Msps on 1 channels
[00:00:16.180163] Benchmark complete.


Benchmark rate summary:
  Num received samples:     1270556340
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  614440368
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0


Done!
=====================

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com