Re: [USRP-users] transmitting on two channels with replay block

Thomas Harder via USRP-users Fri, 13 Dec 2019 07:41:51 -0800

Hi Nate, Hi Rob,
You can’t believe how much I appreciate your help.
Finally, after trying it for months: IT WORKS!!!!!
TX streaming of two different waveforms on the two channels of the USRP X310 
with the two UBX-160 daughterboards, at the full bandwidth of 200 MS/s per 
channel.
I would like to invite you for a champagne.
I changed the rfnoc_ce_default_inst_x310.v file as you described below and 
built the XGS image with the full Vivado Design Suite 2017.4 (free one-month 
trial license). UHD v3.14.1.1 needs this version; I previously had Vivado 
2018.3, which did not work.
I set the CPU governor to performance, set the network buffers as described 
and the MTU to 8000, and set up DPDK as described in the Ettus manual. It works 
even without disabling hyper-threading and KPTI on my host (16GB RAM, Intel 
Xeon W-2125 CPU@ 4.5GHz x8).
I am using dual 10Gbit Ethernet and a modified version of the 
“tx_samples_from_file” example that reads in a second file with my second 
waveform.

Thank you a lot for your help.
Thomas

From: Nate Temple
Sent: Wednesday, December 11, 2019 7:00 PM
To: Thomas Harder
Cc: Rob Kossler; USRP-users@lists.ettus.com; EJ Kreinar
Subject: Re: [USRP-users] transmitting on two channels with replay block

On Wed, Dec 11, 2019 at 9:33 AM Nate Temple <nate.tem...@ettus.com> wrote:
Hi Thomas,

You will need to apply these changes below to the 
fpga-src/usrp3/top/x300/rfnoc_ce_default_inst_x310.v file. This will add 
additional SRAM FIFOs, which is basically what the "XGS" / SRAM image is. Make 
sure to start with the v3.14.1.1 fpga sources. (run git submodule init; git 
submodule update; in your UHD repo after checking out v3.14.1.1).

########################################################################

diff --git a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
index d20a64962..bcb4c3c32 100644
--- a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
+++ b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
@@ -1,4 +1,4 @@
-  localparam NUM_CE = 4;  // Must be no more than 10 (6 ports taken by transport and IO connected CEs)
+  localparam NUM_CE = 6;  // Must be no more than 10 (6 ports taken by transport and IO connected CEs)

  wire [NUM_CE*64-1:0] ce_flat_o_tdata, ce_flat_i_tdata;
  wire [63:0]          ce_o_tdata[0:NUM_CE-1], ce_i_tdata[0:NUM_CE-1];
@@ -46,7 +46,9 @@
  genvar n;
  generate
    for (n = 4; n < NUM_CE; n = n + 1) begin
-      noc_block_axi_fifo_loopback inst_noc_block_axi_fifo_loopback (
+      noc_block_axi_fifo_loopback #(
+        .STR_SINK_FIFOSIZE(15)
+      ) inst_noc_block_axi_fifo_loopback (
        .bus_clk(bus_clk), .bus_rst(bus_rst),
        .ce_clk(ce_clk), .ce_rst(ce_rst),
        .i_tdata(ce_o_tdata[n]), .i_tlast(ce_o_tlast[n]), .i_tvalid(ce_o_tvalid[n]), .i_tready(ce_o_tready[n]),

########################################################################

After making these modifications to the FPGA sources, you can build an FPGA 
image with the commands:

cd fpga-src/usrp3/top/x300/
source setupenv.sh
make X310_XG

Note: Even though you are calling X310_XG, it is really an "XGS" image since it 
has the additional SRAM FIFOs.

After that has completed building, you should write that FPGA image to the X310 
using uhd_image_loader.

uhd_image_loader --args "addr=192.168.40.2,type=x300" --fpga-path /path/to/x300.bit

After the FPGA image load and restarting the USRP, run uhd_usrp_probe and at 
the end of the output where the RFNoC blocks are listed, you should see two 
additional FIFO blocks:

FIFO_0
FIFO_1

Random performance tuning notes:

* Ensure your CPU governor is set to performance:

sudo apt install cpufrequtils

To set performance for all cores:

for ((i=0;i<$(nproc);i++)); do sudo cpufreq-set -c $i -r -g performance; done

Verify with:

cpufreq-info

* Set your network buffers

sudo sysctl -w net.core.rmem_max=625000000
sudo sysctl -w net.core.wmem_max=625000000

* Set the MTU to 8000 on your 10Gb NICs

* Ensure thread priority scheduling is enabled for your user
https://kb.ettus.com/Building_and_Installing_the_USRP_Open-Source_Toolchain_(UHD_and_GNU_Radio)_on_Linux#Thread_priority_scheduling

http://files.ettus.com/manual/page_general.html#general_threading

* Disable hyper-threading in the BIOS. This will typically give about a 10% 
boost in core performance if you can work without the additional cores. You'll 
need to update your CPU core list in DPDK.

* Disable KPTI for Spectre/Meltdown. I would recommend trying to disable the 
KPTI protections for your CPU if the machine is offline; you may see a 10-15% 
performance increase.

This can be done by adding the options below to the 
GRUB_CMDLINE_LINUX_DEFAULT="" line in your /etc/default/grub, then running 
sudo update-grub and rebooting.

pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier

Note, this disables protections against Meltdown/Spectre (links below). So if 
you try this, I would recommend disconnecting that host from any 
internet-connected network.

https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

* There are additional recommendations here from Intel on various adjustments 
you can do to improve performance with DPDK:
http://doc.dpdk.org/guides/linux_gsg/nic_perf_intel_platform.html

Specifically I would recommend to try section 10.1.3 #3 where you isolate the 
CPU cores that are used for DPDK.

* Here is a performance report from Intel on DPDK 17.11: 
https://fast.dpdk.org/doc/perf/DPDK_17_11_Intel_NIC_performance_report.pdf

In the tables of boot and BIOS settings, the additional CPU options 
nohz_full="" and rcu_nocbs="" are added to their kernel configs; this may help 
as well.

Additionally they made the changes listed below:

CPU Power and Performance Policy <Performance> (you should already be doing 
this)
CPU C-state Disabled
CPU P-state Disabled
Enhanced Intel® Speedstep® Tech Disabled
Turbo Boost Disabled
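
The governor and socket-buffer settings from the notes above can also be 
checked from a script. The following is only a minimal sketch, not part of UHD: 
it reads the standard Linux procfs/sysfs files, and the function name 
check_host and its root parameter are made up for illustration (root only 
exists so the check can be pointed at a copy of the tree).

```python
from pathlib import Path

# Buffer size taken from the sysctl commands in the notes above.
WANTED_BUF = 625_000_000


def check_host(root="/"):
    """Return a list of settings that differ from the tuning notes.

    Hypothetical helper: 'root' is normally "/", but can be any directory
    that mirrors the proc/ and sys/ layout.
    """
    root = Path(root)
    problems = []
    # net.core.rmem_max / wmem_max are exposed as plain integers in procfs.
    for name in ("rmem_max", "wmem_max"):
        value = int((root / "proc/sys/net/core" / name).read_text().strip())
        if value < WANTED_BUF:
            problems.append(f"net.core.{name}={value} is below {WANTED_BUF}")
    # One scaling_governor file per CPU core that supports cpufreq.
    pattern = "sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"
    for gov in sorted(root.glob(pattern)):
        mode = gov.read_text().strip()
        if mode != "performance":
            problems.append(f"{gov.parent.parent.name} governor is {mode}")
    return problems
```

Calling check_host() on the streaming host and printing the returned list 
shows which of the two settings still need attention; an empty list means both 
match the notes.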

Regards,
Nate Temple

On Wed, Dec 11, 2019 at 9:18 AM Thomas Harder <thomas.har...@oca.eu> wrote:
Rob,
I am definitely interested in your custom ‘txarb’ RFNoC block. For now I am 
using tx waveforms of about 10,000 samples, so the 2^15 samples would be 
sufficient.
I was already trying to find out what exactly this SRAM image means, because 
today I was able to set up DPDK with UHD 3.14.1, and the benchmark_rate code 
(run exactly as described in Nate's mail) was still full of underruns with the 
stock XG FPGA image which I downloaded with uhd_images_downloader. So I will 
also try to build a second FIFO block, since I still have the trial version of 
Vivado for two more weeks.
Thomas

From: Rob Kossler
Sent: Wednesday, December 11, 2019 4:50 PM
To: Thomas Harder; Nate Temple
Subject: Re: [USRP-users] transmitting on two channels with replay block

Thomas,
I believe that Nate and I were saying basically the same thing.  When he 
referred to an SRAM image, I believe that this means an image with the FIFO 
blocks.  I believe that such an image needs to be built by the user (rather 
than downloaded using uhd_images_downloader), but I'm not 100% certain.

If you are interested, I have a custom 'txarb' RFNoC block that implements my 
2nd option below.  By default, it includes storage of up to 2^15 samples, but 
this can be modified using an input parameter (FPGA resources permitting). This 
block requires some specialized behavior, but it is pretty simple.  Similar to 
the Replay block, you need to construct a custom RFNoC graph that connects the 
txarb block to the Radio.  When you want to stream, you need to stream just one 
full waveform to the txarb block.  Once the txarb block receives end-of-burst, 
it will automatically stop "recording the samples to memory" and begin "playing 
the samples from memory repeatedly".  The streaming will continue indefinitely 
until you send a new tx waveform.  If the new tx waveform contains less than 2 
samples, the streaming is turned off.  There are no control registers to worry 
about. Timed behavior is supported because the block preserves the command time 
of the incoming stream from the host when it starts playing out.
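
The record-then-loop behaviour described above can be pictured with a small 
host-side model. This is only a behavioural sketch in Python, not the FPGA 
implementation: the class and method names are invented for illustration, 
timing is not modelled, and rejecting a waveform larger than the block memory 
is an assumption (the description above does not say what the real block does 
in that case).

```python
from itertools import cycle, islice


class TxArbModel:
    """Toy model of a block that records one burst, then loops it."""

    def __init__(self, capacity=2**15):
        self.capacity = capacity  # default storage quoted above, in samples
        self.memory = []
        self.playing = False

    def send_burst(self, samples):
        """Model one complete burst from the host, ended by end-of-burst."""
        samples = list(samples)
        if len(samples) > self.capacity:
            # Assumption: the model simply refuses oversized waveforms.
            raise ValueError("waveform exceeds block memory")
        if len(samples) < 2:
            # Fewer than 2 samples turns streaming off.
            self.memory, self.playing = [], False
            return
        # A new waveform replaces the old one and playback (re)starts.
        self.memory, self.playing = samples, True

    def output(self, n):
        """Return n output samples, starting from the first stored sample."""
        if not self.playing:
            return []
        return list(islice(cycle(self.memory), n))


block = TxArbModel()
block.send_burst([1, 2, 3])
print(block.output(7))   # the stored waveform repeated cyclically
block.send_burst([0])    # a burst shorter than 2 samples stops playback
print(block.output(4))
```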

It is not terribly difficult to build this custom block, but if you haven't 
built out-of-tree RFNoC blocks before, it might be easiest to just put this 
block in-tree (in the Ettus folder structure) and manually modify makefiles as 
needed. Let me know if you are interested.
Rob

On Wed, Dec 11, 2019 at 10:07 AM Nate Temple <nate.tem...@ettus.com> wrote:
Hi Thomas,

One option instead of using the Replay block could be to stream 2x 200e6 from 
your host. 

On the X310, this requires using an SRAM image and DPDK. DPDK support was added 
with UHD 3.14.1.0 for the X310; I'd suggest using 3.14.1.1 at this time though.

Some links on DPDK:

https://www.dpdk.org/
http://files.ettus.com/manual/page_dpdk.html

I've been able to run 2x2 @ 200e6 with the X310 with DPDK using a 4 GHz CPU.

./benchmark_rate --rx_rate 200e6 --rx_channels 0,1 --tx_rate 200e6 --tx_channels 0,1 --args "addr=192.168.10.2,second_addr=192.168.20.2,use_dpdk=1,num_recv_frames=512,enable_tx_dual_eth=1,skip_ddc=1,skip_duc=1"

num_recv_frames=512 can help if you're seeing overflows.

enable_tx_dual_eth=1 is required for 2x TX @ 200e6

skip_ddc=1,skip_duc=1 can help as well since you'd be sending at full rate.
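
As a rough sanity check on why both Ethernet links are needed for 2x TX at 
200e6, the sketch below estimates the payload rate. It assumes the sc16 wire 
format (4 bytes per complex sample) and ignores packet header overhead, so the 
real link usage is somewhat higher; the variable names are illustrative only.

```python
# Rough payload-rate estimate for full-rate streaming.
# Assumption: sc16 over the wire, i.e. 16-bit I + 16-bit Q = 4 bytes per sample.
# Packet header overhead is ignored.
BYTES_PER_SAMPLE = 4
SAMPLE_RATE = 200e6        # samples per second per channel
NUM_CHANNELS = 2
LINK_CAPACITY_GBPS = 10.0  # one 10 Gbit Ethernet link

per_channel_gbps = SAMPLE_RATE * BYTES_PER_SAMPLE * 8 / 1e9
total_gbps = per_channel_gbps * NUM_CHANNELS

print(f"per channel:   {per_channel_gbps:.1f} Gbit/s")
print(f"both channels: {total_gbps:.1f} Gbit/s")
print(f"fits on one 10 Gbit link: {total_gbps <= LINK_CAPACITY_GBPS}")
```

Under these assumptions one channel fits on a single link, but two channels in 
the same direction do not.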

Regards,
Nate Temple

On Wed, Dec 11, 2019 at 7:03 AM Rob Kossler via USRP-users 
<usrp-users@lists.ettus.com> wrote:
I do not think it is possible using the stock FPGA image.  However, I can think 
of a couple of possibilities:
•       On the N310, Ettus includes 4 FIFO blocks (rather than the DmaFIFO 
which used the off-FPGA RAM for memory), to provide capability for 4x125 MS/s 
streaming. Perhaps if you built an X310 FPGA image with 2 such FIFO blocks, you 
could use these rather than the DmaFIFO and achieve the desired streaming.  
Note that this requires a Vivado license to build your own FPGA image, but does 
not require FPGA experience because you would be building an image with "stock" 
blocks.  One caution though is that streaming at this very high rate still 
requires a high performance host and so it is still possible that you would 
have underruns if your host could not keep up.  If you go this route, I believe 
you will likely need to use the "DPDK" capability which is a bit of a pain to 
configure and get it working properly.
•       Another possibility is to create a custom RFNoC block that is similar 
to the replay block but that uses FPGA memory to store a fixed duration 
waveform and then plays it out cyclically like the replay block. The Ettus 
'window' RFNoC block provides a good example of how to store coefficients and 
play them out repeatedly.  But, making the needed modifications is not a 
trivial task except for someone who is pretty good at FPGA programming.
Given that you were trying the replay block, I'm guessing that your Tx 
waveforms are of fixed duration.  What is the duration (in number of samples) 
that you require?
Rob

On Wed, Dec 11, 2019 at 5:05 AM Thomas Harder <thomas.har...@oca.eu> wrote:
Thank you Rob for this comment.
But I am not sure I understand you correctly. Are you saying that it is 
IMPOSSIBLE to TX stream two different, synchronized waveforms on the 2 channels 
of the X310 at the full bandwidth of 200MS/s on each channel?
That is what I have been trying to do full time for the last 6 months, starting 
with LabVIEW under Windows and then UHD under Linux on a Dell Precision 5820 
desktop (16GB RAM, Intel Xeon W-2125 CPU@ 4.GHz x8), with an MXI connection, a 
dual 10Gbit connection (Intel X520-DA2), and recently the replay block. The 
result is always the same: continuous underruns.
If you can confirm that this is not possible without a major FPGA change 
(I have no experience in this field and no time to invest in it), I must look 
for another way to create two different synchronized RF waveforms with 160MHz 
bandwidth (optical, electronic,…), because this is just one part of my 
experimental setup, but it is crucial for going on.
I am thankful for any advice,
Thomas

From: Rob Kossler
Sent: Tuesday, December 10, 2019 5:01 AM
To: Thomas Harder
Cc: Sam Reiter; usrp-users@lists.ettus.com
Subject: Re: [USRP-users] transmitting on two channels with replay block

Apart from solving the underrun issue, there is also an issue with 
synchronization.  The replay block doesn't presently support timed commands.

And, as a side note, the issue with streaming from the host is not just the 
host.  The DMA FIFO has a maximum bandwidth of something like 600 MS/s 
(combination of all inputs and outputs) that precludes streaming 400 MS/s in 
and out of the block simultaneously.  So, even if the host could keep up, the 
FIFO could not.
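
The DMA FIFO argument above comes down to simple arithmetic. The sketch below 
only restates the figures from this message (the 600 MS/s budget is quoted as 
approximate); the variable names are illustrative.

```python
# Figures from the message above; "something like 600 MS/s" is approximate.
DMA_FIFO_BUDGET_MSPS = 600     # combined throughput of all inputs and outputs
RATE_PER_CHANNEL_MSPS = 200
NUM_CHANNELS = 2

into_fifo = RATE_PER_CHANNEL_MSPS * NUM_CHANNELS    # host -> FIFO
out_of_fifo = RATE_PER_CHANNEL_MSPS * NUM_CHANNELS  # FIFO -> radios
required = into_fifo + out_of_fifo

print(f"required: {required} MS/s, budget: {DMA_FIFO_BUDGET_MSPS} MS/s")
print(f"shortfall: {required - DMA_FIFO_BUDGET_MSPS} MS/s")
```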
Rob

On Mon, Dec 9, 2019 at 4:34 AM Thomas Harder via USRP-users 
<usrp-users@lists.ettus.com> wrote:
Hi Sam,
Thank you for your reply.
This morning I set the MCR to 184.32 and I am still getting continuous 
underruns, even when using 
replay_ctrl->get_record_fullness
for both channels.

But since I need the full bandwidth of 160MHz, I would like to implement a 
second replay block in my FPGA image.

Could anyone help me with this? 
I am really new to FPGA programming and for the image with one replay block I 
was just following the instructions in 
https://kb.ettus.com/Using_the_RFNoC_Replay_Block.
Thank you,
Thomas

From: Sam Reiter
Sent: Friday, December 6, 2019 10:23 PM
To: Thomas Harder
Cc: usrp-users@lists.ettus.com
Subject: Re: [USRP-users] transmitting on two channels with replay block

Thomas,

Upon further investigation, we may be running up against a practical limit of a 
single CHDR interface rather than an issue with your code. A single replay 
block servicing two radios will have a max (theoretical) rate of 187.5 MSPS on 
either channel. This means that you might be able to squeeze full rate out on 2 
channels with an MCR of 184.32, but that's cutting it pretty close. Sounds like 
2 channels at 200 MSPS with a replay setup will require 2 replay blocks serving 
each channel independently. If you end up trying either of the above out, I'd 
be curious to know what results you observe.
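
To put "cutting it pretty close" into numbers, the sketch below compares the 
two master clock rates with the 187.5 MSPS figure quoted above. The helper name 
headroom_percent is made up for illustration.

```python
LIMIT_MSPS = 187.5  # theoretical per-channel limit quoted above


def headroom_percent(rate_msps, limit_msps=LIMIT_MSPS):
    """Spare capacity as a percentage of the limit (negative = over the limit)."""
    return (limit_msps - rate_msps) / limit_msps * 100.0


for rate in (184.32, 200.0):
    print(f"{rate:7.2f} MSPS: {headroom_percent(rate):+.1f}% headroom")
```

With these figures, 184.32 MSPS leaves under 2% of margin, while 200 MSPS 
exceeds the limit.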

Sam Reiter 
Ettus Research

On Fri, Dec 6, 2019 at 2:38 PM Sam Reiter <sam.rei...@ettus.com> wrote:
Thomas,

I'd need to set it up on my end, but I believe you can TX two distinct 
waveforms from a single replay block instance. You'd need to make sure that 
you're adding your data to the buffer in separate locations and at an address 
that is a multiple of 8 bytes (which it looks like you're doing from the above 
snippets). Are you seeing continuous underruns, or just a handful at the 
beginning of the run? Does your duplicated code also use:

replay_ctrl->get_record_fullness

on both channels before kicking off the stream start?

Sam Reiter 
Ettus Research

On Wed, Dec 4, 2019 at 3:48 AM Thomas Harder via USRP-users 
<usrp-users@lists.ettus.com> wrote:
Hello everyone,
Is it possible to transmit two different waveforms on the two channels of the 
USRP X310 with the two UBX-160 daughterboards?
I want to transmit two different waveforms simultaneously (synchronized) on the 
two channels of the USRP at the full sample rate of 200 MS/s. I already tried 
to do it with a dual 10Gbit Ethernet connection, and I seemed to be limited by 
my computer. Now I am trying to do it with the replay block.

I built the FPGA image with one Replay block as described in 
https://kb.ettus.com/Using_the_RFNoC_Replay_Block to run the example 
“replay_samples_from_file”, and it works fine if I transmit on just one 
channel. I then modified the code to connect the replay block to both 
channels:
replay_graph->connect(replay_ctrl->get_block_id(),replay_chan,tx_blockid,tx_chan,replay_spp);
replay_graph->connect(replay_ctrl->get_block_id(),replay_chan1,tx_blockid1,tx_chan,replay_spp);

and writing the same waveform into another region of the DRAM-buffer:
replay_ctrl->config_record(0,words_to_replay*replay_word_size, replay_chan);
replay_ctrl->config_record(20000,words_to_replay*replay_word_size, replay_chan1);
and
replay_ctrl->config_play(0,words_to_replay*replay_word_size, replay_chan);
replay_ctrl->config_play(20000,words_to_replay*replay_word_size, replay_chan1);

where 
words_to_replay*replay_word_size=16000
replay_chan=0
replay_chan1=1
tx_blockid=0/Radio_0
tx_blockid1=0/Radio_1
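
The two buffer regions above can be sanity-checked before they are passed to 
config_record/config_play. This is a minimal sketch with an invented helper 
name (check_regions); the 8-byte address alignment comes from Sam's reply in 
this thread, and applying the same alignment to the size is an assumption.

```python
def check_regions(regions, alignment=8):
    """regions: list of (offset_bytes, size_bytes). Returns a list of problems."""
    problems = []
    for offset, size in regions:
        # Assumption: sizes are held to the same alignment as addresses.
        if offset % alignment or size % alignment:
            problems.append(f"region at {offset} is not {alignment}-byte aligned")
    ordered = sorted(regions)
    # After sorting by offset, only neighbouring regions can overlap first.
    for (off_a, size_a), (off_b, _) in zip(ordered, ordered[1:]):
        if off_a + size_a > off_b:
            problems.append(f"region at {off_a} overlaps region at {off_b}")
    return problems


# Offsets and size used in this message: 16000 bytes at offsets 0 and 20000.
regions = [(0, 16000), (20000, 16000)]
print(check_regions(regions) or "layout OK")
```

For the values in this message the check finds no problems, so the layout 
itself is unlikely to be the cause of the underflows.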

then I stream my waveforms to the replay block as defined in the example and I 
start to replay the data:
replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan);
replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan1);

It works but with plenty of Underflows!!

So what does it mean when it says in the manual:
“Note that the record and playback buffers do not need to be the same, allowing a 
single Replay block to both record and playback to different regions of memory 
simultaneously.”
(https://kb.ettus.com/Using_the_RFNoC_Replay_Block)?

Because in the manual it says also:
“The replay block has the following features: One input and one output”
(https://files.ettus.com/manual/classuhd_1_1rfnoc_1_1replay__block__ctrl.html)

So if the replay block has just one output why does it have two channels 
connected to it (replay_chan= 0 and 1)?

If one replay block can just stream to one channel at the same time, can I 
implement easily a second replay block in the FPGA to stream on the two 
channels of my USRP two different waveforms simultaneously?

Thank you,
Thomas

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
